System Recovery and Maintenance
How data reconciliation works in senhasegura
senhasegura uses MariaDB Galera Cluster as its database high-availability cluster technology. This section explains the scenarios in which the cluster becomes unavailable and the steps to recover it safely.
Usually, when data replication between cluster nodes is temporarily interrupted in a cluster with the standard configuration, there is a tolerance of approximately 3 hours of disruption during which an Incremental State Transfer (IST), which sends only the incremental data, is enough to reconcile the nodes. The exact window depends on how many of the missed write-sets the Galera write-set cache (gcache) on the donor node still holds. In this case, no intervention is necessary, since the cluster resolves the reconciliation automatically.
Longer outages usually require a complete data transfer, known as a State Snapshot Transfer (SST).
In most cases, the senhasegura cluster is resilient and intelligent enough to resolve the reconciliation by performing an SST automatically.
Manual intervention to perform an SST in the cluster
First, check the synchronization status and verify the cluster control variables:
sudo orbit cluster status
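To read Galera status variables straight from the database instead, a minimal bash sketch follows. It assumes the local mysql client can log in without a password prompt (for example, as root over the Unix socket); the function names galera_summary and galera_check are hypothetical. It looks only at the standard Galera status variables wsrep_cluster_status, wsrep_local_state_comment and wsrep_cluster_size, which may not be the exact set reported by orbit cluster status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize Galera sync state from "SHOW GLOBAL STATUS"
# output given on stdin as tab-separated name/value lines.
galera_summary() {
  awk -F '\t' '
    $1 == "wsrep_cluster_status"      { status = $2 }
    $1 == "wsrep_local_state_comment" { state  = $2 }
    $1 == "wsrep_cluster_size"        { size   = $2 }
    END {
      printf "cluster_status=%s local_state=%s cluster_size=%s\n", status, state, size
      # Exit 0 only when the node is in the Primary component and fully synced.
      exit ((status == "Primary" && state == "Synced") ? 0 : 1)
    }'
}

# Hypothetical wrapper: query the local MariaDB server.
# Assumes passwordless local login for the mysql client.
galera_check() {
  mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_summary
}
```

Run galera_check on each node; a synced node in the Primary component returns exit status 0.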
First steps at the primary node (Primary Member)
Stop the MariaDB process;
sudo systemctl stop mariadb.service
Disable replication in the galera.cnf configuration file:
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to OFF;
Save the file and exit the editor;
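The same edit can be scripted. A minimal sketch, assuming GNU sed and a single wsrep_on = value line in the file; set_wsrep_on is a hypothetical helper, and it leaves a .bak copy of the original file next to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set wsrep_on to ON or OFF in a Galera config file.
# Usage: set_wsrep_on /etc/mysql/conf.d/galera.cnf OFF
set_wsrep_on() {
  local file=$1 value=$2
  case "$value" in
    ON|OFF) ;;
    *) echo "value must be ON or OFF" >&2; return 2 ;;
  esac
  # Fail instead of silently doing nothing when the parameter is missing.
  if ! grep -Eq '^[[:space:]]*wsrep_on[[:space:]]*=' "$file"; then
    echo "wsrep_on not found in $file" >&2
    return 1
  fi
  # Edit in place, keeping a backup with the .bak suffix.
  sed -E -i.bak "s/^([[:space:]]*wsrep_on[[:space:]]*=[[:space:]]*).*/\1${value}/" "$file"
}
```

For this step, the call would be set_wsrep_on /etc/mysql/conf.d/galera.cnf OFF, run as root.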
Delete the old cluster control files;
sudo rm /var/lib/mysql/galera.cache
sudo rm /var/lib/mysql/grastate.dat
sudo rm /var/lib/mysql/multimaster.info
Start the MariaDB process;
sudo systemctl start mariadb.service
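A cautious variant of the deletion step is sketched below: it refuses to touch the data directory while mariadb.service is still active and does not fail if one of the files is already missing. clear_galera_state is a hypothetical name, and the SYSTEMCTL variable exists only so the service check can be replaced in tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the Galera state files listed above from a data
# directory, refusing to run while MariaDB is still active.
clear_galera_state() {
  local datadir=${1:-/var/lib/mysql}
  local systemctl=${SYSTEMCTL:-systemctl}
  if "$systemctl" is-active --quiet mariadb.service; then
    echo "mariadb.service is still running; stop it first" >&2
    return 1
  fi
  local f
  for f in galera.cache grastate.dat multimaster.info; do
    # -f: a missing file is not an error; -v: report what was removed.
    rm -fv -- "$datadir/$f"
  done
}
```

Call it as root after stopping MariaDB and before starting it again.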
First steps at the secondary node (other members)
Stop the MariaDB process;
sudo systemctl stop mariadb.service
Rename the current database data folder for backup purposes;
sudo mv /var/lib/mysql /var/lib/mysql-$(date +%d%m%y%H%M)
Create a new database data folder;
sudo install -d /var/lib/mysql -o mysql -g mysql
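The two commands above can be wrapped so that a repeated run cannot nest the data directory inside an existing backup: if the timestamped backup path already exists (two runs in the same minute), mv would move the data folder into it instead of renaming it. A minimal sketch; reset_datadir is a hypothetical helper and must run as root when an owner is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the data directory aside using the same timestamp
# format as above, then create an empty replacement.
# Usage: reset_datadir /var/lib/mysql mysql
reset_datadir() {
  local datadir=${1:-/var/lib/mysql} owner=${2:-}
  local backup
  backup="${datadir}-$(date +%d%m%y%H%M)"
  # If the backup path already exists, mv would move the data directory
  # *inside* it, so stop instead.
  if [ -e "$backup" ]; then
    echo "$backup already exists; not overwriting" >&2
    return 1
  fi
  mv -- "$datadir" "$backup" || return 1
  if [ -n "$owner" ]; then
    install -d -o "$owner" -g "$owner" "$datadir"
  else
    install -d "$datadir"
  fi
  # Print the backup location so it can be recorded.
  echo "$backup"
}
```

On a secondary node, after stopping MariaDB, the call would be reset_datadir /var/lib/mysql mysql.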
Second steps at primary node (Primary Member)
Stop the MariaDB process;
sudo systemctl stop mariadb.service
Enable replication:
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to ON;
Save the file and exit the editor;
In another terminal, monitor the database log:
sudo tail -f /var/log/mysql/mysql-error.log
Recreate the cluster;
sudo galera_new_cluster
Wait for the initialization to complete;
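Besides watching the log, you can poll the node state until Galera reports it as synced. A minimal sketch, assuming passwordless local access for the mysql client; galera_state and wait_until_synced are hypothetical names, and the 300-second default timeout is an arbitrary choice, not a senhasegura value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print this node's Galera state (e.g. Synced, Joining).
# Assumes the local mysql client can log in without a password prompt.
galera_state() {
  mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'" | awk '{print $2}'
}

# Poll until the node reports Synced or the timeout (in seconds) expires.
# Usage: wait_until_synced [TIMEOUT] [INTERVAL]
wait_until_synced() {
  local timeout=${1:-300} interval=${2:-5} waited=0
  while (( waited < timeout )); do
    if [ "$(galera_state 2>/dev/null)" = "Synced" ]; then
      return 0
    fi
    sleep "$interval"
    (( waited += interval ))
  done
  echo "node not synced after ${timeout}s" >&2
  return 1
}
```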
Second steps at secondary node (other members)
Confirm that replication is enabled in the galera.cnf configuration file:
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to ON;
Save the file and exit the editor;
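To confirm the setting without opening an editor, a read-only check can be used. A minimal sketch; wsrep_enabled is a hypothetical name, and it assumes that the last wsrep_on (or wsrep-on) line in the file is the effective one and that ON, 1 and TRUE are the enabled spellings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the last wsrep_on assignment in the
# file is an enabled value. Commented-out lines are ignored.
wsrep_enabled() {
  local file=${1:-/etc/mysql/conf.d/galera.cnf}
  grep -Ei '^[[:space:]]*wsrep[-_]on[[:space:]]*=' "$file" | tail -n 1 |
    grep -Eiq '=[[:space:]]*(ON|1|TRUE)[[:space:]]*$'
}
```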
In another terminal, monitor the database log:
sudo tail -f /var/log/mysql/mysql-error.log
Start the MariaDB process;
sudo systemctl start mariadb.service
Check in the database log that the number of cluster members is correct (e.g., if there are 2 members, the message
members = 2/2 (joined/total)
should be printed);
Check that the sync confirmation appears, for example:
WSREP: Member 0.0 (vsrv-senhasegura-cert05) synced with group.
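Both log checks can be automated. A minimal sketch, assuming the log lines look like the examples above; check_join_log is a hypothetical helper that takes the log path and the expected number of members, looks at the most recent membership line, and then searches for a "synced with group" message. It does not distinguish messages left in the log by earlier restarts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the membership count and sync message in the log.
# Usage: check_join_log LOGFILE EXPECTED_MEMBERS
check_join_log() {
  local log=$1 expected=$2 last
  # Take the most recent "members = joined/total" entry written by Galera.
  last=$(grep -oE 'members[[:space:]]*=[[:space:]]*[0-9]+/[0-9]+' "$log" | tail -n 1)
  if [ -z "$last" ]; then
    echo "no membership line found" >&2
    return 1
  fi
  local counts=${last##*=}          # e.g. " 2/2"
  counts=${counts//[[:space:]]/}    # -> "2/2"
  local joined=${counts%/*} total=${counts#*/}
  if [ "$joined" != "$expected" ] || [ "$total" != "$expected" ]; then
    echo "membership is $joined/$total, expected $expected/$expected" >&2
    return 1
  fi
  grep -q 'synced with group' "$log"
}
```

For a two-member cluster, the call would be check_join_log /var/log/mysql/mysql-error.log 2.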
Application status and services
All services used by the senhasegura platform can be managed with the orbit command-line tool.
Restarting primary instance
The primary instance is the instance that centralizes the execution of all services. It is also used as the primary member of the cluster.
You can check how the instance is configured using the orbit status command.
To switch an instance to primary and activate it, run the following command sequence so that it is used correctly:
sudo orbit application stop
sudo orbit application master
sudo orbit application start
sudo orbit proxy fajita restart
sudo orbit proxy rdpgate restart
The orbit application stop and orbit application start commands also restart the basic web server services, NGINX and PHP-FPM.
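The command sequence can be run as one unit that stops at the first failing step, so that, for example, the proxies are not restarted if orbit application master fails. A minimal sketch, assuming it is run as root (so sudo is dropped); promote_to_primary is a hypothetical name, and the ORBIT variable only exists so the orbit binary can be replaced in tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the documented switch-to-primary sequence in order,
# stopping at the first step that fails. Run as root.
promote_to_primary() {
  local orbit=${ORBIT:-orbit}
  local steps=(
    "application stop"
    "application master"
    "application start"
    "proxy fajita restart"
    "proxy rdpgate restart"
  )
  local step
  for step in "${steps[@]}"; do
    # Intentional word splitting: each step is a list of orbit arguments.
    # shellcheck disable=SC2086
    if ! "$orbit" $step; then
      echo "orbit $step failed; stopping" >&2
      return 1
    fi
  done
}
```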
Restarting Linux services
All services can be restarted using the orbit command interface.
Use the sudo orbit service command to restart a Linux service.
Keep a close eye on the status of the following services. If one of them stops unexpectedly, you can restart it yourself.
nginx: Web server service. If you restart it, also restart the php-fpm service;
php-fpm: PHP Wrapper service;
mariadb: Database service;
docker: Proxy isolation service;
wazuh-manager: HIDS service;
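A quick way to list which of these services are not running is to ask systemd. A minimal sketch; check_services is a hypothetical name, and it assumes the systemd unit names match the names above (on some distributions the PHP-FPM unit carries a version number, such as php8.2-fpm, and would need adjusting). It only reports; restarting is still done with sudo orbit service. The SYSTEMCTL variable only exists for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each listed service that systemd reports as not
# active, and return 1 if any were found.
check_services() {
  local systemctl=${SYSTEMCTL:-systemctl} svc failed=0
  for svc in nginx php-fpm mariadb docker wazuh-manager; do
    if ! "$systemctl" is-active --quiet "$svc"; then
      echo "$svc"
      failed=1
    fi
  done
  return "$failed"
}
```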
IP blocked by the Host-Based Intrusion Detection System (HIDS)
If the Host-Based Intrusion Detection System blocks an IP, you can unblock it using the orbit firewall command.
sudo orbit firewall --show
sudo orbit firewall unblock --host=[blocked IP]
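When several addresses need unblocking, they can be validated before being passed to orbit. A minimal sketch; is_ipv4 and unblock_hosts are hypothetical names, it handles IPv4 only, and it assumes the --host= form shown above. Run it as root; the ORBIT variable only exists for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only dotted-quad IPv4 addresses (0-255 per octet).
is_ipv4() {
  local ip=$1 octet
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so values such as 010 are not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
}

# Hypothetical helper: unblock each valid address; returns 1 if any was
# skipped or failed.
unblock_hosts() {
  local orbit=${ORBIT:-orbit} ip rc=0
  for ip in "$@"; do
    if ! is_ipv4 "$ip"; then
      echo "skipping invalid address: $ip" >&2
      rc=1
      continue
    fi
    "$orbit" firewall unblock --host="$ip" || rc=1
  done
  return "$rc"
}
```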
Restarting cluster environment
In a cluster environment, restart or shut down instances in the right order to avoid problems.
Run sudo orbit shutdown on the cluster members one instance at a time, waiting for each shutdown to complete before starting the process on the next member.
This way, the remaining cluster members know that the other members are going down. Keep the primary node the last one to be shut down and the first one to be turned on again.
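The ordering rule can be captured in a small helper that prints the shutdown or startup order for a given set of hosts. A minimal sketch; cluster_order and the host names are hypothetical, and it only prints the plan: running sudo orbit shutdown on each member and waiting for it to finish is still done one host at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the member order for a shutdown or a startup.
# The second argument is the primary; the rest are the other members.
# Usage: cluster_order shutdown primary-host member-a member-b
cluster_order() {
  local action=$1 primary=$2
  shift 2
  case "$action" in
    shutdown) printf '%s\n' "$@" "$primary" ;;   # primary goes down last
    startup)  printf '%s\n' "$primary" "$@" ;;   # primary comes up first
    *) echo "action must be shutdown or startup" >&2; return 2 ;;
  esac
}
```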
Orbini services and task execution
Orbini services is the senhasegura abstraction layer for services executed by senhasegura modules.
You can control its execution in the menu Settings ➔ Execution processes ➔ Processes.
Every process has an execution timeout configuration, and sometimes multiple processes can accumulate while waiting to be executed.
To understand why the oldest process is stuck in the task list, execute it manually, replacing ID with the identifier of the process:
sudo orbit execution --code ID --verbose --debug
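When investigating a stuck process, it can help to keep the verbose output. A minimal sketch; debug_process is a hypothetical wrapper that runs the command above with the same flags, saves its output to a timestamped file in the current directory, and returns orbit's own exit status. The ORBIT variable only exists for testing. Run it as root.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an Orbini process manually and keep its output.
# Usage: debug_process ID
debug_process() {
  local id=$1 orbit=${ORBIT:-orbit}
  if [ -z "$id" ]; then
    echo "usage: debug_process ID" >&2
    return 2
  fi
  local log
  log="orbini-${id}-$(date +%Y%m%d%H%M%S).log"
  "$orbit" execution --code "$id" --verbose --debug 2>&1 | tee "$log"
  # PIPESTATUS[0] is orbit's exit status (tee's status would hide failures).
  local rc=${PIPESTATUS[0]}
  echo "output saved to $log" >&2
  return "$rc"
}
```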