Version: 3.25

System Recovery and Maintenance

How data reconciliation works in senhasegura

senhasegura uses MariaDB Galera Cluster as its database high-availability technology. This section describes the unavailability scenarios and the steps to recover from them safely.

With the standard configuration, the cluster usually tolerates about 3 hours of interrupted data replication between nodes. Within that window, only an Incremental State Transfer (IST) is needed to reconcile the nodes: the cluster sends just the missing incremental data. No intervention is necessary in this case, since the cluster reconciles automatically.

Longer outages usually require a complete data transfer, known as a State Snapshot Transfer (SST).

In most cases, the senhasegura cluster is resilient and intelligent enough to resolve the reconciliation by performing an SST automatically.

Manual intervention to perform an SST in the cluster

First, check the synchronization status and the cluster control variables with the following command:

sudo orbit cluster status
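
As a complement to the command above, the Galera status variables can also be read directly from MariaDB. The following is only a minimal sketch: the helper name `wsrep_is_synced` is hypothetical, and it assumes the `mysql` client can log in locally with administrative rights. `wsrep_local_state_comment` and `wsrep_cluster_status` are standard Galera status variables.

```shell
# Hypothetical helper: reads "name<TAB>value" lines (the format produced by
# `mysql -N -B`) on stdin and succeeds only when the node reports
# wsrep_local_state_comment=Synced and wsrep_cluster_status=Primary.
wsrep_is_synced() {
    awk -F'\t' '
        $1 == "wsrep_local_state_comment" { state = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        END { exit !(state == "Synced" && status == "Primary") }
    '
}

# Example usage on a node (assumes local administrative access to MariaDB):
#   sudo mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | wsrep_is_synced \
#       && echo "node is synced" || echo "node is NOT synced"
```

A node that reports any other state, or a non-Primary cluster status, makes the helper return a non-zero exit code.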

First steps at primary node (Primary Member)

Stop the MariaDB process;
```
sudo systemctl stop mariadb.service 
```
Disable replication by changing the galera.cnf configuration file;
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to OFF;
Save the file and exit the editor;
Delete the old cluster control files;
sudo rm /var/lib/mysql/galera.cache;
sudo rm /var/lib/mysql/grastate.dat;
sudo rm /var/lib/mysql/multimaster.info;
Start the MariaDB process;
```
sudo systemctl start mariadb.service 
```
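
The wsrep_on change described above is made by hand in an editor. Where a scripted change is preferred, it could be sketched as follows. The helper name `set_wsrep_on` is hypothetical, and the sketch assumes the parameter is written as `wsrep_on=VALUE` on its own line and that GNU sed is available.

```shell
# Hypothetical helper: set wsrep_on to ON or OFF in a Galera config file.
# Assumes the parameter is written as "wsrep_on=VALUE" (optional spaces
# around "="), one per line, and that GNU sed is available.
set_wsrep_on() {
    local file="$1" value="$2"
    case "$value" in
        ON|OFF) ;;
        *) echo "usage: set_wsrep_on FILE ON|OFF" >&2; return 2 ;;
    esac
    # Fail early if the parameter is missing, instead of silently doing nothing.
    grep -Eq '^[[:space:]]*wsrep_on[[:space:]]*=' "$file" || {
        echo "wsrep_on not found in $file" >&2; return 1
    }
    # Keep everything up to and including "=", replace only the value.
    sed -i -E "s/^([[:space:]]*wsrep_on[[:space:]]*=[[:space:]]*).*/\1${value}/" "$file"
}

# Example (run as root on the node being changed):
#   set_wsrep_on /etc/mysql/conf.d/galera.cnf OFF
```

Rejecting any value other than ON or OFF keeps a typo from producing a configuration that MariaDB would refuse to load.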

First steps at the secondary node (other members)

Stop the MariaDB process;
```
sudo systemctl stop mariadb.service 
```

Rename the current database data folder to keep it as a backup;

sudo mv /var/lib/mysql /var/lib/mysql-$(date +%d%m%y%H%M) 

Create a new database data folder;

sudo install -d /var/lib/mysql -o mysql -g mysql 
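
Before continuing, it can be worth confirming that the new data folder really is empty and has the expected ownership. The check below is a sketch only: `datadir_is_fresh` is a hypothetical helper name, and it relies on GNU `stat` as shipped on Linux.

```shell
# Hypothetical helper: succeed only if DIR exists, is empty, and is owned by
# OWNER (given as "user:group"). Uses GNU stat, as shipped on Linux.
datadir_is_fresh() {
    local dir="$1" owner="$2"
    [ -d "$dir" ] || { echo "$dir is not a directory" >&2; return 1; }
    # ls -A prints nothing for an empty directory
    [ -z "$(ls -A "$dir")" ] || { echo "$dir is not empty" >&2; return 1; }
    [ "$(stat -c '%U:%G' "$dir")" = "$owner" ] || {
        echo "$dir is not owned by $owner" >&2; return 1
    }
}

# Example on a secondary node, after the steps above:
#   datadir_is_fresh /var/lib/mysql mysql:mysql && echo "data folder is ready"
```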

Second steps at primary node (Primary Member)

Stop the MariaDB process;
```
sudo systemctl stop mariadb.service 
```
Enable the replication:
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to ON;
Save the file and exit the editor;
In another terminal, watch the database logs:
```
sudo tail -f /var/log/mysql/mysql-error.log
```
Recreate the cluster;
```
sudo galera_new_cluster 
```
Wait for the complete initialization;

Second steps at secondary node (other members)

Confirm that replication is enabled in the galera.cnf configuration file;
Edit the /etc/mysql/conf.d/galera.cnf configuration file;
Locate the wsrep_on parameter and change its value to ON;
Save the file and exit the editor;
In another terminal, watch the database logs:
```
sudo tail -f /var/log/mysql/mysql-error.log
```
Start the MariaDB process;
```
sudo systemctl start mariadb.service 
```
Check in the database log that the number of cluster members is correct (e.g., with 2 members, the message members = 2/2(joined/total) should be printed);

Check that the sync confirmation appears:

WSREP: Member 0.0 (vsrv-senhasegura-cert05) synced with group.
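
The two log checks above can also be automated instead of read by eye. This is a rough sketch under stated assumptions: `cluster_log_ok` is a hypothetical helper, and it relies only on the two message fragments quoted above (`members = J/T` and `synced with group`).

```shell
# Hypothetical helper: read a MariaDB error log on stdin and succeed only if
# the most recent "members = J/T" line shows every member joined (J equals T)
# and at least one "synced with group" message is present.
cluster_log_ok() {
    awk '
        match($0, /members = [0-9]+\/[0-9]+/) {
            # Skip the 10 characters of "members = " and split "J/T".
            split(substr($0, RSTART + 10, RLENGTH - 10), m, "/")
            joined = m[1]; total = m[2]
        }
        /synced with group/ { synced = 1 }
        END { exit !(total > 0 && joined == total && synced) }
    '
}

# Example usage on a node:
#   sudo cat /var/log/mysql/mysql-error.log | cluster_log_ok && echo "cluster looks healthy"
```

Because the helper scans the whole file, messages from an earlier start can satisfy the check. Running it against only the most recent part of the log avoids that.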

Application status and services

All services used by the senhasegura platform can be managed through the orbit command line.

Restarting primary instance

A primary instance is the instance that centralizes the execution of all services. It is also used as the primary member of the cluster.

You can check how the instance is configured using the orbit status command.

To switch an instance to primary and activate it correctly, run the following command sequence:

sudo orbit application stop;
sudo orbit application master;
sudo orbit application start;
sudo orbit proxy fajita restart;
sudo orbit proxy rdpgate restart;

The orbit application stop and orbit application start commands also restart the basic web server services, NGINX and PHP-FPM.
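
The command sequence above can be wrapped so that it stops at the first step that fails, rather than continuing with a half-switched instance. This is a sketch only: the function name `promote_to_primary` and the `ORBIT` override variable are hypothetical and exist just to allow a dry run. The orbit subcommands are the ones listed above.

```shell
# Runs the promotion sequence in order and stops at the first failing step.
# ORBIT is a hypothetical override (not part of senhasegura) that allows a
# dry run, e.g. ORBIT=echo; by default the real "sudo orbit" is used.
promote_to_primary() {
    local orbit="${ORBIT:-sudo orbit}"
    # $orbit is left unquoted on purpose so "sudo orbit" splits into two words.
    $orbit application stop &&
    $orbit application master &&
    $orbit application start &&
    $orbit proxy fajita restart &&
    $orbit proxy rdpgate restart
}

# Dry run that only prints the subcommands:
#   ORBIT=echo promote_to_primary
```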

Restarting Linux services

All services can be restarted using the orbit command interface.

Use the sudo orbit service command to restart a Linux service.

Pay close attention to the status of the following services. You can restart them yourself if one stops unexpectedly.

nginx: Web server service. If you restart it, also restart the php-fpm service;
php-fpm: PHP Wrapper service;
mariadb: Database service;
docker: Proxy isolation service;
wazuh-manager: HIDS service;
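
A quick way to spot a stopped service from this list is to loop over the names and report the ones that are not running. The sketch below is illustrative only: `inactive_services` is a hypothetical helper, and the systemd unit names in the usage example are assumed to match the names above, which may not hold on every installation.

```shell
# Hypothetical helper: print each service that the checker reports as not
# running. CHECK is a command that takes a service name and succeeds when
# the service is active; the remaining arguments are service names.
inactive_services() {
    local check="$1" svc
    shift
    for svc in "$@"; do
        # $check is left unquoted on purpose so it can carry its own options.
        $check "$svc" >/dev/null 2>&1 || echo "$svc"
    done
}

# Example with systemd (unit names are an assumption and may differ):
#   inactive_services "systemctl is-active --quiet" \
#       nginx php-fpm mariadb docker wazuh-manager
```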

Host-Based Intrusion Detection Systems (HIDS) IP blocked

If the Host-Based Intrusion Detection System blocks an IP, you can unblock it using the orbit firewall command.

sudo orbit firewall --show
sudo orbit firewall unblock --host=[blocked IP]

Restarting cluster environment

In a cluster environment, you should restart or shut down instances in the right order to avoid problems.

Run sudo orbit shutdown on the cluster members one instance at a time, waiting for each shutdown to complete before starting the process on another member.

This way, the remaining cluster members understand that members are going down. Keep the primary node as the last one to be shut down and the first one to be turned on again.
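
The ordering rule can be made explicit with a small sketch. The helper names `shutdown_order` and `startup_order` and the node names in the example are hypothetical. The helpers only print the order and do not shut anything down.

```shell
# Hypothetical helpers: given the primary node followed by the other members,
# print the order in which to shut nodes down and to turn them on again.
shutdown_order() {
    local primary="$1" node
    shift
    for node in "$@"; do echo "$node"; done   # other members first
    echo "$primary"                            # primary last
}

startup_order() {
    local primary="$1" node
    shift
    echo "$primary"                            # primary first
    for node in "$@"; do echo "$node"; done   # then the other members
}

# Example with illustrative node names:
#   shutdown_order node1 node2 node3   # prints node2, node3, node1
#   startup_order  node1 node2 node3   # prints node1, node2, node3
```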

Orbini services and task execution

Orbini services are the senhasegura abstraction layer for services executed by senhasegura modules.

You can control their execution in the menu Settings ➔ Execution processes ➔ Processes.

Every process has an execution timeout configuration, and sometimes multiple processes accumulate while waiting to be executed.

To understand why the oldest process is stuck in the task list, execute the process manually:

sudo orbit execution --code ID --verbose --debug
