How to troubleshoot Galera cluster joiner node service
In some situations a joiner node in a Galera cluster fails to start. This article shows how to troubleshoot the joiner node service.
If you want to know how to set up a Galera cluster, please refer to the article Database replication with mariadb on CentOS 7 linux.
1- Time and Date
If the time and date on the joiner and donor nodes differ, we must correct them. The best practice is to synchronize every node with NTP servers.
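As a quick check before starting the joiner, the clocks of the two nodes can be compared by hand. The following is a minimal sketch, not a replacement for NTP: the function name clock_skew and the host name donor-host are made up for illustration, and the comparison assumes SSH access from the joiner to the donor.

```shell
# Print the absolute difference in seconds between two Unix timestamps.
# $1: first timestamp, $2: second timestamp (both as printed by `date +%s`).
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example, run on the joiner (donor-host is a hypothetical host name):
#   clock_skew "$(date +%s)" "$(ssh donor-host date +%s)"
```

A result of more than a few seconds suggests that time synchronization on one of the nodes is not working.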
2- Database size
When a joiner node tries to join the cluster, the operation may fail because of the amount of data that has to be transferred. Consider the following situation:
DB size: 3 GB
We start the first node by issuing:
galera_new_cluster
Then we start the second node by issuing:
systemctl start mariadb
Because of the large database size, the state transfer may take a while, and systemd may give up on the service before it finishes. To prevent the start operation from failing, we have to increase the timeout of the mariadb service on the second node:
vim /lib/systemd/system/mariadb.service
Then put the following line under the [Service] section:
TimeoutSec=600
and reload the systemd daemon:
systemctl daemon-reload
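Editing the packaged unit file works, but a later package update may replace that file. One alternative is a systemd drop-in, which keeps the same setting in a separate file. The sketch below assumes the standard drop-in directory /etc/systemd/system/mariadb.service.d; the function name write_timeout_dropin and the file name timeout.conf are arbitrary choices for this example.

```shell
# Write a systemd drop-in file that raises the service timeout.
# $1: drop-in directory, $2: timeout in seconds.
write_timeout_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nTimeoutSec=%s\n' "$2" > "$1/timeout.conf"
}

# Example, run as root on the second node:
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d 600
#   systemctl daemon-reload
```

TimeoutSec sets both the start and the stop timeout of the service, so the value also applies when the service is shut down.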
Also consider updating the rsync package to the latest version:
yum update rsync
3- Donor node options
On the donor node, if we put sst-log-archive=1 in the /etc/my.cnf.d/server.cnf file, the donor node tries to rename the SST log file every time a new node wants to join the cluster.
For example, if the log file is the following:
/var/lib/mysql/mariabackup.backup.log
it tries to rename it to something like:
/var/log/mysql/mariabackup.backup.log.2021.05.05-13.21.09.599455704
If there is an extra “/” in the destination path, such as:
/var/log/mysql//mariabackup.backup.log.2021.05.05-13.21.09.599455704
or in the source path, such as:
/var/lib/mysql//mariabackup.backup.log
MariaDB is unable to rename the file and shows an error like the following:
mv: cannot move ‘/var/lib/mysql//mariabackup.backup.log’ to ‘/var/log/mysql//mariabackup.backup.log.2021.05.05-13.21.09.599455704’: Permission denied
In this situation the easiest fix is to set sst-log-archive=0, then shut down the whole cluster and start it again.
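Before restarting the cluster, it can help to confirm which value of the option is actually present in the configuration file. The following is a rough sketch that only reads one file and does not follow include directives; the function name sst_log_archive_value is made up for this example, and accepting both the dash and the underscore spelling is a convenience of the sketch.

```shell
# Print the value of sst-log-archive found in the given config file.
# If the option appears more than once, the last occurrence is printed.
# $1: path to the config file.
sst_log_archive_value() {
    sed -n 's/^[[:space:]]*sst[-_]log[-_]archive[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1
}

# Example, using the config file path from this section:
#   sst_log_archive_value /etc/my.cnf.d/server.cnf
```

If the command prints nothing, the option is not set in that file.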
4- Cluster state
When the donor node starts, its state in the database should be “Synced” and NOT “Initialized”.
We check the cluster state by issuing the following command:
mysql -u root -p
Then enter the password and run this command:
SHOW STATUS LIKE 'wsrep_local_state_comment';
+---------------------------+-------------+
| Variable_name             | Value       |
+---------------------------+-------------+
| wsrep_local_state_comment | Initialized |
+---------------------------+-------------+
In this case we have to shut down the cluster and start it again by issuing the following on the donor node:
galera_new_cluster
If we run the previous command again, we get the desired output:
SHOW STATUS LIKE 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
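The state check above can also be scripted, for example to verify the donor before starting a joiner. This is a minimal sketch: the function name wsrep_state is made up for this example, and it assumes the client is run with -N (no column names) and -B (tab-separated batch output), so that each row arrives as a name, a tab, and a value.

```shell
# Read "name<TAB>value" rows on stdin and print the value of
# wsrep_local_state_comment, if that row is present.
wsrep_state() {
    awk -F '\t' '$1 == "wsrep_local_state_comment" { print $2 }'
}

# Example (prompts for the root password, as in the steps above):
#   state=$(mysql -u root -p -N -B -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | wsrep_state)
#   if [ "$state" = "Synced" ]; then echo "donor is ready"; else echo "donor state: $state"; fi
```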