Corosync and Pacemaker clusters allow two machines to share a high-availability address. Each machine has its own normal IP address, which is used to administer the machine. A third "service" IP address is the one that Kamailio clients connect to. This normally runs on the primary machine as an IP alias, eth0:0. The backup machine monitors the health of the primary machine, and if the primary crashes, the backup takes over the service IP address, again as eth0:0. Each machine monitors the other over the network.
Install corosync and pacemaker
Do the steps below on both machines. First, install the packages:
Debian / Ubuntu:
- apt-get install ntp
- [If the above fails, try apt-get install chrony instead]
- apt-get install pacemaker crmsh
- systemctl enable corosync.service
- systemctl enable pacemaker.service
Devuan Ascii and later:
- apt-get install ntp
- [If the above fails, try apt-get install chrony instead]
- apt-get install pacemaker crmsh
- update-rc.d corosync defaults
- update-rc.d pacemaker defaults
Devuan Jessie:
Edit /etc/apt/sources.list and add the following lines:
- deb http://auto.mirror.devuan.org/merged jessie-backports main
- deb-src http://auto.mirror.devuan.org/merged jessie-backports main
Then do:
- apt-get install ntp
- [If the above fails, try apt-get install chrony instead]
- apt-get install pacemaker crmsh -t jessie-backports
- update-rc.d corosync defaults
- update-rc.d pacemaker defaults
CentOS 6:
- cd /etc/yum.repos.d
- wget https://download.opensuse.org/repositories/network:ha-clustering:Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo
- yum -y install ntp
- yum -y install pacemaker crmsh
- chkconfig corosync on
- chkconfig pacemaker on
CentOS 7:
- cd /etc/yum.repos.d
- wget https://download.opensuse.org/repositories/network:ha-clustering:Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
- yum -y install ntp
- yum -y install pacemaker crmsh
- systemctl enable corosync.service
- systemctl enable pacemaker.service
CentOS 8 and later:
- cd /etc/yum.repos.d
- wget https://download.opensuse.org/repositories/network:ha-clustering:Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
- dnf install -y chrony
- dnf install --enablerepo HighAvailability -y pacemaker corosync
- dnf install -y crmsh
- systemctl enable corosync.service
- systemctl enable pacemaker.service
Rocky Linux:
- dnf config-manager --set-enabled HighAvailability
- dnf install -y chrony
- dnf install -y pacemaker pcs
- systemctl enable corosync.service
- systemctl enable pacemaker.service
Others:
- yum -y install ntp
- yum -y install pacemaker crmsh
- update-rc.d corosync defaults
- update-rc.d pacemaker defaults
Configure corosync and pacemaker
Verify that the directory /var/log/corosync has been created on both machines. If not, then do:
- mkdir /var/log/corosync
Do the steps below on the primary node only:
Debian / Ubuntu / Devuan:
- apt-get install haveged
- corosync-keygen
- apt-get remove --purge haveged
- scp /etc/corosync/authkey username@<ip address of the secondary node>:/etc/corosync/.
Others:
- yum install haveged
- corosync-keygen
- yum remove haveged
- scp /etc/corosync/authkey username@<ip address of the secondary node>:/etc/corosync/.
Do the steps below on the secondary node only:
- chown root: /etc/corosync/authkey
- chmod 400 /etc/corosync/authkey
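Before starting the cluster, it can help to confirm that the copied key ended up with the expected permissions. The following is a minimal sketch, assuming GNU coreutils stat is available; authkey_mode_ok is a hypothetical helper name, not part of corosync.

```shell
# Hypothetical helper: succeed only if the file exists and its mode is exactly 400.
# Assumes GNU coreutils stat (the -c option), as found on most Linux systems.
authkey_mode_ok() {
    [ -f "$1" ] && [ "$(stat -c '%a' "$1")" = "400" ]
}

# Example use on either node (commented out so the snippet has no side effects):
# authkey_mode_ok /etc/corosync/authkey && sha256sum /etc/corosync/authkey
```

Running the commented command on both nodes and comparing the two digests by eye is one way to check that the key was copied intact.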
On both nodes do:
- cp /opt/enswitch/current/install/etc/corosync/corosync.conf /etc/corosync/corosync.conf
- mkdir /etc/corosync/service.d
- ln -s /opt/enswitch/current/etc/corosync/service.d/pcmk /etc/corosync/service.d/pcmk
- ln -s /opt/enswitch/current/etc/corosync/corosync /etc/default/corosync
- vi /etc/corosync/corosync.conf
Edit /etc/corosync/corosync.conf as follows:
- Set bindnetaddr, under interface within totem, to the network address of your network (its first address, for example 192.168.1.0 for the network 192.168.1.0/24).
- Set the IP addresses of the primary and secondary node under nodelist.
- Set the hostnames of the primary and secondary node under nodelist.
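If you are unsure which value to use for bindnetaddr, the network address can be worked out from any node's IP address and prefix length. The sketch below assumes IPv4 and plain decimal octets; network_address is a hypothetical helper name, and the addresses shown are examples only.

```shell
# Hypothetical helper: print the IPv4 network address for an address and a
# CIDR prefix length. The body runs in a subshell so no variables leak out.
network_address() (
    ip=$1
    prefix=$2
    # Split the dotted quad into four positional parameters.
    IFS=.
    set -- $ip
    # Combine the octets into a single 32-bit integer.
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask from the prefix length, then clear the host bits.
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
)

# Example: a node at 192.168.1.57 with a /24 netmask.
# network_address 192.168.1.57 24    # prints 192.168.1.0
```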
On both nodes start the corosync and pacemaker services (or restart them if they are already started):
- service corosync start
- service pacemaker start
On only one of the nodes check whether there are any resources already configured by default:
- crm status
If there are, they must be removed:
- crm resource stop <resource name>
- crm configure delete <resource name>
On only one of the nodes run:
- crm configure property stonith-enabled=false
- crm configure property no-quorum-policy=ignore
To add the floating IP resource that Kamailio will listen on, run the following command on either node, after replacing the IP address and netmask with the appropriate values:
- crm configure primitive ip ocf:heartbeat:IPaddr2 params ip=192.168.1.1 cidr_netmask="24" op monitor interval="30s"
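Once the resource is running, you may want to confirm which node currently holds the service address. The sketch below filters the one-line output of "ip -o -4 addr show"; has_ipv4_address is a hypothetical helper name, and 192.168.1.1 is the example address from the command above.

```shell
# Hypothetical helper: read "ip -o -4 addr show" output on stdin and succeed
# if the given IPv4 address is configured on any interface.
has_ipv4_address() {
    awk -v want="$1" '
        # In the one-line format, field 3 is "inet" and field 4 is address/prefix.
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example use (commented out so the snippet has no side effects):
# ip -o -4 addr show | has_ipv4_address 192.168.1.1 && echo "service IP is on this node"
```

The comparison is exact, so 192.168.1.1 does not match 192.168.1.10.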
Add the Enswitch resources that will run on the nodes, omitting any that are not needed, then group them together by running the following commands on only one of the nodes. If you omit a resource, leave it out of the group command as well:
Systems using systemd:
- crm configure primitive kamailio systemd:kamailio op monitor interval="30s"
- crm configure primitive enswitch_messaged systemd:enswitch_messaged op monitor interval="30s"
- crm configure primitive enswitch_sipd systemd:enswitch_sipd op monitor interval="30s"
- crm configure primitive enswitch_blfd systemd:enswitch_blfd op monitor interval="30s"
- crm configure group enswitch ip kamailio enswitch_messaged enswitch_sipd enswitch_blfd
Systems using System V init scripts:
- crm configure primitive kamailio lsb:kamailio op monitor interval="30s"
- crm configure primitive enswitch_messaged lsb:enswitch_messaged op monitor interval="30s"
- crm configure primitive enswitch_sipd lsb:enswitch_sipd op monitor interval="30s"
- crm configure primitive enswitch_blfd lsb:enswitch_blfd op monitor interval="30s"
- crm configure group enswitch ip kamailio enswitch_messaged enswitch_sipd enswitch_blfd
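Because the primitive commands differ only in the resource class and service name, they can be generated for review rather than typed by hand. The following is a minimal sketch that only prints the commands; print_crm_commands is a hypothetical helper name, and it assumes the floating IP resource is called ip, as above.

```shell
# Hypothetical helper: print (not run) the crm commands for a resource class
# ("systemd" or "lsb") and a list of services, followed by the group command.
# The body runs in a subshell so no variables leak out.
print_crm_commands() (
    class=$1
    shift
    for svc in "$@"; do
        printf 'crm configure primitive %s %s:%s op monitor interval="30s"\n' \
            "$svc" "$class" "$svc"
    done
    # The group lists the floating IP first, then the services in the order given.
    printf 'crm configure group enswitch ip %s\n' "$*"
)

# Example: only Kamailio and enswitch_sipd are needed on this cluster.
# print_crm_commands systemd kamailio enswitch_sipd
```

Printing first makes it easy to check that the group command contains exactly the resources that were defined before anything is applied to the cluster.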
Disable resources which are managed by Corosync/Pacemaker so that they aren't started on system boot by systemd or init:
Systems using systemd:
- systemctl disable kamailio
- systemctl disable enswitch_messaged
- systemctl disable enswitch_sipd
- systemctl disable enswitch_blfd
Systems using Devuan (and older Debian-like systems):
- update-rc.d kamailio remove
- update-rc.d enswitch_messaged remove
- update-rc.d enswitch_sipd remove
- update-rc.d enswitch_blfd remove
Systems using CentOS, Fedora, or Red Hat:
- chkconfig kamailio off
- chkconfig enswitch_messaged off
- chkconfig enswitch_sipd off
- chkconfig enswitch_blfd off
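The three variants above follow the same pattern, so the right set of commands can be printed for whichever tool the system uses. This is a sketch only: both function names are hypothetical, and the detection order (a running systemd first, then update-rc.d, then chkconfig) is an assumption that may not suit every distribution.

```shell
# Hypothetical helper: guess which service management tool this system uses.
# The /run/systemd/system directory exists only when systemd is the running init.
detect_service_tool() {
    if [ -d /run/systemd/system ]; then
        echo systemctl
    elif command -v update-rc.d >/dev/null 2>&1; then
        echo update-rc.d
    elif command -v chkconfig >/dev/null 2>&1; then
        echo chkconfig
    else
        return 1
    fi
}

# Hypothetical helper: print (not run) the boot-disable command for each service.
print_disable_commands() (
    tool=$1
    shift
    for svc in "$@"; do
        case $tool in
            systemctl)   echo "systemctl disable $svc" ;;
            update-rc.d) echo "update-rc.d $svc remove" ;;
            chkconfig)   echo "chkconfig $svc off" ;;
            *)           echo "unknown tool: $tool" >&2; exit 1 ;;
        esac
    done
)

# Example use (commented out so the snippet has no side effects):
# print_disable_commands "$(detect_service_tool)" kamailio enswitch_sipd
```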
On only one of the nodes check whether the resources have started and are active on the correct node:
- crm status
If any of the resources have not started, then run on any one of the nodes:
- crm resource start <resource name>