Configure a Linux-HA high availability heartbeat cluster

December 23, 2011

In this example we will set up a web server using Apache and cluster it across two nodes with heartbeat. The same steps can be used on CentOS, Fedora and other Red Hat flavors.

Pre-Configuration Requirements

The following hostnames and IPv4 addresses will be used:

  • 192.168.1.15 prime (web server)
  • 192.168.1.16 calc (web server)
  • 192.168.1.20 sigma (HA address)

Configuration

1. Download and install the heartbeat package. In our case we are using CentOS so we will install heartbeat with yum:

yum install heartbeat

or download these packages:

heartbeat-2.08
heartbeat-pils-2.08
heartbeat-stonith-2.08

2. Now we have to configure heartbeat on our two-node cluster. We will deal with three files:

  1. /etc/ha.d/ha.cf: protocol, server options and cluster nodes
  2. /etc/ha.d/authkeys: shared key file
  3. /etc/ha.d/haresources: resource definitions

ha.cf

For the example setup the ha.cf file looks like the following:

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 10
udpport 694
bcast     eth0
node    prime
node    calc
auto_failback on

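The options above are pretty straightforward: debugfile and logfile say where the debug log and the regular log go; logfacility is the syslog facility; keepalive is the interval between heartbeat packets in seconds (this is heartbeat's own interval, not a TCP keepalive); deadtime is how many seconds of silence pass before a node is declared dead; udpport is the UDP port to use; bcast is the interface to broadcast on; and the node lines list the nodes in the cluster. Finally, auto_failback on hands the resources back to the primary node once it returns.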
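
One detail that is easy to trip over: each node line in ha.cf has to match what uname -n prints on that machine, otherwise heartbeat does not recognise the node as a cluster member. Below is a minimal sketch of a check for that. The function name node_in_hacf is made up for this post, and the script only understands plain "node" lines like the ones shown above.

```shell
# node_in_hacf FILE [NAME]
# Succeeds if NAME (default: the output of `uname -n` on this machine)
# appears on a "node" line of the given ha.cf.
# Hypothetical helper for illustration; it is not part of heartbeat.
node_in_hacf() {
    hacf=$1
    name=${2:-$(uname -n)}
    awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$hacf"
}

# Possible use on each node:
#   node_in_hacf /etc/ha.d/ha.cf || echo "$(uname -n) is not listed in ha.cf"
```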

authkeys

The documentation explains the various options, but for this example we are using the sha1 algorithm. Open the file:

# vi /etc/ha.d/authkeys

and edit it as follows:

auth 2
#1 crc
2 sha1 test-ha
#3 md5 Hello!

Also, the authkeys file must be readable and writable by root only; heartbeat refuses to start if the permissions are looser:

chmod 0600 /etc/ha.d/authkeys
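
The key test-ha is fine for a lab, but on a real network a random secret is the safer choice. The following is a rough sketch of how one could generate such a file. The helper name write_authkeys is invented here, it assumes /dev/urandom and od are available, and it uses key index 1 instead of 2 (the number after auth only has to match the number of the key line).

```shell
# write_authkeys FILE
# Writes a one-key sha1 authkeys file with a random 64-character hex
# secret, readable and writable by the owner only.
# Hypothetical helper for illustration; it is not part of heartbeat.
write_authkeys() {
    dest=$1
    # 32 random bytes, printed as hex with spaces and newlines removed
    secret=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077    # create the file without group/other access
        printf 'auth 1\n1 sha1 %s\n' "$secret" > "$dest"
    )
    chmod 0600 "$dest"    # also tighten a file that existed before
}

# Possible use (as root):
#   write_authkeys /etc/ha.d/authkeys
```

The same file then has to be copied to the other node, since both nodes must share the secret.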

haresources

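The haresources file dictates the shared address and the init services to start up (or shut down, as the case may be). The service name must match an init script in /etc/init.d: on CentOS the Apache script is httpd, while on Debian-style systems it is apache2 (the log excerpts later in this post come from a system that uses apache2):

prime 192.168.1.20 httpd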
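
Since a wrong service name on that line only shows up when heartbeat tries to start the resource, it can be worth checking beforehand that a script exists for every resource. Here is a minimal sketch, assuming that heartbeat looks for resource scripts in /etc/ha.d/resource.d and /etc/init.d and that a bare IP address is shorthand for the IPaddr resource. The name check_resources is made up, and backslash line continuations are not handled.

```shell
# check_resources HARESOURCES DIR...
# Prints every resource named in HARESOURCES that has no executable
# script in any of the given directories; fails if any are missing.
# Hypothetical helper for illustration; it is not part of heartbeat.
check_resources() {
    file=$1
    shift
    missing=0
    # Drop comments and blank lines, then skip the node name (field 1).
    for res in $(sed -e 's/#.*//' "$file" |
                 awk 'NF { for (i = 2; i <= NF; i++) print $i }'); do
        res=${res%%::*}               # strip "::argument" suffixes
        case $res in
            [0-9]*) continue ;;       # bare IP address, handled by IPaddr
        esac
        found=0
        for dir in "$@"; do
            if [ -x "$dir/$res" ]; then
                found=1
            fi
        done
        if [ "$found" -eq 0 ]; then
            echo "no script for resource: $res"
            missing=1
        fi
    done
    return "$missing"
}

# Possible use:
#   check_resources /etc/ha.d/haresources /etc/ha.d/resource.d /etc/init.d
```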

The starting or primary server is put as the first argument. The configuration is now done on the primary server, and the exact same settings can be used on the secondary one.

Copy the /etc/ha.d/ directory from prime to calc:

scp -r /etc/ha.d/ root@calc:/etc/
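
To confirm that both nodes really ended up with the same three files, one option is to compare a checksum on each side. This is only a sketch: confsum is a name invented for this post, and it assumes md5sum is installed (it is part of coreutils on CentOS).

```shell
# confsum DIR
# Prints a single MD5 checksum covering ha.cf, authkeys and haresources
# in DIR, so the value can be compared between nodes.
# Hypothetical helper for illustration.
confsum() {
    dir=$1
    cat "$dir/ha.cf" "$dir/authkeys" "$dir/haresources" | md5sum | cut -d ' ' -f 1
}

# Possible use, run as root on prime and again on calc:
#   confsum /etc/ha.d
```

If the two values differ, one of the files was changed on one node after the copy.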

3. Now exchange SSH keys between prime and calc and save them as authorized keys, so that root on each node can log in to the other.

On prime:

Generate the key:

[root@prime ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
9f:5d:47:6b:2a:2e:c8:3e:ee:8a:c2:28:5c:ad:57:79 root@prime

Pass the key to calc (note that this overwrites any existing /root/.ssh/authorized_keys there):
[root@prime ~]# scp .ssh/id_dsa.pub calc:/root/.ssh/authorized_keys

On calc:

Generate the key:

[root@calc ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
40:66:t8:bd:ac:bf:68:38:22:60:d8:9f:18:7d:94:21 root@calc

Pass the key to prime (again, this overwrites any existing /root/.ssh/authorized_keys there):
[root@calc ~]# scp .ssh/id_dsa.pub prime:/root/.ssh/authorized_keys
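
Because scp straight onto authorized_keys replaces whatever keys were already in that file, a gentler variant is to copy the public key to a temporary name and append it. The following is a sketch under that assumption; append_key is a made-up helper name, and the paths in the usage comment are examples only.

```shell
# append_key PUBFILE AUTHFILE
# Adds the key in PUBFILE to AUTHFILE unless that exact line is
# already present, and keeps AUTHFILE private to its owner.
# Hypothetical helper for illustration.
append_key() {
    pub=$1
    auth=$2
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    chmod 0600 "$auth"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from PUBFILE
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}

# Possible use on calc, after "scp .ssh/id_dsa.pub calc:/root/prime.pub" on prime:
#   append_key /root/prime.pub /root/.ssh/authorized_keys
```

Where it is installed, ssh-copy-id from OpenSSH does much the same job in one step.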

NOTE: There is no need to create a virtual network interface and assign the shared IP address (192.168.1.20) to it by hand. Heartbeat does this for you and starts the service (httpd) itself.

4. A basic Apache server is required for the test as well:

# yum install httpd

To illustrate the test, put a simple page containing the hostname into /var/www/html/index.html on each web server. On prime and on calc respectively:

<html><head></head><body>prime</body></html>
<html><head></head><body>calc</body></html>

Next, start the web servers and set them to start at boot (run on both systems). On CentOS the service is called httpd:

service httpd start
chkconfig httpd on

Now time to test the systems separately with lynx --dump:

# lynx --dump prime
   prime

# lynx --dump calc
   calc

5. On both nodes, edit /etc/httpd/conf/httpd.conf so that Apache listens on the shared address:

# vi /etc/httpd/conf/httpd.conf
 Listen 192.168.1.20:80

Firing it Up

Starting up is pretty simple:

# chkconfig heartbeat on
# service heartbeat start
Starting High-Availability services2009/07/25_21:04:30 INFO:  \
        Resource is stopped
heartbeat[4071]: 2009/07/25_21:04:30 info: Version 2 support: false
heartbeat[4071]: 2009/07/25_21:04:30 info: **************************
heartbeat[4071]: 2009/07/25_21:04:30 info: \
        Configuration validated. Starting heartbeat 2.99.3

Now a litmus test of the shared address:

#  lynx --dump 192.168.1.20
   prime

Testing

Testing can be a little tricky. The simplest way is to stop the heartbeat service on the active node (prime) and let the other one take over. Observe the log entries on the calc node:

IPaddr[5106]:   2009/07/25_21:32:55 INFO: eval \
        ifconfig eth0:0 192.168.1.20 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[5089]:   2009/07/25_21:32:55 INFO:  Success
ResourceManager[5006]:  2009/07/25_21:32:55 \
        info: Running /etc/init.d/apache2  start
mach_down[4980]:        2009/07/25_21:32:58 info: \
        mach_down takeover complete for node prime.
heartbeat[4241]: 2009/07/25_21:33:05 WARN: node prime: is dead
heartbeat[4241]: 2009/07/25_21:33:05 info: Dead node prime gave up resources.
heartbeat[4241]: 2009/07/25_21:33:05 info: Resources being acquired from prime.
heartbeat[4241]: 2009/07/25_21:33:05 info: Link prime:eth0 dead.
harc[5258]:     2009/07/25_21:33:06 info: Running /etc/ha.d/rc.d/status status
heartbeat[5259]: 2009/07/25_21:33:06 info: \
        No local resources [/usr/share/heartbeat/ResourceManager \
        listkeys calc] to acquire.
mach_down[5287]:        2009/07/25_21:33:06 info: \
        Taking over resource group 192.168.1.20
ResourceManager[5313]:  2009/07/25_21:33:06 \
        info: Acquiring resource group: prime 192.168.1.20 apache2
IPaddr[5340]:   2009/07/25_21:33:06 INFO:  Running OK
mach_down[5287]:        2009/07/25_21:33:07 \
        info: mach_down takeover complete for node prime.

And a quick check with lynx:

#  lynx --dump 192.168.1.20
   calc
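
A single lynx call only shows the state at one moment. To see roughly how long the shared address is unreachable during a takeover, one can poll it in a loop from a third machine. The sketch below assumes exactly that; watch_vip is a name invented for this post, and the fetch command is passed in as a string so that lynx can be swapped for another client.

```shell
# watch_vip FETCH_CMD COUNT INTERVAL
# Runs FETCH_CMD COUNT times, INTERVAL seconds apart, and prints a
# timestamp plus the answer (or "no-answer" if nothing came back).
# Hypothetical helper for illustration.
watch_vip() {
    fetch_cmd=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # $fetch_cmd is left unquoted on purpose so it splits into words
        answer=$($fetch_cmd 2>/dev/null | tr -d ' \n')
        printf '%s %s\n' "$(date +%H:%M:%S)" "${answer:-no-answer}"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Possible use while stopping heartbeat on prime:
#   watch_vip "lynx --dump 192.168.1.20" 60 1
```

With the test pages from step 4, the output should switch from prime to calc at some point, possibly with a few no-answer lines in between.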

Note that once prime is back online, calc gives control back (this is the effect of auto_failback on in ha.cf):

ResourceManager[5515]:  2009/07/25_21:33:43 info: \
        Releasing resource group: prime 192.168.1.20 apache2
ResourceManager[5515]:  2009/07/25_21:33:43 info: \
        Running /etc/init.d/apache2  stop
ResourceManager[5515]:  2009/07/25_21:33:44 info: \
        Running /etc/ha.d/resource.d/IPaddr 192.168.1.20 stop
IPaddr[5592]:   2009/07/25_21:33:44 INFO: ifconfig eth0:0 down
IPaddr[5575]:   2009/07/25_21:33:44 INFO:  Success

Don't use the IP addresses 192.168.1.15 and 192.168.1.16 for services. Heartbeat uses these addresses for communication between prime and calc. If either of them is used for services or resources, it will disturb heartbeat and the cluster will not work. Be careful!
