Debian Lenny High Available Cluster


Billy

Distribution used for pbxnsip cluster:

 

Debian 5.0 Lenny

 

Download at http://cdimage.debian.org/cdimage/daily-bu...386-netinst.iso

 

This tutorial assumes you have a basic familiarity with Linux:

 

1. Know how to install the Linux operating system (if you can read, you will have no issues with some of the tutorials on the net.)

2. Logging in

3. Virtual consoles

4. Shells and commands

5. Files and directories

6. The directory tree

7. Partitioning of hard drives

8. Installing from source

9. Etc.

 

I have 2 nodes:

Each node has 2 network cards.

Each node has 2 disks (one 8 GB and one 11 GB: sda and sdb).

Cluster configuration is Active/Standby

 

Note: Active/Active is possible using OCFS2 with DRBD in primary/primary mode. Once you have worked through this tutorial and read through all the documentation, a little tweaking will give you an Active/Active pbxnsip Linux cluster.

 

sip01: 10.1.10.201 (Primary)

sip02: 10.1.10.202 (Secondary)

Virtual IP (VIP) 10.1.10.210

default gateway: 10.1.10.1

 

Cluster software packages:

pacemaker (includes the cluster resource manager and corosync; You do not have to install corosync separately.)

drbd 8.3.4 (source only)

drbd 8.2.x (Deb package)

 

Resources for help: I highly recommend that you spend a few hours a day for about a week reviewing this documentation. There is a lot of information, and there are many configuration options to choose from. When something goes wrong or does not behave as expected and you ask for help, you will want to know what you are talking about (or at least sound like you do). It will make everyone's life easier. "I started my cluster, but it is not working" will not cut it.

 

 

pacemaker: http://clusterlabs.org/wiki/Main_Page

pacemaker: http://oss.clusterlabs.org/mailman/listinfo/pacemaker

pacemaker: http://clusterlabs.org/mediawiki/images/f/...n_Explained.pdf

corosync (Cluster Engine) http://www.corosync.org

openais: (Cluster Framework - designed to work with corosync) http://www.openais.org/doku.php?id=faq

openais: (Cluster Framework - designed to work with corosync) http://www.openais.org/doku.php?id=support

drbd: TCP/IP Block based replication http://www.drbd.org/users-guide/users-guide.html

drbd: TCP/IP Block based replication http://www.nabble.com/DRBD-f14286.html

drbd: TCP/IP Block based replication http://www.drbd.org/fileadmin/drbd/publica...onf.eu.2007.pdf

 

Let's get started. (I will not reinvent the wheel; there is plenty of documentation on getting Debian installed and configured.)

I will paste the output of my configs and such.

 

 

1. Install Debian: (Assumes you have an i386 architecture.)

 

http://www.debian.org/releases/stable/i386/index.html.en

 

2. Log in as root (Or log in as a normal user and su - to root, or use sudo)

 

Disclaimer: This document assumes that you can already work with Debian Linux and know the security implications of working as root and so on.

 

3. Configure networking

 

http://qref.sourceforge.net/Debian/referen...gateway.en.html

 

my configs:

 

sip01:~# cat /etc/network/interfaces

# This file describes the network interfaces available on your system

# and how to activate them. For more information, see interfaces(5).

 

# The loopback network interface

auto lo

iface lo inet loopback

 

# The primary network interface

#Network access

auto eth0

iface eth0 inet static

address 10.1.10.201

netmask 255.255.255.0

gateway 10.1.10.1

 

 

# Used for heartbeat (Crossover cable)

auto eth1

iface eth1 inet static

address 192.168.10.201

netmask 255.255.255.0

 

 

 

sip02:~# cat /etc/network/interfaces

# This file describes the network interfaces available on your system

# and how to activate them. For more information, see interfaces(5).

 

# The loopback network interface

auto lo

iface lo inet loopback

 

# The primary network interface

#Network access

auto eth0

iface eth0 inet static

address 10.1.10.202

netmask 255.255.255.0

gateway 10.1.10.1

 

 

 

# Used for heartbeat (Crossover cable)

auto eth1

iface eth1 inet static

address 192.168.10.202

netmask 255.255.255.0

 

==========================================================================

 

 

 

4. Test local network and heartbeat network

 

 

sip01:~# ping sip02 -c 1

PING sip02.local.dom (10.1.10.202) 56(84) bytes of data.

64 bytes from sip02.local.dom (10.1.10.202): icmp_seq=1 ttl=64 time=4.56 ms

 

--- sip02.local.dom ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 4.566/4.566/4.566/0.000 ms

sip01:~# ping hb02 -c 1

PING hb02.local.dom (192.168.10.202) 56(84) bytes of data.

64 bytes from hb02.local.dom (192.168.10.202): icmp_seq=1 ttl=64 time=2.83 ms

 

--- hb02.local.dom ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 2.830/2.830/2.830/0.000 ms

sip01:~#
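
The ping test above relies on the names sip02 and hb02 resolving, which this tutorial does not show being set up. Here is a minimal sketch, assuming the names are resolved through /etc/hosts. The sip02, hb02 and local.dom names come from the ping output; the sip01 and hb01 lines are assumed by symmetry and are not shown in the original. The function only prints the entries so you can review them first.

```shell
# Print /etc/hosts entries for the local network and the heartbeat network.
# sip02/hb02 and the local.dom domain come from the ping output above;
# the sip01/hb01 lines are assumptions made by symmetry.
print_cluster_hosts() {
    cat <<'EOF'
10.1.10.201     sip01.local.dom sip01
10.1.10.202     sip02.local.dom sip02
192.168.10.201  hb01.local.dom  hb01
192.168.10.202  hb02.local.dom  hb02
EOF
}

print_cluster_hosts
```

If the entries look right for your network, they could be appended on each node as root with print_cluster_hosts >> /etc/hosts.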

 

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 

Note on using subinterfaces (Virtual IP address on physical interface, e.g. eth0:0)

 

One thing I would like to add: NAT breaks the end-to-end model of UDP/IP and TCP/IP. It was a good thought, but a bad idea! To understand what I mean, a mention of RFC 1631, RFC 1180, and RFC 768 is in order. You will want to review those RFCs to get an understanding of the examples I am about to explain.

 

When we read or hear about TCP/IP, it is not one protocol that is being discussed, explained, or criticized. Do not even get me started on TCP; TCP is flawed in itself. At any rate, that is for another discussion (but hey, do not take my word for it: http://www.linuxsecurity.com/resource_file...-security.html). The good thing is that we do not have to worry too much about TCP in VoIP. The focus here is UDP, as this is the main protocol used by the PBXNSIP application. As I was saying before ripping on TCP (I like ripping on Windows too!), TCP/IP is a suite of protocols; more specifically, TCP, UDP, and IP. I like to refer to the layer 4 protocol that I am discussing as either TCP/IP or UDP/IP, for TCP and UDP respectively.

 

As I stated, NAT breaks the end-to-end IP model. A NAT device (firewall, router, or any device running the translation) needs to track all the translations. It tracks them using a session table (NAT table).

 

Understanding packet flow is not easy in the beginning; I am a Network Engineer by trade not a Linux System Engineer. I know enough about Linux to get myself into a little trouble.

 

When a client sends UDP packets to a PBXNSIP server using the VIP, the packet returning to the client will carry the IP address of the physical interface as its source and NOT that of the VIP. When that packet is sent back to the client, it must traverse the NAT device, and the NAT device will execute a NAT lookup to see if there is a session already set up for the client; if not, the packet is dropped.

 

The easiest way to look at this is:

 

Server physical IP on eth0: 24.60.75.20

Server VIP: 24.60.75.25

 

NAT session table

 

Inside Source IP - Destination IP - Inside Source Port - Destination Port - Outside NAT IP

192.168.1.2 - 24.60.75.25 - 32540 - 5060 - 74.90.100.25

 

The Outside NAT IP is the IP address assigned by the client's ISP. The destination IP and port are the PBXNSIP's VIP and UDP SIP port (5060).

The client sends a UDP SIP registration packet to 24.60.75.25 on port 5060.

The server will then send a UDP authentication packet back to 74.90.100.25 port 32540 with a source of 24.60.75.20 port 5060.

The NAT device of the client will execute a NAT lookup for the returning UDP authentication SIP packet and, as you can see, there is no session for 24.60.75.20 to 74.90.100.25. Most NAT devices will drop the packet by default. Most NAT devices also have options that you can enable so that a packet without a session is not dropped.
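
To make the lookup concrete, here is a rough sketch of the decision the client's NAT device makes. It uses the addresses from the example (server physical IP 24.60.75.20, VIP taken to be 24.60.75.25, client public IP 74.90.100.25) and assumes a strict NAT that forwards an inbound packet only when source and destination match a stored session exactly, and that keeps the client's source port 32540 unchanged. Real devices differ, so treat it as an illustration, not as how any particular router works. The function name nat_lookup is made up for this example.

```shell
# Decide whether a NAT device forwards an inbound UDP packet.
# A session is stored as "remote_ip:remote_port nat_ip:nat_port".
# An inbound packet is forwarded only if its source and destination
# match a stored session exactly (an assumption; NAT devices vary).
nat_lookup() {
    sessions=$1   # newline-separated session list
    pkt_src=$2    # source of the inbound packet (ip:port)
    pkt_dst=$3    # destination of the inbound packet (ip:port)
    if printf '%s\n' "$sessions" | grep -qxF "$pkt_src $pkt_dst"; then
        echo forward
    else
        echo drop
    fi
}

# Session created when the client registered with the VIP on port 5060.
sessions="24.60.75.25:5060 74.90.100.25:32540"

# Reply sourced from the physical IP: no matching session.
nat_lookup "$sessions" "24.60.75.20:5060" "74.90.100.25:32540"
# Reply sourced from the VIP: matches the session.
nat_lookup "$sessions" "24.60.75.25:5060" "74.90.100.25:32540"
```

The first call prints drop and the second prints forward, which is the whole reason for making the server answer from the VIP.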

 

Remember, nothing on the SIP server is broken, this is how UDP communications work (UDP is connectionless). Most TCP/IP based hosts will use

the IP address of the physical interface to build the IP packet before sending the packet down to the data link layer where the data link layer

will encapsulate the IP packet into a frame.

 

As I said, NAT breaks the end-to-end IP model, so to accommodate your customers that are using NAT, you will need to add routes to your routing table with different metrics and source addresses.

 

There are several ways to resolve this. If you find that binding pbxnsip to the virtual IP address is not an option, you can use this method:

 

 

ip route delete default via 10.1.10.1

ip route add default via 10.1.10.1 dev eth0 metric 5

exit

 

What this does is replace the default route with one that has a metric of 5. For now it is still the only default route in the routing table, but later on, as you will see, we will add a cluster resource that adds an additional default route with a source of 10.1.10.210 and a metric of 0.

 

When the cluster adds the second route, there will be 2 default routes installed in the table, one with a metric of 0 and one with a metric of 5. (I will show how to add a resource to the cluster via the CRM to add a default route.)

 

The route with a metric of 0 will be preferred. The end result will look like this:

 

sip01:~# ip route

10.1.10.0/24 dev eth0 proto kernel scope link src 10.1.10.201

192.168.10.0/24 dev eth1 proto kernel scope link src 192.168.10.201

default via 10.1.10.1 dev eth0 src 10.1.10.210

default via 10.1.10.1 dev eth0 metric 5

 

 

sip01:~# route

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

localnet * 255.255.255.240 U 0 0 0 eth0

192.168.10.0 * 255.255.255.0 U 0 0 0 eth1

default 10.1.10.1 0.0.0.0 UG 0 0 0 eth0

default 10.1.10.1 0.0.0.0 UG 5 0 0 eth0
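
To double-check which of the two default routes wins, something like the following sketch could be used. It reads ip route output on stdin and prints the default route with the lowest metric, treating a route without an explicit metric as metric 0. The function name preferred_default is made up for this example, and it only compares default routes; it does not model the rest of the kernel's route selection.

```shell
# Print the default route with the lowest metric from `ip route` output
# read on stdin. A route without an explicit "metric" counts as metric 0.
preferred_default() {
    awk '
        $1 == "default" {
            metric = 0
            for (i = 2; i < NF; i++)
                if ($i == "metric") metric = $(i + 1)
            if (best == "" || metric + 0 < bestmetric) {
                best = $0
                bestmetric = metric + 0
            }
        }
        END { if (best != "") print best }
    '
}

# Sample input taken from the routing table shown above.
printf '%s\n' \
    "10.1.10.0/24 dev eth0 proto kernel scope link src 10.1.10.201" \
    "default via 10.1.10.1 dev eth0 metric 5" \
    "default via 10.1.10.1 dev eth0 src 10.1.10.210" |
    preferred_default
```

On a live node the same check would be ip route | preferred_default; once the cluster has added its route, the line with src 10.1.10.210 should be the one printed.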

 

Now what happens:

 

Any return traffic from the server for traffic destined to the VIP (10.1.10.210) will be sourced from 10.1.10.210.

 

Alternatively, you can just bind the pbxnsip application to the VIP for port 5060 and any other ports that you need.

 

 

 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 

 

 

 

======================================

Initial Configuration and installation

======================================

5. Install the clustering software

 

A. update apt sources

 

 

sip01:~# echo "deb http://people.debian.org/~madkiss/ha lenny main" >> /etc/apt/sources.list

sip01:~# apt-key adv --keyserver pgp.mit.edu --recv-key 1CFA3E8CD7145E30

sip01:~# apt-get update

sip01:~# apt-get install pacemaker

sip01:~# apt-get install psmisc

 

 

sip02:~# echo "deb http://people.debian.org/~madkiss/ha lenny main" >> /etc/apt/sources.list

sip02:~# apt-key adv --keyserver pgp.mit.edu --recv-key 1CFA3E8CD7145E30

sip02:~# apt-get update

sip02:~# apt-get install pacemaker

sip02:~# apt-get install psmisc

 

The next 2 commands, execute on the primary node:

 

sip01:~# corosync-keygen ( you will need to press keys until it reaches 1024)

sip01:~# scp /etc/corosync/authkey root@sip02:/etc/corosync

sip01:~# vi /etc/default/corosync

Change start=no to start=yes

 

sip02:~# vi /etc/default/corosync

Change start=no to start=yes

 

 

 

 

=======================

Edit configfile

=======================

 

Most of the options in the /etc/corosync/corosync.conf file are OK to start with. You must, however, make sure that the nodes can communicate, so be sure to adjust this section (Note: read http://www.corosync.org/doku.php?id=faq:configure_openais):

 

interface {

# The following values need to be set based on your environment

ringnumber: 0

bindnetaddr: 192.168.10.0

mcastaddr: 226.94.1.1

mcastport: 5405

}
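
Note that bindnetaddr is the network address of the heartbeat interface, not the node's own IP address. If your netmask is not a plain /24, the following sketch may help work out the right value. It ANDs each octet of the interface address with the netmask; the function name network_addr is made up for this example.

```shell
# Compute the network address (the value for bindnetaddr) from an
# interface address and its netmask by AND-ing each octet.
network_addr() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2      # split both dotted quads into eight fields
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# eth1 on sip01 from the interfaces file above.
network_addr 192.168.10.201 255.255.255.0
```

For the heartbeat interface in this tutorial this prints 192.168.10.0, which matches the bindnetaddr used above.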

Save the file and start the cluster:

 

sip01:# /etc/init.d/corosync start

sip02:# /etc/init.d/corosync start

 

Check the status of the cluster:

 

sip01:/etc/corosync# crm_mon --one-shot -V

sip02:/etc/corosync# crm_mon --one-shot -V

 

You should see similar output if you followed the directions:

 

 

 

 

sip01:~# crm_mon --one-shot -V

crm_mon[5635]: 2009/10/21_15:15:06 ERROR: unpack_resources: No STONITH resources have been defined

crm_mon[5635]: 2009/10/21_15:15:06 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

crm_mon[5635]: 2009/10/21_15:15:06 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

 

 

============

Last updated: Wed Oct 21 15:15:06 2009

Stack: openais

Current DC: sip01 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3076020514 expected votes

0 Resources configured.

============

 

Online: [ sip01 sip02 ]

 

 

 

sip02:/etc/corosync# crm_mon --one-shot -V

crm_mon[5635]: 2009/10/21_21:13:23 ERROR: unpack_resources: No STONITH resources have been defined

crm_mon[5635]: 2009/10/21_21:13:23 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option

crm_mon[5635]: 2009/10/21_21:13:23 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity

 

 

============

Last updated: Wed Oct 21 21:13:23 2009

Stack: openais

Current DC: sip01 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3076020514 expected votes

0 Resources configured.

============

 

Online: [ sip01 sip02 ]

 

As you can see, the setup is complaining about STONITH, but that is OK since we have not configured that part of the cluster.

The output also reports "partition WITHOUT quorum". That looks bad, but it is OK for now too.

 

One of the most common reasons for this is the way quorum is calculated for a 2-node cluster. Unlike Heartbeat, OpenAIS/Corosync doesn't pretend 2-node clusters always have quorum.

In order to have quorum, more than half of the total number of cluster nodes need to be online. Clearly this is not the case when a node failure occurs in a 2-node cluster.

If you want to allow the remaining node to provide all the cluster services, you need to set the no-quorum-policy to ignore.
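
The majority rule described above can be sketched in a couple of lines. This only illustrates the arithmetic (more than half of the nodes must be online); it is not how Pacemaker or Corosync count votes internally, and the function name has_quorum is made up for this example.

```shell
# Succeed when the online nodes form a strict majority of the cluster.
has_quorum() {
    online=$1
    total=$2
    [ $(( online * 2 )) -gt "$total" ]
}

# A 2-node cluster before and after one node fails.
for n in 2 1; do
    if has_quorum "$n" 2; then
        echo "$n of 2 nodes online: quorum"
    else
        echo "$n of 2 nodes online: no quorum"
    fi
done
```

With one of two nodes online, 1 is not more than half of 2, so the surviving node has no quorum. That is why no-quorum-policy=ignore is needed for a 2-node cluster to keep running services after a failure.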

 

All crm configure commands will be run from the primary (not mandatory, but best practice).

 

sip01:~# crm configure property no-quorum-policy=ignore

sip01:~# crm configure property stonith-enabled=false

 

Recheck the status of the cluster:

 

sip01:~# crm_mon --one-shot -V

 

The STONITH errors are gone; however, "partition WITHOUT quorum" is still present. This is OK too.

 

Let's configure our ip_pbxnsip (fail-over IP [VIP]):

 

sip01:~# crm configure primitive ip_pbxnsip ocf:heartbeat:IPaddr params ip=10.1.10.210 op monitor interval=10s

 

If your local network is on eth0, then a sub-interface will be created for the virtual IP (VIP).

 

 

sip01:~# ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 08:00:27:c2:e8:c4

inet addr:10.1.10.210 Bcast:10.1.10.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

sip02 will not have the VIP configured as it is the standby.

 

 

sip02:/etc/corosync# ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 08:00:27:14:e8:8e

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

Let's test the fail-over before moving on.

 

 

sip01:~# /etc/init.d/corosync stop

 

Check that the fail-over IP started on the standby:

 

 

sip02:/etc/corosync# ifconfig eth0:0

eth0:0 Link encap:Ethernet HWaddr 08:00:27:14:e8:8e

inet addr:10.1.10.210 Bcast:10.1.10.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

 

Also, check from the cluster's perspective:

 

sip02:/etc/corosync# crm_mon --one-shot

 

 

============

Last updated: Wed Oct 21 21:26:35 2009

Stack: openais

Current DC: sip02 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3076098343 expected votes

1 Resources configured.

============

 

Online: [ sip02 ]

OFFLINE: [ sip01 ]

 

ip_pbxnsip (ocf::heartbeat:IPaddr): Started sip02

 

Let's test preemption of sip01:

 

Start corosync on sip01:

sip01:~# /etc/init.d/corosync start

 

Check status from sip02:

 

sip02:/etc/corosync# crm_mon --one-shot

 

 

============

Last updated: Wed Oct 21 21:27:36 2009

Stack: openais

Current DC: sip02 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3076098338 expected votes

1 Resources configured.

============

 

Online: [ sip01 sip02 ]

 

ip_pbxnsip (ocf::heartbeat:IPaddr): Started sip01

 

 

Migrate the resource to the other node:

 

If, for whatever reason, you want to run the resource on a node other than the one it is running on now:

 

sip01:# crm

crm(live)# resource

crm(live)resource# list

ip_pbxnsip (ocf::heartbeat:IPaddr) Started

crm(live)resource# migrate ip_pbxnsip sip02

crm(live)resource# bye

bye

 

Stop the resource:

You can also stop the resource if you wish:

 

sip01:# crm

crm(live)# resource

crm(live)resource# stop ip_pbxnsip

crm(live)resource# bye

bye

 

========================================

 

Install DRBD:

 

http://www.drbd.org/

 

I highly recommend that you install DRBD from source. If you are less experienced and just want to install from packages, then:

 

 

On both nodes: primary and secondary:

 

sip01:~# apt-get install drbd8-utils

sip01:~# apt-get install drbd8-modules-2.6.26-2-686

 

sip02:~# apt-get install drbd8-utils

sip02:~# apt-get install drbd8-modules-2.6.26-2-686

 

Then skip to the DRBD config file section ("Configure the DRBD config file:"); the next items are related to installing from source.

 

To install by source, first download the source tarball (latest):

 

on both nodes:

 

sip01:~# apt-get install linux-headers-$(uname -r)

sip01:~# cd ~

sip01:~# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.4.tar.gz

sip01:~# tar xzvf drbd-8.3.4.tar.gz

sip01:~# cd drbd-8.3.4/drbd

sip01:~# apt-get install make gcc build-essential flex

sip01:~# make clean all

sip01:~# cd ..

sip01:~# make tools

sip01:~# make install

sip01:~# make install-tools

 

 

sip02:~# apt-get install linux-headers-$(uname -r)

sip02:~# cd ~

sip02:~# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.4.tar.gz

sip02:~# tar xzvf drbd-8.3.4.tar.gz

sip02:~# cd drbd-8.3.4/drbd

sip02:~# apt-get install make gcc build-essential flex

sip02:~# make clean all

sip02:~# cd ..

sip02:~# make tools

sip02:~# make install

sip02:~# make install-tools

 

Note!!!! The commands that you ran above:

sip01:~# cd drbd-8.3.4/drbd

sip02:~# cd drbd-8.3.4/drbd

sip01:~# make clean all

sip02:~# make clean all

 

You are making the DRBD module. Any kernel upgrades will require you to rebuild the kernel module!

 

 

Configure the DRBD config file:

 

Find out what disks have been identified:

 

 

sip01:~/drbd-8.3.4# dmesg | grep Attached

[ 7.129455] sd 0:0:0:0: [sda] Attached SCSI disk

[ 7.134073] sd 2:0:0:0: [sdb] Attached SCSI disk

 

I have sda and sdb (your configuration might be different). In the config file, replace my sdb with your device.

 

There is a lot of information that you will want to read about in the drbd.conf file. I cannot go over every piece in detail. You will need to spend the time reading and conduct some lab tests. I would recommend that you set this environment up using VMware or similar. There will be some fine tuning needed, but most of this information will get you going with a cluster that you can test and tweak. Once you have something that is right for your environment, you can then set this up on your production servers.

 

I suggest you read through the drbd.conf file if you used DRBD 8.3.4; it should be complete with examples and comments. My configuration should get you up and going:

 

This will zero out the file!!!

 

sip01:# cat /dev/null > /etc/drbd.conf

 

 

Open the file and copy my config below:

 

sip01:# vi /etc/drbd.conf

 

global {

usage-count yes;

}

 

 

common {

syncer { rate 80M; }

}

 

 

resource r0 {

protocol C;

handlers {

pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";

pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";

local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";

fence-peer "/usr/lib/drbd/crm-fence-peer.sh";

}

startup {

}

 

disk {

on-io-error detach;

}

 

net {

after-sb-0pri disconnect;

after-sb-1pri disconnect;

after-sb-2pri disconnect;

rr-conflict disconnect;

}

 

syncer {

rate 80M;

}

 

on sip01 {

device /dev/drbd0;

disk /dev/sdb2;

address 192.168.10.201:7788;

flexible-meta-disk internal;

}

 

on sip02 {

device /dev/drbd0;

disk /dev/sdb2;

address 192.168.10.202:7788;

meta-disk internal;

}

}

 

After you configure the /etc/drbd.conf file, scp it to node 2 (sip02):

 

 

 

sip01:# scp /etc/drbd.conf root@sip02:/etc/

 

Create the device metadata. This step must be completed only on initial device creation. It initializes DRBD's metadata. The argument is the resource name you defined in drbd.conf; if you used my config untouched, it is r0.

 

sip01:~# drbdadm create-md r0

sip02:~# drbdadm create-md r0

 

sip01:~# /etc/init.d/drbd start

sip02:~# /etc/init.d/drbd start

 

Check the status of DRBD:

 

 

sip01:~# cat /proc/drbd

version: 8.3.4 (api:88/proto:86-91)

GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by root@sip01, 2009-10-21 15:50:17

0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----

ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:10538280

 

At this point, this is normal until we can complete the setup.

 

As long as you see: Secondary/Secondary ds:Inconsistent/Inconsistent

Continue on!

 

On the primary node:

 

sip01:~# drbdadm -- --overwrite-data-of-peer primary r0

 

Check the status again, the data should be syncing:

 

sip01:~# cat /proc/drbd

version: 8.3.4 (api:88/proto:86-91)

GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by root@sip01, 2009-10-21 15:50:17

0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----

ns:55372 nr:0 dw:0 dr:59564 al:0 bm:2 lo:1 pe:290 ua:123 ap:0 ep:1 wo:b oos:10492168

[>....................] sync'ed: 0.5% (10244/10288)M

finish: 0:29:08 speed: 5,664 (4,192) K/sec

 

At this point, wait for the sync to complete!

 

I will be back in about 29 minutes! :(

 

K, I am back!

Let's check the status of DRBD:

 

sip01:~# cat /proc/drbd

version: 8.3.4 (api:88/proto:86-91)

GIT-hash: 70a645ae080411c87b4482a135847d69dc90a6a2 build by root@sip01, 2009-10-21 15:50:17

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----

ns:10538280 nr:0 dw:0 dr:10538816 al:0 bm:644 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
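
If you would rather not eyeball /proc/drbd, here is a small sketch that pulls the cs, ro, or ds field for device 0 out of that output and checks for UpToDate/UpToDate. It assumes the /proc/drbd layout shown above (DRBD 8.3.x); the function names drbd_field and drbd_in_sync are made up for this example and are not part of DRBD.

```shell
# Extract a field (cs, ro or ds) for minor device 0 from /proc/drbd
# content read on stdin.
drbd_field() {
    awk -v key="$1" '
        $1 == "0:" {
            for (i = 2; i <= NF; i++)
                if (index($i, key ":") == 1)
                    print substr($i, length(key) + 2)
        }
    '
}

# Succeed when both sides of device 0 report UpToDate (reads stdin).
drbd_in_sync() {
    [ "$(drbd_field ds)" = "UpToDate/UpToDate" ]
}

# Sample line copied from the status output above.
sample=' 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----'
printf '%s\n' "$sample" | drbd_field cs
printf '%s\n' "$sample" | drbd_in_sync && echo "in sync"
```

On a live node the check would be drbd_in_sync < /proc/drbd, for example in a loop with sleep while the initial sync is running.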

 

 

 

At this point, we can create the mount point and filesystem for drbd0. As we are building an Active/Standby cluster, we can use ext3 or ext4. That choice is yours; however, if you are planning to run Active/Active (Primary/Primary), you cannot use ext at all. You will want to use GFS or OCFS2; however, that is out of scope for this document.

 

Create the filesystem:

 

sip01:~# mkfs.ext3 /dev/drbd0

 

sip01:~# mkdir /usr/local/pbxnsip

sip02:~# mkdir /usr/local/pbxnsip

 

sip01:~# mount /dev/drbd0 /usr/local/pbxnsip

 

Check the mount point:

 

sip01:~# mount | grep drbd

/dev/drbd0 on /usr/local/pbxnsip type ext3 (rw)

 

 

sip01:~# cd /usr/local/pbxnsip

sip01:/usr/local/pbxnsip# wget http://www.pbxnsip.com/download/pbxctrl-debian4.0-3.4.0.3201

sip01:/usr/local/pbxnsip# mv pbxctrl-debian4.0-3.4.0.3201 pbxctrl

sip01:/usr/local/pbxnsip# chmod 755 pbxctrl

 

 

sip01:/usr/local/pbxnsip# cd /etc/init.d/

sip01:/etc/init.d# vi pbxnsip

 

 

Copy and paste:

 

#!/bin/bash

 

PBXEXE=/usr/local/pbxnsip/pbxctrl

PBXDIR=/usr/local/pbxnsip

 

#Service script for the pbxnsip PBX:

case "$1" in

start)

echo -n "Starting pbxnsip daemon"

$PBXEXE --dir $PBXDIR || return=$rc_failed

echo -e "$return"

;;

stop)

echo -n "Shutting down pbxnsip daemon:"

killall $PBXEXE || return=$rc_failed

echo -e "$return"

;;

restart)

$0 stop && $0 start || return=$rc_failed

;;

status)

echo -n "Checking for service pbxnsip: "

checkproc /usr/sbin/pbxnsip && echo OK || echo No process

;;

*)

echo "Usage: $0 {start|stop|status|restart}"

exit 1

esac

 

# Inform the caller not only verbosely and set an exit status.

test "$return" = "$rc_done" || exit 1

exit 0

 

sip01:/etc/init.d# chmod 755 pbxnsip

 

Build the config files and such (start pbxnsip):

sip01:/usr/local/pbxnsip# /etc/init.d/pbxnsip start

 

NOTE!!!! If you have a 64bit Debian version, you will need to install ia32-libs [apt-get install ia32-libs]

 

sip01:/usr/local/pbxnsip# ps -ef | grep pbx

root 15798 1 1 18:02 pts/0 00:00:00 /usr/local/pbxnsip/pbxctrl --dir /usr/local/pbxnsip

 

We are almost done; at this point, you should be able to connect to the http interface and login.

 

This is the basic installation. We will now want to setup the cluster to:

 

Start the virtual IP

Start DRBD

Mount the file system (/usr/local/pbxnsip)

and start the application

 

This will be discussed later on!

 

These have to be started in this order to work right.

 

The first thing we will want to modify is the init script.

 

A couple of things to note:

 

Per pbxnsip:

https://www.pbxnsipsupport.com/index.php?_m...v=0,142,143,188

 

They state to also do this:

 

To install the script, just use the command "update-rc.d" like this:

update-rc.d pbxnsip defaults ( DO NOT DO THIS!!! the cluster resource manager will start the application ):

 

Also for DRBD:

 

Use the DRBD OCF resource agent. In this case, you must not let init load and configure DRBD, because this

resource agent does that itself.

 

If you followed this tutorial, you are safe; the init scripts are not enabled to start at boot, though the scripts are in the init.d directory.

 

You can install rcconf (Debian only) and double check the status of those scripts:

 

sip01:# apt-get install rcconf

sip01:# rcconf

 

They should not be checked!

 

We will want to modify the pbxnsip script that is in /etc/init.d

Why?

Well, the folks at pbxnsip have a bug. (This is normal; nothing bad. I am sure there will be something in this document that I missed; it happens.)

 

sip01:~# /etc/init.d/pbxnsip status

Checking for service pbxnsip: /etc/init.d/pbxnsip: line 23: checkproc: command not found

No process

 

When we add this to the CRM, the Cluster Resource Manager will use that script to check whether the application is running, but the check will fail, as there is no checkproc in the Debian 5.0 init functions!

 

sip01:~# vi /etc/init.d/pbxnsip

 

find the line:

PBXDIR=/usr/local/pbxnsip

 

Add this after that line

 

. /lib/lsb/init-functions

 

find the line

 

checkproc /usr/sbin/pbxnsip && echo OK || echo No process

 

change that line to read:

 

status_of_proc "$PBXEXE" && exit 0 || exit 1

 

Final result will be:

 

#!/bin/bash

 

PBXEXE=/usr/local/pbxnsip/pbxctrl

PBXDIR=/usr/local/pbxnsip

 

. /lib/lsb/init-functions

 

#Service script for the pbxnsip PBX:

case "$1" in

start)

echo -n "Starting pbxnsip daemon"

$PBXEXE --dir $PBXDIR || return=$rc_failed

echo -e "$return"

;;

stop)

echo -n "Shutting down pbxnsip daemon:"

killall $PBXEXE || return=$rc_failed

echo -e "$return"

;;

restart)

$0 stop && $0 start || return=$rc_failed

;;

status)

echo -n "Checking for service pbxnsip: "

status_of_proc "$PBXEXE" && exit 0 || exit 1

;;

*)

echo "Usage: $0 {start|stop|status|restart}"

exit 1

esac

 

# Inform the caller not only verbosely and set an exit status.

test "$return" = "$rc_done" || exit 1

exit 0

 

So now when you check the status of that application, it will be fixed!

 

 

sip01:~# /etc/init.d/pbxnsip status

Checking for service pbxnsip: is running.
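
One thing to keep in mind: for lsb: resources, Pacemaker relies on the status action following the LSB exit codes (0 when running, 3 when not running), while the status line above exits 1 when pbxctrl is not running. Here is a rough sketch of the LSB convention using a pid file. The pid file is hypothetical; the pbxnsip script in this tutorial does not write one, so this shows the exit codes only and is not a drop-in replacement. The function name lsb_status is made up for this example.

```shell
# Return an LSB-style status code for a daemon tracked by a pid file:
#   0 = running, 1 = dead but pid file exists, 3 = not running.
# Note: kill -0 can also fail for a live process owned by another user,
# so this is meant to be run as root.
lsb_status() {
    pidfile=$1
    [ -f "$pidfile" ] || return 3
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}

# Demonstration with a temporary pid file holding this shell's own PID.
demo=$(mktemp)
echo "$$" > "$demo"
rc=0; lsb_status "$demo" || rc=$?
echo "pid file with live PID: $rc"
rm -f "$demo"
rc=0; lsb_status "$demo" || rc=$?
echo "no pid file: $rc"
```

The two demo lines print 0 and 3. It is worth running /etc/init.d/pbxnsip status with the application stopped and checking echo $? before handing the script to the cluster.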

 

Let's go ahead and stop it:

 

sip01:~# /etc/init.d/pbxnsip stop

 

Let's go ahead and get sip02 configured and tested:

 

 

sip01:~# umount /usr/local/pbxnsip

sip01:~# /etc/init.d/drbd stop

sip01:~# crm node standby

sip02:~# drbdadm primary r0

sip02:~# mount /dev/drbd0 /usr/local/pbxnsip/

sip02:/usr/local/pbxnsip# ls

 

Your data should be there!

 

Copy over the init script that we created and tweaked on sip01:

 

sip01:~# scp /etc/init.d/pbxnsip root@sip02:/etc/init.d

sip02:~# /etc/init.d/pbxnsip start

sip02:~# /etc/init.d/pbxnsip status

 

It should be running!

 

Cluster is almost finished!

 

Login to the http gui

 

Change the admin password

 

save, then logout.

 

sip02:~# /etc/init.d/pbxnsip stop

sip02:~# umount /usr/local/pbxnsip

sip02:~# drbdadm secondary r0

sip01:~# crm node online

sip01:~# /etc/init.d/drbd start

sip01:~# drbdadm primary r0

sip01:~# mount /dev/drbd0 /usr/local/pbxnsip/

sip01:~# /etc/init.d/pbxnsip start

 

Login to the HTTP GUI to ensure you can log in with the new password.

The data should have replicated over.

 

It should work. We have manually failed over the cluster; at this point the cluster is operational from a manual perspective.

 

We now want to configure the CRM for the pbxnsip, DRBD, and file system resources:

 

Start the virtual IP

Start DRBD

Mount the file system (/usr/local/pbxnsip)

and start the application

 

In this order!

 

crm configure primitive drbd_pbxnsip ocf:linbit:drbd params drbd_resource="r0" op monitor interval="10s"

crm configure ms ms_drbd_pbxnsip drbd_pbxnsip meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

crm configure primitive fs_pbxnsip ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/usr/local/pbxnsip" fstype="ext3"

crm configure primitive ip_pbxnsip ocf:heartbeat:IPaddr params ip=10.1.10.210 op monitor interval=10s

 

If you choose to add routes rather than bind the application to the VIP, you will need to add the gwsrc_route resource; if not, skip the next line.

crm configure primitive gwsrc_route ocf:heartbeat:Route params destination="0.0.0.0/0" gateway="10.1.10.1" source="10.1.10.210"

crm configure primitive pbxnsipd lsb:pbxnsip

crm configure group pbxnsip fs_pbxnsip ip_pbxnsip pbxnsipd

crm configure colocation pbxnsip_on_drbd inf: pbxnsip ms_drbd_pbxnsip:Master

crm configure order gwsrc_route_after_pbxnsip inf: pbxnsip:start gwsrc_route:start

crm configure order pbxnsip_after_drbd inf: ms_drbd_pbxnsip:promote pbxnsip:start

 

 

Also, you might want to include a resource that pings your default gateway and initiates a failover if the ping fails:

crm configure primitive pingd ocf:pacemaker:pingd params host_list=10.1.10.1 multiplier=100 op monitor interval=15s timeout=5s

crm configure location my_pbxnsip_cluster_on_connected_node pbxnsip rule -inf: not_defined pingd or pingd lte 0

 

 

sip01:~# crm_mon --one-shot

 

 

============

Last updated: Thu Oct 22 15:07:49 2009

Stack: openais

Current DC: sip01 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3075885346 expected votes

2 Resources configured.

============

 

Online: [ sip01 sip02 ]

 

Master/Slave Set: ms_drbd_pbxnsip

Masters: [ sip01 ]

Slaves: [ sip02 ]

Resource Group: pbxnsip

fs_pbxnsip (ocf::heartbeat:Filesystem): Started sip01

ip_pbxnsip (ocf::heartbeat:IPaddr): Started sip01

pbxnsipd (lsb:pbxnsip): Started sip01

 

 

Test failover

 

sip01:~# crm node standby

 

 

sip01:~# crm_mon --one-shot

 

 

============

Last updated: Thu Oct 22 15:09:22 2009

Stack: openais

Current DC: sip01 - partition WITHOUT quorum

Version: 1.0.5-unknown

2 Nodes configured, 3075885346 expected votes

2 Resources configured.

============

 

Node sip01: standby

Online: [ sip02 ]

 

Master/Slave Set: ms_drbd_pbxnsip

Masters: [ sip02 ]

Stopped: [ drbd_pbxnsip:0 ]

Resource Group: pbxnsip

fs_pbxnsip (ocf::heartbeat:Filesystem): Started sip02

ip_pbxnsip (ocf::heartbeat:IPaddr): Started sip02

pbxnsipd (lsb:pbxnsip): Started sip02

 

 

Congratulations, you have a fully functional pbxnsip Linux cluster... hopefully.

If there is anything you think I missed or you are stuck somewhere, shoot me an email.

 

I am not sure how this post will render, so I am attaching what I pasted as a text document.

 

Good luck!

forum_debain_cluster.txt


  • 4 weeks later...

UPDATE:

 

* pacemaker-openais and pacemaker-heartbeat are gone; pacemaker now only comes in one flavour, having support for corosync and heartbeat built in. This is based on pacemaker's capability to detect by which messaging framework it has been started and act accordingly.

 

* openais is gone. pacemaker 1.0.6 uses corosync.

 

I have updated the above tutorial.

 

Have fun!

Billy


Hi,

 

With the cost of small appliance PCs coming down, we really should have a manageable way to set up a group of computers that act as one in some way, and it should be simple to use. (Wow, big order.)

A start would be a simple failover, which a VBScript should easily do.

Let me give this some thought... for the Windows world...

 

Matt


  • 1 month later...
  • 3 weeks later...
Did anybody ever try Pacemaker and Corosync? It seems this is a newer project than Heartbeat.

 

 

Yes, I have been updating my original post. The update shows that I am using Pacemaker, which includes Corosync. Previously, the OpenAIS messaging layer was used over Heartbeat; the current messaging and membership layer when installing Pacemaker is Corosync. I find that Pacemaker is truly a great project, with many benefits and resources from other groups within the community. Good luck.

 

Billy


  • 4 weeks later...

Hi, Billy

 

I tried your tutorial on two Lenny virtual machines. Everything is pretty straightforward until the last step, configuring the CRM for the pbxnsip, DRBD and file system resources.

IP address failover works, and DRBD also works. However, failing over everything from corosync does not work. There are errors.

Here's the output.

rhuang@advocado:~$ sudo crm_mon --one-shot
============
Last updated: Sun Feb 21 10:32:51 2010
Stack: openais
Current DC: cherry - partition WITHOUT quorum
Version: 1.0.7-54d7869bfe3691eb723b1d47810e5585d8246b58
3 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ cherry advocado ]
OFFLINE: [  ]

Master/Slave Set: ms_drbd_pbxnsip
 Masters: [ advocado ]
 Slaves: [ cherry ]
gwsrc_route	(ocf::heartbeat:Route): Started advocado (unmanaged) FAILED
Resource Group: pbxnsip
 fs_pbxnsip (ocf::heartbeat:Filesystem):	Started advocado
 ip_pbxnsip (ocf::heartbeat:IPaddr):		Started advocado
 pbxnsipd   (lsb:pbxnsip):  Started advocado

Failed actions:
drbd_pbxnsip_monitor_0 (node=cherry, call=3, rc=6, status=complete): not configured
fs_pbxnsip_start_0 (node=cherry, call=10, rc=1, status=complete): unknown error
gwsrc_route_monitor_0 (node=cherry, call=12, rc=5, status=complete): not installed
drbd_pbxnsip_monitor_0 (node=advocado, call=3, rc=6, status=complete): not configured
gwsrc_route_start_0 (node=advocado, call=26, rc=5, status=complete): not installed
gwsrc_route_stop_0 (node=advocado, call=27, rc=5, status=complete): not installed

I don't know how to debug those failed actions.

The failover test also failed.

rhuang@advocado:~$ sudo crm node standby
rhuang@advocado:~$ sudo crm_mon --one-shot
============
Last updated: Sun Feb 21 10:46:14 2010
Stack: openais
Current DC: cherry - partition WITHOUT quorum
Version: 1.0.7-54d7869bfe3691eb723b1d47810e5585d8246b58
3 Nodes configured, unknown expected votes
3 Resources configured.
============

Node advocado: standby
Online: [ cherry ]
OFFLINE: [  ]

Master/Slave Set: ms_drbd_pbxnsip
 Masters: [ cherry ]
 Stopped: [ drbd_pbxnsip:1 ]
gwsrc_route	(ocf::heartbeat:Route): Started advocado (unmanaged) FAILED

Failed actions:
drbd_pbxnsip_monitor_0 (node=cherry, call=3, rc=6, status=complete): not configured
fs_pbxnsip_start_0 (node=cherry, call=10, rc=1, status=complete): unknown error
gwsrc_route_monitor_0 (node=cherry, call=12, rc=5, status=complete): not installed
drbd_pbxnsip_monitor_0 (node=advocado, call=3, rc=6, status=complete): not configured
gwsrc_route_start_0 (node=advocado, call=26, rc=5, status=complete): not installed
gwsrc_route_stop_0 (node=advocado, call=27, rc=5, status=complete): not installed

 

I believe I have done everything you mentioned in this post, so if you can give me a hint, it will save me a lot of effort. Thanks.
