Clustering third-party applications with Oracle 10g Clusterware

With 10gR2, Oracle decided to open and publish the API of its clusterware. This allows third-party applications to be registered in the Oracle cluster layer, or lets you develop your own high-availability (HA) solution on top of it.
For the installation steps of the Oracle cluster you can refer to this document (place here the link).
10gR2 allows more than one application to coexist on the same cluster, possibly sharing your RAC nodes.
Just before you ask: you can have the Oracle cluster without RAC. In fact, you can decide not to install RAC at all and go for the clusterware only.
Before you decide to cluster your application with this product, be aware that the Oracle cluster needs a shared disk to host the voting disk and the cluster registry.
It is quite easy to have a SAN in an enterprise environment… not so in small companies.
As described here, there are viable alternatives.
The test system I set up was simple: two Linux (SUSE Linux Enterprise Server 9) nodes connected to a SAN.
The cluster registry and voting disk are on raw devices.
The RDBMS binaries are not installed.
A practical example:
I decided to implement a simple webserver and cluster it, retiring my old heartbeat + MON solution.
The two nodes have this network configuration:

                   breonldblc03      breonldblc04
Public IP address  192.168.23.191    192.168.23.192
Virtual name       breonldblv03      breonldblv04
Virtual IP         192.168.23.196    192.168.23.19
Private name       internal1         internal2
Private IP         192.168.255.1     192.168.255.2

My /etc/hosts looks as follows:

127.0.0.1       localhost
# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.23.191  breonldblc03.ran breonldblc03
192.168.23.192  breonldblc04.ran breonldblc04
breonldblv02.ran breonldblv02
192.168.23.196  breonldblv03.ran breonldblv03
192.168.23.19   breonldblv04.ran breonldblv04
breonldblv05.ran breonldblv05
192.168.255.1   internal1.ras    internal1
192.168.255.2   internal2.ras    internal2
Where ran is the internal domain of my company.
You can see two other virtual IPs, breonldblv02 and breonldblv05: they will be used by my applications.
I installed Apache on both nodes. At this point I bind the webserver to listen on a specific network card (eth1) using the virtual address breonldblv02.
In your /etc/httpd/httpd.conf insert:
# Use name-based virtual hosting.
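The exact directives depend on your Apache version; the following is a minimal sketch of the idea, where the Listen address, port and DocumentRoot are assumptions to adapt to your own setup:

```apache
# Use name-based virtual hosting, bound only to the virtual address
Listen breonldblv02:80
NameVirtualHost breonldblv02:80
<VirtualHost breonldblv02:80>
    ServerName   breonldblv02.ran
    DocumentRoot /srv/www/htdocs
</VirtualHost>
```

This way the daemon does not grab port 80 on the physical addresses, so a second instance on the other node can coexist without conflicts.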
The basic step is to create the scripts and configuration files that will be used to register your application in the cluster.
Oracle provides the crs_profile command to simplify the process using templates.
Since Apache is network based, I'm going to create a resource based on the listening virtual IP.
As the oracle user, issue the command:
crs_profile -create apache_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip \
-o oi=eth1,ov=<virtual IP>,on=<netmask>
This will create an apache_ip.cap file in $ORA_CRS_HOME/crs/public containing the parameters used by the next phase: the registration.
Check the file exists:
oracle10g@breonldblc03:/u01/app/oracle/product/10.2/crs_1/crs/public> ll apache_ip*
-rw-r--r--  1 oracle10g dba 799 2005-07-28 17:47 apache_ip.cap
The content of the file describes the configuration of your resource, called apache_ip.
You can modify it at will before registering the resource into the cluster.
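Since the .cap file is a plain list of KEY=VALUE attributes, it is easy to inspect or tweak from a script. A small sketch follows; both the cap_get helper and the sample attribute values are illustrative (the file generated on your system will differ):

```shell
#!/bin/sh
# Write a sample .cap-style file; the attribute values below are
# made up for illustration, not taken from a real generated profile.
cat > /tmp/apache_ip.cap <<'EOF'
NAME=apache_ip
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip
CHECK_INTERVAL=60
RESTART_ATTEMPTS=1
EOF

# Hypothetical helper: print the value of attribute $2 from cap file $1
cap_get() {
    sed -n "s/^$2=//p" "$1"
}

cap_get /tmp/apache_ip.cap ACTION_SCRIPT
```

A quick edit with such a helper (or plain vi) is all you need before registration.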
cat apache_ip.cap
The ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip entry specifies which script is used for starting, stopping and checking your application (in this example, the IP address).
The options oi=eth1,ov=<virtual IP>,on=<netmask> specify which Ethernet card to use, the IP address and the netmask.
All the parameters of the configuration file will be parsed by the crs_register command, giving you an error message if a problem is found.
After the changes you can register the application:
crs_register apache_ip
Whenever you modify the configuration file, issue the command:
crs_register apache_ip -u -dir /u01/app/oracle/product/10.2/crs_1/crs/public
to immediately update the configuration in the cluster.
The example assumes that the apache_ip.cap file is in the directory /u01/app/oracle/product/10.2/crs_1/crs/public.
Several other steps are required after the registration. As root:
$ORA_CRS_HOME/bin/crs_setperm apache_ip -o root
$ORA_CRS_HOME/bin/crs_setperm apache_ip -u user:oracle10g:r-x
These two commands change the ownership of the resource (an IP should be managed by root) and the permissions on who can run it (on my system, oracle10g is the cluster owner).
Now, as the oracle user, I replicate the changes on the other node:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
Still as oracle, I start the virtual IP:
crs_start apache_ip
Attempting to start `apache_ip` on member `breonldblc04`
Start of `apache_ip` on member `breonldblc04` succeeded.
On breonldblc04 I find the following ifconfig output:
eth1:2    Link encap:Ethernet  HWaddr 00:08:02:1A:5E:12
          inet addr:  Bcast:  Mask:
The resource is started.
Now the second step: clustering the Apache daemon itself.
As oracle:
crs_profile -create apache -t application -B /usr/sbin/apachectl \
-d "Apache Server" -r apache_ip \
-p favored -h "breonldblc03 breonldblc04" \
-a apache.scr -o ci=30,ft=3,fi=12,ra=5
The syntax is a little more complex.
It creates two files in $ORA_CRS_HOME/crs/public: the .cap file and an apache.scr file containing the script used to control the application. This script is generated from a standard template and can be modified even after the registration of the service.
The apache_ip resource is needed by Apache to run correctly, so a dependency has been specified with -r.
The basic command for Apache administration is /usr/sbin/apachectl, and it will be embedded in the apache.scr script.
The options ci=30,ft=3,fi=12,ra=5 set the check interval (ci), failure threshold (ft), failure interval (fi) and restart attempts (ra) used by the cluster before switching the application to another node, while the line
-p favored -h "breonldblc03 breonldblc04"
indicates the policy to apply for the application placement on the nodes.
In the official documentation FAVORED is sometimes incorrectly referred to as PREFERRED.
If you use a policy different from balanced you need to specify the list of nodes with the -h option.
Now modify the action script /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr.
Personally I made only these modifications:
START_APPCMD="/usr/sbin/apachectl start"
STOP_APPCMD="/usr/sbin/apachectl stop"
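The generated apache.scr wraps variables like these in start/stop/check logic. The sketch below is not Oracle's template, just a minimal script following the same contract: the clusterware calls the action script with start, stop or check and expects exit code 0 on success. The pgrep-based probeapp is my own assumption about how to check the daemon:

```shell
#!/bin/sh
# Minimal action-script sketch (not the Oracle-generated template).
# The clusterware invokes it as: <script> start | stop | check
START_APPCMD="${START_APPCMD:-/usr/sbin/apachectl start}"
STOP_APPCMD="${STOP_APPCMD:-/usr/sbin/apachectl stop}"
PROBE_PROCS="${PROBE_PROCS:-httpd}"     # process name to look for

startapp() { $START_APPCMD; }
stopapp()  { $STOP_APPCMD; }
probeapp() {
    # exit 0 only if at least one matching process is running
    pgrep -x "$PROBE_PROCS" >/dev/null
}

# Dispatch only when called with an argument, so the functions
# can also be sourced and exercised separately.
if [ $# -ge 1 ]; then
    case "$1" in
        start) startapp ;;
        stop)  stopapp ;;
        check) probeapp ;;
        *)     exit 1 ;;
    esac
fi
```

The check branch is what the cluster polls at every CHECK_INTERVAL; if it returns non-zero, the restart/failover counters described above start ticking.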
If you are satisfied with your cap file you can register the resource:
crs_register apache
And as root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -u user:oracle10g:r-x
Now I prefer to change the apache.scr permissions by hand:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
adding execute rights for everyone. This solves a problem for me: the script is run by a user other than oracle, and I prefer not to change the ownership of the file to root.
It can be a security risk; personally I handle security at the APPCMD level, but this is debatable.
Copy the scripts and cap files to the other node:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
and make sure the permissions are correct everywhere (that's really important, or your application won't be able to start).
ls -l /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
-rwxr-xr-x  1 oracle10g dba 13228 2005-07-28 18:01 /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
Now, as oracle, you can start your Apache:
crs_start apache
Attempting to start `apache` on member `breonldblc04`
Start of `apache` on member `breonldblc04` succeeded.
You can switch the resource on the other node:
crs_relocate apache -f -c breonldblc03
Attempting to stop `apache` on member `breonldblc04`
Stop of `apache` on member `breonldblc04` succeeded.
Attempting to stop `apache_ip` on member `breonldblc04`
Stop of `apache_ip` on member `breonldblc04` succeeded.
Attempting to start `apache_ip` on member `breonldblc03`
Start of `apache_ip` on member `breonldblc03` succeeded.
Attempting to start `apache` on member `breonldblc03`
Start of `apache` on member `breonldblc03` succeeded.
The -f flag is needed since there are dependencies (apache_ip), while -c is optional since I have only two nodes.
Using the second node:
Now, since I have a spare node, I decided to use it to provide another service: an NFS export.
After installing the NFS tools on both nodes, I decided to use the virtual name breonldblv05 for my new resource.
Two solutions are available:
– register a single cumulative resource containing the commands for both the mount point and the NFS daemon;
– create two separate resources, the mount point and the NFS daemon, with the latter depending on the former.
Since I have only one mount point I went for the first, simpler solution.
If you have more complex or flexible needs you can go for the second.
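For the record, the two-resource variant would look something like the following sketch. The resource name pubmount and its action script are hypothetical, and the mount-point script would have to implement mount/umount/check itself; the -r list makes the daemon depend on both the IP and the mount point:

```
crs_profile -create pubmount -t application -a pubmount.scr \
  -p favored -h "breonldblc03 breonldblc04"
crs_profile -create pubnfs -t application -B /etc/init.d/nfsserver \
  -r "nfs_ip pubmount" -a pubnfs.scr \
  -p favored -h "breonldblc03 breonldblc04"
```

The gain is that the mount point can be relocated, stopped and monitored independently of the daemon.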
I perform the previous steps for the virtual IP:
crs_profile -create nfs_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip \
-o oi=eth1,ov=<virtual IP>,on=<netmask>
crs_register nfs_ip
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -u user:oracle10g:r-x
As oracle:
crs_start nfs_ip
On breonldblc04 you should now see the NFS virtual IP address.
Now a little nfs daemon setup.
In the /etc/exports of both nodes place:
/pub            *(ro,insecure,all_squash,async)
This exports your /pub mount point read-only, permitting all anonymous users to read the data inside (in asynchronous mode).
While in the /etc/fstab:
/dev/oradata8_r/nfslv /pub                 ext3       noauto,ro             1 2
This line helps with the cluster start and stop scripts; /pub is not mounted automatically at boot time.
/dev/oradata8_r/nfslv is the device to be mounted, and it must be shared between the two nodes.
You can even try a solution where the file system is local to each node and is kept synchronized by a home-made mechanism. In this scenario you won't have to mount and umount /pub during a failover.
Since Oracle Clusterware already needs a shared device, I prefer a shared device for my NFS export as well (in my setup I'm even using LVM).
Before going on make sure you have a file system and a mount point as described in your fstab.
The command:
crs_profile -create pubnfs -t application -B /etc/init.d/nfsserver \
-d "Public NFS" -r nfs_ip \
-a pubnfs.scr -p favored -h "breonldblc03 breonldblc04" \
-o ci=30,ft=3,fi=12,ra=5
will create the script and configuration file for the pubnfs resource.
In my example I decided to be lazy and reuse the nfsserver init script, since it is already there, ready to use.
My modifications to /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr:
START_APPCMD="/bin/mount /pub"
START_APPCMD2="/etc/init.d/nfsserver start"
STOP_APPCMD="/etc/init.d/nfsserver stop"
STOP_APPCMD2="/bin/umount /pub"
Since mounting a normal (non-cluster) filesystem from two nodes at once can lead to data corruption, you can integrate a special check into your pubnfs.scr. Alternatively, you can register the mount point itself as a cluster resource.
In that case you need to modify your scripts a little more, adding checks not only for the daemon processes but also for the mount point.
This could be a starting point:
checkmount () {
    # R = number of times $1 appears as a mounted filesystem
    R=`mount | grep "on $1 type" | wc -l`
    return $R
}
The checkmount function could be used in probeapp.
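Note that checkmount returns the match count as its exit status, which inverts the usual shell convention (0 = success). If you prefer an exit-code-style test for probeapp, a variant could be the following sketch (is_mounted is my own name, not part of the generated script):

```shell
#!/bin/sh
# Exit 0 when $1 is currently mounted, non-zero otherwise,
# following the usual shell success/failure convention.
is_mounted() {
    mount | grep -q " on $1 type"
}
```

With this convention you can write `is_mounted /pub || exit 1` directly inside the check branch of the action script.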
Let's get back to our previous example and the usual steps:
crs_register pubnfs
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -u user:oracle10g:r-x
As oracle:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
crs_start pubnfs
Check your whole system:
crs_stat -v -t
Name           Type           R/RA   F/FT   Target    State     Host
apache         application    0/5    0/3    ONLINE    ONLINE    breo…lc04
apache_ip      application    0/1    0/0    ONLINE    ONLINE    breo…lc04
nfs_ip         application    0/1    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.gsd application    0/5    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.ons application    0/3    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.vip application    0/0    0/0    ONLINE    ONLINE    breo…lc03
ora….c04.gsd application    0/5    0/0    ONLINE    ONLINE    breo…lc04
ora….c04.ons application    0/3    0/0    ONLINE    ONLINE    breo…lc04
ora….c04.vip application    0/0    0/0    ONLINE    ONLINE    breo…lc04
pubnfs         application    0/5    0/3    ONLINE    ONLINE    breo…lc03
As shown by the above output, my system is exporting the webserver service on breonldblc04 and the NFS on breonldblc03.
Make sure to point at each service using the right virtual IP.
The resources whose names start with ora. are reserved to the Oracle Clusterware and shouldn't be managed directly without Oracle support.
Now you can start testing the failover of your system.
oracle@breonldblc03:~> crs_stat apache
STATE=ONLINE on breonldblc04
ps -fe|grep httpd
root     29459     1  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29463 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29665 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
root      8144 24437  0 15:16 pts/0    00:00:00 grep httpd
Kill the daemon:
kill -9 29459 29463 29665
The cluster should restart the httpd after several seconds.
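Instead of re-running ps by hand, you can watch the restart happen with a small polling helper; wait_for_proc is my own function, not part of the clusterware:

```shell
#!/bin/sh
# Poll for a process name, up to a timeout in seconds;
# exit 0 as soon as it appears, 1 on timeout.
wait_for_proc() {
    proc="$1"; timeout="$2"; elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if pgrep -x "$proc" >/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, `wait_for_proc httpd 60` right after the kill returns as soon as the cluster has respawned the daemon.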
ps -fe|grep httpd
root      9560     1  0 15:16 ?        00:00:00 /usr/sbin/httpd
wwwrun    9566  9560  0 15:16 ?        00:00:00 /usr/sbin/httpd
root     11475 24437  0 15:18 pts/0    00:00:00 grep httpd
The Apache resource was set to restart five times before switching to the other node.
Kill the processes five times and Apache is going to migrate:
oracle@breonldblc03:~> crs_stat apache
STATE=ONLINE on breonldblc03
For normal administration the three main commands are crs_start, crs_stop and crs_relocate.
If you have a startup issue and your application is stuck in the UNKNOWN state, you can clear it by using:
crs_stop apache -f
The command:
crs_stop -all
is useful if you wish to stop the whole cluster.
crs_stat -v -t
Name           Type           R/RA   F/FT   Target    State     Host
apache         application    0/2    0/3    OFFLINE   OFFLINE
apache_ip      application    0/1    0/0    OFFLINE   OFFLINE
nfs_ip         application    0/1    0/0    OFFLINE   OFFLINE
ora….c03.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora….c03.ons application    0/3    0/0    OFFLINE   OFFLINE
ora….c03.vip application    0/0    0/0    OFFLINE   OFFLINE
ora….c04.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora….c04.ons application    0/3    0/0    OFFLINE   OFFLINE
ora….c04.vip application    0/0    0/0    OFFLINE   OFFLINE
pubnfs         application    0/5    0/3    OFFLINE   OFFLINE
The same -all option exists for crs_start.
Using the techniques described in this paper you can secure your own application with an HA solution. Adding nodes and applications is straightforward and could be covered in further papers.
