Clustering third-party applications with Oracle 10g Clusterware

With 10gR2, Oracle decided to open and publish the API of its clusterware. This allows third-party applications to be registered in the Oracle cluster layer, or lets you develop your own high-availability (HA) solution on top of it.
For the installation steps of the Oracle cluster you can refer to this document (place here the link).
10gR2 allows more than one application to coexist on the same cluster, possibly sharing your RAC nodes.
Just before you ask: you can have the Oracle cluster without RAC. In fact, you can decide not to install RAC at all and go for the clusterware only.
Before you decide to cluster your application with this product, be aware that the Oracle cluster needs a shared disk to host the voting disk and the cluster registry.
It is quite easy to have a SAN in an enterprise environment… not so in small companies.
As described here, there are viable alternatives.
The test system I set up was simple: two Linux (SUSE Linux Enterprise Server 9) nodes connected to a SAN.
The cluster registry and voting disk are on raw devices.
The RDBMS binaries are not installed.
A practical example:
I decided to implement a simple webserver and cluster it, retiring my old heartbeat + MON solution.
The two nodes have this network configuration:

                   breonldblc03      breonldblc04
Public IP address  192.168.23.191    192.168.23.192
Virtual name       breonldblv03      breonldblv04
Virtual IP         192.168.23.196    192.168.23.19
Private name       internal1         internal2
Private IP         192.168.255.1     192.168.255.2

My /etc/hosts looks as follows:

127.0.0.1       localhost
# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.23.191  breonldblc03.ran breonldblc03
192.168.23.192  breonldblc04.ran breonldblc04
breonldblv02.ran breonldblv02
192.168.23.196  breonldblv03.ran breonldblv03
192.168.23.19   breonldblv04.ran breonldblv04
breonldblv05.ran breonldblv05
192.168.255.1   internal1.ras    internal1
192.168.255.2   internal2.ras    internal2
Where ran is the internal domain of my company.
You can see two other virtual IPs, breonldblv02 and breonldblv05: they will be used by my applications.
I installed Apache on both nodes. At this point I bind the webserver to listen on a specific network card (eth1) using the virtual address breonldblv02.
In your /etc/httpd/httpd.conf insert:
# Use name-based virtual hosting.
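The exact directives depend on your Apache version; the following is a minimal sketch of the idea, where the Listen address, port and DocumentRoot are assumptions to adapt to your own setup:

```apache
# Use name-based virtual hosting, bound only to the virtual address
Listen breonldblv02:80
NameVirtualHost breonldblv02:80
<VirtualHost breonldblv02:80>
    ServerName   breonldblv02.ran
    DocumentRoot /srv/www/htdocs
</VirtualHost>
```

This way the daemon does not grab port 80 on the physical addresses, so a second instance on the other node can coexist without conflicts.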
The basic step is to create the scripts and configuration files that will be used to register your application in the cluster.
Oracle provides the crs_profile command to simplify the process using templates.
Since Apache is network based, I'm going to create a resource based on the listening virtual IP.
As the oracle user, issue the command:
crs_profile -create apache_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip \
-o oi=eth1,ov=<virtual IP>,on=<netmask>
This will create an apache_ip.cap file in $ORA_CRS_HOME/crs/public containing the parameters used by the next phase: the registration.
Check the file exists:
oracle10g@breonldblc03:/u01/app/oracle/product/10.2/crs_1/crs/public> ll apache_ip*
-rw-r--r--  1 oracle10g dba 799 2005-07-28 17:47 apache_ip.cap
The content of the file describes the configuration of your resource, called apache_ip.
You can modify it at will before registering the resource into the cluster.
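Since the .cap file is a plain list of KEY=VALUE attributes, it is easy to inspect or tweak from a script. A small sketch follows; both the cap_get helper and the sample attribute values are illustrative (the file generated on your system will differ):

```shell
#!/bin/sh
# Write a sample .cap-style file; the attribute values below are
# made up for illustration, not taken from a real generated profile.
cat > /tmp/apache_ip.cap <<'EOF'
NAME=apache_ip
TYPE=application
ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip
CHECK_INTERVAL=60
RESTART_ATTEMPTS=1
EOF

# Hypothetical helper: print the value of attribute $2 from cap file $1
cap_get() {
    sed -n "s/^$2=//p" "$1"
}

cap_get /tmp/apache_ip.cap ACTION_SCRIPT
```

A quick edit with such a helper (or plain vi) is all you need before registration.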
cat apache_ip.cap
The ACTION_SCRIPT=/u01/app/oracle/product/10.2/crs_1/bin/usrvip entry specifies which script is used for starting, stopping and checking your application (in this example, the IP address).
The options oi=eth1,ov=<virtual IP>,on=<netmask> specify which Ethernet card to use, the IP address and the netmask.
All the parameters of the configuration file will be parsed by the crs_register command, giving you an error message if a problem is found.
After the changes you can register the application:
crs_register apache_ip
Whenever you modify the configuration file, issue the command:
crs_register apache_ip -u -dir /u01/app/oracle/product/10.2/crs_1/crs/public
to immediately update the configuration in the cluster.
The example assumes that the apache_ip.cap file is in the directory /u01/app/oracle/product/10.2/crs_1/crs/public.
Several other steps are required after the registration. As root:
$ORA_CRS_HOME/bin/crs_setperm apache_ip -o root
$ORA_CRS_HOME/bin/crs_setperm apache_ip -u user:oracle10g:r-x
These two commands change the ownership of the resource (an IP should be managed by root) and the permissions on who can run it (on my system, oracle10g is the cluster owner).
Now, as the oracle user, I replicate the changes on the other node:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
Still as oracle, I start the virtual IP:
crs_start apache_ip
Attempting to start `apache_ip` on member `breonldblc04`
Start of `apache_ip` on member `breonldblc04` succeeded.
On breonldblc04 I find the following ifconfig output:
eth1:2    Link encap:Ethernet  HWaddr 00:08:02:1A:5E:12
          inet addr:  Bcast:  Mask:
The resource is started.
Now the second step: clustering the Apache daemon itself.
As oracle:
crs_profile -create apache -t application -B /usr/sbin/apachectl \
-d "Apache Server" -r apache_ip \
-p favored -h "breonldblc03 breonldblc04" \
-a apache.scr -o ci=30,ft=3,fi=12,ra=5
The syntax is a little more complex.
It creates two files in $ORA_CRS_HOME/crs/public: the .cap file and an apache.scr file containing the script used to control the application. This script is generated from a standard template and can be modified even after the registration of the service.
The apache_ip resource is needed by Apache to run correctly, so a dependency has been specified with -r.
The basic command for Apache administration is /usr/sbin/apachectl, and it will be embedded in the apache.scr script.
The options ci=30,ft=3,fi=12,ra=5 set the check interval (ci), failure threshold (ft), failure interval (fi) and restart attempts (ra) used by the cluster before switching the application to another node, while the line
-p favored -h "breonldblc03 breonldblc04"
indicates the policy to apply for the application placement on the nodes.
In the official documentation FAVORED is sometimes incorrectly referred to as PREFERRED.
If you use a policy different from balanced you need to specify the list of nodes with the -h option.
Now modify the action script /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr.
Personally I made only these modifications:
START_APPCMD="/usr/sbin/apachectl start"
STOP_APPCMD="/usr/sbin/apachectl stop"
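The generated apache.scr wraps variables like these in start/stop/check logic. The sketch below is not Oracle's template, just a minimal script following the same contract: the clusterware calls the action script with start, stop or check and expects exit code 0 on success. The pgrep-based probeapp is my own assumption about how to check the daemon:

```shell
#!/bin/sh
# Minimal action-script sketch (not the Oracle-generated template).
# The clusterware invokes it as: <script> start | stop | check
START_APPCMD="${START_APPCMD:-/usr/sbin/apachectl start}"
STOP_APPCMD="${STOP_APPCMD:-/usr/sbin/apachectl stop}"
PROBE_PROCS="${PROBE_PROCS:-httpd}"     # process name to look for

startapp() { $START_APPCMD; }
stopapp()  { $STOP_APPCMD; }
probeapp() {
    # exit 0 only if at least one matching process is running
    pgrep -x "$PROBE_PROCS" >/dev/null
}

# Dispatch only when called with an argument, so the functions
# can also be sourced and exercised separately.
if [ $# -ge 1 ]; then
    case "$1" in
        start) startapp ;;
        stop)  stopapp ;;
        check) probeapp ;;
        *)     exit 1 ;;
    esac
fi
```

The check branch is what the cluster polls at every CHECK_INTERVAL; if it returns non-zero, the restart/failover counters described above start ticking.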
If you are satisfied with your cap file you can register the resource:
crs_register apache
And as root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm apache -u user:oracle10g:r-x
Now I prefer to change the apache.scr permissions by hand:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
adding execute rights for everyone. This solves a problem for me: the script is run by a user other than oracle, and I prefer not to change the ownership of the file to root.
It can be a security risk; personally I handle security at the APPCMD level, but this is debatable.
Copy the scripts and cap files to the other node:
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public/
and make sure the permissions are correct everywhere (that's really important, or your application won't be able to start).
ls -l /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
-rwxr-xr-x  1 oracle10g dba 13228 2005-07-28 18:01 /u01/app/oracle/product/10.2/crs_1/crs/public/apache.scr
Now, as oracle, you can start your Apache:
crs_start apache
Attempting to start `apache` on member `breonldblc04`
Start of `apache` on member `breonldblc04` succeeded.
You can switch the resource on the other node:
crs_relocate apache -f -c breonldblc03
Attempting to stop `apache` on member `breonldblc04`
Stop of `apache` on member `breonldblc04` succeeded.
Attempting to stop `apache_ip` on member `breonldblc04`
Stop of `apache_ip` on member `breonldblc04` succeeded.
Attempting to start `apache_ip` on member `breonldblc03`
Start of `apache_ip` on member `breonldblc03` succeeded.
Attempting to start `apache` on member `breonldblc03`
Start of `apache` on member `breonldblc03` succeeded.
The -f flag is needed since there are dependencies (apache_ip), while -c is optional since I have only two nodes.
Using the second node:
Now, since I have a spare node, I decided to use it to provide another service: an NFS export.
After installing the NFS tools on both nodes, I decided to use the virtual name breonldblv05 for my new resource.
Two solutions are available:
– register a single cumulative resource containing the commands for both the mount point and the NFS daemon;
– create two separate resources, the mount point and the NFS daemon, with the latter depending on the former.
Since I have only one mount point I went for the first, simpler solution.
If you have more complex or flexible needs you can go for the second.
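For the record, the two-resource variant would look something like the following sketch. The resource name pubmount and its action script are hypothetical, and the mount-point script would have to implement mount/umount/check itself; the -r list makes the daemon depend on both the IP and the mount point:

```
crs_profile -create pubmount -t application -a pubmount.scr \
  -p favored -h "breonldblc03 breonldblc04"
crs_profile -create pubnfs -t application -B /etc/init.d/nfsserver \
  -r "nfs_ip pubmount" -a pubnfs.scr \
  -p favored -h "breonldblc03 breonldblc04"
```

The gain is that the mount point can be relocated, stopped and monitored independently of the daemon.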
I perform the previous steps for the virtual IP:
crs_profile -create nfs_ip -t application -a \
/u01/app/oracle/product/10.2/crs_1/bin/usrvip \
-o oi=eth1,ov=<virtual IP>,on=<netmask>
crs_register nfs_ip
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm nfs_ip -u user:oracle10g:r-x
As oracle:
crs_start nfs_ip
On breonldblc04 you should now see the NFS virtual IP address.
Now a little nfs daemon setup.
In the /etc/exports of both nodes place:
/pub            *(ro,insecure,all_squash,async)
This exports your /pub mount point read-only, permitting all anonymous users to read the data inside (in asynchronous mode).
While in the /etc/fstab:
/dev/oradata8_r/nfslv /pub                 ext3       noauto,ro             1 2
This line helps with the cluster start and stop scripts; /pub is not mounted automatically at boot time.
/dev/oradata8_r/nfslv is the device to be mounted, and it must be shared between the two nodes.
You can even try a solution where the file system is local to each node and is kept synchronized by a home-made mechanism. In this scenario you won't have to mount and umount /pub during a failover.
Since Oracle Clusterware already needs a shared device, I prefer a shared device for my NFS export as well (in my setup I'm even using LVM).
Before going on make sure you have a file system and a mount point as described in your fstab.
The command:
crs_profile -create pubnfs -t application -B /etc/init.d/nfsserver \
-d "Public NFS" -r nfs_ip \
-a pubnfs.scr -p favored -h "breonldblc03 breonldblc04" \
-o ci=30,ft=3,fi=12,ra=5
will create the script and configuration file for the pubnfs resource.
In my example I decided to be lazy and reuse the nfsserver init script, since it is already there, ready to use.
My modifications to /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr:
START_APPCMD="/bin/mount /pub"
START_APPCMD2="/etc/init.d/nfsserver start"
STOP_APPCMD="/etc/init.d/nfsserver stop"
STOP_APPCMD2="/bin/umount /pub"
Since mounting a normal (non-cluster) filesystem from two nodes at once can lead to data corruption, you can integrate a special check into your pubnfs.scr. Alternatively, you can register the mount point itself as a cluster resource.
In that case you need to modify your scripts a little more, adding checks not only for the daemon processes but also for the mount point.
This could be a starting point:
checkmount () {
    # R = number of times $1 appears as a mounted filesystem
    R=`mount | grep "on $1 type" | wc -l`
    return $R
}
The checkmount function could be used in probeapp.
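Note that checkmount returns the match count as its exit status, which inverts the usual shell convention (0 = success). If you prefer an exit-code-style test for probeapp, a variant could be the following sketch (is_mounted is my own name, not part of the generated script):

```shell
#!/bin/sh
# Exit 0 when $1 is currently mounted, non-zero otherwise,
# following the usual shell success/failure convention.
is_mounted() {
    mount | grep -q " on $1 type"
}
```

With this convention you can write `is_mounted /pub || exit 1` directly inside the check branch of the action script.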
Let's get back to our previous example and the usual steps:
crs_register pubnfs
As root:
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -o root
/u01/app/oracle/product/10.2/crs_1/bin/crs_setperm pubnfs -u user:oracle10g:r-x
As oracle:
chmod a+x /u01/app/oracle/product/10.2/crs_1/crs/public/pubnfs.scr
scp /u01/app/oracle/product/10.2/crs_1/crs/public/* oracle10g@breonldblc04.ras:/u01/app/oracle/product/10.2/crs_1/crs/public
crs_start pubnfs
Check your whole system:
crs_stat -v -t
Name           Type           R/RA   F/FT   Target    State     Host
apache         application    0/5    0/3    ONLINE    ONLINE    breo…lc04
apache_ip      application    0/1    0/0    ONLINE    ONLINE    breo…lc04
nfs_ip         application    0/1    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.gsd application    0/5    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.ons application    0/3    0/0    ONLINE    ONLINE    breo…lc03
ora….c03.vip application    0/0    0/0    ONLINE    ONLINE    breo…lc03
ora….c04.gsd application    0/5    0/0    ONLINE    ONLINE    breo…lc04
ora….c04.ons application    0/3    0/0    ONLINE    ONLINE    breo…lc04
ora….c04.vip application    0/0    0/0    ONLINE    ONLINE    breo…lc04
pubnfs         application    0/5    0/3    ONLINE    ONLINE    breo…lc03
As shown by the above output, my system is exporting the webserver service on breonldblc04 and the NFS on breonldblc03.
Make sure to point at each service using the right virtual IP.
The resources whose names start with ora. are reserved to the Oracle Clusterware and shouldn't be managed directly without Oracle support.
Now you can start testing the failover of your system.
oracle@breonldblc03:~> crs_stat apache
STATE=ONLINE on breonldblc04
ps -fe|grep httpd
root     29459     1  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29463 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
wwwrun   29665 29459  0 10:44 ?        00:00:00 /usr/sbin/httpd
root      8144 24437  0 15:16 pts/0    00:00:00 grep httpd
Kill the daemon:
kill -9 29459 29463 29665
The cluster should restart the httpd after several seconds.
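Instead of re-running ps by hand, you can watch the restart happen with a small polling helper; wait_for_proc is my own function, not part of the clusterware:

```shell
#!/bin/sh
# Poll for a process name, up to a timeout in seconds;
# exit 0 as soon as it appears, 1 on timeout.
wait_for_proc() {
    proc="$1"; timeout="$2"; elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if pgrep -x "$proc" >/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, `wait_for_proc httpd 60` right after the kill returns as soon as the cluster has respawned the daemon.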
ps -fe|grep httpd
root      9560     1  0 15:16 ?        00:00:00 /usr/sbin/httpd
wwwrun    9566  9560  0 15:16 ?        00:00:00 /usr/sbin/httpd
root     11475 24437  0 15:18 pts/0    00:00:00 grep httpd
The Apache resource was set to restart five times before switching to the other node.
Kill the processes five times and Apache is going to migrate:
oracle@breonldblc03:~> crs_stat apache
STATE=ONLINE on breonldblc03
For normal administration the three main commands are crs_start, crs_stop and crs_relocate.
If you have a startup issue and your application is stuck in the UNKNOWN state, you can clear it by using:
crs_stop apache -f
The command:
crs_stop -all
is useful if you wish to stop the whole cluster.
crs_stat -v -t
Name           Type           R/RA   F/FT   Target    State     Host
apache         application    0/2    0/3    OFFLINE   OFFLINE
apache_ip      application    0/1    0/0    OFFLINE   OFFLINE
nfs_ip         application    0/1    0/0    OFFLINE   OFFLINE
ora….c03.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora….c03.ons application    0/3    0/0    OFFLINE   OFFLINE
ora….c03.vip application    0/0    0/0    OFFLINE   OFFLINE
ora….c04.gsd application    0/5    0/0    OFFLINE   OFFLINE
ora….c04.ons application    0/3    0/0    OFFLINE   OFFLINE
ora….c04.vip application    0/0    0/0    OFFLINE   OFFLINE
pubnfs         application    0/5    0/3    OFFLINE   OFFLINE
The same -all option exists for crs_start.
Using the techniques described in this paper you can secure your own application with an HA solution. Adding nodes and applications is straightforward and could be covered in further papers.
