Oracle RAC 10g x86 on SUSE LINUX Enterprise Server 9 x86


Oracle has recently released a new 10g RDBMS version with the first service pack integrated: 10.1.0.3. It gives better support for SuSE and solves some installation issues.
In addition, SuSE has released a new orarun version which makes the installation process a little bit easier.
The plain RDBMS installation is the smoothest I have performed so far.

Unfortunately, RAC is a little bit more complex and requires some additional steps and some architectural decisions to be made beforehand.

First of all you need at least two machines (physical or virtual) and shared storage to which they both have equal access.
If you only wish to set up a test environment, I suggest VMware ESX (or GSX if you only have a client machine).
Otherwise, shared storage accessible over a storage area network (SAN) is the best solution for a production environment.

There is even the possibility of using NFS, but I would discourage it… too unreliable (ok, you could use it in a cheap testing setup, but do not try it on a production system!).
For a reliable production setup using NFS, preferably use Oracle RAC with Network Appliance NFS servers. There is even a Filer Simulator available for download from Network Appliance.
I will cover the installation of the 10.1.0.3 version, which can be downloaded from OTN. I will describe an installation using common storage on a SAN. The storage will be managed by Oracle Automatic Storage Management (ASM).
You have other choices: Oracle Cluster File System (OCFS), which is supported in the default SLES9 kernel, raw devices, or a third-party (and perhaps unsupported) shared storage solution.

You need two components: ship.db.lnx32.cpio.gz (the DB installation),
and the cluster service ship.crs.lnx32.cpio.gz.

The latter should be installed before the former; in other words, please install the cluster services component first.

The use of ASM was chosen for simplicity and for testing purposes (I lack experience with ASM). ASM can be a good replacement for Linux’s logical volume management (LVM) or device mapper (DM). However, when using ASM the basic I/O layer still depends on having direct access to raw disk partitions (you’ll see why it’s important to understand this later on).

After installing the basic system, make sure you have the X libraries, libaio, libaio-devel, compat and openmotif packages (even the 32-bit versions):
make-3.80-184.1
gcc-3.3.3-43.24
compat-2004.7.1-1.2
XFree86-libs-4.3.99.902-43.22
libaio-devel-0.3.98-18.4
libaio-0.3.98-18.4
openmotif-libs-2.2.2-519.1
openmotif-2.2.2-519.1
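
A quick way to check that these are present (package names taken from the list above; exact versions may differ on your system):

linux: # rpm -q make gcc compat XFree86-libs libaio libaio-devel openmotif-libs openmotif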

Installing the orarun package will make your installation easier, so I recommend installing it. However, read the notes below before doing so.
The latest version is orarun-1.8-109.5, which can be downloaded from the SuSE website (actually their FTP server).

Note on gcc:
gcc_old-2.95.3-11 is not actually necessary (despite what some websites say). On the contrary, the linking phase needs gcc 3.x!
So be warned: if you are going to install the older gcc for any other purpose, make sure that Oracle picks up the 3.x version during the relink.

Note on orarun:
Orarun is a useful package which can simplify the preinstallation phase. The new orarun checks for gcc_old but no longer depends on it.

The operations from here on are to be performed on every node:
 

linux: # rpm -Uvh orarun-1.8-109.5.i686.rpm

The orarun package addresses the “infamous” orainstaller issue, which manifests itself with the following error message when invoking the Oracle Universal Installer using runInstaller:

Unable to load native library: /tmp/OraInstall2004-02-24_10-40-59AM/jre/lib/i386/libjava.so: symbol __libc_wait, version GLIBC_2.0 not defined in file libc.so.6 with link time reference.

You no longer need to install (or create by yourself) the patch #3006854 for __libc_wait.

Instead, simply modify the /etc/profile.d/oracle.sh as follows, adding:

export LD_PRELOAD=/usr/lib/libInternalSymbols.so
export LD_ASSUME_KERNEL=2.4.21

Setting LD_PRELOAD in this way will help in solving the above issue.
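
Once the oracle user has a real login shell (see the /etc/passwd change below), you can confirm the new profile is picked up; the output shown simply echoes the values set above:

linux: # su - oracle
oracle@sles9rac2:~> echo $LD_PRELOAD $LD_ASSUME_KERNEL
/usr/lib/libInternalSymbols.so 2.4.21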

Create the directory tree for the oracle installation (look at the standard OFA): the default is /opt/oracle/product/10g/db_1.
I prefer /u01/app/oracle/product/10g/db_1
 

linux: # mkdir -p /u01/app/oracle/product/10g/db_1

linux: # mkdir /u01/app/oracle/product/10g/crs

Make sure to change the ownership of the tree with chown (the owner should be the oracle user and the group should be the oinstall group).
 

linux: # chown -R oracle:oinstall /u01/app/oracle

Now you can modify some files in /etc:
 

  • /etc/passwd: change the shell for the oracle user created by orarun (the default is /bin/false; see the usermod example after this list);
  • /etc/group: the oracle user should belong to dba and oinstall;
  • /etc/sysconfig/oracle: for ORACLE_BASE, ORACLE_HOME, ORACLE_SID and several kernel parameters, plus the startup parameters for the oracle script in /etc/init.d (useful during machine boot);
  • /etc/profile.d/oracle.sh (or oracle.csh, depending on the shell you chose above): make sure you set LD_ASSUME_KERNEL=’2.4.21′ as described above (other values could be used: read the paper by Ulrich Drepper at http://people.redhat.com/drepper/assumekernel.html).
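
For the first two points, one way to apply the changes without editing the files by hand is sketched below. Note that usermod -G replaces the whole supplementary group list, so include every group the oracle user should keep (the set shown matches the id output further down):

linux: # usermod -s /bin/bash oracle
linux: # usermod -G oinstall,dba,oper,disk oracle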

Note: I attended a Red Hat workshop on “RAC installation on Red Hat AS 3”. It helped me gain the experience to perform the installation on SLES9 more easily.
In that workshop they advised me not to set the other environment variables and to keep only the ORACLE_BASE setting.
It seems that on Red Hat you can’t complete the installation properly otherwise… I was able to set the variables on SuSE without problems (they help to solve relinking issues).
You are free to follow your own judgment for the best installation.

The operating system of each node needs to be configured in preparation for the RAC installation.
Modify /etc/hosts (I suggest you do this even if you have a DNS), inserting all the definitions for the nodes. Here is an example:

———————————————————————————————

127.0.0.1       localhost

# special IPv6 addresses
::1             localhost ipv6-localhost ipv6-loopback

fe00::0         ipv6-localnet

ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts

192.168.24.61   sles9rac2.ras sles9rac2
192.168.24.60   sles9rac1.ras sles9rac1

192.168.24.63   sles9rac2-vip.ras sles9rac2-vip
192.168.24.62   sles9rac1-vip.ras sles9rac1-vip

192.168.255.2   rac2-int.ras rac2-int
192.168.255.1   rac1-int.ras rac1-int

————————————————————————————————

You need two NICs per node: one for public connections and the other for the interconnect.
The virtual IP (VIP) has to be defined here, but it is not associated with any physical adapter yet; the configuration will be performed later by Oracle.

The configuration on each of the nodes should be identical. I suggest you transfer the hosts file using scp instead of simply cutting and pasting the entries. Then change the permissions on the file:
 

linux: # chmod u-w /etc/hosts

otherwise you could have problems when changing the network configuration using YaST.
Restart your network services with:
 

linux: # /etc/init.d/network restart

The above part is important: without this, you risk having the installation stop while attempting to perform its tasks on each remote node.

Now you need to set up ssh properly for the oracle user.

Go to the oracle user’s home directory (on SuSE, by default, it is /opt/oracle).
Create the .ssh directory:
 

oracle@sles9rac2:~> mkdir .ssh

Then you need to generate a private/public key pair for ssh. This is the first step in setting up the ssh configuration which will allow the installation to be performed on every node at once.

Below is an example taken from a system of mine. I gave no passphrase.
 

oracle@sles9rac2:~> ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/opt/oracle/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /opt/oracle/.ssh/id_rsa.
Your public key has been saved in /opt/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
38:d3:7f:57:38:63:f4:94:9b:e3:38:7b:f7:77:13:ac oracle@sles9rac2

You can also choose a different key type (for example: -t dsa).

Now you have two files: id_rsa and id_rsa.pub.
The first is the private key (to guard closely) while the second is the public key which should be shared by all nodes.

After you have generated the key pairs on every node, you have to collect all the public keys in a file called authorized_keys2
(you can copy them to a single node and then accumulate them into a single authorized_keys2 file using ‘cat’).

Example:
 

oracle@sles9rac2:~> cat id_rsa.pub >> authorized_keys2

You need to end up with a file containing the keys of all the nodes.
Copy that file over to each node, placing it in the $HOME/.ssh directory.

Another solution is to generate only one key pair on one node and insert the public key into authorized_keys2 as described above. Then you can copy the three files (id_rsa.pub, id_rsa, authorized_keys2) over to the $HOME/.ssh directory on every node.
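
For example, with the single-key-pair approach, the distribution could look like this (assuming the .ssh directory already exists on the target node):

oracle@sles9rac2:~/.ssh> scp id_rsa id_rsa.pub authorized_keys2 oracle@sles9rac1:.ssh/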

Now, from every node you have to connect to every other node using all the public and private names used in /etc/hosts (both with and without the domain).
Reply ‘yes’ to every question and make sure that you are no longer prompted for a password.
From the second attempt onwards, the same connection shouldn’t produce any message or request: you need to be authenticated immediately and presented with a shell prompt for the Oracle installation to proceed smoothly.
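
A quick way to cycle through all the combinations from one node (host names taken from the /etc/hosts example above; repeat the loop from every node, and leave the VIP names out since they are not up yet):

oracle@sles9rac1:~> for h in sles9rac1 sles9rac1.ras sles9rac2 sles9rac2.ras rac1-int rac1-int.ras rac2-int rac2-int.ras; do ssh $h hostname; done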

Warning!!!!!

Even if you are authenticated without a password or any other request, any extra output (or warning) will be interpreted by Oracle as an error, stopping the installation. So resolve any such issue or warning before going ahead.

Here, you can see an example of the messages shown when establishing the initial ssh connections:
 

oracle@sles9rac1:~/.ssh> ssh oracle@192.168.255.1
The authenticity of host ‘192.168.255.1 (192.168.255.1)’ can’t be established.
RSA key fingerprint is 4c:70:d1:4c:6c:71:5c:19:a6:87:14:38:e5:f7:7f:51.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘192.168.255.1’ (RSA) to the list of known hosts.
Last login: Wed Nov 10 16:18:43 2004 from 192.168.255.2
oracle@sles9rac1:~> exit
logout
Connection to 192.168.255.1 closed.
oracle@sles9rac1:~/.ssh> ssh oracle@192.168.255.2
The authenticity of host ‘192.168.255.2 (192.168.255.2)’ can’t be established.
RSA key fingerprint is 4c:70:d1:4c:6c:71:5c:19:a6:87:14:38:e5:f7:7f:51.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘192.168.255.2’ (RSA) to the list of known hosts.
Last login: Wed Nov 10 16:20:14 2004 from 192.168.24.60
oracle@sles9rac2:~> exit
logout
Connection to 192.168.255.2 closed.

Last step before starting:
You need to configure the common shared storage. I used ASM, so I needed at least three raw devices: the first one for the quorum disk (at least 200MB), one for the voting disk (also 200MB), and one or more disks to be managed by ASM.
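
Assuming the shared LUN shows up as /dev/sdb (as in the /etc/raw example below), you can check its partitions with:

linux: # fdisk -l /dev/sdb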

In /etc/raw, insert the raw device name and the block device it should be bound to:

example:

raw1:sdb1
raw2:sdb2
raw3:sdb3

Now start the raw service:
 

linux: # /etc/init.d/raw start
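
To verify that the bindings took effect, you can query the raw devices:

linux: # raw -qa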

From the manual: you should change ownership and permissions:
 

linux: # chown oracle:dba /dev/raw/raw1
linux: # chown oracle:dba /dev/raw/raw2
linux: # chown oracle:dba /dev/raw/raw3
linux: # chmod 660 /dev/raw/raw1
linux: # chmod 660 /dev/raw/raw2
linux: # chmod 660 /dev/raw/raw3

On SuSE the oracle user (the one created by orarun) is part of the disk group, which has the rights to read and write raw devices.
Just to be sure: check that your oracle user is part of disk (if not, add it by editing /etc/group or using YaST), and try to read (and, on a device not yet in use, write) a couple of raw devices.

oracle@sles9rac2:~> id
uid=100(oracle) gid=102(dba) groups=6(disk),101(oinstall),102(dba),103(oper)

oracle@sles9rac2:~> dd if=/dev/raw/raw1 of=/tmp/foo bs=4096 count=8
8+0 records in
8+0 records out

Now the system has been preconfigured. You only need to unpack the downloaded Oracle archives and install them:

oracle@sles9rac2:~> gunzip ship.crs.lnx32.cpio.gz
oracle@sles9rac2:~> cpio -idmv < ship.crs.lnx32.cpio

You are ready to install the Oracle cluster services (CRS):

  • if you are on a remote machine, make sure your local X server is running and export the DISPLAY: export DISPLAY=<your local IP>:0.0;
  • launch runInstaller from the Disk1 directory with the command “./runInstaller” (see the combined example below).
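
Putting the two steps together (the DISPLAY value below is just an example of your local workstation’s address):

oracle@sles9rac2:~> export DISPLAY=192.168.24.100:0.0
oracle@sles9rac2:~> cd Disk1
oracle@sles9rac2:~/Disk1> ./runInstaller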


I’m adding some images which can increase the clarity of the next steps:

Change the destination to the crs directory created earlier.

[Screenshot: setting the home for CRS]

Select oinstall as the group for performing installations and launch the required script as root when prompted.

This is a critical step. You need to insert the names of the public and private nodes and the cluster name.
The names have to be identical to the ones listed in /etc/hosts.

Then simply launch the script as root on every node.
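
With the CRS home created earlier, the script the installer asks for would typically be (run it as root, on one node at a time):

linux: # /u01/app/oracle/product/10g/crs/root.sh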

If everything went fine you are ready for the database install:

oracle@sles9rac2:~> gunzip ship.db.lnx32.cpio.gz
oracle@sles9rac2:~> cpio -idmv < ship.db.lnx32.cpio

Launch the unpacked runInstaller from the Disk1 directory and perform the usual steps.
The destination home needs to be the ORACLE_HOME.

A check of the installed packages is performed.

[Screenshot: group for installation]

Later you have to launch another script as root on all nodes. Before doing that, you need to export the DISPLAY if you are installing the components remotely.
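
For example, from a root shell on each node (the database home is the one created earlier; the DISPLAY value is just an example):

linux: # export DISPLAY=192.168.24.100:0.0
linux: # /u01/app/oracle/product/10g/db_1/root.sh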


The script will open a new window. Deselect the private interface and carry on:

Insert the VIPs (virtual IPs) with the same definitions as listed in /etc/hosts.

You can skip the configuration assistant for the listener.

This concludes the installation of the database software.
Link the existing oratab to the one expected by Oracle (as root):

linux: # ln -s /etc/oratab /var/opt/oracle/oratab
Now you only need to create your database.
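
For example, the Database Configuration Assistant can be launched from the database home as the oracle user (with the DISPLAY exported as before):

oracle@sles9rac2:~> $ORACLE_HOME/bin/dbca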

Notes on cssd and ASM

If you are using ASM, the default configuration is wrong and after a reboot you could get ORA-29702 or ORA-29701 errors.

In /etc/oratab, set to Y the instances you wish to be started automatically:

*:/u01/app/oracle/product/10.1/db_1:N
+ASM:/u01/app/oracle/product/10.1/db_1:Y
PITIA:/u01/app/oracle/product/10.1/db_1:Y

Then, in /etc/inittab, move the cssd line so that it comes before the runlevel 3 entry:

l0:0:wait:/etc/init.d/rc 0
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
# Starting Cluster Daemon for ASM
h1:35:respawn:/etc/init.d/init.cssd run >/dev/null 2>&1 </dev/null
l3:3:wait:/etc/init.d/rc 3
#l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/rc 6

(make sure you don’t have two init.cssd lines).

Now you can test the reboot (in /etc/sysconfig/oracle you need to decide which components to start on reboot).
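
After the reboot, a couple of generic checks can confirm that the cluster daemon and the ASM instance came up (the SID is the one from the oratab example above):

linux: # ps -ef | grep [i]nit.cssd
oracle@sles9rac2:~> export ORACLE_SID=+ASM
oracle@sles9rac2:~> sqlplus / as sysdba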

Have fun!
