This post may not be relevant to the majority of people, but I didn’t find much useful documentation regarding RAC 11.2 on OCFS2 installations, and I am in desperate need of my own test environment! Oracle Cluster File System version 2 is a promising cluster file system on Linux, and I stand corrected about the state of development (see also the comment at the bottom of the post!): OCFS2 1.4.4-1 dates from 2009-09-25.
The reason for evaluating this combination is that I am constrained on disk space and memory, so saving a little bit of RAM by not having to run a set of ASM instances is good for me (I hope). I’d also like to try having the RDBMS binaries set up as a shared home, a configuration I haven’t used before.
So here are the key facts about my environment:
- dom0: OpenSuSE 11.2 kernel 2.6.31.12-0.1-xen x86-64
- domU: Oracle Enterprise Linux 5 update 4 x86-64
I created the domUs using virt-manager, an openSUSE-supplied tool. Unfortunately it can’t set the “shareable” attribute on the shared storage (libvirt’s equivalent to the exclamation mark in the disk= directive), so I have to do this manually. “virsh edit <domain>” is a blessing: it is no longer necessary to go through the old four-step procedure (a sketch of the one-step alternative follows the list):
- virsh dumpxml <domain> > /tmp/domain.xml
- virsh undefine <domain>
- vi /tmp/domain.xml
- virsh define /tmp/domain.xml
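With virsh edit, the whole round trip collapses into a single command. Here is a minimal sketch of what I add by hand (the domain name and device paths are the ones from my environment):

virsh edit node1    # opens the domain XML in $EDITOR; libvirt validates and redefines it on save

# inside the editor, add <shareable/> to every disk both nodes will access, e.g.:
#   <disk type='block' device='disk'>
#     <driver name='phy'/>
#     <source dev='/dev/mapper/root_vg-rhel5_shared_rdbms'/>
#     <target dev='xvdc' bus='xen'/>
#     <shareable/>
#   </disk>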
How nice is that! For reference, here are the dumps of my VMs while they were running:
<domain type='xen' id='23'>
<name>node1</name>
<uuid>7be3202c-f684-e594-b377-0395f0601be2</uuid>
<memory>1572864</memory>
<currentMemory>1048576</currentMemory>
<vcpu>2</vcpu>
<bootloader>/usr/bin/pygrub</bootloader>
<bootloader_args>-q</bootloader_args>
<os>
<type>linux</type>
<cmdline> </cmdline>
</os>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
<disk type='file' device='disk'>
<driver name='file'/>
<source file='/m/xen/node1/disk0'/>
<target dev='xvda' bus='xen'/>
</disk>
<disk type='file' device='disk'>
<driver name='file'/>
<source file='/m/xen/node1/oracle'/>
<target dev='xvdb' bus='xen'/>
</disk>
<disk type='block' device='disk'>
<driver name='phy'/>
<source dev='/dev/mapper/root_vg-rhel5_shared_rdbms'/>
<target dev='xvdc' bus='xen'/>
<shareable/>
</disk>
<disk type='block' device='disk'>
<driver name='phy'/>
<source dev='/dev/mapper/root_vg-rhel5_shared_data'/>
<target dev='xvdd' bus='xen'/>
<shareable/>
</disk>
<interface type='bridge'>
<mac address='00:16:3e:10:99:20'/>
<source bridge='br1'/>
<script path='/etc/xen/scripts/vif-bridge'/>
<target dev='vif23.0'/>
</interface>
<interface type='bridge'>
<mac address='00:16:3e:10:64:20'/>
<source bridge='br2'/>
<script path='/etc/xen/scripts/vif-bridge'/>
<target dev='vif23.1'/>
</interface>
<console type='pty' tty='/dev/pts/5'>
<source path='/dev/pts/5'/>
<target port='0'/>
</console>
<input type='mouse' bus='xen'/>
<graphics type='vnc' port='5900' autoport='yes'/>
</devices>
</domain>
<domain type='xen'>
<name>node2</name>
<uuid>7be3302c-f684-e594-b377-0395f0601be2</uuid>
<memory>1572864</memory>
<currentMemory>1048576</currentMemory>
<vcpu>2</vcpu>
<bootloader>/usr/bin/pygrub</bootloader>
<bootloader_args>-q</bootloader_args>
<os>
<type>linux</type>
</os>
<clock offset='utc'/>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
<disk type='file' device='disk'>
<driver name='file'/>
<source file='/m/xen/node2/disk0'/>
<target dev='xvda' bus='xen'/>
</disk>
<disk type='file' device='disk'>
<driver name='file'/>
<source file='/m/xen/node2/oracle'/>
<target dev='xvdb' bus='xen'/>
</disk>
<disk type='block' device='disk'>
<driver name='phy'/>
<source dev='/dev/mapper/root_vg-rhel5_shared_rdbms'/>
<target dev='xvdc' bus='xen'/>
<shareable/>
</disk>
<disk type='block' device='disk'>
<driver name='phy'/>
<source dev='/dev/mapper/root_vg-rhel5_shared_data'/>
<target dev='xvdd' bus='xen'/>
<shareable/>
</disk>
<interface type='bridge'>
<mac address='00:16:3e:10:99:30'/>
<source bridge='br1'/>
<script path='/etc/xen/scripts/vif-bridge'/>
</interface>
<interface type='bridge'>
<mac address='00:16:3e:10:64:30'/>
<source bridge='br2'/>
<script path='/etc/xen/scripts/vif-bridge'/>
</interface>
<console type='pty' tty='/dev/pts/5'>
<source path='/dev/pts/5'/>
<target port='0'/>
</console>
<input type='mouse' bus='xen'/>
<graphics type='vnc' port='5900' autoport='yes'/>
</devices>
</domain>
The important bits are the static MACs and the <shareable/> element. The bridges br1 and br2 are host-only bridges I defined through YaST (again, see my previous post about Xen-based virtualisation on openSUSE for more information about the beauty of host-only networks and how to define them through YaST).
The storage I have come up with will be used as follows:
- xvda -> root fs, used during installation
- xvdb -> /u01 for Oracle Grid Infrastructure (which can no longer be installed into a shared home)
- xvdc -> /u02 ocfs2 for shared rdbms home
- xvdd -> /u03 ocfs2 for ocr/votedisk & database files
Start by defining and starting node1; the installation of a RHEL/OEL guest VM is described in https://martincarstenbach.wordpress.com/2010/01/15/xen-based-virtualisation-with-opensuse-11-2/. I chose “software installation” during the anaconda configuration screens and also chose “configure later”, which saves you a little bit of time working out all the dependencies. Then start the installation; once it completes, set SELinux to permissive or disable it (there is still a lot of work needed to get the Oracle software working with SELinux!) and turn off the firewall unless you know very well what you are doing.
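For reference, this is roughly how I deal with SELinux and the firewall on an OEL 5 guest (a minimal sketch; keep iptables running with proper rules if you know what you are doing):

# switch SELinux to permissive for the running system ...
setenforce 0
# ... and make the change persistent across reboots
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config

# stop the firewall and prevent it from starting at boot
service iptables stop
chkconfig iptables off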
OCFS2 software
I downloaded the following packages for OCFS2, even though OEL might already ship it. All of the software is from oss.oracle.com:
- ocfs2-2.6.18-164.el5xen-1.4.4-1.el5.x86_64.rpm
- ocfs2-tools-1.4.3-1.el5.x86_64.rpm
- ocfs2-tools-devel-1.4.3-1.el5.x86_64.rpm
- ocfs2console-1.4.3-1.el5.x86_64.rpm
In addition I got oracle-validated-1.0.0-18.el5.x86_64.rpm to make the setup easier. Install the software including all dependencies; in case you wondered (like I did), lib-db2.so.2 is actually provided by compat-db4.
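Assuming the downloaded RPMs sit in the current directory and the guest can reach a yum repository for the dependencies, the installation looks roughly like this (a sketch, not a transcript of my session; the compat-db package name is my assumption for the lib-db2.so.2 dependency mentioned above):

# resolve the lib-db2.so.2 dependency from the configured repositories
yum install compat-db

# kernel module matching the running Xen kernel, plus tools and console
rpm -Uvh ocfs2-2.6.18-164.el5xen-1.4.4-1.el5.x86_64.rpm \
    ocfs2-tools-1.4.3-1.el5.x86_64.rpm \
    ocfs2-tools-devel-1.4.3-1.el5.x86_64.rpm \
    ocfs2console-1.4.3-1.el5.x86_64.rpm

# oracle-validated adjusts kernel parameters, creates the oracle user and pulls in prerequisites
yum localinstall --nogpgcheck oracle-validated-1.0.0-18.el5.x86_64.rpm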
Set up the domUs
I changed my /etc/hosts to list all the hosts in my two-node cluster:
127.0.0.1      localhost
::1            localhost6.localdomain6 localhost6
192.168.99.20  node1.localdomain  node1
192.168.99.21  node1v.localdomain node1v
192.168.100.20 node1p.localdomain node1p
192.168.99.30  node2.localdomain  node2
192.168.99.31  node2v.localdomain node2v
192.168.100.30 node2p.localdomain node2p
192.168.99.25  scanocfs2.localdomain scanocfs2
Note that putting the SCAN address into /etc/hosts is a Bad Thing, but since I don’t have a choice I’ll have to do it anyway. By the way, the majority of the system requirements aren’t satisfied on my small box… Make sure cluvfy doesn’t complain when you do this for real!
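When you do this for real, the cluster verification utility shipped with the Grid Infrastructure software can check the prerequisites for you; something like the following, run as the oracle user from the unzipped staging area (a sketch; the path depends on where you unzipped the software):

# pre Grid Infrastructure installation checks for both nodes
./runcluvfy.sh stage -pre crsinst -n node1,node2 -verbose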
I then partitioned the remaining disks into xvdb1, xvdc1 and xvdd1, one partition spanning the whole disk for each virtual hard disk (a sketch of the commands follows). /etc/fstab, shown after the sketch, reflects the changes; ensure that the /u0* mount points are owned by oracle:oinstall.
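In case you want the exact commands, this is roughly what the partitioning and the local file system setup look like (a sketch; the OCFS2 volumes are only formatted later, once the cluster stack is configured):

# one primary partition spanning each whole disk; repeat for /dev/xvdc and /dev/xvdd
fdisk /dev/xvdb          # n, p, 1, accept the defaults, w

# /u01 is node-local storage, so plain ext3 is fine
mkfs.ext3 /dev/xvdb1
mkdir -p /u01 /u02 /u03
chown oracle:oinstall /u01 /u02 /u03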
[root@node2 ~]# cat /etc/fstab
/dev/root_vg/root_lv   /          ext3    defaults                           1 1
LABEL=/boot            /boot      ext3    defaults                           1 2
tmpfs                  /dev/shm   tmpfs   defaults                           0 0
devpts                 /dev/pts   devpts  gid=5,mode=620                     0 0
sysfs                  /sys       sysfs   defaults                           0 0
proc                   /proc      proc    defaults                           0 0
/dev/root_vg/swap_lv   swap       swap    defaults                           0 0
/dev/xvdb1             /u01       ext3    defaults                           0 0
#/dev/xvdc1            /u02       ocfs2   _netdev                            0 0
#/dev/xvdd1            /u03       ocfs2   _netdev,datavolume,nointr,noatime  0 0
Mount options for ocfs2 are set as per MOS document “Recommended Ocfs2 1.4.1 mount options with DB volumes [ID 835839.1]”. Note that you can’t install Oracle binaries on the ocfs2 partition mounted with the datavolume option, therefore I have /u02 in addition to this.
/u01 is local storage on each node for Grid Infrastructure, which is why it uses the ext3 file system. The OCFS2 volumes are commented out for the moment as OCFS2 isn’t configured yet and they would only confuse Linux during system boot. Now shut down node1 and copy it; for my configuration I used “cp -a /m/xen/node1 /m/xen/node2”. This saves you from having to sit through the same (by now boring) installation again.
Use virsh dumpxml node1 > node2.xml and modify the dumped XML file to reflect the new node name, change the MAC addresses and the UUID, and remove the <target> tags in the interface sections. Now run virsh define node2.xml to add the second node to the Xen backend. Start node2, change /etc/sysconfig/network to give it its new hostname, and run system-config-network-tui to configure eth0 and eth1 with their respective IP addresses. Reboot node2 and start node1. The steps are summarised in the sketch below.
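In condensed form the cloning steps look roughly like this (a sketch using the names and paths from my environment):

# on the dom0
cp -a /m/xen/node1 /m/xen/node2
virsh dumpxml node1 > node2.xml
vi node2.xml             # new <name> and <uuid>, new MAC addresses, drop the <target dev='vif...'/>
                         # tags, and point the file-backed disks at /m/xen/node2/...
virsh define node2.xml
virsh start node2

# inside node2
vi /etc/sysconfig/network          # HOSTNAME=node2.localdomain
system-config-network-tui          # assign the new IP addresses to eth0 and eth1
reboot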
With the OCFS2 software installed, start ocfs2console on node1 in a vncserver session. Select the “Cluster > Configure Nodes…” menu option and enter the information for the cluster nodes; I chose the 192.168.100/24 subnet for OCFS2. Then select “Cluster > Propagate Configuration” to copy the configuration to the other cluster node. Finally run “service o2cb configure”; I was happy with the defaults except that you want the o2cb stack to start at system boot.
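The console writes /etc/ocfs2/cluster.conf to both nodes; with my node names and the 192.168.100/24 interconnect it should end up looking roughly like this (a sketch of the expected result, using the default cluster name and port):

node:
        ip_port = 7777
        ip_address = 192.168.100.20
        number = 0
        name = node1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.100.30
        number = 1
        name = node2
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

With the o2cb stack up, the two shared volumes still need to be formatted once, from one node only; something along these lines (the labels and the number of node slots are my own choice):

mkfs.ocfs2 -N 4 -L u02 /dev/xvdc1
mkfs.ocfs2 -N 4 -L u03 /dev/xvdd1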
Uncomment the lines referencing the OCFS2 mount points /u02 and /u03 in both nodes’ /etc/fstab and run mount -a as root to mount them. This can take a few seconds. Use df -h to verify that all devices are mounted.
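Besides df -h, ocfs2-tools ships a small utility that reports which cluster nodes have a given volume mounted; a quick check from either node might look like this (a sketch):

df -h /u02 /u03
mounted.ocfs2 -f /dev/xvdc1 /dev/xvdd1     # full detect: lists the nodes mounting each device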
Now unzip the Grid Infrastructure zip somewhere convenient and start runInstaller. Go through the advanced installation and make sure you place the voting disks and OCR on a cluster file system, not in ASM; my storage location for these is /u03/oradata/grid/. Before running root.sh, a bit of memory shuffling is necessary: I allocated 1024M per domU but want 1.5G for the execution of root.sh. My setup allows me to do this on the fly:
uklnxpc005:/m/xen/node1 # xm list
Name                        ID   Mem VCPUs   State   Time(s)
Domain-0                     0  1231     2   r-----  52139.0
node1                       28  1024     2   -b----   1046.2
node2                       27  1024     2   -b----    757.4
uklnxpc005:/m/xen/node1 # xm mem-set 28 1536
uklnxpc005:/m/xen/node1 # xm list
Name                        ID   Mem VCPUs   State   Time(s)
Domain-0                     0  1231     2   r-----  52142.8
node1                       28  1536     2   -b----   1046.4
node2                       27  1024     2   -b----    757.4
Brilliant, isn’t it? Now I executed orainstRoot.sh and root.sh on node1, output below:
[root@node1 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@node1 ~]# /u01/crs/oracle/product/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/crs/oracle/product/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-03-02 14:38:57: Parsing the host name
2010-03-02 14:38:57: Checking for super user privileges
2010-03-02 14:38:57: User has super user privileges
Using configuration parameter file: /u01/crs/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
  peer root cert TP
  profile reader root cert TP
  pa root cert TP
  peer pa cert TP
  pa peer cert TP
  profile reader pa cert TP
  profile reader peer cert TP
  peer user cert
  pa user cert
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
Now formatting voting disk: /u03/oradata/grid/vdsk1.
Now formatting voting disk: /u03/oradata/grid/vdsk2.
Now formatting voting disk: /u03/oradata/grid/vdsk3.
CRS-4603: Successful addition of voting disk /u03/oradata/grid/vdsk1.
CRS-4603: Successful addition of voting disk /u03/oradata/grid/vdsk2.
CRS-4603: Successful addition of voting disk /u03/oradata/grid/vdsk3.
##  STATE    File Universal Id                File Name                 Disk group
--  -----    -----------------                ---------                 ----------
 1. ONLINE   44399c7d09544fbbbfe54848c339b2fc (/u03/oradata/grid/vdsk1) []
 2. ONLINE   863fbd4492f64f02bf89786e0791e49f (/u03/oradata/grid/vdsk2) []
 3. ONLINE   08ab672c19264fd0bfe62dbc9d707ae7 (/u03/oradata/grid/vdsk3) []
Located 3 voting disk(s).
CRS-2673: Attempting to stop 'ora.crsd' on 'node1'
CRS-2677: Stop of 'ora.crsd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'node1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node1'
CRS-2676: Start of 'ora.evmd' on 'node1' succeeded

node1     2010/03/02 14:44:04     /u01/crs/oracle/product/11.2.0/grid/cdata/node1/backup_20100302_144404.olr
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2047 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
This was OK, so reduce the memory back to 1G for node1 and inflate it to 1.5G for node2. If you get PROT-1 errors from ocrconfig then your mount options are wrong: the datavolume option has to be set for ocrconfig to be able to format the OCR.
uklnxpc005:/m/xen/node1 # xm mem-set 28 1024
uklnxpc005:/m/xen/node1 # xm mem-set 27 1536
uklnxpc005:/m/xen/node1 # xm list
Name                        ID   Mem VCPUs   State   Time(s)
Domain-0                     0  1090     2   r-----  52250.7
node1                       28  1024     2   -b----   1140.5
node2                       27  1536     2   -b----    759.8
Run orainstRoot.sh and root.sh on the second node now:
[root@node2 ~]# /u01/app/oraInventory/orainstRoot.sh
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.
[root@node2 ~]# /u01/crs/oracle/product/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/crs/oracle/product/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2010-03-02 14:45:40: Parsing the host name
2010-03-02 14:45:40: Checking for super user privileges
2010-03-02 14:45:40: User has super user privileges
Using configuration parameter file: /u01/crs/oracle/product/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'node2'
CRS-2676: Start of 'ora.mdnsd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'node2'
CRS-2676: Start of 'ora.gipcd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node2'
CRS-2676: Start of 'ora.gpnpd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node2'
CRS-2676: Start of 'ora.cssdmonitor' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'node2'
CRS-2676: Start of 'ora.diskmon' on 'node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'node2'
CRS-2676: Start of 'ora.ctssd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node2'
CRS-2676: Start of 'ora.crsd' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'node2'
CRS-2676: Start of 'ora.evmd' on 'node2' succeeded

node2     2010/03/02 14:49:28     /u01/crs/oracle/product/11.2.0/grid/cdata/node2/backup_20100302_144928.olr
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 2047 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
[root@node2 ~]#
This worked despite the message that another CSS daemon was running. Finally, close the pop-up window in the OUI and let the configuration assistants do their work.
That’s it: you have Grid Infrastructure installed! The next part of the series will deal with the RDBMS installation on /u02 as a shared Oracle home.
Responses
http://oss.oracle.com/projects/ocfs2/news/
ocfs2 1.4.4 was released in Sep 2009. Not 2007.
I stand corrected! I think I got the date from one of the download pages on oss.oracle.com
Excellent post thank you for the info
Hello, I am trying to understand how data is shared between the clustered filesystems (/u02 and /u03 in your set-up). Is it OCFS2 that shares the data (does the “clustering”) or is ocfs2 installed into nodes that are somehow already “clustered” in the first place? I can see the “shareable” attribute set for xvdc and xvdd. Are these required for OCFS2? If I wanted to run (test environment only) OCFS2 on physical hardware, what are my options?
Good morning,
Yes, OCFS2 is a clustered file system; see http://oss.oracle.com/projects/ocfs2/ and http://www.oracle.com/us/technologies/linux/025995.htm for more information.
As for anything that spans domUs, you have to set the shareable attribute for the storage. The use of /u02 and /u03 is explained in the post; the main reason is the requirement for different mount options for data files and voting disks.
OCFS2 adds another cluster layer in addition to Clusterware, which is why I’d recommend you use ASM instead. If you still want OCFS2, make sure to check My Oracle Support, which has recommendations for setting OCFS2 parameters in conjunction with Clusterware.
Hope this helps,
Martin