Note: All posts take a practical approach and avoid lengthy theory. All have been tested on development servers. Please do not try any post on production servers until you are sure.

Sunday, December 12, 2010

Node 2 failed after applying OS Patches

Yesterday the client applied OS security patches on Node 2 (Windows 2003 64-bit) and restarted the server. After the restart, the cluster-related services hung.
I set the startup type of all the Oracle services to Manual and issued the following command.


D:\oracle\product\10.2.0\crs\BIN>crsctl start crs
Attempting to start CRS stack
service OracleCSService in improper PENDING state, err(0)
I checked the logs, which showed the following:
D:\oracle\product\10.2.0\crs\log\db2\css106.log
2010-12-12 10:00:59.559: [  OCROSD][4368]utopen:9:failed to open OCR file/disk \\.\ocrcfg,  oserror=2
2010-12-12 10:00:59.559: [  OCROSD][4368]utopen:Could not open any of the devices
2010-12-12 10:00:59.559: [  OCRRAW][4368]proprinit: Could not open raw device
2010-12-12 10:00:59.559: [ default][4368]a_init:7!: Backend init unsuccessful : [26]
2010-12-12 10:00:59.559: [ CSSCLNT][4368]clsssinit: Unable to access OCR device in OCR init.

D:\oracle\product\10.2.0\crs\log\db2\cssd\cssdOUT.log
12/12/10 12:05:51  ssmain_run_css:  boot check returned 8, looping
12/12/10 12:05:52  ssmain_run_css:  launching boot check 7059 with d:\oracle\product\10.2.0\crs\bin\crsctl.exe check boot
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [The system cannot find the file specified.] [2]
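
The two log excerpts above point at the same thing: the OCR device could not be opened (oserror=2) and the boot check failed with PROC-26. As a minimal sketch, assuming the log lines keep the format shown above, a small Python helper can pull the device name, OS error number and PROC code out of such lines. The function name summarize_ocr_errors and the sample list are hypothetical and only for illustration.

```python
import re

# Matches the OCROSD "failed to open" lines in the format shown in the excerpt.
OPEN_FAIL = re.compile(r"failed to open OCR file/disk (\S+?),\s+oserror=(\d+)")
# Matches Oracle PROC-nn error codes such as PROC-26.
PROC_CODE = re.compile(r"\bPROC-(\d+)\b")


def summarize_ocr_errors(lines):
    """Collect (device, oserror) pairs and PROC codes from CRS log lines."""
    devices = []
    proc_codes = set()
    for line in lines:
        match = OPEN_FAIL.search(line)
        if match:
            devices.append((match.group(1), int(match.group(2))))
        proc_codes.update(int(code) for code in PROC_CODE.findall(line))
    return devices, sorted(proc_codes)


# Two lines copied from the excerpts above (hypothetical sample input).
sample = [
    r"2010-12-12 10:00:59.559: [  OCROSD][4368]utopen:9:failed to open OCR file/disk \\.\ocrcfg,  oserror=2",
    "OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage",
]
devices, codes = summarize_ocr_errors(sample)
print(devices)
print(codes)
```

On Windows, OS error 2 corresponds to "The system cannot find the file specified", which matches the message in cssdOUT.log.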

I did the following to resolve the issue.
On Node 2:
1- Ran opmd.exe -install
2- Set the startup type of all Oracle services to Automatic
3- Restarted the server
4- Manually started instance 2

D:\oracle\product\10.2.0\crs\BIN>crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....E1.inst application    ONLINE    ONLINE    db1
ora....E2.inst application    ONLINE    ONLINE    db2
ora.HOME.db    application    ONLINE    ONLINE    db1
ora....SM1.asm application    ONLINE    ONLINE    db1
ora....11.lsnr application    ONLINE    ONLINE    db1
ora....b11.gsd application    ONLINE    ONLINE    db1
ora....b11.ons application    ONLINE    ONLINE    db1
ora....b11.vip application    ONLINE    ONLINE    db1
ora....SM2.asm application    ONLINE    ONLINE    db2
ora....12.lsnr application    ONLINE    OFFLINE
ora....b12.gsd application    ONLINE    ONLINE    db12
ora....b12.ons application    ONLINE    ONLINE    db12
ora....b12.vip application    ONLINE    OFFLINE

D:\oracle\product\10.2.0\crs\BIN>crsctl start resources
Starting resources.
Successfully started CRS resources

D:\oracle\product\10.2.0\crs\BIN>crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....E1.inst application    ONLINE    ONLINE    db1
ora....E2.inst application    ONLINE    ONLINE    db2
ora.HOME.db    application    ONLINE    ONLINE    db1
ora....SM1.asm application    ONLINE    ONLINE    db1
ora....11.lsnr application    ONLINE    ONLINE    db1
ora....b11.gsd application    ONLINE    ONLINE    db1
ora....b11.ons application    ONLINE    ONLINE    db1
ora....b11.vip application    ONLINE    ONLINE    db1
ora....SM2.asm application    ONLINE    ONLINE    db2
ora....12.lsnr application    ONLINE    ONLINE    db2
ora....b12.gsd application    ONLINE    ONLINE    db2
ora....b12.ons application    ONLINE    ONLINE    db2
ora....b12.vip application    ONLINE    ONLINE    db2
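
In the first crs_stat -t listing above, the listener and VIP for the second node have Target ONLINE but State OFFLINE, which is what prompted "crsctl start resources". As a rough sketch, assuming the column layout shown above (Name, Type, Target, State, Host), the following Python function picks out such resources from pasted output. The function name offline_resources is hypothetical.

```python
def offline_resources(crs_stat_output):
    """Return names of resources whose Target is ONLINE but State is not."""
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows start with "ora."; the header and dashed rule are skipped.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append(name)
    return problems


# A few rows copied from the first listing above (sample input).
sample = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....E2.inst application    ONLINE    ONLINE    db2
ora....12.lsnr application    ONLINE    OFFLINE
ora....b12.vip application    ONLINE    OFFLINE
"""
print(offline_resources(sample))
```

An empty list would mean every resource has reached its target state, as in the second listing.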

Knowledge base notes used for this resolution

Ocr Initialization Failed Accessing Ocr Device on ASM single node [ID 428548.1]
Cause

There are three potential causes of the problem:
1. ocrconfig_loc is not pointing to the correct OCR.
2. A problem with the rights and owners of the OCR devices.
3. A configuration problem in Oracle Cluster Synchronization Services.
Solution
1. Check the rights and owners of the OCR devices.
2. Check whether ocrconfig_loc is pointing to the correct OCR in /var/opt/oracle/ocr.loc.
The ocrconfig_loc parameter specifies the location of the Oracle Cluster
Registry (OCR) used by the CSS daemon. The path up to the cdata directory is the Oracle home directory where the CSS daemon is running.
3. Reconfigure Oracle Cluster Synchronization Services.
a. $/u01/app/oracle/product/10G/bin/localconfig delete
Unexpected parameter: css
    $
PROC-26 CRS Does not Start on 2nd Node But on 1st Node is up [ID 603398.1]
Symptoms
2 node RAC cluster, CRS is up on node 1, but not up on 2nd node.
"ps -ef | grep css" shows:
root 11813 1 0 12:28 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 15228 11813 0 12:28 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 15246 12448 0 12:28 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root 15991 13672 0 12:28 ? 00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
 
/tmp/crsctl.15228 shows:
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage

Running "crsctl check boot" as the oracle user fails with the same error.
However, the OCR is owned by root:oinstall with permission 644, and the oracle user can run dd against the OCR devices /dev/raw/raw1 and /dev/raw/raw2 without any problem.
Changes
The OCR and voting disk had just been restored from backup, because of a storage problem, before CRS was started on both nodes.
Cause
It is caused by a mismatch of the /etc/oracle/ocr.loc file between the two nodes.

On node 1, /etc/oracle/ocr.loc does not have ocrmirror defined:
ocrconfig_loc=/dev/raw/raw1
local_only=false

On node 2, /etc/oracle/ocr.loc has ocrmirror defined:
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=false

CRS had already started on node 1 without the OCR mirror, which is why CRS on the 2nd node reports PROC-26 for the OCR mirror, as it is not the same as the OCR.
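
The mismatch described above can be spotted by comparing the two ocr.loc files key by key. The following is a minimal Python sketch, assuming the simple key=value layout shown above. The function names parse_ocr_loc and diff_ocr_loc are hypothetical, and skipping lines that start with "#" is an assumption rather than documented ocr.loc behaviour.

```python
def parse_ocr_loc(text):
    """Parse key=value lines from the contents of an ocr.loc file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines, "#" lines and lines without "=" are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_ocr_loc(node1_text, node2_text):
    """Return {key: (node 1 value, node 2 value)} for every key that differs."""
    first, second = parse_ocr_loc(node1_text), parse_ocr_loc(node2_text)
    return {
        key: (first.get(key), second.get(key))
        for key in sorted(first.keys() | second.keys())
        if first.get(key) != second.get(key)
    }


# File contents as quoted in the note above.
node1 = "ocrconfig_loc=/dev/raw/raw1\nlocal_only=false\n"
node2 = "ocrconfig_loc=/dev/raw/raw1\nocrmirrorconfig_loc=/dev/raw/raw2\nlocal_only=false\n"
print(diff_ocr_loc(node1, node2))
```

For the two files quoted in the note, the only differing key is ocrmirrorconfig_loc, which is missing on node 1 and set to /dev/raw/raw2 on node 2.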
Solution
1. On node 2, run the following command as root:
# ocrconfig -repair ocrmirror
This command removes the ocrmirrorconfig_loc entry from /etc/oracle/ocr.loc. It can only be run when CRS is down on that node.

2. Wait up to 60 seconds; CRS should start automatically on the 2nd node.

3. If you want to add the OCR mirror again, run the following command as the root user on either node (assuming CRS is up on all nodes):
# ocrconfig -replace ocrmirror /dev/raw/raw2
It will add the ocrmirror device to /etc/oracle/ocr.loc.

Note: The "ocrconfig -replace ocr|ocrmirror" command only propagates the changes in /etc/oracle/ocr.loc to nodes on which CRS is running at the time the command is issued. By comparison, the "ocrconfig -repair ocr|ocrmirror" command only modifies /etc/oracle/ocr.loc on the node where the command is run (while CRS is shut down).

Can not Start CRS on Windows Cluster [ID 1115153.1] 
Symptoms
2-node RAC on a Windows cluster. After some network changes, both nodes were rebooted. CRS started fine on node 1 but failed on the 2nd node.

All Oracle services were set to manual start. Starting OracleCRService failed with a timeout, OracleCSService ended up in "Starting" status, and ocssd.log was not updated.

However ocssdOUT.log repeatedly showed the following messages:

05/12/10 10:22:22 ssmain_run_css: boot check returned 8, looping
05/12/10 10:22:23 ssmain_run_css: launching boot check 662 with c:\oracrs\bin\crsctl.exe check boot
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [The system cannot find the file specified.] [2]
05/12/10 10:22:23 ssmain_run_css: boot check returned 8, looping
05/12/10 10:22:24 ssmain_run_css: launching boot check 663 with c:\oracrs\bin\crsctl.exe check boot
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [The system cannot find the file specified.] [2]

The first time ocrcheck was run, it reported the same error: The system cannot find the file specified.
When run again, it returned the expected result.

ocopy was used to verify access to the OCR raw device, and it copied fine.

Manually running "crsctl check boot" from the DOS prompt succeeded without error.
Changes
The "Startup type" of the Oracle services was set to "manual" on the 2nd node.
Cause
The issue was caused by having the "Startup type" of all Oracle-related services set to "manual"; more specifically, the service OracleObjectService was not started before OracleCRService tried to start.

If the OCR and/or Voting disks are located in raw devices, the OracleObjectService must be running before the CRS startup can be issued, as it is used to synchronize raw device access across the cluster.
Solution
1. Start OracleObjectService from the Windows service panel.
2. Start OracleCRService.
CRS then starts fine.
Also make sure that the "Startup type" of OracleObjectService is set to "automatic".
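
The ordering requirement in this note (OracleObjectService before OracleCRService) can be sketched as a small dependency sort in Python. This is an illustration only: the DEPENDS_ON map is a hand-written assumption based on the note above, not something read from the Windows service configuration, and the function names are hypothetical. The generated strings use the standard Windows "net start" command.

```python
# Assumption taken from the note above: OracleCRService needs
# OracleObjectService to be running first when OCR/voting disks are raw devices.
DEPENDS_ON = {
    "OracleCRService": ["OracleObjectService"],
}


def start_order(services, depends_on=DEPENDS_ON):
    """Order services so that each one follows the services it depends on."""
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"circular dependency at {name}")
        visiting.add(name)
        for dependency in depends_on.get(name, []):
            visit(dependency)
        visiting.discard(name)
        ordered.append(name)

    for service in services:
        visit(service)
    return ordered


def net_start_commands(services):
    """Build 'net start' command strings in a safe start order."""
    return [f"net start {name}" for name in start_order(services)]


print(net_start_commands(["OracleCRService", "OracleObjectService"]))
```

The sketch only builds the command strings; it does not run them. Setting the startup type of the services to automatic, as the note recommends, avoids having to start them by hand at all.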
