We have SCOM configured for our Oracle servers so that we know when a service crashes. Today I got the following alert for one of our RAC nodes:
The OracleASMService+ASM1 service terminated unexpectedly.
I started investigating and tried to get the status with crsctl, as below:
D:\app\11.2.0.3\grid\BIN>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
Then I tried the same with the -init option:
D:\app\11.2.0.3\grid\BIN>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.crf
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE or1
ora.cssdmonitor
1 ONLINE ONLINE or1
ora.ctssd
1 ONLINE INTERMEDIATE or1 CHECK TIMED OUT
ora.drivers.acfs
1 ONLINE ONLINE or1
ora.evmd
1 ONLINE ONLINE or1
ora.gipcd
1 ONLINE ONLINE or1
ora.gpnpd
1 ONLINE ONLINE or1
ora.mdnsd
1 ONLINE ONLINE or1
The result showed the ASM instance as shut down, matching the SCOM alert, and it had not come back online automatically.
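When the -init listing is long, it can help to filter it down to the resources that should be up but are not. The following is a minimal sketch, not part of the original troubleshooting: it assumes the output has been captured as text in the layout shown above (a resource name line starting with "ora.", followed by instance rows), and the function name unhealthy_resources is hypothetical.

```python
def unhealthy_resources(crsctl_output):
    """Return (name, state, remainder) for rows whose TARGET is ONLINE but STATE is not."""
    problems = []
    current = None
    for raw in crsctl_output.splitlines():
        line = raw.strip()
        if line.startswith("ora."):
            # A resource name line; the rows that follow belong to it.
            current = line
            continue
        fields = line.split()
        # Instance rows look like: "1 ONLINE OFFLINE Instance Shutdown"
        if current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                # Everything after STATE (server and/or details) is kept as-is.
                problems.append((current, state, " ".join(fields[3:])))
    return problems


# Sample rows copied from the output above.
SAMPLE = """\
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE or1
ora.ctssd
1 ONLINE INTERMEDIATE or1 CHECK TIMED OUT
"""

if __name__ == "__main__":
    for name, state, rest in unhealthy_resources(SAMPLE):
        print(name, state, rest)
```

On the sample rows this reports ora.asm, ora.crsd and ora.ctssd, which are the three resources that matter in the rest of this post.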
Using ASMCMD I started the ASM instance and then tried to start ora.crsd, but it would not start and gave the following error:
D:\app\11.2.0.3\grid\BIN>crsctl start res ora.crsd -init
CRS-2800: Cannot start resource 'ora.ctssd' as it is already in the INTERMEDIATE state on server 'or1'
CRS-4000: Command Start failed, or completed with errors.
I tried to stop it, but it could not even be stopped and gave the following errors:
D:\app\11.2.0.3\grid\BIN>crsctl stop res ora.crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.
Using the -init option, I found that ora.ctssd was in the INTERMEDIATE state, which is why ora.crsd was not starting:
D:\app\11.2.0.3\grid\BIN>crsctl start res ora.crsd -init
CRS-2800: Cannot start resource 'ora.ctssd' as it is already in the INTERMEDIATE state on server 'or1'
CRS-4000: Command Start failed, or completed with errors.
So I stopped ora.ctssd first and then started it again to bring it out of the INTERMEDIATE state:
D:\app\11.2.0.3\grid\BIN>crsctl stop res ora.ctssd -init
CRS-2673: Attempting to stop 'ora.ctssd' on 'or1'
CRS-2677: Stop of 'ora.ctssd' on 'or1' succeeded
D:\app\11.2.0.3\grid\BIN>crsctl start res ora.ctssd -init
CRS-2672: Attempting to start 'ora.ctssd' on 'or1'
CRS-2676: Start of 'ora.ctssd' on 'or1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'or1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "D:\app\11.2.0.3\grid\log\or1\agent\ohasd\oraagent\oraagent.log".
CRS-2676: Start of 'ora.asm' on 'or1' succeeded
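The stop-then-start sequence above could be wrapped in a small helper if it needs repeating. This is only a sketch: it assumes crsctl is on the PATH and returns a non-zero exit code on failure (not verified here), and the function name bounce_init_resource and its run parameter are hypothetical. Only the two crsctl commands themselves come from the session above.

```python
import subprocess


def bounce_init_resource(resource, run=subprocess.run):
    """Stop, then start, an -init resource; give up at the first failing step.

    Returns a list of (action, returncode) tuples for the steps attempted.
    `run` is injectable so the sequence can be tried without touching a cluster.
    """
    results = []
    for action in ("stop", "start"):
        # Same commands as typed by hand above, e.g.
        #   crsctl stop res ora.ctssd -init
        cmd = ["crsctl", action, "res", resource, "-init"]
        completed = run(cmd, capture_output=True, text=True)
        results.append((action, completed.returncode))
        if completed.returncode != 0:
            # Assumption: a non-zero exit code means the step failed.
            break
    return results


if __name__ == "__main__":
    # Requires a Grid Infrastructure home on the PATH and suitable privileges.
    print(bounce_init_resource("ora.ctssd"))
```

Stopping at the first failure is deliberate: if the stop does not succeed, a blind start would most likely hit the same CRS-2800 error seen earlier.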
After this, the cluster services began to start normally and I could check the status as below:
D:\app\11.2.0.3\grid\BIN>crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DBDATA.dg
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.DBFLASH.dg
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.DGDUP.dg
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.LISTENER.lsnr
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.asm
ONLINE ONLINE or1 Started
ONLINE ONLINE or2 Started
ora.gsd
OFFLINE OFFLINE or1
OFFLINE OFFLINE or2
ora.net1.network
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.ons
ONLINE ONLINE or1
ONLINE ONLINE or2
ora.registry.acfs
ONLINE ONLINE or1
ONLINE ONLINE or2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE or1
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE or2
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE or2
ora.cvu
1 ONLINE ONLINE or2
ora.rac.db
1 ONLINE OFFLINE Instance Shutdown,S
TARTING
2 ONLINE ONLINE or2 Open
ora.oc4j
1 ONLINE ONLINE or2
ora.or1.vip
1 ONLINE ONLINE or1
ora.or2.vip
1 ONLINE ONLINE or2
ora.scan1.vip
1 ONLINE ONLINE or1
ora.scan2.vip
1 ONLINE ONLINE or2
ora.scan3.vip
1 ONLINE ONLINE or2
ora.testrac.db
1 ONLINE OFFLINE Instance Shutdown,S
TARTING
2 ONLINE ONLINE or2 Open
Now everything was fine on the node, but what had caused all this?
In crsd.log I found the following, which is self-explanatory:
2012-05-23 09:04:58.958: [ OCRASM][21492]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2012-05-23 09:04:58.958: [ OCRASM][21492]proprasmo: kgfoCheckMount returned [7]
2012-05-23 09:04:58.958: [ OCRASM][21492]proprasmo: The ASM instance is down
2012-05-23 09:04:58.958: [ OCRRAW][21492]proprioo: Failed to open [+DBDATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-05-23 09:04:58.958: [ OCRASM][21492]proprasmo: Error in open/create file in dg [DBFLASH]
[ OCRASM][21492]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
2012-05-23 09:04:58.989: [ OCRRAW][21492]proprinit: Could not open raw device
2012-05-23 09:04:58.989: [ OCRASM][21492]proprasmcl: asmhandle is NULL
2012-05-23 09:04:58.989: [ OCRASM][21492]proprasmcl: asmhandle is NULL
2012-05-23 09:04:58.989: [ OCRAPI][21492]a_init:16!: Backend init unsuccessful : [26]
2012-05-23 09:04:58.989: [ CRSOCR][21492] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2012-05-23 09:04:58.989: [ CRSMAIN][21492] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2012-05-23 09:04:58.989: [ CRSD][21492][PANIC] CRSD exiting: Could not init OCR, code: 26
2012-05-23 09:04:58.989: [ CRSD][21492] Done.
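To pick the error codes out of a longer crsd.log quickly, something like the following could be used. It is a rough sketch, not an Oracle tool: the regular expression only looks for ORA-, PROC- and CRS- codes, and the function name count_error_codes is hypothetical. The sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

# Matches codes such as ORA-15077, PROC-26 or CRS-4535 (prefix list is an assumption).
ERROR_CODE = re.compile(r"\b(?:ORA|PROC|CRS)-\d+\b")


def count_error_codes(log_text):
    """Count how often each ORA-/PROC-/CRS- code appears in the given log text."""
    return Counter(ERROR_CODE.findall(log_text))


SAMPLE_LOG = """\
2012-05-23 09:04:58.958: [ OCRASM][21492]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup
2012-05-23 09:04:58.989: [ CRSOCR][21492] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
"""

if __name__ == "__main__":
    for code, hits in count_error_codes(SAMPLE_LOG).most_common():
        print(code, hits)
```

In this case the codes point at the same story as the log itself: CRSD could not open the OCR (PROC-26) because no ASM instance was serving the disk group (ORA-15077).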
Ref: My Oracle Support notes 1323698.1, 1368382.1
2 comments:
Thanks for the info. We have found that if CRSD is still down while the other cluster services are running (crsctl stat res -t -init), then the problem resides in the OCR files. We added an additional OCR file on node 1 and it worked, but when we added it (ocrconfig -repair -add +ocr1) it failed and we could not access the OCR information on the second node (ocrcheck). What we had to do was remove +ocr1 and add it back (ocrconfig -repair -delete +ocr1, then ocrconfig -repair -add +ocr1). This worked and brought crsd up and running along with the other resources.
This was helpful. I had the same issue and was able to resolve it by following your instructions.
Thanks