

Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don't try any post on production servers until you are sure.

Wednesday, May 23, 2012

CRS-4535: Cannot communicate with Cluster Ready Services

We have SCOM configured to alert us when a service on our Oracle servers crashes. Today I received the following alert for one of our RAC nodes:
The OracleASMService+ASM1 service terminated unexpectedly.
I started investigating and tried to get the status with crsctl:


D:\app\11.2.0.3\grid\BIN>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

Then I tried the same command with the -init option, which lists the lower-stack resources managed by OHASD:
D:\app\11.2.0.3\grid\BIN>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.crf
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       or1
ora.cssdmonitor
      1        ONLINE  ONLINE       or1
ora.ctssd
      1        ONLINE  INTERMEDIATE or1            CHECK TIMED OUT
ora.drivers.acfs
      1        ONLINE  ONLINE       or1
ora.evmd
      1        ONLINE  ONLINE       or1
ora.gipcd
      1        ONLINE  ONLINE       or1
ora.gpnpd
      1        ONLINE  ONLINE       or1
ora.mdnsd
      1        ONLINE  ONLINE       or1
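To spot this pattern quickly (for example from a monitoring job), a small parser over the tabular "crsctl stat res -t -init" output can help. The Python sketch below is only an illustration, not an Oracle tool: the function names are hypothetical, and it relies only on the TARGET and STATE tokens, because the SERVER column can be blank and that shifts the remaining columns. It flags every resource whose STATE differs from its TARGET or that is INTERMEDIATE.

```python
# Status words crsctl prints in the TARGET and STATE columns of "crsctl stat res -t".
STATES = {"ONLINE", "OFFLINE", "INTERMEDIATE", "UNKNOWN"}


def parse_stat_res_t(text):
    """Return (resource, cardinality, target, state, rest) tuples (hypothetical helper).

    Resource names start in column 0 and their detail lines are indented.
    SERVER and STATE_DETAILS are kept together in `rest` because SERVER may be blank.
    """
    rows = []
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("-"):
            continue  # skip blank lines and the ---- separators
        if not line[0].isspace():
            # Resource name (or a heading such as "Cluster Resources", which has no detail lines).
            current = line.split()[0]
            continue
        tokens = line.split()
        card = None
        if tokens and tokens[0].isdigit():
            card = int(tokens.pop(0))  # cluster resources carry an instance number
        # Wrapped STATE_DETAILS continuation lines do not start with two state words and are skipped.
        if current and len(tokens) >= 2 and tokens[0] in STATES and tokens[1] in STATES:
            rows.append((current, card, tokens[0], tokens[1], " ".join(tokens[2:])))
    return rows


def needs_attention(rows):
    """Resources whose STATE differs from TARGET, or that are stuck INTERMEDIATE."""
    return [r for r in rows if r[3] != r[2] or r[3] == "INTERMEDIATE"]


sample = """\
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  ONLINE       or1
ora.ctssd
      1        ONLINE  INTERMEDIATE or1            CHECK TIMED OUT
"""

for name, card, target, state, rest in needs_attention(parse_stat_res_t(sample)):
    print(f"{name}: target={target} state={state} {rest}".rstrip())
# ora.asm: target=ONLINE state=OFFLINE Instance Shutdown
# ora.crsd: target=ONLINE state=OFFLINE
# ora.ctssd: target=ONLINE state=INTERMEDIATE or1 CHECK TIMED OUT
```

For real scripting, the non-tabular output of crsctl stat res -init (one NAME=, TARGET=, STATE= line per attribute) is probably easier to parse reliably than the -t table.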
 



The output showed the ASM instance shut down, matching the SCOM alert, and it had not come back online automatically.
I started the ASM instance using ASMCMD,


and then tried to start ora.crsd, but it failed with the following error:


D:\app\11.2.0.3\grid\BIN>crsctl start res ora.crsd -init
CRS-2800: Cannot start resource 'ora.ctssd' as it is already in the INTERMEDIATE state on server 'or1'
CRS-4000: Command Start failed, or completed with errors.

I then tried to stop it, but even that failed with the following errors:


D:\app\11.2.0.3\grid\BIN>crsctl stop res ora.crsd
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Stop failed, or completed with errors.

Running the start with the -init option showed that ora.ctssd was stuck in the INTERMEDIATE state (CRS-2800), which is why ora.crsd could not be started.

So I first stopped ora.ctssd and then started it again to bring it out of the INTERMEDIATE state:



D:\app\11.2.0.3\grid\BIN>crsctl stop res ora.ctssd -init
CRS-2673: Attempting to stop 'ora.ctssd' on 'or1'
CRS-2677: Stop of 'ora.ctssd' on 'or1' succeeded


D:\app\11.2.0.3\grid\BIN>crsctl start res ora.ctssd -init
CRS-2672: Attempting to start 'ora.ctssd' on 'or1'
CRS-2676: Start of 'ora.ctssd' on 'or1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'or1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "D:\app\11.2.0.3\grid\log\or1\agent\ohasd\oraagent\oraagent.log".
CRS-2676: Start of 'ora.asm' on 'or1' succeeded
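The manual fix above (stop, then start, a lower-stack resource stuck in INTERMEDIATE) can be sketched as a small dry-run helper. This is a hypothetical illustration, not an Oracle-supplied procedure: cycling the resource is simply what worked in this case, so check the logs before bouncing anything. It assumes crsctl is on PATH, is run from an elevated prompt, and exits with a non-zero code on failure.

```python
import subprocess


def plan_bounces(states):
    """Return crsctl commands that cycle every -init resource stuck INTERMEDIATE.

    `states` maps resource name -> STATE, e.g. as read from "crsctl stat res -t -init".
    The stop/start pair mirrors the commands used in this post.
    """
    commands = []
    for resource, state in sorted(states.items()):
        if state == "INTERMEDIATE":
            commands.append(["crsctl", "stop", "res", resource, "-init"])
            commands.append(["crsctl", "start", "res", resource, "-init"])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Assumes crsctl exits non-zero on failure, so check=True aborts the sequence.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the stop and start commands for ora.ctssd only.
    run(plan_bounces({"ora.ctssd": "INTERMEDIATE", "ora.cssd": "ONLINE"}))
```

Keeping dry_run on by default means the helper only shows what it would do until you deliberately switch it off.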

After this, the cluster services began to start normally and I could check the status:

D:\app\11.2.0.3\grid\BIN>crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DBDATA.dg
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.DBFLASH.dg
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.DGDUP.dg
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.LISTENER.lsnr
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.asm
               ONLINE  ONLINE       or1            Started
               ONLINE  ONLINE       or2            Started
ora.gsd
               OFFLINE OFFLINE      or1
               OFFLINE OFFLINE      or2
ora.net1.network
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.ons
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
ora.registry.acfs
               ONLINE  ONLINE       or1
               ONLINE  ONLINE       or2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       or1
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       or2
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       or2
ora.cvu
      1        ONLINE  ONLINE       or2
ora.rac.db
      1        ONLINE  OFFLINE                               Instance Shutdown,S
                                                             TARTING
      2        ONLINE  ONLINE       or2            Open
ora.oc4j
      1        ONLINE  ONLINE       or2
ora.or1.vip
      1        ONLINE  ONLINE       or1
ora.or2.vip
      1        ONLINE  ONLINE       or2
ora.scan1.vip
      1        ONLINE  ONLINE       or1
ora.scan2.vip
      1        ONLINE  ONLINE       or2
ora.scan3.vip
      1        ONLINE  ONLINE       or2
ora.testrac.db
      1        ONLINE  OFFLINE                               Instance Shutdown,S
                                                             TARTING
      2        ONLINE  ONLINE       or2            Open


Now everything was fine on the node, but what caused all this?

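In crsd.log I found the following. The OCR is stored in ASM disk groups (+DBDATA and +DBFLASH here), so with the ASM instance down, CRSD could not open the OCR (PROC-26 / ORA-15077) and exited: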


2012-05-23 09:04:58.958: [  OCRASM][21492]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup

2012-05-23 09:04:58.958: [  OCRASM][21492]proprasmo: kgfoCheckMount returned [7]
2012-05-23 09:04:58.958: [  OCRASM][21492]proprasmo: The ASM instance is down
2012-05-23 09:04:58.958: [  OCRRAW][21492]proprioo: Failed to open [+DBDATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-05-23 09:04:58.958: [  OCRASM][21492]proprasmo: Error in open/create file in dg [DBFLASH]
[  OCRASM][21492]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
2012-05-23 09:04:58.989: [  OCRRAW][21492]proprinit: Could not open raw device
2012-05-23 09:04:58.989: [  OCRASM][21492]proprasmcl: asmhandle is NULL
2012-05-23 09:04:58.989: [  OCRASM][21492]proprasmcl: asmhandle is NULL
2012-05-23 09:04:58.989: [  OCRAPI][21492]a_init:16!: Backend init unsuccessful : [26]
2012-05-23 09:04:58.989: [  CRSOCR][21492] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
 2012-05-23 09:04:58.989: [ CRSMAIN][21492] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-15077: could not locate ASM instance serving a required diskgroup
2012-05-23 09:04:58.989: [    CRSD][21492][PANIC] CRSD exiting: Could not init OCR, code: 26
2012-05-23 09:04:58.989: [    CRSD][21492] Done.
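When crsd.log is long, a quick tally of message codes and [PANIC] lines points at the root cause faster than scrolling. The Python sketch below is a hypothetical helper, not part of Grid Infrastructure; the regular expression only covers the ORA-, PROC- and CRS- codes seen in this post, and the log path in the comment is an assumption based on the 11.2 layout shown earlier.

```python
import re
from collections import Counter

# Oracle message codes seen in this post, e.g. ORA-15077, PROC-26, CRS-4535.
CODE_RE = re.compile(r"\b(?:ORA|PROC|CRS)-\d+\b")


def summarize_log(lines):
    """Count message codes and collect [PANIC] lines from crsd.log text."""
    codes = Counter()
    panics = []
    for line in lines:
        codes.update(CODE_RE.findall(line))
        if "[PANIC]" in line:
            panics.append(line.strip())
    return codes, panics


# Real use (path is an assumption):
# with open(r"D:\app\11.2.0.3\grid\log\or1\crsd\crsd.log", errors="replace") as f:
#     codes, panics = summarize_log(f)
sample = [
    "2012-05-23 09:04:58.958: [  OCRASM][21492]ASM Error Stack : ORA-15077: could not locate ASM instance serving a required diskgroup",
    "2012-05-23 09:04:58.989: [  CRSOCR][21492] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage",
    "ORA-15077: could not locate ASM instance serving a required diskgroup",
    "2012-05-23 09:04:58.989: [    CRSD][21492][PANIC] CRSD exiting: Could not init OCR, code: 26",
]

codes, panics = summarize_log(sample)
print(codes.most_common())  # [('ORA-15077', 2), ('PROC-26', 1)]
print(panics[0])
```

Reading the file line by line keeps memory flat even for large logs, and errors="replace" avoids failing on odd bytes.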


Ref: My Oracle Support Doc IDs 1323698.1, 1368382.1

2 comments:

Unknown said...

Thanks for the info. We found that if CRSD is still down while the other cluster services are running (crsctl stat res -t -init), the problem lies in the OCR files. We added an additional OCR location on node 1 and it worked, but when we ran ocrconfig -repair -add +ocr1 it failed and we could not access the OCR information from the second node (ocrcheck). We had to remove +ocr1 and add it back (ocrconfig -repair -delete +ocr1, then ocrconfig -repair -add +ocr1). This worked and brought CRSD up along with the other resources.

Zainab Shibly said...

It was helpful. I had the same issue and was able to follow your instructions to resolve it.
Thanks