Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Thursday, November 28, 2013

Exadata: Shutting down/Rebooting cell without affecting ASM

Sometimes it becomes necessary to power down or reboot a cell for maintenance while one or more databases are running. In this situation you need to verify that taking the storage server offline will not impact Oracle ASM disk group and database availability. Whether an Oracle Exadata Storage Server can be taken offline without affecting database availability depends on the level of Oracle ASM redundancy used on the affected disk groups, and on the current status of the disks in the other storage servers that hold mirror copies of the data on the storage server to be taken offline.

Remember that ASM drops an offline disk once the DISK_REPAIR_TIME interval (3.6h by default) has elapsed. You can increase this attribute to allow enough time to finish the maintenance and bring the disk back online.

1- Check repair times (on the ASM instance) for all mounted disk groups
SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
     where dg.group_number = a.group_number and a.name = 'disk_repair_time';


2- Optionally modify the DISK_REPAIR_TIME

Diskgroup altered.
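
For reference, here is a minimal sketch of the kind of statement this step relies on, assuming the standard ALTER DISKGROUP ... SET ATTRIBUTE form. The helper name repair_time_sql, the disk group name DATA and the value 8.5h are illustrative assumptions, not values from this post. The helper only prints the statement and does not connect to anything.

```shell
#!/bin/sh
# Print the ASM statement that changes DISK_REPAIR_TIME for one disk group.
# Both arguments come from the caller; DATA and 8.5h below are hypothetical
# placeholders, not values taken from this post.
repair_time_sql() {
    dg_name=$1      # disk group name (placeholder example: DATA)
    interval=$2     # interval with unit, e.g. 8.5h or 300m
    printf "ALTER DISKGROUP %s SET ATTRIBUTE 'disk_repair_time' = '%s';\n" \
        "$dg_name" "$interval"
}

repair_time_sql DATA 8.5h
```

The printed statement would then be run in SQL*Plus against the ASM instance, once per disk group that needs a longer repair window.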

3- Check if ASM will be OK if the grid disks go OFFLINE
CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
         datagd_CD_disk01_cell1  ONLINE          Yes
         datagd_CD_disk02_cell1  ONLINE          Yes
         datagd_CD_disk03_cell1  ONLINE          Yes
         datagd_CD_disk04_cell1  ONLINE          Yes
         datagd_CD_disk05_cell1  ONLINE          Yes
         datagd_CD_disk06_cell1  ONLINE          Yes
         datagd_CD_disk07_cell1  UNKNOWN         Yes
         datagd_CD_disk08_cell1  UNUSED          Yes
         datagd_CD_disk09_cell1  UNUSED          Yes
         datagd_CD_disk10_cell1  UNUSED          Yes
         datagd_CD_disk11_cell1  UNUSED          Yes
         datagd_CD_disk12_cell1  UNUSED          Yes

If one or more disks do not return asmdeactivationoutcome='Yes', check the respective disk group and restore its data redundancy first. Shutting down the cell services while one or more grid disks do not return asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.

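
With many grid disks it is easy to miss a single row that is not 'Yes'. The following is a small sketch, not an official tool, that reads the three-column listing from Step 3 on standard input and prints only the disks that are not safe to deactivate. The function name unsafe_griddisks and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Read "name  asmmodestatus  asmdeactivationoutcome" rows on stdin and print
# the name of every grid disk whose third column is not exactly "Yes".
# Rows with fewer than three columns are ignored.
# Exit status: 0 if every disk reports Yes, 1 if at least one does not.
unsafe_griddisks() {
    awk 'NF >= 3 && $3 != "Yes" { print $1; bad = 1 } END { exit bad }'
}

# Sample rows for illustration only (the "No" outcome text is invented).
printf '%s\n' \
    'datagd_CD_disk01_cell1  ONLINE  Yes' \
    'datagd_CD_disk02_cell1  ONLINE  No' |
    unsafe_griddisks || echo "do not shut down this cell yet"
```

On a real cell the input would come from the Step 3 command itself, for example: cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome | unsafe_griddisks
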
4- Inactivate all grid disks on the cell you wish to power down/reboot
cellcli -e alter griddisk all inactive

The action above may take 10 minutes or longer depending on activity. It is important to make sure you are able to offline all the disks successfully before shutting down the cell services. Inactivating the grid disks will automatically OFFLINE the disks in the ASM instance. 

5- Confirm that the griddisks are now offline
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

The output of the above command should show either asmmodestatus=OFFLINE or asmmodestatus=UNUSED, and asmdeactivationoutcome=Yes, for all grid disks once the disks are offline in ASM.
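
A quick way to read a long listing is to tally the asmmodestatus column, so that any disk still ONLINE stands out at a glance. This is a rough sketch: the function name tally_asm_status and the sample rows are assumptions for illustration, and the input is expected in the same column order as the command above.

```shell
#!/bin/sh
# Count how many grid disks are in each asmmodestatus (second column) and
# print one "STATUS count" line per status, sorted by status name.
tally_asm_status() {
    awk 'NF >= 2 { count[$2]++ } END { for (s in count) print s, count[s] }' |
        LC_ALL=C sort
}

# Illustrative rows only.
printf '%s\n' \
    'datagd_CD_disk01_cell1  OFFLINE  Yes' \
    'datagd_CD_disk02_cell1  OFFLINE  Yes' \
    'datagd_CD_disk08_cell1  UNUSED   Yes' |
    tally_asm_status
```

For the sample rows this prints "OFFLINE 2" and "UNUSED 1". If the tally for a real cell still shows ONLINE disks, the deactivation has not finished yet.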

6- Confirm all grid disks are now inactive
cellcli -e list griddisk

7- Power down the cell (to reboot instead, use shutdown -r now)
[root@exacell1 ~]# shutdown -h now

8- Once the cell comes back online, reactivate the grid disks
cellcli -e alter griddisk all active

9- Verify grid disk status:
cellcli -e list griddisk
cellcli -e list griddisk attributes name,asmmodestatus

You may need to wait until asmmodestatus is ONLINE for all grid disks. Each disk goes to a 'SYNCING' state first and then to 'ONLINE'. This uses Fast Mirror Resync, which does not trigger an ASM rebalance: the resync restores only the extents that would have been written while the disk was offline.
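
The waiting can be scripted with a simple polling loop. The sketch below is one possible approach, not part of any Oracle tooling: wait_for_online and fake_list are hypothetical names, and it assumes that UNUSED disks (which belong to no ASM disk group) can be treated as settled alongside ONLINE ones.

```shell
#!/bin/sh
# Poll a listing command until no grid disk is still resyncing.
#   $1: command that prints "name asmmodestatus" rows
#   $2: seconds to sleep between polls
#   $3: maximum number of polls before giving up
# Assumption: ONLINE and UNUSED both count as settled. An empty listing is
# treated as "not settled" so a cell with stopped services never passes.
wait_for_online() {
    list_cmd=$1; interval=$2; max_polls=$3
    poll=0
    while [ "$poll" -lt "$max_polls" ]; do
        pending=$($list_cmd | awk '
            NF >= 2 { rows++; if ($2 != "ONLINE" && $2 != "UNUSED") n++ }
            END { print (rows ? n + 0 : "none") }')
        if [ "$pending" = 0 ]; then
            return 0
        fi
        poll=$((poll + 1))
        sleep "$interval"
    done
    return 1
}

# Stand-in for the real listing so the sketch runs anywhere.
fake_list() {
    printf '%s\n' 'datagd_CD_disk01_cell1  ONLINE' 'datagd_CD_disk08_cell1  UNUSED'
}
wait_for_online fake_list 0 3 && echo "all grid disks settled"
```

On a cell, the first argument would be the real command as a single quoted string, for example: wait_for_online "cellcli -e list griddisk attributes name,asmmodestatus" 60 120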

10- Before taking another storage server offline, wait until Oracle ASM synchronization has completed on the restarted Oracle Exadata Storage Server. If synchronization is not complete, the check in Step 3 will fail when it is run on the next storage server.

1 comment:

The Human Fly said...

Nice explanation Inam. Keep up the good work. You didn't mention under which circumstances, this is not possible? Though, you mentioned about redundancy level you talk, but, there are other factors that could prevent you doing this.