Sometimes it becomes necessary to power down or reboot a storage cell for maintenance while one or more databases are running. In this situation you need to verify that taking the storage server offline will not impact Oracle ASM disk group and database availability. Whether an Oracle Exadata Storage Server can be taken offline without affecting database availability depends on the level of Oracle ASM redundancy used on the affected disk groups, and on the current status of the disks in the other Oracle Exadata Storage Servers that hold mirror copies of the data stored on the server being taken offline.
Remember that ASM drops a disk shortly after it is taken offline. You can, however, set the DISK_REPAIR_TIME attribute (3.6h by default) to delay the drop: it specifies how long ASM waits for the disk to be repaired and brought back online before dropping it.
1- Check the repair time (on the ASM instance) for all mounted disk groups
SQL> select dg.name, a.value
from v$asm_diskgroup dg, v$asm_attribute a
where dg.group_number = a.group_number
and a.name = 'disk_repair_time';
NAME
------------------------------
VALUE
--------------------------------------------------------------------------------
DATA
3.6h
2- Optionally modify the DISK_REPAIR_TIME
SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';
Diskgroup altered.
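When deciding whether the default repair time covers the planned maintenance window, it can help to convert the attribute value into minutes. The following is only a minimal sketch: the function name `repair_time_minutes` is hypothetical, and it assumes the value carries an `h` or `m` suffix (a bare number is treated as hours).

```shell
# Convert a DISK_REPAIR_TIME value such as "3.6h" or "90m" to whole minutes.
# Assumption: a value without a unit suffix is interpreted as hours.
repair_time_minutes() {
    printf '%s\n' "$1" | awk '
        {
            value = tolower($0)
            unit = "h"
            if (value ~ /[hm]$/) {
                unit = substr(value, length(value), 1)
                value = substr(value, 1, length(value) - 1)
            }
            minutes = (unit == "h") ? value * 60 : value
            # Round to the nearest whole minute.
            printf "%d\n", minutes + 0.5
        }'
}
```

For example, `repair_time_minutes 3.6h` can be compared against the expected outage length in minutes before deciding whether step 2 is needed.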
3- Check that ASM will tolerate the grid disks on this cell going offline
CellCLI> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
datagd_CD_disk01_cell1 ONLINE Yes
datagd_CD_disk02_cell1 ONLINE Yes
datagd_CD_disk03_cell1 ONLINE Yes
datagd_CD_disk04_cell1 ONLINE Yes
datagd_CD_disk05_cell1 ONLINE Yes
datagd_CD_disk06_cell1 ONLINE Yes
datagd_CD_disk07_cell1 UNKNOWN Yes
datagd_CD_disk08_cell1 UNUSED Yes
datagd_CD_disk09_cell1 UNUSED Yes
datagd_CD_disk10_cell1 UNUSED Yes
datagd_CD_disk11_cell1 UNUSED Yes
datagd_CD_disk12_cell1 UNUSED Yes
If one or more disks do not return asmdeactivationoutcome='Yes', check the corresponding disk group and restore its data redundancy before continuing. Shutting down the cell services while one or more grid disks do not return asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group, causing the databases to shut down abruptly.
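The check described above can be scripted rather than read by eye. This is a sketch under assumptions: the helper name `unsafe_griddisks` is hypothetical, and it expects the three-column listing (name, asmmodestatus, asmdeactivationoutcome) on standard input, one grid disk per line.

```shell
# Reads "name asmmodestatus asmdeactivationoutcome" rows on stdin.
# Prints every grid disk whose third column is not "Yes" and
# returns 1 if at least one such disk was found, 0 otherwise.
unsafe_griddisks() {
    awk 'NF >= 3 && $3 != "Yes" { print $1; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}
```

On the cell it could be used as `cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome | unsafe_griddisks || echo "do not continue"`, so the procedure stops before any grid disk is inactivated.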
4- Inactivate all grid disks on the cell you wish to power down/reboot
cellcli -e alter griddisk all inactive
The action above may take 10 minutes or longer depending on activity. It is important to make sure you are able to offline all the disks successfully before shutting down the cell services. Inactivating the grid disks will automatically OFFLINE the disks in the ASM instance.
5- Confirm that the griddisks are now offline
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
cellcli -e list griddisk
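With many grid disks per cell, a tally of the status column is quicker to read than the full listing. The sketch below assumes the status is the second column of whichever listing is piped in; the function name `griddisk_status_summary` is hypothetical.

```shell
# Tally grid disks by the value in the second column of the listing
# and print one "STATUS COUNT" line per distinct status, sorted.
griddisk_status_summary() {
    awk 'NF >= 2 { count[$2]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

Piping `cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome` into it should show no ONLINE entries once the disks have been taken offline.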
7- Reboot the cell
8- Once the cell comes back online, you will need to reactivate the grid disks
cellcli -e alter griddisk all active
9- Verify grid disk status:
cellcli -e list griddisk
cellcli -e list griddisk attributes name, asmmodestatus
You may need to wait until asmmodestatus is ONLINE for all grid disks. Each disk goes to a 'SYNCING' state first and then to 'ONLINE'. This operation uses Fast Mirror Resync, which does not trigger an ASM rebalance. The resync restores only the extents that would have been written while the disk was offline.
10- Before taking another storage server offline, Oracle ASM synchronization must complete on the restarted Oracle Exadata Storage Server. If synchronization is not complete, then the check (Step 3) performed on another storage server will fail.
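Waiting for the resync can be automated with a small polling loop. This is a sketch, not a definitive implementation: `pending_griddisks` and `wait_for_resync` are hypothetical names, and treating OFFLINE and SYNCING as the only "not yet finished" states is an assumption (disks reported as UNUSED are ignored).

```shell
# Prints grid disks whose asmmodestatus (second column) is still
# OFFLINE or SYNCING. Assumption: these are the only pending states.
pending_griddisks() {
    awk '$2 == "OFFLINE" || $2 == "SYNCING" { print $1 }'
}

# Polls a status command until no grid disk is pending.
# $1 = maximum attempts, $2 = seconds between attempts,
# remaining arguments = command that prints "name asmmodestatus" rows.
# Returns 0 when all disks are done, 1 if attempts run out.
wait_for_resync() {
    attempts=$1
    interval=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        pending=$("$@" | pending_griddisks)
        if [ -z "$pending" ]; then
            return 0
        fi
        echo "still resyncing:" $pending >&2
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

A possible invocation is `wait_for_resync 60 30 cellcli -e list griddisk attributes name,asmmodestatus`, which would poll every 30 seconds for up to 60 attempts.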
1 comment:
Nice explanation, Inam. Keep up the good work. You didn't mention the circumstances under which this is not possible. You talked about the redundancy level, but there are other factors that could prevent you from doing this.