In Exadata, an alert is triggered automatically when the cell detects a predefined hardware or software issue, or when a metric crosses a threshold. By default no thresholds are defined, but you can create your own.
1- List the thresholds currently defined on the Exadata cell.
CellCLI> list threshold
CellCLI>
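The same checks can be scripted. The sketch below is an illustration, not an Oracle-supplied tool. It assumes Python 3.7+ is available where `cellcli` can be run, and that `cellcli -e "<command>"` executes a single CellCLI command non-interactively. The helper names `run_cellcli` and `has_thresholds` are hypothetical. The `runner` parameter exists only so the functions can be tested without a real cell.

```python
import subprocess
from typing import Callable, List

def run_cellcli(command: str, runner: Callable = subprocess.run) -> List[str]:
    """Run one CellCLI command via `cellcli -e` and return its non-blank lines.

    Hypothetical helper; `runner` defaults to subprocess.run and can be
    replaced by a fake for testing.
    """
    result = runner(["cellcli", "-e", command],
                    capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]

def has_thresholds(runner: Callable = subprocess.run) -> bool:
    # LIST THRESHOLD prints nothing when no thresholds are defined (step 1).
    return bool(run_cellcli("list threshold", runner))
```

On a cell, `has_thresholds()` would return `False` for the empty listing shown in step 1.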
2- The LIST ALERTDEFINITION command displays all alert sources available on the cell. You can use this list to remind yourself which metrics can have thresholds associated with them.
CellCLI> list alertdefinition
CellCLI> list alertdefinition cl_fsut detail
3- Create a warning threshold for file system utilization on the root (/) file system.
CellCLI> list metriccurrent cl_fsut
CL_FSUT "/" 63 %
CL_FSUT "/boot" 28 %
CL_FSUT "/dev/shm" 0 %
CL_FSUT "/opt/oracle" 36 %
CL_FSUT "/var/log/oracle" 7 %
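If you capture this summary output in a script, each line has the form `CL_FSUT "<file system>" <value> %`. A minimal parser, assuming exactly that layout (the name `parse_fsut` is hypothetical), could look like this:

```python
import re
from typing import Dict

# Matches summary lines such as:  CL_FSUT "/opt/oracle" 36 %
FSUT_LINE = re.compile(r'^\s*CL_FSUT\s+"([^"]+)"\s+([\d.]+)\s*%', re.IGNORECASE)

def parse_fsut(output: str) -> Dict[str, float]:
    """Map each file system to its current utilization percentage."""
    usage: Dict[str, float] = {}
    for line in output.splitlines():
        m = FSUT_LINE.match(line)
        if m:
            usage[m.group(1)] = float(m.group(2))
    return usage

sample = '''CL_FSUT "/" 63 %
CL_FSUT "/boot" 28 %
CL_FSUT "/dev/shm" 0 %
CL_FSUT "/opt/oracle" 36 %
CL_FSUT "/var/log/oracle" 7 %'''

print(parse_fsut(sample)["/"])  # 63.0
```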
CellCLI> list metriccurrent cl_fsut detail
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/"
metricType: Instantaneous
metricValue: 63 %
objectType: CELL_FILESYSTEM
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/boot"
metricType: Instantaneous
metricValue: 28 %
objectType: CELL_FILESYSTEM
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/dev/shm"
metricType: Instantaneous
metricValue: 0 %
objectType: CELL_FILESYSTEM
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/opt/oracle"
metricType: Instantaneous
metricValue: 36 %
objectType: CELL_FILESYSTEM
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/var/log/oracle"
metricType: Instantaneous
metricValue: 7 %
objectType: CELL_FILESYSTEM
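The DETAIL form prints one `attribute: value` pair per line, and each object starts again at `name:`. A sketch of a parser for that layout follows; the function name `parse_detail` is hypothetical, and it assumes every object begins with its `name` attribute as in the listing above.

```python
from typing import Dict, List

def parse_detail(output: str) -> List[Dict[str, str]]:
    """Split CellCLI DETAIL output into one dict per object.

    Only the first ':' separates key from value, so timestamps such as
    13:33:50 stay intact. Surrounding double quotes are removed.
    """
    records: List[Dict[str, str]] = []
    for raw in output.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skips blank lines and bare prompts
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # "/boot" -> /boot
        if key == "name":
            records.append({})  # a new object starts here
        if records:
            records[-1][key] = value
    return records

sample = '''name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/"
metricType: Instantaneous
metricValue: 63 %
objectType: CELL_FILESYSTEM
name: CL_FSUT
alertState: normal
collectionTime: 2014-12-02T13:33:50+03:00
metricObjectName: "/boot"
metricType: Instantaneous
metricValue: 28 %
objectType: CELL_FILESYSTEM'''

recs = parse_detail(sample)
print({r["metricObjectName"]: r["metricValue"] for r in recs})  # {'/': '63 %', '/boot': '28 %'}
```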
Set the warning level to a value slightly above the utilization observed above (63 % for /, so 64 is used here).
CellCLI> create threshold cl_fsut."/" comparison='>', warning=64
Threshold cl_fsut."/" successfully created
CellCLI>
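If you create thresholds from a script, the command text can be generated from the observed value. This is a sketch only: `suggest_warning` and `create_threshold_cmd` are hypothetical helpers, and the one-point margin simply mirrors the 63 % to 64 choice made above. The resulting string could be passed to `cellcli -e` as a single list argument, which avoids shell quoting of the `"/"` and `'>'`.

```python
import math

def suggest_warning(current_pct: float, margin: int = 1) -> int:
    """Warning level just above the observed value (63 % -> 64 in this post)."""
    return math.floor(current_pct) + margin

def create_threshold_cmd(metric: str, obj: str, warning: float,
                         comparison: str = ">") -> str:
    """Build a CREATE THRESHOLD command in the form used in step 3."""
    return (f'create threshold {metric}."{obj}" '
            f"comparison='{comparison}', warning={warning:g}")

cmd = create_threshold_cmd("cl_fsut", "/", suggest_warning(63))
print(cmd)  # create threshold cl_fsut."/" comparison='>', warning=64
```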
4- View the newly created threshold definition, then exit CellCLI.
CellCLI> list threshold detail
name: cl_fsut./
comparison: >
warning: 64.0
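With `comparison: >` and `warning: 64.0`, a value of exactly 64 does not raise the alert; only a value strictly greater than 64 does. The sketch below is a simplified model of that classification, not the cell's actual implementation: the name `classify` is hypothetical, it assumes the comparison operators `<`, `<=`, `=`, `>=` and `>`, and it ignores other THRESHOLD attributes such as `occurrences` and `observation`.

```python
import operator
from typing import Optional

# Comparison strings mapped to Python operators (simplified model).
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def classify(value: float, comparison: str,
             warning: Optional[float] = None,
             critical: Optional[float] = None) -> str:
    """Return 'critical', 'warning' or 'normal' for one metric value."""
    cmp = OPS[comparison]
    if critical is not None and cmp(value, critical):
        return "critical"
    if warning is not None and cmp(value, warning):
        return "warning"
    return "normal"

print(classify(63.0, ">", warning=64.0))  # normal
print(classify(64.0, ">", warning=64.0))  # normal
print(classify(65.0, ">", warning=64.0))  # warning
```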
5- At the cell's OS prompt (outside CellCLI), run the following command. It writes a 256 MB file (500,000 blocks of 512 bytes) to /tmp, which resides on the root (/) file system, so the utilization metric rises. Once the metric crosses the threshold defined above, an alert is generated.
[root@pk3-iub-cel-es01 ~]# dd if=/dev/zero of=/tmp/file.out bs=512 count=500000
500000+0 records in
500000+0 records out
256000000 bytes (256 MB) copied, 2.39039 seconds, 107 MB/s
[root@pk3-iub-cel-es01 ~]#
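The figures in the dd output can be checked with simple arithmetic: 500,000 records of 512 bytes each is 256,000,000 bytes, and a 1024-byte block size would double that. The second helper roughly estimates how much data must be written to move utilization from the current value to the threshold. The 10 GB file system size is a hypothetical figure; substitute the real size of / on your cell (for example from `df -B1 /`). Because CL_FSUT is displayed rounded and the `>` comparison requires a value strictly above the threshold, treat the result as a rough estimate.

```python
def dd_bytes(bs: int, count: int) -> int:
    """Bytes written by dd with block size `bs` and `count` records."""
    return bs * count

def bytes_to_cross(fs_size_bytes: int, current_pct: int, threshold_pct: int) -> int:
    """Rough bytes needed to lift utilization from current_pct to threshold_pct."""
    return max(0, fs_size_bytes * (threshold_pct - current_pct) // 100)

print(dd_bytes(512, 500_000))           # 256000000, as in the dd output
print(dd_bytes(1024, 500_000))          # 512000000
FS_SIZE = 10 * 10**9                    # hypothetical size of / (10 GB)
print(bytes_to_cross(FS_SIZE, 63, 64))  # 100000000
```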
6- Relaunch CellCLI and execute the LIST ALERTHISTORY command.
CellCLI> list alerthistory
31_1 2014-10-17T02:01:06+03:00 info "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours. Note that many learn cycles do not require entering WriteThrough caching mode. When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 31 C Full Charge Capacity : 1264 mAh Relative Charge : 100 % Ambient Temperature : 16 C"
31_2 2014-10-17T07:51:08+03:00 clear "All disk drives are in WriteBack caching mode. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 35 C Full Charge Capacity : 1256 mAh Relative Charge : 55 % Ambient Temperature : 17 C"
32_1 2014-12-02T14:20:50+03:00 warning "The warning threshold for the following metric has been crossed. Metric Name : CL_FSUT Metric Description : Percentage of total space on this file system that is currently used Object Name : / Current Value : 65.0 % Threshold Value : 64.0 % "
CellCLI>
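Each summary line of LIST ALERTHISTORY holds the alert name, its begin time, its severity and the quoted message. The name combines the alert sequence ID with the entry's position in that sequence (31_1 and 31_2 both belong to sequence 31 in the detail output below). A parsing sketch, assuming exactly this one-line layout; `Alert` and `parse_alerthistory` are hypothetical names:

```python
import re
from typing import List, NamedTuple

class Alert(NamedTuple):
    name: str        # e.g. "32_1": <alertSequenceID>_<entry in sequence>
    begin_time: str  # e.g. "2014-12-02T14:20:50+03:00"
    severity: str    # e.g. info, warning, clear
    message: str

ALERT_LINE = re.compile(r'^\s*(\d+_\d+)\s+(\S+)\s+(\w+)\s+"(.*)"\s*$')

def parse_alerthistory(output: str) -> List[Alert]:
    """Turn LIST ALERTHISTORY summary lines into Alert tuples."""
    alerts = []
    for line in output.splitlines():
        m = ALERT_LINE.match(line)
        if m:
            name, begin, severity, message = m.groups()
            alerts.append(Alert(name, begin, severity, message.strip()))
    return alerts

sample = '''32_1 2014-12-02T14:20:50+03:00 warning "The warning threshold for the following metric has been crossed. Metric Name : CL_FSUT Metric Description : Percentage of total space on this file system that is currently used Object Name : / Current Value : 65.0 % Threshold Value : 64.0 % "'''

alerts = parse_alerthistory(sample)
print(alerts[0].name, alerts[0].severity)  # 32_1 warning
```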
CellCLI> list alerthistory detail
name: 31_1
alertMessage: "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours. Note that many learn cycles do not require entering WriteThrough caching mode. When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 31 C Full Charge Capacity : 1264 mAh Relative Charge : 100 % Ambient Temperature : 16 C"
alertSequenceID: 31
alertShortName: Hardware
alertType: Stateful
beginTime: 2014-10-17T02:01:06+03:00
endTime: 2014-10-17T07:51:08+03:00
examinedBy:
metricObjectName: LUN_LEARN_CYCLE_ALERT
notificationState: 0
sequenceBeginTime: 2014-10-17T02:01:06+03:00
severity: info
alertAction: Informational.
name: 31_2
alertMessage: "All disk drives are in WriteBack caching mode. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 35 C Full Charge Capacity : 1256 mAh Relative Charge : 55 % Ambient Temperature : 17 C"
alertSequenceID: 31
alertShortName: Hardware
alertType: Stateful
beginTime: 2014-10-17T07:51:08+03:00
endTime: 2014-10-17T07:51:08+03:00
examinedBy:
metricObjectName: LUN_LEARN_CYCLE_ALERT
notificationState: 0
sequenceBeginTime: 2014-10-17T02:01:06+03:00
severity: clear
alertAction: Informational.
name: 32_1
alertMessage: "The warning threshold for the following metric has been crossed. Metric Name : CL_FSUT Metric Description : Percentage of total space on this file system that is currently used Object Name : / Current Value : 65.0 % Threshold Value : 64.0 % "
alertSequenceID: 32
alertShortName: CL_FSUT
alertType: Stateful
beginTime: 2014-12-02T14:20:50+03:00
endTime:
examinedBy:
metricObjectName: "/"
metricValue: 65.0
notificationState: 1
sequenceBeginTime: 2014-12-02T14:20:50+03:00
severity: warning
alertAction: "Examine the metric value that is violating the specified threshold, and take appropriate actions if needed."
CellCLI>
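In the detail listing, 31_1 carries an endTime that matches the beginTime of its clear entry 31_2, while 32_1 has no endTime yet. Treating an empty endTime on a non-clear entry as "still active" is an inference from this output, not documented behavior. A sketch under that assumption (`parse_records` and `open_alerts` are hypothetical names; the sample is the listing above trimmed to three attributes):

```python
from typing import Dict, List

def parse_records(output: str) -> List[Dict[str, str]]:
    """Split DETAIL output into dicts; a new record starts at 'name:'."""
    records: List[Dict[str, str]] = []
    for raw in output.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # no colon: blank line or prompt
        if key.strip() == "name":
            records.append({})
        if records:
            records[-1][key.strip()] = value.strip()
    return records

def open_alerts(detail_output: str) -> List[str]:
    """Names of non-clear alerts whose endTime is still empty (assumption)."""
    return [r["name"] for r in parse_records(detail_output)
            if not r.get("endTime") and r.get("severity") != "clear"]

sample = '''name: 31_1
endTime: 2014-10-17T07:51:08+03:00
severity: info
name: 32_1
endTime:
severity: warning'''

print(open_alerts(sample))  # ['32_1']
```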
If email notification is configured for the cell, you will also receive an email about this alert.
7- Delete the file created above to free the space.
[root@pn3-esk-cel-es01 ~]# rm /tmp/file.out
Once the file is deleted and utilization falls back below the threshold, the alert is cleared, and a clear notification is emailed if mail is configured.
8- Relaunch CellCLI, examine the file system utilization, and confirm that the root (/) file system utilization has fallen back below the warning threshold. If the metric still exceeds the threshold, re-run the command periodically until the metric value is updated.
CellCLI> list metriccurrent cl_fsut
CL_FSUT "/" 63 %
CL_FSUT "/boot" 28 %
CL_FSUT "/dev/shm" 0 %
CL_FSUT "/opt/oracle" 36 %
CL_FSUT "/var/log/oracle" 7 %
CellCLI>
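The "re-run periodically" advice in step 8 amounts to a polling loop. In this sketch, `wait_until_below` is a hypothetical helper; `read_value` is any callable you supply that returns the current CL_FSUT value for /, for example by running `cellcli -e "list metriccurrent cl_fsut"` and parsing the "/" row. The 60-second interval and 15-minute timeout are arbitrary assumptions, not the cell's metric collection interval. `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

def wait_until_below(read_value: Callable[[], float], threshold: float,
                     interval_s: float = 60.0, timeout_s: float = 900.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until read_value() <= threshold, so a '>' threshold no longer fires.

    Returns True if the value dropped back in time, False on timeout.
    """
    waited = 0.0
    while True:
        if read_value() <= threshold:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```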
9- Re-run LIST ALERTHISTORY. The alert should now be listed as cleared: a new entry, 32_2, with severity clear follows the original warning 32_1.
CellCLI> list alerthistory
31_1 2014-10-17T02:01:06+03:00 info "The disk controller battery is executing a learn cycle and may temporarily enter WriteThrough Caching mode as part of the learn cycle. Disk write throughput might be temporarily lower during this time. The flash drives are not affected. The battery learn cycle is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours. Note that many learn cycles do not require entering WriteThrough caching mode. When the disk controller cache returns to the normal WriteBack caching mode, an additional informational alert will be sent. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 31 C Full Charge Capacity : 1264 mAh Relative Charge : 100 % Ambient Temperature : 16 C"
31_2 2014-10-17T07:51:08+03:00 clear "All disk drives are in WriteBack caching mode. Battery Serial Number : 6297 Battery Type : iBBU08 Battery Temperature : 35 C Full Charge Capacity : 1256 mAh Relative Charge : 55 % Ambient Temperature : 17 C"
32_1 2014-12-02T14:20:50+03:00 warning "The warning threshold for the following metric has been crossed. Metric Name : CL_FSUT Metric Description : Percentage of total space on this file system that is currently used Object Name : / Current Value : 65.0 % Threshold Value : 64.0 % "
32_2 2014-12-02T14:31:50+03:00 clear "The warning threshold for the following metric has been cleared. Metric Name : CL_FSUT Metric Description : Percentage of total space on this file system that is currently used Object Name : / Current Value : 63.0 % Threshold Value : 64.0 % "
CellCLI>
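To check this from a script, you can group the summary lines by sequence ID and look at the severity of the newest entry in each sequence: sequence 32 ends with clear, just as sequence 31 did. A sketch assuming the `<sequence>_<entry>` naming seen above; `sequence_status` is a hypothetical name, and the sample lines are shortened versions of the listing:

```python
import re
from typing import Dict, Tuple

# Captures sequence ID, entry number and severity from a summary line.
LINE = re.compile(r'^\s*(\d+)_(\d+)\s+\S+\s+(\w+)\s')

def sequence_status(output: str) -> Dict[int, str]:
    """Severity of the newest entry in each alert sequence."""
    latest: Dict[int, Tuple[int, str]] = {}
    for line in output.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        seq, entry, severity = int(m.group(1)), int(m.group(2)), m.group(3)
        if seq not in latest or entry > latest[seq][0]:
            latest[seq] = (entry, severity)
    return {seq: sev for seq, (_, sev) in latest.items()}

sample = '''31_1 2014-10-17T02:01:06+03:00 info "The disk controller battery is executing a learn cycle ..."
31_2 2014-10-17T07:51:08+03:00 clear "All disk drives are in WriteBack caching mode. ..."
32_1 2014-12-02T14:20:50+03:00 warning "The warning threshold for the following metric has been crossed. ..."
32_2 2014-12-02T14:31:50+03:00 clear "The warning threshold for the following metric has been cleared. ..."'''

print(sequence_status(sample))  # {31: 'clear', 32: 'clear'}
```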