Description:
Xymon is the web-based monitoring tool of choice for administrators and staff supporting ITSO Windows and Unix processes on the Purdue campus. It gives the trained eye a quick overview of hardware and processes that may be out of sync across several key University areas.
In general, IOC staff will only need to respond to Critical (Red) alerts that affect production devices (servers, virtual machines, etc.). All production Xymon alerts should also be reflected in SquaredUp. If a critical production alert remains in Xymon for more than 20 minutes, a system administrator or equivalent will usually need to be contacted about the malfunctioning equipment. In the event of a hardware failure, contact the appropriate system owner ASAP. Each shift is responsible for learning the contact procedures that apply to its time of day.
Location:
https://monitor.itap.purdue.edu/xymon-cgi/nongreen.pl?expage=NONPROD&excol=kernel,patches,apt,libs,entropy,inode,uptime,xymonnet,xymond,trends - Xymon: All non-green systems. Keep this link open on the workstation.
...
Clicking the face icon next to the Backup subcategory displays the status of the machines currently classified as backup systems. An underlined system name indicates that critical information is present. Additional important information may be found in the Info category for each machine. A Technical Operator will need this information to resolve alerts.
Clicking tsm01, as displayed above, opens the screen shown below.
The top-level titles and subcategories are subject to change. Users are encouraged to explore the application and become familiar with the various categories of monitored systems.
...
This section can display messages from the last 4 hours of monitoring. Each line contains, from left to right: timestamp, machine name, affected service, prior service status, and updated machine state. Each system's name is highlighted in red, yellow, or green to match the current status of the service in question.
Prior Status: State of the monitored service before the most recent update
...
Status History: Time and date of the update
Acknowledged Alerts
From time to time, an administrator will acknowledge an alert within Xymon without clearing it. For acknowledged production alerts, a check mark appears in place of the face / X icon, as shown below.
...
Xymon Color Codes and Symbols
Color | Recently changed | Last change > 24 hours |
---|---|---|
Green: Status is OK | | |
Yellow: Warning | | |
Red: Critical | | |
Clear: No data | | |
Purple: No report | | |
Blue: Disabled | | |
When to Call
As previously mentioned, critical alerts in Xymon are indicated by the color red. When an alert appears, take the time to click on it and review its details so you are better prepared for any response. Generally, a system administrator should be alerted if one of their production devices remains in an alarm state for more than 20 minutes. There are several exceptions to this rule:
1. Hardware Failure - Contact the on-call promptly when a production machine suffers a hardware failure. These failures do not clear on their own, so a prompt response (within about 5 minutes) is advisable.
...
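The general escalation rule and the hardware-failure exception can be sketched as a small decision function. This is a minimal illustration only, assuming a hypothetical `Alert` record whose field names are not a Xymon data format. It models just the 20-minute rule for critical production alerts and the immediate-contact rule for hardware failures; the other exceptions in this section are not modeled.

```python
from dataclasses import dataclass

# Threshold taken from the "When to Call" text: contact if a critical
# production alert stays red for MORE than this many minutes.
GENERAL_WAIT_MIN = 20


@dataclass
class Alert:
    # Hypothetical record for illustration only; not a Xymon data format.
    host: str
    color: str              # e.g. "red", "yellow", "green"
    production: bool        # True for production devices (servers, VMs, etc.)
    hardware_failure: bool  # True if the alert reflects a hardware failure
    minutes_red: float      # minutes the alert has been red


def should_contact(alert: Alert) -> bool:
    """Return True if the operator should contact the system owner / on-call now.

    Models only the general rule and the hardware-failure exception.
    """
    # Only critical (red) alerts on production devices call for a response.
    if alert.color != "red" or not alert.production:
        return False
    # Hardware failures do not clear on their own: contact right away
    # (the procedure suggests responding within about 5 minutes).
    if alert.hardware_failure:
        return True
    # Otherwise, contact only once the alert has persisted past 20 minutes.
    return alert.minutes_red > GENERAL_WAIT_MIN


if __name__ == "__main__":
    # Hypothetical hosts, for illustration.
    print(should_contact(Alert("host-a", "red", True, False, 12)))     # False
    print(should_contact(Alert("host-a", "red", True, False, 25)))     # True
    print(should_contact(Alert("host-b", "red", True, True, 1)))       # True
    print(should_contact(Alert("host-c", "yellow", True, False, 60)))  # False
```

The hardware case returns True regardless of elapsed time because these failures will not resolve themselves, whereas the general case waits out the 20-minute window in case the alert clears on its own.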