Description: 

Xymon is the web-based monitoring tool of choice for administrators and staff supporting ITSO Windows and Unix processes on the Purdue campus. It gives the trained eye a quick overview of the hardware and processes that may be out of sync in several key University areas.


In general, IOC staff only need to respond to Critical (Red) alerts that affect production devices (servers, virtual machines, etc.). All production Xymon alerts should also be reflected in SquaredUp. Generally, if a critical production alert remains in Xymon for more than 20 minutes, a system administrator or equivalent needs to be contacted regarding the malfunctioning equipment. In the event of a hardware failure, contact the appropriate system owner ASAP. Each shift is responsible for learning the contact procedures appropriate to its time of day.
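The response rule above can be sketched as a small decision function. This is only an illustration of the written procedure, not a tool in use at the IOC; the `Alert` class, its field names, and `should_contact_owner` are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta

# Waiting period from the procedure text: 20 minutes for critical production alerts.
CRITICAL_WAIT = timedelta(minutes=20)


@dataclass
class Alert:
    """Hypothetical record describing one Xymon alert."""
    color: str                      # "red", "yellow", "green", ...
    production: bool                # True for production devices
    age: timedelta                  # how long the alert has been showing
    hardware_failure: bool = False  # True when the alert is a hardware failure


def should_contact_owner(alert: Alert) -> bool:
    """Return True when the system owner should be contacted."""
    # Only Critical (red) alerts on production devices need a response.
    if alert.color != "red" or not alert.production:
        return False
    # Hardware failures are escalated right away, without the waiting period.
    if alert.hardware_failure:
        return True
    return alert.age > CRITICAL_WAIT
```

For example, `should_contact_owner(Alert("red", True, timedelta(minutes=25)))` evaluates to `True`, while the same alert at 10 minutes evaluates to `False` unless `hardware_failure` is set.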


 Location:

https://monitor.itap.purdue.edu/xymon-cgi/nongreen.pl?expage=NONPROD&excol=kernel,patches,apt,libs,entropy,inode,uptime,xymonnet,xymond,trends - Xymon: All non-green systems. Keep this link up on the workstation.
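The link carries two query parameters, and it can help to see them split out. The sketch below only decomposes the URL with Python's standard library. Reading `expage` as a page filter and `excol` as a list of filtered columns is an assumption based on the parameter names, not something this page confirms.

```python
from urllib.parse import urlsplit, parse_qs

URL = (
    "https://monitor.itap.purdue.edu/xymon-cgi/nongreen.pl"
    "?expage=NONPROD&excol=kernel,patches,apt,libs,entropy,inode,"
    "uptime,xymonnet,xymond,trends"
)

# Split the query string into a dict of parameter name -> list of values.
query = parse_qs(urlsplit(URL).query)

expage = query["expage"][0]            # single value: "NONPROD"
excol = query["excol"][0].split(",")   # comma-separated column names

print("expage:", expage)
for column in excol:
    print("excol entry:", column)
```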

...

Clicking the face alongside the Backup subcategory returns the states of the machines currently classified as such. Underlined systems indicate the presence of critical information. Additional important information may be found in the Info category for each machine. A Technical Operator will need this further information to resolve alerts.

 


Xymon Backup Production   


Clicking on tsm01, as displayed above, generates the screen shown below.

 Xymon Specific Instructions

 


The top-level titles and subcategories are subject to change. The user is encouraged to explore the application and become familiar with the various categories of systems monitored.

...

This section can display messages from the last 4 hours of monitoring. From left to right, each line contains: time stamp, machine name, affected service, prior service status, and the updated machine state. Each system's title is highlighted in red, yellow, or green, the color corresponding to the current status of the service in question.

 Xymon Status History  


Prior Status: State of monitored service prior to the most recent update

...

Status History: Time and date of the update
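The fields of a status history line can be pictured as a simple record. The following is a minimal sketch: the `StatusChange` class, its field names, and the `recent_changes` helper are hypothetical and do not reflect how Xymon stores or exports this data. Only the field order and the 4-hour window come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StatusChange:
    """Hypothetical record for one line of the status history."""
    timestamp: datetime  # time stamp of the update
    machine: str         # machine name
    service: str         # affected service
    prior: str           # prior service status, e.g. "green"
    current: str         # updated state, e.g. "red"


def recent_changes(changes, now, window=timedelta(hours=4)):
    """Keep only the changes that fall inside the display window."""
    return [c for c in changes if timedelta(0) <= now - c.timestamp <= window]


# Illustrative values only; "disk" is a made-up service name.
now = datetime(2024, 1, 1, 12, 0)
history = [
    StatusChange(now - timedelta(hours=1), "tsm01", "disk", "green", "red"),
    StatusChange(now - timedelta(hours=6), "tsm01", "disk", "yellow", "green"),
]
visible = recent_changes(history, now)
```

With these sample values, only the change from one hour ago falls inside the window.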

 

Acknowledged Alerts 


From time to time an administrator will have acknowledged an alert within Xymon without clearing it. For production alerts, a check mark is displayed in place of the face / X shape, as shown below.

...

Xymon Color Codes and Symbols

Color                 Recently changed             Last change > 24 hours
Green: Status is OK   Green - recently changed     Green
Yellow: Warning       Yellow - recently changed    Yellow
Red: Critical         Red - recently changed       Red
Clear: No data        Clear - recently changed     Clear
Purple: No report     Purple - recently changed    Purple
Blue: Disabled        Blue - recently changed      Blue
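For quick reference, the color legend can also be written as a lookup table. This is only a sketch for illustration; the `XymonColor` enum and the `describe` helper are hypothetical names, and the meanings are copied from the legend above.

```python
from enum import Enum


class XymonColor(Enum):
    """Status colors and their meanings, as listed in the legend."""
    GREEN = "Status is OK"
    YELLOW = "Warning"
    RED = "Critical"
    CLEAR = "No data"
    PURPLE = "No report"
    BLUE = "Disabled"


def describe(color_name: str) -> str:
    """Return a legend line such as 'Red: Critical' for a color name."""
    color = XymonColor[color_name.strip().upper()]
    return f"{color.name.capitalize()}: {color.value}"
```

Calling `describe("red")` returns `"Red: Critical"`; an unknown color name raises `KeyError`.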

 

When to Call

As previously mentioned, critical alerts in Xymon are indicated by the color red. Take time to click on the alert when it appears in order to better prepare any response. Generally, a system administrator should be alerted if one of their production devices is in an alarm state for more than 20 minutes. There are several exceptions to this rule, as follows:


1. Hardware Failure - Contact the on-call promptly for production machines suffering a hardware failure. These failures do not clear up on their own, and a prompt response (~5 minutes) is advisable.

...