https://monitor.itap.purdue.edu/xymon-cgi/nongreen.pl?expage=NONPROD&excol=kernel,patches,apt,libs,entropy,inode,uptime,xymonnet,xymond,trends 


Description: 

Xymon is the web-based monitoring tool of choice for administrators and staff supporting ITSO Windows and Unix processes on the Purdue campus. It gives the trained eye a quick overview of hardware and processes that may be out of sync across several key University areas.


In general, IOC staff only need to respond to Critical (Red) alerts that affect production devices (servers, virtual machines, etc.). All production Xymon alerts should also be reflected in SquaredUp. Generally, if a critical production alert appears in Xymon for more than 20 minutes, contact a system administrator or equivalent about the malfunctioning equipment. In the event of a hardware failure, contact the appropriate system owner ASAP. Each shift is responsible for learning the contact procedures appropriate to its time of day.
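The 20-minute escalation guideline above can be expressed as a small decision function. This is a minimal sketch for illustration only. The `Alert` record and the `needs_escalation` name are hypothetical and are not part of Xymon or SquaredUp. The 20-minute threshold comes from the guideline. Hardware failures are outside this sketch: for those, contact the system owner immediately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold from the guideline: critical production alerts older than this need a call.
ESCALATION_AGE = timedelta(minutes=20)


@dataclass
class Alert:
    # Hypothetical record of one Xymon status; not an actual Xymon data structure.
    host: str
    color: str         # e.g. "red", "yellow", "purple"
    production: bool   # True if the host is nested under Production
    first_seen: datetime


def needs_escalation(alert: Alert, now: datetime) -> bool:
    """Return True if the operator should contact a system administrator."""
    if not alert.production or alert.color != "red":
        return False  # IOC staff generally act only on red production alerts
    return now - alert.first_seen > ESCALATION_AGE


# Example: a red production alert first seen 25 minutes ago should be escalated.
start = datetime(2024, 1, 1, 8, 0)
print(needs_escalation(Alert("tsm01", "red", True, start), start + timedelta(minutes=25)))
```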



Location:

Xymon: All non-green systems - Keep this link open on your workstation.

Xymon: Top View (https://monitor.itap.purdue.edu/) - Top-level view that displays alerts by group. Creating a sidebar for this view is recommended.

Categories include:

  • ITIS Services – Production, Non-production, Pre-production machines and VMs for a number of systems such as Oracle, Banner, Blackboard, SAP. IOC Operators should only be concerned with devices nested within Production.
  • Infrastructure – There are no devices exclusively within this category that Technical Operators are required to monitor.
  • Platform Support – Windows production and test boxes for applications. This category is in place for Windows administrators. There are no devices exclusively within this category that Technical Operators are required to monitor.

...

From the top level, a user can drill down to investigate a number of factors. To investigate a given category, click the face or symbol next to its title. Users who open the Production category are presented with a large list of services. The current list is shown below.

[Image: Production Services]

Clicking the face alongside the Backup subcategory returns the states of the machines currently classified as Backup. Underlined systems indicate the presence of critical information. Additional important information may be found in the Info category for each machine; a Technical Operator will need this information to resolve alerts.

[Image: Xymon Backup Production]

Clicking tsm01, as displayed above, generates the screen shown below.

...



The top-level titles and subcategories are subject to change. IOC staff are encouraged to explore the application and become familiar with the various categories of systems monitored.

...

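5. Purple Alerts - Treat a production machine with numerous purple status alerts as if it were a red alert. Purple means Xymon has not received a recent report for that status, so the machine's actual state is unknown. (Note: The example machine shown below was NOT a production box at the time of publication.)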

...

[Image: example machine with purple status alerts]

6. Broad Spectrum Failure - If a large number of alerts are being recorded across many different systems, pick up the phone sooner rather than later; a bigger problem may be emerging.
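Guidelines 5 and 6 can be sketched the same way. This page does not define "numerous" or "a large number." The thresholds `PURPLE_THRESHOLD` and `BROAD_FAILURE_THRESHOLD` below are therefore hypothetical placeholders, not official values. The function names are also illustrative only.

```python
# Hypothetical thresholds; the procedure does not define "numerous" or "a large number".
PURPLE_THRESHOLD = 3
BROAD_FAILURE_THRESHOLD = 10

# Colors counted as alerts for the broad-spectrum check (any non-green status).
ALERT_COLORS = {"yellow", "red", "purple"}


def purple_as_red(colors: list[str], production: bool) -> bool:
    """Guideline 5: a production host with numerous purple statuses is treated as red."""
    return production and colors.count("purple") >= PURPLE_THRESHOLD


def broad_spectrum_failure(statuses: list[tuple[str, str]]) -> bool:
    """Guideline 6: alerts spread across many different hosts suggest a wider problem.

    `statuses` is a list of (host, color) pairs; only distinct alerting hosts are counted,
    so many alerts on a single machine do not trigger this check.
    """
    alerting_hosts = {host for host, color in statuses if color in ALERT_COLORS}
    return len(alerting_hosts) >= BROAD_FAILURE_THRESHOLD
```

Counting distinct hosts rather than raw alerts reflects the "across different systems" wording. A single machine with many failing tests is handled by the per-host guidelines instead.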

...