...
Panel | ||||||
---|---|---|---|---|---|---|
| ||||||
Squared Up will notify Ops of two different types of alerts: Xymon alarms and Network Device Alerts (both covered in the following subsections).
Xymon Alerts:
Before calling on any alarm, review the following:
After a new Squared Up alert pops up for a production machine, if the alert is still present after 20 minutes, Operations will need to call the group responsible for the system. |
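The 20-minute rule above can be sketched as a small check. This is only an illustration: the function and parameter names are hypothetical, and it assumes the alert's first-seen time is known and that "after 20 minutes" means 20 minutes or more.

```python
from datetime import datetime, timedelta

# The 20-minute wait comes from the procedure above; all names are illustrative.
ESCALATION_DELAY = timedelta(minutes=20)


def escalation_due(first_seen: datetime, now: datetime, still_active: bool) -> bool:
    """Return True once an alert has stayed active for at least 20 minutes."""
    if not still_active:
        # The alert cleared on its own, so no call is needed.
        return False
    return now - first_seen >= ESCALATION_DELAY


first_seen = datetime(2024, 1, 15, 2, 0)
print(escalation_due(first_seen, datetime(2024, 1, 15, 2, 10), True))  # False
print(escalation_due(first_seen, datetime(2024, 1, 15, 2, 25), True))  # True
```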
...
Panel | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||
Xymon is the web-based monitoring tool of choice for Administrators and staff supporting ITSO Windows and Unix processes on the Purdue Campus. It gives the trained eye a quick overview of hardware and processes that may be out of sync across several key University areas. In general, Operations staff will only need to respond to Critical (Red) alerts that affect production devices (servers, virtual machines, etc.). All production Xymon alerts should also be reflected in Squared Up. Alarms for McAfee (mcshield.exe) should not be reported unless they last for several hours. While Operations staff are primarily concerned with Critical (Red) alerts, they should also be familiar with the various other colors of Xymon alerts and their meanings:
1. After a new Xymon critical alert pops up for a production machine, begin considering which group to contact, if any. If the alert is still present after 20 minutes, Operations will need to call the system owner.
2. Is the alert for a clustered device? Some systems, like Mailhub, are clustered and can have several alerts before action is needed. Generally, clustered machines will NOT alarm in Squared Up for individual boxes, but they will in Xymon. This clue can help an operator determine the severity of the Xymon alert. Some clustered systems react in the opposite manner: they alert in Squared Up but not in Xymon until critical mass is reached. In these cases, ensure enough machines are in alert before contacting the appropriate admins.
3. Locate the night's planned maintenance in the Footprints Change and Release Management Workspace Calendar to ensure the device is not scheduled to be down.
4. In Xymon, click the status icon (pictured below) in the row corresponding to the affected server.
5. Some admins have placed instructions on this page, either for which alarms should be ignored or for whom to contact (an example of these instructions is pictured below). Follow any special instructions for the machine OR use the Configuration Management Database (CMDB) to locate the appropriate on-call. Further instructions on the use of the CMDB can be found here: Footprints - Configuration Management
6. Call the on-call and inform them of the situation, the affected device, and any other issues that may be cropping up due to the alert. Send a follow-up email. For CPU, memory, and disk alerts, paste the Xymon alert text into the email.
7. Log the contact and any appropriate follow-up activities.
Find the group owner of the system page by:
5. If the system name is clickable, there will be special instructions. Follow those instructions.
6. Once step 4 is complete, find the system in the Footprints Change and Release Management CMDB.
7. If there is no information from the previous steps
|
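The rules for deciding which Xymon alerts need a response can be sketched roughly as below. This is a minimal illustration, not an actual Xymon or Squared Up integration: the `XymonAlert` record, its fields, and the `needs_response` helper are all hypothetical, and the 3-hour McAfee cutoff is an assumed stand-in for "several hours". Cluster handling is left out because the number of machines that counts as critical mass varies by system.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class XymonAlert:
    """Hypothetical record of one Xymon alert as an operator would see it."""
    host: str
    test: str                    # Xymon column name, e.g. "cpu", "disk", "procs"
    color: str                   # "red" is Critical
    is_production: bool
    age: timedelta               # how long the alert has been active
    text: str = ""               # alert text as shown in Xymon
    in_planned_maintenance: bool = False


# Assumed cutoff: the procedure only says "several hours" for McAfee alarms.
MCAFEE_GRACE = timedelta(hours=3)


def needs_response(alert: XymonAlert) -> bool:
    """Apply the filters from the procedure above to a single alert."""
    # Only Critical (Red) alerts on production devices are in scope.
    if alert.color != "red" or not alert.is_production:
        return False
    # Devices scheduled to be down for planned maintenance are skipped.
    if alert.in_planned_maintenance:
        return False
    # McAfee (mcshield.exe) alarms are ignored unless they persist.
    if "mcshield.exe" in alert.text.lower() and alert.age < MCAFEE_GRACE:
        return False
    return True


alert = XymonAlert(
    host="example-host",
    test="procs",
    color="red",
    is_production=True,
    age=timedelta(minutes=30),
    text="mcshield.exe not running",
)
print(needs_response(alert))  # False: McAfee alarm younger than the assumed cutoff
```

An alert that passes these filters still goes through the 20-minute wait and the on-call lookup described in the steps above.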
...
Panel | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||
8:00 AM to 5:00 PM, Monday through Friday:
Outside of 8:00 AM to 5:00 PM, Monday through Friday:
Network Incident On-Call Process
For Squared Up Network alarms and trouble calls, the triage process to follow is below:
Special Notes:
Information to Provide When Reporting Network Issues
For Squared Up alarms, report the following information from Squared Up when reporting the issue:
For network issues reported by I-Light, the ITaP help desk, campus personnel, students, or visitors, report the following information:
More details can be found in the attached document below:
|
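The two time windows above (8:00 AM to 5:00 PM, Monday through Friday, and everything outside it) can be told apart with a short check. This is a sketch under assumptions: the function name is hypothetical, times are taken as local campus time, 5:00 PM itself is treated as outside the window, and University holidays are not considered.

```python
from datetime import datetime, time

BUSINESS_START = time(8, 0)   # 8:00 AM
BUSINESS_END = time(17, 0)    # 5:00 PM


def is_business_hours(moment: datetime) -> bool:
    """True for Monday-Friday, 8:00 AM up to (not including) 5:00 PM."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= moment.time() < BUSINESS_END


print(is_business_hours(datetime(2024, 1, 15, 9, 0)))   # Monday morning: True
print(is_business_hours(datetime(2024, 1, 13, 9, 0)))   # Saturday morning: False
```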
...
Panel | ||||||
---|---|---|---|---|---|---|
| ||||||
Generator Test TEL Nodes
CRAC units Data Centers
TEL Nodes
Cameras
StruxureWare also includes monitoring functionality for the cameras in the data centers. This view in StruxureWare should be open at all times on the large screen next to the door to MATH B60.
|
...
Panel | ||||||
---|---|---|---|---|---|---|
| ||||||
|
...
Panel | ||||||
---|---|---|---|---|---|---|
| ||||||
Further information regarding UC4's configuration and operation can be found here: UC4 Intro and Configuration |
...