High Temperature Alarms and Their Management and Reporting


Both red and yellow temperature alarms that appear in StruxureWare should be promptly investigated and closely monitored:



For a single temperature alarm, verify trends (as described below) and email DCM at ITIDataCenterManagement@groups.purdue.edu.

For multiple temperature alarms, or a single alarm with multiple sensors in agreement (as described below), call DCM and send a follow-up email:


Business Hours (Mon-Fri, 8am to 5pm): Greg 765-496-2456, Joe Wenzel 765-532-3624

After Business Hours: Joe Wenzel 765-532-3624, Todd Turner 765-496-8214

If there is no answer from the primary on-call, leave a message (if possible) and promptly send a follow-up email. Then, immediately try the next person on the list.

Continue calling until you reach someone. If you are unable to reach anyone within 20 minutes of your first call, contact the IOC on-call (765-496-4272).
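The call order above can be sketched as a small lookup. This is only an illustration, not a tool that exists in the IOC environment: the function names are hypothetical, 5:00pm exactly is assumed to count as after hours, and "continue calling" is assumed to mean cycling back through the same list.

```python
from datetime import datetime

# Contact order taken from the lists above.
BUSINESS_HOURS = ["Greg", "Joe Wenzel"]
AFTER_HOURS = ["Joe Wenzel", "Todd Turner"]
ESCALATE_AFTER_MINUTES = 20  # from the procedure above


def dcm_call_list(now: datetime) -> list:
    """Return the DCM contacts to call, in order, for the given local time."""
    is_weekday = now.weekday() < 5   # Monday=0 .. Friday=4
    in_hours = 8 <= now.hour < 17    # assumption: 5:00pm itself is after hours
    return BUSINESS_HOURS if (is_weekday and in_hours) else AFTER_HOURS


def next_call(now: datetime, minutes_since_first_call: float, unanswered: int) -> str:
    """Pick who to call next after `unanswered` calls have gone unanswered."""
    if minutes_since_first_call >= ESCALATE_AFTER_MINUTES:
        return "IOC on-call"
    contacts = dcm_call_list(now)
    # Assumption: keep cycling through the list until someone answers.
    return contacts[unanswered % len(contacts)]
```

For example, on a weekday morning with one unanswered call five minutes in, `next_call` returns the second business-hours contact. Once 20 minutes have passed it returns the IOC on-call regardless of the time of day.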



In StruxureWare, the normal view for IOC focuses on "Monitoring":


To evaluate the disposition of a temperature alarm or alarms, click on "Reports":

Click on "Saved Reports" and select a report of interest. In this example, the report relating to TEL 210 temperatures (TEL 210 Temps) is selected:

Double-click the report title or click on "Generate Report". The report will take a moment to load:

When complete, a graph covering the last 24 hours will appear:

Interpreting the Condition of a Room

Single High Temperature Alarm:

In the event that StruxureWare reports a single high temperature alarm for a datacenter space, the IOC operator should assess the condition of the space to determine whether a call to DCM is warranted.



Here, a single high temperature alarm is reported for the device hostname tel-210-cra-02, located in TEL 210; the monitored device name (device label) is TEL-210-AC2-4 (red circle). To determine if this alarm requires a call to DCM, a report should be generated as described above to examine the overall condition of the affected space.

To assess the condition of the room, first identify the alerting device: take the device name from the alert description and match it to the devices listed below the graph (red circle). Hovering the mouse cursor over a line also identifies its device, as shown. Here, the alerting device shows distinct, exaggerated spikes in temperature (purple boxes). According to this sensor, temperatures in the space jump several times from around 65°F to about 87°F, and from 65°F to almost 95°F.

However, in this case the other sensor devices in the room are not in agreement. Their lines in the graph are consistent with one another and with their own earlier readings (purple ellipses), even (perhaps especially) when TEL-210-AC2-4 records a spike. They remain relatively smooth and do not spike with the same magnitude or at the same time:


This should suggest to an attentive operator that the preponderance of the evidence indicates the space is not overheating. It is therefore likely that the device itself is not overheating either, and the alert, even if it is not false or erroneous, is not critical enough to warrant a call to DCM. Report this alert to DCM by email for further, non-urgent investigation.
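The visual check described here (does any other sensor spike when the alarming one does?) can be expressed as a rough sketch. This is not a StruxureWare feature or API. The function name, the 10°F rise threshold, and the second device label in the example are assumptions for illustration, and the readings are invented values that loosely follow the TEL 210 example.

```python
from statistics import median


def spikes_corroborated(readings, alarming, rise_f=10.0):
    """Return True if any other sensor rises at the same time as the alarming one.

    readings: dict of device label -> list of temperatures (°F), all sampled
              at the same times.
    rise_f:   assumed threshold; a reading this far above the sensor's own
              median counts as a spike.
    """
    baseline = {name: median(vals) for name, vals in readings.items()}
    # Sample positions where the alarming sensor is well above its usual level.
    spike_idx = [i for i, t in enumerate(readings[alarming])
                 if t - baseline[alarming] >= rise_f]
    for name, vals in readings.items():
        if name == alarming:
            continue
        if any(vals[i] - baseline[name] >= rise_f for i in spike_idx):
            return True
    return False


# Illustrative data only; "TEL-210-OTHER" is a hypothetical device label.
example = {
    "TEL-210-AC2-4": [65, 65, 87, 65, 95, 65, 65],
    "TEL-210-OTHER": [68, 68, 69, 68, 68, 69, 68],
}
print(spikes_corroborated(example, "TEL-210-AC2-4"))  # prints False
```

A result of `False` corresponds to the situation above: the spikes are isolated to one device, so an email to DCM is the appropriate response rather than a call.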


By contrast, the report below shows multiple sensors in agreement, which indicates a datacenter space experiencing a critical temperature event. NOTE: Not every sensor displayed in the report needs to generate an alarm in StruxureWare monitoring; as few as one alarm may be present. If multiple sensors agree with the alarming device, however, that is the indication to the IOC operator that prompt contact with DCM is warranted:


In the data presented, each reporting device shows relatively stable temperatures within a reasonable range (y-axis) from T=00:00 until T≈13:00 (x-axis; purple rectangle). At that point, however, all sensors report a change of roughly the same magnitude, at essentially the same time, and for roughly the same duration (red rectangle). Because these sensors are in agreement, one can reasonably infer that the space is in fact experiencing a temperature event (and therefore likely a prior, unobserved event, e.g. CRAC failure or fire), and DCM should be notified by phone and email immediately.

Another example is provided below:

Note that not every reporting device needs to be in agreement to warrant action; a source of excessive heat may not yet have permeated the space sufficiently to be detected by every sensor. Sensors farther from and lower than the source will likely detect heat events later than those closer and higher. If no thermal event is the ideal, then detecting and identifying one before it is detectable across the space is the next best thing. Thus, the IOC operator should consider the space as a whole: if multiple sensors indicate a change in temperature, it is strong evidence that the space is, or will be, experiencing an overall change in temperature.
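As a rough sketch of the "space as a whole" rule, the function below assumes an alarm has already fired and counts how many sensors show a sustained recent rise. It is a sketch under stated assumptions, not part of the procedure: the function name, the sensor names, the 10°F threshold, the three-sample window, and the reading of "multiple" as at least two sensors are all hypothetical.

```python
def recommended_action(readings, window=3, rise_f=10.0, min_agreeing=2):
    """Suggest a response to an existing alarm from per-sensor trends.

    readings:     dict of sensor name -> list of temperatures (°F), oldest first.
    window:       number of most recent samples treated as "now" (assumed).
    rise_f:       assumed rise over the earlier average that counts as a change.
    min_agreeing: assumed minimum number of sensors for "multiple in agreement".
    """
    rising = []
    for name, vals in readings.items():
        if len(vals) <= window:
            raise ValueError(f"not enough samples for {name}")
        earlier = vals[:-window]
        recent = vals[-window:]
        if sum(recent) / len(recent) - sum(earlier) / len(earlier) >= rise_f:
            rising.append(name)
    if len(rising) >= min_agreeing:
        return "call DCM and send follow-up email", rising
    return "email DCM", rising


# Illustrative data only: two of three hypothetical sensors rise together.
example = {
    "sensor-a": [66, 66, 67, 66, 80, 84, 86],
    "sensor-b": [68, 68, 68, 69, 79, 83, 85],
    "sensor-c": [65, 65, 65, 65, 66, 66, 67],
}
action, agreeing = recommended_action(example)
print(action, agreeing)
```

Here the third sensor has not yet registered the change, but two sensors in agreement are enough to suggest a call. Any real thresholds would need to come from DCM, and the operator's judgment of the graph still takes precedence.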


In any event where you are unsure of or concerned by an alarm, contact an IOC supervisor or call DCM. It is much better to report a non-issue than to ignore a critical one.


