Shift Log Fields

https://www.itap.purdue.edu/operations/ioc-log/production/index.php

Date

Defaults to the current day. If starting a log for Third Shift, keep in mind that no entry should be timestamped earlier than midnight of the shift day the log is for.
- e.g., if it is 23:45 on October 30 and the shift falls mostly on October 31, the entry should be moved up to 00:00 on October 31.
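
The midnight rule above can be sketched in code. This is only an illustration and not part of the log application: the function name `clamp_to_shift_day` and the year used in the example are assumptions.

```python
from datetime import date, datetime, time


def clamp_to_shift_day(entry_time: datetime, shift_day: date) -> datetime:
    """Return entry_time, moved up to 00:00 of shift_day if it falls earlier."""
    midnight = datetime.combine(shift_day, time.min)
    return max(entry_time, midnight)


# 23:45 on October 30 for a log dated October 31 (year chosen arbitrarily).
adjusted = clamp_to_shift_day(datetime(2023, 10, 30, 23, 45), date(2023, 10, 31))
print(adjusted)  # 2023-10-31 00:00:00
```

Entries made after midnight are returned unchanged, so only the early entries of a Third Shift log are affected.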

Time

All "Shift Turnover" entries are to have the same timestamp.
Enter the time the specific entry occurred.
- When combining entries, the "Time" should reflect the initial contact made regarding the issue.

Operator

Defaults to the alias of the Operator signed in to the session for the log page. If you are entering information on behalf of the Operator who worked the entry, change the "Operator" field to their alias.

Group

This would be the "Group" responsible for the entry being made.
- There are times when ITSMO, ITSO Incident Managers, etc., will call in outages and outage status updates for other groups due to staffing levels, or severity of the work being done. When this happens, the "Group" reflected should be the group that is actually working on the issue.
Make sure you are reviewing organizational charts on a regular basis as groups need to be reported as accurately as possible.
If you think a group needs to be added to the drop down box or a change needs to be made to an existing group, please let the supervisors know.
Combining log entries can have an effect on the selection that is being made here. This will be outlined in "Combining Log Entries".

Group Contact

This is the name of the individual you had contact with for the specific entry being made.
- If you call a group's on-call number and reach a voicemail greeting that does not identify the person, please leave the field empty. Do not assume you know who it is.
Make sure you are spelling the name of the individual correctly. It would be best to verify the spelling using the Purdue Directory and NOT previous log entries.
Do not put the alias for an individual in this field. You need to use their name.
Combining log entries can have an effect on the selection that is being made here. This will be outlined in "Combining Log Entries".

Method

This is how we were contacted or how we contacted someone about an issue.
- There is a drop down box of pre-populated "Methods". If you think a "Method" is missing or incorrect, please let the supervisors know. Use the "Other" option in the drop down list if the method is not listed.
Brief explanation of the pre-populated methods:
- FootPrints
  - Sometimes an FP Issue is assigned to us without a phone call notifying us of the problem. Use this method for those entries.
  - We also generate FP Issues and assign them to other groups like AVS.
- In Person
  - This is used when an individual notifies us in person about an issue.
- Incoming Call
  - This is used when an individual notifies us via phone about an issue.
- Outage Notification
  - This is used when posting outages and/or status updates for outages.
- Outgoing Call
  - This is used when placing calls about issues.
- Received e-mail
  - This is used when we are notified about issues or updates on issues via email.
- Sent e-mail
  - This is used when we notify groups / individuals about issues via email.
- Shift Turnover
  - When making entries for "Shift Turnover", do not enter the name or alias of the person who gave you turnover; the field can be left blank.
Combining log entries can have an effect on the selection that is being made here. This will be outlined in "Combining Log Entries".

Phone Number/Email Address

This is the phone number that you dialed to contact a group, or the email address of the group you sent email to regarding this specific entry.
Phone numbers should be entered with the area code and hyphens, as in 555-555-5555.
It is not necessary to log the number for incoming calls.
On occasion, an Admin will notify the IOC of an alternate contact number to be used in lieu of the documented on-call number. When this happens please enter the alternate number and in the description field of the log entry state "Admin advises to use alternate contact number".
If you call a group and send an email to them, please enter the information in one entry as noted below.
- EX: 555-555-5555 / ***@***.***
Combining log entries can have an effect on the selection that is being made here. This will be outlined in "Combining Log Entries".
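
As an illustration of the phone format and the combined phone/email layout described in this section, here is a minimal sketch. The helper names `is_valid_phone` and `format_contact_field` and the address `group@example.edu` are hypothetical; the log page itself may not enforce any of this.

```python
import re
from typing import Optional

# Area code and hyphens, e.g. 555-555-5555.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phone(number: str) -> bool:
    """True when the number is written as NNN-NNN-NNNN."""
    return PHONE_PATTERN.fullmatch(number) is not None


def format_contact_field(phone: Optional[str] = None, email: Optional[str] = None) -> str:
    """Build the field value; phone and email share one entry, separated by ' / '."""
    if phone is not None and not is_valid_phone(phone):
        raise ValueError(f"phone number not in NNN-NNN-NNNN form: {phone!r}")
    return " / ".join(part for part in (phone, email) if part)


print(format_contact_field("555-555-5555", "group@example.edu"))
```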

Status

This is the status for the specific entry you are making.
- If combining entries, the status should reflect the overall status of the issue being entered.

Is this related to an outage?

If the log entry is related to a Service Alert click this button and it will open a couple of fields for you to enter information.
- Outage Number
  - This is the unique identifier for the specific Major Incident TDX ticket. It can be found under "Major Incident ID" on the "Major Incidents" field of TDX.
- Outage Name
  - This is the title of the Service Alert. It must match what is listed under "Title" in the "Major Incidents" field.
Any entries that had previously been made that pertain to a Service Alert need to be identified as being related to the Alert. There are times when Alerts are not declared until several entries pertaining to it have been entered, so you must go back and associate the entries with the outage.

Description

NEVER EVER put phone numbers or email addresses in the "Description" field. That information is considered confidential, and if it is in the description field it will be in the log that gets emailed.
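
A quick self-check for this rule could look like the sketch below. It is an assumption-laden illustration: the function name `find_confidential` and the regular expressions are not part of the log tool, and the patterns will not catch every possible way of writing a number or address.

```python
import re
from typing import List

# Loose patterns: NNN-NNN-NNNN or (NNN) NNN-NNNN style numbers, and name@host.tld addresses.
PHONE = re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_confidential(description: str) -> List[str]:
    """Return any phone numbers and email addresses found in a description."""
    return PHONE.findall(description) + EMAIL.findall(description)


print(find_confidential("ITSD reports TDX is unavailable"))  # []
```

An empty list means nothing matching those patterns was found; anything returned should be moved to the "Phone Number/Email Address" field instead.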
If at all possible, please do not use names here. Sometimes a name needs to be entered, but use "Admin", "Engineer", "Student", "Instructor", etc. whenever you can. This will limit our exposure to unnecessary mistakes.
If at all possible, please do not reference gender when referring to the person.
Please be short and to the point on your entries. Only pertinent information needs to be entered. Below is an example of an entry that is long that can be shortened:
- Long - "ITSD reports TDX is unavailable. I advised the ITSD I would contact someone on this issue".
- Short - "ITSD reports TDX is unavailable".
- There is no need to write "I advised the ITSD I would contact someone on this issue". Normal procedure dictates we would contact the admins for calls such as this so it should be assumed.
Be consistent in the wording of log entries. There really is no right or wrong way when it comes to wording, but please be consistent throughout the log. Below are examples of 2 entries by the same Operator on the same log for different calls to the ITSD for 2 different outages. The text should be the same with the exception of the specific outage.
- Entry 1 - "Called ITSD to inform them of outage resolution for X"
- Entry 2 - "Notified ITSD of outage resolution for Y"

The application an issue is coming from should be identified in this field.

Grafana
- This is used when we have network alarms on Grafana. Most production Xymon alarms appear in Grafana so take care to report network alarms as appearing in Grafana and NOT Xymon (which are reported separately).
StruxureWare
- This is used when recording StruxureWare alarms.
UC4
- This is used when reporting UC4 issues that do / did not require PCA intervention.

Xymon

This is used when we have alarms in Xymon. Most production Xymon alarms will display in Grafana, but it is important to know they are Xymon alarms, not Grafana alarms and should be reported as such. EXAMPLE:

- Time: 19:21
- Operator: murra175
- Group: Data Center and Enterprise Storage
- Group Contact: Todd Turner
- Phone Number/Email Address: 765-496-8214 / ITIDataCenterManagement@purdue.edu
- Method: Outgoing call
- Status: Ignore Alarm
- Description: Metasys: Airstack alarm for module 9 with value of 4. Left VM. Email sent.
  19:27 - Received email: Admin advises to ack alarm and ignore.

Follow-up Messaging

Any and every contact made or attempted with groups outside of the ITSD/IOC requires follow-up communication. For most teams, this will take the form of an email to the team's mail-list or group. Contacts with Networking will use a TDX ticket, as described in Grafana IOC Dashboard/ Network Device Alerts (*Needs updated), utilizing the template "IOC Network Follow-Up."
These emails will always include at least two recipients:
- The group of the contact
- The IOC
Any additional related group may be added as necessary.

Example:

Windows has requested contact with Database. The follow-up email is sent to Windows's email group, Database's email group, and the IOC email group.

Follow-up communication should specify who the contact was (with whom specifically you spoke), what the issue was about, what determination or action has been made, and what subsequent action is required, expected, or requested. Include as much specificity as possible.

Example:

Admins,

This is a follow-up email regarding my phone conversation with Todd about Metasys: Airstack alarm for module 9 with value of 4. Per our conversation, we will ignore this alarm until 08:00 17/47/2032, or until advised otherwise. We will continue to report new airstack alarms.

Your signature

Initial Contact	Follow-up Communication	How it is logged	More information	Exceptions
IOC receives call from Admin/user who is not Networking	Email group to whom contact belongs. Include IOC in email.	Specify that an email has been sent in log entry.		if contact is from ITSD/IOC
IOC receives call from Admin/user who is Networking	Create TDX ticket, add itns-pdnhlog-ext@lists.purdue.edu in the Contact field, and check notify contacts. (Using the IOC Network Follow Up incident form will add PU-IOC Operator as a requestor and email ioc@purdue.edu.)	Specify that a TDX ticket has been created and reference the ticket number in log entry.	Grafana IOC Dashboard/ Network Device Alerts (*Needs updated)	none at time of writing
IOC calls out to Admin/user who is not Networking	Email group to whom contact belongs. Include IOC in email.	Specify that an email has been sent in log entry.		calls to PUPD/PUFD to notify of certain service disruptions do not require follow-up emails
IOC calls out to Admin/user who is Networking	Create TDX ticket, add itns-pdnhlog-ext@lists.purdue.edu in the Contact field, and check notify contacts. (Using the IOC Network Follow Up incident form will add PU-IOC Operator as a requestor and email ioc@purdue.edu.)	Specify that a TDX ticket has been created and reference the ticket number in log entry.	Grafana IOC Dashboard/ Network Device Alerts (*Needs updated)	none at time of writing
IOC receives an email from Admin/user (including Networking) that contains information or an imperative	Reply to email, acknowledging request or notification. Include IOC in reply.	Specify that an email has been received in log entry.		if contact is from ITSD/IOC
IOC sends an email to group or user	no follow-up communication required	Method of contact should make clear that an email has been sent. Log any subsequent responses.		none at time of writing
IOC is contacted via means not specified above (Teams, in-person, some sort of message-delivery bird, etc.)	Email group to whom contact belongs. Include IOC in email.	Specify that an email has been sent in log entry.		if contact is from ITSD/IOC
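
The table above can be read as a small decision function. The sketch below is one possible encoding, with assumptions: the names `Contact`, `FollowUp`, and `follow_up_for` are hypothetical, and the PUPD/PUFD exception for certain outgoing calls is deliberately not modeled.

```python
from enum import Enum


class Contact(Enum):
    INCOMING_CALL = "incoming call"
    OUTGOING_CALL = "outgoing call"
    RECEIVED_EMAIL = "received email"
    SENT_EMAIL = "sent email"
    OTHER = "other"  # Teams, in person, etc.


class FollowUp(Enum):
    NONE = "no follow-up required"
    EMAIL_GROUP_AND_IOC = "email the contact's group and include the IOC"
    TDX_TICKET = "create a TDX ticket"
    REPLY_INCLUDING_IOC = "reply to the email and include the IOC"


def follow_up_for(contact: Contact, is_networking: bool,
                  from_itsd_or_ioc: bool = False) -> FollowUp:
    """Pick the follow-up for a contact, following the rows of the table."""
    if contact is Contact.SENT_EMAIL:
        return FollowUp.NONE
    if is_networking and contact in (Contact.INCOMING_CALL, Contact.OUTGOING_CALL):
        return FollowUp.TDX_TICKET
    # The "if contact is from ITSD/IOC" exception appears on these rows only.
    if from_itsd_or_ioc and contact in (
        Contact.INCOMING_CALL, Contact.RECEIVED_EMAIL, Contact.OTHER
    ):
        return FollowUp.NONE
    if contact is Contact.RECEIVED_EMAIL:
        return FollowUp.REPLY_INCLUDING_IOC
    return FollowUp.EMAIL_GROUP_AND_IOC


print(follow_up_for(Contact.INCOMING_CALL, is_networking=True).value)
```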

Some frequently asked and less-frequently asked questions about follow-up communication:

Question: "I called the admin, and the issue either resolved before I made contact, or as we were speaking. Do I still need a follow-up email even though it turned out to be nothing?"

Answer: Of course. Every contact needs follow-up communication. This serves two (maybe three) major functions:

- It protects us (the ITSD/IOC)
- It protects them (the Admin/user/group)
- It makes clear what expectations and requirements we (the service provider) and they (the customer) have, and compels an agreement thereby.

Question: "How does follow-up communication protect us (the ITSD/IOC)?"

Answer: By sending a follow-up communication, we can demonstrate that we received the call (or other contact method), responded to the request, and understood the need. The follow-up serves as a receipt, of sorts. For example, should there be a question like "why did this alarm go unreported?", we can produce an impartial email that demonstrates the alarm was, in fact, reported.

Question: "How does follow-up communication protect them (the Admin/user/group)?"

Answer: By sending a follow-up communication, the Admin or group can demonstrate that their instructions were provided and accurate. If something happens, they can show that it was not due to an erroneous or incorrect directive. Further, because we send follow-up communication to the group rather than just the contact, group members can see the course of events and correct or suggest a course of action. This can potentially improve the outcome for the IOC, the group, and the final customers of the service(s) in question/affected.

Question: "I left a voicemail, so I didn't really make contact. Do I still need a follow-up here, because there was no contact made?"

Answer: Leaving a voicemail is considered a contact (perhaps just a delayed one), and every contact needs follow-up communication. Additionally, a follow-up communication with the group may alert another member that an issue is outstanding and prompt a faster response and resolution.

Question: "Nobody answered, and I couldn't leave a voicemail, so I didn't really make contact. Do I still need a follow-up here, because there was definitely no contact made?"

Answer: Although contact was not made in this situation, that is exactly why a follow-up communication is perhaps even more critical. A follow-up communication with the group may alert a member that an issue is outstanding and prompt them to contact the service or IOC to determine the issue.

Combining Entries:

In an effort to streamline the log and make it more efficient, please combine entries when applicable.

Entries that can be combined are typically ones that involve the same issue. Some examples:

Combining Outage Entries:

Non Combined
- 8:01 - ITSD reports Brightspace is down.
- 8:03 - Notified Admin for reported Brightspace issues.
- 8:10 - Admin reports Brightspace is back up.
- 8:12 - Outage Resolution posted for Brightspace.
- 8:15 - Notified ITSD of outage resolution for Brightspace.
- 8:16 - Notified PUIT Comm of outage resolution for Brightspace.
Combined Entry
- 8:01 - ITSD reports Brightspace is down. Admin notified.
  - For "Group" and "Contact" use the group and name of the person you contacted to look into the issue.
  - For "Status" use "Group Notified."
- 8:10 - Admin reports Brightspace is back up. Service Alert Resolved. ITSD and PUIT Comm notified.
  - For "Group" and "Contact" use the group and the name of the person resolving the outage.
  - For "Status" use "Service Alert Resolved."
- The above examples went from 6 entries to 2: one for the reporting of the issue and one for its resolution. The entries are short and contain the pertinent information. The information about who we contacted at the ITSD and PUIT Comm is important, but the heart of the issue is that Brightspace was down, so the contact for the group/people responsible for fixing it is more important.
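
The mechanics of combining can be sketched as follows. This is a rough model, not how the log application works: `LogEntry` and `combine_entries` are hypothetical names, the combined "Time" is the initial contact as stated under "Time", later updates are appended to the description with their own timestamp, and taking the most recent entry's status as the overall status is an assumption.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class LogEntry:
    logged_at: time
    description: str
    status: str


def combine_entries(entries: List[LogEntry]) -> LogEntry:
    """Merge entries about one issue into a single entry."""
    if not entries:
        raise ValueError("nothing to combine")
    ordered = sorted(entries, key=lambda entry: entry.logged_at)
    first, later = ordered[0], ordered[1:]
    lines = [first.description]
    # Later updates keep their own time inside the description.
    lines += [f"{entry.logged_at:%H:%M} - {entry.description}" for entry in later]
    # Assumption: the latest entry carries the overall status of the issue.
    return LogEntry(first.logged_at, "\n".join(lines), ordered[-1].status)


combined = combine_entries([
    LogEntry(time(19, 21), "Metasys: Airstack alarm for module 9 with value of 4. Left VM. Email sent.", "Group Notified"),
    LogEntry(time(19, 27), "Received email: Admin advises to ack alarm and ignore.", "Ignore Alarm"),
])
print(combined.description)
```

In practice the combined description is usually rewritten by hand to stay short, as in the Brightspace example; the sketch only shows which time and status end up on the combined entry.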

EXAMPLE:

- Time: 19:21
- Operator: murra175
- Group: Data Center and Enterprise Storage
- Group Contact: Todd Turner
- Phone Number/Email Address: 765-496-8214 / ITIDataCenterManagement@purdue.edu
- Method: Outgoing call
- Status: Ignore Alarm
- Description: Metasys: Airstack alarm for module 9 with value of 4. Left VM. Email sent.
  19:27 - Received email: Admin advises to ack alarm and ignore.

Ignore Alarms

There are times when an admin advises us to ignore an alarm/alert until further notice. If that happens, the entry needs to remain on the log for three consecutive shift logs. After the entry has cycled through three logs, move it to the shift log calendar instead of making it a regular log entry. If at all possible, please try to get the admin to commit to a time period or a definition of what “until further notice” means. If the admin provides a timeframe, use that for the duration on the calendar. If the admin can’t/won’t provide a period of time, put it on the calendar for one week. Once the duration has ended, the admin will need to be contacted to provide an update on the alarm/alert. Remember that checking the shift log calendar is part of your daily duties.
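
The duration rule can be expressed as a short sketch. The function name `calendar_end_date` and the dates in the example are assumptions made for illustration only.

```python
from datetime import date, timedelta
from typing import Optional

DEFAULT_DURATION = timedelta(weeks=1)  # used when the admin gives no timeframe


def calendar_end_date(start: date, admin_end: Optional[date] = None) -> date:
    """Use the admin's end date when one was provided, otherwise one week from start."""
    return admin_end if admin_end is not None else start + DEFAULT_DURATION


print(calendar_end_date(date(2024, 1, 1)))  # 2024-01-08
```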

How to add a calendar entry:

First, you must get to a page with the "Add Event to Calendar" button. You can click on any current or upcoming event from the active log or select the Calendar option from the Site Utilities dropdown menu (calendar will usually take a little time to load). In either case, the add event button will be in the top left of the next page.

Second, fill in the details as necessary. As noted in the screenshot below, start and end dates must be entered in MM/DD/YYYY format, and the end date must be after the start date for the event to save properly. Event titles must be at most 32 characters or the entry will not save. Editing an existing entry and giving it a title that is too long will also prevent it from saving, essentially deleting it from the calendar.
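
Since a failed save can effectively delete an event, it may help to check the values before submitting. The sketch below mirrors the three constraints just described; `validate_event` is a hypothetical helper, the calendar's real validation is not documented here, and treating a same-day end date as invalid is an assumption based on the wording "after the start date".

```python
import re
from datetime import datetime
from typing import List, Optional

MAX_TITLE_LENGTH = 32
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def parse_date(text: str) -> Optional[datetime]:
    """Parse a strict MM/DD/YYYY date, returning None when it is malformed."""
    if not DATE_PATTERN.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, "%m/%d/%Y")
    except ValueError:  # e.g. month 13
        return None


def validate_event(title: str, start: str, end: str) -> List[str]:
    """Return a list of problems; an empty list means the event looks saveable."""
    problems = []
    if len(title) > MAX_TITLE_LENGTH:
        problems.append("title is longer than 32 characters")
    start_date, end_date = parse_date(start), parse_date(end)
    if start_date is None:
        problems.append("start date must be in MM/DD/YYYY format")
    if end_date is None:
        problems.append("end date must be in MM/DD/YYYY format")
    # Assumption: the end date must be strictly later than the start date.
    if start_date and end_date and end_date <= start_date:
        problems.append("end date must be after the start date")
    return problems


print(validate_event("MSYS: Module 4, 7 & 9", "10/03/2023", "01/08/2024"))  # []
```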

Example errors:

How to edit a calendar entry

Any event can be edited by clicking on the pencil icon from its page.

If editing a previous/ongoing entry and adding new information, these edits are usually separated on different lines with the edit date added in front for clarity.

If editing an entry to extend how long it appears for, all you need to do is change the end date field to a later date.

How to end/remove a calendar entry

If an alarm clears, it can typically be removed from the calendar. In this context, removing means editing the event and changing its end date to the current day or a previous date as needed. If you delete an event, that removes it from the entire calendar.

Preferred method: editing the end date

Deletion is not generally required.

Asking for follow-up on DCM calendar entries

As mentioned further above, once an event's duration has ended, the admin will need to be contacted to provide an update on the alarm/alert. For any Data Center Management (DCM) alarms, that contact happens about once a month, before the ongoing events expire. (Entries can be temporarily extended until we get a reply.) Depending on their reply, we can extend the events by another month, add/remove any requested information from specific entries, or end the events as needed.

While it may be easiest to copy and paste entries from the previous month's email, the general email format is copied below:

To: ITIDataCenterManagement@groups.purdue.edu

Cc: ioc@groups.purdue.edu

Subject: Follow-up about Struxureware/Siemens/Metasys/walkthrough alarms to ignore in calendar (Monitoring systems can be removed or added as necessary.)

Body:

Admins,

Are the following alarms to ignore still current?

Calendar Entry Formatting

Group - Title: Description - current event ending date (All text would typically be left black, but this color legend helps show how the different fields are copied into the email. The event screenshot below also has the relevant areas circled/highlighted in similar colors.)

Example entries:

DCM - Strux: pfw and pnw devices: StruxureWare: new devices being added for PFW and PNW. Ignore these groups until further notice. - 01/08/2024

DCM - MSYS: Module 4, 7 & 9:
Metasys: Module 9 Airstack fault value 2 - Ignore alarm.
10/03/23 - Module 7 Airstack fault value 4 - Ignore alarm.
10/06/23 - Module 4 Airstack fault value 3 - Ignore alarm. - 01/08/2024

DCM - Walkthru alarms to ignore:
Always ignore A31 alarm for MATH-G190-CIT-CDU-01 since it is plumbed incorrectly. Ignore A28 for MATH-G190-CDU-A PDU-3 C28 unless water is pooling in unit. G190 CDU-F had alarm codes "w23" primary filter dirty. Ignore G190-PDU-20 alarm and dead screen.
10/3/23 - Math-g190-crac-8 has been shut off and will be removed eventually.
10/23/23 - Walkthroughs: Math B60 CRAC-1 error message "REM SENSOR 3 FAILURE." Ignore alarm.
11/6/23 - Math B60 unit CAB 4 active alarm "Internal Comm Error." Ignore Alarm Math B60 crac-02 and 03. Both listing "general" and "WRN REM SENSOR 3 FAILURE." Ignore alarm.
11/23/23 - Fast, steady drip coming from sixth Negishi unit door (K-10). Admin disabled MATH G109 Rack K-10 Rear Door Cooling. RCAC Engineers and DCM will investigate further on Monday (11-27). - 01/08/2024

DCM - Siemens: Unack Report:
Siemens: Alerts MAP4W1, MAP4W2, MAP5CA, MAP5CAL, MAP5W1, MAP5W2 in Unack report related to MATH DC Issues.
10/19/23 - Siemens: Siemens report was not working correctly after power outage. Admin will have vendor investigate.
10/27/23 - There are currently no alarms in Siemens. However, continue to ignore all Siemens alarms as the backup chiller is offline since the campus power outage. - 01/08/2024

DCM - TEL/MATH Key Card Checkout:
Vertical Horizon Networks has checked out TEL key cards #596076 and #596077 and return them after work is finished in 30 days. Ron - Contractor received TEL Cards as of 06/06/2022
7/24/22 - Keycards are still missing from IOC binder - assuming these have not been returned yet.
8/26/22 - Keycards checked out by Ron and Greg until further notice, approval given by Todd.
8/14/23 - Todd took TEL Department Guest cards/MATH key cards 600083, 600084, and 600085 to be issued to contractors.
8/18/23 - 600083, 600084, and 600085 issued to Greg Piercy, Quentin Foley, and Nathan Lane respectively. - 01/08/2024

These summary statements are separated by monitoring system. Screenshots can be included for clarity. If a system has no alarms, then it can be omitted. You may also state that [insert monitoring system] has no noteworthy alarms.

Example statements:

The only red alarms currently showing in Metasys are for Airstack Modules 4 and 7.

The only red alarms currently showing in StruxureWare that are not marked for maintenance are haas-234-pdu-01, math-g190-crac-01, 2550-1370a-crac-01, tel-210-crac-03, math-b60-crac-03, and lamb-20-apc-5000-02.

The only unacknowledged alarms currently showing in Siemens are for MAP4W1, MAP4W2, MAP4WET, MAP5CA, MAP5CAL, MAP5W1, and MAP5W2.

TEL Key Cards #596076 and #596077 are still checked out by Ron and Greg with Vertical Horizon Networks. MATH Key Cards #600083, #600084, and #600085 are still issued to Greg Piercy, Quentin Foley, and Nathan Lane with D.A. Dodd contractors.

Signature:
[insert name]
Purdue IT IOC/Service Desk Specialist
(765) 496-7272
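
The "Group - Title: Description - current event ending date" layout used for each calendar entry in the email above can be produced with a small helper. This is only a sketch; `format_calendar_line` is a hypothetical name, and copying and pasting from the calendar works just as well.

```python
from datetime import date


def format_calendar_line(group: str, title: str, description: str, end: date) -> str:
    """Render one calendar entry as 'Group - Title: Description - MM/DD/YYYY'."""
    return f"{group} - {title}: {description} - {end:%m/%d/%Y}"


line = format_calendar_line(
    "DCM",
    "Strux: pfw and pnw devices",
    "StruxureWare: new devices being added for PFW and PNW. "
    "Ignore these groups until further notice.",
    date(2024, 1, 8),
)
print(line)
```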

Example replies from DCM (original text left black and edits left red as they usually appear in the email):

DCM - Strux: pfw and pnw devices: StruxureWare: new devices being added for PFW and PNW. Ignore these groups until further notice. - 12/13/2023 continue

DCM - Walkthru alarms to ignore:
Always ignore A31 alarm for MATH-G190-CIT-CDU-01 since it is plumbed incorrectly. continue Ignore A28 for MATH-G190-CDU-A PDU-3 C28 unless water is pooling in unit. continue G190 CDU-F had alarm codes "w23" primary filter dirty. remove Ignore G190-PDU-20 alarm and dead screen. continue
10/3/23 - Math-g190-crac-8 has been shut off and will be removed eventually. continue
10/23/23 - Walkthroughs: Math B60 CRAC-1 error message "REM SENSOR 3 FAILURE." Ignore alarm. continue
11/6/23 - Math B60 unit CAB 4 active alarm "Internal Comm Error." Ignore Alarm Math B60 crac-02 and 03. Both listing "general" and "WRN REM SENSOR 3 FAILURE." Ignore alarm. continue
11/23/23 - Fast, steady drip coming from sixth Negishi unit door (K-10). Admin disabled MATH G109 Rack K-10 Rear Door Cooling. RCAC Engineers and DCM will investigate further on Monday (11-27). - 12/13/2023 continue

DCM - Strux: math-g109-pdu-06:
StruxureWare: The output current in phase L1 has risen above the output current high threshold of 320A math-g109-pdu-06.itap.purdue.edu. Call if sensor reaches 360A.
11/4/23 - StruxureWare: Admin testing alarm parameters. Ignore alarms for math-g109-pdu-06. - 12/03/2023 remove
