Shift Log Fields
(https://itcrops.itap.purdue.edu/ioclog/)
...
- Grafana
- This is used when network alarms appear in Grafana. Most production Xymon alarms also appear in Grafana, so take care to report network alarms as Grafana alarms and NOT as Xymon alarms (which are reported separately).
- StruxureWare
- This is used when recording StruxureWare alarms.
- UC4
- This is used when reporting UC4 issues that do / did not require PCA intervention.
- Xymon
- This is used when alarms appear in Xymon. Most production Xymon alarms also display in Grafana, but they are Xymon alarms, not Grafana alarms, and should be reported as such.
EXAMPLE:
19:21 murra175 Data Center and Enterprise Storage Todd Turner 765-496-8214 / ITIDataCenterManagement@purdue.edu Outgoing call Ignore Alarm Metasys: Airstack alarm for module 9 with value of 4. Left VM. Email sent.
19:27 - Received email: Admin advises to ack alarm and ignore.
Follow-up Messaging
- Any and every contact made or attempted with groups outside of the CSC/IOC requires follow-up communication. For most teams, this will take the form of an email to the team's mailing list or group. Contacts with Networking will use a FootPrints ticket, as described in Grafana IOC Dashboard/ Network Device Alerts, using the template "CSC.IOC. Network Follow Up".
- These emails will always include at least two recipients:
- The group of the contact
- The IOC
- Any additional related group may be added as necessary
Example:
Virtualization has requested contact with Database. The follow-up email is sent to Virtualization's email group, Database's email group, and the IOC email group.
- Follow-up communication should specify who the contact was (with whom specifically you spoke), what the issue was about, what determination or action has been made, and what subsequent action is required, expected, or requested. Include as much specificity as possible.
Example:
Admins,
This is a follow-up email regarding my phone conversation with Todd about Metasys: Airstack alarm for module 9 with value of 4. Per our conversation, we will ignore this alarm until 08:00 17/47/2032, or until advised otherwise. We will continue to report new airstack alarms.
Your signature
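The recipient and content rules above can be sketched as a small helper. This is only an illustration, not an existing IOC tool: the `FollowUp` class and its field names are hypothetical, and treating ioc@purdue.edu as the IOC email group is an assumption based on the address used elsewhere on this page.

```python
from dataclasses import dataclass, field

# Assumption: the IOC email group is the address cited elsewhere on this page.
IOC_ADDRESS = "ioc@purdue.edu"


@dataclass
class FollowUp:
    """Hypothetical container for the details a follow-up email should state."""
    contact_name: str    # with whom specifically you spoke
    group_address: str   # email group of the contact
    issue: str           # what the issue was about
    determination: str   # what determination or action has been made
    next_action: str     # what is required, expected, or requested next
    extra_groups: list = field(default_factory=list)  # related groups, if any

    def recipients(self):
        # Always the contact's group and the IOC, then any related groups.
        ordered = [self.group_address, IOC_ADDRESS, *self.extra_groups]
        return list(dict.fromkeys(ordered))  # drop duplicates, keep order

    def body(self):
        # Refuse to build an email that leaves out a required detail.
        required = ("contact_name", "issue", "determination", "next_action")
        missing = [name for name in required if not getattr(self, name).strip()]
        if missing:
            raise ValueError("follow-up is missing: " + ", ".join(missing))
        return (
            f"This is a follow-up email regarding my conversation with "
            f"{self.contact_name} about {self.issue}. "
            f"Per our conversation, {self.determination}. "
            f"{self.next_action}"
        )


note = FollowUp(
    contact_name="Todd",
    group_address="ITIDataCenterManagement@purdue.edu",
    issue="Metasys: Airstack alarm for module 9 with value of 4",
    determination="we will ignore this alarm until advised otherwise",
    next_action="We will continue to report new airstack alarms.",
)
print(note.recipients())
print(note.body())
```

The check in `body()` mirrors the requirement that a follow-up name the contact, the issue, the determination, and the next action; the greeting and signature are still added by hand.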
Initial Contact | Follow-up Communication | How it is logged | More information | Exceptions |
---|---|---|---|---|
IOC receives call from Admin/user who is not Networking | Email group to whom contact belongs. Include IOC in email. | Specify that an email has been sent in log entry. | | if contact is from CSC/IOC |
IOC receives call from Admin/user who is Networking | Create FootPrints ticket (this generates email automatically). Add ioc@purdue.edu to cc. | Specify that a FootPrints ticket has been created and reference the ticket number in log entry. | Grafana IOC Dashboard/ Network Device Alerts | none at time of writing |
IOC calls out | Email group to whom contact belongs. Include IOC in email. | Specify that an email has been sent in log entry. | | calls to PUPD/PUFD to notify of certain service disruptions do not require follow-up emails |
IOC receives an email from Admin/user (including Networking) that contains information or an imperative | Reply to email, acknowledging request or notification. Include IOC in reply. | Specify that an email has been sent in log entry. | | if contact is from CSC/IOC |
IOC sends an email to group or user | no follow-up communication required | Method of contact should make clear that an email has been sent. Log any subsequent responses. | | none at time of writing |
IOC is contacted via means not specified above (Teams, in-person, some sort of message delivery bird, etc.) | Email group to whom contact belongs. Include IOC in email. | Specify that an email has been sent in log entry. | | if contact is from CSC/IOC |
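As a quick reference, the table can be read as a decision function. The sketch below is a hypothetical illustration, not an existing tool: the names `Contact`, `Action`, and `follow_up_action` are invented here, and it assumes that "if contact is from CSC/IOC" and the PUPD/PUFD note are exceptions under which no follow-up is needed. It follows the table rows literally, so an outgoing call is not split by whether the group is Networking.

```python
from enum import Enum


class Contact(Enum):
    CALL_IN = "IOC receives call"
    CALL_OUT = "IOC calls out"
    EMAIL_IN = "IOC receives email"
    EMAIL_OUT = "IOC sends email"
    OTHER = "Teams, in-person, etc."


class Action(Enum):
    EMAIL_GROUP = "Email the contact's group and include the IOC"
    REPLY = "Reply to the email and include the IOC"
    TICKET = "Create a FootPrints ticket and cc the IOC"
    NONE = "No follow-up communication required"


def follow_up_action(contact, *, networking=False, from_csc_ioc=False,
                     pupd_pufd_notice=False):
    """Return the follow-up suggested by the table for one contact."""
    if contact is Contact.EMAIL_OUT:
        return Action.NONE                      # the sent email is the record
    if contact is Contact.CALL_IN and networking:
        return Action.TICKET                    # no exceptions listed
    if contact is Contact.CALL_OUT:
        # Assumed exception: PUPD/PUFD service-disruption notifications.
        return Action.NONE if pupd_pufd_notice else Action.EMAIL_GROUP
    if from_csc_ioc:
        return Action.NONE                      # assumed exception: internal contact
    if contact is Contact.EMAIL_IN:
        return Action.REPLY
    return Action.EMAIL_GROUP                   # non-Networking call in, or other means


print(follow_up_action(Contact.CALL_IN, networking=True).value)
print(follow_up_action(Contact.OTHER).value)
```

Whatever the function returns, the log entry should still state which follow-up was sent, including the ticket number for FootPrints tickets.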
Some frequently asked and less-frequently asked questions about follow-up communication:
Question: "I called the admin, and the issue either resolved before I made contact, or as we were speaking. Do I still need a follow-up email even though it turned out to be nothing?"
Answer: Of course. Every contact needs follow-up communication. This serves two (maybe three) major functions:
- It protects us (the CSC/IOC)
- It protects them (the Admin/user/group)
- It makes clear what we (the service provider) and they (the customer) expect and require of each other, and establishes agreement on both.
Question: "How does follow-up communication protect us (the CSC/IOC)?"
Answer: By sending a follow-up communication, we can demonstrate that we received the call (or other contact method), responded to the request, and understood the need. The follow-up serves as a receipt, of sorts. For example, should there be a question like "why did this alarm go unreported?", we can produce an impartial email that demonstrates the alarm was, in fact, reported.
Question: "How does follow-up communication protect them (the Admin/user/group)?"
Answer: By sending a follow-up communication, the Admin or group can demonstrate that their instructions were provided and accurate. If something happens, they can show that it was not due to an erroneous or incorrect directive. Further, because we send follow-up communication to the group rather than just the contact, group members can see the course of events and correct or suggest a course of action. This can potentially improve the outcome for the IOC, the group, and the final customers of the service(s) in question/affected.
Question: "I left a voicemail, so I didn't really make contact. Do I still need a follow-up here, because there was no contact made?"
Answer: Leaving a voicemail is considered a contact (perhaps a delayed one), and every contact needs follow-up communication. Additionally, a follow-up communication with the group may alert another member that an issue is outstanding and prompt a faster response and resolution.
Question: "Nobody answered, and I couldn't leave a voicemail, so I didn't really make contact. Do I still need a follow-up here, because there was definitely no contact made?"
Answer: Although no contact was made in this situation, that fact makes a follow-up communication even more critical. A follow-up communication with the group may alert a member that an issue is outstanding and prompt them to contact the service or the IOC to determine the issue.
Combining Entries:
- In an effort to streamline the log and make it more efficient, please combine entries when applicable.
- Entries that can be combined are typically ones that involve the same issue. Some examples:
- Combining Outage Entries:
- Non Combined
- 8:01 - CSC reports Blackboard is down.
- 8:03 - Notified Admin for reported BB issues.
- 8:10 - Admin reports BB is back up.
- 8:12 - Outage Resolution posted for BB.
- 8:15 - Notified CSC of outage resolution for BB
- 8:16 - Notified ITaP Comm of outage resolution for BB.
- Combined Entry
- 8:01 - CSC reports Blackboard is down. Admin notified.
- For "Group" and "Contact" use the group and name of the person you contacted to look into the issue.
- For "Status" use "Group Notified".
- 8:10 - Admin reports BB is back up. Service Alert Resolved. CSC and ITaP Comm notified.
- For "Group" and "Contact" use the group and the name of the person resolving the outage.
- For "Status" use "Service Alert Resolved"
- The above example went from 6 entries to 2: one for the report of the issue and one for its resolution. The entries are short and contain the pertinent information. Whom we contacted at the CSC and ITaP Comm is important, but the core of the issue is that BB was down, so the contact for the group or people responsible for fixing it is more important.
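The reduction from 6 entries to 2 can be illustrated mechanically. The sketch below is an assumption-laden illustration, not part of the log tool: the `phase` labels ("report", "resolution") and the `combine` function are hypothetical, and the merged notes are simply joined, so the wording would still be trimmed by hand as in the combined example above.

```python
from itertools import groupby

# (time, phase, note) tuples; the phase labels are hypothetical tags
# added here to mark which entries belong to the same step of the issue.
raw_entries = [
    ("8:01", "report", "CSC reports Blackboard is down."),
    ("8:03", "report", "Notified Admin for reported BB issues."),
    ("8:10", "resolution", "Admin reports BB is back up."),
    ("8:12", "resolution", "Outage Resolution posted for BB."),
    ("8:15", "resolution", "Notified CSC of outage resolution for BB."),
    ("8:16", "resolution", "Notified ITaP Comm of outage resolution for BB."),
]


def combine(entries):
    """Merge consecutive entries that share a phase into one entry."""
    combined = []
    for phase, group in groupby(entries, key=lambda entry: entry[1]):
        group = list(group)
        first_time = group[0][0]  # keep the time of the first entry in the run
        notes = " ".join(note for _, _, note in group)
        combined.append((first_time, phase, notes))
    return combined


for time, phase, notes in combine(raw_entries):
    print(f"{time} - {notes}")
```

Grouping only consecutive entries keeps the log in time order; unrelated entries that fall between two steps of the same issue would stay separate.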
...