A Lesson for CIOs: IT Incident Management


IT incident management is a part of IT service management (ITSM) that every CIO should understand. With IT incident management, the IT team restores services as quickly as possible after a disruption in order to minimize the negative impact on the business. An incident is any unexpected event that disrupts the normal operations of an organization. Reducing the likelihood and impact of incidents requires preparing for unexpected hardware, software, and security failures. ITSM frameworks such as the IT Infrastructure Library (ITIL) or COBIT are very helpful here.


While staff investigate the incident, identify its root cause, and develop and roll out a permanent fix, temporary workarounds are needed to keep services up and running. How the IT organization works and which issues it addresses are important factors when determining the specific workflows and processes in IT incident management. For instance, IT incident management workflows can address potential incidents such as a network slowdown, so that IT staff can prevent related issues in other areas of the IT deployment. Staff then find a temporary workaround or implement a fix, recover the system, and release it back into the production environment.


Incidents are generally classified as low, medium, or high priority:

  • Low Priority: Incidents do not interrupt end users from completing their work.
  • Medium Priority: Incidents do affect end users, though the disruption is slight or brief.
  • High Priority: Incidents affect large numbers of end users and prevent the proper functioning of a system.
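
The three priority levels can be expressed as a small decision rule. The following Python sketch is illustrative only: the function name, the `system_down` flag, and the 100-user cutoff for a "large" number of end users are assumptions, not values taken from ITIL or any other framework. Cases the definitions above do not cover, such as a system failure that touches only a few users, are treated here as medium priority.

```python
from enum import Enum


class Priority(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def classify_priority(users_affected: int, system_down: bool,
                      large_impact_threshold: int = 100) -> Priority:
    """Map an incident's impact to a priority level.

    large_impact_threshold is an assumed cutoff; each organization
    would choose its own definition of a "large" number of end users.
    """
    # Low: no end user is kept from completing work.
    if users_affected <= 0:
        return Priority.LOW
    # High: many end users are affected and the system cannot function.
    if system_down and users_affected >= large_impact_threshold:
        return Priority.HIGH
    # Medium: end users are affected, but the disruption is limited.
    # (Assumption: every remaining case falls into this bucket.)
    return Priority.MEDIUM


if __name__ == "__main__":
    print(classify_priority(users_affected=0, system_down=False).value)
    print(classify_priority(users_affected=12, system_down=False).value)
    print(classify_priority(users_affected=800, system_down=True).value)
```

In practice the inputs would come from a monitoring tool or from the help desk ticket rather than being passed in by hand.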

Besides priority, incidents are also classified as hardware, software, or security incidents. Some issues, however, result from a combination of these areas:

  • Software Incidents: service availability problems or application bugs.
  • Hardware Incidents: downed or limited resources, network issues or other system outages.
  • Security Incidents: attempted and active threats intended to compromise or breach data.


There are three levels of support in IT incident management:

  • Level-one support typically provides basic assistance, such as password resets or computer troubleshooting. It also handles incident identification, logging, prioritization, and categorization, resolves incidents when it can, and decides when to escalate to level-two support.
  • Level-two support handles more complex issues that require more training, skill, or security access to resolve. These include incidents that disrupt a business’s operations, are marked as high priority, and require an immediate response.
  • Level-three support team members are generally specialists in the subject matter of the incident. For example, a level-three support team could include the chief architect and engineers who work on the product or service’s daily operation and maintenance.
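
The escalation path between the three support levels can be sketched as a simple routing function. This is a minimal illustration under stated assumptions: the `Incident` fields and the `route_to_support_level` function are hypothetical names, and the flags stand in for the judgment a level-one analyst would exercise when deciding whether to escalate.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    priority: str = "low"                 # "low", "medium", or "high"
    needs_elevated_access: bool = False   # more training, skill, or access
    needs_specialist: bool = False        # subject-matter expert required


def route_to_support_level(incident: Incident) -> int:
    """Return the support level (1, 2, or 3) that should own the incident."""
    # Level three: specialists such as the architects and engineers
    # who operate and maintain the product or service.
    if incident.needs_specialist:
        return 3
    # Level two: high-priority incidents and issues that need more
    # training, skill, or security access than level one has.
    if incident.priority == "high" or incident.needs_elevated_access:
        return 2
    # Level one: basic assistance such as password resets.
    return 1


if __name__ == "__main__":
    reset = Incident("Password reset request")
    outage = Incident("Order system unavailable", priority="high")
    print(route_to_support_level(reset), route_to_support_level(outage))
```

Real incident management platforms usually configure this kind of rule as an escalation policy rather than in code, but the ordering of the checks is the same idea: the most specialized requirement wins.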


Help desk and incident management teams use many tools to resolve incidents:

  • Monitoring Tools: Pull operations data from across multiple systems, such as on-premises or cloud-based hardware and software.
  • Root Cause Analysis Tools: Sort through operational data, such as logs, collected by systems management, application performance monitoring, and infrastructure monitoring tools.
  • Incident Management and Automation Platforms: Correlate monitoring data and facilitate the response to events, typically with a sophisticated escalation path and a method for documenting the response process. Examples include PagerDuty, VictorOps, and xMatters.



Invest in Incident Management Today

Call LIFARS For Tips Today