Commit

add incident management documentation
maryfrances01 committed Aug 11, 2023
1 parent 04afd07 commit 50feedb
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions content/en/docs/internal-documentation/incident-management.md
@@ -0,0 +1,24 @@
---
title: Incident Management
linkTitle: Incident Management
---

An incident is an event that can occur at any time and may degrade the quality of one or more of our services or cause a complete outage. Internal or external customers, our monitoring and alerting systems, or a member of the SRE team can raise an incident.

Being prepared for major incidents is crucial. We have established the following Incident Management processes so that SREs have predetermined procedures to follow:

- [Incident Management Process](https://source.redhat.com/groups/public/service-delivery/service_delivery_wiki/incident_management_process)

- [Incident Response Cheatsheet](https://github.com/openshift/ops-sop/blob/master/policies/incident_response.asciidoc)

- [Automated Incident Management Process (WebRCA)](https://source.redhat.com/groups/public/service-delivery/service_delivery_wiki/automated_incident_management_process)


## Coverage
Layered Products SRE (LPSRE) provides 24x7 coverage and support,
with primary and secondary on-call SREs responsible for handling production-related issues.

If you need to escalate an incident, please refer to the
[Layered Products SRE Escalation Procedure](https://source.redhat.com/groups/public/sre/wiki/cs_sre_escalation_procedure).

