---
title: Incident Management
linkTitle: Incident Management
---

An incident is an event, which can occur at any time, that degrades the quality of one or more of our services or causes a complete outage. Internal or external customers, our monitoring and alerting systems, or a member of the SRE team can raise an incident.

Preparedness for major incidents is crucial. We have established the following Incident Management processes so that SREs can follow predetermined procedures:

- [Incident Management Process](https://source.redhat.com/groups/public/service-delivery/service_delivery_wiki/incident_management_process)

- [Incident Response Cheatsheet](https://github.com/openshift/ops-sop/blob/master/policies/incident_response.asciidoc)

- [Automated Incident Management Process (WebRCA)](https://source.redhat.com/groups/public/service-delivery/service_delivery_wiki/automated_incident_management_process)

## Coverage

Layered Products SRE (LPSRE) provides 24x7 coverage and support, with primary and secondary on-call SREs responsible for handling production issues.

To escalate an incident, refer to the [Layered Products SRE Escalation Procedure](https://source.redhat.com/groups/public/sre/wiki/cs_sre_escalation_procedure).