Site Reliability Engineering Collection

An ongoing, curated collection of awesome SRE software and tools, libraries and frameworks, engineering books and blogs, philosophical principles, technical guidelines, and practical resources about the field of Site Reliability Engineering (SRE).

What is Site Reliability Engineering?

"Fundamentally, it's what happens when you ask a software engineer to design an operations function." - Ben Treynor Sloss, VP of Engineering at Google, founder of Google SRE

Table of Contents

Culture
Education
Books
Hiring
Reliability
Monitoring & Observability & Alerting
On-Call
Post-Mortem
Capacity Planning
Service Level Agreement
Performance
Programming
Misc Articles
Real-time Messaging
Blogs
Newsletters
Conferences & Meetups
Twitter
SRE Tools

⬆ back to top

Culture

⬆ back to top

Education

⬆ back to top

Books

⬆ back to top

Hiring

⬆ back to top

Reliability

⬆ back to top

Monitoring & Observability & Alerting

⬆ back to top

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Real-time Messaging

#sre channel at Hangops Slack - Discussion of Site Reliability Engineering in general.
#incident_response channel at Hangops Slack - Discussion about Incident Response.
USENIX SREcon Slack

Blogs

Brendan Gregg's Blog - Highly Technical Blog Posts About Systems Internals, Performance and SRE.
Everything Sysadmin - Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
High Scalability - Technical Blog Posts About Systems Architecture.
rachelbythebay - Technical Blog Posts.
Susan J. Fowler - Various blog posts about SRE, Software Engineering and Microservices.
SysAdvent - One article for each day of December, ending on the 25th article.
Stephen Thorne's Blog - Blog Posts About SRE.
Increment - A digital magazine about how teams build and operate software systems at scale.
GopherSRE - Blog Posts about Go and SRE.
Cindy Sridharan - Blog posts about distributed systems and their management.
Blameless Blog - Blog posts about SRE culture and practices.
Resilience Roundup - Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
Squadcast Blog - Blog posts about SRE best practices, reliability, on-call and incident management.
FireHydrant Blog - Posts about complex systems, incident response, and SRE best practices.
Rootly Blog - Incident management best practices and guides.
incident.io Blog - Guides, advice and resources on incident management and response.
Logit.io Blog - Resources on log management, SRE and DevOps.

Newsletters

DevOpsLinks - A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
KubeWeekly - The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
SRE Weekly - Weekly Site Reliability Newsletter.
O’Reilly Systems Engineering and Operations Newsletter - Weekly systems engineering and operations news and insights from industry insiders.
ChaosEngineering.news - Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

SREcon Conferences - The Official SRE Conference.
LISA Conferences - Prominent Conference About SysAdmin/DevOps/SRE.
SRE Tech Talks - SRE Talks Hosted by Google.
South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup - A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.
San Francisco Reliability Engineering - A Group Of People Who Are Passionate About Reliable, Performant Software Systems.
Site Reliability Engineering Munich, Germany - SRE Meetup in the greater area of the Oktoberfest city.
ADDO - All Day DevOps - A 24-hour conference that is completely online and free.
Site Reliability Engineering Paris, France - SRE Meetup in the city of light.
Site Reliability Engineering India - SRE Meetup India

Twitter

Google SRE Twitter Account - Google's SRE Twitter Account.
SREBook - The Official Twitter Account of Site Reliability Engineering Book.
SREcon - SREcon's Official Twitter Account.
SREWorkbook - The Official Twitter Account of Site Reliability Workbook.
The SRE Dev - SRE-related Posts from dev.to.
Twitter SRE - The Official Twitter Account of Twitter's SRE team.
Twitter SRE Weekly - The Official Twitter Account of SRE Weekly Newsletter.
USENIX Association - The Official USENIX Twitter Account.

SRE Tools

Awesome SRE Tools - A curated list of Site Reliability and Production Engineering tools.
List of Continuous Integration services
SRE cheat sheet - A cheat sheet for Site Reliability Engineering principles and numbers.
SRE Capability Map (https://www.cruform.com/sre-capability-map/) - Overview of all things SRE.

⬆ back to top

Contributing

You are most welcome to contribute to this Awesome Community list as well. Big thanks to all current contributors who have helped build this Awesome Community list.

License

To the extent possible under law, Exajobs has waived all copyright and related or neighboring rights to this work.

⬆ back to top
