In 2018, Site Reliability Engineering (SRE) will turn 15 years old. Since Google coined the term, companies across the world have adopted a new operations mindset along with automation, deployment and monitoring principles. Most of what SRE does now is well established throughout the industry, so what is the next wave of reliability principles and automation frameworks?
This session will dive into what the future holds for reliability engineering as a field and what will be the next areas of investment and improvement for reliability teams.
The Next Wave of Reliability Engineering
1. The Next Wave of Reliability Engineering
Michael Kehoe
Staff Site Reliability Engineer
2. Today’s agenda
1 Introductions
2 Where have we come from
3 What is Reliability Engineering
4 Where are we going
5 The Future of Reliability Engineering
6 Key Takeaways
7 Q&A
4. Michael Kehoe
$ WHOAMI
• Staff Site Reliability Engineer @ LinkedIn
• Production-SRE Team
• Funny accent = Australian + 4 years American
• Former Network Engineer at the University of Queensland
5. Production-SRE Team @ LinkedIn
$ WHOAMI
• Disaster Recovery – Planning & Automation
• Incident Response – Process & Automation
• Visibility Engineering – Making use of operational data
• Reliability Principles – Defining best practice & automating it
21. The Future of Reliability Engineering
• Evolution of the Network Engineer
• Observe and measure
• Failure is the new normal
• Automation as a Service
• Cloud is king
22. Dawn of the Network Reliability Engineer
Making the network follow SRE practices
https://forums.juniper.net/t5/SDN-and-NFV-Era/2018-and-the-Dawn-of-Network-Reliability-Engineering-NRE/ba-p/316915
24. Failure is the new Normal
Downgrade failures from exceptional to expected
https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/
25. Failure is the new normal
• Accept failure as normal
• Test for failure:
• Application
• Local Infrastructure
• Global Infrastructure
• Continuous experimentation
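One way to picture "test for failure" at the application level is a small fault-injection experiment. The sketch below is illustrative only and is not taken from the talk: the helper names (`with_fault_injection`, `call_with_fallback`, `run_experiment`), the failure rates and the "fresh"/"cached" responses are all hypothetical, chosen to show a dependency that fails on purpose while the caller retries and then falls back.

```python
import random


class InjectedFailure(Exception):
    """Raised by the fault injector to simulate a dependency failure."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly `failure_rate` of calls fail (hypothetical helper)."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure("simulated dependency failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_fallback(primary, fallback, attempts=3):
    """Try the primary call up to `attempts` times, then use the fallback."""
    for _ in range(attempts):
        try:
            return primary()
        except InjectedFailure:
            continue
    return fallback()


def run_experiment(failure_rate, requests=1000, seed=42):
    """Return the fraction of requests served by the primary path."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky = with_fault_injection(lambda: "fresh", failure_rate, rng)
    results = [
        call_with_fallback(flaky, lambda: "cached") for _ in range(requests)
    ]
    return results.count("fresh") / requests


if __name__ == "__main__":
    for rate in (0.0, 0.3, 1.0):
        print(f"failure_rate={rate}: primary served {run_experiment(rate):.1%}")
```

Running the same experiment on a schedule, with the failure rate and blast radius widened step by step, is one possible reading of "continuous experimentation".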
27. Automation is ubiquitous
• Automation is expected
• Automation is unified
• No more one-off scripts
• Automation extends to monitoring, triage & remediation
• Automation drives down:
• Time to Detect
• Time to Resolve
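To see whether automation is actually driving these numbers down, they first have to be measured. The following is a minimal sketch under stated assumptions: the `Incident` record, its field names and the sample timestamps are hypothetical, and both metrics are measured from the start of the incident (some teams measure time to resolve from detection instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    started: datetime
    detected: datetime
    resolved: datetime


def mean_time_to_detect(incidents):
    """Average gap between an incident starting and being detected."""
    total = sum((i.detected - i.started for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_resolve(incidents):
    """Average gap between an incident starting and being resolved."""
    total = sum((i.resolved - i.started for i in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data, for illustration only.
INCIDENTS = [
    Incident(
        started=datetime(2018, 1, 5, 10, 0),
        detected=datetime(2018, 1, 5, 10, 10),
        resolved=datetime(2018, 1, 5, 11, 0),
    ),
    Incident(
        started=datetime(2018, 2, 9, 12, 0),
        detected=datetime(2018, 2, 9, 12, 20),
        resolved=datetime(2018, 2, 9, 12, 40),
    ),
]

if __name__ == "__main__":
    print("Mean time to detect: ", mean_time_to_detect(INCIDENTS))
    print("Mean time to resolve:", mean_time_to_resolve(INCIDENTS))
```

Tracking these two averages per quarter, before and after an automation change, gives a simple way to check the claim on this slide against real incident data.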
29. Cloud is King
• Adoption of Private & Public Clouds will continue
• Most infrastructure will be ephemeral
• Applications will be engineered to be ‘Cloud Native’
• Engineering agility will continue to increase
30. Observe & Measure
Making the most of operational data
https://www.acronis.com/en-us/blog/posts/web-application-monitoring-basic-framework
31. Observe and measure
• Machine-driven triaging using tracing and advanced learning
• Advanced analytics on performance to drive infrastructure optimization
• Use of incident data to drive feedback loops
34. Key Takeaways
THE FUTURE OF RELIABILITY ENGINEERING
• Evolution of the Network Engineer
• Observe and measure
• Failure is the new normal
• Automation is ubiquitous
• Cloud is king