SlideShare a Scribd company logo
1 of 23
Download to read offline
Winston
Diagnostic and Remediation Engineering (DaRE)
Vinay Shah & Jean-Sebastien Jeannotte
● Introduction
● Internals - How it works?
● Demo - See it in action!
● Learnings and challenges
● Metrics & Road ahead
● Additional resources
Topics
Introduction
Landscape
Operational load vs.
new features
Scale and Growth Availability
Application or
Service
Monitoring
Alerting
Pagerduty Email Winston
● Reduce MTTR
● Reduce risk of human errors
● Reduce pager fatigue, provide tier 1 support
● Don’t worry about infrastructure, focus on your business logic
● Best practice for runbook lifecycle management
Business goals
Winston is an event driven runbook
automation platform. It is designed to host
and execute runbooks in response to
operational events.
Internals
Howisitdeployed?
Execution Flow
● One stop portal for all things Winston
● Supports Create, Read, Update, Delete, Execute and Diagnose functionality
● Implements best practises
○ Compliance/Auditing
○ Persistence
○ Security (Authentication/Authorization)
● Self serve & scalable
Winston Studio
● Pack
A group of related automations typically organized around a discreet
service or product
● Action
Set of steps to help with diagnostics or remediations written as code
● Event & event source
External services that are the source of events that trigger a runbook
Terminology
Demo
Winston Studio
DEMO
● False positives
○ Cassandra ring health
● Diagnostics - correlation could point towards causation - e.g:
○ Querying Chronos events
○ Querying dependencies upstream and downstream for anomalous behaviour
● Remediation
○ Clean up disk space
○ Restart Kafka process
Sample use cases
Learnings &
challenges
Common patterns
● Usage
○ Culture of automating the manual and repeatable
○ Noisy signals become more interesting
○ Lesser the control more the opportunity
● Product
○ Safety is crucial
○ Usability is important
○ Resiliency
Insights
● Don’t reinvent the wheel
● Start simple and iterate
● Allow experimentation
● Pay special care to usability of your product
● Push for changing the culture - usage will follow
● Talk to us/others who have gone through some of the pains and learnings
Recommendations to get started
Metrics and
Road ahead
● Adoption. Adoption. Adoption.
● Usability
○ Polyglot support (Groovy based actions)
○ Deeper Integrations
● Safety
○ Resource isolation (Containers)
○ Rate limiting
The road ahead
● Introducing Winston:
http://techblog.netflix.com/2016/08/introducing-winston-event-driven.html
● Stackstorm: https://docs.stackstorm.com/
● Reach out: vshah@netflix.com or jjeannotte@netflix.com
We are hiring
Senior Software Engineer - https://jobs.netflix.com/jobs/860752
Links & resources
Thank you.

More Related Content

What's hot

What's hot (20)

State transfer With Galera
State transfer With GaleraState transfer With Galera
State transfer With Galera
 
Running Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft AzureRunning Galera Cluster on Microsoft Azure
Running Galera Cluster on Microsoft Azure
 
A Forgotten HTTP Invisibility Cloak
A Forgotten HTTP Invisibility CloakA Forgotten HTTP Invisibility Cloak
A Forgotten HTTP Invisibility Cloak
 
44CON 2014 - Meterpreter Internals, OJ Reeves
44CON 2014 - Meterpreter Internals, OJ Reeves44CON 2014 - Meterpreter Internals, OJ Reeves
44CON 2014 - Meterpreter Internals, OJ Reeves
 
Caching solutions with Redis
Caching solutions   with RedisCaching solutions   with Redis
Caching solutions with Redis
 
Swagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdfSwagger With REST APIs.pptx.pdf
Swagger With REST APIs.pptx.pdf
 
Swagger UI
Swagger UISwagger UI
Swagger UI
 
Workshop Spring - Session 1 - L'offre Spring et les bases
Workshop Spring  - Session 1 - L'offre Spring et les basesWorkshop Spring  - Session 1 - L'offre Spring et les bases
Workshop Spring - Session 1 - L'offre Spring et les bases
 
Proxysql use case scenarios fosdem17
Proxysql use case scenarios    fosdem17Proxysql use case scenarios    fosdem17
Proxysql use case scenarios fosdem17
 
Introduction to Apache Camel
Introduction to Apache CamelIntroduction to Apache Camel
Introduction to Apache Camel
 
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdfProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
 
The secret life of a dispatcher (Adobe CQ AEM)
The secret life of a dispatcher (Adobe CQ AEM)The secret life of a dispatcher (Adobe CQ AEM)
The secret life of a dispatcher (Adobe CQ AEM)
 
Building a REST Service in minutes with Spring Boot
Building a REST Service in minutes with Spring BootBuilding a REST Service in minutes with Spring Boot
Building a REST Service in minutes with Spring Boot
 
Rest API with Swagger and NodeJS
Rest API with Swagger and NodeJSRest API with Swagger and NodeJS
Rest API with Swagger and NodeJS
 
(참고) Elk stack 설치 및 kafka
(참고) Elk stack 설치 및 kafka(참고) Elk stack 설치 및 kafka
(참고) Elk stack 설치 및 kafka
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
Spring mvc
Spring mvcSpring mvc
Spring mvc
 
Thick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash CourseThick Application Penetration Testing: Crash Course
Thick Application Penetration Testing: Crash Course
 
Building Advanced XSS Vectors
Building Advanced XSS VectorsBuilding Advanced XSS Vectors
Building Advanced XSS Vectors
 
Spring Boot and REST API
Spring Boot and REST APISpring Boot and REST API
Spring Boot and REST API
 

Similar to Winston - Netflix's event driven auto remediation and diagnostics tool

Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Ontico
 

Similar to Winston - Netflix's event driven auto remediation and diagnostics tool (20)

The Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security TestingThe Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security Testing
 
Webinar: Estrategias para optimizar los costos de testing
Webinar: Estrategias para optimizar los costos de testingWebinar: Estrategias para optimizar los costos de testing
Webinar: Estrategias para optimizar los costos de testing
 
Agile Testing Analytics
Agile Testing AnalyticsAgile Testing Analytics
Agile Testing Analytics
 
Agile testing (n)
Agile testing (n)Agile testing (n)
Agile testing (n)
 
Testing in a continuous delivery environment
Testing in a continuous delivery environmentTesting in a continuous delivery environment
Testing in a continuous delivery environment
 
Demise of test scripts rise of test ideas
Demise of test scripts rise of test ideasDemise of test scripts rise of test ideas
Demise of test scripts rise of test ideas
 
Managing software projects & teams effectively
Managing software projects & teams effectivelyManaging software projects & teams effectively
Managing software projects & teams effectively
 
Dscrum
DscrumDscrum
Dscrum
 
AWS Well Architected Framework in Summary
AWS Well Architected Framework in SummaryAWS Well Architected Framework in Summary
AWS Well Architected Framework in Summary
 
Agile Development Practices May 2017
Agile Development Practices May 2017Agile Development Practices May 2017
Agile Development Practices May 2017
 
Pusheando en master, que es gerundio
Pusheando en master, que es gerundioPusheando en master, que es gerundio
Pusheando en master, que es gerundio
 
Using SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterpriseUsing SaltStack to DevOps the enterprise
Using SaltStack to DevOps the enterprise
 
Services, tools & practices for a software house
Services, tools & practices for a software houseServices, tools & practices for a software house
Services, tools & practices for a software house
 
Software management for tech startups
Software management for tech startupsSoftware management for tech startups
Software management for tech startups
 
Metrics 4 faster feedback
Metrics 4 faster feedbackMetrics 4 faster feedback
Metrics 4 faster feedback
 
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + DemosDrools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
Drools5 Community Training Module 5 Drools BLIP Architectural Overview + Demos
 
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
Making Sites Reliable (как сделать систему надежной) (Павел Уваров, Андрей Та...
 
Usa prácticas de integración continua y sobrevive para luchar otro día.
 Usa prácticas de integración continua y sobrevive para luchar otro día. Usa prácticas de integración continua y sobrevive para luchar otro día.
Usa prácticas de integración continua y sobrevive para luchar otro día.
 
3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile3 Keys to Performance Testing at the Speed of Agile
3 Keys to Performance Testing at the Speed of Agile
 
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
 

Recently uploaded

introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 

Winston - Netflix's event driven auto remediation and diagnostics tool

  • 1. Winston Diagnostic and Remediation Engineering (DaRE) Vinay Shah & Jean-Sebastien Jeannotte
  • 2. ● Introduction ● Internals - How it works? ● Demo - See it in action! ● Learnings and challenges ● Metrics & Road ahead ● Additional resources Topics
  • 4. Landscape Operational load vs. new features Scale and Growth Availability
  • 6. ● Reduce MTTR ● Reduce risk of human errors ● Reduce pager fatigue, provide tier 1 support ● Don’t worry about infrastructure, focus on your business logic ● Best practice for runbook lifecycle management Business goals
  • 7. Winston is an event driven runbook automation platform. It is designed to host and execute runbooks in response to operational events.
  • 11. ● One stop portal for all things Winston ● Supports Create, Read, Update, Delete, Execute and Diagnose functionality ● Implements best practises ○ Compliance/Auditing ○ Persistence ○ Security (Authentication/Authorization) ● Self serve & scalable Winston Studio
  • 12. ● Pack A group of related automations typically organized around a discreet service or product ● Action Set of steps to help with diagnostics or remediations written as code ● Event & event source External services that are the source of events that trigger a runbook Terminology
  • 13. Demo
  • 15. ● False positives ○ Cassandra ring health ● Diagnostics - correlation could point towards causation - e.g: ○ Querying Chronos events ○ Querying dependencies upstream and downstream for anomalous behaviour ● Remediation ○ Clean up disk space ○ Restart Kafka process Sample use cases
  • 18. ● Usage ○ Culture of automating the manual and repeatable ○ Noisy signals become more interesting ○ Lesser the control more the opportunity ● Product ○ Safety is crucial ○ Usability is important ○ Resiliency Insights
  • 19. ● Don’t reinvent the wheel ● Start simple and iterate ● Allow experimentation ● Pay special care to usability of your product ● Push for changing the culture - usage will follow ● Talk to us/others who have gone through some of the pains and learnings Recommendations to get started
  • 21. ● Adoption. Adoption. Adoption. ● Usability ○ Polyglot support (Groovy based actions) ○ Deeper Integrations ● Safety ○ Resource isolation (Containers) ○ Rate limiting The road ahead
  • 22. ● Introducing Winston: http://techblog.netflix.com/2016/08/introducing-winston-event-driven.html ● Stackstorm: https://docs.stackstorm.com/ ● Reach out: vshah@netflix.com or jjeannotte@netflix.com We are hiring Senior Software Engineer - https://jobs.netflix.com/jobs/860752 Links & resources