AutoRemediation and Workflow @ LinkedIn
Brian Cory Sherwin
Site Reliability Engineer
LinkedIn

3. Agenda
• Key Concepts
• Work Flow Ideas
• LinkedIn’s Solution

4. Separation of Powers
• Monitoring Systems
• Remediation Systems
• Action Systems
[Diagram: Monitoring, Workflow, Action Systems]

5. Simple Work Flow Example
Restart An Application
• Grab some logs
• Start an application
• Open a ticket
[Diagram: Gather Data, Restart, Ticket]

6. Building an AutoRemediation @ LinkedIn
Key Goals
• Broker between action systems
• Linear Execution of Events
• Collaboration and ease of use
• Focus on Simple use cases
[Diagram: Remediation Broker, Monitoring, Remote Execution, Ticketing]

7. Why Use a Workflow
• Guaranteed Data Collection
• Better Accountability
• Formalized automation
• Extensibility
[Diagram: Gather Data, Restart, Ticket]

8. Work Flow @ LinkedIn
• Linear Execution
• Best Effort
• Limited Work Flow Control
[Diagram: Remediation Broker, Monitoring, Remote Execution, Ticketing]

9. LinkedIn: Work Flow Control
Work Flow Control Types
• Best Effort
• Guaranteed
• Abort
• OnFailure (planned)
[Diagram: Gather Data, Restart, Ticket]

10. Questions?
• Brian Cory Sherwin (bcs)
• LinkedIn
• bsherwin@linkedin

AutoRemediation and Workflow at LinkedIn

Editor's Notes

  1. At least in my designs, the key concept is to separate the monitoring system, the workflow system, and the things doing the work. A Swiss Army knife might have a corkscrew and a screwdriver, but it isn’t necessarily very good at either job. This is potentially a system trying to solve your issues before they become issues. Do you want a purpose-built system, or do you want something trying to masquerade as three separate systems?
  2. Let’s talk about a simple example of a work flow. This is a very simple example that I’ll reference a few times. Throughout the presentation I’ll refer to it as a plan, and to its individual units of work as jobs.
  3. We already had a healthy system together: monitoring, remote execution, code deployment. What we were missing was the glue between these systems. But how to glue? Focus on simple use cases that you’re already doing. Restarting a web server? Re-kicking a box? All of these should be automated first so you don’t have to run them manually. Ensure that the system is easy to use. This should be one of the key design goals. Creating, running, and scheduling remediation work flows should not be challenging. That’s not to detract from the importance of understanding what is going on. At the end of the day you need to have faith in your monitoring system.
  4. Let’s go back to the example work flow. Guaranteed data collection: we can guarantee a data collection attempt. I’ve been in a few situations where problems were being fixed by the ops team without gathering the data needed to resolve them. As an app owner you should be aware of what data you need to fix an issue. Better accountability: we know exactly how many times we’ve done something. Ops teams can sometimes toil in the darkness restarting applications (or rely on simpler systems that just restart automatically). By keeping better records we can make better business decisions about fixing bugs. Is it a .1% problem or a .01% problem? Without good record keeping we’d never know. Formalized automation: related to the above, simpler solutions that restart applications automatically can hide problems more easily. Additionally, by using a formalized system we can train less technical people to use it. Extensibility: in addition to formalizing, extensibility is key. We do similar actions across our platform. We have dozens of applications with similar infrastructure. We can recycle automations from one group to the next without having to train people to use new systems.
  5. Linear: we execute jobs with no branching and no conditionals. Many work flows can be solved this way. Allowing branching work flows is not a necessary feature and can just lead to complicated configurations. Best effort: the monitoring system should be telling us to fix a problem. Each time the monitoring system tells us to fix something, we begin the work flow. If we fail, it shouldn’t be an issue, because the monitoring system will know it’s still wrong and remind us to run the work flow again. Limited work flow control: we offer users some limited work flow control options. We’ll detail those on the next slide.
  6. The key thing to understand here is what to do when plan health changes. If gathering data fails, do you want to skip the restart attempt? The answer varies by environment. The following descriptions are some of the work flow ideas we’ve come up with during our sojourn into auto remediation.
     • Best Effort: runs only when the plan state is healthy. A particular unit of work’s failure has no bearing on further execution.
     • Guaranteed: runs regardless of plan state. Its failure moves the plan state to unhealthy.
     • Abort: runs only when the plan state is healthy. On failure, it makes the plan state unhealthy.
     • OnFailure: runs when the plan state is unhealthy. Since we’re still designing it, its success could possibly move the plan state back to healthy (or perhaps leave it unhealthy).
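
The linear plan and the three control types described in notes 5 and 6 can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's implementation: the names `Control`, `Job`, and `run_plan` are hypothetical, a job is assumed to report success by returning `True`, and treating a raised exception as a failure is an assumption. OnFailure is left out because the notes describe it as still being designed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Control(Enum):
    """Hypothetical names for the control types listed in the notes."""
    BEST_EFFORT = "best_effort"  # runs only while healthy; failure is ignored
    GUARANTEED = "guaranteed"    # always runs; failure makes the plan unhealthy
    ABORT = "abort"              # runs only while healthy; failure makes the plan unhealthy


@dataclass
class Job:
    name: str
    action: Callable[[], bool]   # assumed convention: return True on success
    control: Control


def run_plan(jobs: List[Job]) -> Tuple[bool, List[str]]:
    """Run jobs strictly in order (no branching) and track plan health."""
    healthy = True
    log: List[str] = []
    for job in jobs:
        # Only Guaranteed jobs run once the plan has become unhealthy.
        if not healthy and job.control is not Control.GUARANTEED:
            log.append(f"{job.name}: skipped")
            continue
        try:
            ok = bool(job.action())
        except Exception:
            ok = False           # assumption: an exception counts as a failure
        log.append(f"{job.name}: {'ok' if ok else 'failed'}")
        # A Best Effort failure has no bearing on further execution.
        if not ok and job.control is not Control.BEST_EFFORT:
            healthy = False
    return healthy, log


# The example plan from the slides: gather data, restart, open a ticket.
plan = [
    Job("gather_data", lambda: False, Control.ABORT),       # simulate a failed log grab
    Job("restart", lambda: True, Control.BEST_EFFORT),
    Job("ticket", lambda: True, Control.GUARANTEED),
]
healthy, log = run_plan(plan)
print(healthy, log)
```

With `gather_data` marked Abort and failing, the restart is skipped and the Guaranteed ticket job still runs. Marking `gather_data` as Best Effort instead would let the restart proceed. Which is right depends on the environment, as note 6 says.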