Presenters: Matthew Skelton and Rob Thatcher, Skelton Thatcher Consulting
Webinar: Operability is all about making software work well in Production. In this webinar, we explore practical, tried-and-tested, real-world techniques for improving operability with many kinds of software systems, including cloud, serverless, on-premise, and IoT: logging with Event IDs, Run Book dialogue sheets, endpoint healthchecks, correlation IDs, and lightweight User Personas.
Target audience: Software Developer, Tester, Software Architect, DevOps Engineer, Delivery Manager, Head of Delivery, Head of IT.
Benefits: Attendees will gain insights into operability and why this is important for modern software systems, along with practical experience of techniques to enhance operability in almost any software system they encounter.
Practical operability techniques for teams - webinar - Skelton Thatcher & Unicom
1. Practical Operability
Techniques for Teams
Matthew Skelton & Rob Thatcher
Skelton Thatcher Consulting
skeltonthatcher.com / @SkeltonThatcher
Unicom Seminars – webinar – 28 September 2017
6. Operability:
use modern logging, Run Book
dialogue sheets, endpoint
healthchecks, correlation IDs,
and user personas as
team collaboration techniques
32. Example: video processing
Discover processing bottlenecks
Trigger alerts via LogEntries /
HostedGraphite
Report on KPIs
Target areas for improvement
35. Run Book dialogue sheets
Checklists: typical operational
considerations
Team-friendly exploration
36. System characteristics
Hours of operation
During what hours does the service or system actually need to operate? Can portions or features of the
system be unavailable at times if needed?
Hours of operation - core features
(e.g. 03:00-01:00 GMT+0)
Hours of operation - secondary features
(e.g. 07:00-23:00 GMT+0)
Data and processing flows
How and where does data flow through the system? What controls or triggers data flows?
(e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data )
…
41. endpoint healthchecks
Every runnable app/service/daemon
exposes /status/health
An HTTP GET to the endpoint returns:
200 – "I am healthy"
500 – "I am sick"
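The 200/500 contract above can be sketched with Python's standard library. This is a minimal illustration, not the presenters' implementation: the `is_healthy` callable, the `serve` helper, and port 8080 are assumptions made for the example; only the `/status/health` path and the two responses come from the slide.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def health_response(healthy: bool) -> Tuple[int, bytes]:
    """Map a health verdict to the status code and body from the slide."""
    return (200, b"I am healthy") if healthy else (500, b"I am sick")


def make_health_handler(is_healthy: Callable[[], bool]) -> type:
    """Build a request handler that answers GET /status/health.

    `is_healthy` is a hypothetical callable supplied by the application,
    for example a check that its worker threads are still alive.
    """

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/status/health":
                self.send_error(404)
                return
            status, body = health_response(is_healthy())
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HealthHandler


def serve(is_healthy: Callable[[], bool], port: int = 8080) -> None:
    """Serve the healthcheck until interrupted (port 8080 is an assumption)."""
    HTTPServer(("", port), make_health_handler(is_healthy)).serve_forever()
```

Calling `serve(lambda: True)` and then issuing `GET /status/health` should return 200 with the body "I am healthy". Keeping the verdict in one small function makes the contract easy to unit test without opening a socket.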
42. endpoint healthchecks
For databases and other non-HTTP
components, run a lightweight HTTP
service in front of the component
200 / 500 responses
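One way to put a lightweight HTTP service in front of a non-HTTP component is sketched below. It is an illustration under stated assumptions: the probe here is a plain TCP connect, and `tcp_probe`, `probe_status`, and `make_proxy_handler` are hypothetical names. A real check for a database would more likely run a cheap query through that database's own client library.

```python
import socket
from http.server import BaseHTTPRequestHandler
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> Callable[[], None]:
    """Return a probe that raises if a TCP connection cannot be opened."""

    def probe() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connecting is enough; close immediately

    return probe


def probe_status(probe: Callable[[], None]) -> int:
    """Translate a probe result into the 200 / 500 healthcheck contract."""
    try:
        probe()
    except Exception:
        return 500
    return 200


def make_proxy_handler(probe: Callable[[], None]) -> type:
    """Build a handler exposing /status/health on behalf of the component."""

    class ProxyHealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/status/health":
                self.send_error(404)
                return
            self.send_response(probe_status(probe))
            self.send_header("Content-Length", "0")
            self.end_headers()

    return ProxyHealthHandler
```

For example, `make_proxy_handler(tcp_probe("localhost", 5432))` could be passed to `http.server.HTTPServer` to front a local database; the host and port are placeholders, not values from the talk.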
48. Synchronous HTTP:
X-HEADER e.g. X-trace-id
X-trace-id: 348e1cf8
If header is present, pass it on
(Yes, RFC6648, but this is internal only)
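The "if header is present, pass it on" rule can be sketched as a small helper that copies the header from the inbound request onto any outbound call. The function names and the `base` parameter are assumptions for illustration; only the `X-trace-id` header name and the example value come from the slide.

```python
from typing import Dict, Mapping, Optional

TRACE_HEADER = "X-trace-id"


def find_trace_id(headers: Mapping[str, str]) -> Optional[str]:
    """Look up the trace header, ignoring case as HTTP header names do."""
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER.lower():
            return value
    return None


def outbound_headers(
    inbound: Mapping[str, str], base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Build headers for a downstream call, passing the trace ID on if present."""
    out = dict(base or {})
    trace_id = find_trace_id(inbound)
    if trace_id is not None:
        out[TRACE_HEADER] = trace_id
    return out
```

A service handling a request that arrived with `X-trace-id: 348e1cf8` would call `outbound_headers(request_headers)` and attach the result to each downstream HTTP request. This sketch does not create an ID when none is present, because the slide only describes passing an existing one along.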
49. Asynchronous (queues, etc.):
Message Attributes, name:value pair
e.g. "trace-id":"348e1cf8"
AWS SQS: SendMessage() / ReceiveMessage()
Log the Correlation ID if present
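For queues, the same idea can be sketched with the message attribute shape that AWS SQS uses (`DataType` plus `StringValue`). The code below only builds and reads plain dictionaries so that it runs without an AWS account; the helper names, the logger name, and the log wording are assumptions, and passing the parameters to a real SQS client is left as a comment.

```python
import logging
from typing import Any, Dict, Optional

log = logging.getLogger("worker")  # hypothetical logger name
TRACE_ATTR = "trace-id"


def with_trace_id(body: str, trace_id: Optional[str]) -> Dict[str, Any]:
    """Build SendMessage parameters, attaching the trace ID if there is one."""
    params: Dict[str, Any] = {"MessageBody": body}
    if trace_id:
        params["MessageAttributes"] = {
            TRACE_ATTR: {"DataType": "String", "StringValue": trace_id}
        }
    # With boto3 this could be sent as: sqs.send_message(QueueUrl=..., **params)
    return params


def trace_id_of(message: Dict[str, Any]) -> Optional[str]:
    """Extract the trace ID from a received message, if present."""
    attr = message.get("MessageAttributes", {}).get(TRACE_ATTR)
    return attr.get("StringValue") if attr else None


def handle(message: Dict[str, Any]) -> Optional[str]:
    """Log the correlation ID if present, then return it for onward use."""
    trace_id = trace_id_of(message)
    if trace_id is not None:
        log.info("processing message trace-id=%s", trace_id)
    else:
        log.info("processing message")
    return trace_id
```

Note that SQS only returns message attributes when the receiver asks for them, for example by naming `trace-id` in the `MessageAttributeNames` parameter of `ReceiveMessage`.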
50. Example: electronic trading
High speed, low latency
Trading options & derivatives
Connected to stock exchanges
Sub-millisecond timings
> £1 million per day traded
54. Correlation IDs for trading
Evidence for timely operation
Help identify bottlenecks
Target areas for perf tuning
Identify race conditions
Increase operability
68. Operability
use modern logging, Run Book
dialogue sheets, endpoint
healthchecks, correlation IDs,
and user personas as
team collaboration techniques
69. Team Guide to
Software Operability
Matthew Skelton & Rob Thatcher
skeltonthatcher.com/publications
Download a free sample chapter
71. Questions?
via the webinar chat tool
via Twitter: @SkeltonThatcher
via email: questions@skeltonthatcher.com
Unicom Seminars: info@unicom.co.uk
72. Resources
• Training: Practical Operability for Developers and Testers – led
by Matthew Skelton and Rob Thatcher – 1-day workshop –
http://www.unicom.co.uk/practical-operability-for-developers-and-testers.html
• Team Guide to Software Operability by Matthew Skelton and Rob
Thatcher (Skelton Thatcher Publications, 2016)
http://operabilitybook.com/
• Run Book template & Run Book dialogue sheets
http://runbooktemplate.info/