SlideShare a Scribd company logo
1 of 42
Download to read offline
Debugging (Docker) containers
in production
CTO
bryan@joyent.com
Bryan Cantrill
@bcantrill
Containers at Joyent
• Joyent runs OS containers in the cloud via SmartOS — and we
have run containers in multi-tenant production since ~2006
• Adding support for hardware-based virtualization circa 2011
strengthened our resolve with respect to OS-based virtualization
• OS containers are lightweight and efficient — which is especially
important as services become smaller and more numerous:
overhead and latency become increasingly important!
• We emphasized their operational characteristics — performance,
elasticity, tenancy — and for many years, we were a lone voice...
Containers as PaaS foundation?
• Some saw the power of OS containers to facilitate up-stack
platform-as-a-service abstractions
• For example, dotCloud — a platform-as-a-service provider — built
their PaaS on OS containers
• Struggling as a PaaS, dotCloud pivoted — and open sourced
their container-based orchestration layer...
...and Docker was born
Docker and the container revolution
• Docker has brought the concept to a much broader audience
• Docker has used the rapid provisioning + shared underlying
filesystem of containers to allow developers to think operationally
• Developers can encode deployment procedures via an image
• Images can be reliably and reproducibly deployed as a container
• Images can be quickly deployed — and re-deployed
• Docker is doing to apt what apt did to tar
Docker at Joyent
• We wanted to create a best-of-all-worlds platform: the developer
ease of Docker on the production-grade substrate of SmartOS
• We developed a Linux system call interface for SmartOS,
allowing SmartOS to run Linux binaries at bare-metal speed
• In March 2015, we introduced Triton, our (open source!) stack
that deploys Docker containers directly on the metal
• Triton virtualizes the notion of a Docker host (i.e., “docker ps”
shows all of one’s containers datacenter-wide)
• Brings full debugging (DTrace, MDB) to Docker containers
Containers and the rise of microservices
• The container revolution has both accelerated and been
accelerated by the rise of microservices: small, well-defined
services that do one thing and do it well
• While the term provokes nerd rage in some, it is merely a new
embodiment of an old idea: the Unix Philosophy
• Write programs that do one thing and do it well.
• Write programs to work together.
• Write programs to handle text streams, because that is a
universal interface.
Containers in production
• Containers + microservices are great when they work — but what
happens when these systems fail?
• For continuous integration/continuous deployment use cases
(and/or other stateless services), failure is less of a concern…
• But as containers are increasingly used for workloads that matter,
one can no longer insist that failures don’t happen — or that
restarts will cure any that do
• The ability to understand failure is essential to leap the chasm
from development into meaningful production!
When containers fail?
When containers fail?
When containers fail?
Container failure modes
• When deploying containers + microservices, there is an unstated
truth: you are developing a distributed system
• While more resilient to certain classes of force majeure failure,
distributed systems remain vulnerable to software defects
• We must be able to debug such systems; hope is not a strategy!
• Distributed systems are hard to debug — and are more likely to
exhibit behavior non-reproducible in development
• With container-based systems, we must debug not in terms of
sick pets but rather sick cattle
When containers fail: A more apt metaphor?
When containers fail: The most apt metaphor?
When containers fail: The most apt metaphor!
Debugging anti-patterns
• For too many, debugging is the process of making problems go
away rather than understanding the system!
• The view of bugs-as-nuisance has many knock-on effects:
• Fixes that don’t fix the problem (or introduce new ones!)
• Bug reports closed out as “will not fix” or “works for me”
• Users who are told to “restart” or “reboot” or “log out” or
anything else that amounts to wishful thinking
• And this is only when the process has obviously failed...
Darker debugging anti-patterns
• More insidious effects are felt when the problem appears to have
been resolved, but hasn’t actually been fully understood
• These are the fixes that amount to a fresh coat of paint over a
crack in the foundation — and they are worse than nothing
• Not only do these fixes not actually resolve the problem, they give
the engineer a false sense of confidence that spreads virally
• “Debugging” devolves into an oral tradition: folk tales of problems
that were made to go away
• Container-based systems are especially vulnerable to this!
Thinking methodically
• The way we think about debugging is fundamentally wrong; we
need to think methodically about debugging!
• When we think of debugging as the quest for understanding our
(misbehaving) systems, it allows us to consider it more abstractly
• Namely, how do we explain the phenomena that affect our world?
• We have found that the most powerful explanations reflect an
understanding of underlying structure — beyond what to why
• This deeper understanding allows us to not only to explain but
make predictions
Predictive power
• Valuing predictive power allows us to test our explanations: if our
predictions are wrong, our understanding is incomplete
• We can use the understanding from failed predictions to develop
new explanations and new predictions
• We can then test these new predictions to test our understanding
• If all of this is sounding familiar, it’s because it’s science — and
the methodical exploration of it is the scientific method
The scientific method
• The scientific method is to:
• Make observations
• Formulate a question
• Formulate a hypothesis that answers the question
• Formulate predictions that test the hypothesis
• Test the predictions by conducting an experiment
• Refine the hypothesis and repeat as needed
Science
Science, seriously?!
Science, seriously.
• Software debugging is a pure distillation of scientific thinking
• The limitless amount of data from software systems allows
experiments in seconds instead of weeks/months/years
• The systems we’re reasoning about are entirely synthetic,
discrete and mutable — we made it, we can understand it
• Software is mathematical machine; the conclusions of software
debugging are often mathematical in their unequivocal power!
• Software debugging is so pure, it requires us to refine the
scientific method slightly to reflect its capabilities...
The software debugging method
• Make observations
• Based on observations, formulate a question
• If the question can be answered through subsequent observation,
answer the question through observation and refine/iterate
• If the question cannot be answered through observation, make a
hypothesis as to the answer and formulate predictions
• If predictions can be tested through subsequent observation, test
the predictions through observation and refine/iterate
• Otherwise, test predictions through experiment and refine/iterate
Observation is the heart of debugging!
• The essence — and art! — of debugging software is making
observations and asking questions, not formulating hypotheses!
• Observations are facts — they constrain hypotheses in that any
hypothesis contradicted by facts can be summarily rejected
• As facts beget questions which beget observations and more
facts, hypotheses become more tightly constrained — like a
cordon being cinched around the truth
• Or, in the words of Sir Arthur Conan Doyle’s Sherlock Holmes,
“when you have eliminated all which is impossible, then whatever
remains, however improbable, must be the truth”
Making the hypothetical leap
• Once observation has sufficiently narrowed the gap between what
is known and what is wrong, a hypothetical leap should be made
• Debugging is inefficient when this leap is made too early — like
making a specific guess too early in Twenty Questions
• A hypothesis is only as good as its ability to form a prediction
• A prediction should be tested with either subsequent observation
or by conducting an experiment
• If the prediction proves to be incorrect, understanding is
incomplete; the hypothesis must be rejected — or refined
Experiments in software
• A beauty of software is that it is highly amenable to experiment
• Many experiments are programs — and the most satisfying
experiments test predictions about how failure can be induced
• Many “non-reproducible” problems are merely unusual!
• Debugging a putatively non-reproducible problem to the point of a
reproducible test case is a joy unique in software engineering
Software debugging in practice
• The specifics of observation depends on the nature of the failure
• Software has different kinds of failure modes:
• Fatal failure (segmentation violation, uncaught exception)
• Non-fatal failure (gives the wrong answer, performs terribly)
• Explicit failure (assertion failure, error message)
• Implicit failure (cheerfully does the wrong thing)
Taxonomizing software failure
Implicit
Explicit
Non-fatal Fatal
Gives the wrong answer
Returns the wrong result
Leaks resources
Stops doing work
Performs pathologically
Emits an error message
Returns an error code
Assertion failure
Process explicitly aborts
Exits with an error code
Segmentation violation
Bus Error
Panic
Type Error
Uncaught Exception
Cascading container failure modes
Implicit
Explicit
Non-fatal Fatal
Gives the wrong answer
Returns the wrong result
Leaks resources
Stops doing work
Performs pathologically
Emits an error message
Returns an error code
Assertion failure
Process explicitly aborts
Exits with an error code
Segmentation violation
Bus Error
Panic
Type Error
Uncaught Exception
Debugging fatal failure
• When software fails fatally, we know that the software itself is
broken — its state has become inconsistent
• By saving in-memory state to stable storage, the software can be
debugged postmortem
• To debug, one starts with the invalid state and reasons backwards
to discover a transition from a valid state to an invalid one
• This technique is so old, that the terms for this saved state dates
back to the dawn of the computing age: a core dump
• Not as low-level as the name implies! Modern high-level
languages (e.g., node.js and Go) allow postmortem debugging!
Debugging fatal failure: containers
• Postmortem analysis lends itself very well to containers:
• There is no run-time overhead; overhead (such as it is) is only
at the time of death
• The container can be safely (automatically!) restarted; the core
dump can be analyzed asynchronously
• Tooling need not be in container, can be made arbitrarily rich
• In Triton, all core dumps are automatically stored and then
uploaded into a system that allows for analysis, tagging, etc.
• This has been invaluable for debugging our own services!
Core dump management in Docker
• In Triton, all core dumps are automatically stored and then
uploaded into a system that allows for analysis, tagging, etc.
• This has been invaluable for debugging our own services!
• Outside of Triton, the lack of container awareness around
core_pattern in the Linux kernel is problematic for Docker: core
dumps from Docker are still a bit rocky (viz. docker#11740)
• Docker-based core dump management would be a welcome
addition, but the current proposal (docker#19289) seems to have
been lost in the noise
Debugging non-fatal failure
• There is a solace in fatal failure: it always represents a software
defect at some level — and the inconsistent state is static
• Non-fatal failure can be more challenging: the state is valid and
dynamic — it’s difficult to separate symptom from cause
• Non-fatal failure must still be understood empirically!
• Debugging in vivo requires that data be extracted from the system
— either of its own volition (e.g., via logs) or by coercion (e.g., via
instrumentation)
Debugging explicit, non-fatal failure
• When failure is explicit (e.g., an error or warning message), it
provides a very important data point
• If failure is non-reproducible or otherwise transient, analysis of
explicit software activity becomes essential
• Action in one container will often need to be associated with
failures in another
• Especially for distributed systems, this becomes log analysis, and
is an essential forensic tool for understanding explicit failure
• Essential observation: a time line of events!
Log management in Docker
• “docker logs” was designed for casual development use, not
production use cases
• Inadequate log solutions gave rise to anti-patterns that relied on
Docker host manipulation for production logging
• Docker 1.6 introduced the notion of log drivers, allowing for
Docker containers to log to Graylog, syslogd, fluentd, etc.
• Failure modes aren’t necessarily well considered: what happens
(or should happen) when logging becomes impaired?
• In Triton, failure to log constitutes container failure!
Aside: Docker host anti-patterns
• In the traditional Docker model, Docker hosts are virtual machines
to which containers are directly provisioned
• It may become tempting to manipulate Docker hosts directly, but
doing this entirely compromises the Docker security model
• Worse, compromising the security model creates a VM
dependency that makes bare-metal containers impossible
• And ironically, Docker hosts can resemble pets: the reasons for
backdooring through the Docker host can come to resemble the
arguments made by those who resist containerization entirely!
Debugging implicit, non-fatal failure
• Problems that are both implicit and non-fatal represent the most
time-consuming, most difficult problems to debug because the
system must be understood against its will
• Wherever possible make software explicit about failure!
• Where errors are programmatic (and not operational), they
should always induce fatal failure!
• Container-based systems break at the boundaries: each
thinks they are operating correctly, but together they’re broken
• Data must be coerced from the system via instrumentation
Instrumenting production systems
• Traditionally, software instrumentation was hard-coded and static
(necessitating software restart or — worse — recompile)
• Dynamic system instrumentation was historically limited to system
call table (strace/truss) or packet capture (tcpdump/snoop)
• Effective for some problems, but a poor fit for ad hoc analysis
• In 2003, Sun developed DTrace, a facility for arbitrary, dynamic
instrumentation of production systems that has since been ported
to Mac OS X, FreeBSD, NetBSD and (to a degree) Linux
• DTrace has inspired dynamic instrumentation in other systems
Instrumenting Docker containers
• In Docker, instrumentation is a challenge as containers may not
include the tooling necessary to understand the system
• Docker host-based techniques for instrumentation may be
tempting, but (again!) they should be considered an anti-pattern!
• DTrace has a privilege model that allows it to be safely (and
usefully) used from within a container
• In Triton, DTrace is available from within every container — one
can “docker exec -it bash” and then debug interactively
Instrumenting node.js-based containers
• We have invested heavily in node.js-based infrastructure to allow
us to meaningfully instrument containers in production:
• We developed Bunyan, a logging facility for node.js that
includes DTrace support
• Added DTrace support for node.js profiling
• An essential vector for iterative observation: turning up the
logging level on a running container-based microservice!
Debugging Docker containers in production
• Debugging methodically requires us to shift our thinking — and
learn how to carefully observe the systems we build
• Different types of failures necessitate different techniques:
• Fatal failure is best debugged via postmortem analysis —
which is particular appropriate in an all-container world
• Non-fatal failure necessitates log analysis and dynamic
instrumentation
• The ability to debug problems in production is essential to
successfully deploy and scale container-based systems!

More Related Content

What's hot

The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionbcantrill
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of FailSteve Poole
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsUwe Friedrichsen
 
Internet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit IIInternet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit IIArti Parab Academics
 
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018Codemotion
 
SRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and CynefinSRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and CynefinJon Stevens-Hall
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explainedUwe Friedrichsen
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of softwareUwe Friedrichsen
 
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent CerveauTheFamily
 
CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45Bilal Ahmed
 
Lessons Learned from Xen - SELF2013
Lessons Learned from Xen - SELF2013Lessons Learned from Xen - SELF2013
Lessons Learned from Xen - SELF2013The Linux Foundation
 
Velocity19 Berlin: Swarming, Cynefin… and avoiding the problems of becoming a...
Velocity19 Berlin: Swarming, Cynefin…and avoiding the problems of becoming a...Velocity19 Berlin: Swarming, Cynefin…and avoiding the problems of becoming a...
Velocity19 Berlin: Swarming, Cynefin… and avoiding the problems of becoming a...Jon Stevens-Hall
 
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...Jon Stevens-Hall
 
OSCON 2013: "Case Study: What to do when your project outgrows your company"
OSCON 2013: "Case Study: What to do when your project outgrows your company"OSCON 2013: "Case Study: What to do when your project outgrows your company"
OSCON 2013: "Case Study: What to do when your project outgrows your company"The Linux Foundation
 
DevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming PresentationDevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming PresentationJon Stevens-Hall
 
Video as a prototyping tool for connected products
Video as a prototyping tool for connected productsVideo as a prototyping tool for connected products
Video as a prototyping tool for connected productsMartin Charlier
 

What's hot (20)

The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destruction
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
 
Internet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit IIInternet of Things, TYBSC IT, Semester 5, Unit II
Internet of Things, TYBSC IT, Semester 5, Unit II
 
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018
Thierry de Pauw - Feature Branching considered Evil - Codemotion Milan 2018
 
SRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and CynefinSRVision 2019, Utrecht: Swarming and Cynefin
SRVision 2019, Utrecht: Swarming and Cynefin
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explained
 
From 1 to 100
From 1 to 100From 1 to 100
From 1 to 100
 
DevOps
DevOpsDevOps
DevOps
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of software
 
No man's land v2
No man's land v2No man's land v2
No man's land v2
 
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
"Startups, comment gérer une équipe de développeurs" par Laurent Cerveau
 
CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45CS101- Introduction to Computing- Lecture 45
CS101- Introduction to Computing- Lecture 45
 
Lessons Learned from Xen - SELF2013
Lessons Learned from Xen - SELF2013Lessons Learned from Xen - SELF2013
Lessons Learned from Xen - SELF2013
 
Velocity19 Berlin: Swarming, Cynefin… and avoiding the problems of becoming a...
Velocity19 Berlin: Swarming, Cynefin…and avoiding the problems of becoming a...Velocity19 Berlin: Swarming, Cynefin…and avoiding the problems of becoming a...
Velocity19 Berlin: Swarming, Cynefin… and avoiding the problems of becoming a...
 
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
Swarming: How a new approach to support can save DevOps teams from 3rd-line t...
 
OSCON 2013: "Case Study: What to do when your project outgrows your company"
OSCON 2013: "Case Study: What to do when your project outgrows your company"OSCON 2013: "Case Study: What to do when your project outgrows your company"
OSCON 2013: "Case Study: What to do when your project outgrows your company"
 
DevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming PresentationDevOpsDays Riga - Swarming Presentation
DevOpsDays Riga - Swarming Presentation
 
Video as a prototyping tool for connected products
Video as a prototyping tool for connected productsVideo as a prototyping tool for connected products
Video as a prototyping tool for connected products
 

Viewers also liked

Papers We Love: Jails and Zones
Papers We Love: Jails and ZonesPapers We Love: Jails and Zones
Papers We Love: Jails and Zonesbcantrill
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadebcantrill
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocatorbcantrill
 
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containersbcantrill
 
Why it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalWhy it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalbcantrill
 
Run containers on bare metal already!
Run containers on bare metal already!Run containers on bare metal already!
Run containers on bare metal already!bcantrill
 
The DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps PlaybookThe DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps Playbookbcantrill
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontierbcantrill
 
A crime against common sense
A crime against common senseA crime against common sense
A crime against common sensebcantrill
 
Leaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guideLeaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guidebcantrill
 
Docker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIDocker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIbcantrill
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"Dr. Mirko Kämpf
 
Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)bcantrill
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Cloudera, Inc.
 
Taming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemTaming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemCloudera, Inc.
 
Software Architectures, Week 2 - Decomposition techniques
Software Architectures, Week 2 - Decomposition techniquesSoftware Architectures, Week 2 - Decomposition techniques
Software Architectures, Week 2 - Decomposition techniquesAngelos Kapsimanis
 

Viewers also liked (18)

Papers We Love: Jails and Zones
Papers We Love: Jails and ZonesPapers We Love: Jails and Zones
Papers We Love: Jails and Zones
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decade
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocator
 
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to ContainersThe Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
 
Why it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metalWhy it’s (past) time to run containers on bare metal
Why it’s (past) time to run containers on bare metal
 
Run containers on bare metal already!
Run containers on bare metal already!Run containers on bare metal already!
Run containers on bare metal already!
 
The DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps PlaybookThe DIY Punk Rock DevOps Playbook
The DIY Punk Rock DevOps Playbook
 
node.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontiernode.js and Containers: Dispatches from the Frontier
node.js and Containers: Dispatches from the Frontier
 
A crime against common sense
A crime against common senseA crime against common sense
A crime against common sense
 
Leaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guideLeaping the chasm from proprietary to open: A survivor's guide
Leaping the chasm from proprietary to open: A survivor's guide
 
Docker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote APIDocker's Killer Feature: The Remote API
Docker's Killer Feature: The Remote API
 
DDTrace
DDTraceDDTrace
DDTrace
 
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
 
Hadoop Puzzlers
Hadoop PuzzlersHadoop Puzzlers
Hadoop Puzzlers
 
Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)Joyent circa 2006 (Scale with Rails)
Joyent circa 2006 (Scale with Rails)
 
Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015Application Architectures with Hadoop | Data Day Texas 2015
Application Architectures with Hadoop | Data Day Texas 2015
 
Taming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop EcosystemTaming Operations in the Hadoop Ecosystem
Taming Operations in the Hadoop Ecosystem
 
Software Architectures, Week 2 - Decomposition techniques
Software Architectures, Week 2 - Decomposition techniquesSoftware Architectures, Week 2 - Decomposition techniques
Software Architectures, Week 2 - Decomposition techniques
 

Similar to Debugging (Docker) containers in production

Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overviewalessio_ferrari
 
Supercharging your bug reports
Supercharging your bug reportsSupercharging your bug reports
Supercharging your bug reportsNeil Studd
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routinePeter Varhol
 
iOS Test-Driven Development
iOS Test-Driven DevelopmentiOS Test-Driven Development
iOS Test-Driven DevelopmentPablo Villar
 
Continuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul StackContinuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul StackJAXLondon_Conference
 
Simplify Your Life with CQRS
Simplify Your Life with CQRSSimplify Your Life with CQRS
Simplify Your Life with CQRSJoel Mason
 
Dare to Explore: Discover ET!
Dare to Explore: Discover ET!Dare to Explore: Discover ET!
Dare to Explore: Discover ET!Raj Indugula
 
Software Defects and SW Reliability Assessment
Software Defects and SW Reliability AssessmentSoftware Defects and SW Reliability Assessment
Software Defects and SW Reliability AssessmentKristine Hejna
 
Defining Test Competence
Defining Test CompetenceDefining Test Competence
Defining Test CompetenceJohan Hoberg
 
It's XP Stupid (2019)
It's XP Stupid (2019)It's XP Stupid (2019)
It's XP Stupid (2019)Mike Harris
 
Agile software development
Agile software developmentAgile software development
Agile software developmentHemangi Talele
 
How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014gdusbabek
 
Feedback loops between tooling and culture
Feedback loops between tooling and cultureFeedback loops between tooling and culture
Feedback loops between tooling and cultureChris Winters
 
Arch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesArch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesIgor Moochnick
 
The biggest DevOps problems you didn't know you had and what to do about them
The biggest DevOps problems you didn't know you had and what to do about themThe biggest DevOps problems you didn't know you had and what to do about them
The biggest DevOps problems you didn't know you had and what to do about themWayne Greene
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven developmentEinar Ingebrigtsen
 
DevOps Transformation - Another View
DevOps Transformation - Another ViewDevOps Transformation - Another View
DevOps Transformation - Another ViewAgron Fazliu
 

Similar to Debugging (Docker) containers in production (20)

Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overview
 
Supercharging your bug reports
Supercharging your bug reportsSupercharging your bug reports
Supercharging your bug reports
 
Making disaster routine
Making disaster routineMaking disaster routine
Making disaster routine
 
iOS Test-Driven Development
iOS Test-Driven DevelopmentiOS Test-Driven Development
iOS Test-Driven Development
 
Continuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul StackContinuous Delivery - the missing parts - Paul Stack
Continuous Delivery - the missing parts - Paul Stack
 
Unit4 for st.pdf
Unit4 for st.pdfUnit4 for st.pdf
Unit4 for st.pdf
 
Chaos engineering
Chaos engineering Chaos engineering
Chaos engineering
 
Simplify Your Life with CQRS
Simplify Your Life with CQRSSimplify Your Life with CQRS
Simplify Your Life with CQRS
 
Dare to Explore: Discover ET!
Dare to Explore: Discover ET!Dare to Explore: Discover ET!
Dare to Explore: Discover ET!
 
Software Defects and SW Reliability Assessment
Software Defects and SW Reliability AssessmentSoftware Defects and SW Reliability Assessment
Software Defects and SW Reliability Assessment
 
Defining Test Competence
Defining Test CompetenceDefining Test Competence
Defining Test Competence
 
It's XP Stupid (2019)
It's XP Stupid (2019)It's XP Stupid (2019)
It's XP Stupid (2019)
 
Agile software development
Agile software developmentAgile software development
Agile software development
 
How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014
 
Feedback loops between tooling and culture
Feedback loops between tooling and cultureFeedback loops between tooling and culture
Feedback loops between tooling and culture
 
Arch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesArch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best Practices
 
Application of analytics
Application of analyticsApplication of analytics
Application of analytics
 
The biggest DevOps problems you didn't know you had and what to do about them
The biggest DevOps problems you didn't know you had and what to do about themThe biggest DevOps problems you didn't know you had and what to do about them
The biggest DevOps problems you didn't know you had and what to do about them
 
Driving application development through behavior driven development
Driving application development through behavior driven developmentDriving application development through behavior driven development
Driving application development through behavior driven development
 
DevOps Transformation - Another View
DevOps Transformation - Another ViewDevOps Transformation - Another View
DevOps Transformation - Another View
 

More from bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Presentbcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmakingbcantrill
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...bcantrill
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsbcantrill
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systemsbcantrill
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolutionbcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Agebcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesbcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Lawbcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineeringbcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarebcantrill
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the unionbcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after darkbcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadershipbcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondbcantrill
 

More from bcantrill (16)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...Coming of Age: Developing young technologists without robbing them of their y...
Coming of Age: Developing young technologists without robbing them of their y...
 
I have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systemsI have come to bury the BIOS, not to open it: The need for holistic systems
I have come to bury the BIOS, not to open it: The need for holistic systems
 
Towards Holistic Systems
Towards Holistic SystemsTowards Holistic Systems
Towards Holistic Systems
 
The Coming Firmware Revolution
The Coming Firmware RevolutionThe Coming Firmware Revolution
The Coming Firmware Revolution
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?Is it time to rewrite the operating system in Rust?
Is it time to rewrite the operating system in Rust?
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 

Recently uploaded

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 

Recently uploaded (20)

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 

Debugging (Docker) containers in production

  • 1. Debugging (Docker) containers in production CTO bryan@joyent.com Bryan Cantrill @bcantrill
  • 2. Containers at Joyent • Joyent runs OS containers in the cloud via SmartOS — and we have run containers in multi-tenant production since ~2006 • Adding support for hardware-based virtualization circa 2011 strengthened our resolve with respect to OS-based virtualization • OS containers are lightweight and efficient — which is especially important as services become smaller and more numerous: overhead and latency become increasingly important! • We emphasized their operational characteristics — performance, elasticity, tenancy — and for many years, we were a lone voice...
  • 3. Containers as PaaS foundation? • Some saw the power of OS containers to facilitate up-stack platform-as-a-service abstractions • For example, dotCloud — a platform-as-a-service provider — built their PaaS on OS containers • Struggling as a PaaS, dotCloud pivoted — and open sourced their container-based orchestration layer...
  • 5. Docker and the container revolution • Docker has brought the concept to a much broader audience • Docker has used the rapid provisioning + shared underlying filesystem of containers to allow developers to think operationally • Developers can encode deployment procedures via an image • Images can be reliably and reproducibly deployed as a container • Images can be quickly deployed — and re-deployed • Docker is doing to apt what apt did to tar
  • 6. Docker at Joyent • We wanted to create a best-of-all-worlds platform: the developer ease of Docker on the production-grade substrate of SmartOS • We developed a Linux system call interface for SmartOS, allowing SmartOS to run Linux binaries at bare-metal speed • In March 2015, we introduced Triton, our (open source!) stack that deploys Docker containers directly on the metal • Triton virtualizes the notion of a Docker host (i.e., “docker ps” shows all of one’s containers datacenter-wide) • Brings full debugging (DTrace, MDB) to Docker containers
  • 7. Containers and the rise of microservices • The container revolution has both accelerated and been accelerated by the rise of microservices: small, well-defined services that do one thing and do it well • While the term provokes nerd rage in some, it is merely a new embodiment of an old idea: the Unix Philosophy • Write programs that do one thing and do it well. • Write programs to work together. • Write programs to handle text streams, because that is a universal interface.
  • 8. Containers in production • Containers + microservices are great when they work — but what happens when these systems fail? • For continuous integration/continuous deployment use cases (and/or other stateless services), failure is less of a concern… • But as containers are increasingly used for workloads that matter, one can no longer insist that failures don’t happen — or that restarts will cure any that do • The ability to understand failure is essential to leap the chasm from development into meaningful production!
  • 12. Container failure modes • When deploying containers + microservices, there is an unstated truth: you are developing a distributed system • While more resilient to certain classes of force majeure failure, distributed systems remain vulnerable to software defects • We must be able to debug such systems; hope is not a strategy! • Distributed systems are hard to debug — and are more likely to exhibit behavior non-reproducible in development • With container-based systems, we must debug not in terms of sick pets but rather sick cattle
  • 13. When containers fail: A more apt metaphor?
  • 14. When containers fail: The most apt metaphor?
  • 15. When containers fail: The most apt metaphor!
  • 16. Debugging anti-patterns • For too many, debugging is the process of making problems go away rather than understanding the system! • The view of bugs-as-nuisance has many knock-on effects: • Fixes that don’t fix the problem (or introduce new ones!) • Bug reports closed out as “will not fix” or “works for me” • Users who are told to “restart” or “reboot” or “log out” or anything else that amounts to wishful thinking • And this is only when the process has obviously failed...
  • 17. Darker debugging anti-patterns • More insidious effects are felt when the problem appears to have been resolved, but hasn’t actually been fully understood • These are the fixes that amount to a fresh coat of paint over a crack in the foundation — and they are worse than nothing • Not only do these fixes not actually resolve the problem, they give the engineer a false sense of confidence that spreads virally • “Debugging” devolves into an oral tradition: folk tales of problems that were made to go away • Container-based systems are especially vulnerable to this!
  • 18. Thinking methodically • The way we think about debugging is fundamentally wrong; we need to think methodically about debugging! • When we think of debugging as the quest for understanding our (misbehaving) systems, it allows us to consider it more abstractly • Namely, how do we explain the phenomena that affect our world? • We have found that the most powerful explanations reflect an understanding of underlying structure — beyond what to why • This deeper understanding allows us to not only to explain but make predictions
  • 19. Predictive power • Valuing predictive power allows us to test our explanations: if our predictions are wrong, our understanding is incomplete • We can use the understanding from failed predictions to develop new explanations and new predictions • We can then test these new predictions to test our understanding • If all of this is sounding familiar, it’s because it’s science — and the methodical exploration of it is the scientific method
  • 20. The scientific method • The scientific method is to: • Make observations • Formulate a question • Formulate a hypothesis that answers the question • Formulate predictions that test the hypothesis • Test the predictions by conducting an experiment • Refine the hypothesis and repeat as needed
  • 23. Science, seriously. • Software debugging is a pure distillation of scientific thinking • The limitless amount of data from software systems allows experiments in seconds instead of weeks/months/years • The systems we’re reasoning about are entirely synthetic, discrete and mutable — we made it, we can understand it • Software is mathematical machine; the conclusions of software debugging are often mathematical in their unequivocal power! • Software debugging is so pure, it requires us to refine the scientific method slightly to reflect its capabilities...
  • 24. The software debugging method • Make observations • Based on observations, formulate a question • If the question can be answered through subsequent observation, answer the question through observation and refine/iterate • If the question cannot be answered through observation, make a hypothesis as to the answer and formulate predictions • If predictions can be tested through subsequent observation, test the predictions through observation and refine/iterate • Otherwise, test predictions through experiment and refine/iterate
  • 25. Observation is the heart of debugging! • The essence — and art! — of debugging software is making observations and asking questions, not formulating hypotheses! • Observations are facts — they constrain hypotheses in that any hypothesis contradicted by facts can be summarily rejected • As facts beget questions which beget observations and more facts, hypotheses become more tightly constrained — like a cordon being cinched around the truth • Or, in the words of Sir Arthur Conan Doyle’s Sherlock Holmes, “when you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth”
  • 26. Making the hypothetical leap • Once observation has sufficiently narrowed the gap between what is known and what is wrong, a hypothetical leap should be made • Debugging is inefficient when this leap is made too early — like making a specific guess too early in Twenty Questions • A hypothesis is only as good as its ability to form a prediction • A prediction should be tested with either subsequent observation or by conducting an experiment • If the prediction proves to be incorrect, understanding is incomplete; the hypothesis must be rejected — or refined
  • 27. Experiments in software • A beauty of software is that it is highly amenable to experiment • Many experiments are programs — and the most satisfying experiments test predictions about how failure can be induced • Many “non-reproducible” problems are merely unusual! • Debugging a putatively non-reproducible problem to the point of a reproducible test case is a joy unique in software engineering
  • 28. Software debugging in practice • The specifics of observation depends on the nature of the failure • Software has different kinds of failure modes: • Fatal failure (segmentation violation, uncaught exception) • Non-fatal failure (gives the wrong answer, performs terribly) • Explicit failure (assertion failure, error message) • Implicit failure (cheerfully does the wrong thing)
  • 29. Taxonomizing software failure Implicit Explicit Non-fatal Fatal Gives the wrong answer Returns the wrong result Leaks resources Stops doing work Performs pathologically Emits an error message Returns an error code Assertion failure Process explicitly aborts Exits with an error code Segmentation violation Bus Error Panic Type Error Uncaught Exception
  • 30. Cascading container failure modes Implicit Explicit Non-fatal Fatal Gives the wrong answer Returns the wrong result Leaks resources Stops doing work Performs pathologically Emits an error message Returns an error code Assertion failure Process explicitly aborts Exits with an error code Segmentation violation Bus Error Panic Type Error Uncaught Exception
  • 31. Debugging fatal failure • When software fails fatally, we know that the software itself is broken — its state has become inconsistent • By saving in-memory state to stable storage, the software can be debugged postmortem • To debug, one starts with the invalid state and reasons backwards to discover a transition from a valid state to an invalid one • This technique is so old, that the terms for this saved state dates back to the dawn of the computing age: a core dump • Not as low-level as the name implies! Modern high-level languages (e.g., node.js and Go) allow postmortem debugging!
  • 32. Debugging fatal failure: containers • Postmortem analysis lends itself very well to containers: • There is no run-time overhead; overhead (such as it is) is only at the time of death • The container can be safely (automatically!) restarted; the core dump can be analyzed asynchronously • Tooling need not be in container, can be made arbitrarily rich • In Triton, all core dumps are automatically stored and then uploaded into a system that allows for analysis, tagging, etc. • This has been invaluable for debugging our own services!
  • 33. Core dump management in Docker • In Triton, all core dumps are automatically stored and then uploaded into a system that allows for analysis, tagging, etc. • This has been invaluable for debugging our own services! • Outside of Triton, the lack of container awareness around core_pattern in the Linux kernel is problematic for Docker: core dumps from Docker are still a bit rocky (viz. docker#11740) • Docker-based core dump management would be a welcome addition, but the current proposal (docker#19289) seems to have been lost in the noise
  • 34. Debugging non-fatal failure • There is a solace in fatal failure: it always represents a software defect at some level — and the inconsistent state is static • Non-fatal failure can be more challenging: the state is valid and dynamic — it’s difficult to separate symptom from cause • Non-fatal failure must still be understood empirically! • Debugging in vivo requires that data be extracted from the system — either of its own volition (e.g., via logs) or by coercion (e.g., via instrumentation)
  • 35. Debugging explicit, non-fatal failure • When failure is explicit (e.g., an error or warning message), it provides a very important data point • If failure is non-reproducible or otherwise transient, analysis of explicit software activity becomes essential • Action in one container will often need to be associated with failures in another • Especially for distributed systems, this becomes log analysis, and is an essential forensic tool for understanding explicit failure • Essential observation: a time line of events!
  • 36. Log management in Docker • “docker logs” was designed for casual development use, not production use cases • Inadequate log solutions gave rise to anti-patterns that relied on Docker host manipulation for production logging • Docker 1.6 introduced the notion of log drivers, allowing for Docker containers to log to Graylog, syslogd, fluentd, etc. • Failure modes aren’t necessarily well considered: what happens (or should happen) when logging becomes impaired? • In Triton, failure to log constitutes container failure!
  • 37. Aside: Docker host anti-patterns • In the traditional Docker model, Docker hosts are virtual machines to which containers are directly provisioned • It may become tempting to manipulate Docker hosts directly, but doing this entirely compromises the Docker security model • Worse, compromising the security model creates a VM dependency that makes bare-metal containers impossible • And ironically, Docker hosts can resemble pets: the reasons for backdooring through the Docker host can come to resemble the arguments made by those who resist containerization entirely!
  • 38. Debugging implicit, non-fatal failure • Problems that are both implicit and non-fatal represent the most time-consuming, most difficult problems to debug because the system must be understood against its will • Wherever possible make software explicit about failure! • Where errors are programmatic (and not operational), they should always induce fatal failure! • Container-based systems break at the boundaries: each thinks they are operating correctly, but together they’re broken • Data must be coerced from the system via instrumentation
  • 39. Instrumenting production systems • Traditionally, software instrumentation was hard-coded and static (necessitating software restart or — worse — recompile) • Dynamic system instrumentation was historically limited to system call table (strace/truss) or packet capture (tcpdump/snoop) • Effective for some problems, but a poor fit for ad hoc analysis • In 2003, Sun developed DTrace, a facility for arbitrary, dynamic instrumentation of production systems that has since been ported to Mac OS X, FreeBSD, NetBSD and (to a degree) Linux • DTrace has inspired dynamic instrumentation in other systems
  • 40. Instrumenting Docker containers • In Docker, instrumentation is a challenge as containers may not include the tooling necessary to understand the system • Docker host-based techniques for instrumentation may be tempting, but (again!) they should be considered an anti-pattern! • DTrace has a privilege model that allows it to be safely (and usefully) used from within a container • In Triton, DTrace is available from within every container — one can “docker exec -it bash” and then debug interactively
  • 41. Instrumenting node.js-based containers • We have invested heavily in node.js-based infrastructure to allow us to meaningfully instrument containers in production: • We developed Bunyan, a logging facility for node.js that includes DTrace support • Added DTrace support for node.js profiling • An essential vector for iterative observation: turning up the logging level on a running container-based microservice!
  • 42. Debugging Docker containers in production • Debugging methodically requires us to shift our thinking — and learn how to carefully observe the systems we build • Different types of failures necessitate different techniques: • Fatal failure is best debugged via postmortem analysis — which is particular appropriate in an all-container world • Non-fatal failure necessitates log analysis and dynamic instrumentation • The ability to debug problems in production is essential to successfully deploy and scale container-based systems!