http://gapingvoid.com/
Sunday, June 20, 2010
The Upside of Downtime
         Turning disaster into opportunity




Who’s had a site go down?




Who hasn’t had a site go down?



There’s always that one guy!




Downtime sucks



Source: http://www.motivatedphotos.com/?id=8080
Why downtime sucks
               Business
               Brand
               You
               Users

[Chart on the Business slide: Sales, y-axis $0 to $3,000, x-axis 0 to 22]

Downtime = Bad! (Duh)




Approach #1
                          Don’t fail



Source: http://kansansforlife.files.wordpress.com/2009/12/titanic.jpg
“Everything fails all the time”
                        -- Werner Vogels (Amazon, CTO)




Your site
                         will fail



                           Werner Vogels
                          (Amazon, CTO)
Why?!?




Why Failure Happens
                            Risk Homeostasis
                            Black Swan
                            Unknown unknowns
                            Change
                            Many small failures
                            Humans

Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg
Source: Amazon.com
Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg
Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg
Source: http://www.biojobblog.com/uploads/image/dominos.jpg
Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg

Polisher blocked (Not unusual)
  -> Moisture leaks into air system (Not expected)
  -> Flow of cold water stopped (Not good)
  -> Backup disabled
  -> Indicator blocked (Doh!)
  -> Relief valve broken (Dammit)
  -> Gauge broken (WTF)
  -> Meltdown

Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm
Source: http://support.rightscale.com/09-Clouds/AWS/02-Amazon_EC2/Designing_Failover_Architectures_on_EC2/03-Advanced_Failover_Architecture
“accidental power failure”



Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/
“traffic accident damaged a nearby utility transformer”
Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/
“unfortunate code change”
Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/
“Unhappy customers may get some attention, but unhappy networked
 customers can quickly impact your business”
                         -- Clay Shirky

Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/
http://labs.webmetrics.com/crowdsourceduptime
Recap




Your site will fail
          +
          Downtime is bad
          +
          Everyone will find out
          =
          Screw it, I’ll become a lumberjack

Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg
“Embrace fear of outages and
               degradation. Use it to guide your
               architecture, your code, your
               infrastructure. So lean into it.”
                              -- John Allspaw, VP Tech. Ops at Etsy

Approach #2
                        Prepare for downtime



Disclaimer:
         Try hard to avoid downtime



Learning by example...




Case Study #1
                          Facebook



“The larger issue here isn't just that a portion of
         Facebook's platform has gone down - numerous web
         services have issues from time to time, including
         everything from Gmail to Twitter. An outage of this
         length, however, with no official communication
         from the company itself is disturbing.”
                                                     -- N.Y. Times




Facebook



         Downtime             Disturbing




Case Study #2
                        Google App Engine



Google App Engine



                        Downtime     Kudos




Case Study #3
                          Atlassian



Atlassian



                 Downtime           Bravo




http://atlassian.com/

Downtime:
         Opportunity to Build Trust



Downtime:
         Opportunity to Destroy Trust



How To:
         Prepare for Downtime



Something > Nothing




Upside of Downtime Framework 1.0

               Life is good     Oh crap       That sucked
               Prepare          Communicate   Explain
         Time

Prepare   Communicate   Explain

         1. Communication channel

         Something is wrong -> Can’t tell if it’s me or you -> I’ll assume it’s you -> You suck
         I know it’s you -> Tell me when you’re back -> You suck a lot less

                        Easy to find
                        Hosted off-site
                        Real-time / automated

7 keys for public health dashboards

          1. Must show current status for each “service”
          2. Data must be accurate and timely
          3. Must be easy to find
          4. Must provide details for events in real time
          5. Provide historical uptime and performance data
          6. Provide a way to be notified of status changes
          7. Provide details on how the data is gathered


 Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
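
The seven keys above suggest a simple shape for the data behind a dashboard. What follows is a minimal sketch, not anything shown in the talk: the names `StatusEvent`, `Dashboard`, and the three-state vocabulary are hypothetical, and notifications (key 6) are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List

STATES = ("up", "degraded", "down")  # hypothetical status vocabulary


@dataclass
class StatusEvent:
    """One timestamped status change for a service (keys 2, 4 and 5)."""
    service: str
    state: str
    detail: str
    at: datetime


@dataclass
class Dashboard:
    """Keeps the full event history and derives current status from it."""
    events: List[StatusEvent] = field(default_factory=list)

    def report(self, service: str, state: str, detail: str, at: datetime) -> None:
        """Record a status change; unknown states are rejected."""
        if state not in STATES:
            raise ValueError(f"unknown state: {state}")
        self.events.append(StatusEvent(service, state, detail, at))

    def current(self) -> Dict[str, str]:
        """Key 1: latest known state for each service."""
        latest: Dict[str, StatusEvent] = {}
        for event in sorted(self.events, key=lambda e: e.at):
            latest[event.service] = event
        return {name: event.state for name, event in latest.items()}

    def history(self, service: str) -> List[StatusEvent]:
        """Key 5: past events for one service, oldest first."""
        return sorted(
            (e for e in self.events if e.service == service),
            key=lambda e: e.at,
        )
```

Deriving the current status from the event history, rather than storing it separately, keeps the "current" and "historical" views from disagreeing.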

Prepare        Communicate        Explain

         1. Communication channel
                        Easy to find
                        Hosted off-site
                        Real-time / automated

         2. Process
                        Authority
                        Mean-Time-To-Communicate (MTTC)
                        On-call/drills/escalations/etc.
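
The slides do not define how MTTC is measured. As a sketch, assuming it is the average delay between the start of an incident and the first public update, it could be tracked like this (the function name and input shape are hypothetical):

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_communicate(
    incidents: List[Tuple[datetime, datetime]],
) -> timedelta:
    """Average delay between incident start and first public update.

    Each item is (started_at, first_update_at). This definition is an
    assumption; the talk names the metric but does not spell it out.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    delays = []
    for started_at, first_update_at in incidents:
        if first_update_at < started_at:
            raise ValueError("first update precedes incident start")
        delays.append(first_update_at - started_at)
    return sum(delays, timedelta()) / len(delays)
```

Reviewing this number after each drill or real incident gives the process a concrete target to improve.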
Your servers




Prepare      Communicate    Explain

         1. Communicate
                        Use communication channel
                        MTTC
                        Who/what is affected
                        When the incident started
                        ETA
                        Update regularly

         2. Fix it!
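
The checklist above maps naturally onto a fixed update template. A minimal sketch, assuming plain-text updates and UTC timestamps; the class `IncidentUpdate` and its field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentUpdate:
    affected: str              # who/what is affected
    started_at: datetime       # when the incident started (assumed UTC)
    eta: Optional[datetime]    # ETA to resolution, None if not yet known
    next_update_in: timedelta  # the "update regularly" promise

    def render(self, now: datetime) -> str:
        """Format the update as plain text for the communication channel."""
        started = self.started_at.strftime("%Y-%m-%d %H:%M UTC")
        if self.eta is not None:
            eta = self.eta.strftime("%H:%M UTC")
        else:
            eta = "unknown, still investigating"
        next_update = (now + self.next_update_in).strftime("%H:%M UTC")
        return "\n".join([
            f"Affected: {self.affected}",
            f"Started: {started}",
            f"ETA to resolution: {eta}",
            f"Next update by: {next_update}",
        ])
```

Making the next-update time part of the message forces a commitment to regular updates even when there is no ETA yet.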
Phew, close one!




Prepare   Communicate   Explain

          1. Postmortem
                        Admit failure
                          Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/
                        Sound like a human
                          Source: http://www.bureauofcommunication.com/compose/apology
                          “We apologize for any inconvenience this may have caused”
                        Start time and end time
                          Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
                        Who/what was impacted
                          Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/
                        What went wrong
                          Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html
                        Lessons learned
                          Source: http://graysky.org/2010/02/downtime-postmortem/

                “I was completely overwhelmed by the amount of positive feedback and
                support I received.”

          2. Improve for the future
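
One way to make the postmortem checklist hard to skip is to check a draft against it before publishing. This is only a sketch: the section keys are hypothetical names for the items above, and "sound like a human" is deliberately left to a human reviewer.

```python
from typing import Dict, List

# Hypothetical keys, one per checklist item that can be checked mechanically.
REQUIRED_SECTIONS = [
    "admit_failure",
    "start_time",
    "end_time",
    "impact",
    "what_went_wrong",
    "lessons_learned",
    "improvements",
]


def missing_sections(postmortem: Dict[str, str]) -> List[str]:
    """Return the checklist items that are absent or blank in a draft."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not postmortem.get(name, "").strip()
    ]
```

An empty result means the draft covers every item; anything else is a list of what still needs writing.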
Prepare                       Communicate   Explain




               “Google is not just saying sorry, they are
               actually implementing serious changes which
               probably represents millions of dollars of
               development to help make sure this doesn't
               happen again.”




Source: http://news.ycombinator.com/item?id=1168493

Prepare                                  Communicate                     Explain




Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf
Prepare   Communicate   Explain

                                  Be human
                                  Be authentic
                                  Be transparent
                                  Accept responsibility
                                  Learn and improve
                                  Trust

Upside of Downtime Framework 1.0

        Prepare                      Communicate            Explain
        1. Communication channel     1. Communicate         1. Post-mortem
        - Easy to find               - Use channel          - Admit failure
        - Off-site                   - M.T.T.C.             - Sound like a human
        - Real-time                  - Who/what affected    - Start time and end time
                                     - When started         - Who/what was impacted
        2. Process                   - ETA to resolution    - What went wrong
        - Give authority             - Update regularly     - Lessons learned
        - M.T.T.C.
        - On-call/escalations        2. Fix it!             2. Learn and improve

        Be Prepared + Be Transparent + Be Human = Trust

Disclaimer:
         Don’t screw up too often



Downtime Prisoner’s Dilemma

                                 Transparent   Not Transparent

                    Caught       Big Win       Big Loss

                    Not Caught   Win           Win
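
The matrix can be read as an expected-payoff comparison. The sketch below assigns illustrative numbers to the outcomes (Big Win = 2, Win = 1, Big Loss = -2); these values are assumptions made here, not figures from the talk, and `expected_payoff` is a hypothetical helper.

```python
# Illustrative payoffs only: Big Win = 2, Win = 1, Big Loss = -2.
PAYOFF = {
    ("transparent", "caught"): 2,
    ("transparent", "not_caught"): 1,
    ("not_transparent", "caught"): -2,
    ("not_transparent", "not_caught"): 1,
}


def expected_payoff(strategy: str, p_caught: float) -> float:
    """Expected payoff of a strategy given the chance users notice the outage."""
    if not 0.0 <= p_caught <= 1.0:
        raise ValueError("p_caught must be between 0 and 1")
    return (
        p_caught * PAYOFF[(strategy, "caught")]
        + (1.0 - p_caught) * PAYOFF[(strategy, "not_caught")]
    )
```

With these numbers, being transparent is never worse than hiding the outage, and it pulls ahead as soon as there is any chance of being caught.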

Benefits
               Gain trust
               Reduce churn, increase loyalty
               Reduce support costs
               Ability to control the message
               Competitive advantage
               More time to focus on the actual problem
               Reduce stress


Change != Easy




Change != Impossible




Keys to Adoption
               Getting past a culture of “hide the problem”
               Overriding commitment to want to improve
               Available resources to improve
               Pain
               Buy-in




Product Management
          Default: Let’s wait for complaints
          Reality: Proactiveness => Forgiveness

Support
          Default: Too much work
          Reality: More upfront, less when it matters

Engineering/Operations
          Default: Don’t want to look bad
          Reality: Opportunity to learn/improve

Sales/Marketing
          Default: I don’t want my customers to know
          Reality: They’ll find out, better from us

Source: http://delicious.com/lennysan/healthdashboard

Simple as that!




Your site
                        will still fail!




“The measure of a society is how
     well it transforms pain and suffering
     into something worthwhile.”
                           -- Friedrich Nietzsche

“The measure of a company is how
      well it transforms pain of downtime
      into something worthwhile.”
                                                        -- Lenny Rachitsky

Source: Original quote inspired by Fredrick Nietzsche
Sunday, June 20, 2010
Bare minimum:
         Register a Twitter account



Thank You

             Slides: http://bit.ly/upside-of-downtime

             Lenny Rachitsky
             @lennysan
             http://www.transparentuptime.com/

                        Webmetrics/Neustar
                        @webmetrics
                        http://www.webmetrics.com/

Bonus

Upside of Downtime Framework 1.0

        Prepare                      Communicate               Explain
        1. Communication channel     1. Communicate            1. Post-mortem
           - Easy to find               - Use channel             - Admit failure
           - Off-site                   - M.T.T.C.                - Sound like a human
           - Real-time                  - Who/what affected       - Start time and end time
                                        - When started            - Who/what was impacted
        2. Process                      - ETA to resolution       - What went wrong
           - Give authority             - Update regularly        - Lessons learned
           - M.T.T.C.
           - On-call/escalations     2. Fix it!                2. Learn and improve
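
The "Communicate" column above can be sketched as a small data structure. This is only an illustration in Python: the class name `StatusUpdate`, its fields, and the helper `time_to_communicate` are hypothetical and do not come from the slides, and reading M.T.T.C. as a time-to-communicate measure is an assumption made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StatusUpdate:
    """One public update posted during an incident (hypothetical structure)."""
    affected: str              # who/what is affected
    started_at: datetime       # when the problem started
    eta: Optional[datetime]    # estimated time to resolution, None if unknown
    posted_at: datetime        # when this update was published
    next_update_in: timedelta  # promise that backs "update regularly"

    def render(self) -> str:
        """Format the update as one short line for a status channel."""
        eta_text = self.eta.strftime("%H:%M UTC") if self.eta else "unknown"
        next_at = self.posted_at + self.next_update_in
        return (
            f"Affected: {self.affected}. "
            f"Started: {self.started_at.strftime('%H:%M UTC')}. "
            f"ETA to resolution: {eta_text}. "
            f"Next update by {next_at.strftime('%H:%M UTC')}."
        )


def time_to_communicate(started_at: datetime, first_update: StatusUpdate) -> timedelta:
    """Delay between the start of the incident and the first public update."""
    return first_update.posted_at - started_at


# Example: the first update goes out seven minutes after the problem starts.
start = datetime(2010, 6, 20, 14, 0, tzinfo=timezone.utc)
update = StatusUpdate(
    affected="Login for all users",
    started_at=start,
    eta=None,
    posted_at=start + timedelta(minutes=7),
    next_update_in=timedelta(minutes=30),
)
print(update.render())
print(time_to_communicate(start, update))
```

Making the next-update time a required field is a deliberate choice in this sketch: it forces whoever posts the update to commit to the "Update regularly" item even when the ETA is still unknown.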





        "Unlikely that an accidental surface or subsurface
        oil spill would occur from the proposed activities"
                           -- Exploration and environmental impact plan

Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion

“Be not afraid of transparency;
          some are born transparent,
          some achieve transparency,
          and others have transparency
          thrust upon them.”
                        -- Borrowed from William Shakespeare

Making change
         1. Find the bright spots - (this presentation has a bunch)
         2. Script the critical moves - (framework)
         3. Point to the destination - (W.W.G.D.)
         4. Find the feeling - (how would you feel?)
         5. Shrink the change - (start small)
         6. Grow your people - (everyone is learning as they go)
         7. Tweak the environment - (create a simple process)
         8. Build habits - (build process organically)
         9. Rally the herd - (get buy-in, rest will follow)


The Upside of Downtime (Velocity 2010)

  • 2. The Upside of Downtime Turning disaster into opportunity Sunday, June 20, 2010
  • 3. Who’s had a site go down? Sunday, June 20, 2010
  • 4. Who’s hasn’t had a site go down? Sunday, June 20, 2010
  • 5. There’s always that one guy! Sunday, June 20, 2010
  • 15. Downtime sucks Source: http://www.motivatedphotos.com/?id=8080 Sunday, June 20, 2010
  • 16. Why downtime sucks Business $3,000 $2,250 $1,500 Sales $750 $0 0 2 4 6 8 10 12 14 16 18 20 22 Sunday, June 20, 2010
  • 17. Why downtime sucks Business Brand Sunday, June 20, 2010
  • 18. Why downtime sucks Business Brand You Sunday, June 20, 2010
  • 19. Why downtime sucks Business Brand You Users Sunday, June 20, 2010
  • 20. Downtime = Bad! (Duh) Sunday, June 20, 2010
  • 21. Approach #1 Don’t fail Sunday, June 20, 2010
  • 23. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 24. “Everything fails all the time” -- Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 25. Your site will fail Werner Vogels (Amazon, CTO) Sunday, June 20, 2010
  • 27. Why Failure Happens Risk Homeostasis Source: http://joshuahind.files.wordpress.com/2009/09/bicycle-crash.jpg Sunday, June 20, 2010
  • 28. Why Failure Happens Risk Homeostasis Black Swan Source: Amazon.com Sunday, June 20, 2010
  • 29. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Source: http://www.apoliticus.com/wp-content/uploads/2009/01/6_21_080306_rumsfeld.jpg Sunday, June 20, 2010
  • 30. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Source: http://bozark.net/wordpress/wp-content/uploads/2008/09/barack_obama_change_fairey.jpg Sunday, June 20, 2010
  • 31. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Source: http://www.biojobblog.com/uploads/image/dominos.jpg Sunday, June 20, 2010
  • 32. Why Failure Happens Risk Homeostasis Black Swan Unknown unknowns Change Many small failures Humans Source: http://www.librarian.net/talks/clc/CLC.key/SJ_Shoulder_Shrug.jpg Sunday, June 20, 2010
  • 35. Polisher blocked Not unusual Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 36. Polisher Moisture leaks into blocked air system Not unusual Not expected Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 37. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Not good Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 38. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 39. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 40. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 41. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken WTF Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 42. Polisher Moisture leaks into Flow of cold water blocked air system stopped Not unusual Not expected Backup disabled Doh! Indicator blocked Dammit Relief valve broken Meltdown Gauge broken Source: http://www.gladwell.com/1996/1996_01_22_a_blowup.htm Sunday, June 20, 2010
  • 45. “accidental power failure” Source: http://www.datacenterknowledge.com/archives/2010/06/16/power-failure-kos-intuit-sites-for-24-hours/ Sunday, June 20, 2010
  • 46. “traffic accident damaged a nearby utility transformer” Source: http://www.datacenterknowledge.com/archives/2007/11/13/truck-crash-knocks-rackspace-offline/ Sunday, June 20, 2010
  • 47. “unfortunate code change” Source: http://www.datacenterknowledge.com/archives/2010/06/11/errant-code-change-crashes-10-million-blogs/ Sunday, June 20, 2010
  • 49. “Unhappy customers may get some attention, but unhappy networked customers can quickly impact your business” -- Clay Shirky Source: http://happenupon.files.wordpress.com/2009/02/technology-guru-clay-shir-001.jpg, http://scholarlykitchen.sspnet.org/2010/03/02/shirky-at-nfais-how-abundance-breaks-everything/ Sunday, June 20, 2010
  • 62. Your site will fail Sunday, June 20, 2010
  • 63. Your site will fail + Downtime is bad Sunday, June 20, 2010
  • 64. Your site will fail + Downtime is bad + Everyone will find out Sunday, June 20, 2010
  • 65. Your site will fail + Downtime is bad + Everyone will find out = Screw it, I’ll become a lumberjack Source: http://sbadrinath.files.wordpress.com/2009/03/different26rqcu3.jpg Sunday, June 20, 2010
  • 66. “Embrace fear of outages and degradation. Use it to guide your architecture, your code, your infrastructure. So lean into it.” -- John Allspaw, VP Tech. Ops at Etsy Sunday, June 20, 2010
  • 67. Approach #2 Prepare for downtime Sunday, June 20, 2010
  • 68. Disclaimer: Try hard to avoid downtime Sunday, June 20, 2010
  • 70. Case Study #1 Facebook Sunday, June 20, 2010
  • 77. “The larger issue here isn't just that a portion of Facebook's platform has gone down - numerous web services have issues from time to time, including everything from Gmail to Twitter. An outage of this length, however, with no official communication from the company itself is disturbing.” -- N.Y. Times Sunday, June 20, 2010
  • 78. Facebook Downtime Disturbing Sunday, June 20, 2010
  • 80. Case Study #2 Google App Engine Sunday, June 20, 2010
  • 95. Google App Engine Downtime Kudos Sunday, June 20, 2010
  • 96. Case Study #3 Atlassian Sunday, June 20, 2010
  • 108. Atlassian Downtime Bravo Sunday, June 20, 2010
  • 110. Downtime: Opportunity to Build Trust Sunday, June 20, 2010
  • 111. Downtime: Opportunity to Destroy Trust Sunday, June 20, 2010
  • 112. How To: Prepare for Downtime Sunday, June 20, 2010
  • 113. Something > Nothing Sunday, June 20, 2010
  • 114. Upside of Downtime Framework 1.0 Timeline: Life is good -> Oh crap -> That sucked
  • 115. Upside of Downtime Framework 1.0 Prepare Communicate Explain Time Sunday, June 20, 2010
  • 119. Prepare Communicate Explain Sunday, June 20, 2010
  • 120. Prepare Communicate Explain 1. Communication channel Sunday, June 20, 2010
  • 121. Prepare Communicate Explain 1. Communication channel “Something is wrong” -> “Can’t tell if it’s me or you” -> “I’ll assume it’s you” -> “You suck”
  • 122. Prepare Communicate Explain 1. Communication channel First row: “Something is wrong” -> “Can’t tell if it’s me or you” -> “I’ll assume it’s you” -> “You suck”. Second row: “I know it’s you” -> “Tell me when you’re back” -> “You suck a lot less”
  • 131. Prepare Communicate Explain 1. Communication channel Easy to find Sunday, June 20, 2010
  • 132. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Sunday, June 20, 2010
  • 133. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated Sunday, June 20, 2010
  • 134. 7 keys for public health dashboards 1. Must show current status for each “service” 2. Data must be accurate and timely 3. Must be easy to find 4. Must provide details for events in real time 5. Provide historical uptime and performance data 6. Provide a way to be notified of status changes 7. Provide details on how the data is gathered Source: http://www.transparentuptime.com/2008/11/rules-for-successful-public-health.html
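Key 5 above (historical uptime data) can be illustrated with a small calculation. This is only a sketch: the function name `uptime_percent` and the idea of storing outages as non-overlapping (start, end) pairs are assumptions made for illustration, not something the slides specify.

```python
from datetime import datetime


def uptime_percent(window_start, window_end, outages):
    """Return uptime as a percentage of the reporting window.

    `outages` is a list of (start, end) datetime pairs. They are assumed
    not to overlap each other (hypothetical data model, not from the deck).
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting window before counting it.
        clipped_start = max(start, window_start)
        clipped_end = min(end, window_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (1 - down / window)


if __name__ == "__main__":
    june = (datetime(2010, 6, 1), datetime(2010, 7, 1))
    outages = [(datetime(2010, 6, 16, 0, 0), datetime(2010, 6, 16, 7, 12))]
    print(f"{uptime_percent(*june, outages):.2f}% uptime")
```

Publishing the number together with the outage list it was computed from also goes some way toward key 7, since readers can see how the figure was derived.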
  • 135. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Sunday, June 20, 2010
  • 136. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Sunday, June 20, 2010
  • 137. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) Sunday, June 20, 2010
  • 138. Prepare Communicate Explain 1. Communication channel Easy to find Hosted off-site Real-time / automated 2. Process Authority Mean-Time-To-Communicate (MTTC) On-call/drills/escalations/etc. Sunday, June 20, 2010
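The slides name Mean-Time-To-Communicate (MTTC) without giving a formula. A minimal sketch, assuming MTTC is measured as the average delay between when an incident starts and when the first public update goes out; the function name and the input shape are hypothetical.

```python
from datetime import datetime
from statistics import mean


def mttc_minutes(incidents):
    """Mean minutes from incident start to first public update.

    `incidents` is a list of (started_at, first_update_at) datetime pairs.
    This definition of MTTC is an assumption; the deck only names the metric.
    """
    delays = [
        (first_update - started).total_seconds() / 60
        for started, first_update in incidents
    ]
    if not delays:
        raise ValueError("need at least one incident")
    return mean(delays)


if __name__ == "__main__":
    history = [
        (datetime(2010, 6, 1, 10, 0), datetime(2010, 6, 1, 10, 5)),
        (datetime(2010, 6, 9, 14, 0), datetime(2010, 6, 9, 14, 15)),
    ]
    print(f"MTTC: {mttc_minutes(history):.1f} minutes")
```

Tracking this number across drills as well as real incidents gives the process a target to improve against.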
  • 140. Prepare Communicate Explain 1. Communicate Sunday, June 20, 2010
  • 141. Prepare Communicate Explain 1. Communicate Use communication channel Sunday, June 20, 2010
  • 142. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Sunday, June 20, 2010
  • 143. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected Sunday, June 20, 2010
  • 144. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started Sunday, June 20, 2010
  • 145. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Sunday, June 20, 2010
  • 146. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly Sunday, June 20, 2010
  • 147. Prepare Communicate Explain 1. Communicate Use communication channel MTTC Who/what is affected When the incident started ETA Update regularly 2. Fix it! Sunday, June 20, 2010
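The items on the Communicate list (who/what is affected, when the incident started, ETA, regular updates) map naturally onto a small message template. The sketch below is one possible shape; the class name `IncidentUpdate`, its fields, and the 30-minute default are illustrative assumptions rather than anything prescribed by the talk.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IncidentUpdate:
    """One public status update (hypothetical structure)."""
    affected: str                  # who/what is affected
    started_at: datetime           # when the incident started
    eta: Optional[str] = None      # ETA to resolution, if known
    next_update_minutes: int = 30  # assumed cadence for "update regularly"

    def render(self) -> str:
        # Say so plainly when there is no ETA instead of omitting the line.
        eta = self.eta if self.eta else "unknown, still investigating"
        return (
            f"Affected: {self.affected}\n"
            f"Started: {self.started_at:%Y-%m-%d %H:%M} UTC\n"
            f"ETA: {eta}\n"
            f"Next update in {self.next_update_minutes} minutes"
        )


if __name__ == "__main__":
    update = IncidentUpdate(
        affected="Logins for all users",
        started_at=datetime(2010, 6, 20, 9, 30),
    )
    print(update.render())
```

Posting the rendered text to the off-site channel prepared earlier keeps the message available even when the main site is down.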
  • 148. Phew, close one! Sunday, June 20, 2010
  • 149. Prepare Communicate Explain 1. Postmortem Sunday, June 20, 2010
  • 150. Prepare Communicate Explain 1. Postmortem Admit failure Source: http://en.blog.wordpress.com/2010/02/19/wp-com-downtime-summary/ Sunday, June 20, 2010
  • 151. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Source: http://www.bureauofcommunication.com/compose/apology Sunday, June 20, 2010
  • 152. Prepare Communicate Explain “We apologize for any inconvenience this may have caused” Sunday, June 20, 2010
  • 153. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  • 154. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted Source: http://techcrunch.com/2009/11/02/large-scale-downtime-at-rackspace-cloud/ Sunday, June 20, 2010
  • 155. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Source: http://www.zendesk.com/2010/03/tuesday-double-whammy.html Sunday, June 20, 2010
  • 156. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned Source: http://graysky.org/2010/02/downtime-postmortem/ Sunday, June 20, 2010
  • 158. Prepare Communicate Explain “I was completely overwhelmed by the amount of positive feedback and support I received.” Sunday, June 20, 2010
  • 159. Prepare Communicate Explain 1. Postmortem Admit failure Sound like a human Start time and end time Who/what was impacted What went wrong Lessons learned 2. Improve for the future Sunday, June 20, 2010
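The postmortem checklist above can be turned into a simple pre-publication check. This is a sketch under assumptions: the section keys and the dictionary shape are made up for illustration, and "sound like a human" is deliberately left out because it needs a human reviewer, not a script.

```python
# Section names are hypothetical keys chosen to mirror the slide's checklist.
REQUIRED_SECTIONS = (
    "admit_failure",
    "start_time",
    "end_time",
    "impact",            # who/what was impacted
    "what_went_wrong",
    "lessons_learned",
    "improvements",      # "improve for the future"
)


def missing_sections(postmortem):
    """Return the checklist sections that are absent or blank in a draft."""
    return [
        key
        for key in REQUIRED_SECTIONS
        if not str(postmortem.get(key, "")).strip()
    ]


if __name__ == "__main__":
    draft = {
        "admit_failure": "We took the site down with a bad deploy.",
        "start_time": "2010-06-20 09:30 UTC",
        "end_time": "2010-06-20 10:45 UTC",
        "impact": "All users were unable to log in.",
        "what_went_wrong": "   ",
    }
    print("Still missing:", missing_sections(draft))
```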
  • 160. Prepare Communicate Explain “Google is not just saying sorry, they are actually implementing serious changes which probably represents millions of dollars of development to help make sure this doesn't happen again.” Source: http://news.ycombinator.com/item?id=1168493 Sunday, June 20, 2010
  • 161. Prepare Communicate Explain Source: https://groups.google.com/group/google-appengine/browse_thread/thread/a7640a2743922dcf Sunday, June 20, 2010
  • 162. Prepare Communicate Explain Be human Sunday, June 20, 2010
  • 163. Prepare Communicate Explain Be authentic Sunday, June 20, 2010
  • 164. Prepare Communicate Explain Be transparent Sunday, June 20, 2010
  • 165. Prepare Communicate Explain Accept responsibility Sunday, June 20, 2010
  • 166. Prepare Communicate Explain Learn and improve Sunday, June 20, 2010
  • 167. Prepare Communicate Explain Trust Sunday, June 20, 2010
  • 168. Upside of Downtime Framework 1.0
      Prepare: 1. Communication channel (easy to find, off-site, real-time); 2. Process (give authority, M.T.T.C., on-call/escalations)
      Communicate: 1. Communicate (use channel, M.T.T.C., who/what affected, when started, ETA to resolution, update regularly); 2. Fix it!
      Explain: 1. Post-mortem (admit failure, sound like a human, start time and end time, who/what was impacted, what went wrong, lessons learned); 2. Learn and improve
  • 169. Upside of Downtime Framework 1.0 (same framework as slide 168) Be Prepared + Be Transparent + Be Human
  • 170. Upside of Downtime Framework 1.0 (same framework as slide 168) Be Prepared + Be Transparent + Be Human = Trust
  • 171. Disclaimer: Don’t screw up too often Sunday, June 20, 2010
  • 173. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Not Caught Sunday, June 20, 2010
  • 174. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Not Caught Win Sunday, June 20, 2010
  • 175. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Big Loss Not Caught Win Sunday, June 20, 2010
  • 176. Downtime Prisoner’s Dilemma Transparent Not Transparent Caught Big Win Big Loss Not Caught Win Sunday, June 20, 2010
  • 177. Downtime Prisoner’s Dilemma
      Caught: Transparent = Big Win; Not Transparent = Big Loss
      Not Caught: Transparent = Win; Not Transparent = Win
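The payoff grid can be checked mechanically. The slide gives only labels ("Big Win", "Win", "Big Loss"), so the numeric payoffs below are assumptions chosen to preserve that ordering. With them, a short sketch shows that being transparent is never worse than hiding the problem and is strictly better when you are caught.

```python
# Numeric values are illustrative assumptions; only their ordering
# (Big Win > Win > Big Loss) comes from the slide.
PAYOFF = {
    ("transparent", "caught"): 2,           # Big Win
    ("transparent", "not caught"): 1,       # Win
    ("not transparent", "caught"): -2,      # Big Loss
    ("not transparent", "not caught"): 1,   # Win
}

OUTCOMES = ("caught", "not caught")


def weakly_dominates(a, b):
    """True if strategy `a` is never worse than `b` and sometimes better."""
    never_worse = all(PAYOFF[(a, o)] >= PAYOFF[(b, o)] for o in OUTCOMES)
    sometimes_better = any(PAYOFF[(a, o)] > PAYOFF[(b, o)] for o in OUTCOMES)
    return never_worse and sometimes_better


if __name__ == "__main__":
    print(weakly_dominates("transparent", "not transparent"))
    print(weakly_dominates("not transparent", "transparent"))
```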
  • 179. Benefits Gain trust Reduce churn, increase loyalty Reduce support costs Ability to control the message Competitive advantage More time to focus on the actual problem Reduce stress Sunday, June 20, 2010
  • 180. Change != Easy Sunday, June 20, 2010
  • 182. Keys to Adoption Getting past a culture of “hide the problem” Sunday, June 20, 2010
  • 183. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Sunday, June 20, 2010
  • 184. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Sunday, June 20, 2010
  • 185. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Sunday, June 20, 2010
  • 186. Keys to Adoption Getting past a culture of “hide the problem” Overriding commitment to want to improve Available resources to improve Pain Buy-in Sunday, June 20, 2010
  • 187. Product Management; Support; Engineering/Operations; Sales/Marketing
  • 188. Product Management (Default: Let’s wait for complaints); Support; Engineering/Operations; Sales/Marketing
  • 189. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support; Engineering/Operations; Sales/Marketing
  • 190. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work); Engineering/Operations; Sales/Marketing
  • 191. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations; Sales/Marketing
  • 192. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad); Sales/Marketing
  • 193. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing
  • 194. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know)
  • 195. Product Management (Default: Let’s wait for complaints. Reality: Proactiveness => Forgiveness); Support (Default: Too much work. Reality: More upfront, less when it matters); Engineering/Operations (Default: Don’t want to look bad. Reality: Opportunity to learn/improve); Sales/Marketing (Default: I don’t want my customers to know. Reality: They’ll find out, better from us)
  • 198. Simple as that! Sunday, June 20, 2010
  • 199. Your site will still fail! Sunday, June 20, 2010
  • 200. “The measure of a society is how well it transforms pain and suffering into something worthwhile.” -- Friedrich Nietzsche
  • 201. “The measure of a company is how well it transforms pain of downtime into something worthwhile.” -- Lenny Rachitsky Source: Original quote inspired by Friedrich Nietzsche
  • 202. Bare minimum: Register a Twitter account Sunday, June 20, 2010
  • 203. Thank You Slides: http://bit.ly/upside-of-downtime Lenny Rachitsky @lennysan http://www.transparentuptime.com/ Webmetrics/Neustar @webmetrics http://www.webmetrics.com/ Sunday, June 20, 2010
  • 210. Upside of Downtime Framework 1.0 (same framework as slide 168) "Unlikely that an accidental surface or subsurface oil spill would occur from the proposed activities" -- Exploration and environmental impact plan Source: http://en.wikipedia.org/wiki/Deepwater_Horizon_drilling_rig_explosion
  • 219. “Be not afraid of transparency; some are born transparent, some achieve transparency, and others have transparency thrust upon them.” -- Borrowed from William Shakespeare
  • 221. Making change 1. Find the bright spots - (this presentation has a bunch) Sunday, June 20, 2010
  • 222. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) Sunday, June 20, 2010
  • 223. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) Sunday, June 20, 2010
  • 224. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) Sunday, June 20, 2010
  • 225. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) Sunday, June 20, 2010
  • 226. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) Sunday, June 20, 2010
  • 227. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) Sunday, June 20, 2010
  • 228. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) Sunday, June 20, 2010
  • 229. Making change 1. Find the bright spots - (this presentation has a bunch) 2. Script the critical moves - (framework) 3. Point to the destination - (W.W.G.D.) 4. Find the feeling - (how would you feel?) 5. Shrink the change - (start small) 6. Grow your people - (everyone is learning as they go) 7. Tweak the environment - (create a simple process) 8. Build habits - (build process organically) 9. Rally the herd - (get buy in, rest will follow) Sunday, June 20, 2010