Make Sure Your Applications Crash




           Moshe Zadka
True story
Python doesn't crash




Memory managed, no direct pointer arithmetic
...except it does




C bugs, untrapped exceptions, infinite loops, blocking calls, thread deadlocks, inconsistent resident state
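Several items in that list are hangs, not crashes: the process stays alive but does nothing useful. One way to follow the advice in the title is to turn a hang into a crash. The following is a minimal sketch, assuming a process that works in a loop of steps. `run_with_hang_guard` is a hypothetical helper built on the standard `faulthandler` module. Calling `faulthandler.enable()` at start-up also prints tracebacks when C code crashes the interpreter (for example, on SIGSEGV).

```python
import faulthandler
import sys


def run_with_hang_guard(step, iterations, timeout, file=None):
    """Call step() `iterations` times. If a single call runs longer than
    `timeout` seconds, faulthandler writes every thread's traceback to
    `file` (default: the original stderr) and exits with status 1."""
    out = file if file is not None else sys.__stderr__
    done = 0
    for _ in range(iterations):
        # Re-arm the timer so that it covers this step only.
        faulthandler.cancel_dump_traceback_later()
        faulthandler.dump_traceback_later(timeout, exit=True, file=out)
        step()
        done += 1
    faulthandler.cancel_dump_traceback_later()
    return done
```

A blocked step (a deadlock, or a socket read with no timeout) therefore ends in a crash with a traceback. Anything that restarts crashed processes can then recover it.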
Recovery is important




"[S]ystem failure can usually be considered to
  be the result of two program errors[...] the
      second, in the recovery routine[...]"
Crashes and inconsistent data




A crash leaves data as it was at an arbitrary point in the program.
Avoid storage




Caches are better than master copies.
Databases




Transactions maintain consistency
    Databases can crash too!
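With a database, the transaction boundary does the job that a rename does for plain files. The database's own crash recovery only protects what the application puts inside one transaction. The following is a minimal sketch using Python's built-in `sqlite3` module. The `counter` table and its single row are assumptions for illustration.

```python
import sqlite3


def open_db(path):
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS counter "
                     "(id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
        conn.execute("INSERT OR IGNORE INTO counter (id, value) VALUES (1, 0)")
    return conn


def update_counter(conn):
    # `with conn` commits if the block completes and rolls back if it
    # raises. If the process dies mid-way, SQLite rolls back the
    # uncommitted change from its journal the next time the file is used.
    with conn:
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
    (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
    return value
```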
Atomic operations




    File rename
Example: Counting
import os

def update_counter():
    with open("counter.txt") as fp:
        counter = int(fp.read().strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    with open("counter.txt.tmp", 'w') as fp:
        print(counter, file=fp)
        # Also survive an OS crash or power loss
        fp.flush()
        os.fsync(fp.fileno())
    # If there is a crash before this point,
    # only a temp file has been modified.
    # On POSIX, the following is an atomic operation.
    os.rename("counter.txt.tmp", "counter.txt")
Efficient caches, reliable masters




     Mark inconsistency of cache
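A cache can stay efficient without being trusted blindly. Mark it inconsistent before every update and clear the mark afterwards. If a marker survives a crash, the cache is rebuilt from the master copy. The following is a minimal sketch. `MarkedCache`, the `.dirty` suffix and the `rebuild` callback are hypothetical names.

```python
import json
import os


class MarkedCache:
    """A JSON cache file derived from a master copy. A marker file exists
    while the cache is being written; finding it at start-up means the
    previous run crashed mid-update, so the cache is rebuilt."""

    def __init__(self, cache_path, rebuild):
        self.cache_path = cache_path
        self.marker_path = cache_path + ".dirty"
        self.rebuild = rebuild  # callable: reads the master copy, returns a dict
        if os.path.exists(self.marker_path) or not os.path.exists(cache_path):
            self.data = rebuild()
            self._save()
        else:
            with open(cache_path) as fp:
                self.data = json.load(fp)

    def _save(self):
        # Mark the cache inconsistent, write it, then clear the mark.
        open(self.marker_path, "w").close()
        with open(self.cache_path, "w") as fp:
            json.dump(self.data, fp)
        os.remove(self.marker_path)

    def put(self, key, value):
        # Assumes the caller has already updated the master copy.
        self.data[key] = value
        self._save()
```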
No shutdown




Crash in testing
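If there is no shutdown, the only way a process stops is by crashing, so tests should crash it too. The following is a minimal sketch, assuming a POSIX system. The worker reuses the write-then-rename idea from the counting example. `crash_test` is a hypothetical test helper.

```python
import os
import random
import subprocess
import sys
import tempfile
import time

# Child program: increment a counter forever with write-temp-then-rename.
WORKER = r"""
import os, sys
path = sys.argv[1]
while True:
    with open(path) as fp:
        counter = int(fp.read())
    with open(path + ".tmp", "w") as fp:
        fp.write(str(counter + 1))
    os.rename(path + ".tmp", path)
"""


def crash_test(rounds=5, max_delay=0.3):
    """Repeatedly start the worker, kill it at a random moment, and check
    that the counter file always holds a valid, non-decreasing integer."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "counter.txt")
        with open(path, "w") as fp:
            fp.write("0")
        last = 0
        for _ in range(rounds):
            proc = subprocess.Popen([sys.executable, "-c", WORKER, path])
            time.sleep(random.uniform(0, max_delay))
            proc.kill()  # SIGKILL on POSIX: no chance to clean up
            proc.wait()
            with open(path) as fp:
                value = int(fp.read())  # a torn file would raise here
            assert value >= last, (value, last)
            last = value
        return last
```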
Availability




If data is consistent, just restart!
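When the stored data is always consistent, recovery needs no special code: a supervisor that simply starts the process again is enough. The following is a minimal sketch. `supervise` is a hypothetical helper; in production, a process supervisor such as systemd, runit or supervisord would normally play this role.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, delay=1.0):
    """Run cmd and start it again every time it exits. Returns the exit
    codes seen once more than `max_restarts` restarts would be needed
    (max_restarts=None means: never stop)."""
    codes = []
    while True:
        code = subprocess.call(cmd)
        codes.append(code)
        if max_restarts is not None and len(codes) > max_restarts:
            return codes
        print("child exited with %d, restarting" % code, file=sys.stderr)
        time.sleep(delay)  # avoid a tight crash loop
```

The delay is the only tuning knob here. A fast start-up keeps the gap between the crash and the restart short.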
Improving availability




        Limit impact
       Fast detection
        Fast start-up
Vertical splitting




Different execution paths, different processes
Horizontal splitting




Different code bases, different processes
Watchdog




Monitor -> Flag -> Remediate
Watchdog principles




Keep it simple, keep it safe!
Watchdog: Heartbeats
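## In a Twisted process
import os
from twisted.internet import task
def beat():
    # Touch the file: append mode alone keeps the old mtime
    open('beats/my-name', 'a').close()
    os.utime('beats/my-name', None)
task.LoopingCall(beat).start(30)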
Watchdog: Get time-outs
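import glob
import os
import time

def getTimeout():
    # hearts/<name> holds the longest silence, in seconds,
    # allowed for the process that beats into beats/<name>
    timeout = dict()
    now = time.time()
    for heart in glob.glob('hearts/*'):
        with open(heart) as fp:
            allowed = int(fp.read().strip())
        timeout[os.path.basename(heart)] = now-allowed
    return timeout
Watchdog: Mark problems
def markProblems():
    timeout = getTimeout()
    for beat in glob.glob('beats/*'):
        name = os.path.basename(beat)
        if name not in timeout:
            continue  # no allowed silence configured for it
        mtime = os.path.getmtime(beat)
        problem = os.path.join('problems', name)
        # Flag a late heartbeat, unless it is already flagged
        if (mtime<timeout[name] and
                not os.path.isfile(problem)):
            with open(problem, 'w') as fp:
                fp.write('watchdog')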
Watchdog: check solutions
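import glob
import os
import subprocess
import time

def checkSolutions():
    # A problem left unsolved for 30 seconds triggers a restart
    # ('restart-system' stands for a site-specific command)
    now = time.time()
    problemTimeout = now-30
    for problem in glob.glob('problems/*'):
        mtime = os.path.getmtime(problem)
        if mtime<problemTimeout:
            subprocess.call(['restart-system'])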
Watchdog: Loop
## Watchdog
import time

while True:
    markProblems()
    checkSolutions()
    time.sleep(1)
Watchdog: accuracy




Custom checkers can manufacture problems
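Because a problem is just a file in `problems/`, anything can raise one, not only the heartbeat check. The following is a minimal sketch of such a custom checker, assuming a disk-space rule. The threshold, the `disk-space` problem name and the flag-clearing behaviour are illustrative assumptions.

```python
import os
import shutil


def check_disk_space(path="/", min_free_fraction=0.05, problems="problems"):
    """Flag a 'disk-space' problem when the filesystem holding `path` has
    less than `min_free_fraction` of its space free, using the same
    one-file-per-problem convention as markProblems()."""
    usage = shutil.disk_usage(path)
    problem = os.path.join(problems, "disk-space")
    if usage.free / usage.total < min_free_fraction:
        if not os.path.isfile(problem):
            os.makedirs(problems, exist_ok=True)
            with open(problem, "w") as fp:
                fp.write("disk-space checker")
        return True
    # Assumption: a checker clears its own flag once the condition is gone.
    if os.path.isfile(problem):
        os.remove(problem)
    return False
```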
Watchdog: reliability




   Use cron for main loop
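If cron starts each watchdog pass, nothing needs to keep the watchdog's own loop alive: if one pass crashes, the next one still starts on schedule. The following is a minimal sketch, assuming a Unix system (it uses `fcntl.flock`). `run_once` and the lock path are hypothetical.

```python
import fcntl
import os


def run_once(lock_path, work):
    """One watchdog pass, meant to be started by cron. A non-blocking
    flock() makes this run give up at once if the previous pass is still
    going. Returns True if `work` ran."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another pass holds the lock
        work()
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

For example, a script could call `run_once('/var/run/watchdog.lock', lambda: (markProblems(), checkSolutions()))`, started every minute by a crontab line such as `* * * * * /usr/local/bin/watchdog-pass` (the paths are illustrative). Cron's one-minute granularity is coarser than the one-second loop. That is the price of having no loop that can die.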
Watchdog: reliability




Use software/hardware watchdogs
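One widely used software watchdog is systemd's service watchdog. When `WatchdogSec=` is set in the unit file, systemd restarts the service if it stops sending `WATCHDOG=1` keep-alives. The following is a minimal sketch of the sending side of the sd_notify protocol, assuming a Linux host. `notify_watchdog` is a hypothetical helper; the C library function `sd_notify()` does the same job.

```python
import os
import socket


def notify_watchdog(message=b"WATCHDOG=1"):
    """Send one keep-alive datagram to the AF_UNIX socket named in
    $NOTIFY_SOCKET. Returns False when not running under systemd."""
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        address = "\0" + address[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True
```

The `sd_watchdog_enabled(3)` documentation recommends sending a keep-alive every half of the watchdog interval, for example from the heartbeat loop. Hardware watchdogs, such as Linux's `/dev/watchdog`, follow the same pattern one level lower: they reboot the machine instead of restarting a service.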
Conclusions




Everything crashes -- plan for it
Questions?
Welcome to the back-up slides
         Extra! Extra!
Example: Counting on Windows
import os

def update_counter():
    with open("counter.txt") as fp:
        counter = int(fp.read().strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    with open("counter.txt.tmp", 'w') as fp:
        print(counter, file=fp)
    # If there is a crash before this point,
    # only a temp file has been modified.
    # On Windows, os.rename() fails if the target exists,
    # so the old file has to be removed first.
    os.remove("counter.txt")
    # At this point, the state is inconsistent*
    # (*handled by recover(), on the next slide)
    # The following is an atomic operation
    os.rename("counter.txt.tmp", "counter.txt")
Example: Counting on Windows (recovery)
import os

def recover():
    if not os.path.exists("counter.txt"):
        # The permanent file has been removed,
        # so the temp file is complete and valid
        os.rename("counter.txt.tmp",
                  "counter.txt")
Example: Counting with versions
import os

def update_counter():
    # Versions live in counter.<n>; the largest n is current
    files = [int(name.split('.')[-1])
             for name in os.listdir('.')
             if name.startswith('counter.')]
    last = max(files)
    with open('counter.%d' % last) as fp:
        counter = int(fp.read().strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been made.
    with open("tmp.counter", 'w') as fp:
        print(counter, file=fp)
    # If there is a crash before this point,
    # only a temp file has been modified.
    # The rename never targets an existing file,
    # which avoids the Windows problem above.
    os.rename('tmp.counter', 'counter.%d' % (last+1))
    # A crash here leaves two versions; the newest wins
    os.remove('counter.%d' % last)
Example: Counting with versions (cleanup)
# This is not a recovery routine, but a cleanup
# routine.
# Even in its absence, the state is consistent
import os

def cleanup():
    files = [int(name.split('.')[-1])
             for name in os.listdir('.')
             if name.startswith('counter.')]
    files.sort()
    files.pop()  # keep the newest version
    for n in files:
        os.remove('counter.%d' % n)
    if os.path.exists('tmp.counter'):
        os.remove('tmp.counter')
Correct ordering
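import time

## rs: a redis.Redis client created with decode_responses=True
def activate_due():
    scheduled = rs.smembers('scheduled')
    now = time.time()
    for el in scheduled:
        due = int(rs.get(el+':due'))
        if now<due:
            continue
        # Activate first, then clean up: a crash in between
        # leaves el in both sets, which recover() detects
        rs.sadd('activated', el)
        rs.delete(el+':due')
        rs.srem('scheduled', el)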
Correct ordering (recovery)
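def recover():
    # Anything both activated and scheduled was interrupted
    # in the middle of activate_due(); finish the cleanup
    inconsistent = rs.sinter('activated',
                             'scheduled')
    for el in inconsistent:
        rs.delete(el+':due')  # may already be gone; DEL ignores missing keys
        rs.srem('scheduled', el)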
Example: Key/value stores
0.log:
  ['add', 'key-0', 'value-0']
  ['add', 'key-1', 'value-1']
  ['add', 'key-0', 'value-2']
  ['remove', 'key-1']
  .
  .
  .

1.log:
  .
  .
  .

2.log:
.
.
.
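Example: Key/value stores (utility functions)
import os

## Get the level of a file, e.g. "3.log" -> 3
def getLevel(s):
    return int(s.split('.')[0])

## Get (level, name) pairs for all files of a given type
def getType(files, tp):
    return [(getLevel(s), s)
            for s in files if s.endswith(tp)]
Example: Key/value stores (classifying files)
## Get all relevant files: the newest master (if any),
## then every log written after it, oldest first.
## Names are relative to d, which is expected to be
## the current directory.
def relevant(d):
    files = os.listdir(d)
    masters = getType(files, '.master')
    mlevel, master = max(masters) if masters else (-1, None)
    logs = getType(files, '.log')
    logs.sort()
    newer = [log for llevel, log in logs if llevel>mlevel]
    return ([master] if master else []) + newer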
Example: Key/value stores (reading)
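import json

## Read in a single file
def update(result, fp):
    for line in fp:
        val = json.loads(line)
        if val[0] == 'add':
            result[val[1]] = val[2]
        else:
            # Tolerate removing a key that is already gone
            result.pop(val[1], None)

## Read in several files
def read(files):
    result = dict()
    for fname in files:
        try:
            with open(fname) as fp:
                update(result, fp)
        except ValueError:
            # A crash can leave a half-written last line;
            # that write never completed, so skip it
            pass
    return result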
Example: Key/value stores (writer class)
import json

class Writer(object):
    def __init__(self, level):
        # level: the highest level already on disk
        self.level = level
        self.fp = None
        self._next()
    def _next(self):
        # Start a new log file one level up
        self.level += 1
        if self.fp:
            self.fp.close()
        name = '%d.log' % self.level
        self.fp = open(name, 'w')
        self.rows = 0
    def write(self, value):
        # One JSON document per line; flushing means a process
        # crash loses at most the line being written
        print(json.dumps(value), file=self.fp)
        self.fp.flush()
        self.rows += 1
        if self.rows>200:
            self._next()
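Example: Key/value stores (storage class)
## The actual data store abstraction.
class Store(object):
    def __init__(self, d='.'):
        # File names from relevant() are relative to d,
        # so d is expected to be the current directory
        files = relevant(d)
        self.result = read(files)
        level = getLevel(files[-1]) if files else -1
        self.writer = Writer(level)
    def get(self, key):
        return self.result[key]
    def add(self, key, value):
        # Log first, then update the in-memory copy
        self.writer.write(['add', key, value])
        self.result[key] = value
    def remove(self, key):
        if key not in self.result:
            raise KeyError(key)  # don't log a no-op removal
        self.writer.write(['remove', key])
        del self.result[key]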
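Example: Key/value stores (compression code)
import json
import os

## This should be run periodically
# from a different thread
def compress(d):
    # Leave out the newest log: the writer is still appending to it
    files = relevant(d)[:-1]
    if len(files)<2:
        return
    result = read(files)
    # The master covers everything up to the last file read;
    # relevant() replays only logs above this level on top of it
    master = getLevel(files[-1])
    tmp = '%d.master.tmp' % master
    with open(tmp, 'w') as fp:
        for key, value in result.items():
            towrite = ['add', key, value]
            print(json.dumps(towrite), file=fp)
    # Publish the new master atomically
    os.rename(tmp, '%d.master' % master)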
Vertical splitting: Example
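import os
import signal
import socket

def forking_server():
    # Let the kernel reap finished children (POSIX)
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    s = socket.socket()
    s.bind(('', 8080))
    s.listen(5)
    while True:
        client, addr = s.accept()
        newpid = os.fork()
        if newpid == 0:
            # Child: serve one request and exit; a crash
            # here affects only this connection
            s.close()
            f = client.makefile('w')
            f.write("Sunday, May 22, 1983 "
                    "18:45:59-PST")
            f.close()
            client.close()
            os._exit(0)
        # Parent: the child owns the connection now
        client.close()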
Horizontal splitting: front-end
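## Process one
from twisted.internet import reactor
from twisted.python import filepath
from twisted.web import resource, server

class SchedulerResource(resource.Resource):
    isLeaf = True
    def __init__(self, filepath):
        resource.Resource.__init__(self)
        self.filepath = filepath
    def render_PUT(self, request):
        # PUT /<uuid>; the body is the due time (Unix seconds)
        uuid, = request.postpath
        content = request.content.read()
        child = self.filepath.child(uuid.decode('ascii'))
        # setContent writes a temporary sibling, then renames it
        child.setContent(content)
        return b''
fp = filepath.FilePath("things")
r = SchedulerResource(fp)
s = server.Site(r)
reactor.listenTCP(8080, s)
reactor.run()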
Horizontal splitting: scheduler
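## Process two
import os
import time
import redis

rs = redis.Redis(host='localhost', port=6379, db=9,
                 decode_responses=True)
while True:
    for uuid in os.listdir("things"):
        if uuid.endswith('.new'):
            continue  # setContent() temp file, still being written
        path = os.path.join("things", uuid)
        with open(path) as fp:
            when = int(fp.read().strip())
        # Set the due time before scheduling, so activate_due()
        # never sees a scheduled item without one
        rs.set(uuid+':due', when)
        rs.sadd('scheduled', uuid)
        # A crash before this line only means reprocessing
        os.remove(path)
    time.sleep(1)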
Horizontal splitting: runner
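## Process three
import time
import redis

rs = redis.Redis(host='localhost', port=6379, db=9,
                 decode_responses=True)
# recover() and activate_due() come from the
# "Correct ordering" slides
recover()
while True:
    activate_due()
    time.sleep(1)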
Horizontal splitting: message queues
     No direct dependencies
Horizontal splitting: message queues: sender
## Process four
import time
import pika
import redis

rs = redis.Redis(host='localhost', port=6379, db=9,
                 decode_responses=True)
params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
while True:
    activated = rs.smembers('activated')
    finished = rs.smembers('finished')
    for el in activated:
        if el in finished:
            continue
        # Publish, then mark finished: a crash in between
        # re-sends el later (a "dup") but never drops it
        channel.basic_publish(
            exchange='', routing_key='active',
            body=el)
        rs.sadd('finished', el)
    time.sleep(1)
Horizontal splitting: message queues: receiver
## Process five
# It is possible to get "dups" of bodies.
# Application logic should deal with that
import syslog
import pika

params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
def callback(ch, method, properties, body):
    syslog.syslog('Activated %s' % body.decode())
# Consume the queue the sender publishes to
# (keyword names as in pika 1.x)
channel.basic_consume(queue='active',
                      on_message_callback=callback,
                      auto_ack=True)
channel.start_consuming()
Horizontal splitting: point-to-point
      Use HTTP (preferably, REST)
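For a point-to-point call, a REST PUT keyed by an identifier is idempotent, so after a crash or a timeout the client can simply send it again. A repeated request names the same uuid instead of creating a second job. The following is a minimal client-side sketch for the front-end in "Process one". `schedule_request` and `schedule` are hypothetical helpers built on the standard `urllib.request` module.

```python
import urllib.request


def schedule_request(base_url, uuid, when):
    """Build a PUT asking the front-end to run `uuid` at Unix time `when`."""
    return urllib.request.Request(
        "%s/%s" % (base_url.rstrip("/"), uuid),
        data=str(int(when)).encode("ascii"),
        method="PUT",
    )


def schedule(base_url, uuid, when, timeout=5):
    # Safe to retry: the same request rewrites the same things/<uuid> file.
    request = schedule_request(base_url, uuid, when)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```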

KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024Valere | Digital Solutions & AI Transformation Portfolio | 2024
Valere | Digital Solutions & AI Transformation Portfolio | 2024
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 

Make Sure Your Applications Crash

  • 1. Make Sure Your Applications Crash Moshe Zadka
  • 3. Python doesn't crash Memory managed, no direct pointer arithmetic
  • 4. ...except it does C bugs, untrapped exceptions, infinite loops, blocking calls, thread deadlocks, inconsistent resident state
  • 5. Recovery is important "[S]ystem failure can usually be considered to be the result of two program errors[...] the second, in the recovery routine[...]"
  • 6. Crashes and inconsistent data A crash results in data from an arbitrary program state.
  • 7. Avoid storage Caches are better than master copies.
  • 9. Atomic operations File rename
  • 10. Example: Counting
        import os

        def update_counter():
            with open("counter.txt") as fp:
                counter = int(fp.read().strip())
            counter += 1
            # If there is a crash before this point,
            # no changes have been done.
            with open("counter.txt.tmp", "w") as fp:
                print(counter, file=fp)
            # If there is a crash before this point,
            # only a temp file has been modified.
            # The following is an atomic operation (on POSIX).
            os.rename("counter.txt.tmp", "counter.txt")
  • 11. Efficient caches, reliable masters Mark inconsistency of cache
  • 13. Availability If data is consistent, just restart!
  • 14. Improving availability Limit impact, fast detection, fast start-up
  • 15. Vertical splitting Different execution paths, different processes
  • 16. Horizontal splitting Different code bases, different processes
  • 17. Watchdog Monitor -> Flag -> Remediate
  • 18. Watchdog principles Keep it simple, keep it safe!
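  • 19. Watchdog: Heartbeats
        ## In a Twisted process
        import os
        from twisted.internet import task

        def beat():
            path = 'beats/my-name'
            # Create the file if it is missing, then bump its mtime:
            # opening in append mode alone does not change the mtime.
            open(path, 'a').close()
            os.utime(path, None)

        task.LoopingCall(beat).start(30)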
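  • 20. Watchdog: Get time-outs
        import glob
        import os
        import time

        def getTimeout():
            # hearts/<name> holds how many seconds beats/<name> may go
            # without a heartbeat; turn that into a cut-off timestamp.
            timeout = dict()
            now = time.time()
            for heart in glob.glob('hearts/*'):
                with open(heart) as fp:
                    beat = int(fp.read().strip())
                timeout[os.path.basename(heart)] = now - beat
            return timeout
  • 21. Watchdog: Mark problems
        def markProblems():
            timeout = getTimeout()
            for heart in glob.glob('beats/*'):
                # Key by bare name so beats/x matches hearts/x.
                name = os.path.basename(heart)
                mtime = os.path.getmtime(heart)
                problem = os.path.join('problems', name)
                if (name in timeout and mtime < timeout[name]
                        and not os.path.isfile(problem)):
                    with open(problem, 'w') as fp:
                        fp.write('watchdog')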
  • 22. Watchdog: check solutions
        import subprocess

        def checkSolutions():
            now = time.time()
            problemTimeout = now - 30
            for problem in glob.glob('problems/*'):
                mtime = os.path.getmtime(problem)
                # A problem flagged more than 30 seconds ago and still
                # unsolved triggers a restart.
                if mtime < problemTimeout:
                    subprocess.call(['restart-system'])
  • 23. Watchdog: Loop
        ## Watchdog
        while True:
            markProblems()
            checkSolutions()
            time.sleep(1)
  • 24. Watchdog: accuracy of Custom checkers can manufacture problems
  • 25. Watchdog: reliability of Use cron for main loop
  • 26. Watchdog: reliability of Use software/hardware watchdogs
  • 29. Welcome to the back-up slides Extra! Extra!
  • 30. Example: Counting on Windows
        def update_counter():
            with open("counter.txt") as fp:
                counter = int(fp.read().strip())
            counter += 1
            # If there is a crash before this point,
            # no changes have been done.
            with open("counter.txt.tmp", "w") as fp:
                print(counter, file=fp)
            # If there is a crash before this point,
            # only a temp file has been modified
            os.remove("counter.txt")
            # At this point, the state is inconsistent*
            # The following is an atomic operation
  • 32. Example: Counting on Windows (Recovery)
        def recover():
            if not os.path.exists("counter.txt"):
                # The permanent file has been removed
                # Therefore, the temp file is valid
                os.rename("counter.txt.tmp", "counter.txt")
  • 33. Example: Counting with versions
        def update_counter():
            files = [int(name.split('.')[-1])
                     for name in os.listdir('.')
                     if name.startswith('counter.')]
            last = max(files)
            with open('counter.%s' % last) as fp:
                counter = int(fp.read().strip())
            counter += 1
            # If there is a crash before this point,
            # no changes have been done.
            with open("tmp.counter", 'w') as fp:
                print(counter, file=fp)
            # If there is a crash before this point,
            # only a temp file has been modified
  • 34.
            os.rename('tmp.counter', 'counter.%s' % (last + 1))
            os.remove('counter.%s' % last)
  • 35. Example: Counting with versions (cleanup)
        # This is not a recovery routine, but a cleanup
        # routine.
        # Even in its absence, the state is consistent
        def cleanup():
            files = [int(name.split('.')[-1])
                     for name in os.listdir('.')
                     if name.startswith('counter.')]
            files.sort()
            files.pop()
            for n in files:
                os.remove('counter.%d' % n)
            if os.path.exists('tmp.counter'):
                os.remove('tmp.counter')
  • 36. Correct ordering
        def activate_due():
            scheduled = rs.smembers('scheduled')
            now = time.time()
            for el in scheduled:
                due = int(rs.get(el + ':due'))
                if now < due:
                    continue
                # Mark as activated first, then clean up: a crash in
                # between leaves el in both sets, which recover() detects.
                rs.sadd('activated', el)
                rs.delete(el + ':due')
                rs.srem('scheduled', el)
  • 37. Correct ordering (recovery)
        def recover():
            # Anything both activated and still scheduled was cut off
            # by a crash before cleanup finished.
            inconsistent = rs.sinter('activated', 'scheduled')
            for el in inconsistent:
                rs.delete(el + ':due')  #*
                rs.srem('scheduled', el)
  • 38. Example: Key/value stores
        0.log:
            ['add', 'key-0', 'value-0']
            ['add', 'key-1', 'value-1']
            ['add', 'key-0', 'value-2']
            ['remove', 'key-1']
            . . .
        1.log:
            . . .
        2.log:
  • 39. . . .
  • 40. Example: Key/value stores (utility functions)
        ## Get the level of a file
        def getLevel(s):
            return int(s.split('.')[0])

        ## Get all files of a given type
        def getType(files, tp):
            return [(getLevel(s), s) for s in files if s.endswith(tp)]
  • 41. Example: Key/value stores (classifying files)
        import os

        ## Get all relevant files: the newest master and every newer log.
        ## Returned names are relative to d.
        def relevant(d):
            files = os.listdir(d)
            mlevel, master = max(getType(files, '.master'))
            logs = getType(files, '.log')
            logs.sort()
            return [master] + [log for llevel, log in logs if llevel > mlevel]
  • 42. Example: Key/value stores (reading)
        import json

        ## Read in a single file
        def update(result, fp):
            for line in fp:
                val = json.loads(line)
                if val[0] == 'add':
                    result[val[1]] = val[2]
                else:
                    del result[val[1]]

        ## Read in several files
        def read(files):
            result = dict()
            for fname in files:
                try:
                    with open(fname) as fp:
                        update(result, fp)
  • 43.
                except ValueError:
                    # A crash mid-write can leave a torn last line;
                    # skip the rest of that file.
                    pass
            return result
  • 44. Example: Key/value stores (writer class)
        class Writer(object):
            def __init__(self, level):
                self.level = level
                self.fp = None
                self._next()

            def _next(self):
                # Move on to a fresh log file one level up.
                self.level += 1
                if self.fp:
                    self.fp.close()
                name = '%3d.log' % self.level
                self.fp = open(name, 'w')
                self.rows = 0

            def write(self, value):
  • 46. Example: Key/value stores (storage class)
        ## The actual data store abstraction.
        class Store(object):
            def __init__(self, d):
                # relevant() and Writer use bare file names,
                # so this assumes the current directory is d.
                files = relevant(d)
                self.result = read(files)
                level = getLevel(files[-1])
                self.writer = Writer(level)

            def get(self, key):
                return self.result[key]

            def add(self, key, value):
                # Log first, then update the in-memory copy.
                self.writer.write(['add', key, value])
                self.result[key] = value

            def remove(self, key):
                self.writer.write(['remove', key])
                del self.result[key]
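  • 47. Example: Key/value stores (compression code)
        ## This should be run periodically
        # from a different thread
        def compress(d):
            # Leave out the newest log: it is still being written.
            files = relevant(d)[:-1]
            if len(files) < 2:
                return
            result = read(files)
            # The new master covers everything up to the last log read,
            # so it takes that log's level; relevant() then keeps every
            # newer log, including the one still being written.
            master = getLevel(files[-1])
            with open('%3d.master.tmp' % master, 'w') as fp:
                for key, value in result.items():
                    towrite = ['add', key, value]
                    print(json.dumps(towrite), file=fp)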
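  • 48. Vertical splitting: Example
        import os
        import signal
        import socket

        def forking_server():
            # Let the kernel reap finished children (POSIX): no zombies.
            signal.signal(signal.SIGCHLD, signal.SIG_IGN)
            s = socket.socket()
            s.bind(('', 8080))
            s.listen(5)
            while True:
                client, addr = s.accept()
                newpid = os.fork()
                if newpid == 0:
                    # Child: answer one client, then exit.
                    # A crash here only takes down this child.
                    s.close()
                    f = client.makefile('w')
                    f.write("Sunday, May 22, 1983 "
                            "18:45:59-PST")
                    f.close()
                    client.close()
                    os._exit(0)
                # Parent: drop its copy of the connection, keep accepting.
                client.close()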
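  • 49. Horizontal splitting: front-end
        ## Process one
        from twisted.internet import reactor
        from twisted.python import filepath
        from twisted.web import resource, server

        class SchedulerResource(resource.Resource):
            isLeaf = True

            def __init__(self, filepath):
                resource.Resource.__init__(self)
                self.filepath = filepath

            def render_PUT(self, request):
                uuid, = request.postpath
                content = request.content.read()
                child = self.filepath.child(uuid.decode('utf-8'))
                # setContent tries to replace the file atomically
                # (write a temporary sibling, then rename it).
                child.setContent(content)
                return b''

        fp = filepath.FilePath("things")
        r = SchedulerResource(fp)
        s = server.Site(r)
        reactor.listenTCP(8080, s)
        reactor.run()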
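  • 50. Horizontal splitting: scheduler
        ## Process two
        import os
        import time
        import redis

        rs = redis.Redis(host='localhost', port=6379, db=9,
                         decode_responses=True)
        while True:
            for uuid in os.listdir("things"):
                fname = os.path.join("things", uuid)
                with open(fname) as fp:
                    when = int(fp.read().strip())
                # Store the due time before scheduling, so a crash never
                # leaves a scheduled item without a due time.
                rs.set(uuid + ':due', when)
                rs.sadd('scheduled', uuid)
                os.remove(fname)
            time.sleep(1)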
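  • 51. Horizontal splitting: runner
        ## Process three
        import time
        import redis

        rs = redis.Redis(host='localhost', port=6379, db=9,
                         decode_responses=True)
        # Finish any half-done activation left by a previous crash.
        recover()
        while True:
            activate_due()
            time.sleep(1)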
  • 52. Horizontal splitting: message queues No direct dependencies
  • 53. Horizontal splitting: message queues: sender
        ## Process four
        import time
        import pika
        import redis

        rs = redis.Redis(host='localhost', port=6379, db=9,
                         decode_responses=True)
        params = pika.ConnectionParameters('localhost')
        conn = pika.BlockingConnection(params)
        channel = conn.channel()
        channel.queue_declare(queue='active')
        while True:
            activated = rs.smembers('activated')
            finished = rs.smembers('finished')
            for el in activated:
                if el in finished:
                    continue
  • 54.
                # Publish first, then record it: a crash in between
                # causes a duplicate message, never a lost one.
                channel.basic_publish(
                    exchange='', routing_key='active', body=el)
                rs.sadd('finished', el)
            time.sleep(1)
  • 55. Horizontal splitting: message queues: receiver
        ## Process five
        # It is possible to get "dups" of bodies.
        # Application logic should deal with that
        import syslog
        import pika

        params = pika.ConnectionParameters('localhost')
        conn = pika.BlockingConnection(params)
        channel = conn.channel()
        channel.queue_declare(queue='active')

        def callback(ch, method, properties, el):
            syslog.syslog('Activated %s' % el.decode('utf-8'))

        # Consume from the queue the sender publishes to
        # (keyword arguments as in pika 1.x).
        channel.basic_consume(queue='active', on_message_callback=callback,
                              auto_ack=True)
        channel.start_consuming()
  • 56. Horizontal splitting: point-to-point Use HTTP (preferably, REST)
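Slide 4 lists infinite loops and blocking calls among the ways a Python program fails without crashing. One standard-library way to turn such a hang into a crash is `faulthandler.dump_traceback_later(..., exit=True)`: arm a deadline before a risky step and cancel it afterwards. If the deadline passes, the tracebacks of all threads go to stderr and the process exits with status 1, which a supervisor or watchdog can then act on. A minimal sketch; `run_with_deadline`, `hang_in_child`, and the timeouts are illustrative assumptions.

```python
import faulthandler
import subprocess
import sys


def run_with_deadline(func, timeout):
    """Call func(). If it has not returned within `timeout` seconds,
    faulthandler dumps every thread's traceback to stderr and the
    process exits with status 1."""
    faulthandler.dump_traceback_later(timeout, exit=True)
    try:
        return func()
    finally:
        # Disarm the deadline whether func() returned or raised.
        faulthandler.cancel_dump_traceback_later()


def hang_in_child(timeout=0.5):
    """Start a child process that arms the same kind of deadline and
    then hangs; return the child's exit status."""
    code = (
        "import faulthandler, time\n"
        "faulthandler.dump_traceback_later(%r, exit=True)\n"
        "time.sleep(60)\n" % timeout
    )
    child = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return child.returncode
```

A step that finishes in time returns normally; the hanging child is ended after about half a second instead of sleeping for a minute.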
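Slide 10 relies on the rename being atomic, which protects against the process crashing. If the whole machine may lose power, the new bytes can still be in the OS cache when the rename lands. A common hardening, sketched here for POSIX filesystems, is to `fsync` the temporary file before renaming and then `fsync` the directory. The sketch uses `os.replace`, which overwrites an existing target on Windows as well, where `os.rename` raises instead (the situation slide 30 works around). `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with the text `data` so that readers
    see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as `path`.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as fp:
            fp.write(data)
            fp.flush()
            os.fsync(fp.fileno())  # push the data to disk before the rename
        os.replace(tmp, path)      # overwrites `path`, on Windows too
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if os.name == "posix":
        # Make the rename itself durable by syncing the directory entry.
        dfd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)


def update_counter(path="counter.txt"):
    with open(path) as fp:
        counter = int(fp.read().strip())
    counter += 1
    atomic_write(path, "%d\n" % counter)
    return counter
```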
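Slide 11's "mark inconsistency of cache" can be read as a dirty flag: create a marker before an update, remove it once cache and master agree again, and at start-up rebuild the cache from the master whenever the marker survived. A sketch under assumed names: the master is a JSON object in `master.json`, the cache holds the sum of its values in `cache.json`, and `cache.dirty` is the marker. The fast path updates the cache incrementally and non-atomically; the marker is what makes that safe.

```python
import json
import os

MASTER = "master.json"  # reliable master copy (hypothetical name)
CACHE = "cache.json"    # derived cache, cheap to rebuild (hypothetical name)
DIRTY = "cache.dirty"   # exists while the cache may disagree with the master


def _path(d, name):
    return os.path.join(d, name)


def rebuild_cache(d):
    """Slow path: recompute the cache from the master."""
    with open(_path(d, MASTER)) as fp:
        master = json.load(fp)
    with open(_path(d, CACHE), "w") as fp:
        json.dump({"total": sum(master.values())}, fp)


def load_cache(d):
    """At start-up, trust the cache only if no crash left it marked dirty."""
    if os.path.exists(_path(d, DIRTY)) or not os.path.exists(_path(d, CACHE)):
        rebuild_cache(d)
        if os.path.exists(_path(d, DIRTY)):
            os.remove(_path(d, DIRTY))
    with open(_path(d, CACHE)) as fp:
        return json.load(fp)


def set_value(d, key, value):
    """Fast path: update master and cache under the dirty mark.
    Assumes load_cache(d) already ran, so the cache file exists."""
    open(_path(d, DIRTY), "w").close()            # 1. mark the cache suspect
    with open(_path(d, MASTER)) as fp:
        master = json.load(fp)
    old = master.get(key, 0)
    master[key] = value
    tmp = _path(d, MASTER + ".tmp")
    with open(tmp, "w") as fp:
        json.dump(master, fp)
    os.replace(tmp, _path(d, MASTER))              # 2. update the master atomically
    with open(_path(d, CACHE)) as fp:
        cache = json.load(fp)
    cache["total"] += value - old
    with open(_path(d, CACHE), "w") as fp:         # 3. not atomic: the mark covers it
        json.dump(cache, fp)
    os.remove(_path(d, DIRTY))                     # 4. consistent again
```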
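Slide 13's "just restart" can be as small as a supervisor loop. This sketch restarts a child command after every abnormal exit and gives up after a hypothetical `max_restarts`, so a crash loop surfaces instead of spinning forever. In production this job is usually left to an init system or process manager; `supervise` and its parameters are assumptions.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay=1.0):
    """Run `cmd`; restart it after abnormal exits.
    Returns the list of exit codes seen, in order."""
    codes = []
    while True:
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:
            return codes            # clean exit: nothing to recover
        if len(codes) > max_restarts:
            return codes            # crash loop: stop and let a human look
        time.sleep(delay)           # avoid hammering a failing dependency
```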
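Slide 24 notes that custom checkers can manufacture problems: since the watchdog only looks at files in `problems/`, any independent check can raise a flag by creating one. A sketch assuming the `problems/` layout from slides 21 and 22; the disk-space check, the 5% threshold, and `check_disk` are illustrative. An existing flag is left untouched so its mtime still records when the problem started, which is what `checkSolutions` compares against.

```python
import os
import shutil


def check_disk(path="/", min_free_fraction=0.05, problems_dir="problems"):
    """Flag a problem when free space on `path` drops below the threshold.
    Returns True only if this call created a new problem file."""
    usage = shutil.disk_usage(path)
    if usage.free / usage.total >= min_free_fraction:
        return False
    problem = os.path.join(problems_dir, "disk-space")
    if os.path.isfile(problem):
        return False  # already flagged: keep the original timestamp
    os.makedirs(problems_dir, exist_ok=True)
    with open(problem, "w") as fp:
        fp.write("disk-checker")
    return True
```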
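Slide 25 suggests running the watchdog's main loop from cron, so the watchdog cannot die silently: every run is a fresh process. Cron fires at most once a minute, so one approach, sketched here, is a one-shot pass that repeats the checks for most of a minute and then exits. The check functions are passed in (for example `markProblems` and `checkSolutions` from slides 21 and 22); the crontab line, script path, and `run_for` are hypothetical, and `clock`/`sleep` are injectable only to make the sketch testable.

```python
import time

# Hypothetical crontab entry, one watchdog pass per minute:
#   * * * * * /usr/bin/python3 /usr/local/bin/watchdog_once.py
# where watchdog_once.py calls run_once([markProblems, checkSolutions]).


def run_once(checks, run_for=55.0, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Call every check repeatedly for about `run_for` seconds, then
    return the number of rounds so the next cron run can take over."""
    deadline = clock() + run_for
    rounds = 0
    while clock() < deadline:
        for check in checks:
            check()
        rounds += 1
        sleep(interval)
    return rounds
```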
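For slide 26, one widely deployed software watchdog is systemd's. With `WatchdogSec=` set on a service, the process must send `WATCHDOG=1` to the datagram socket named in `$NOTIFY_SOCKET` more often than that interval, or systemd considers it failed (and restarts it if the unit's `Restart=` setting allows). The sketch below speaks that protocol directly rather than through a library; the function names are assumptions, and outside systemd `sd_notify` simply returns `False`. Pinging at half of `WATCHDOG_USEC` follows the recommendation in systemd's documentation.

```python
import os
import socket


def sd_notify(message, env=os.environ):
    """Send one sd_notify-style datagram such as b"WATCHDOG=1".
    Returns False when no notification socket is configured."""
    address = env.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # Linux abstract-namespace socket: leading NUL byte instead of '@'.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True


def watchdog_interval(env=os.environ):
    """Seconds between pings: half of WATCHDOG_USEC, or None if unset."""
    usec = env.get("WATCHDOG_USEC")
    return int(usec) / 2e6 if usec else None
```

A service would call `sd_notify(b"WATCHDOG=1")` every `watchdog_interval()` seconds from the same loop that does its real work, so a hung loop stops the pings.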
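The ordering on slides 36 and 37 works because every crash point leaves a state that `recover()` can recognise. To check that without a Redis server, the sketch below reruns the same steps against a small in-memory stand-in whose writes can be made to fail after a chosen count. `FakeRedis`, `Crash`, and `crash_then_recover` are hypothetical test scaffolding, not redis-py, and here `activate_due`/`recover` take the client and the time as parameters instead of using globals.

```python
class Crash(Exception):
    """Raised by FakeRedis to simulate the process dying mid-update."""


class FakeRedis:
    """Hypothetical in-memory stand-in for the Redis commands used above.
    After `crash_after` writes, the next write raises Crash."""

    def __init__(self):
        self.sets = {}
        self.keys = {}
        self.crash_after = None

    def _write(self):
        if self.crash_after is not None:
            if self.crash_after == 0:
                raise Crash()
            self.crash_after -= 1

    def smembers(self, name):
        return set(self.sets.get(name, ()))

    def sinter(self, a, b):
        return self.smembers(a) & self.smembers(b)

    def get(self, key):
        return self.keys.get(key)

    def set(self, key, value):
        self._write()
        self.keys[key] = value

    def sadd(self, name, el):
        self._write()
        self.sets.setdefault(name, set()).add(el)

    def srem(self, name, el):
        self._write()
        self.sets.get(name, set()).discard(el)

    def delete(self, key):
        self._write()
        self.keys.pop(key, None)


def activate_due(rs, now):
    for el in rs.smembers('scheduled'):
        if now < int(rs.get(el + ':due')):
            continue
        rs.sadd('activated', el)   # 1. mark as activated
        rs.delete(el + ':due')     # 2. then clean up
        rs.srem('scheduled', el)


def recover(rs):
    # Anything both activated and still scheduled was cut off mid-way.
    for el in rs.sinter('activated', 'scheduled'):
        rs.delete(el + ':due')
        rs.srem('scheduled', el)


def crash_then_recover(crash_after):
    """Crash activate_due after `crash_after` writes, then restart:
    recover first, then resume normal work."""
    rs = FakeRedis()
    rs.set('job:due', 100)
    rs.sadd('scheduled', 'job')
    rs.crash_after = crash_after
    try:
        activate_due(rs, now=200)
    except Crash:
        pass
    rs.crash_after = None  # the restarted process does not crash
    recover(rs)
    activate_due(rs, now=200)
    return rs
```

Every crash point (0 to 3 writes) ends with the job activated exactly once. With `crash_after=2` the crash lands after the due key is deleted but before `srem`; skipping `recover()` there would make the next `activate_due` fail on `int(None)`, which is why the runner on slide 51 calls it first.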
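Slide 55 warns that the receiver may see duplicate bodies; with the sender on slides 53 and 54 this happens when it crashes between publishing and recording `finished`. One way for the application logic to cope is an idempotent handler that remembers what it has already processed. In this sketch the seen-set lives in memory and `handle` stands in for the real work; a version that must survive restarts would keep that set in durable storage, such as the Redis instance used on the other slides.

```python
class IdempotentHandler:
    """Wraps a message handler so duplicate deliveries are processed once."""

    def __init__(self, handle):
        self.handle = handle
        self.seen = set()  # in memory only: duplicates across restarts slip through

    def __call__(self, body):
        key = body.decode("utf-8") if isinstance(body, bytes) else body
        if key in self.seen:
            return False      # duplicate: ignore it
        self.handle(key)
        self.seen.add(key)    # record only after the work succeeded
        return True
```

With pika 1.x it could be wired up as `channel.basic_consume(queue='active', on_message_callback=lambda ch, method, properties, body: handler(body), auto_ack=True)`.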
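Slide 56 recommends HTTP, preferably REST, for point-to-point links. Such links fit this design when requests are idempotent: repeating a `PUT` of the same body to the same URL, as the front-end on slide 49 accepts, has the same effect as sending it once, so a client can simply retry after a timeout or a crashed peer. A standard-library sketch; the retry policy, `put_with_retries`, and the small demo server (there only to exercise the client) are assumptions.

```python
import http.server
import threading
import time
import urllib.error
import urllib.request


def put_with_retries(url, body, attempts=3, backoff=0.5):
    """PUT `body` (bytes) to `url`, retrying on connection failures and 5xx.
    Retrying is only safe because repeating the same PUT is idempotent."""
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(url, data=body, method="PUT")
            with urllib.request.urlopen(req, timeout=5) as resp:
                return resp.status
        except (urllib.error.URLError, ConnectionError) as err:
            # Client errors (4xx) will not improve on retry.
            if isinstance(err, urllib.error.HTTPError) and err.code < 500:
                raise
            if attempt == attempts - 1:
                raise
        time.sleep(backoff * 2 ** attempt)


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Demo only: stores each PUT body under its path, like the front-end."""
    store = {}

    def do_PUT(self):
        length = int(self.headers["Content-Length"])
        self.store[self.path] = self.rfile.read(length)
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def serve_in_background():
    server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```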