SlideShare a Scribd company logo
1 of 56
Download to read offline
Make Sure Your Applications Crash




           Moshe Zadka
True story
Python doesn't crash




Memory managed, no direct pointer arithmetic
...except it does




 C bugs, untrapped exception, infinite loops,
blocking calls, thread dead-lock, inconsistent
                 resident state
Recovery is important




"[S]ystem failure can usually be considered to
  be the result of two program errors[...] the
      second, in the recovery routine[...]"
Crashes and inconsistent data




A crash results in data from an arbitrary
            program state.
Avoid storage




Caches are better than master copies.
Databases




Transactions maintain consistency
    Databases can crash too!
Atomic operations




    File rename
Example: Counting
def update_counter():
    fp = file("counter.txt")
    s = fp.read()
    counter = int(s.strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been done.
    fp = file("counter.txt.tmp", 'w')
    print >>fp, counter
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
    # The following is an atomic operation
    os.rename("counter.txt.tmp", "counter.txt")
Efficient caches, reliable masters




     Mark inconsistency of cache
No shutdown




Crash in testing
Availability




If data is consistent, just restart!
Improving availability




        Limit impact
       Fast detection
        Fast start-up
Vertical splitting




Different execution paths, different processes
Horizontal splitting




Different code bases, different processes
Watchdog




Monitor -> Flag -> Remediate
Watchdog principles




Keep it simple, keep it safe!
Watchdog: Heartbeats
## In a Twisted process
def beat():
    file('beats/my-name', 'a').close()
task.LoopingCall(beat).start(30)
Watchdog: Get time-outs
def getTimeout()
    timeout = dict()
    now = time.time()
    for heart in glob.glob('hearts/*'):
        beat = int(file(heart).read().strip())
        timeout[heart] = now-beat
    return timeout
Watchdog: Mark problems
def markProblems():
    timeout = getTimeout()
    for heart in glob.glob('beats/*'):
        mtime = os.path.getmtime(heart)
        problem = 'problems/'+heart
        if (mtime<timeout[heart] and
           not os.path.isfile(problem)):
            fp = file('problems/'+heart, 'w')
            fp.write('watchdog')
            fp.close()
Watchdog: check solutions
def checkSolutions():
    now = time.time()
    problemTimeout = now-30
    for problem in glob.glob('problems/*'):
        mtime = os.path.getmtime(problem)
        if mtime<problemTimeout:
            subprocess.call(['restart-system'])
Watchdog: Loop
## Watchdog
while True:
    markProblems()
    checkSolutions()
    time.sleep(1)
Watchdog: accuracy of




Custom checkers can manufacture problems
Watchdog: reliability of




   Use cron for main loop
Watchdog: reliability of




Use software/hardware watchdogs
Conclusions




Everything crashes -- plan for it
Questions?
Welcome to the back-up slides
         Extra! Extra!
Example: Counting on Windows
def update_counter():
    fp = file("counter.txt")
    s = fp.read()
    counter = int(s.strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been done.
    fp = file("counter.txt.tmp", 'w')
    print >>fp, counter
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
    os.remove("counter.txt")
    # At this point, the state is inconsistent*
    # The following is an atomic operation
os.rename("counter.txt.tmp", "counter.txt")
Example: Counting on Windows
             (Recovery)
def recover():
    if not os.path.exists("counter.txt"):
        # The permanent file has been removed
        # Therefore, the temp file is valid
        os.rename("counter.txt.tmp",
                  "counter.txt")
Example: Counting with versions
def update_counter():
    files = [int(name.split('.')[-1])
               for name in os.listdir('.')
                 if name.startswith('counter.')]
    last = max(files)
    counter = int(file('counter.%s' % last
                      ).read().strip())
    counter += 1
    # If there is a crash before this point,
    # no changes have been done.
    fp = file("tmp.counter", 'w')
    print >>fp, counter
    fp.close()
    # If there is a crash before this point,
    # only a temp file has been modified
os.rename('tmp.counter',
          'counter.%s' % (last+1))
os.remove('counter.%s' % last)
Example: Counting with versions
             (cleanup)
# This is not a recovery routine, but a cleanup
# routine.
# Even in its absence, the state is consistent
def cleanup():
    files = [int(name.split('.')[-1])
                for name in os.listdir('.')
                  if name.startswith('counter.')]
    files.sort()
    files.pop()
    for n in files:
        os.remove('counter.%d' % n)
    if os.path.exists('tmp.counter'):
        os.remove('tmp.counter')
Correct ordering
def activate_due():
    scheduled = rs.smembers('scheduled')
    now = time.time()
    for el in scheduled:
        due = int(rs.get(el+':due'))
        if now<due:
            continue
        rs.sadd('activated', el)
        rs.delete(el+':due')
        rs.sremove('scheduled', el)
Correct ordering (recovery)
def recover():
    inconsistent = rs.sinter('activated',
                             'scheduled')
    for el in inconsistent:
        rs.delete(el+':due') #*
        rs.sremove('scheduled', el)
Example: Key/value stores
0.log:
  ['add', 'key-0', 'value-0']
  ['add', 'key-1', 'value-1']
  ['add', 'key-0', 'value-2']
  ['remove', 'key-1']
  .
  .
  .

1.log:
  .
  .
  .

2.log:
.
.
.
Example: Key/value stores (utility
             functions)
## Get the level of a file
def getLevel(s)
    return int(s.split('.')[0])

## Get all files of a given type
def getType(tp):
    return [(getLevel(s), s)
             for s in files if s.endswith(tp)]
Example: Key/value stores
             (classifying files)
## Get all relevant files
def relevant(d):
    files = os.listdir(d):
    mlevel, master = max(getType('.master'))
    logs = getType('.log')
    logs.sort()
    return master+[log for llevel, log in logs
                           if llevel>mlevel]
Example: Key/value stores (reading)
## Read in a single file
def update(result, fp):
    for line in fp:
        val = json.loads(line)
        if val[0] == 'add':
            result[val[1]] = val[2]
        else:
            del result[val[1]]

## Read in several files
def read(files):
    result = dict()
    for fname in files:
        try:
             update(result, file(fname))
except ValueError:
        pass
return result
Example: Key/value stores (writer
               class)
class Writer(object):
    def __init__(self, level):
        self.level = level
        self.fp = None
        self._next()
    def _next(self):
        self.level += 1
        if self.fp:
            self.fp.close()
        name ='%3d.log' % self.currentLevel
        self.fp = file(name, 'w')
        self.rows = 0
    def write(self, value):
print >>self.fp, json.dumps(value)
self.fp.flush()
self.rows += 1
if self.rows>200:
    self._next()
Example: Key/value stores (storage
               class)
## The actual data store abstraction.
class Store(object):
    def __init__(self):
        files = relevant(d)
        self.result = read(files)
        level = getLevel(files[-1])
        self.writer = Writer(level)
    def get(self, key):
        return self.result[key]
    def add(self, key, value):
        self.writer.write(['add', key, value])
    def remove(self, key):
        self.writer.write(['remove', key])
Example: Key/value stores
            (compression code)
## This should be run periodically
# from a different thread
def compress(d):
    files = relevant(d)[:-1]
    if len(files)<2:
        return
    result = read(files)
    master = getLevel(files[-1])+1
    fp = file('%3d.master.tmp' % master, 'w')
    for key, value in result.iteritems():
        towrite = ['add', key, value])
        print >>fp, json.dumps(towrite)
    fp.close()
Vertical splitting: Example
def forking_server():
    s = socket.socket()
    s.bind(('', 8080))
    s.listen(5)
    while True:
        client = s.accept()
        newpid = os.fork()
        if newpid:
            f = client.makefile()
            f.write("Sunday, May 22, 1983 "
                    "18:45:59-PST")
            f.close()
            os._exit()
Horizontal splitting: front-end
## Process one
class SchedulerResource(resource.Resource):
    isLeaf = True
    def __init__(self, filepath):
        resource.Resource.__init__(self)
        self.filepath = filepath
    def render_PUT(self, request):
        uuid, = request.postpath
        content = request.content.read()
        child = self.filepath.child(uuid)
        child.setContent(content)
fp = filepath.FilePath("things")
r = SchedulerResource(fp)
s = server.Site(r)
reactor.listenTCP(8080, s)
Horizontal splitting: scheduler
## Process two
rs = redis.Redis(host='localhost',
                  port=6379, db=9)
while True:
    for fname in os.listdir("things"):
        when = int(file(fname).read().strip())
        rs.set(uuid+':due', when)
        rs.sadd('scheduled', uuid)
        os.remove(fname)
    time.sleep(1)
Horizontal splitting: runner
## Process three
rs = redis.Redis(host='localhost',
                  port=6379, db=9)
recover()
while True:
    activate_due()
    time.sleep(1)
Horizontal splitting: message
           queues
     No direct dependencies
Horizontal splitting: message
            queues: sender
## Process four
rs = redis.Redis(host='localhost',
                 port=6379, db=9)
params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
while True:
    activated = rs.smembers('activated')
    finished = set(rs.smembers('finished'))
    for el in activated:
        if el in finished:
            continue
channel.basic_publish(
    exchange='', routing_key='active',
    body=el)
rs.add('finished', el)
Horizontal splitting: message
            queues: receiver
## Process five
# It is possible to get "dups" of bodies.
# Application logic should deal with that
params = pika.ConnectionParameters('localhost')
conn = pika.BlockingConnection(params)
channel = conn.channel()
channel.queue_declare(queue='active')
def callback(ch, method, properties, el):
    syslog.syslog('Activated %s' % el)
channel.basic_consume(callback, queue='hello', no_ack=True)
channel.start_consuming()
Horizontal splitting: point-to-point
      Use HTTP (preferably, REST)

More Related Content

What's hot

Ansible for Beginners
Ansible for BeginnersAnsible for Beginners
Ansible for BeginnersArie Bregman
 
Leveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsLeveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsMathias Herberts
 
Grails/Groovyによる開発事例紹介
Grails/Groovyによる開発事例紹介Grails/Groovyによる開発事例紹介
Grails/Groovyによる開発事例紹介Kiyotaka Oku
 
Apache Airflow
Apache AirflowApache Airflow
Apache AirflowJason Kim
 
Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)MongoDB
 
Threads Advance in System Administration with Linux
Threads Advance in System Administration with LinuxThreads Advance in System Administration with Linux
Threads Advance in System Administration with LinuxSoumen Santra
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)Jacek Laskowski
 
Python profiling
Python profilingPython profiling
Python profilingdreampuf
 
Assignment no39
Assignment no39Assignment no39
Assignment no39Jay Patel
 
Advanced Replication
Advanced ReplicationAdvanced Replication
Advanced ReplicationMongoDB
 
Improving go-git performance
Improving go-git performanceImproving go-git performance
Improving go-git performancesource{d}
 
使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight RecorderYoshiro Tokumasu
 
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -Yoshiro Tokumasu
 
The Ring programming language version 1.5.4 book - Part 25 of 185
The Ring programming language version 1.5.4 book - Part 25 of 185The Ring programming language version 1.5.4 book - Part 25 of 185
The Ring programming language version 1.5.4 book - Part 25 of 185Mahmoud Samir Fayed
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215exsuns
 
The Ring programming language version 1.6 book - Part 71 of 189
The Ring programming language version 1.6 book - Part 71 of 189The Ring programming language version 1.6 book - Part 71 of 189
The Ring programming language version 1.6 book - Part 71 of 189Mahmoud Samir Fayed
 

What's hot (20)

Ansible for Beginners
Ansible for BeginnersAnsible for Beginners
Ansible for Beginners
 
Leveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy SystemsLeveraging Hadoop for Legacy Systems
Leveraging Hadoop for Legacy Systems
 
Ns2programs
Ns2programsNs2programs
Ns2programs
 
Grails/Groovyによる開発事例紹介
Grails/Groovyによる開発事例紹介Grails/Groovyによる開発事例紹介
Grails/Groovyによる開発事例紹介
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Hadoop
HadoopHadoop
Hadoop
 
Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)Replica Sets (NYC NoSQL Meetup)
Replica Sets (NYC NoSQL Meetup)
 
Threads Advance in System Administration with Linux
Threads Advance in System Administration with LinuxThreads Advance in System Administration with Linux
Threads Advance in System Administration with Linux
 
(map Clojure everyday-tasks)
(map Clojure everyday-tasks)(map Clojure everyday-tasks)
(map Clojure everyday-tasks)
 
Python profiling
Python profilingPython profiling
Python profiling
 
Assignment no39
Assignment no39Assignment no39
Assignment no39
 
Advanced Replication
Advanced ReplicationAdvanced Replication
Advanced Replication
 
Improving go-git performance
Improving go-git performanceImproving go-git performance
Improving go-git performance
 
使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder使ってみよう!JDK Flight Recorder
使ってみよう!JDK Flight Recorder
 
Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp KrennJavantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
 
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -
JFR Event StreamingによるAP監視 - JDK Flight Recorder の活用 -
 
The Ring programming language version 1.5.4 book - Part 25 of 185
The Ring programming language version 1.5.4 book - Part 25 of 185The Ring programming language version 1.5.4 book - Part 25 of 185
The Ring programming language version 1.5.4 book - Part 25 of 185
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
 
How To Recoord
How To RecoordHow To Recoord
How To Recoord
 
The Ring programming language version 1.6 book - Part 71 of 189
The Ring programming language version 1.6 book - Part 71 of 189The Ring programming language version 1.6 book - Part 71 of 189
The Ring programming language version 1.6 book - Part 71 of 189
 

Viewers also liked

My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3David Sommer
 
Strategies for Friendly English and Successful Localization
Strategies for Friendly English and Successful LocalizationStrategies for Friendly English and Successful Localization
Strategies for Friendly English and Successful LocalizationJohn Collins
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Javajbellis
 
Internationalization in Rails 2.2
Internationalization in Rails 2.2Internationalization in Rails 2.2
Internationalization in Rails 2.2Nicolas Jacobeus
 
Sample of instructions
Sample of instructionsSample of instructions
Sample of instructionsDavid Sommer
 
Designing for Multiple Mobile Platforms
Designing for Multiple Mobile PlatformsDesigning for Multiple Mobile Platforms
Designing for Multiple Mobile PlatformsRobert Douglas
 
2008 Fourth Quarter Real Estate Commentary
2008 Fourth Quarter Real Estate Commentary2008 Fourth Quarter Real Estate Commentary
2008 Fourth Quarter Real Estate Commentaryalghanim
 
Stc 2014 unraveling the mysteries of localization kits
Stc 2014 unraveling the mysteries of localization kitsStc 2014 unraveling the mysteries of localization kits
Stc 2014 unraveling the mysteries of localization kitsDavid Sommer
 
Linguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsLinguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsHeatherRivers
 
mobile development platforms
mobile development platformsmobile development platforms
mobile development platformsguestfa9375
 
Sample email submission
Sample email submissionSample email submission
Sample email submissionDavid Sommer
 
How to make intelligent web apps
How to make intelligent web appsHow to make intelligent web apps
How to make intelligent web appsiapain
 
Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)John Collins
 
The ruby on rails i18n core api-Neeraj Kumar
The ruby on rails i18n core api-Neeraj KumarThe ruby on rails i18n core api-Neeraj Kumar
The ruby on rails i18n core api-Neeraj KumarThoughtWorks
 
Building Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageBuilding Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageJohn Collins
 
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)John Collins
 
Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)John Collins
 

Viewers also liked (20)

My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3My trans kit checklist gw1 ds1_gw3
My trans kit checklist gw1 ds1_gw3
 
Glossary
GlossaryGlossary
Glossary
 
Strategies for Friendly English and Successful Localization
Strategies for Friendly English and Successful LocalizationStrategies for Friendly English and Successful Localization
Strategies for Friendly English and Successful Localization
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Internationalization in Rails 2.2
Internationalization in Rails 2.2Internationalization in Rails 2.2
Internationalization in Rails 2.2
 
Sample of instructions
Sample of instructionsSample of instructions
Sample of instructions
 
Designing for Multiple Mobile Platforms
Designing for Multiple Mobile PlatformsDesigning for Multiple Mobile Platforms
Designing for Multiple Mobile Platforms
 
2008 Fourth Quarter Real Estate Commentary
2008 Fourth Quarter Real Estate Commentary2008 Fourth Quarter Real Estate Commentary
2008 Fourth Quarter Real Estate Commentary
 
Stc 2014 unraveling the mysteries of localization kits
Stc 2014 unraveling the mysteries of localization kitsStc 2014 unraveling the mysteries of localization kits
Stc 2014 unraveling the mysteries of localization kits
 
Linguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with RailsLinguistic Potluck: Crowdsourcing localization with Rails
Linguistic Potluck: Crowdsourcing localization with Rails
 
Shrunken Head
 Shrunken Head  Shrunken Head
Shrunken Head
 
mobile development platforms
mobile development platformsmobile development platforms
mobile development platforms
 
Silmeyiniz
SilmeyinizSilmeyiniz
Silmeyiniz
 
Sample email submission
Sample email submissionSample email submission
Sample email submission
 
How to make intelligent web apps
How to make intelligent web appsHow to make intelligent web apps
How to make intelligent web apps
 
Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)Putting Out Fires with Content Strategy (STC Academic SIG)
Putting Out Fires with Content Strategy (STC Academic SIG)
 
The ruby on rails i18n core api-Neeraj Kumar
The ruby on rails i18n core api-Neeraj KumarThe ruby on rails i18n core api-Neeraj Kumar
The ruby on rails i18n core api-Neeraj Kumar
 
Building Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any LanguageBuilding Quality Experiences for Users in Any Language
Building Quality Experiences for Users in Any Language
 
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
Strategies for Friendly English and Successful Localization (InfoDevWorld 2014)
 
Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)Putting Out Fires with Content Strategy (InfoDevDC meetup)
Putting Out Fires with Content Strategy (InfoDevDC meetup)
 

Similar to Make Sure Your Applications Crash

3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib웅식 전
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
 
GE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingGE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingMuthu Vinayagam
 
Fantastic DSL in Python
Fantastic DSL in PythonFantastic DSL in Python
Fantastic DSL in Pythonkwatch
 
Python utan-stodhjul-motorsag
Python utan-stodhjul-motorsagPython utan-stodhjul-motorsag
Python utan-stodhjul-motorsagniklal
 
Python Asíncrono - Async Python
Python Asíncrono - Async PythonPython Asíncrono - Async Python
Python Asíncrono - Async PythonJavier Abadía
 
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...Yashpatel821746
 
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...Yashpatel821746
 
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...Yashpatel821746
 
Bash cheat sheet
Bash cheat sheetBash cheat sheet
Bash cheat sheetJogesh Rao
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in pythonKarin Lagesen
 
Can you fix the errors- It isn't working when I try to run import s.pdf
Can you fix the errors- It isn't working when I try to run    import s.pdfCan you fix the errors- It isn't working when I try to run    import s.pdf
Can you fix the errors- It isn't working when I try to run import s.pdfaksachdevahosymills
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+ConFoo
 
Commit2015 kharchenko - python generators - ext
Commit2015   kharchenko - python generators - extCommit2015   kharchenko - python generators - ext
Commit2015 kharchenko - python generators - extMaxym Kharchenko
 
Think Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJSThink Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJSAdam L Barrett
 
Terminal linux commands_ Fedora based
Terminal  linux commands_ Fedora basedTerminal  linux commands_ Fedora based
Terminal linux commands_ Fedora basedNavin Thapa
 

Similar to Make Sure Your Applications Crash (20)

Five
FiveFive
Five
 
3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib
 
What's new in Python 3.11
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11
 
GE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingGE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python Programming
 
Fantastic DSL in Python
Fantastic DSL in PythonFantastic DSL in Python
Fantastic DSL in Python
 
Python utan-stodhjul-motorsag
Python utan-stodhjul-motorsagPython utan-stodhjul-motorsag
Python utan-stodhjul-motorsag
 
Linux cheat sheet
Linux cheat sheetLinux cheat sheet
Linux cheat sheet
 
Python Asíncrono - Async Python
Python Asíncrono - Async PythonPython Asíncrono - Async Python
Python Asíncrono - Async Python
 
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...
8799.pdfOr else the work is fine only. Lot to learn buddy.... Improve your ba...
 
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
Or else the work is fine only. Lot to learn buddy.... Improve your basics in ...
 
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
PYTHONOr else the work is fine only. Lot to learn buddy.... Improve your basi...
 
python codes
python codespython codes
python codes
 
Bash cheat sheet
Bash cheat sheetBash cheat sheet
Bash cheat sheet
 
Bash cheat sheet
Bash cheat sheetBash cheat sheet
Bash cheat sheet
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in python
 
Can you fix the errors- It isn't working when I try to run import s.pdf
Can you fix the errors- It isn't working when I try to run    import s.pdfCan you fix the errors- It isn't working when I try to run    import s.pdf
Can you fix the errors- It isn't working when I try to run import s.pdf
 
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+Marrow: A Meta-Framework for Python 2.6+ and 3.1+
Marrow: A Meta-Framework for Python 2.6+ and 3.1+
 
Commit2015 kharchenko - python generators - ext
Commit2015   kharchenko - python generators - extCommit2015   kharchenko - python generators - ext
Commit2015 kharchenko - python generators - ext
 
Think Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJSThink Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJS
 
Terminal linux commands_ Fedora based
Terminal  linux commands_ Fedora basedTerminal  linux commands_ Fedora based
Terminal linux commands_ Fedora based
 

Recently uploaded

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 

Recently uploaded (20)

Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 

Make Sure Your Applications Crash

  • 1. Make Sure Your Applications Crash Moshe Zadka
  • 3. Python doesn't crash Memory managed, no direct pointer arithmetic
  • 4. ...except it does C bugs, untrapped exception, infinite loops, blocking calls, thread dead-lock, inconsistent resident state
  • 5. Recovery is important "[S]ystem failure can usually be considered to be the result of two program errors[...] the second, in the recovery routine[...]"
  • 6. Crashes and inconsistent data A crash results in data from an arbitrary program state.
  • 7. Avoid storage Caches are better than master copies.
  • 9. Atomic operations File rename
  • 10. Example: Counting def update_counter(): fp = file("counter.txt") s = fp.read() counter = int(s.strip()) counter += 1 # If there is a crash before this point, # no changes have been done. fp = file("counter.txt.tmp", 'w') print >>fp, counter fp.close() # If there is a crash before this point, # only a temp file has been modified # The following is an atomic operation os.rename("counter.txt.tmp", "counter.txt")
  • 11. Efficient caches, reliable masters Mark inconsistency of cache
  • 13. Availability If data is consistent, just restart!
  • 14. Improving availability Limit impact Fast detection Fast start-up
  • 15. Vertical splitting Different execution paths, different processes
  • 16. Horizontal splitting Different code bases, different processes
  • 17. Watchdog Monitor -> Flag -> Remediate
  • 18. Watchdog principles Keep it simple, keep it safe!
  • 19. Watchdog: Heartbeats ## In a Twisted process def beat(): file('beats/my-name', 'a').close() task.LoopingCall(beat).start(30)
  • 20. Watchdog: Get time-outs def getTimeout() timeout = dict() now = time.time() for heart in glob.glob('hearts/*'): beat = int(file(heart).read().strip()) timeout[heart] = now-beat return timeout
  • 21. Watchdog: Mark problems def markProblems(): timeout = getTimeout() for heart in glob.glob('beats/*'): mtime = os.path.getmtime(heart) problem = 'problems/'+heart if (mtime<timeout[heart] and not os.path.isfile(problem)): fp = file('problems/'+heart, 'w') fp.write('watchdog') fp.close()
  • 22. Watchdog: check solutions def checkSolutions(): now = time.time() problemTimeout = now-30 for problem in glob.glob('problems/*'): mtime = os.path.getmtime(problem) if mtime<problemTimeout: subprocess.call(['restart-system'])
  • 23. Watchdog: Loop ## Watchdog while True: markProblems() checkSolutions() time.sleep(1)
  • 24. Watchdog: accuracy of Custom checkers can manufacture problems
  • 25. Watchdog: reliability of Use cron for main loop
  • 26. Watchdog: reliability of Use software/hardware watchdogs
  • 29. Welcome to the back-up slides Extra! Extra!
  • 30. Example: Counting on Windows def update_counter(): fp = file("counter.txt") s = fp.read() counter = int(s.strip()) counter += 1 # If there is a crash before this point, # no changes have been done. fp = file("counter.txt.tmp", 'w') print >>fp, counter fp.close() # If there is a crash before this point, # only a temp file has been modified os.remove("counter.txt") # At this point, the state is inconsistent* # The following is an atomic operation
  • 32. Example: Counting on Windows (Recovery) def recover(): if not os.path.exists("counter.txt"): # The permanent file has been removed # Therefore, the temp file is valid os.rename("counter.txt.tmp", "counter.txt")
  • 33. Example: Counting with versions def update_counter(): files = [int(name.split('.')[-1]) for name in os.listdir('.') if name.startswith('counter.')] last = max(files) counter = int(file('counter.%s' % last ).read().strip()) counter += 1 # If there is a crash before this point, # no changes have been done. fp = file("tmp.counter", 'w') print >>fp, counter fp.close() # If there is a crash before this point, # only a temp file has been modified
  • 34. os.rename('tmp.counter', 'counter.%s' % (last+1)) os.remove('counter.%s' % last)
  • 35. Example: Counting with versions (cleanup) # This is not a recovery routine, but a cleanup # routine. # Even in its absence, the state is consistent def cleanup(): files = [int(name.split('.')[-1]) for name in os.listdir('.') if name.startswith('counter.')] files.sort() files.pop() for n in files: os.remove('counter.%d' % n) if os.path.exists('tmp.counter'): os.remove('tmp.counter')
  • 36. Correct ordering def activate_due(): scheduled = rs.smembers('scheduled') now = time.time() for el in scheduled: due = int(rs.get(el+':due')) if now<due: continue rs.sadd('activated', el) rs.delete(el+':due') rs.sremove('scheduled', el)
  • 37. Correct ordering (recovery) def recover(): inconsistent = rs.sinter('activated', 'scheduled') for el in inconsistent: rs.delete(el+':due') #* rs.sremove('scheduled', el)
  • 38. Example: Key/value stores 0.log: ['add', 'key-0', 'value-0'] ['add', 'key-1', 'value-1'] ['add', 'key-0', 'value-2'] ['remove', 'key-1'] . . . 1.log: . . . 2.log:
  • 39. . . .
  • 40. Example: Key/value stores (utility functions) ## Get the level of a file def getLevel(s) return int(s.split('.')[0]) ## Get all files of a given type def getType(tp): return [(getLevel(s), s) for s in files if s.endswith(tp)]
  • 41. Example: Key/value stores (classifying files) ## Get all relevant files def relevant(d): files = os.listdir(d): mlevel, master = max(getType('.master')) logs = getType('.log') logs.sort() return master+[log for llevel, log in logs if llevel>mlevel]
  • 42. Example: Key/value stores (reading) ## Read in a single file def update(result, fp): for line in fp: val = json.loads(line) if val[0] == 'add': result[val[1]] = val[2] else: del result[val[1]] ## Read in several files def read(files): result = dict() for fname in files: try: update(result, file(fname))
  • 43. except ValueError: pass return result
  • 44. Example: Key/value stores (writer class) class Writer(object): def __init__(self, level): self.level = level self.fp = None self._next() def _next(self): self.level += 1 if self.fp: self.fp.close() name ='%3d.log' % self.currentLevel self.fp = file(name, 'w') self.rows = 0 def write(self, value):
  • 46. Example: Key/value stores (storage class) ## The actual data store abstraction. class Store(object): def __init__(self): files = relevant(d) self.result = read(files) level = getLevel(files[-1]) self.writer = Writer(level) def get(self, key): return self.result[key] def add(self, key, value): self.writer.write(['add', key, value]) def remove(self, key): self.writer.write(['remove', key])
  • 47. Example: Key/value stores (compression code) ## This should be run periodically # from a different thread def compress(d): files = relevant(d)[:-1] if len(files)<2: return result = read(files) master = getLevel(files[-1])+1 fp = file('%3d.master.tmp' % master, 'w') for key, value in result.iteritems(): towrite = ['add', key, value]) print >>fp, json.dumps(towrite) fp.close()
  • 48. Vertical splitting: Example def forking_server(): s = socket.socket() s.bind(('', 8080)) s.listen(5) while True: client = s.accept() newpid = os.fork() if newpid: f = client.makefile() f.write("Sunday, May 22, 1983 " "18:45:59-PST") f.close() os._exit()
  • 49. Horizontal splitting: front-end ## Process one class SchedulerResource(resource.Resource): isLeaf = True def __init__(self, filepath): resource.Resource.__init__(self) self.filepath = filepath def render_PUT(self, request): uuid, = request.postpath content = request.content.read() child = self.filepath.child(uuid) child.setContent(content) fp = filepath.FilePath("things") r = SchedulerResource(fp) s = server.Site(r) reactor.listenTCP(8080, s)
  • 50. Horizontal splitting: scheduler ## Process two rs = redis.Redis(host='localhost', port=6379, db=9) while True: for fname in os.listdir("things"): when = int(file(fname).read().strip()) rs.set(uuid+':due', when) rs.sadd('scheduled', uuid) os.remove(fname) time.sleep(1)
  • 51. Horizontal splitting: runner ## Process three rs = redis.Redis(host='localhost', port=6379, db=9) recover() while True: activate_due() time.sleep(1)
  • 52. Horizontal splitting: message queues No direct dependencies
  • 53. Horizontal splitting: message queues: sender ## Process four rs = redis.Redis(host='localhost', port=6379, db=9) params = pika.ConnectionParameters('localhost') conn = pika.BlockingConnection(params) channel = conn.channel() channel.queue_declare(queue='active') while True: activated = rs.smembers('activated') finished = set(rs.smembers('finished')) for el in activated: if el in finished: continue
  • 54. channel.basic_publish( exchange='', routing_key='active', body=el) rs.add('finished', el)
  • 55. Horizontal splitting: message queues: receiver ## Process five # It is possible to get "dups" of bodies. # Application logic should deal with that params = pika.ConnectionParameters('localhost') conn = pika.BlockingConnection(params) channel = conn.channel() channel.queue_declare(queue='active') def callback(ch, method, properties, el): syslog.syslog('Activated %s' % el) channel.basic_consume(callback, queue='hello', no_ack=True) channel.start_consuming()
  • 56. Horizontal splitting: point-to-point Use HTTP (preferably, REST)