In Search of the Perfect
                    Global Interpreter Lock
                                                    David Beazley
                                               http://www.dabeaz.com
                                                       @dabeaz
                                                       October 15, 2011
                                                   Presented at RuPy 2011
                                                       Poznan, Poland

Copyright (C) 2010, David Beazley, http://www.dabeaz.com                    1
Introduction

                • As many programmers know, Python and Ruby
                        feature a Global Interpreter Lock (GIL)
                • More precise: CPython and MRI
                • It limits thread performance on multicore
                • Theoretically restricts code to a single CPU

An Experiment
                 • Consider a trivial CPU-bound function
                         def countdown(n):
                             while n > 0:
                                 n -= 1


                  • Run it once with a lot of work
                           COUNT = 100000000               # 100 million
                           countdown(COUNT)

                  • Now, divide the work across two threads
                          from threading import Thread
                          t1 = Thread(target=countdown,args=(COUNT//2,))
                          t2 = Thread(target=countdown,args=(COUNT//2,))
                          t1.start(); t2.start()
                          t1.join(); t2.join()
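
The experiment can be reproduced with a small timing harness. This is a minimal sketch written for Python 3 (the talk measured Python 2.7); the `timed`, `run_sequential`, and `run_threaded` helpers and the smaller `COUNT` are assumptions added for illustration, and absolute timings will differ by machine and interpreter version.

```python
import time
from threading import Thread

def countdown(n):
    while n > 0:
        n -= 1

def timed(func):
    """Hypothetical helper: return the wall-clock seconds func() takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start

def run_sequential(count):
    countdown(count)

def run_threaded(count):
    # Same total work, split across two threads
    t1 = Thread(target=countdown, args=(count // 2,))
    t2 = Thread(target=countdown, args=(count // 2,))
    t1.start(); t2.start()
    t1.join(); t2.join()

if __name__ == "__main__":
    COUNT = 5_000_000   # smaller than the slide's 100 million, to keep runs short
    print("Sequential: %.2fs" % timed(lambda: run_sequential(COUNT)))
    print("Threaded:   %.2fs" % timed(lambda: run_threaded(COUNT)))
```

Comparing the two printed numbers on different operating systems and core counts is the whole experiment; the code itself does not change.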



An Experiment
                  • Some Ruby
                         def countdown(n)
                             while n > 0
                                 n -= 1
                             end
                         end

                  • Sequential
                           COUNT = 100000000               # 100 million
                           countdown(COUNT)


                  • Subdivided across threads
                          t1 = Thread.new { countdown(COUNT/2) }
                          t2 = Thread.new { countdown(COUNT/2) }
                          t1.join
                          t2.join


Expectations

                 • Sequential and threaded versions perform the
                        same amount of work (same # calculations)
                 • There is the GIL... so no parallelism
                 • Performance should be about the same


Results
                • Ruby 1.9 on OS-X (4 cores)
                         Sequential                          : 2.46s
                         Threaded (2 threads)                : 2.55s (~ same)

                • Python 2.7
                         Sequential                          : 6.12s
                         Threaded (2 threads)                : 9.28s (1.5x slower!)

               • Question: Why does it get slower in Python?

Results
                • Ruby 1.9 on Windows Server 2008 (2 cores)
                         Sequential                          : 3.32s
                         Threaded (2 threads)                : 3.45s (~ same)

                 • Python 2.7
                         Sequential                          : 6.9s
                         Threaded (2 threads)                : 63.0s (9.1x slower!)

                 • Why does it get that much slower on Windows?

Experiment: Messaging

        • A request/reply server for size-prefixed messages
                    [Diagram: Client and Server exchanging messages]



          • Each message: a size header + payload
          • Similar: ZeroMQ
An Experiment: Messaging
          • A simple test - message echo (pseudocode)
        def client(nummsg,msg):
            while nummsg > 0:
               send(msg)
               resp = recv()
               sleep(0.001)
               nummsg -= 1

        def server():
            while True:
               msg = recv()
               send(msg)


            • To be less evil, it's throttled (<1000 msg/sec)
            • Not a messaging stress test
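
The echo pseudocode can be made concrete as follows. This is a sketch under stated assumptions, not the code used in the talk: the 4-byte big-endian size header, the helper names (`send_msg`, `recv_exactly`, `recv_msg`, `run_echo_test`), and running both ends in one process over `socket.socketpair()` are all choices made here to keep the example self-contained.

```python
import socket
import struct
import time
from threading import Thread

def send_msg(sock, payload):
    # Size header (assumed: 4-byte big-endian length) followed by the payload
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exactly(sock, nbytes):
    # recv() may return fewer bytes than requested, so loop until done
    chunks = []
    while nbytes > 0:
        chunk = sock.recv(nbytes)
        if not chunk:
            raise EOFError("connection closed")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    (size,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, size)

def server(sock):
    # Echo every message back until the client disconnects
    try:
        while True:
            send_msg(sock, recv_msg(sock))
    except EOFError:
        pass

def client(sock, nummsg, msg):
    replies = 0
    while nummsg > 0:
        send_msg(sock, msg)
        resp = recv_msg(sock)
        replies += (resp == msg)
        time.sleep(0.001)      # throttle, as on the slide
        nummsg -= 1
    return replies

def run_echo_test(nummsg=1000, size=8192):
    c_sock, s_sock = socket.socketpair()
    t = Thread(target=server, args=(s_sock,))
    t.start()
    start = time.perf_counter()
    replies = client(c_sock, nummsg, b"x" * size)
    elapsed = time.perf_counter() - start
    c_sock.close()             # server sees EOF and exits its loop
    t.join()
    s_sock.close()
    return replies, elapsed
```

Calling `run_echo_test()` returns the number of correct replies and the elapsed time for the round trips; the loaded scenario would add one CPU-bound thread next to the server.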
An Experiment: Messaging
          • A test: send/receive 1000 8K messages
          • Scenario 1: Unloaded server
                    [Diagram: Client <-> Server]

          • Scenario 2: Server competing with one CPU-bound thread
                    [Diagram: Client <-> Server, with a CPU-Thread alongside the Server]


Results
               • Messaging with no threads (OS-X, 4 cores)
                         C                                   : 1.26s
                         Python 2.7                          : 1.29s
                         Ruby 1.9                            : 1.29s

               •       Messaging with one CPU-bound thread*
                        C                                    : 1.16s (~8% faster!?)
                        Python 2.7                           : 12.3s (10x slower)
                        Ruby 1.9                             : 42.0s (33x slower)

               • Hmmm. Curious.

               * On Ruby, the CPU-bound thread was also given lower priority
Results
               • Messaging with no threads (Linux, 8 CPUs)
                         C                                    : 1.13s
                         Python 2.7                           : 1.18s
                         Ruby 1.9                             : 1.18s

               • Messaging with one CPU-bound thread
                        C                                     : 1.11s (same)
                        Python 2.7                            : 1.60s (1.4x slower) - better
                        Ruby 1.9                              : 5839.4s (~5000x slower) - worse!

               • 5000x slower? Really? Why?
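
The slowdown factors quoted for Linux follow directly from the timings. A small sketch of the arithmetic (the `slowdown` helper and the `linux` table are introduced here for illustration; the numbers are copied from the slide):

```python
def slowdown(baseline_s, loaded_s):
    """How many times slower the loaded run is than the baseline."""
    return loaded_s / baseline_s

# (no threads, one CPU-bound thread) timings in seconds, Linux, 8 CPUs
linux = {
    "C":          (1.13, 1.11),
    "Python 2.7": (1.18, 1.60),
    "Ruby 1.9":   (1.18, 5839.4),
}

for name, (base, loaded) in linux.items():
    print("%-10s %8.1fx" % (name, slowdown(base, loaded)))
```

For Ruby 1.9 this gives roughly 4949x, which the slide rounds to ~5000x.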


The Mystery Deepens
               • Disable all but one CPU core
               • CPU-bound threads (OS-X)
                        Python 2.7 (4 cores+hyperthreading) : 9.28s
                        Python 2.7 (1 core)                 : 7.9s (faster!)

               • Messaging with one CPU-bound thread
                        Ruby 1.9 (4 cores+hyperthreading)   : 42.0s
                        Ruby 1.9 (1 core)                   : 10.5s (much faster!)

               • ?!?!?!?!?!?
Better is Worse
               • Change software versions
               • Let's upgrade to Python 3 (Linux)
                       Python 2.7 (Messaging)              : 12.3s
                       Python 3.2 (Messaging)              : 20.1s   (1.6x slower)

               • Let's downgrade to Ruby 1.8 (Linux)
                       Ruby 1.9 (Messaging)                : 42.0s
                       Ruby 1.8.7 (Messaging)              : 10.0s   (4x faster)

               • So much for progress (sigh)
What's Happening?

                • The GIL does far more than limit cores
                • It can make performance much worse
                • Better performance by turning off cores?
                • 5000x performance hit on Linux?
                • Why?

Why You Might Care
           • Must you abandon Python/Ruby for concurrency?
           • Having threads restricted to one CPU core might
                  be okay if it were sane
           • Analogy: A multitasking operating system
                  (e.g., Linux) runs fine on a single CPU
           • Plus, threads get used a lot behind the scenes
                  (even in thread alternatives, e.g., async)


Why I Care


             • It's an interesting little systems problem
             • How do you make a better GIL?
             • It's fun.


Some Background
                   • I have been discussing some of these issues
                           in the Python community since 2009

                                         http://www.dabeaz.com/GIL

                    • I'm less familiar with Ruby, but I've looked at
                           its GIL implementation and experimented
                    • Very interested in commonalities/differences

A Tale of Two GILs




Thread Implementation

            Python
            • System threads (e.g., pthreads)
            • Managed by OS
            • Concurrent execution of the Python interpreter (written in C)

            Ruby
            • System threads (e.g., pthreads)
            • Managed by OS
            • Concurrent execution of the Ruby VM (written in C)

Alas, the GIL

                • Parallel execution is forbidden
                • There is a "global interpreter lock"
                • The GIL ensures that only one thread runs in
                        the interpreter at once
                • Simplifies many low-level details (memory
                        management, callouts to C extensions, etc.)



GIL Implementation
           Python: condition variable

           int gil_locked = 0;
           mutex_t gil_mutex;
           cond_t gil_cond;

           void gil_acquire() {
               mutex_lock(gil_mutex);
               while (gil_locked)
                  cond_wait(gil_cond, gil_mutex);
               gil_locked = 1;
               mutex_unlock(gil_mutex);
           }
           void gil_release() {
               mutex_lock(gil_mutex);
               gil_locked = 0;
               cond_notify(gil_cond);
               mutex_unlock(gil_mutex);
           }

           Ruby: simple mutex lock

           mutex_t gil;

           void gil_acquire() {
               mutex_lock(gil);
           }
           void gil_release() {
               mutex_unlock(gil);
           }
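
The condition-variable variant can be tried out at the Python level. The sketch below is an illustrative rendering of that pseudocode using `threading.Condition`; the `CondVarLock` class and the `demo` function are hypothetical names, and this is not the interpreter's actual implementation.

```python
import threading

class CondVarLock:
    """Hypothetical Python rendering of the condition-variable pseudocode."""
    def __init__(self):
        self._locked = False
        self._cond = threading.Condition(threading.Lock())

    def acquire(self):
        with self._cond:                 # mutex_lock(gil_mutex)
            while self._locked:
                self._cond.wait()        # releases the mutex while waiting
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()          # wake one waiter, if any

def demo(nthreads=4, iterations=1000):
    gil = CondVarLock()
    counter = [0]

    def worker():
        for _ in range(iterations):
            gil.acquire()
            counter[0] += 1              # protected by the lock
            gil.release()

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Note that nothing in `release()` decides which waiter goes next; that choice is left to the underlying threading library and OS.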
Thread Execution Model
                • The GIL results in cooperative multitasking
                  [Diagram: Threads 1, 2, and 3 take turns running; each releases the GIL when it blocks, and another thread acquires it]

                • When a thread is running, it holds the GIL
                • GIL released on blocking (e.g., I/O operations)
Threads for I/O

                • For I/O it works great
                • GIL is never held very long
                • Most threads just sit around sleeping
                • Life is good


Threads for Computation
              • You may actually want to compute something!
                 • Fibonacci numbers
                 • Image/audio processing
                 • Parsing
              • The CPU will be busy
              • And it won't give up the GIL on its own
CPU-Bound Switching

      Python
      • Releases and reacquires the GIL every 100 "ticks"
      • 1 Tick ~= 1 interpreter instruction

      Ruby
      • Background thread generates a timer interrupt every 10ms
      • GIL released and reacquired by current thread on interrupt



Python Thread Switching
                   [Diagram: a CPU-bound thread runs 100 ticks, releases and reacquires the GIL, runs another 100 ticks, and so on]




                 • Every 100 VM instructions, GIL is dropped,
                        allowing other threads to run if they want
                 • Not time based--switching interval depends on
                        kind of instructions executed
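
The tick-based scheme can be modeled with a toy loop. This is a sketch, assuming an ordinary `threading.Lock` as a stand-in for the GIL; `run_instructions` and `CHECK_INTERVAL` are names invented here, and the loop body does no real work.

```python
import threading

CHECK_INTERVAL = 100    # ticks between GIL release/reacquire, per the slide

def run_instructions(gil, num_instructions, check_interval=CHECK_INTERVAL):
    """Toy interpreter loop: count ticks and drop the lock every N of them."""
    releases = 0
    ticks = check_interval
    gil.acquire()
    try:
        for _ in range(num_instructions):
            # ... a real VM would execute one instruction here ...
            ticks -= 1
            if ticks == 0:
                gil.release()          # other threads may run now
                gil.acquire()
                ticks = check_interval
                releases += 1
    finally:
        gil.release()
    return releases

if __name__ == "__main__":
    print(run_instructions(threading.Lock(), 1000))
```

Because the counter tracks instructions rather than time, a run of slow instructions holds the lock longer than a run of fast ones.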


Ruby Thread Switching
                 [Diagram: a timer thread fires every 10ms; on each timer event the CPU-bound thread releases and reacquires the GIL, then continues to run]



                 • Loosely mimics the time-slice of the OS
                 • Every 10ms, GIL is released/acquired
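
The timer-driven scheme can be imitated in the same toy style. In this sketch a background thread sets a flag every 10ms and the busy thread drops a stand-in lock when it sees the flag; `timer_thread`, `cpu_bound`, and `demo` are hypothetical names, and the code only imitates the idea rather than reproducing the Ruby VM.

```python
import threading
import time

def timer_thread(flag, stop, interval=0.010):
    """Set `flag` every `interval` seconds until `stop` is set."""
    while not stop.wait(interval):
        flag.set()

def cpu_bound(gil, flag, duration):
    """Spin for `duration` seconds, dropping the lock whenever the timer fires."""
    releases = 0
    deadline = time.monotonic() + duration
    gil.acquire()
    try:
        while time.monotonic() < deadline:
            if flag.is_set():
                flag.clear()
                gil.release()      # other threads may run now
                gil.acquire()
                releases += 1
    finally:
        gil.release()
    return releases

def demo(duration=0.2):
    gil = threading.Lock()
    flag, stop = threading.Event(), threading.Event()
    t = threading.Thread(target=timer_thread, args=(flag, stop))
    t.start()
    releases = cpu_bound(gil, flag, duration)
    stop.set()
    t.join()
    return releases
```

Here the release interval is tied to wall-clock time, not to the number of instructions executed.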

A Common Theme
              • Both Python and Ruby have C code like this:
                           void execute() {
                             while (inst = next_instruction()) {
                                 // Run the VM instruction
                                 ...
                                 if (must_release_gil) {
                                     GIL_release();
                                     /* Other threads may run now */
                                     GIL_acquire();
                                 }
                             }
                           }

              • Exact details vary, but concept is the same
              • Each thread has periodic release/acquire in the
                      VM to allow other threads to run
Question
                    • What can go wrong with this bit of code?
                                    if (must_release_gil) {
                                        GIL_release();
                                        /* Other threads may run now */
                                        GIL_acquire();
                                    }


                    • Short answer: Everything!


Pathology




Thread Switching
                • Suppose you have two threads
                  [Diagram: Thread 1 Running; Thread 2 READY]




                • Thread 1 : Running
                • Thread 2 : Ready (Waiting for GIL)
Thread Switching
                • Easy case : Thread 1 performs I/O (read/write)
                  [Diagram: Thread 1 runs, releases the GIL on I/O, and becomes BLOCKED; pthreads/OS schedules Thread 2, which moves from READY to Running after it acquires the GIL]


                • Thread 1 : Releases GIL and blocks for I/O
                • Thread 2 : Gets scheduled, starts running
Thread Switching
                • Tricky case : Thread 1 runs until preempted
                  [Diagram: Thread 1 is preempted and releases the GIL; Thread 2 is READY; pthreads/OS must choose. Which thread runs?]




Thread Switching
                • You might expect that Thread 2 will run
                  [Diagram: Thread 1 is preempted, releases the GIL, and becomes READY; pthreads/OS schedules Thread 2, which acquires the GIL and runs]




                • But you assume the GIL plays nice...
Thread Switching
               • What might actually happen on multicore
                 [Diagram: Thread 1 is preempted, releases the GIL, then immediately reacquires it and keeps running; pthreads/OS schedules Thread 2, but its acquire fails (GIL locked) and it stays READY]



                 • Both threads attempt to run simultaneously
                 • ... but only one will succeed (depends on timing)
Fallacy
                    • This code doesn't actually switch threads
                                    if (must_release_gil) {
                                        GIL_release();
                                        /* Other threads may run now */
                                        GIL_acquire();
                                    }


                    • It might switch threads, but it depends
                       • What operating system
                       • # cores
                       • Lock scheduling policy (if any)
Fallacy
                    • This doesn't force switching (sleeping)
                                    if (must_release_gil) {
                                        GIL_release();
                                        sleep(0);
                                        /* Other threads may run now */
                                        GIL_acquire();
                                    }

                    • It might switch threads, but it depends
                       • What operating system
                       • # cores
                       • Lock scheduling policy (if any)
Fallacy
                    • Neither does this (calling the scheduler)
                                    if (must_release_gil) {
                                        GIL_release();
                                        sched_yield();
                                        /* Other threads may run now */
                                        GIL_acquire();
                                    }

                    • It might switch threads, but it depends
                       • What operating system
                       • # cores
                       • Lock scheduling policy (if any)
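
The fallacy can be observed from user code by counting how often a release/acquire pair really hands a lock to another thread. This is a rough illustration only: `contend` is a hypothetical helper, it uses a plain `threading.Lock` rather than the interpreter's GIL, and the counts it reports depend on the operating system, the number of cores, and the Python version.

```python
import threading

def contend(iterations=10000, nthreads=2):
    lock = threading.Lock()
    state = {"last": None, "switches": 0, "acquires": 0}

    def worker(me):
        lock.acquire()
        for _ in range(iterations):
            state["acquires"] += 1
            if state["last"] is not None and state["last"] != me:
                state["switches"] += 1   # the lock really changed hands
            state["last"] = me
            lock.release()
            # Other threads may run now ... or may not
            lock.acquire()
        lock.release()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["acquires"], state["switches"]

if __name__ == "__main__":
    acquires, switches = contend()
    print("acquires=%d switches=%d" % (acquires, switches))
```

If every release forced a switch, `switches` would be close to `acquires`; how far it falls short on a given system is the point of the slide.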
A Conflict
                 • There are conflicting goals
                    • Python/Ruby - wants to run on a single
                                    CPU, but doesn't want to do thread
                                    scheduling (i.e., let the OS do it).
                             • OS - "Oooh. Multiple cores."
                                    Schedules as many runnable tasks as
                                    possible at any instant
                 • Result: Threads fight with each other
Multicore GIL Battle
            • Python 2.7 on OS-X (4 cores)
                         Sequential                                       : 6.12s
                         Threaded (2 threads)                             : 9.28s (1.5x slower!)

              [Diagram: Thread 1 runs 100 ticks, then releases and reacquires the GIL at each preempt; pthreads/OS schedules Thread 2 each time, but its acquire fails and it stays READY. Eventually Thread 2 runs and Thread 1 becomes READY]



            • Millions of failed GIL acquisitions
Multicore GIL Battle
            • You can see it! (2 CPU-bound threads)
              [Screenshot: CPU usage display] Why >100%?




            • Comment: In Python, it's very rapid
            • GIL is released every few microseconds!

I/O Handling
                • If there is a CPU-bound thread, I/O bound
                       threads have a hard time getting the GIL
             Thread 1 (CPU 1)          Thread 2 (CPU 2)
             run                       sleep
             preempt                   Network Packet
             run                       Acquire GIL (fails)
             preempt
             run                       Acquire GIL (fails)
             preempt
             run                       Acquire GIL (fails)
             preempt                   Acquire GIL (success)
                                       run

             (The failed acquire might repeat 100s-1000s of times)


Messaging Pathology
               • Messaging on Linux (8 Cores)
                                 Ruby 1.9 (no threads)   : 1.18s
                                 Ruby 1.9 (1 CPU thread) : 5839.4s

               • Locks in Linux have no fairness
               • Consequence: Really hard to steal the GIL
               • And Ruby only retries every 10ms

Let's Talk Fairness
             • Fair-locking means that locks have some notion
                    of priorities, arrival order, queuing, etc.
               Before release:   running: t0     waiting on Lock: t1 t2 t3 t4 t5
               After release:    running: t1     waiting on Lock: t2 t3 t4 t5 t0


               • Releasing means you go to end of line
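
The queue picture above can be sketched as a ticket lock. This is a minimal illustration of arrival-order fairness, not how any particular OS or interpreter implements its locks; `FairLock`, `tickets_issued`, and `demo_fair_order` are hypothetical names chosen for this example.

```python
import threading
import time

class FairLock:
    """Ticket lock: threads are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

    def tickets_issued(self):
        with self._cond:
            return self._next_ticket

def demo_fair_order(n_waiters=5):
    lock = FairLock()
    order = []

    def worker(name):
        lock.acquire()
        order.append(name)
        lock.release()

    lock.acquire()                            # t0 is running (ticket 0)
    threads = []
    for i in range(1, n_waiters + 1):
        t = threading.Thread(target=worker, args=(f"t{i}",))
        t.start()
        while lock.tickets_issued() < i + 1:  # wait until t<i> has queued up
            time.sleep(0.001)
        threads.append(t)

    lock.release()                            # t0 releases ...
    lock.acquire()                            # ... and rejoins at the end of the line
    order.append("t0")
    lock.release()
    for t in threads:
        t.join()
    return order

if __name__ == "__main__":
    print(demo_fair_order())
```

The demo forces t1 through t5 to queue in that order, so the lock is granted to t1, t2, t3, t4, t5 and only then to t0 again, matching the diagram.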
Effect of Fair-Locking
             • Ruby 1.9 (multiple cores)
                        Messages + 1 CPU Thread (OS-X)     : 42.0s
                        Messages + 1 CPU Thread (Linux)    : 5839.4s

             • Question: Which one uses fair locking?




Effect of Fair-Locking
             • Ruby 1.9 (multiple cores)
                        Messages + 1 CPU Thread (OS-X)     : 42.0s (Fair)
                        Messages + 1 CPU Thread (Linux)    : 5839.4s

             • Benefit : I/O threads get their turn (yay!)
             • Python 2.7 (multiple cores)
                        2 CPU-Bound Threads (OS-X)          : 9.28s
                        2 CPU-Bound Threads (Windows)       : 63.0s

             • Question: Which one uses fair-locking?
Effect of Fair-Locking
             • Ruby 1.9 (multiple cores)
                        Messages + 1 CPU Thread (OS-X)     : 42.0s (Fair)
                        Messages + 1 CPU Thread (Linux)    : 5839.4s

             • Benefit : I/O threads get their turn (yay!)
             • Python 2.7 (multiple cores)
                        2 CPU-Bound Threads (OS-X)          : 9.28s
                        2 CPU-Bound Threads (Windows)       : 63.0s (Fair)

             • Problem: Too much context switching
Fair-Locking - Bah!
                 • In reality, you don't want fairness
                 • Messaging Revisited (OS X, 4 Cores)
                               Ruby 1.9 (No Threads)           : 1.29s
                               Ruby 1.9 (1 CPU-Bound thread)   : 42.0s (33x slower)

                 • Why is it still 33x slower?
                 • Answer: Fair locking! (and convoying)

Messaging Revisited
          • Go back to the messaging server
                                               def server():
                                                   while True:
                                                       msg = recv()
                                                       send(msg)




Messaging Revisited
          • The actual implementation (size-prefixed messages)
                                           def server():
                                               while True:
                                                   size = recv(4)
                                                   msg = recv(size)
                                                   send(size)
                                                   send(msg)
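
For readers who want to run something close to this, here is a hedged, self-contained sketch of a size-prefixed echo server. It is not the code used in the talk: the 4-byte big-endian length header (`"!I"`), the helper `recv_exact`, and the use of `socket.socketpair` for a local test are all assumptions made for this example.

```python
import socket
import struct
import threading

def recv_exact(sock, n):
    """Read exactly n bytes; return b"" if the peer closes first."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            return b""
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def server(sock):
    """Echo size-prefixed messages until the peer closes the connection."""
    while True:
        header = recv_exact(sock, 4)       # size = recv(4)
        if not header:
            break
        (size,) = struct.unpack("!I", header)
        msg = recv_exact(sock, size)       # msg = recv(size)
        sock.sendall(header)               # send(size)
        sock.sendall(msg)                  # send(msg)

def roundtrip(payload):
    """Send one message through a local socket pair and return the echo."""
    client, served = socket.socketpair()
    t = threading.Thread(target=server, args=(served,), daemon=True)
    t.start()
    try:
        client.sendall(struct.pack("!I", len(payload)))
        client.sendall(payload)
        (size,) = struct.unpack("!I", recv_exact(client, 4))
        return recv_exact(client, size)
    finally:
        client.close()
        t.join(timeout=1)
        served.close()

if __name__ == "__main__":
    print(roundtrip(b"hello"))
```

Each message costs the server four potentially blocking socket calls, which is the detail the following slides build on.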




Performance Explained
          • What actually happens under the covers
                                           def server():
                                               while True:
                                                   size = recv(4)        # GIL release
                                                   msg = recv(size)      # GIL release
                                                   send(size)            # GIL release
                                                   send(msg)             # GIL release



          • Why? Each operation might block
          • Catch: Passes control back to CPU-bound thread
Performance Illustrated
                                  |  10ms  |  10ms  |  10ms  |  10ms  |  10ms  |
           Timer thread
           CPU-bound thread  run  |  run   |  run   |  run   |  run   |  run   |
           I/O thread             |  recv  |  recv  |  send  |  send  |  done  |
                             ^
                             data arrives



           • Each message has 40ms response cycle
           • 1000 messages x 40ms = 40s (42.0s measured)
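
The arithmetic behind that estimate can be written out as a tiny model. It assumes, as a simplification, that every one of the four blocking calls per message costs one full 10ms retry interval; the function name `estimated_total` is made up for this sketch.

```python
def estimated_total(messages, blocking_calls, interval):
    """Total seconds if every blocking call costs one full retry interval."""
    return messages * blocking_calls * interval

# 1000 messages, 4 blocking calls each (recv, recv, send, send), 10ms per retry
ruby_estimate = estimated_total(messages=1000, blocking_calls=4, interval=0.010)
print(f"{ruby_estimate:.1f}s")   # prints "40.0s"
```

The measured 42.0s is close to this 40s estimate, which suggests that the retry interval, not the message handling itself, dominates the run time.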
Despair




A Solution?
                                                       Don't use threads!


                • Yes, yes, everyone hates threads
                • However, that's only because they're useful!
                • Threads are used for all sorts of things
                • Even if they're hidden behind the scenes

A Better Solution
                                                    Make the GIL better



                  • It's probably not going away (very difficult)
                  • However, does it have to thrash wildly?
                  • Question: Can you do anything?


GIL Efforts in Python 3


                 • Python 3.2 has a new GIL implementation
                 • It's imperfect--in fact, it has a lot of problems
                 • However, people are experimenting with it


Python 3 GIL
                • GIL acquisition now based on timeouts
              Thread 1    running ................... drop_request ... release
                                      |<----- 5ms ----->|
              Thread 2    IOWAIT ...  READY                                running
                                      wait(gil, TIMEOUT)  wait(gil, TIMEOUT)
                                      ^
                                      data arrives


                 • Involves waiting on a condition variable
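
The timeout mechanism can be sketched with an ordinary condition variable. This is a toy model written for illustration, not CPython's actual GIL code: `ToyGIL`, its `drop_request` flag, and `demo` are hypothetical names, and the real implementation has additional machinery that is left out here.

```python
import threading

class ToyGIL:
    """Toy model of a timeout-based GIL (not CPython's actual implementation)."""

    def __init__(self, timeout=0.005):
        self._cond = threading.Condition()
        self._locked = False
        self.drop_request = False   # set by a waiter whose timeout expired
        self.timeout = timeout

    def acquire(self):
        with self._cond:
            while self._locked:
                # wait(gil, TIMEOUT): returns False if the timeout expired
                signaled = self._cond.wait(self.timeout)
                if not signaled and self._locked:
                    self.drop_request = True   # ask the holder to let go
            self._locked = True
            self.drop_request = False

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()

def demo():
    gil = ToyGIL()
    events = []

    def io_thread():
        gil.acquire()
        events.append("thread 2 running")
        gil.release()

    gil.acquire()                    # thread 1 (this thread) is running
    t = threading.Thread(target=io_thread)
    t.start()
    while not gil.drop_request:      # CPU-bound work that polls the flag
        pass
    events.append("thread 1 saw drop_request")
    gil.release()
    t.join()
    return events

if __name__ == "__main__":
    print(demo())
```

The waiting thread only asks for the lock after a full timeout has passed, so a thread that becomes ready right after a switch still waits about 5ms before the holder is told to release.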
Problem: Convoying
               • CPU-bound threads significantly degrade I/O
              Thread 1    running        release    running           running
                                   |<- 5ms ->|         |<- 5ms ->|         |<- 5ms ->|
              Thread 2             READY       run     READY       run     READY
                                   ^                   ^                   ^
                                   data arrives        data arrives        data arrives



                 • This is the same problem as in Ruby
                 • Just a shorter time delay (5ms)
Problem: Convoying

            • You can directly observe the delays (messaging)
                         Python/Ruby (No threads)          : 1.29s   (no delays)
                         Python 3.2 (1 Thread)             : 20.1s   (5ms delays)
                         Ruby 1.9 (1 Thread)               : 42.0s   (10ms delays)

           • Still not great, but problem is understood
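
Working backwards from these measurements gives a rough count of how many delays each message absorbs. The sketch assumes the same 1000-message run as the earlier 40ms-per-message estimate and uses the no-thread time as the baseline; `delays_per_message` is a name invented for this example.

```python
MESSAGES = 1000   # assumed: same message count as the earlier 1000 x 40ms estimate

def delays_per_message(measured, baseline, interval, messages=MESSAGES):
    """How many retry intervals each message appears to have waited."""
    return (measured - baseline) / messages / interval

py32 = delays_per_message(measured=20.1, baseline=1.29, interval=0.005)
ruby19 = delays_per_message(measured=42.0, baseline=1.29, interval=0.010)
print(round(py32, 2), round(ruby19, 2))
```

Both results land near four delays per message, one per blocking `recv` or `send`, which is consistent with the convoying explanation.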


Promise




Priorities
               • Best promise : Priority scheduling
               • Earlier versions of Ruby had it
               • It works (OS-X, 4 cores)
                            Ruby 1.9 (1 Thread)                         : 42.0s
                            Ruby 1.8.7 (1 Thread)                       : 40.2s
                            Ruby 1.8.7 (1 Thread, lower priority)       : 10.0s

                • Comment: Ruby-1.9 allows thread priorities to be
                        set in pthreads, but it doesn't seem to have much
                        (if any) effect
Priorities
             • Experimental Python-3.2 with priority scheduler
             • Also features immediate preemption
             • Messages (OS X, 4 Cores)
                         Python 3.2 (No threads)                        : 1.29s
                         Python 3.2 (1 Thread)                          : 20.2s
                         Python 3.2+priorities (1 Thread)               : 1.21s (faster?)

             • That's a lot more promising!
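
To make the idea concrete, here is a minimal sketch of a lock that hands itself to the highest-priority waiter. It is not the experimental Python 3.2 scheduler mentioned above, and it leaves out immediate preemption; `PriorityLock`, `waiting`, and `demo_priority_order` are hypothetical names for this example.

```python
import heapq
import itertools
import threading
import time

class PriorityLock:
    """Lock that is always granted to the highest-priority waiter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._waiters = []               # heap of (-priority, arrival number)
        self._seq = itertools.count()    # breaks ties in arrival order

    def acquire(self, priority=0):
        with self._cond:
            me = (-priority, next(self._seq))
            heapq.heappush(self._waiters, me)
            while self._held or self._waiters[0] != me:
                self._cond.wait()
            heapq.heappop(self._waiters)
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()

    def waiting(self):
        with self._cond:
            return len(self._waiters)

def demo_priority_order():
    lock = PriorityLock()
    order = []

    def worker(name, priority):
        lock.acquire(priority)
        order.append(name)
        lock.release()

    lock.acquire()                       # hold the lock while the others queue up
    threads = []
    arrivals = [("cpu-1", 0), ("cpu-2", 0), ("io", 1)]
    for queued, (name, priority) in enumerate(arrivals, start=1):
        t = threading.Thread(target=worker, args=(name, priority))
        t.start()
        while lock.waiting() < queued:   # wait until this thread has queued
            time.sleep(0.001)
        threads.append(t)
    lock.release()
    for t in threads:
        t.join()
    return order

if __name__ == "__main__":
    print(demo_priority_order())
```

The I/O thread arrives last but runs first because of its higher priority. A steady stream of high-priority waiters would keep the low-priority ones from ever running, which is the starvation risk that comes with this design.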

New Problems
               • Priorities bring new challenges
                  • Starvation
                  • Priority inversion
                  • Implementation complexity
               • Do you have to write a full OS scheduler?
               • Hopefully not, but it's an open question
Final Words

              • Implementing a GIL is a lot trickier than it looks
              • Even work with priorities has problems
              • Good example of how multicore is diabolical


Thanks for Listening!

              • I hope you learned at least one new thing
              • I'm always interested in feedback
              • Follow me on Twitter (@dabeaz)


 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

In Search of the Perfect Global Interpreter Lock

  • 1. In Search of the Perfect Global Interpreter Lock David Beazley http://www.dabeaz.com @dabeaz October 15, 2011 Presented at RuPy 2011 Poznan, Poland Copyright (C) 2010, David Beazley, http://www.dabeaz.com 1
  • 2. Introduction
      • As many programmers know, Python and Ruby feature a Global Interpreter Lock (GIL)
      • More precisely: CPython and MRI
      • It limits thread performance on multicore
      • Theoretically restricts code to a single CPU
  • 3. An Experiment
      • Consider a trivial CPU-bound function
            def countdown(n):
                while n > 0:
                    n -= 1
      • Run it once with a lot of work
            COUNT = 100000000               # 100 million
            countdown(COUNT)
      • Now, divide the work across two threads
            from threading import Thread
            t1 = Thread(target=countdown,args=(COUNT//2,))
            t2 = Thread(target=countdown,args=(COUNT//2,))
            t1.start(); t2.start()
            t1.join(); t2.join()
  • 4. An Experiment
      • Some Ruby
            def countdown(n)
                while n > 0
                    n -= 1
                end
            end
      • Sequential
            COUNT = 100000000               # 100 million
            countdown(COUNT)
      • Subdivided across threads
            t1 = Thread.new { countdown(COUNT/2) }
            t2 = Thread.new { countdown(COUNT/2) }
            t1.join
            t2.join
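A runnable version of the Python experiment might look like the sketch below. The helper names `time_sequential` and `time_threaded` are illustrative and not from the talk, and the count is reduced so the script finishes quickly. Absolute timings depend on the machine, the OS, and the interpreter version.

```python
import time
from threading import Thread


def countdown(n):
    # Trivial CPU-bound loop from the slides.
    while n > 0:
        n -= 1


def time_sequential(count):
    # Run all of the work in the calling thread and return elapsed seconds.
    start = time.perf_counter()
    countdown(count)
    return time.perf_counter() - start


def time_threaded(count, nthreads=2):
    # Split the same amount of work evenly across nthreads threads.
    threads = [Thread(target=countdown, args=(count // nthreads,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    COUNT = 10_000_000  # assumption: smaller than the 100 million used in the talk
    print("Sequential           : %.2fs" % time_sequential(COUNT))
    print("Threaded (2 threads) : %.2fs" % time_threaded(COUNT))
```

Both functions perform the same number of loop iterations, so any difference between the two timings comes from threading overhead rather than from the work itself.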
  • 5. Expectations
      • Sequential and threaded versions perform the same amount of work (same # calculations)
      • There is the GIL... so no parallelism
      • Performance should be about the same
  • 6. Results
      • Ruby 1.9 on OS-X (4 cores)
            Sequential           : 2.46s
            Threaded (2 threads) : 2.55s (~ same)
  • 7. Results
      • Ruby 1.9 on OS-X (4 cores)
            Sequential           : 2.46s
            Threaded (2 threads) : 2.55s (~ same)
      • Python 2.7
            Sequential           : 6.12s
            Threaded (2 threads) : 9.28s (1.5x slower!)
  • 8. Results
      • Ruby 1.9 on OS-X (4 cores)
            Sequential           : 2.46s
            Threaded (2 threads) : 2.55s (~ same)
      • Python 2.7
            Sequential           : 6.12s
            Threaded (2 threads) : 9.28s (1.5x slower!)
      • Question: Why does it get slower in Python?
  • 9. Results
      • Ruby 1.9 on Windows Server 2008 (2 cores)
            Sequential           : 3.32s
            Threaded (2 threads) : 3.45s (~ same)
  • 10. Results
      • Ruby 1.9 on Windows Server 2008 (2 cores)
            Sequential           : 3.32s
            Threaded (2 threads) : 3.45s (~ same)
      • Python 2.7
            Sequential           : 6.9s
            Threaded (2 threads) : 63.0s (9.1x slower!)
  • 11. Results
      • Ruby 1.9 on Windows Server 2008 (2 cores)
            Sequential           : 3.32s
            Threaded (2 threads) : 3.45s (~ same)
      • Python 2.7
            Sequential           : 6.9s
            Threaded (2 threads) : 63.0s (9.1x slower!)
      • Why does it get that much slower on Windows?
  • 12. Experiment: Messaging
      • A request/reply server for size-prefixed messages
            [Diagram: Client <-> Server]
      • Each message: a size header + payload
      • Similar: ZeroMQ
  • 13. An Experiment: Messaging
      • A simple test - message echo (pseudocode)
            def client(nummsg,msg):
                while nummsg > 0:
                    send(msg)
                    resp = recv()
                    sleep(0.001)
                    nummsg -= 1

            def server():
                while True:
                    msg = recv()
                    send(msg)
  • 14. An Experiment: Messaging
      • A simple test - message echo (pseudocode)
            def client(nummsg,msg):
                while nummsg > 0:
                    send(msg)
                    resp = recv()
                    sleep(0.001)
                    nummsg -= 1

            def server():
                while True:
                    msg = recv()
                    send(msg)
      • To be less evil, it's throttled (<1000 msg/sec)
      • Not a messaging stress test
  • 15. An Experiment: Messaging
      • A test: send/receive 1000 8K messages
      • Scenario 1: Unloaded server
            [Diagram: Client <-> Server]
      • Scenario 2: Server competing with one CPU-thread
            [Diagram: Client <-> Server, plus a CPU-Thread]
  • 16. Results
      • Messaging with no threads (OS-X, 4 cores)
            C          : 1.26s
            Python 2.7 : 1.29s
            Ruby 1.9   : 1.29s
  • 17. Results
      • Messaging with no threads (OS-X, 4 cores)
            C          : 1.26s
            Python 2.7 : 1.29s
            Ruby 1.9   : 1.29s
      • Messaging with one CPU-bound thread*
            C          : 1.16s (~8% faster!?)
            Python 2.7 : 12.3s (10x slower)
            Ruby 1.9   : 42.0s (33x slower)
      • Hmmm. Curious.
      * On Ruby, the CPU-bound thread was also given lower priority
  • 18. Results
      • Messaging with no threads (Linux, 8 CPUs)
            C          : 1.13s
            Python 2.7 : 1.18s
            Ruby 1.9   : 1.18s
  • 19. Results
      • Messaging with no threads (Linux, 8 CPUs)
            C          : 1.13s
            Python 2.7 : 1.18s
            Ruby 1.9   : 1.18s
      • Messaging with one CPU-bound thread
            C          : 1.11s (same)
            Python 2.7 : 1.60s (1.4x slower) - better
            Ruby 1.9   : 5839.4s (~5000x slower) - worse!
  • 20. Results
      • Messaging with no threads (Linux, 8 CPUs)
            C          : 1.13s
            Python 2.7 : 1.18s
            Ruby 1.9   : 1.18s
      • Messaging with one CPU-bound thread
            C          : 1.11s (same)
            Python 2.7 : 1.60s (1.4x slower) - better
            Ruby 1.9   : 5839.4s (~5000x slower) - worse!
      • 5000x slower? Really? Why?
  • 21. The Mystery Deepens
      • Disable all but one CPU core
      • CPU-bound threads (OS-X)
            Python 2.7 (4 cores+hyperthreading) : 9.28s
            Python 2.7 (1 core)                 : 7.9s (faster!)
      • Messaging with one CPU-bound thread
            Ruby 1.9 (4 cores+hyperthreading) : 42.0s
            Ruby 1.9 (1 core)                 : 10.5s (much faster!)
      • ?!?!?!?!?!?
  • 22. Better is Worse
      • Change software versions
      • Let's upgrade to Python 3 (Linux)
            Python 2.7 (Messaging) : 12.3s
            Python 3.2 (Messaging) : 20.1s (1.6x slower)
      • Let's downgrade to Ruby 1.8 (Linux)
            Ruby 1.9 (Messaging)   : 42.0s
            Ruby 1.8.7 (Messaging) : 10.0s (4x faster)
      • So much for progress (sigh)
  • 23. What's Happening?
      • The GIL does far more than limit cores
      • It can make performance much worse
      • Better performance by turning off cores?
      • 5000x performance hit on Linux?
      • Why?
  • 24. Why You Might Care
      • Must you abandon Python/Ruby for concurrency?
      • Having threads restricted to one CPU core might be okay if it were sane
      • Analogy: A multitasking operating system (e.g., Linux) runs fine on a single CPU
      • Plus, threads get used a lot behind the scenes (even in thread alternatives, e.g., async)
  • 25. Why I Care
      • It's an interesting little systems problem
      • How do you make a better GIL?
      • It's fun.
  • 26. Some Background
      • I have been discussing some of these issues in the Python community since 2009
            http://www.dabeaz.com/GIL
      • I'm less familiar with Ruby, but I've looked at its GIL implementation and experimented
      • Very interested in commonalities/differences
  • 27. A Tale of Two GILs
  • 28. Thread Implementation
      • Python
          • System threads (e.g., pthreads)
          • Managed by OS
          • Concurrent execution of the Python interpreter (written in C)
      • Ruby
          • System threads (e.g., pthreads)
          • Managed by OS
          • Concurrent execution of the Ruby VM (written in C)
  • 29. Alas, the GIL
      • Parallel execution is forbidden
      • There is a "global interpreter lock"
      • The GIL ensures that only one thread runs in the interpreter at once
      • Simplifies many low-level details (memory management, callouts to C extensions, etc.)
  • 30. GIL Implementation
      • Simple mutex lock
            mutex_t gil;

            void gil_acquire() {
                mutex_lock(gil);
            }
            void gil_release() {
                mutex_unlock(gil);
            }
      • Condition variable
            int gil_locked = 0;
            mutex_t gil_mutex;
            cond_t gil_cond;

            void gil_acquire() {
                mutex_lock(gil_mutex);
                while (gil_locked)
                    cond_wait(gil_cond, gil_mutex);   /* wait releases gil_mutex while sleeping */
                gil_locked = 1;
                mutex_unlock(gil_mutex);
            }
            void gil_release() {
                mutex_lock(gil_mutex);
                gil_locked = 0;
                cond_notify(gil_cond);
                mutex_unlock(gil_mutex);
            }
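The condition-variable variant on slide 30 can be sketched with Python's own `threading.Condition`. This is an illustration of the pattern only: the class name `ConditionGIL` is hypothetical, and it is not the code used inside CPython or MRI.

```python
import threading


class ConditionGIL:
    """Toy lock built from a flag guarded by a condition variable."""

    def __init__(self):
        self._cond = threading.Condition()  # bundles gil_mutex and gil_cond
        self._locked = False                # plays the role of gil_locked

    def acquire(self):
        with self._cond:
            # Re-check the flag after every wakeup; another thread may
            # have taken the lock first.
            while self._locked:
                self._cond.wait()
            self._locked = True

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()  # wake one waiter, if any
```

A thread brackets its work with `gil.acquire()` and `gil.release()`. Nothing here decides which waiter goes next, so the choice is left to the OS scheduler.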
  • 31. Thread Execution Model
      • The GIL results in cooperative multitasking
            [Diagram: Threads 1, 2 and 3 alternate between run and block; the GIL is released and acquired at each hand-off]
      • When a thread is running, it holds the GIL
      • GIL released on blocking (e.g., I/O operations)
  • 32. Threads for I/O
      • For I/O it works great
      • GIL is never held very long
      • Most threads just sit around sleeping
      • Life is good
  • 33. Threads for Computation
      • You may actually want to compute something!
          • Fibonacci numbers
          • Image/audio processing
          • Parsing
      • The CPU will be busy
      • And it won't give up the GIL on its own
  • 34. CPU-Bound Switching
      • Python
          • Releases and reacquires the GIL every 100 "ticks"
          • 1 Tick ~= 1 interpreter instruction
      • Ruby
          • Background thread generates a timer interrupt every 10ms
          • GIL released and reacquired by current thread on interrupt
  • 35. Python Thread Switching
            [Diagram: a CPU-bound thread runs 100 ticks, then release/acquire, repeated]
      • Every 100 VM instructions, GIL is dropped, allowing other threads to run if they want
      • Not time based--switching interval depends on kind of instructions executed
  • 36. Ruby Thread Switching
            [Diagram: a timer thread fires every 10ms; the CPU-bound thread runs, then release/acquire at each timer interrupt]
      • Loosely mimics the time-slice of the OS
      • Every 10ms, GIL is released/acquired
  • 37. A Common Theme
      • Both Python and Ruby have C code like this:
            void execute() {
                while (inst = next_instruction()) {
                    // Run the VM instruction
                    ...
                    if (must_release_gil) {
                        GIL_release();
                        /* Other threads may run now */
                        GIL_acquire();
                    }
                }
            }
      • Exact details vary, but concept is the same
      • Each thread has periodic release/acquire in the VM to allow other threads to run
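The periodic release/acquire loop can be mimicked in Python to make the tick-based policy concrete. The sketch below is a toy model: `execute`, `check_interval`, and the use of a plain `threading.Lock` as the "GIL" are assumptions for illustration, not the interpreter's real code.

```python
import threading


def execute(instructions, gil, check_interval=100):
    """Run callables one by one, dropping the lock every check_interval ticks.

    Returns the number of release/acquire cycles performed.
    """
    cycles = 0
    ticks = 0
    gil.acquire()
    try:
        for inst in instructions:
            inst()                      # "run the VM instruction"
            ticks += 1
            if ticks >= check_interval:
                ticks = 0
                gil.release()
                # Other threads MAY run now, but nothing forces a switch.
                gil.acquire()
                cycles += 1
    finally:
        gil.release()
    return cycles
```

With 1000 no-op instructions and the default interval, the loop performs 10 release/acquire cycles. Whether another thread actually gets the lock during any of those windows depends entirely on the OS scheduler.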
  • 38. Question
      • What can go wrong with this bit of code?
            if (must_release_gil) {
                GIL_release();
                /* Other threads may run now */
                GIL_acquire();
            }
      • Short answer: Everything!
  • 39. Pathology
  • 40. Thread Switching
      • Suppose you have two threads
            [Diagram: Thread 1 Running; Thread 2 READY]
      • Thread 1 : Running
      • Thread 2 : Ready (Waiting for GIL)
  • 41. Thread Switching
      • Easy case : Thread 1 performs I/O (read/write)
            [Diagram: Thread 1 runs, releases the GIL, and is BLOCKED on I/O; pthreads/OS schedules Thread 2, which acquires the GIL and goes from READY to Running]
      • Thread 1 : Releases GIL and blocks for I/O
      • Thread 2 : Gets scheduled, starts running
  • 42. Thread Switching
      • Tricky case : Thread 1 runs until preempted
            [Diagram: Thread 1 is Running, is preempted, and releases the GIL; Thread 2 is READY. Which thread runs?]
  • 43. Thread Switching
      • You might expect that Thread 2 will run
            [Diagram: Thread 1 is preempted, releases the GIL, and becomes READY; pthreads/OS schedules Thread 2, which acquires the GIL and runs]
      • But you assume the GIL plays nice...
  • 44. Thread Switching
      • What might actually happen on multicore
            [Diagram: Thread 1 is preempted, releases the GIL, then reacquires it and keeps running; pthreads/OS schedules Thread 2, whose acquire fails (GIL locked), so it stays READY]
      • Both threads attempt to run simultaneously
      • ... but only one will succeed (depends on timing)
  • 45. Fallacy
      • This code doesn't actually switch threads
            if (must_release_gil) {
                GIL_release();
                /* Other threads may run now */
                GIL_acquire();
            }
      • It might switch threads, but it depends
          • What operating system
          • # cores
          • Lock scheduling policy (if any)
  • 46. Fallacy
      • This doesn't force switching (sleeping)
            if (must_release_gil) {
                GIL_release();
                sleep(0);
                /* Other threads may run now */
                GIL_acquire();
            }
      • It might switch threads, but it depends
          • What operating system
          • # cores
          • Lock scheduling policy (if any)
  • 47. Fallacy
      • Neither does this (calling the scheduler)
            if (must_release_gil) {
                GIL_release();
                sched_yield();
                /* Other threads may run now */
                GIL_acquire();
            }
      • It might switch threads, but it depends
          • What operating system
          • # cores
          • Lock scheduling policy (if any)
  • 48. A Conflict
      • There are conflicting goals
      • Python/Ruby - wants to run on a single CPU, but doesn't want to do thread scheduling (i.e., let the OS do it).
      • OS - "Oooh. Multiple cores." Schedules as many runnable tasks as possible at any instant
      • Result: Threads fight with each other
  • 49. Multicore GIL Battle
      • Python 2.7 on OS-X (4 cores)
            Sequential           : 6.12s
            Threaded (2 threads) : 9.28s (1.5x slower!)
            [Diagram: Thread 1 runs 100 ticks, is preempted, and does release/acquire, repeatedly; each release schedules Thread 2, whose acquire fails and leaves it READY, until eventually Thread 2 runs and Thread 1 becomes READY]
      • Millions of failed GIL acquisitions
  • 50. Multicore GIL Battle
      • You can see it! (2 CPU-bound threads)
            [Screenshot: CPU usage display. Why >100%?]
      • Comment: In Python, it's very rapid
      • GIL is released every few microseconds!
  • 51. I/O Handling
      • If there is a CPU-bound thread, I/O bound threads have a hard time getting the GIL
            [Diagram: Thread 1 (CPU 1) runs and is preempted repeatedly; Thread 2 (CPU 2) sleeps until a network packet arrives, then fails to acquire the GIL at each preemption (might repeat 100s-1000s of times) before an acquire finally succeeds and it runs]
  • 52. Messaging Pathology
      • Messaging on Linux (8 Cores)
            Ruby 1.9 (no threads)   : 1.18s
            Ruby 1.9 (1 CPU thread) : 5839.4s
      • Locks in Linux have no fairness
      • Consequence: Really hard to steal the GIL
      • And Ruby only retries every 10ms
  • 53. Let's Talk Fairness
      • Fair-locking means that locks have some notion of priorities, arrival order, queuing, etc.
            [Diagram: t0 is running and holds the lock while t1 t2 t3 t4 t5 wait; after release, t1 is running and the waiting queue is t2 t3 t4 t5 t0]
      • Releasing means you go to end of line
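The "go to end of line" behaviour can be sketched as a first-come, first-served lock in Python. The class name `FairLock` and its queue-of-tickets design are assumptions chosen for illustration; real OS lock implementations differ.

```python
import threading
from collections import deque


class FairLock:
    """Toy FIFO lock: waiters are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._queue = deque()   # tickets of waiting threads, oldest first
        self._held = False

    def acquire(self):
        with self._cond:
            ticket = object()
            self._queue.append(ticket)
            # Proceed only when the lock is free AND we are at the front.
            while self._held or self._queue[0] is not ticket:
                self._cond.wait()
            self._queue.popleft()
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()  # every waiter re-checks its position
```

A thread that releases and immediately calls `acquire()` again joins the back of the queue, so it cannot overtake threads that were already waiting. The price is a context switch on every contended release.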
  • 54. Effect of Fair-Locking
      • Ruby 1.9 (multiple cores)
            Messages + 1 CPU Thread (OS-X)  : 42.0s
            Messages + 1 CPU Thread (Linux) : 5839.4s
      • Question: Which one uses fair locking?
  • 55. Effect of Fair-Locking
      • Ruby 1.9 (multiple cores)
            Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
            Messages + 1 CPU Thread (Linux) : 5839.4s
      • Benefit : I/O threads get their turn (yay!)
  • 56. Effect of Fair-Locking
      • Ruby 1.9 (multiple cores)
            Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
            Messages + 1 CPU Thread (Linux) : 5839.4s
      • Benefit : I/O threads get their turn (yay!)
      • Python 2.7 (multiple cores)
            2 CPU-Bound Threads (OS-X)    : 9.28s
            2 CPU-Bound Threads (Windows) : 63.0s
      • Question: Which one uses fair-locking?
  • 57. Effect of Fair-Locking
      • Ruby 1.9 (multiple cores)
            Messages + 1 CPU Thread (OS-X)  : 42.0s (Fair)
            Messages + 1 CPU Thread (Linux) : 5839.4s
      • Benefit : I/O threads get their turn (yay!)
      • Python 2.7 (multiple cores)
            2 CPU-Bound Threads (OS-X)    : 9.28s
            2 CPU-Bound Threads (Windows) : 63.0s (Fair)
      • Problem: Too much context switching
  • 58. Fair-Locking - Bah!
      • In reality, you don't want fairness
      • Messaging Revisited (OS X, 4 Cores)
            Ruby 1.9 (No Threads)         : 1.29s
            Ruby 1.9 (1 CPU-Bound thread) : 42.0s (33x slower)
      • Why is it still 33x slower?
      • Answer: Fair locking! (and convoying)
  • 59. Messaging Revisited
      • Go back to the messaging server
            def server():
                while True:
                    msg = recv()
                    send(msg)
  • 60. Messaging Revisited
      • The actual implementation (size-prefixed messages)
            def server():
                while True:
                    size = recv(4)
                    msg = recv(size)
                    send(size)
                    send(msg)
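For readers who want to try the size-prefixed echo server, here is a minimal socket-based sketch. The 4-byte big-endian header format and the helper names `recv_exact`, `send_msg`, `recv_msg`, and `echo_server` are assumptions; the talk does not show its actual wire format or test harness.

```python
import socket
import struct
import threading


def recv_exact(sock, n):
    # Keep reading until exactly n bytes have arrived.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_msg(sock, payload):
    # Two separate sends, mirroring send(size) and send(msg) on the slide.
    sock.sendall(struct.pack("!I", len(payload)))
    sock.sendall(payload)


def recv_msg(sock):
    # Two separate receives: the 4-byte size header, then the payload.
    (size,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, size)


def echo_server(sock):
    # Echo every message back until the client disconnects.
    try:
        while True:
            send_msg(sock, recv_msg(sock))
    except ConnectionError:
        pass
```

Each echoed message involves at least four potentially blocking socket calls in the server, and every one of them is a point where the interpreter gives up the GIL.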
  • 61. Performance Explained
      • What actually happens under the covers
            def server():
                while True:
                    size = recv(4)        # GIL release
                    msg = recv(size)      # GIL release
                    send(size)            # GIL release
                    send(msg)             # GIL release
      • Why? Each operation might block
      • Catch: Passes control back to CPU-bound thread
  • 62. Performance Illustrated
            [Diagram: the timer fires every 10ms; once data arrives, the I/O thread's recv, recv, send, send each complete in turn, and the CPU-bound thread runs for a 10ms slice after every one of them]
      • Each message has 40ms response cycle
      • 1000 messages x 40ms = 40s (42.0s measured)
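The arithmetic on slide 62 can be written out as a small model. It assumes, as the slide does, that every blocking call costs the I/O thread one full time slice before it gets the GIL back; the function name `estimated_messaging_time` is illustrative.

```python
def estimated_messaging_time(num_messages, blocking_calls_per_message, time_slice):
    """Total seconds lost waiting, one full time slice per blocking call."""
    return num_messages * blocking_calls_per_message * time_slice


# Ruby 1.9 figures from the slides: 1000 messages, 4 blocking calls
# (recv, recv, send, send), 10ms timer.
ruby_estimate = estimated_messaging_time(1000, 4, 0.010)
print("Estimated: %.1fs (42.0s measured)" % ruby_estimate)
```

The same model with a 5ms slice gives about 20s, which is in line with the Python 3.2 measurement of 20.1s reported elsewhere in the deck.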
  • 63. Despair
  • 64. A Solution?
      Don't use threads!
      • Yes, yes, everyone hates threads
      • However, that's only because they're useful!
      • Threads are used for all sorts of things
      • Even if they're hidden behind the scenes
  • 65. A Better Solution
      Make the GIL better
      • It's probably not going away (very difficult)
      • However, does it have to thrash wildly?
      • Question: Can you do anything?
  • 66. GIL Efforts in Python 3
      • Python 3.2 has a new GIL implementation
      • It's imperfect--in fact, it has a lot of problems
      • However, people are experimenting with it
  • 67. Python 3 GIL
      • GIL acquisition now based on timeouts
            [Diagram: Thread 2 goes from IOWAIT to READY when data arrives and calls wait(gil, TIMEOUT); after a 5ms timeout it issues a drop_request, running Thread 1 releases the GIL, and Thread 2 runs]
      • Involves waiting on a condition variable
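The timeout-based hand-off can be approximated in Python as follows. This is a simplified sketch under stated assumptions: the class `TimeoutGIL` and the function `cpu_bound` are hypothetical, only the `drop_request` flag and the 5ms timeout come from the slide, and details of the real CPython 3.2 implementation are left out.

```python
import threading


class TimeoutGIL:
    """Toy lock where a waiter that times out asks the holder to let go."""

    def __init__(self, timeout=0.005):
        self._cond = threading.Condition()
        self._locked = False
        self.timeout = timeout        # 5ms, as on the slide
        self.drop_request = False     # polled by the running thread

    def acquire(self):
        with self._cond:
            while self._locked:
                notified = self._cond.wait(self.timeout)
                if not notified and self._locked:
                    # Timed out while someone still holds the lock.
                    self.drop_request = True
            self._locked = True
            self.drop_request = False

    def release(self):
        with self._cond:
            self._locked = False
            self._cond.notify()


def cpu_bound(gil, n):
    # The running thread only gives up the lock when asked to.
    gil.acquire()
    while n > 0:
        n -= 1
        if gil.drop_request:
            gil.release()
            gil.acquire()
    gil.release()
```

A waiting thread therefore sits for up to one timeout before the holder is even asked to release, which is the source of the fixed delay per blocking operation.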
  • 68. Problem: Convoying
      • CPU-bound threads significantly degrade I/O
            [Diagram: each time data arrives, Thread 2 sits READY for 5ms while Thread 1 keeps running; Thread 2 runs briefly only after Thread 1's release]
      • This is the same problem as in Ruby
      • Just a shorter time delay (5ms)
  • 69. Problem: Convoying
      • You can directly observe the delays (messaging)
            Python/Ruby (No threads) : 1.29s (no delays)
            Python 3.2 (1 Thread)    : 20.1s (5ms delays)
            Ruby 1.9 (1 Thread)      : 42.0s (10ms delays)
      • Still not great, but problem is understood
  • 70. Promise
  • 71. Priorities
      • Best promise : Priority scheduling
      • Earlier versions of Ruby had it
      • It works (OS-X, 4 cores)
            Ruby 1.9 (1 Thread)                   : 42.0s
            Ruby 1.8.7 (1 Thread)                 : 40.2s
            Ruby 1.8.7 (1 Thread, lower priority) : 10.0s
      • Comment: Ruby-1.9 allows thread priorities to be set in pthreads, but it doesn't seem to have much (if any) effect
  • 72. Priorities
      • Experimental Python-3.2 with priority scheduler
      • Also features immediate preemption
      • Messages (OS X, 4 Cores)
            Python 3.2 (No threads)           : 1.29s
            Python 3.2 (1 Thread)             : 20.2s
            Python 3.2+priorities (1 Thread)  : 1.21s (faster?)
      • That's a lot more promising!
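One way to picture priority scheduling is a lock that always hands itself to the highest-priority waiter. The sketch below is an assumption-laden illustration: the class `PriorityLock` and its heap-based design are hypothetical and are not the experimental Python 3.2 scheduler mentioned on the slide.

```python
import heapq
import threading


class PriorityLock:
    """Toy lock that serves the highest-priority waiter first."""

    def __init__(self):
        self._cond = threading.Condition()
        self._waiters = []    # heap of (-priority, arrival_number)
        self._arrivals = 0
        self._held = False

    def acquire(self, priority=0):
        with self._cond:
            # Negate priority so larger values sort first; the arrival
            # number breaks ties in first-come, first-served order.
            entry = (-priority, self._arrivals)
            self._arrivals += 1
            heapq.heappush(self._waiters, entry)
            while self._held or self._waiters[0] != entry:
                self._cond.wait()
            heapq.heappop(self._waiters)
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()
```

An I/O thread could call `acquire(priority=10)` while CPU-bound threads use the default of 0. A steady stream of high-priority waiters would starve the low-priority ones in this toy version.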
  • 73. New Problems
      • Priorities bring new challenges
          • Starvation
          • Priority inversion
          • Implementation complexity
      • Do you have to write a full OS scheduler?
      • Hopefully not, but it's an open question
  • 74. Final Words
      • Implementing a GIL is a lot trickier than it looks
      • Even work with priorities has problems
      • Good example of how multicore is diabolical
  • 75. Thanks for Listening!
      • I hope you learned at least one new thing
      • I'm always interested in feedback
      • Follow me on Twitter (@dabeaz)