Slide 1: Using Coroutines to Create Efficient, High-Concurrency Web Applications
  • Matt Spitz, meebo, inc.

Slide 2: What’s a Web Application, Anyway?
  • Diagram labels: Application, Database, Application

Slide 3: High-Concurrency Web Applications
  • Diagram labels: Application, Database, Application

Slide 4: High-Concurrency Web Applications
  • Many requests per second
  • Optimization opportunities: hardware cost, response time, concurrency, database impact

Slide 5: Meebo Bar

Slide 6: Meebo Bar
  • 1000+ sites
  • Quantcast: 197 MM monthly uniques*
  • LOTS of pageviews
  • LOTS of ad requests
  • * http://bit.ly/xAPXx

Slide 7: Meebo’s Ad Server
  • Given: user features, available ads
  • Objective: maximize revenue (P(click), price); satisfy advertisers (respect targeting, smooth campaign delivery)
  • Complex application
  • Lots of concurrent requests

Slide 8: Sample App: FortuneTeller

Slide 9: Sample App: FortuneTeller
  • Given: username, available fortunes
  • Objective: select fortune for user with JaccardSimilarity(username, fortune)
  • username=PyConIsForLovers: “Generosity and perfection are your everlasting goals.”
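
The fortune selection on this slide can be sketched in a few lines. This is a minimal illustration, not the FortuneTeller source: the slides do not say which sets are compared, so the sketch assumes the sets are the lowercase characters of each string, and the names `jaccard_similarity` and `select_fortune` are hypothetical. It is written for Python 3.

```python
def jaccard_similarity(a, b):
    """Return |A & B| / |A | B| for the character sets of two strings.

    Assumption: the sets are the lowercase characters of each string.
    """
    set_a, set_b = set(a.lower()), set(b.lower())
    union = set_a | set_b
    if not union:
        # Two empty strings: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


def select_fortune(username, fortunes):
    """Pick the fortune most similar to the username (first one wins ties)."""
    return max(fortunes, key=lambda fortune: jaccard_similarity(username, fortune))
```

The scoring is deliberately arbitrary CPU work; what matters for the hosting comparison is that each request needs the full list of fortunes in memory.
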
Slide 10: Hosting FortuneTeller
  • Apache CGI
  • Apache mod_wsgi
  • Twisted
  • gevent + gunicorn

Slide 11: Hosting FortuneTeller
  • Evaluation metrics: code complexity, library support, memory efficiency, multi-core support

Slide 12: Take One: Apache CGI
  • One process per request
  • O/S schedules CPU

Slide 13: Take One: Apache CGI
  • Advantages: straightforward, synchronous code; isolated requests
  • Disadvantages: process overhead; cold cache

Slide 14: Evaluation (no text captured)

Slide 15: Performance
  • Environment: 4-core VM, 1 GB RAM; Ubuntu Server 10.10; MySQL on host machine; 25ms interface delay; 1024 requests, X concurrent

Slide 16: Performance (graph; image not included)

Slide 17: Take Two: Apache mod_wsgi
  • Using mpm_prefork
  • Worker processes handle requests
  • One concurrent request per process
  • Memory cached between requests
  • O/S schedules CPU

Slide 18: Take Two: Apache mod_wsgi
  • Advantages: straightforward, synchronous code; cached memory
  • Disadvantages: resource inefficient (need working set in each process; cold cache on restart); managing worker count (too few: 502; too many: OOM? Database DoS?)

Slide 19: Evaluation (no text captured)

Slide 20: Performance (graph; image not included)

Slide 21: Take Three: Twisted
  • Asynchronous framework: events and callbacks; Twisted orchestrates context switches
  • Twisted server: single event loop; concurrent requests

Slide 22: Quick Break: Event Loops
  • Code:
      s = socket.socket(…)
      s.setblocking(ISBLOCKING)
      s.connect((HOST, PORT))
      greeting = s.recv(1024)
      s.close()
  • Blocking: wait for data
  • Nonblocking: initiate, return immediately with data (if available) or an exception (“I’m not done yet”); requires more plumbing
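
To make the blocking/nonblocking distinction concrete, here is a small Python 3 sketch using only the standard library. It is an illustration under stated assumptions rather than the talk's demo code: it uses a local `socket.socketpair()` in place of a real HOST and PORT, and the helper names `recv_if_ready` and `demo_nonblocking` are hypothetical.

```python
import select
import socket


def recv_if_ready(sock, nbytes=1024):
    """Return received bytes, or None when a nonblocking socket has no data yet."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:
        # This is the "I'm not done yet" exception from the slide.
        return None


def demo_nonblocking():
    # A connected local socket pair stands in for a remote server.
    sender, receiver = socket.socketpair()
    try:
        receiver.setblocking(False)
        before = recv_if_ready(receiver)        # nothing has been sent yet
        sender.sendall(b"hello")
        select.select([receiver], [], [], 1.0)  # wait until data is readable
        after = recv_if_ready(receiver)
        return before, after
    finally:
        sender.close()
        receiver.close()
```

With a blocking socket the first `recv` would simply wait. The nonblocking version returns control immediately, which is what lets a single process juggle many requests, at the cost of the extra plumbing around each call.
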
Slide 23: Quick Break: Event Loops
  • Nonblocking sockets in an event loop
  • Steps:
      1. f(x): s = NonBlockingSocket(…); greeting = s.recv(1024); print x, “|”, greeting
      2. Call recv().
      3. Create context, add to the event loop.
      4. Process events that are ready (select/poll).
      5. Return to context when data is ready.
      6. “80 | Hello from socket s!”
  • Events (repeated animation frames dropped): fd=5, fp=g, {s: ‘hi’, a: 5}; fd=2, fp=f, {x: 8080}; fd=3, fp=myfunc, {}; fd=18, fp=f, {x: 80}
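
The six steps on this slide can be approximated with Python 3's standard `selectors` module. This is a sketch, not how Twisted or gevent are implemented: each registration stores a callback and a context dictionary (the `fp` and `{x: 80}` columns of the events table), and the loop calls the callback once the socket is readable. The names `f`, `run_event_loop` and `demo_event_loop` are hypothetical, and a local socket pair stands in for a network peer.

```python
import selectors
import socket


def f(sock, context):
    """Callback: runs only once the event loop knows data is ready."""
    greeting = sock.recv(1024)
    return "%s | %s" % (context["x"], greeting.decode())


def run_event_loop(sel, pending):
    """Dispatch callbacks for ready sockets until `pending` registrations are served."""
    outputs = []
    while pending > 0:
        events = sel.select(timeout=1.0)   # step 4: find the events that are ready
        if not events:
            break                          # nothing became ready; avoid spinning forever
        for key, _mask in events:
            callback, context = key.data   # step 5: return to the saved context
            outputs.append(callback(key.fileobj, context))
            sel.unregister(key.fileobj)
            pending -= 1
    return outputs


def demo_event_loop():
    sel = selectors.DefaultSelector()
    writer, reader = socket.socketpair()
    try:
        reader.setblocking(False)
        # Step 3: store the callback and its context alongside the file descriptor.
        sel.register(reader, selectors.EVENT_READ, (f, {"x": 80}))
        writer.sendall(b"Hello from socket s!")
        return run_event_loop(sel, pending=1)
    finally:
        sel.close()
        writer.close()
        reader.close()
```

Note that the sketch already forces callback-style code: `f` is split from the place where the socket was created, which is the style the Twisted slides go on to describe.
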
Slide 24: Take Three: Twisted
  • Asynchronous framework: events and callbacks; Twisted orchestrates context switches
  • Twisted server: single event loop; concurrent requests

Slide 25: Take Three: Twisted
  • Advantages: shared memory; user space context switches
  • Disadvantages: develop asynchronously; stuck in the framework; asynchronous libraries; no I/O in C; unfair scheduling; using multiple cores

Slide 26: Evaluation (no text captured)

Slide 27: Performance (graph; image not included)

Slide 28: Take Four: gevent + gunicorn
  • gevent: networking library; uses event loop; synchronous API
  • Synchronous code running asynchronously
  • Monkey patching: rewrites standard modules
  • Coroutines for function context: lightweight threads, no stack; greenlet implementation
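
The idea of a coroutine as "code plus saved context" can be shown with plain Python 3 generators. This is only an analogy: gevent uses greenlets, not generators, and switches on network I/O rather than at explicit `yield` statements. The names `handler` and `run_round_robin` are hypothetical.

```python
from collections import deque


def handler(name, steps):
    """A toy request handler; each yield marks a point where real code would wait on I/O."""
    for i in range(steps):
        yield "%s step %d" % (name, i)


def run_round_robin(tasks):
    """Resume each coroutine in turn until all of them have finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))   # resume where the task last paused
        except StopIteration:
            continue                   # task finished; do not requeue it
        queue.append(task)
    return trace
```

For example, `run_round_robin([handler("a", 2), handler("b", 1)])` interleaves the two handlers in a single process without any operating-system threads. A handler that never yields would starve the others, which is the unfair scheduling listed among the disadvantages.
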
Slide 29: Take Four: gevent + gunicorn
  • gunicorn (“Green Unicorn”): lightweight WSGI server; multiple worker processes; share queued requests; gevent support
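
A gunicorn configuration file is ordinary Python, so the setup described here might look like the sketch below. The setting names (`bind`, `workers`, `worker_class`, `worker_connections`) are gunicorn settings; the values, the file name and the `fortuneteller:app` module path in the comment are assumptions for illustration, not the configuration used in the talk.

```python
# gunicorn.conf.py (hypothetical file name)
# Assumed launch command: gunicorn -c gunicorn.conf.py fortuneteller:app
import multiprocessing

# Address the workers listen on (assumed value).
bind = "127.0.0.1:8000"

# One worker process per core gives the multi-core support that a
# single event loop lacks.
workers = multiprocessing.cpu_count()

# Use gevent workers; requires the gevent package to be installed.
worker_class = "gevent"

# Upper bound on simultaneous clients per worker (assumed value).
worker_connections = 1000
```

The worker count is the knob varied in the benchmarks: the notes refer to runs such as gunicorn_1 and gunicorn_4-8.
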
Slide 30: Take Four: gevent + gunicorn
  • Advantages: best of both worlds!
  • From mod_wsgi: straightforward, synchronous code; no framework, just python; multicore support
  • From Twisted: shared memory; user space context switches
  • Disadvantages: pure-python libraries; unfair scheduling

Slide 31: Evaluation (no text captured)

Slides 32-36: Performance (graphs; images not included)

Slide 37: “Evented” Development
  • Synchronous code still runs asynchronously
  • Requests aren’t independent
  • Things to keep in mind: duplicate work, socket caching, CPU hogging
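
The "duplicate work" item can be guarded against by letting the first request perform the load while later requests wait on an event. The sketch below uses Python 3's standard `threading` primitives as a stand-in so that it runs without gevent; gevent provides `gevent.event.Event` with the same `set()` and `wait()` methods. The class `LoadOnce` and the helper `demo_concurrent_loads` are hypothetical names, not part of the talk's code.

```python
import threading


class LoadOnce:
    """Run an expensive loader a single time, even under simultaneous requests."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._started = False
        self._value = None

    def get(self):
        with self._lock:
            first = not self._started
            self._started = True
        if first:
            try:
                self._value = self._loader()   # e.g. load all fortunes from the database
            finally:
                self._ready.set()              # wake waiters even if the load failed
        else:
            self._ready.wait()                 # someone else is already loading
        return self._value


def demo_concurrent_loads(n_requests=5):
    """Simulate simultaneous requests; return (number of loads, results)."""
    calls = []

    def load_fortunes():
        calls.append(1)
        return ["fortune"]

    cache = LoadOnce(load_fortunes)
    results = []
    threads = [threading.Thread(target=lambda: results.append(cache.get()))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

Without such a guard, every request that arrives while the cache is empty would issue its own database query, which is the "no fortunes? load up the fortunes!" race described in the notes.
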
Slide 38: gunicorn + gevent in Production
  • Managing gunicorn: greins, by Randall Leeds (tilgovi), github/meebo/greins; multiple apps; URL routing; server hooks (worker launch, pre/post requests); daemon interface
  • Debugging gevent: gevent-profiler, by Shaun Lindsay (srlindsay), github/meebo/gevent-profiler; execution trace; time spent

Slide 39: Load-tested, unicorn-approved!
  • Blocking code is simple
  • Nonblocking code is efficient
  • gevent + gunicorn: simple, efficient, reliable

Slide 40: Load-tested, unicorn-approved!

Slide 41: Thanks!



Editor's Notes

  1. Introduction. Matt Spitz, Software Engineer at meebo. Here today to talk about building web applications in Python and the pros and cons of the various ways we can serve them.
  2. Users make requests to an application, which uses a shared storage backend.
  3. Same thing, just lots and lots and lots of concurrent requests.
  4. With such a large-scale application, small optimizations can have a huge impact: save money on hardware (machines, RAM, CPU); faster response time and a better user experience; handle more concurrent requests; substantially decrease impact on shared resources. One example of a high-concurrency web application is the adserver we run at meebo. Before I talk about the adserver, let me introduce the meebo bar.
  5. The meebo bar is deployed to our partner sites. It offers a neat way to share content on the site and lets users chat with other members of the site.
  6. Show off the chat in the corner, the sharing buttons, and the ad unit. Can’t give you numbers, but suffice it to say that any adserver to which you’re making those calls can be considered a “high-concurrency web application”.
  7. Goals: select the ad a user is most likely to click on; serve the most valuable ads (e.g. highest CPC); respect whatever targeting the advertisers have selected; ensure smooth, complete delivery for each ad campaign. The adserver is a pretty complicated beast, and going through it wouldn’t really help make my point for this talk, so I wrote a sample application that has a similar structure and resource-usage patterns.
  8. Describe JaccardSimilarity: size(intersection(x, y)) / size(union(x, y)). Super arbitrary, just to represent some CPU processing in the application. SHOW OFF THE CODE (make sure to show off the user fortune caching).
  9. We’re gonna try four different serving implementations.
  10. How difficult is it to write code for these applications? To what extent do they let us use 3rd-party libraries? How efficient is the application in terms of memory? Can we take advantage of multi-core machines?
  11. SHOW OFF THE CODE
  12. Advantages: simple to write; requests don’t affect one another. Disadvantages: need to reload the whole working set (all fortunes) with each request; no database connection caching. It’s a start, but it doesn’t scale.
  13. Before I show you a performance graph, I want to go over the benchmark setup. 25ms delay on the interface between guest and host to exaggerate the effects of I/O on response time.
  14. 8 processes maximum. Requires loading all fortunes with each request.
  15. Apache spins up a number of worker processes to handle requests. Workers handle a configurable number of requests before being replaced. Workers handle exactly one request at a time. Memory is cached in the worker, so we can re-use the set of fortunes between requests. Operating system handles scheduling. MAKE SURE TO SHOW OFF THE HANDLER.
  16. Advantages: almost the same simple, synchronous code as we had in the CGI; memory is cached across requests in the same worker. Disadvantages: no shared memory between workers; need to load the set of all fortunes in each worker; more workers require more RAM; each worker load requires a DB request; hammers the database on Apache restart.
  17. Using 8 worker processes.
  18. Twisted is an asynchronous framework for building network applications. The developer structures code as events and callbacks. Twisted orchestrates context switches among requests, typically on things that take a long time (I/O). Twisted server: single event loop => single process. Handles multiple requests simultaneously in the event loop. And since we’re all in one process, memory is shared among requests.
  19. Some of this may be review, but it’s important that everyone understands this. Blocking: connect and recv wait until their actions complete before returning. Nonblocking: connect and recv initiate the action (if it hasn’t been initiated already) and immediately return the data or raise an exception. Requires a lot more plumbing than the example above.
  20. …so let’s go back to this slide (the next one)
  21. Twisted is a framework built around an event loop. Provides a nice interface for setting up your functions and callbacks (for success or error). Keeps track of multiple execution paths simultaneously, just as we saw in the previous example. The big problem with Twisted is that you can’t just plug in your synchronous app. You have to set up these events and callbacks for every piece of code that might block. MAKE SURE TO SHOW OFF THE CODE AND HOW MUCH OF A PAIN IT IS.
  22. Advantages: memory is shared among requests (we only have to load the fortunes once to service many simultaneous requests); context switches happen in user space (fast). Disadvantages: need to rewrite code to be asynchronous. Guido sez: “I hate callback-based programming.” It’s hard to wrap your brain around. Stuck in the framework: everything has to be asynchronous, and you have to use Twisted’s standard libraries, which may not behave quite as you’d like. 3rd-party libraries must also be asynchronous. No I/O in C libraries (at least not out of the box). CPU-intense requests monopolize the processor. mod_wsgi: O/S handles scheduling, processes are scheduled at any time, and CPU time is shared “fairly”. Twisted: CPU is scheduled explicitly, so CPU-bound blocks of code prevent other requests from running. Taking advantage of multiple cores isn’t trivial: load balancer? multiprocessing module?
  23. Note that Twisted is running only on a single core.
  24. gevent: networking library using libevent. Has an event loop, but its API is synchronous. Transforms synchronous applications to be asynchronous automatically!!! “Monkey patches” Python system modules (socket): rewrites socket calls to set up a callback and a context after writing the request to the socket. Function context lives in coroutines. Think of coroutines as lightweight threads: pointer to code + context, no stack (e.g. closures and generators). Uses an event loop to manage all concurrent requests. Context switch on network I/O (just like Twisted).
  25. gunicorn: fast, lightweight WSGI server written by Benoit. Uses multiple workers to handle requests. Big win: supports gevent workers out of the box. Each worker maintains a pool of coroutines to handle incoming requests. Those workers share memory among requests. At this point, we look at the code.
  26. Advantages: best of both worlds! From mod_wsgi: easy to write; no framework to do everything asynchronously, just Python; can take advantage of multiple cores. From Twisted: shared memory among requests within each worker; context switches in user space. Disadvantages (similar to Twisted): no I/O in C libraries; CPU-intense requests monopolize the processor.
  27. gunicorn_1 is comparable to Twisted. Negligible performance impact when the application is made asynchronous.
  28. gunicorn_4-8 is faster than mod_wsgi. Making context switches deterministically and in user space is more efficient than OS scheduling.
  29. gevent takes care of transforming synchronous code, but it’s still executed in an event loop. Synchronous code is not necessarily executed synchronously. Duplicated loads: simultaneous database requests (1: no fortunes? load up the fortunes! 2: no fortunes? load up the fortunes!) => use “events” to protect against duplicate efforts. Socket caching: can’t naively cache sockets; can’t use the same socket for two simultaneous operations; must create a new socket per connection or use a pool. CPU hogging: might want to offload CPU-intense things to another daemon/process.
  30. Managing gunicorn: greins (Randall Leeds). Enables running multiple apps in a single gunicorn instance. Routes traffic based on URL. Allows for global and per-app server hooks: on worker startup (preloading a working set) and pre/post requests (Apache-style request logging). Provides a standard start/stop/reload/restart interface to gunicorn. Debugging gevent applications: gevent-profiler (Shaun Lindsay). Provides a linear trace of all function calls and context switches. Analyzes where CPU time is spent in a given application.
  31. Blocking code is easy to understand, but traditional deployments aren’t very efficient. Asynchronous applications make the best use of resources, but they’re a pain to write. Running gevent workers in gunicorn is both simple and efficient, as it allows you to write blocking code that is converted to be asynchronous automatically. At meebo, we've found this setup to be amazingly efficient and reliable, even under extreme load. A number of our mission-critical, high-concurrency web applications have been running under this setup for the last 7 months with no major issues or outages. Been able to save money on hardware with no impact on response time. …we even got a Halloween costume out of it.