Web Server Bottlenecks and Performance Tuning

Graham Dumpleton
PyCon – March 2012
Graham Dumpleton @GrahamDumpleton

"Starting my PyCon talk. Let's hope I don't lose my voice completely while doing this."

Follow along

✴http://www.slideshare.net/GrahamDumpleton
The big picture

Front end time: 3.1 seconds
Web application time: 0.15 seconds
Performance Golden Rule



    "80-90% of the end-user
 response time is spent on the
     frontend. Start there."

http://www.stevesouders.com/blog/2012/02/10/
        the-performance-golden-rule/
Application breakdown
Are benchmarks stupid?




http://nichol.as/benchmark-of-python-web-servers
Benchmarks as a tool


✴Web server benchmarks are of more
 value when used as an exploratory tool to
 understand how a specific system
 works, not to compare systems.
What about load tests?

✴Hitting a site with extreme load will only
 show you that it will likely fail under a
 denial of service attack.
✴A typical web server load test alone isn't
 going to help you understand how the web
 server is contributing to the failure.
What should you test?


✴You should use a range of purpose-built
 tests to trigger specific scenarios.
✴Use the tests to explore corner cases and
 not just the typical use case.
Environment factors


✴Amount of memory available.
✴Number of processors available.
✴Use of threads vs processes.
✴Python global interpreter lock (GIL).
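The GIL point is worth demonstrating: pure-Python CPU-bound work does not speed up across threads, which is why process counts matter for CPU-bound Python applications. A minimal sketch (the task size and thread count are arbitrary; timings are illustrative only):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy(n):
    # pure-Python CPU-bound loop; the running thread holds the GIL
    total = 0
    for i in range(n):
        total += i * i
    return total

# Four CPU-bound tasks on four threads: because the GIL lets only one
# thread execute Python bytecode at a time, wall-clock time stays close
# to running the four tasks serially, despite the extra threads.
start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(busy, [200_000] * 4))
elapsed = time.time() - start
print(f"4 threads: {elapsed:.2f}s elapsed")
```

For I/O-bound work the picture reverses: the GIL is released during blocking I/O, so threads do provide useful concurrency while waiting on sockets or databases.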
Client impacts


✴Slow HTTP browsers/clients.
✴Browser keep-alive connections.
Application requirements




✴Need to handle static assets.
Use cases to explore

✴Memory used by web application.
✴Using processes versus threads.
✴Impacts of long running requests.
✴Restarting of server processes.
✴Startup costs and lazy loading.
Memory usage

[Chart: memory usage of the benchmarked servers, comparing a 1 process / 1000 threads configuration against 1 process / 1 thread.]

http://nichol.as/benchmark-of-python-web-servers
What affects memory use?


✴Web server base memory usage.
✴Web server per thread memory usage.
✴Application base memory usage.
 ✴Is application loaded prior to forking?
✴Per request transient memory usage.
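Per-request transient memory can be observed from inside the process with nothing but the standard library. A rough sketch (Unix only; note `ru_maxrss` is reported in kilobytes on Linux but bytes on macOS, and is a high-water mark, so it only ever grows):

```python
import resource

def peak_rss():
    # peak resident set size of this process so far
    # (kilobytes on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
# simulate a request that builds a large transient response in memory
payload = ["x" * 1024 for _ in range(10_000)]  # roughly 10 MB of strings
after = peak_rss()
del payload
print(f"peak RSS grew by about {after - before} units")
```

Sampling this at the start and end of a request handler is a cheap way to spot which requests are responsible for a process's memory high-water mark.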
Processes Vs Threads

[Chart: number of processes (y-axis, 0-150) plotted against number of threads (x-axis, 0-150).]
Apache/mod_wsgi defaults

Configuration                             Max Processes   Threads
Apache (prefork) + mod_wsgi (embedded)         150            1
Apache (worker) + mod_wsgi (embedded)            6           25
Apache (prefork) + mod_wsgi (daemon)             1           15
Apache (worker) + mod_wsgi (daemon)              1           15
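The daemon-mode rows in the table come from mod_wsgi's `WSGIDaemonProcess` directive, which is also where you override the defaults. A sketch of the relevant Apache configuration (the application name and paths are placeholders):

```apache
# Daemon mode: a dedicated pool of 3 processes x 5 threads for this
# application, sized independently of how Apache scales its own workers.
WSGIDaemonProcess myapp processes=3 threads=5 display-name=%{GROUP}
WSGIProcessGroup myapp
WSGIScriptAlias / /srv/myapp/wsgi.py
```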
Other WSGI servers

Configuration             Max Processes   Threads
FASTCGI flup (prefork)         50            1
FASTCGI flup (threaded)         1            5
gunicorn                        1            1
uWSGI                           1            1
tornado                         1            1
Less than fair

[Chart: the default configurations plotted on the same processes (0-150) vs threads (0-150) axes: Apache (prefork) + mod_wsgi (embedded); FASTCGI flup (prefork); Apache (prefork/worker) + mod_wsgi (daemon); Apache (worker) + mod_wsgi (embedded).]
What to use?

✴Number of overall threads dictated by:
 ✴Number of concurrent users.
 ✴Response time for requests.
✴Processes preferred over threads, but:
 ✴Restricted by amount of memory.
 ✴Choice influenced by number of processors.
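The relationship between concurrent users, response time and thread count is just Little's law: average concurrency = throughput × response time. A quick sketch using numbers of the kind shown in the backlog slides (both values are assumptions):

```python
# Little's law: average number of in-flight requests equals
# arrival rate multiplied by time spent in the system.
throughput = 60       # requests per second (assumed)
response_time = 0.15  # seconds per request (assumed)

concurrent = throughput * response_time
print(f"about {concurrent:.0f} threads busy on average")
```

At 60 requests/second and ~150 ms responses, about 9 threads are busy on average, so a thread pool needs to be sized comfortably above that to absorb bursts without queueing.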
Thread utilisation

[Chart: number of busy threads (1-6) sampled over a 1 second interval.]
Request backlog

[Chart: at 60 requests per second with ~150 ms response times, a backlog occurred and queue time increased to 750 ms; thread utilisation jumped from 2.5 to 7.5 and maxed out at 9.]
Processes are better

[Chart: at 75 requests per second with ~100 ms response times, a backlog only started at higher throughput and queue time stayed mostly under 100 ms; thread utilisation only jumped from 2.5 to 7.5 at higher throughput and didn't actually reach 9.]
CPU bound

The bulk of the time is spent doing things within the process itself.
I/O wait

Waiting on responses from backend services makes up a significant proportion of the time.
Long running requests

✴Complex calculations.
✴Slow backend services.
✴Large file uploads.
✴Large responses.
✴Slow HTTP clients.
Varying request times




Average:   1385 ms
Minimum:   4.7 ms
Maximum:   20184 ms
Std Dev:   3896 ms
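With a distribution this skewed (standard deviation larger than the mean), percentiles tell you more than the average does. A sketch computing the same kind of summary from a sample of request times (the data here is made up):

```python
import statistics

# hypothetical request times in milliseconds for one transaction;
# a few very slow outliers drag the mean far above the median
times_ms = [5, 12, 80, 95, 110, 150, 160, 400, 1200, 9800]

mean = statistics.mean(times_ms)
median = statistics.median(times_ms)
# quantiles(n=100) returns 99 percentile cut points; index 94 is ~p95
p95 = statistics.quantiles(times_ms, n=100)[94]
print(f"mean {mean:.1f} ms, median {median:.1f} ms, p95 {p95:.1f} ms")
```

When the mean is an order of magnitude above the median, averaging hides the slow outliers that users actually notice; track the median and a high percentile side by side.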
Performance breakdown




Why is creating the connection to PostgreSQL
taking up 40% of the overall response time?
Slow HTTP clients

✴Add nginx as a front end to the WSGI server.
✴Brings the following benefits to the WSGI server.
 ✴Isolation from slow clients.
 ✴No need to handle keep-alive connections in the WSGI server.
 ✴Can offload serving of static files.
 ✴Can use X-Accel-Redirect for dynamically
    generated files.
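The X-Accel-Redirect technique works by having the WSGI application return a header instead of the file body; nginx intercepts the header and serves the file itself. A minimal sketch (the `/protected/` path is an assumption and must match an `internal` location in the nginx configuration):

```python
def application(environ, start_response):
    # After any auth/permission checks, hand the actual file transfer
    # off to nginx rather than streaming the bytes through Python.
    start_response('200 OK', [
        ('Content-Type', 'application/pdf'),
        ('X-Accel-Redirect', '/protected/report.pdf'),
    ])
    return [b'']
```

The WSGI process is tied up only for the permission check, not for however long a slow client takes to download the file.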
Request funnelling

[Diagram: requests funnel from the nginx front end, through the Apache workers, to the mod_wsgi daemons.]
Complete overload
Forced restarts
✴Triggers for restarts:
 ✴Manual restart to fix issues/configuration.
 ✴Maximum number of requests reached.
 ✴Reloading of new application code.
 ✴Individual requests block/timeout.
✴Restarts can make things worse.
Auto scaling
✴Apache/mod_wsgi embedded mode.
 ✴Apache prefork MPM defaults.
  ✴Initial 1 / Maximum 150
 ✴Apache worker MPM defaults.
  ✴Initial 2 / Maximum 6
✴Auto scaling can make things worse.
Pre-load everything

✴Start maximum processes up front.
✴Pre-load your web application when the
 process starts rather than lazily loading
 it on the first request.
✴Keep processes persistent in memory
 and avoid unnecessary restarts.
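With gunicorn, for example, pre-loading maps directly to a flag: `--preload` imports the application in the master process before forking, so workers share the loaded code via copy-on-write and the first request hits an already-warm process. A sketch (the module path is a placeholder):

```shell
# import the application in the master before forking 4 workers
gunicorn --workers 4 --preload mysite.wsgi:application
```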
Horizontal scaling

✴Using more servers is fine.
✴Load balance across dedicated hosts.
✴Or add additional hosts as required.
✴Ensure, though, that when adding more
 hosts, you have preloaded the web
 application before directing traffic to them.
Monitoring is key

✴Treat your server as a black box and you
 will never know what is going on inside.
Server monitoring

✴Open source tools.
 ✴Monit
 ✴Munin
 ✴Cacti
 ✴Nagios
Python web tools
✴Django debug toolbar.
 ✴Only useful for debugging a single request
  in a development setting.
✴Sentry.
 ✴Useful for capturing runtime errors, but
  performance issues don't generate
  exceptions.
New Relic APM
Apache/mod_wsgi
Summing up

✴Use benchmarks to explore a specific
 system, not to compare different systems.
✴Don't trust the defaults of any server, you
 need to tune it for your web application.
✴Monitor your live production systems.
✴New Relic for really deep introspection.
Try New Relic

✴ Graham.Dumpleton@gmail.com
✴ http://www.slideshare.net/GrahamDumpleton

✴ Find out more about New Relic:
  ✴ http://newrelic.com
✴ Extended Pro Trial for PyCon attendees:
  ✴ http://newrelic.com/30
✴ Come work for New Relic:
  ✴ http://newrelic.com/about/jobs

Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

PyCon US 2012 - Web Server Bottlenecks and Performance Tuning

  • 1. Web Server Bottlenecks and Performance Tuning Graham Dumpleton PyCon – March 2012 Graham Dumpleton @GrahamDumpleton Starting my PyCon talk. Let's hope I don't lose my voice completely while doing this.
  • 5. Web application Web application Front end time 0.15 seconds 3.1 seconds
  • 6. Performance Golden Rule "80-90% of the end-user response time is spent on the frontend. Start there." http://www.stevesouders.com/blog/2012/02/10/the-performance-golden-rule/
  • 9. Benchmarks as a tool ✴Web server benchmarks are of more value when used as an exploratory tool to understand how a specific system works, not to compare systems.
  • 10. What about load tests? ✴Hitting a site with extreme load will only show you that it will likely fail under a denial of service attack. ✴Your typical web server load test isn't alone going to help you understand how a web server is contributing to it failing.
  • 11. What should you test? ✴You should use a range of purpose built tests to trigger certain scenarios. ✴Use the tests to explore corner cases and not just the typical use case.
  • 12. Environment factors ✴Amount of memory available. ✴Number of processors available. ✴Use of threads vs processes. ✴Python global interpreter lock (GIL)
  • 13. Client impacts ✴Slow HTTP browsers/clients. ✴Browser keep alive connections.
  • 14. Application requirements ✴Need to handle static assets.
  • 15. Use cases to explore ✴Memory used by web application. ✴Using processes versus threads. ✴Impacts of long running requests. ✴Restarting of server processes. ✴Startup costs and lazy loading.
  • 16. Memory usage: chart comparing memory use of 1 process / 1000 threads against 1 process / 1 thread. http://nichol.as/benchmark-of-python-web-servers
  • 17. What affects memory use? ✴Web server base memory usage. ✴Web server per thread memory usage. ✴Application base memory usage. ✴Is application loaded prior to forking? ✴Per request transient memory usage.
  • 18. Processes Vs Threads: chart with number of processes (0–150) on the vertical axis and number of threads (0–150) on the horizontal axis.
  • 19. Apache/mod_wsgi defaults (Configuration – Max Processes / Threads): Apache (prefork) + mod_wsgi (embedded): 150 / 1; Apache (worker) + mod_wsgi (embedded): 6 / 25; Apache (prefork) + mod_wsgi (daemon): 1 / 15; Apache (worker) + mod_wsgi (daemon): 1 / 15
  • 20. Other WSGI servers (Configuration – Max Processes / Threads): FASTCGI flup (prefork): 50 / 1; FASTCGI flup (threaded): 1 / 5; gunicorn: 1 / 1; uWSGI: 1 / 1; tornado: 1 / 1
  • 21. Less than fair: chart plotting default configurations on the processes (0–150) vs threads (0–150) axes – Apache (prefork) + mod_wsgi (embedded), FASTCGI flup (prefork), Apache (prefork/worker) + mod_wsgi (daemon), and Apache (worker) + mod_wsgi (embedded).
  • 22. What to use? ✴Number of overall threads dictated by: ✴Number of concurrent users. ✴Response time for requests. ✴Processes preferred over threads, but: ✴Restricted by amount of memory. ✴Choice influenced by number of processors.
  • 23. Thread utilisation: chart of requests spread across 6 threads over a 1 second interval.
  • 24. Request backlog: at 60 requests per second a backlog occurred and queue time increased to 750 ms; thread utilisation jumped from 2.5 to 7.5 and maxed out at 9.
  • 25. Processes are better: at 75 requests per second, backlog only started at higher throughput and queue time stayed mostly under 100 ms; thread utilisation only jumped from 2.5 to 7.5 at higher throughput and didn't actually reach 9.
  • 26. CPU bound Bulk of time is from doing things within the process itself
  • 27. I/O wait Waiting on responses from backend services a significant proportion of time
  • 28. Long running requests ✴Complex calculations. ✴Slow backend services. ✴Large file uploads. ✴Large responses. ✴Slow HTTP clients.
  • 29. Varying request times Average: 1385 ms Minimum: 4.7 ms Maximum: 20184 ms Std Dev: 3896 ms
  • 30. Performance breakdown Why is creating the connection to PostgreSQL taking up 40% of overall response time?
  • 31. Slow HTTP clients ✴Add nginx as a front end to the WSGI server. ✴Brings the following benefits to the WSGI server. ✴Isolation from slow clients. ✴No need to handle keep alive in the WSGI server. ✴Can offload serving of static files. ✴Can use X-Accel-Redirect for dynamically generated files.
  • 32. Request funnelling: nginx front end → Apache workers → mod_wsgi daemons
  • 34. Forced restarts ✴Triggers for restarts: ✴Manual restart to fix issues/configuration. ✴Maximum number of requests reached. ✴Reloading of new application code. ✴Individual requests block/timeout. ✴Restarts can make things worse.
  • 35. Auto scaling ✴Apache/mod_wsgi embedded mode. ✴Apache prefork MPM defaults. ✴Initial 1 / Maximum 150 ✴Apache worker MPM defaults. ✴Initial 2 / Maximum 6 ✴Auto scaling can make things worse.
  • 36. Pre load everything ✴Start maximum processes up front. ✴Pre load your web application when the process starts and not lazily loaded on the first request. ✴Keep processes persistent in memory and avoid unnecessary restarts.
  • 37. Horizontal scaling ✴Using more servers is fine. ✴Load balance across dedicated hosts. ✴Or add additional hosts as required. ✴Ensure though that if adding more hosts that you have preloaded the web application before directing traffic to it.
  • 38. Monitoring is key ✴Treat your server as a black box and you will never know what is going on inside.
  • 39. Server monitoring ✴Open source tools. ✴Monit ✴Munin ✴Cacti ✴Nagios
  • 40. Python web tools ✴Django debug toolbar. ✴Only useful for debugging a single request in a development setting. ✴Sentry. ✴Useful for capturing runtime errors, but performance issues don't generate exceptions.
  • 43. Summing up ✴Use benchmarks to explore a specific system, not to compare different systems. ✴Don't trust the defaults of any server, you need to tune it for your web application. ✴Monitor your live production systems. ✴New Relic for really deep introspection.
  • 44. Try New Relic ✴ Graham.Dumpleton@gmail.com ✴ http://www.slideshare.net/GrahamDumpleton ✴ Find out more about New Relic: ✴ http://newrelic.com ✴ Extended Pro Trial for PyCon attendees: ✴ http://newrelic.com/30 ✴ Come work for New Relic: ✴ http://newrelic.com/about/jobs

Editor's Notes

  1. \n
  2. If you want to follow along with the slides on your laptop as the talk goes on, you can view them on slideshare.net. Just search for my name.\n
  3. Before we talk about web server performance tuning, it is important to step back and look at the bigger picture. Although newbies especially have an obsession with trying to find the fastest web server, reality is that things are more complicated than that. Systems have many moving parts. The actual per request latency introduced by a web server is very small in relation to time spent in other parts of the system. \n
  4. As far as a user is concerned, the main delays they will notice are those resulting from how long it takes their web browser to render the page returned by the web application. This will be followed up by network delays when talking to the web application, and when grabbing down static assets from media servers or content data networks.\n
  5. The time spent in the web application is therefore a small percentage of the time as perceived by the user for rendering a web page being served. Any time delays introduced by the web server will be a much smaller percentage again.\n
  6. Steve Souders summarizes this disparity between front end time and web application time in what he calls the "Performance Golden Rule". This is that "80-90% of the end-user response time is spent on the frontend". If you are after an easy win for improving end user satisfaction with your web site, the front end is where you should start.\n
  7. Although the big immediate gains may be won in the front end, the web application still presents lots of opportunity to further reduce response times through means other than fiddling with the web server. Improvements can be had in the application code, but also in databases or backend services used by the web application.\n
  8. Is running benchmarks on web servers a complete waste of time then? The answer is yes and no. The sort of benchmarks usually published on web sites to compare web servers are of little value. They generally only serve to give newbies a false sense of security over any decision they make as to which web server to use. Worse is that people forever reference them as gospel truth when they can be far from it.\n
  9. The main reason that the typical web server benchmark is useless is that it tests only a single narrow idealised use case. Web servers are implemented using different architectures and using different code. You are better off choosing a web server that you believe has the features you require and then use benchmarks to help explore the behaviour of that system.\n
  10. Often the documented benchmarks you find are nothing more than a hello world program. The test then consists of running it at maximum request throughput with some arbitrary number of concurrent users. This does not mirror what real traffic a public facing web server would receive. It certainly doesn't show what causes the server to fail as load increases, just that it will.\n
  11. What should you test then? There are many different use cases one could test and how any one performs can be dictated by the architecture of the system, how the code was written and how the system is configured when the test is run. The more interesting tests are those which deliberately go out to trigger specific problems. This is because it is the corner cases that are usually going to cause an issue rather than the typical use case.\n
  12. What sort of factors can come into play and affect performance? These are varied and can arise from the hardware or virtualised system being used. They can derive from the configuration you use for the specific web server, but also can be influenced by how the Python language interpreter works. To make it hard, these can all interplay with each other in unexpected ways.\n
  13. Some things can be out of your control altogether, such as the type of web browser and what type of network the traffic between you and the user has to traverse. Very few published benchmarks try to account for these issues in a realistic way.\n
  14. Requirements as dictated by your own web application or how you decided to architect your overall system can also contribute. Such as whether you try and use the same server to serve up static assets.\n
  15. To illustrate how some of these different factors can come into play, I will go through a few specific use cases that present issues in practice and where possible relate them to those factors. These include memory usage, use of processes vs threads, impacts of long running requests, restarting of server processes and startup costs.\n
  16. A simple place to get started is memory usage. This is always a hot topic of contention with benchmarks. It isn't hard to find people claiming that Apache is a big bloated memory hog. This benchmark in particular is representative of a poorly chosen Apache configuration. Of course it will use more memory if you configure it to have 1000 threads. If servers tested aren't set up in a comparable way, you can hardly expect it to be a fair comparison.\n
  17. Actually estimating the overall amount of memory used is not a difficult exercise, it is after all a simple formula which takes into consideration the number of processes, the base memory used by the web server, memory for each additional thread and the application itself. Things get more complicated when one considers per request transient memory, but ignoring that, one can easily visualise what you are dealing with.\n
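The simple formula described in that note can be sketched in Python. Every per-component figure below is an illustrative assumption, not a measurement; substitute values observed for your own server and application.

```python
# Rough memory model for a multi-process, multi-threaded WSGI server.
# All default figures are illustrative assumptions, not measurements.

def total_memory_mb(processes, threads_per_process,
                    server_base_mb=5.0,   # web server base usage per process (assumed)
                    per_thread_mb=0.5,    # per-thread overhead (assumed)
                    app_base_mb=80.0):    # loaded Python web application (assumed)
    """Estimate resident memory across all worker processes."""
    per_process = server_base_mb + app_base_mb + threads_per_process * per_thread_mb
    return processes * per_process

# One process with 1000 threads vs. five processes with 15 threads each.
print(total_memory_mb(1, 1000))  # 585.0
print(total_memory_mb(5, 15))    # 462.5
```

Even with these toy numbers the point stands: the application's base memory is duplicated per process, so process count dominates, while extra threads within an existing process are comparatively cheap.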
  18. In short, adding more processes is going to see memory usage grow quicker than adding more threads to existing processes. Although some of that per process base memory usage is the web server, the majority of it will in the end be your fat Python web application. To blame a web server for using too much memory is plain silly when your web application could be using up to 50 times as much memory. The issue is really about what configuration you chose to set the web server up with.\n
  19. What usually happens is that people will blindly use whatever the defaults are for a server. For fat Python web applications which use a lot of memory this can be disastrous. Apache with its prefork MPM can for example dynamically create up to 150 processes. That is potentially 150 copies of your fat Python web application. So of course it will use a lot of memory.\n
  20. Those servers which are generally seen as fairing best as far as memory usage are those whose default configurations use only a single process and single thread. Guess what, if you configure Apache that way then the amount of memory it uses will not be much different. Granted, it does help to also strip unneeded modules out of Apache that you don't use to really get the best from it. \n
  21. So don't start things off by using whatever the default processes/threads configurations are, especially if looking at memory usage. Do so and you can easily get the wrong impression. Also don't pick arbitrary values when you have no idea whether it is reasonable. At this scale a configuration with 1000 threads will not even fit on the chart, would almost be in the next room, and again in the red zone.\n
  22. How many processes and threads should you use then? The total number of threads across all processes is dictated by the number of overlapping concurrent requests. How much overlap there is depends on response times and throughput. Processes are preferred over threads, but constrained by memory. The optimal number can also be dictated by how many processors are available. Gunicorn, for example, recommends using 2 to 4 times as many worker processes as you have CPU cores.\n
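Gunicorn's documentation gives (2 x cores) + 1 as a starting point in this spirit; a sketch of that rule of thumb, with the multiplier exposed so it can be pushed toward 4 for more I/O-bound workloads:

```python
import multiprocessing

def suggested_workers(cores=None, factor=2):
    """Starting point in the style of gunicorn's (2 x cores) + 1 heuristic.

    Raise factor toward 4 for I/O-heavy workloads, but treat the result
    as a first guess to tune from live measurements, not a final answer.
    """
    cores = cores or multiprocessing.cpu_count()
    return factor * cores + 1

print(suggested_workers(cores=4))  # 9
```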
  23. One can get a feel for how many threads you will need by looking at thread utilisation. That is, how much do the requests take up of the potential capacity. In this example, by adding up the green areas representing the requests coming in over time, we have here a thread utilisation of about 2.0. This means that if all requests were serialised, we would need only two threads. Requests don't arrive in such an orderly fashion though, so we need more threads to ensure they aren't delayed.\n
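The utilisation figure in that note is just Little's law: average concurrency equals throughput multiplied by average response time. A minimal sketch:

```python
def thread_utilisation(throughput_rps, avg_response_s):
    """Little's law: average number of requests in flight (busy threads)."""
    return throughput_rps * avg_response_s

# 20 requests/second at a 100 ms average response time keeps, on
# average, about 2 threads busy.  Because arrivals are bursty rather
# than orderly, provision headroom well above this average.
print(thread_utilisation(20, 0.100))
```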
  24. Because response times are generally quite short, it is actually surprising how few threads you can get away with. If the number of threads is too low and response times or throughput grows though, then thread utilisation will increase. Eventually what happens then is that requests will start to backup as they wait for available threads and queueing time will increase. This will add to the delays that end users see in their total page load time.\n
  25. If we add processes rather than threads we can delay the onset of such problems. The reason that processes work better is that the Python global interpreter lock (GIL) effectively serialises execution within distinct threads of a single process. Adding more processes though obviously means more memory. This has nothing to do with which server you use; it is a choice bound by how much memory you have available.\n
  26. If you are memory constrained, finding the right balance and what you can get away with in order to still reduce memory usage is a tricky problem. It is all made harder when you have no idea what is going on inside of your web application. If a web application has a heavy bias towards CPU bound activity within the process, then you are forced towards the direction of needing more processes.\n
  27. If your web application is making lots of call outs to backend services and so threads are blocked waiting on I/O more of the time than not, you can get away with using more threads because the threads aren't competing as much with each other for use of the CPU within the same process. If you have no idea though what your web application is doing, this judgement is going to be a hit and miss affair as far as tuning the processes/threads balance.\n
  28. To make such judgements even harder you also have long running requests to contend with. These can arise due to issues in your own code or backend services, but also due to how much data you are moving around and how slow the HTTP clients are. The basic problem here is that a long running request, because it ties up a thread, will reduce the maximum throughput you could achieve during that period of time.\n
  29. The unpredictability of request times means you need to always ensure you have a good amount of extra capacity in the number of processes/threads allocated. Don't provide sufficient head room and when a number of long running requests coincide you will suddenly find thread availability drops, requests can start backlogging and overall response times as seen by the user will increase.\n
  30. Where your application code or backend service is slow, you obviously need to work out why. Sometimes issues can come from places you least expect them. For example, especially with Django, watch out for how long PostgreSQL database connections take. One thing you can consider in this case is a local external connection pooler such as pgbouncer. \n
  31. If you're using Apache/mod_wsgi or gunicorn, stick nginx in front of it and proxy requests through to your WSGI server. This will make your WSGI server perform better as you will be isolated from slow clients. The threads in the backend will be tied up for less time, meaning lower thread utilisation, thus allowing you to handle a higher throughput with less resources. You can also offload tasks such as static file serving to nginx, which is going to do a better job of it anyway.\n
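The X-Accel-Redirect offload mentioned above amounts to setting a response header from the WSGI application; nginx intercepts it and serves the file itself, freeing the worker thread almost immediately. A sketch, where '/protected/report.pdf' is a hypothetical path that would need a matching `internal` location in the nginx configuration:

```python
def application(environ, start_response):
    """Hand file delivery off to nginx instead of streaming from the worker.

    The '/protected/report.pdf' internal location is hypothetical; nginx
    must be configured to map it to the real file on disk.
    """
    start_response('200 OK', [
        ('Content-Type', 'application/pdf'),
        ('X-Accel-Redirect', '/protected/report.pdf'),
    ])
    # Body is empty; nginx replaces it with the file contents.
    return [b'']
```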
  32. When introducing a front end, do be careful though of the funnelling effect, especially if the number of concurrent requests that can be handled reduces at each step. If your web application backlogs, users may give up, but requests are still queued and have to be handled. Your web application wastes time and may have trouble catching up with the backlog. It is perhaps better to set up servers so requests time out with a 503 before getting to your web application if you can.\n
  33. Worst case scenario here is a complete overload where the server never really recovers for an extended period or until you can shutdown the server. Request timeouts within the web application where supported can help a bit, but only to throw out long running requests. As already mentioned, you really need to stop the requests getting to the web application if there is no longer a point handling them. Options here vary and solutions available to avoid it aren't always great.\n
  34. You might actually think that doing a restart will solve a problem with backlogged requests. You have to be careful here as well though. For some servers, the listener socket can be preserved, so any backlog there isn't actually cleared. Further, when performing a restart, new processes have to be created and application loaded again. This can take time and cause more requests to backlog. So choose carefully when you restart. To totally reset, it is better to do a full shutdown and clear the backlog.\n
  35. For fat Python web applications with a large startup cost, server configurations which allow for auto scaling can also compound problems. When under load and you get a further throughput spike the server can decide to start more processes. This slows the system down temporarily, causing backlog and if it takes a long time to start processes, the server could decide to start even more processes, increasing system load again, blowing out memory and overloading your whole system.\n
  36. To avoid unexpected surprises, you are better off starting up the maximum number of processes you expect to require or can fit in available memory with your web application loaded. Ensure you pre load your web application when processes start and not lazily when first request arrives. Do everything possible to keep the processes in memory all the time, avoiding restarts. Especially don't use options to restart when some maximum request count is reached.\n
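The eager-versus-lazy distinction in that note can be shown in miniature; `expensive_startup()` is a hypothetical stand-in for imports, configuration loading, cache warming and so on:

```python
def expensive_startup():
    """Hypothetical stand-in for slow application initialisation."""
    return {"ready": True}

# Eager: runs at module import time, i.e. at process start (and before
# forking where the server supports it), so the first request pays nothing.
RESOURCES = expensive_startup()

def eager_view():
    return RESOURCES

# Lazy: the unlucky first request after every (re)start absorbs the
# full startup delay instead.
_lazy = None

def lazy_view():
    global _lazy
    if _lazy is None:
        _lazy = expensive_startup()
    return _lazy
```

Under load, the lazy variant is what turns a routine restart into a burst of slow responses, which is why the note recommends preloading and keeping processes persistent.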
  37. Because the suggestion is that you should preconfigure the server to its maximum capacity at the outset, it does limit the vertical scaling you can do at least within the confines of the same hardware. Next step therefore is horizontal scaling. Keep in mind the same issues about preloading. You don't want to bring on new hosts and direct traffic to them, only for the first requests sent to it to be delayed while the application loads.\n
  38. No matter how you set your system up, if problems do arise, the only way you are going to start to be able to understand what went wrong when it does all crash in a heap is through monitoring. If you treat your system as a black box, how will you know what is going on inside? One thing is for sure, all those benchmarks you may have run to find out what the fastest web server was are not going to help you one bit.\n
  39. Server monitoring tools, although useful, only show you the effect of the problem on the overall system. They don't necessarily provide you that insight of what is going on inside of your web application as they still largely treat your web application and web server like a black box. A deeper level of introspection is required.\n
  40. When we talk about finding out what is happening inside of your Python web application, the options have been limited. Tools such as Django debug toolbar, or the Python profiler are only suited to a development environment. Sentry can be used in production to capture errors, but performance problems aren't going to generate nice exceptions for you.\n
  41. This historical lack of good tools for knowing what is going on inside of your Python web application is why I am loving my current day job. If you had managed to miss it, I am now working at New Relic. New Relic performance monitoring provides the ability to monitor the front end, your web application and the underlying server. I am bringing all that goodness to the world of the Python web. New Relic gives you that deep introspection required to know what is going on.\n
  42. I am of course also the author of mod_wsgi. Being able to get New Relic working with Python means I have been able to use the reporting it provides to delve quite deep into the behaviour of mod_wsgi under different situations. The results have been quite revealing. One of the areas it has helped in understanding is the funnelling effects when using daemon mode. I'll admit there is room for improvement and I will be trying to address some issues in mod_wsgi 4.0.\n
  43. Summing things up. Pick a web server and architecture which seems to meet your requirements, then use benchmarks to evaluate its behaviour. Don't use benchmarks simply to try and compare different systems. Don't trust server defaults. Configure and tune your whole stack based on the results you get from live production monitoring. Try using New Relic for really deep introspection of what is going on in all parts of your system.\n
  44. So, if you are doing Python web application development, do consider giving New Relic a try. If you're not sure, New Relic does provide a free trial period where you can try out all the features it has. Even when the trial ends, a free Lite subscription is available which still provides lots of useful information. If you want to work at New Relic then come talk to us. Right now we are looking for a Python developer in Portland. While you think about how cool that might be, we should have time for questions.\n