SlideShare a Scribd company logo
1 of 106
Download to read offline
Metrics-Driven Engineering

Mike Brittain        @ mikebrittain
Director of engineering, Infrastructure

                                          October 13, 2011
Tools and Process at Etsy
How many new visits?
  How many listings created?
  How many registrations?
How do people use Etsy?
  How many convos sent?
    How many purchases?
     How many new shops?
Search indexing?
     How fast are pages generating?
   Async tasks currently in queue?
What is the application doing?
 Developer API auth and rate limiting?
       Images resized and stored?
          Error and warning rates?
Replication slave lag?
       Memcache hits/misses?
       Available connections?
Are the servers in good shape ?
    Database queries per second?
       Total outgoing bandwidth?
            CPU, Memory, I/O?
Business Metrics
Application Metrics
System Metrics
Visibility EVERYWHERE
Constant Change
$314 Million GMS 2010
  $180 Million GMS 2009
  $87 Million GMS 2008

  $26 Million GMS 2007




credit: pentarux (flickr)
25 Million Unique Visitors
  1 Billion page views per month




credit: pentarux (flickr)
Engineering team grew 500%
                        over 18 months


credit: martin_heigan (flickr)
Less talk, more do.
Always Be Shipping



credit: ibailemon (flickr)
Always Be Shipping
                             (even if it’s your first day)




credit: ibailemon (flickr)
90+ Engineers
                     40+ Deploys / day

credit: misswired (flickr)
credit: digidave (flickr)
Code Reviews
Automated Tests
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
Plus “admin-only,” percentage ramp-up, A/B testing,
whitelists, blacklists, etc...
Failure is not an option
inevitable!
Failure is not an option
inevitable!
Failure is not an option
            a learning opportunity!
inevitable!
Failure is not an option
            a learning opportunity!
     DETECTABLE!
Access
Detect problems quickly
CONFIDENCE
A:    Well, the Ops team manages the network, racks
     the servers, installed the monitoring tools, wears
                the pagers, blah, blah, blah...
Engineers build the application
Logging
      Graphing
OPS              ENG
      Trending
      Alerting
“Engineers are too busy writing
  features to build metrics.”
Metrics are part of every feature
        ...and so are config flags
Dead Simple
Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
                             Logging
                             Logster
                               StatsD
Ganglia
Ganglia
Cluster-oriented
Huge community contributed recipes
Custom metrics (gmetad)
Graphite
Graphite
                            Single-instance
              Create new metrics on-the-fly
   Customize via URLs and display functions
Logging
It’s 2:48 PM.
Do you know where your
       logs are?
Logger::log_error("User login failed.
Reason: $msg for $username", “login”);
Logger::log_error("User login failed.
Reason: $msg for $username", “login”);
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
LogFormat "%h %l %u %t "%r" %>s %b"
                common
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
apache_note()
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
grep "/listing/" access.log | 
awk '{sum=sum+$(NF-2)} END {print sum/NR}'
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0034   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0055   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
web0089   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087   [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
web0002   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0201   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0355   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0052   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0066   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
Logster
Fatals       Errors   Warnings
Logster
Run by cron
Keeps a cursor on your log file
Aggregate lines anyway you want
Output to Ganglia or Graphite
Simple parsers
                                  github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
^.+ [.+] [(?P<log_level>.+)]
if (fields['log_level'] == “fatal”):
   self.fatals += 1

elif (fields['log_level'] == “error”):
   self.errors += 1

elif (fields['log_level'] == “warning”):
   self.warnings += 1

...
MetricObject("fatals",
  (self.fatals / self.duration), "per sec")

MetricObject("errors",
  (self.errors / self.duration), "per sec")

MetricObject("warning",
  (self.warnings / self.duration), "per sec")
Fatals   Errors   Warnings
StatsD
StatsD
                           Network daemon (node.js)
                               Accepts data over UDP
                      Flushes to Graphite every 10 sec
                                     One-line of code
github.com/etsy
StatsD::increment("logins.success");
StatsD::increment("logins.success");




                                  logins
StatsD::timing("gearman.time", $msec);
StatsD::timing("gearman.time", $msec);



                                 90th pct

                                 average

                                 lower
Ad hoc
name value timestamp
echo "events.deploy.site 1 `date +%s`" 
     | nc graphite.etsycorp.com 2003
Vertical Line Technology!
target=drawAsInfinite(events.deploy.site)
We could stare at graphs all day...
http://graphite/render?
   from=-1hours&width=600&height=200
&target=webs.errorLog.warning&rawData=1
http://graphite/render?
       from=-1hours&width=600&height=200
    &target=webs.errorLog.warning&rawData=1

webs.errorLog.warning,1318444930,1318448530,60|
5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,
1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,
1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.
0,1.0,1.0,None
Holt-Winters Confidence Bands

upper

         lower
Holt-Winters Aberration
Business metrics
 + Confidence bands
_____________
    Alertable metrics
40,000+ metrics at Etsy
  Systems, Applications, Business
Dashboards
Dashboards
Kind of Hard :-/
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or
+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
     <img src="http://graphite.etsycorp.com/render?
from=-1hours&width=280&height=220&title=File+or+Script+Not
+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
</a>
Super Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
echo $g->getDashboardHTML(280, 220);
Metrics!
Metrics!
Metrics + Events
Metrics!
Metrics + Events
Metrics + Alerts
Metrics!
Metrics + Events
Metrics + Alerts
Metrics + Metrics
High-level, real-time visibility
Detect problems quickly
CONFIDENCE
Make them required features
Make them dead simple
Make them accessible
Make them!
Homework
codeascraft.etsy.com
github.com/etsy                      Get in touch
                                     mike @ etsy . com
We’re always looking for people         @ mikebrittain
who are interested in this kind of
stuff...



Thank You
etsy.com/careers
Metrics-Driven Engineering

More Related Content

What's hot

Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Iakiv Kramarenko
 
Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Michelangelo van Dam
 
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Codemotion
 
Testing ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETTesting ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETBen Hall
 
A Journey with React
A Journey with ReactA Journey with React
A Journey with ReactFITC
 
Good karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaGood karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaExoLeaders.com
 
You do not need automation engineer - Sqa Days - 2015 - EN
You do not need automation engineer  - Sqa Days - 2015 - ENYou do not need automation engineer  - Sqa Days - 2015 - EN
You do not need automation engineer - Sqa Days - 2015 - ENIakiv Kramarenko
 
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...apidays
 
Ajax to the Moon
Ajax to the MoonAjax to the Moon
Ajax to the Moondavejohnson
 
Maintainable JavaScript 2012
Maintainable JavaScript 2012Maintainable JavaScript 2012
Maintainable JavaScript 2012Nicholas Zakas
 
Web ui tests examples with selenide, nselene, selene & capybara
Web ui tests examples with  selenide, nselene, selene & capybaraWeb ui tests examples with  selenide, nselene, selene & capybara
Web ui tests examples with selenide, nselene, selene & capybaraIakiv Kramarenko
 
Python: the coolest is yet to come
Python: the coolest is yet to comePython: the coolest is yet to come
Python: the coolest is yet to comePablo Enfedaque
 
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...apidays
 
Testing persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitTesting persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitPeter Wilcsinszky
 
Pragmatics of Declarative Ajax
Pragmatics of Declarative AjaxPragmatics of Declarative Ajax
Pragmatics of Declarative Ajaxdavejohnson
 
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsJavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsSteve Wallin
 
Ditching JQuery
Ditching JQueryDitching JQuery
Ditching JQueryhowlowck
 
Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Stephan Hochdörfer
 
Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Stephan Hochdörfer
 

What's hot (20)

Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
 
Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012
 
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
 
Testing ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETTesting ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NET
 
A Journey with React
A Journey with ReactA Journey with React
A Journey with React
 
Good karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaGood karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with Karma
 
You do not need automation engineer - Sqa Days - 2015 - EN
You do not need automation engineer  - Sqa Days - 2015 - ENYou do not need automation engineer  - Sqa Days - 2015 - EN
You do not need automation engineer - Sqa Days - 2015 - EN
 
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
 
Ajax to the Moon
Ajax to the MoonAjax to the Moon
Ajax to the Moon
 
KISS Automation.py
KISS Automation.pyKISS Automation.py
KISS Automation.py
 
Maintainable JavaScript 2012
Maintainable JavaScript 2012Maintainable JavaScript 2012
Maintainable JavaScript 2012
 
Web ui tests examples with selenide, nselene, selene & capybara
Web ui tests examples with  selenide, nselene, selene & capybaraWeb ui tests examples with  selenide, nselene, selene & capybara
Web ui tests examples with selenide, nselene, selene & capybara
 
Python: the coolest is yet to come
Python: the coolest is yet to comePython: the coolest is yet to come
Python: the coolest is yet to come
 
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
 
Testing persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitTesting persistence in PHP with DbUnit
Testing persistence in PHP with DbUnit
 
Pragmatics of Declarative Ajax
Pragmatics of Declarative AjaxPragmatics of Declarative Ajax
Pragmatics of Declarative Ajax
 
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsJavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
 
Ditching JQuery
Ditching JQueryDitching JQuery
Ditching JQuery
 
Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13
 
Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010
 

Viewers also liked

How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNHow to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNMike Brittain
 
Continuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCContinuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCMike Brittain
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMatt Graham
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsMike Brittain
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and TrendingMike Brittain
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and ResilienceMike Brittain
 
A Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackA Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackDaniel Schauenberg
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsMike Brittain
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsFrom Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsMike Brittain
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightRoss Snyder
 
The Real Life Social Network v2
The Real Life Social Network v2The Real Life Social Network v2
The Real Life Social Network v2Paul Adams
 
Docker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker, Inc.
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyPrinciples and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyMike Brittain
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018Brian Solis
 

Viewers also liked (15)

Scaling Deployment at Etsy
Scaling Deployment at EtsyScaling Deployment at Etsy
Scaling Deployment at Etsy
 
How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNHow to Get to Second Base with Your CDN
How to Get to Second Base with Your CDN
 
Continuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCContinuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYC
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without Downtime
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty Details
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and Trending
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and Resilience
 
A Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackA Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring Stack
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsFrom Building a Marketplace to Building Teams
From Building a Marketplace to Building Teams
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
 
The Real Life Social Network v2
The Real Life Social Network v2The Real Life Social Network v2
The Real Life Social Network v2
 
Docker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EE
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyPrinciples and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at Etsy
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018
 

Similar to Metrics-Driven Engineering

Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
Data-Driven Software Design
Data-Driven Software DesignData-Driven Software Design
Data-Driven Software DesignPatrick McKenzie
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Chris Alfano
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTPMustafa TURAN
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wyciekówKonrad Kokosa
 
Open Source Ajax Solution @OSDC.tw 2009
Open Source Ajax  Solution @OSDC.tw 2009Open Source Ajax  Solution @OSDC.tw 2009
Open Source Ajax Solution @OSDC.tw 2009Robbie Cheng
 
idea: talk about the Active Cache
idea: talk about the Active Cacheidea: talk about the Active Cache
idea: talk about the Active CacheChing Yi Chan
 
More Secrets of JavaScript Libraries
More Secrets of JavaScript LibrariesMore Secrets of JavaScript Libraries
More Secrets of JavaScript Librariesjeresig
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsGraham Dumpleton
 
Google Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and BeyondGoogle Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and Beyonddion
 
Implementation of GUI Framework part3
Implementation of GUI Framework part3Implementation of GUI Framework part3
Implementation of GUI Framework part3masahiroookubo
 
Preparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationPreparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationBrian Hogg
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverFastly
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"LogeekNightUkraine
 
Introduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointIntroduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointGeoff Varosky
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Neo4j
 
Brian hogg word camp preparing a plugin for translation
Brian hogg   word camp preparing a plugin for translationBrian hogg   word camp preparing a plugin for translation
Brian hogg word camp preparing a plugin for translationwcto2017
 
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac..."Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...Fwdays
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data cultureSimon Dittlmann
 

Similar to Metrics-Driven Engineering (20)

Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
Data-Driven Software Design
Data-Driven Software DesignData-Driven Software Design
Data-Driven Software Design
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTP
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
Open Source Ajax Solution @OSDC.tw 2009
Open Source Ajax  Solution @OSDC.tw 2009Open Source Ajax  Solution @OSDC.tw 2009
Open Source Ajax Solution @OSDC.tw 2009
 
idea: talk about the Active Cache
idea: talk about the Active Cacheidea: talk about the Active Cache
idea: talk about the Active Cache
 
More Secrets of JavaScript Libraries
More Secrets of JavaScript LibrariesMore Secrets of JavaScript Libraries
More Secrets of JavaScript Libraries
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Google Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and BeyondGoogle Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and Beyond
 
Implementation of GUI Framework part3
Implementation of GUI Framework part3Implementation of GUI Framework part3
Implementation of GUI Framework part3
 
Preparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationPreparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for Translation
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
 
Introduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointIntroduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePoint
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture
 
Brian hogg word camp preparing a plugin for translation
Brian hogg   word camp preparing a plugin for translationBrian hogg   word camp preparing a plugin for translation
Brian hogg word camp preparing a plugin for translation
 
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac..."Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data culture
 

Recently uploaded

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Metrics-Driven Engineering

  • 1. Metrics-Driven Engineering Mike Brittain @ mikebrittain Director of engineering, Infrastructure October 13, 2011
  • 3. How many new visits? How many listings created? How many registrations? How do people use Etsy? How many convos sent? How many purchases? How many new shops?
  • 4. Search indexing? How fast are pages generating? Async tasks currently in queue? What is the application doing? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Replication slave lag? Memcache hits/misses? Available connections? Are the servers in good shape ? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 11.
  • 12. $314 Million GMS 2010 $180 Million GMS 2009 $87 Million GMS 2008 $26 Million GMS 2007 credit: pentarux (flickr)
  • 13. 25 Million Unique Visitors 1 Billion page views per month credit: pentarux (flickr)
  • 14. Engineering team grew 500% over 18 months credit: martin_heigan (flickr)
  • 16. Always Be Shipping credit: ibailemon (flickr)
  • 17. Always Be Shipping (even if it’s your first day) credit: ibailemon (flickr)
  • 18.
  • 19. 90+ Engineers 40+ Deploys / day credit: misswired (flickr)
  • 23. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly
  • 24. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc...
  • 25. Failure is not an option
  • 27. inevitable! Failure is not an option a learning opportunity!
  • 28. inevitable! Failure is not an option a learning opportunity! DETECTABLE!
  • 30.
  • 31.
  • 32.
  • 35.
  • 36. A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 37. Engineers build the application
  • 38. Logging Graphing OPS ENG Trending Alerting
  • 39. “Engineers are too busy writing features to build metrics.”
  • 40. Metrics are part of every feature ...and so are config flags
  • 43. Cacti (network, SNMP) Ganglia (machines) Graphite (application) Splunk (log analysis, nightly reports) Nagios (alerting) Logging Logster StatsD
  • 45. Ganglia Cluster-oriented Huge community contributed recipes Custom metrics (gmetad)
  • 47. Graphite Single-instance Create new metrics on-the-fly Customize via URLs and display functions
  • 49. It’s 2:48 PM. Do you know where your logs are?
  • 50. Logger::log_error("User login failed. Reason: $msg for $username", “login”);
  • 51. Logger::log_error("User login failed. Reason: $msg for $username", “login”);
  • 52. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 53. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 54. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 55. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 56. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 57. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 58. LogFormat "%h %l %u %t "%r" %>s %b" common
  • 59. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 61. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 62. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 63. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 64. grep "/listing/" access.log | awk '{sum=sum+$(NF-2)} END {print sum/NR}'
  • 65. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!! web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling. web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling. web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling. web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 66. Logster Fatals Errors Warnings
  • 67. Logster Run by cron Keeps a cursor on your log file Aggregate lines anyway you want Output to Ganglia or Graphite Simple parsers github.com/etsy
  • 68. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 70. if (fields['log_level'] == “fatal”): self.fatals += 1 elif (fields['log_level'] == “error”): self.errors += 1 elif (fields['log_level'] == “warning”): self.warnings += 1 ...
  • 71. MetricObject("fatals", (self.fatals / self.duration), "per sec") MetricObject("errors", (self.errors / self.duration), "per sec") MetricObject("warning", (self.warnings / self.duration), "per sec")
  • 72. Fatals Errors Warnings
  • 74. StatsD Network daemon (node.js) Accepts data over UDP Flushes to Graphite every 10 sec One-line of code github.com/etsy
  • 78. StatsD::timing("gearman.time", $msec); 90th pct average lower
  • 79. Ad hoc name value timestamp
  • 80. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
  • 82.
  • 83. We could stare at graphs all day...
  • 84. http://graphite/render? from=-1hours&width=600&height=200 &target=webs.errorLog.warning&rawData=1
  • 85. http://graphite/render? from=-1hours&width=600&height=200 &target=webs.errorLog.warning&rawData=1 webs.errorLog.warning,1318444930,1318448530,60| 5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0, 1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0, 1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5. 0,1.0,1.0,None
  • 88. Business metrics + Confidence bands _____________ Alertable metrics
  • 89. 40,000+ metrics at Etsy Systems, Applications, Business
  • 92. Kind of Hard :-/ <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or +Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render? from=-1hours&width=280&height=220&title=File+or+Script+Not +Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> </a>
  • 93. Super Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220);
  • 97. Metrics! Metrics + Events Metrics + Alerts Metrics + Metrics
  • 101. Make them required features
  • 102. Make them dead simple
  • 105. Homework codeascraft.etsy.com github.com/etsy Get in touch mike @ etsy . com We’re always looking for people @ mikebrittain who are interested in this kind of stuff... Thank You etsy.com/careers