
Scaling Pinterest

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1awkL99.

Details on Pinterest's architecture, its systems (Pinball, Frontdoor), and its stack: MongoDB, Cassandra, Memcache, Redis, Flume, Kafka, EMR, Qubole, Redshift, Python, Java, Go, Nutcracker, Puppet, etc. Filmed at qconsf.com.

Yash Nelapati is an infrastructure engineer at Pinterest, where he focuses on scalability, capacity planning, and architecture. Before Pinterest he worked on web development and rapid UI prototyping. Marty Weiner joined Pinterest in early 2011 as its second engineer. He previously worked at Azul Systems as a VM engineer, building and improving the JIT compilers in HotSpot.



  1. Scaling Pinterest • Marty Weiner, Cloud Ninja • Yash Nelapati, Ascii Artist • Monday, November 11, 2013
  2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/scaling-pinterest InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  3. Presented at QCon San Francisco (www.qconsf.com). Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: a practitioner-driven conference designed for YOU, the influencers of change and innovation in your teams • speakers and topics driving the evolution and innovation • connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007 • held in 9 cities worldwide
  4. Evolution
  7. Growth: March 2010 [Chart: page views per day, Mar 2010 to May 2012] • RackSpace • 1 small Web Engine • 1 small MySQL DB • 1 Engineer + 2 Founders
  12. Growth: January 2011 [Chart: page views per day, Mar 2010 to Jan 2012] • Amazon EC2 + S3 + CloudFront • 1 nginx, 4 Web Engines • 1 MySQL DB + 1 Read Slave • 1 Task Queue + 2 Task Processors • 1 MongoDB • 2 Engineers + 2 Founders
  16. Growth: September 2011 [Chart: page views per day, Mar 2010 to May 2012] • Amazon EC2 + S3 + CloudFront • 2 nginx, 16 Web Engines + 2 API Engines • 5 functionally sharded MySQL DBs + 9 read slaves • 4 Cassandra nodes • 15 Membase nodes (3 separate clusters) • 8 Memcache nodes • 10 Redis nodes • 3 Task Routers + 4 Task Processors • 4 Elasticsearch nodes • 3 Mongo clusters • 3 Engineers (8 total)
  17. It will fail. Keep it simple.
  18. If you’re the biggest user of a technology, the challenges will be greatly amplified.
  19. Growth: January 2012
  22. Growth: April 2012 [Chart: page views per day, Mar 2010 to May 2012] • Amazon EC2 + S3 + EdgeCast • 135 Web Engines + 75 API Engines • 10 Service Instances • 80 MySQL DBs (m1.xlarge) + 1 slave each • 110 Redis Instances • 60 Memcache Instances • 2 Redis Task Managers + 60 Task Processors • 3rd-party sharded Solr
  23. Growth: April 2012 [Chart: page views per day, Mar 2010 to May 2012] • 12 Engineers (1 Data Infrastructure, 1 Ops, 2 Mobile, 8 Generalists) • 10 Non-Engineers
  28. Growth: April 2013 [Chart: page views per day, April 2012 to April 2013] • Amazon EC2 + S3 + EdgeCast • 400+ Web Engines + 400+ API Engines • 70+ MySQL DBs (hi1.4xlarge on SSDs) + 1 slave each • 100+ Redis Instances • 230+ Memcache Instances • 10 Redis Task Managers + 500 Task Processors • 8 services (80 instances) • Sharded Solr • 20 HBase • 12 Kafka + Azkaban • 8 ZooKeeper Instances • 12 Varnish • 65+ Engineers (130+ total)
  29. Growth: April 2013 [Chart: page views per day, April 2012 to April 2013] • 65+ Engineers (7 Data Infrastructure + Science; 7 Search and Discovery; 9 Business and Platform; 6 Spam, Abuse, Security; 9 Web; 9 Mobile; 2 Growth; 10 Infrastructure; 6 Ops) • 65+ Non-Engineers
  31. Technologies
  32. Architecture overview (diagram). Components: • ELB • Routing & Filtering (Varnish) • Web App (Python) • API App (Python / JS / HTML) • Task Queue (Redis) • Task Processing (Python/Pyres) • MySQL Service (Java/Finagle) • Memcache Mux (Nutcracker) • Sharded MySQL • Memcache • Follower Service (Python/Thrift) • Feed Service (Python/Thrift) • Search Service (Python/Thrift) • Spam Service (Python/Thrift) • Redis • HBase • Images (S3 + CDN) • Also shown: Puppet, StatsD, Monit, Sensu • All connection pairings managed by ZooKeeper
  33. Data Pipeline (diagram). Components: • Web App (Python) • API App (Python) • Task Processing (Python/Pyres) • Kafka • S3 Copier • Tripwire (Spam) • S3 • Qubole • Pinball • Redshift
  34. Web App (diagram). Components: • nginx • Website Rendering (x8) (Python / JS / HTML) • API
  35. Our MySQL sharding? See http://www.infoq.com/presentations/Pinterest
  36. Choosing Your Tech: Questions to ask • Does it meet your needs? • How mature is the product? • Is it commonly used? Can you hire people who have used it? • Is the community active? • How robust is it to failure? • How well does it scale? Will you be the biggest user? • Does it have good debugging tools? Profiler? Backup software? • Is the cost justified?
  38. Hosting: Why Amazon Web Services (AWS)? • Variety of servers running Linux • Very good peripherals: load balancing, DNS, MapReduce, basic security, and more • Good reliability • Very active dev community • Not cheap, but... • New instances ready in seconds
  39. Hosting: AWS Usage • Route 53 for DNS • ELB for first-tier load balancing • EC2 Ubuntu Linux • Varnish layer • All web, API, and background appliances • All services • All databases and caches • S3 for images and logs
  41. Code: Why Python? • Extremely mature • Well known and well liked • Solid active community • Very good libraries specifically targeted to web development • Effective rapid prototyping • Open Source. Some Java and Go... • Faster, lower-variance response time
  43. Code: Python Usage • All web backend, API, and related business logic • Most services. Java and Go Usage • Varnish plugins • Search indexers • High-frequency services (e.g., MySQL service)
  44. Production Data: Why MySQL and Memcache? • Extremely mature • Well known and well liked • (MySQL) Rarely catastrophic loss of data • Response time increases linearly with request rate • Very good software support: XtraBackup, Innotop, Maatkit • Solid active community • Open Source
  45. Production Data: MySQL and Memcache Usage • Storage / caching of core data • Users, boards, pins, comments, domains • Mappings (e.g., users to boards, user likes, repin info) • Legal compliance data
  46. Production Data: Why Redis? • Well known and well liked • Active community • Consistently good performance • Variety of convenient and efficient data structures • 3 flavors of persistence: now, snapshot, never • Open Source
  47. Production Data: Redis Usage • Follower data • Configurations • Public feed pin IDs • Caching of various core mappings (e.g., board to pins)
  48. Production Data: Why HBase? • Small but growing, loyal community • Difficult to hire for, but... • Non-volatile, O(1), extremely fast and efficient storage • Strong Hadoop integration • Consistently good performance • Used by Facebook (bigger than us) • Seems to work well • Open Source
  49. Production Data: HBase Usage • User feeds (pin IDs are pushed to feeds) • Rich pin details • Spam features • User relationships to pins
  50. Production Data: What happened to Cassandra, Mongo, ES, and Membase? • Does it meet your needs? • How mature is the product? • Is it commonly used? Can you hire people who have used it? • Is the community active? Can you get help? • How robust is it to failure? • How well does it scale? Will you be the biggest user? • Does it have good debugging tools? Profiler? Backup software? • Is the cost justified?
  51. A 2nd chance...
  52. A 2nd Chance: Stuff we could have done better • Logging on day 1 (StatsD, Kafka, MapReduce) • Log every request, event, signup • Basic analytics • Recovery from data corruption or failure • Alerting on day 1
  53. A 2nd Chance: Stuff we could have done better • Shard our MySQL storage much earlier • Once you start relying on read slaves, start the time-bomb countdown • We also fell into the NoSQL trap (Membase, Cassandra, Mongo, etc.) • Pyres for background tasks on day 1 • Hire technical operations engineers earlier • Chef / Puppet earlier • Unit testing earlier (Jenkins for builds)
  54. A 2nd Chance: Stuff we could have done better • A/B testing earlier • Decider on top of ZooKeeper WATCH • Progressive roll-out • Kill switches
  55. What’s next? Looking Forward • Continually improve Pinner experience • Help Pinners discover more of the things they love • Better uptime and lower latency • Faster development times • Reduce spam and abuse • Continually improve collaboration and build bigger, better, faster products • 180 Pinployees and beyond
  56. Have fun
  57. Contact: marty@pinterest.com, pinterest.com/martaaay • yashh@pinterest.com, pinterest.com/yashh
  59. My 2nd Chance: If I could do it all over again... • Stronger ACID transactional guarantees across multiple systems • Currently have: sometimes A, best-effort C, I, D, no silent failure • Want: sometimes A, eventual C, I, D, no silent completion
  60. My 2nd Chance: Transactional tasks • All tasks become a dependency tree of repeatable synchronous or asynchronous actions • All actions must be repeatable • Otherwise, must add repeatability • All tasks get a unique transaction number • Counters are tricky
  61. My 2nd Chance: Transactional tasks • All tasks become a dependency tree of repeatable synchronous or asynchronous actions • Sync actions are executed in order • Async actions are executed in any order • Repeat until successful or too many failures • Too many failures -> put in per-task failure queue • Gives eventual C, I, D • No silent completion and A require extra effort
  62. My 2nd Chance: Transactional tasks example • Pin create sync • Write empty pin object • Write pin ID to board, likes, user’s pins, clear caches • Write pin object • Pin not shown until pin object created -> Atomicity!
  63. My 2nd Chance: Transactional tasks example • Pin create async • Write pin to required user feeds and public feeds • Feeds are sorted sets. Reinsertion is okay. • Send emails, Facebook Likes, Twitter Tweets • Before send, check / record in temporary storage -> Gives (temporary) repeatability
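Slide 54 lists a decider on top of ZooKeeper WATCH, progressive roll-out, and kill switches. The following is a minimal sketch of the decider logic only. It rests on two assumptions: each feature maps to a percentage of users, and the ZooKeeper watch callback simply calls `update` whenever the stored value changes. The ZooKeeper client itself is omitted. Hashing `feature:user_id` with CRC32 into 100 buckets is an illustrative choice, and `Decider` and its methods are hypothetical names, not Pinterest's code.

```python
import zlib


class Decider:
    """Percentage-based feature gate (hypothetical sketch).

    In the setup the slide describes, the percentages would live in
    ZooKeeper and be refreshed by a WATCH callback. Here `update` stands in
    for that callback and the values are kept in a plain dict.
    """

    def __init__(self, initial=None):
        self._values = {}  # feature name -> percentage of users (0-100)
        for feature, percent in (initial or {}).items():
            self.update(feature, percent)

    def update(self, feature, percent):
        # Would be invoked from the ZooKeeper watch when the node changes.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._values[feature] = percent

    def kill(self, feature):
        # Kill switch: turn the feature off for everyone.
        self.update(feature, 0)

    def is_enabled(self, feature, user_id):
        percent = self._values.get(feature, 0)  # unknown features are off
        # Stable bucket in [0, 100): a given user always lands in the same
        # bucket for a given feature, so raising the percentage only adds users.
        bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
        return bucket < percent
```

Because each user's bucket is stable, ramping a feature from 10% to 50% only adds users and never removes anyone already enabled. `kill` drops the feature to 0% for everyone as soon as the new value propagates.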

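Slides 60 and 61 describe a generic runner for transactional tasks. The sketch below shows one hypothetical way to express it in Python, with several simplifying assumptions:

- The dependency tree is flattened to an ordered list of sync actions followed by a set of async actions.
- Each action is retried in place up to `max_attempts` times.
- Async actions run inline in shuffled order rather than on background workers.

`Task`, `run_task`, and `failure_queues` are illustrative names, not Pinterest code.

```python
import itertools
import random
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, List

# An action receives the task's transaction number so it can make itself
# repeatable (e.g. by recording which transactions it has already applied).
Action = Callable[[int], None]

_next_txn = itertools.count(1)

# Hypothetical per-task-type failure queues: task name -> [(txn_id, action)].
failure_queues = defaultdict(list)


@dataclass
class Task:
    name: str
    sync_actions: List[Action]
    async_actions: List[Action] = field(default_factory=list)
    txn_id: int = field(default_factory=lambda: next(_next_txn))


def _attempt(action, txn_id, max_attempts):
    """Run one action, repeating it until it succeeds or attempts run out."""
    for _ in range(max_attempts):
        try:
            action(txn_id)
            return True
        except Exception:
            pass  # repeating is safe only because every action is repeatable
    return False


def _fail(task, action):
    name = getattr(action, "__name__", repr(action))
    failure_queues[task.name].append((task.txn_id, name))


def run_task(task, max_attempts=3, rng=None):
    """Return True if every action eventually succeeded."""
    # Sync actions: strictly in order; a permanent failure stops the task.
    for action in task.sync_actions:
        if not _attempt(action, task.txn_id, max_attempts):
            _fail(task, action)
            return False
    # Async actions depend only on the sync prefix, so any order is valid.
    # Here they run inline in shuffled order; a real system would hand them
    # to background workers.
    pending = list(task.async_actions)
    (rng or random.Random()).shuffle(pending)
    all_ok = True
    for action in pending:
        if not _attempt(action, task.txn_id, max_attempts):
            _fail(task, action)
            all_ok = False
    return all_ok
```

As slide 61 notes, a runner like this yields only eventual C, I, and D for repeatable actions. Atomicity and "no silent completion" still require extra effort.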
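Slides 62 and 63 walk through pin creation under this scheme, and slide 60 warns that counters are tricky. The hypothetical sketch below makes each step repeatable, using in-memory dicts as stand-ins for the real stores (MySQL, the Redis/HBase feeds, an email sender):

- The pin stays invisible until its object is written last.
- Feeds behave like sorted sets, so reinsertion is harmless.
- The notification and the counter are gated on the task's transaction number.

Every name here is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the real stores.
pins = {}                         # pin_id -> pin dict, or None while still "empty"
board_pins = defaultdict(dict)    # board_id -> {pin_id: score}, sorted-set style
user_feeds = defaultdict(dict)    # user_id -> {pin_id: score}, sorted-set style
sent_log = set()                  # (txn_id, channel) pairs already sent
outbox = []                       # messages that were actually "sent"
board_pin_counts = defaultdict(int)
counted = set()                   # (txn_id, board_id) already applied to the counter


def zadd(zset, member, score):
    # Sorted-set insert: re-adding a member only rewrites its score.
    zset[member] = score


# Sync steps, executed in this order.
def write_empty_pin(pin_id):
    pins.setdefault(pin_id, None)  # never clobbers an already written object


def link_pin(pin_id, board_id, ts):
    # Stands in for writing to board, likes, user's pins and clearing caches.
    zadd(board_pins[board_id], pin_id, ts)


def write_pin_object(pin_id, data):
    pins[pin_id] = dict(data)      # last sync step: the pin becomes visible


# Async steps, safe in any order.
def push_to_feeds(pin_id, followers, ts):
    for user_id in followers:
        zadd(user_feeds[user_id], pin_id, ts)


def send_once(txn_id, channel, message):
    key = (txn_id, channel)
    if key in sent_log:            # check ...
        return
    sent_log.add(key)              # ... and record before sending
    outbox.append((channel, message))


def bump_board_count(txn_id, board_id):
    # A bare "+= 1" would double count when the task is retried.
    key = (txn_id, board_id)
    if key in counted:
        return
    counted.add(key)
    board_pin_counts[board_id] += 1


def visible_board_pins(board_id):
    # Readers skip pins whose object has not been written yet.
    zset = board_pins[board_id]
    return [p for p in sorted(zset, key=zset.get) if pins.get(p) is not None]


def create_pin(txn_id, pin_id, board_id, data, followers, ts):
    write_empty_pin(pin_id)
    link_pin(pin_id, board_id, ts)
    write_pin_object(pin_id, data)
    push_to_feeds(pin_id, followers, ts)
    send_once(txn_id, "email", f"new pin {pin_id}")
    bump_board_count(txn_id, board_id)
```

Recording the send before performing it, as the slide suggests, has a consequence: a crash just after the record drops that message rather than duplicating it. A real dedupe store would also expire its entries, which is presumably why the slide calls the repeatability temporary.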