4. Tell me more...
● Big Data Ecosystem @ Netflix
● How we built a scalable event pipeline - Keystone - in a year
○ Replaced legacy system without service disruption
○ Small team: 8 + 1
● Netflix Culture
○ Relevant tenets tagged on the slides
6. Over 75M Members
190 Countries
125M hours/day → 11B hours / quarter
14,269 years / day → 1,255,707 years / quarter
1000+ devices
37% of Internet traffic at peak
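The hours-to-years figures on this slide follow from plain unit conversion. A quick check, assuming a 365-day year (8,760 hours) and truncation to whole years; the names below are illustrative and not from the talk:

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, 8,760 hours


def hours_to_whole_years(hours: int) -> int:
    """Convert viewing hours to whole years, truncating the remainder."""
    return hours // HOURS_PER_YEAR


daily_years = hours_to_whole_years(125_000_000)          # 125M hours/day
quarterly_years = hours_to_whole_years(11_000_000_000)   # 11B hours/quarter

print(f"{daily_years:,} years / day")
print(f"{quarterly_years:,} years / quarter")
```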
7. Netflix Is a Data Driven Company
Content
Product
Marketing
Finance
Business Development
Talent
Infrastructure
← Culture of Analytics →
15. "It may well be the most important document
ever to come out of the Valley." 1
Sheryl Sandberg
COO, Facebook
1 Business Insider, 2013
16. A NETFLIX ORIGINAL SERVICE
How we built an internal-facing, 1 trillion events / day stream processing
cloud platform in a year, and how culture played a pivotal role
Freedom
&
Responsibility
22. Support at-least-once processing
Scale, Ease of Operations
Replace dormant open source software - Chukwa
Enable future value adds - Stream Processing As a Service
Seamless transition to the new platform
Context, Not Control
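At-least-once processing usually comes down to one ordering rule: handle the event first, record progress second. A minimal in-memory sketch of that rule, assuming a hypothetical `InMemoryLog` stand-in for a partition (this is not Keystone's code and uses no Kafka client API):

```python
from typing import Callable, List


class InMemoryLog:
    """Hypothetical stand-in for one partition: events plus a committed offset."""

    def __init__(self, events: List[str]) -> None:
        self.events = events
        self.committed = 0  # offset a restarted consumer resumes from


def consume(log: InMemoryLog, handle: Callable[[str], None],
            crash_before_commit_at: int = -1) -> None:
    """Process from the last committed offset; commit only AFTER handling."""
    offset = log.committed
    while offset < len(log.events):
        handle(log.events[offset])
        if offset == crash_before_commit_at:
            return  # simulated crash: event was handled, offset was not committed
        offset += 1
        log.committed = offset


log = InMemoryLog(["e0", "e1", "e2"])
seen: List[str] = []
consume(log, seen.append, crash_before_commit_at=1)  # crashes right after handling e1
consume(log, seen.append)                            # restart: e1 is delivered again
print(seen)
```

The restart replays `e1`, so downstream consumers see a duplicate but never a gap. That is the trade-off at-least-once accepts.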
23. Migrate Events to a new Pipeline In flight,
while not losing more than 0.1% of them
Context, Not Control
Highly Aligned,
Loosely Coupled
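A 0.1% loss budget can be checked by comparing counts at the two ends of the pipeline. A small sketch, assuming produced and delivered counts are available from some counter source; the function names and the counting mechanism are hypothetical:

```python
MAX_LOSS = 0.001  # 0.1% budget from the slide


def loss_rate(produced: int, delivered: int) -> float:
    """Fraction of produced events that never arrived (duplicates are not counted as loss)."""
    if produced <= 0:
        raise ValueError("produced must be positive")
    return max(produced - delivered, 0) / produced


def within_budget(produced: int, delivered: int, budget: float = MAX_LOSS) -> bool:
    """True when the observed loss rate stays at or below the budget."""
    return loss_rate(produced, delivered) <= budget


print(within_budget(1_000_000, 999_500))  # 0.05% lost
print(within_budget(1_000_000, 998_000))  # 0.2% lost
```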
26. 1 trillion events ingested per day during holiday season
1+ trillion events processed every day
600+ billion events ingested per day (350 billion a year ago)
Keystone - Scale - Streaming
27. 11 million events per second (24 GB per second) at peak
Up to 10 MB payload / avg. 4 KB
1.3 PB / day
Keystone - Scale - Streaming
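The scale numbers on this slide can be related with a little arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and reads the 11 million figure as events per second at peak; both are assumptions, not statements from the talk:

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9   # assumption: decimal gigabyte
PB = 10**15  # assumption: decimal petabyte

# Average throughput implied by 1.3 PB / day.
daily_bytes = 1.3 * PB
avg_gb_per_sec = daily_bytes / SECONDS_PER_DAY / GB

# Average event size implied by the peak figures.
peak_bytes_per_sec = 24 * GB
peak_events_per_sec = 11_000_000  # assumption: the slide's figure is per second
peak_bytes_per_event = peak_bytes_per_sec / peak_events_per_sec

print(f"average throughput: {avg_gb_per_sec:.1f} GB/s")
print(f"bytes per event at peak: {peak_bytes_per_event:.0f}")
```

Under these assumptions the daily volume averages out to roughly 15 GB/s, against 24 GB/s at peak.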
49. Kafka Auditor - One per cluster
● Broker monitoring
● Consumer monitoring
● Heart-beat & Continuous message latency
● On-demand Broker performance testing
● Built as a service deployable on single or multiple instances
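One way to picture the heartbeat and latency checks: send numbered heartbeat messages through the cluster, then compare send and receive timestamps and look for missing sequence numbers. The sketch below is a hypothetical illustration of that idea, not the Kafka Auditor's actual implementation:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Heartbeat:
    seq: int          # sequence number assigned by the sender
    sent_ms: int      # timestamp when the heartbeat was produced
    received_ms: int  # timestamp when the auditor consumed it


def audit(heartbeats: List[Heartbeat], expected_count: int) -> Dict[str, object]:
    """Summarize end-to-end latency and report heartbeats that never arrived."""
    latencies = [h.received_ms - h.sent_ms for h in heartbeats]
    seen = {h.seq for h in heartbeats}
    missing = [s for s in range(expected_count) if s not in seen]
    return {"max_latency_ms": max(latencies, default=0), "missing": missing}


report = audit(
    [Heartbeat(0, 1000, 1012), Heartbeat(1, 2000, 2045), Heartbeat(3, 4000, 4009)],
    expected_count=4,
)
print(report)
```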
50. Kafka Cluster Size - Tips
● Per cluster: stay under 10k partitions & 200 brokers
● Leave approx. 40% free disk space on each broker
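These tips can be turned into a simple pre-flight check. The thresholds come from the slide; the function and its inputs are hypothetical and would need to be fed from whatever inventory or metrics source is available:

```python
from typing import List

MAX_PARTITIONS = 10_000
MAX_BROKERS = 200
MIN_FREE_DISK = 0.40  # approx. 40% free disk per broker


def check_cluster(partitions: int, brokers: int,
                  free_disk_fractions: List[float]) -> List[str]:
    """Return a list of warnings; an empty list means the cluster follows the tips."""
    warnings = []
    if partitions >= MAX_PARTITIONS:
        warnings.append(f"{partitions} partitions (stay under {MAX_PARTITIONS})")
    if brokers >= MAX_BROKERS:
        warnings.append(f"{brokers} brokers (stay under {MAX_BROKERS})")
    for i, free in enumerate(free_disk_fractions):
        if free < MIN_FREE_DISK:
            warnings.append(f"broker index {i} has only {free:.0%} free disk")
    return warnings


print(check_cluster(12_000, 150, [0.55, 0.30]))
```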
51. ● Started with AWS zone aware partition assignments
● We have discovered and filed several bugs
○ Details - Upcoming in Netflix Tech blog
Kafka Contributions
Open Source &
Community Driven
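The idea behind zone-aware partition assignment is that the replicas of a partition should sit in different availability zones, so that losing one zone does not lose the partition. The sketch below is a simplified, hypothetical assignment routine written only to illustrate the idea; it is not Kafka's algorithm and not the code Netflix contributed:

```python
from typing import Dict, List


def assign_replicas(brokers_by_zone: Dict[str, List[int]],
                    partitions: int, replication: int) -> Dict[int, List[int]]:
    """Place each partition's replicas in distinct zones, rotating brokers within a zone."""
    zones = sorted(brokers_by_zone)
    if replication > len(zones):
        raise ValueError("need at least one zone per replica")
    next_idx = {z: 0 for z in zones}  # round-robin cursor per zone
    assignment: Dict[int, List[int]] = {}
    for p in range(partitions):
        replicas = []
        for r in range(replication):
            zone = zones[(p + r) % len(zones)]  # consecutive zones, hence distinct
            brokers = brokers_by_zone[zone]
            replicas.append(brokers[next_idx[zone] % len(brokers)])
            next_idx[zone] += 1
        assignment[p] = replicas
    return assignment


layout = {"zone-a": [1, 2], "zone-b": [3, 4], "zone-c": [5, 6]}
assignment = assign_replicas(layout, partitions=6, replication=3)
for partition, replicas in assignment.items():
    print(partition, replicas)
```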
68. ● Exposed cost attribution per event producers & topic
○ E.g. one producer reduced throughput by 600%
● Automation - frees up additional resources
Scaling Up by Scaling Down
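Cost attribution per producer and topic can be as simple as splitting a shared bill in proportion to bytes sent. A minimal sketch, assuming usage records of the form (producer, topic, bytes); the record shape and the proportional rule are assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Usage = Tuple[str, str, int]  # (producer, topic, bytes sent); hypothetical record shape


def attribute_cost(usage: Iterable[Usage],
                   total_cost: float) -> Dict[Tuple[str, str], float]:
    """Split total_cost across (producer, topic) pairs in proportion to bytes sent."""
    bytes_by_key: Dict[Tuple[str, str], int] = defaultdict(int)
    for producer, topic, nbytes in usage:
        bytes_by_key[(producer, topic)] += nbytes
    total_bytes = sum(bytes_by_key.values())
    if total_bytes == 0:
        return {}
    return {key: total_cost * b / total_bytes for key, b in bytes_by_key.items()}


costs = attribute_cost(
    [("app-a", "t1", 300), ("app-b", "t1", 100), ("app-a", "t1", 100)],
    total_cost=100.0,
)
print(costs)
```

Showing each producer its own share is what lets teams decide for themselves whether an event stream is worth what it costs.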
69. ● No dedicated product or project managers
● No separate devops or operational team
● This does not mean we are constantly overworked
○ we make wise and simple choices and
○ lean towards automation & self-healing systems.
We build and run what you saw today!
You build it!
You run it!
High Performance
70. Not DevOps, but a move towards NoOps
You build it! You run it!
71. ● High Performance culture
● Communication
● No culture of process adherence
○ Creativity & Self Discipline
○ Freedom and Responsibility
73. Stream Processing As a Service
● Multi-tenant, polyglot support for streaming engines like
Spark Streaming, Mantis, Samza, and maybe Flink
Future steps
Open Source &
Community Driven
74. Messaging As a Service
● Kafka & Others
● Spark Streaming, Mantis, Samza, and maybe Flink
Future steps
Open Source &
Community Driven
75. Data thruway
● Support for schemas - registry, discovery, validation.
Self Service Tooling
Future steps
Open Source &
Community Driven
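Schema support with registry, discovery, and validation can be pictured with a toy in-memory registry. Everything in this sketch (class name, methods, the field-to-type schema format) is hypothetical and only illustrates the three capabilities named on the slide:

```python
from typing import Any, Dict, List


class SchemaRegistry:
    """Toy registry: maps a topic to a {field name: expected Python type} schema."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, type]] = {}

    def register(self, topic: str, schema: Dict[str, type]) -> None:
        self._schemas[topic] = dict(schema)

    def discover(self) -> List[str]:
        """List topics that have a registered schema."""
        return sorted(self._schemas)

    def validate(self, topic: str, event: Dict[str, Any]) -> List[str]:
        """Return validation errors; an empty list means the event conforms."""
        if topic not in self._schemas:
            return [f"no schema registered for topic '{topic}'"]
        errors = []
        for field, expected in self._schemas[topic].items():
            if field not in event:
                errors.append(f"missing field '{field}'")
            elif not isinstance(event[field], expected):
                errors.append(f"field '{field}' should be {expected.__name__}")
        return errors


registry = SchemaRegistry()
registry.register("playback", {"member_id": int, "title": str})
print(registry.discover())
print(registry.validate("playback", {"member_id": "42"}))
```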