SlideShare a Scribd company logo
1 of 45
Download to read offline
Nitin Sharma
Real-Time Impression Store @ Netflix
Sep 2018
● Recommendations @ Netflix
● Distributed Impression Store
● Distributed Processing Infrastructure
Agenda
Recommendations @ Netflix
● 129M+ active members
● 190+ countries
● Increasing launch of originals
● 2 Trillion kafka events/day
Recommendations @Scale
● Signals:
○ Impressions, Plays, View History
○ Searches, My Lists etc.
● Compute:
○ Offline Compute
○ Online Compute
Netflix Recommendation Signals
Impression Store
● Raw impressions:
○ Large dataset (PBs every day)
○ 10s of trillions of rows
○ $$$$
● Lack of signal:
○ Too noisy
○ Custom aggregation - expensive
Storage Philosophy
● Impression fatigue , Familiarity effect
● Under vs over Impressing
● Use in Recommendations - Score Online
Goals
● Want: I want all impressions
● Need:
○ Meaningful way to capture impressions
○ (W/O storing full impressions)
● Idea:
○ EMA Counts # of times - <User, Video> over
multiple decay windows.
How many times has a show been recommended
to you?
● Representation:
○ Exponential Moving Average (EMA) Score
○ Rate of impressions over a given window
○ EMA score over Windows:
■ <user, video, location> => <last_seen_ts, [1d, 1w, 1m, 6m, 1y]>
■ E.g <a, narcos, toprow> => <today, [0.9, 0.2, 0, 0, 0]>
● Benefits:
○ Better Signal
○ Memory Footprint
○ Extensible
Data model Definition - EMA
EMA Visualization
Scale & SLA
Dimension Attribute Scale
Storage
Videos * locations * users ~600B + entries
Raw Data size 150+TB
Indexing Updates
Bulk 600B+ entries < 1.5 hours
Real time 50M entries < 5mins
Query 99th Percentile Latency < 10s of ms
Patterns Read/Update/Write
Operations
Cost $$ >> $$$$
Availability 99.99% [~4m downtime/30
days]
Distributed Data Store
Observation
Attributes Stringent Flexible
Predictably faster Reads
Bulk + RT writes
Strict Schema
Secondary Index
Easy Operations
Ser/Deser at Read (aka
blob storage)
Cache with props of a
DataStore?
● Format: <K,V> storage & Schema-less
● Underlying: Memcached + Cross Region support
● Storage: In memory + Ext Storage
Evcache
Architecture
Architecture
Server
EVCar
Application
Client Library
Client
Client Side Writing (set, delete, add, etc.)
us-west-2a us-west-2cus-west-2b
ClientClient Client
Client Side Reading (get)
us-west-2a us-west-2cus-west-2b
Client
Primary Secondary
● In Memory:
○ $$$$
○ No persistence
○ Lack of consistency - Node crash etc
● Layered Storage?
But wait...
● Scenario:
○ 10% of active items => 90% of hits
○ Large values eat up RAM
● Philosophy:
○ LRU <k,v> on disk; Index in RAM
○ MRU key + value in RAM
○ Values => NVMe; SSD
Ext Storage
Ext Storage
Ext Storage
● Effects:
○ Reduce overall RAM
○ Reduce Server Count
○ Increase overall Cache size
Cost : 1 TB of storage space
EVCache - RAM Only
20 r4.2xlarge
EVCache - Flash
1 i3.xlarge
Network Bandwidth : ~40 Gbps Network Bandwidth : ~2.5 Gbps
● Cross Region Replication
● Backup & Recovery
Data Store (ish) Features
Region BRegion A
APP APP
Repl Proxy
Repl Relay
1 mutate
2 send
metadata
3 poll msg
5
https send
m
sg
6 mutate4
get data
for set
Kafka Repl Relay Kafka
Repl Proxy
Cross-Region Replication
7 read
● Cross Region:
○ Relay: Event forward through kafka
○ Proxy: Event receiver and mutate cache
■ Stateless
■ Easy Throttle
● CAS:
■ Client-> Replica
■ Client -> Kafka
■ Write Retries (by proxy) until replica is consistent.
Characteristics
● Client Side
○ Writes: All replicas in a region
○ Compression: Gzip
○ Reads:
■ Quorum - Latches
■ Consistency All
● Server Side
○ Writes: Cross regions [replication]
○ Reads: Basic read repairs [Compare & Set]
○ Storage: Mem & External
Client & Server Responsibilities
Backup & Restore Architecture
Cache Warmer
(Spark)
Application
Client Library
Client
Control
S3
Data Flow
Metadata Flow
Control Flow
Data Representation
● Protobuf
● EMA Payload:
○ (last_seen_ts, Float[1d, 1w, 1m, 6m, 1y])
● Overall:
○ Map<VideoId,Map<locationId,EMA>>
EMA Representation
Client Side Write Trade-offs
ClientClient
● Writes:
○ Read->De-ser -> Update -> Ser -> Write
● Schema evolution
Impressions Infrastructure
Batch Indexing Infrastructure (V1)
us-east-1 us-west
eu-west
Precompute/Live
Compute
RT Indexing Infrastructure (V2)
Precompute/Live
Compute
Rocksdb
● Engine: Apache Flink
● Processing Semantics:
○ Exactly once
○ Late arriving events
● RocksDB:
○ Zero data loss:
■ Failures/Crashes - Checkpoints/Savepoints
○ State Management:
■ Application state
RT Indexing Infrastructure
Chaos Engineering
● Region Failovers
● Crash Proof ?:
○ Replicas
○ Tail Latency tests
● Cross Region Replication:
○ Kafka:
■ 2-3x data
○ Low Lag:
■ Scale up Replay Proxies
Tiered Storage:
○ Memory
○ SSD/Nvme
○ EBS (Nitro based)
■ 80,000 IOPS
■ 14Gbps
Future Work
https://www.linkedin.com/in/knitinsharma/
nitins@netflix.com
Speaker Info

More Related Content

What's hot

Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech TalksScaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech TalksAmazon Web Services
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at NetflixLinas Baltrunas
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningAnoop Deoras
 
CI:CD in Lightspeed with kubernetes and argo cd
CI:CD in Lightspeed with kubernetes and argo cdCI:CD in Lightspeed with kubernetes and argo cd
CI:CD in Lightspeed with kubernetes and argo cdBilly Yuen
 
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive DataSumit Rangwala
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...Databricks
 
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...LibbySchulze
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For OperatorsKevin Brockhoff
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectiveJustin Basilico
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixJustin Basilico
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsYves Raimond
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender modelsParmeshwar Khurd
 
Powers of Ten Redux
Powers of Ten ReduxPowers of Ten Redux
Powers of Ten ReduxJason Plurad
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Amazon Web Services
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 

What's hot (20)

Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech TalksScaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
Scaling Redis Workloads with Amazon ElastiCache - AWS Online Tech Talks
 
FinOps for private cloud
FinOps for private cloudFinOps for private cloud
FinOps for private cloud
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 
CI:CD in Lightspeed with kubernetes and argo cd
CI:CD in Lightspeed with kubernetes and argo cdCI:CD in Lightspeed with kubernetes and argo cd
CI:CD in Lightspeed with kubernetes and argo cd
 
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
[QCon.ai 2019] People You May Know: Fast Recommendations Over Massive Data
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Missing values in recommender models
Missing values in recommender modelsMissing values in recommender models
Missing values in recommender models
 
Powers of Ten Redux
Powers of Ten ReduxPowers of Ten Redux
Powers of Ten Redux
 
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
Choosing the Right EC2 Instance and Applicable Use Cases - AWS June 2016 Webi...
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
EVCache at Netflix
EVCache at NetflixEVCache at Netflix
EVCache at Netflix
 

Similar to Netflix - Realtime Impression Store

EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBScott Mansfield
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecPeter Bakas
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceScott Mansfield
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Scott Mansfield
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...Amazon Web Services
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaSteven Wu
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data AnalyticsAnkur Bansal
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per DayAnkur Bansal
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsYaroslav Tkachenko
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleDmytro Semenov
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Roopa Tangirala
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017Thomas Weise
 

Similar to Netflix - Realtime Impression Store (20)

EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Application Caching: The Hidden Microservice
Application Caching: The Hidden MicroserviceApplication Caching: The Hidden Microservice
Application Caching: The Hidden Microservice
 
Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)Application Caching: The Hidden Microservice (SAConf)
Application Caching: The Hidden Microservice (SAConf)
 
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
How Netflix Uses Amazon Kinesis Streams to Monitor and Optimize Large-scale N...
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
From Batch to Streaming ET(L) with Apache Apex at Berlin Buzzwords 2017
 

Recently uploaded

NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...Amil baba
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Krakówbim.edu.pl
 
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5T.D. Shashikala
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxwendy cai
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...Roi Lipman
 
Roushan Kumar Java oracle certificate
Roushan Kumar Java oracle certificate Roushan Kumar Java oracle certificate
Roushan Kumar Java oracle certificate RaushanKumar452467
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdfKamal Acharya
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.MdManikurRahman
 
Lect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptxLect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptxHamzaKhawar4
 
Attraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxAttraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxkarthikeyanS725446
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdfKamal Acharya
 
Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxMonirHossain707319
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...MohammadAliNayeem
 
Teachers record management system project report..pdf
Teachers record management system project report..pdfTeachers record management system project report..pdf
Teachers record management system project report..pdfKamal Acharya
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectRased Khan
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientistgettygaming1
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxMd. Shahidul Islam Prodhan
 
Dairy management system project report..pdf
Dairy management system project report..pdfDairy management system project report..pdf
Dairy management system project report..pdfKamal Acharya
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdfKamal Acharya
 

Recently uploaded (20)

NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
RM&IPR M5 notes.pdfResearch Methodolgy & Intellectual Property Rights Series 5
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
Roushan Kumar Java oracle certificate
Roushan Kumar Java oracle certificate Roushan Kumar Java oracle certificate
Roushan Kumar Java oracle certificate
 
Laundry management system project report.pdf
Laundry management system project report.pdfLaundry management system project report.pdf
Laundry management system project report.pdf
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Lect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptxLect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptx
 
Attraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxAttraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptx
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
Lect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptxLect_Z_Transform_Main_digital_image_processing.pptx
Lect_Z_Transform_Main_digital_image_processing.pptx
 
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
Complex plane, Modulus, Argument, Graphical representation of a complex numbe...
 
Teachers record management system project report..pdf
Teachers record management system project report..pdfTeachers record management system project report..pdf
Teachers record management system project report..pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 
Dairy management system project report..pdf
Dairy management system project report..pdfDairy management system project report..pdf
Dairy management system project report..pdf
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 

Netflix - Realtime Impression Store

  • 1.
  • 2. Nitin Sharma Real-Time Impression Store @ Netflix Sep 2018
  • 3. ● Recommendations @ Netflix ● Distributed Impression Store ● Distributed Processing Infrastructure Agenda
  • 5.
  • 6. ● 129M+ active members ● 190+ countries ● Increasing launch of originals ● 2 Trillion kafka events/day Recommendations @Scale
  • 7. ● Signals: ○ Impressions, Plays, View History ○ Searches, My Lists etc. ● Compute: ○ Offline Compute ○ Online Compute Netflix Recommendation Signals
  • 8.
  • 10. ● Raw impressions: ○ Large dataset (PBs every day) ○ 10s of trillions of rows ○ $$$$ ● Lack of signal: ○ Too noisy ○ Custom aggregation - expensive Storage Philosophy
  • 11.
  • 12. ● Impression fatigue , Familiarity effect ● Under vs over Impressing ● Use in Recommendations - Score Online Goals
  • 13. ● Want: I want all impressions ● Need: ○ Meaningful way to capture impressions ○ (W/O storing full impressions) ● Idea: ○ EMA Counts # of times - <User, Video> over multiple decay windows. How many times has a show been recommended to you?
  • 14. ● Representation: ○ Exponential Moving Average (EMA) Score ○ Rate of impressions over a given window ○ EMA score over Windows: ■ <user, video, location> => <last_seen_ts, [1d, 1w, 1m, 6m, 1y]> ■ E.g <a, narcos, toprow> => <today, [0.9, 0.2, 0, 0, 0]> ● Benefits: ○ Better Signal ○ Memory Footprint ○ Extensible Data model Definition - EMA
  • 16. Scale & SLA Dimension Attribute Scale Storage Videos * locations * users ~600B + entries Raw Data size 150+TB Indexing Updates Bulk 600B+ entries < 1.5 hours Real time 50M entries < 5mins Query 99th Percentile Latency < 10s of ms Patterns Read/Update/Write Operations Cost $$ >> $$$$ Availability 99.99% [~4m downtime/30 days]
  • 18. Observation Attributes Stringent Flexible Predictably faster Reads Bulk + RT writes Strict Schema Secondary Index Easy Operations Ser/Deser at Read (aka blob storage)
  • 19. Cache with props of a DataStore?
  • 20.
  • 21. ● Format: <K,V> storage & Schema-less ● Underlying: Memcached + Cross Region support ● Storage: In memory + Ext Storage Evcache
  • 24. Client Side Writing (set, delete, add, etc.) us-west-2a us-west-2cus-west-2b ClientClient Client
  • 25. Client Side Reading (get) us-west-2a us-west-2cus-west-2b Client Primary Secondary
  • 26. ● In Memory: ○ $$$$ ○ No persistence ○ Lack of consistency - Node crash etc ● Layered Storage? But wait...
  • 27. ● Scenario: ○ 10% of active items => 90% of hits ○ Large values eat up RAM ● Philosophy: ○ LRU <k,v> on disk; Index in RAM ○ MRU key + value in RAM ○ Values => NVMe; SSD Ext Storage
  • 29. Ext Storage ● Effects: ○ Reduce overall RAM ○ Reduce Server Count ○ Increase overall Cache size
  • 30. Cost : 1 TB of storage space EVCache - RAM Only 20 r4.2xlarge EVCache - Flash 1 i3.xlarge Network Bandwidth : ~40 Gbps Network Bandwidth : ~2.5 Gbps
  • 31. ● Cross Region Replication ● Backup & Recovery Data Store (ish) Features
  • 32. Region BRegion A APP APP Repl Proxy Repl Relay 1 mutate 2 send metadata 3 poll msg 5 https send m sg 6 mutate4 get data for set Kafka Repl Relay Kafka Repl Proxy Cross-Region Replication 7 read
  • 33. ● Cross Region: ○ Relay: Event forward through kafka ○ Proxy: Event receiver and mutate cache ■ Stateless ■ Easy Throttle ● CAS: ■ Client-> Replica ■ Client -> Kafka ■ Write Retries (by proxy) until replica is consistent. Characteristics
  • 34. ● Client Side ○ Writes: All replicas in a region ○ Compression: Gzip ○ Reads: ■ Quorum - Latches ■ Consistency All ● Server Side ○ Writes: Cross regions [replication] ○ Reads: Basic read repairs [Compare & Set] ○ Storage: Mem & External Client & Server Responsibilities
  • 35. Backup & Restore Architecture Cache Warmer (Spark) Application Client Library Client Control S3 Data Flow Metadata Flow Control Flow
  • 37. ● Protobuf ● EMA Payload: ○ (last_seen_ts, Float[1d, 1w, 1m, 6m, 1y]) ● Overall: ○ Map<VideoId,Map<locationId,EMA>> EMA Representation
  • 38. Client Side Write Trade-offs ClientClient ● Writes: ○ Read->De-ser -> Update -> Ser -> Write ● Schema evolution
  • 40. Batch Indexing Infrastructure (V1) us-east-1 us-west eu-west Precompute/Live Compute
  • 41. RT Indexing Infrastructure (V2) Precompute/Live Compute Rocksdb
  • 42. ● Engine: Apache Flink ● Processing Semantics: ○ Exactly once ○ Late arriving events ● RocksDB: ○ Zero data loss: ■ Failures/Crashes - Checkpoints/Savepoints ○ State Management: ■ Application state RT Indexing Infrastructure
  • 43. Chaos Engineering ● Region Failovers ● Crash Proof ?: ○ Replicas ○ Tail Latency tests ● Cross Region Replication: ○ Kafka: ■ 2-3x data ○ Low Lag: ■ Scale up Replay Proxies
  • 44. Tiered Storage: ○ Memory ○ SSD/Nvme ○ EBS (Nitro based) ■ 80,000 IOPS ■ 14Gbps Future Work