SlideShare a Scribd company logo
1 of 68
Download to read offline
Application Caching:
The Hidden Microservice
Scott Mansfield (@sgmansfield)
Senior Software Engineer
EVCache
90 seconds
What do caches touch?
Signing up*
Logging in
Choosing a profile
Picking liked videos
Personalization*
Loading home page*
Scrolling home page*
A/B tests
Video image selection
Searching*
Viewing title details
Playing a title*
Subtitle / language prefs
Rating a title
My List
Video history*
UI strings
Video production*
* multiple caches involved
Home Page
Request
Key-Value store optimized for
AWS and tuned for Netflix use
cases
Ephemeral Volatile Cache
What is EVCache?
Distributed, sharded, replicated key-value store
Tunable in-region and global replication
Based on Memcached
Resilient to failure
Topology aware
Linearly scalable
Seamless deployments
Why Optimize for AWS
Instances disappear
Zones fail
Regions become unstable
Network is lossy
Customer requests bounce between regions
Failures happen and we test all the time
EVCache Use @ Netflix
Hundreds of terabytes of data
Trillions of ops / day
Tens of billions of items stored
Tens of millions of ops / sec
Millions of replications / sec
Thousands of servers
Hundreds of instances per cluster
Hundreds of microservice clients
Tens of distinct clusters
3 regions
4 engineers
Architecture
Server
Memcached
Prana (Sidecar)
Application
Client Library
Client
Eureka
(Service Discovery)
Architecture
us-west-2a us-west-2cus-west-2b
ClientClient Client
Reading (get)
us-west-2a us-west-2cus-west-2b
Client
Primary Secondary
Writing (set, delete, add, etc.)
us-west-2a us-west-2cus-west-2b
ClientClient Client
Code
EVCache evCache = new EVCache.Builder()
.setAppName("EVCACHE_TEST")
.setCachePrefix("prefix")
.setDefaultTTL(900) // 15 minutes
.build();
evCache.set("key", "value");
evCache.get("key");
evCache.delete("key");
Use Case: Lookaside Cache
Application
Client Library
Client Ribbon Client
S S S S
C C C C
Data Flow
Use Case: Transient Data Store
Application
Client Library
Client
Application
Client Library
Client
Application
Client Library
Client
Time
Use Case: Primary Store
Offline / Nearline
Precomputes for
Recommendations
Online Services
Offline Services
Online Application
Client Library
Client
Data Flow
Online Services
Offline Services
Use Case: Versioned Primary Store
Offline / Nearline
Precomputes for
Recommendations
Online Application
Client Library
Client
Data Flow
Archaius
(Dynamic Properties)
Control System
(Valhalla)
Use Case: High Volume && High Availability
Compute & Publish
on schedule
Data Flow
Application
Client Library
In-memory Remote Ribbon Client
Optional
S S S S
C C C C
Pipeline of Personalization
Compute A
Compute B Compute C
Compute D
Online Services
Offline Services
Compute E
Data Flow
Online 1 Online 2
Polyglot Clients
APP
Java Client
APP
Prana
Local Proxy
Memcached
APP
Memcached Proxy
HTTP
APP
HTTP Proxy
Use Case: Polyglot
HTTP
Python App
HTTP Proxy
Input Go App
Prana
Local Proxy
Output
Additional Features
Kafka
● Global data replication
● Consistency metrics
Key Iteration
● Cache warming
● Lost instance recovery
● Backup (and restore)
Additional Features (Kafka)
Global data replication
Consistency metrics
Region BRegion A
APP APP
Repl Proxy
Repl Relay
1 mutate
2 send
metadata
3 poll msg
5
https send
m
sg
6 mutate4
get data
for set
Kafka Repl Relay Kafka
Repl Proxy
Cross-Region Replication
7 read
Region A
APP
Consistency
Checker
1 mutate
2 send
metadata
3 poll msg
Kafka
Consistency Metrics
4 pull data
Atlas
(Metrics Backend)
5 report
Client
Dashboards
Additional Features (Key Iteration)
Cache warming
Lost instance recovery
Backup (and restore)
Cache Warming
Cache Warmer
(Spark)
Application
Client Library
Client
Control
S3
Data Flow
Metadata Flow
Control Flow
Lost Instance Recovery
Cache Warmer
(Spark)
Application
Client Library
Client
S3
Partial Data Flow
Metadata Flow
Control Flow
Control
Zone A Zone B
Data Flow
Backup (and Restore)
Cache Warmer
(Spark)
Application
Client Library
Client
Control
S3Data Flow
Control Flow
What is EVCache bad at?
Perfect consistency
High security
Hot keys
Very large objects
Moneta
Next-generation
EVCache server
Moneta
Moneta: The Goddess of Memory
Juno Moneta: The Protectress of Funds for Juno
● Evolution of the EVCache server
● Cost optimization
● EVCache on SSD
● Ongoing lower EVCache cost per stream
● Takes advantage of global request patterns
Old Server
● Stock Memcached and Prana (Netflix sidecar)
● All data stored in RAM in Memcached
● Expensive with global expansion / N+1 architecture
Memcached
Prana
external
Optimization
● Global data means many copies
● Access patterns are heavily region-oriented
● In one region:
○ Hot data is used often
○ Cold data is almost never touched
● Keep hot data in RAM, cold data on SSD
● Size RAM for working set, SSD for overall dataset
New Server
● Adds Rend and Mnemonic
● Still looks like Memcached
● Unlocks cost-efficient storage & server-side intelligence
Rend
Prana
Memcached (RAM)
Mnemonic (SSD)
external internal
go get github.com/netflix/rend
Rend
Rend
● High-performance Memcached proxy & server
● Written in Go
○ Powerful concurrency primitives
○ Productive and fast
● Manages the L1/L2 relationship
● Tens of thousands of connections
Rend
● Modular to allow future changes / expansion of scope
○ Set of libraries and a default main()
● Manages connections, request orchestration, and
communication
● Low-overhead metrics library
● Multiple orchestrators
● Parallel locking for data integrity
Server Loop
Request Orchestration
Backend Handlers
M
E
T
R
I
C
S
Connection Management
Protocol
Mnemonic
Open source soon™
Mnemonic
● Manages data storage on SSD
● Reuses Rend server libraries
● Maps Memcached ops to RocksDB ops
Rend Server Core Lib (Go)
Mnemonic Op Handler (Go)
Mnemonic Core (C++)
RocksDB (C++)
Why RocksDB?
● Fast at medium to high write load
○ Goal was 99% read latency below 20ms
● Log-Structured Merge minimizes random writes to SSD
○ Writes are buffered in memory
● Immutable Static Sorted Tables
memtable
Record A Record B
SST
memtable memtable
SST SST . . .
How we use RocksDB
● Records sharded across many RocksDBs per instance
○ Reduces number of files checked, decreasing latency
● FIFO "Compaction"
○ More suitable for our precompute use cases
● Bloom filters and indices pinned in memory
○ Trade L1 space for faster L2
R
...
Mnemonic Core
Key: ABC
Key: XYZ
R R R
FIFO Limitation
● FIFO compaction not suitable for all use cases
○ Very frequently updated records may push out valid records
● Future: custom compaction or level compaction
SST
Record A2
Record B1
Record B2
Record A3
Record A1
Record A2
Record B1
Record B2
Record A3
Record A1
Record B3Record B3
Record C
Record D
Record E
Record F
Record G
Record H
time
SSTSST
Moneta in Production
● Serving all of our personalization data
● Rend runs with two ports:
○ One for standard users (read heavy or active data management)
○ Another for async and batch users: Replication and Precompute
● Maintains working set in RAM
● Optimized for precomputes
○ Smartly replaces data in L1
external internal
Prana
Memcached (RAM)
Mnemonic (SSD)
Std
Batch
Get: 229 μs (139 μs) avg
Percentiles:
● 50th
: 140 μs (113 μs)
● 75th
: 199 μs (139 μs)
● 90th
: 318 μs (195 μs)
● 95th
: 486 μs (231 μs)
● 99th
: 1.67 ms (508 μs)
● 99.9th
: 9.68 ms (1.97 ms)
Moneta Performance in Production
Set: 367 μs (200 μs) avg
Percentiles:
● 50th
: 227 μs (172 μs)
● 75th
: 301 μs (202 μs)
● 90th
: 411 μs (243 μs)
● 95th
: 579 μs (295 μs)
● 99th
: 3.25 ms (502 μs)
● 99.9th
: 18.5 ms (4.73 ms)
Latencies: peak (trough)
70%
Reduction in cost*
70%
Open Source
https://github.com/netflix/EVCache
https://github.com/netflix/rend
Thank You
@sgmansfield
smansfield@netflix.com
techblog.netflix.com
Failure Resilience in Client
● Operation Fast Failure
● Tunable Retries
● Operation Queues
● Tunable Latch for Mutations
● Async Replication through Kafka

More Related Content

What's hot

What's hot (20)

HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 InstanceExtreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance
 
Inside CynosDB: MariaDB optimized for the cloud at Tencent
Inside CynosDB: MariaDB optimized for the cloud at TencentInside CynosDB: MariaDB optimized for the cloud at Tencent
Inside CynosDB: MariaDB optimized for the cloud at Tencent
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
 
Accordion HBaseCon 2017
Accordion HBaseCon 2017Accordion HBaseCon 2017
Accordion HBaseCon 2017
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
Update on Crimson - the Seastarized Ceph - Seastar Summit
Update on Crimson  - the Seastarized Ceph - Seastar SummitUpdate on Crimson  - the Seastarized Ceph - Seastar Summit
Update on Crimson - the Seastarized Ceph - Seastar Summit
 
Keynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! ScaleKeynote: Apache HBase at Yahoo! Scale
Keynote: Apache HBase at Yahoo! Scale
 
Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis Labs
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Date-tiered Compaction Policy for Time-series Data
Date-tiered Compaction Policy for Time-series DataDate-tiered Compaction Policy for Time-series Data
Date-tiered Compaction Policy for Time-series Data
 
Redis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech TalkRedis vs Infinispan | DevNation Tech Talk
Redis vs Infinispan | DevNation Tech Talk
 
Redis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs Talks
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 

Similar to Application Caching: The Hidden Microservice (SAConf)

Similar to Application Caching: The Hidden Microservice (SAConf) (20)

Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
EVCache Builderscon
EVCache BuildersconEVCache Builderscon
EVCache Builderscon
 
Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup) Polyglot persistence @ netflix (CDE Meetup)
Polyglot persistence @ netflix (CDE Meetup)
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Experience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's ViewExperience In Building Scalable Web Sites Through Infrastructure's View
Experience In Building Scalable Web Sites Through Infrastructure's View
 
Aerospike Hybrid Memory Architecture
Aerospike Hybrid Memory ArchitectureAerospike Hybrid Memory Architecture
Aerospike Hybrid Memory Architecture
 
MYSQL
MYSQLMYSQL
MYSQL
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Node.js scaling in highload
Node.js scaling in highloadNode.js scaling in highload
Node.js scaling in highload
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
 
Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Node.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scaleNode.js Web Apps @ ebay scale
Node.js Web Apps @ ebay scale
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 

Application Caching: The Hidden Microservice (SAConf)

  • 1. Application Caching: The Hidden Microservice Scott Mansfield (@sgmansfield) Senior Software Engineer EVCache
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 20. What do caches touch? Signing up* Logging in Choosing a profile Picking liked videos Personalization* Loading home page* Scrolling home page* A/B tests Video image selection Searching* Viewing title details Playing a title* Subtitle / language prefs Rating a title My List Video history* UI strings Video production* * multiple caches involved
  • 22.
  • 23. Key-Value store optimized for AWS and tuned for Netflix use cases Ephemeral Volatile Cache
  • 24. What is EVCache? Distributed, sharded, replicated key-value store Tunable in-region and global replication Based on Memcached Resilient to failure Topology aware Linearly scalable Seamless deployments
  • 25. Why Optimize for AWS Instances disappear Zones fail Regions become unstable Network is lossy Customer requests bounce between regions Failures happen and we test all the time
  • 26. EVCache Use @ Netflix Hundreds of terabytes of data Trillions of ops / day Tens of billions of items stored Tens of millions of ops / sec Millions of replications / sec Thousands of servers Hundreds of instances per cluster Hundreds of microservice clients Tens of distinct clusters 3 regions 4 engineers
  • 30. Writing (set, delete, add, etc.) us-west-2a us-west-2cus-west-2b ClientClient Client
  • 31. Code EVCache evCache = new EVCache.Builder() .setAppName("EVCACHE_TEST") .setCachePrefix("prefix") .setDefaultTTL(900) // 15 minutes .build(); evCache.set("key", "value"); evCache.get("key"); evCache.delete("key");
  • 32. Use Case: Lookaside Cache Application Client Library Client Ribbon Client S S S S C C C C Data Flow
  • 33. Use Case: Transient Data Store Application Client Library Client Application Client Library Client Application Client Library Client Time
  • 34. Use Case: Primary Store Offline / Nearline Precomputes for Recommendations Online Services Offline Services Online Application Client Library Client Data Flow
  • 35. Online Services Offline Services Use Case: Versioned Primary Store Offline / Nearline Precomputes for Recommendations Online Application Client Library Client Data Flow Archaius (Dynamic Properties) Control System (Valhalla)
  • 36. Use Case: High Volume && High Availability Compute & Publish on schedule Data Flow Application Client Library In-memory Remote Ribbon Client Optional S S S S C C C C
  • 37. Pipeline of Personalization Compute A Compute B Compute C Compute D Online Services Offline Services Compute E Data Flow Online 1 Online 2
  • 38. Polyglot Clients APP Java Client APP Prana Local Proxy Memcached APP Memcached Proxy HTTP APP HTTP Proxy
  • 39. Use Case: Polyglot HTTP Python App HTTP Proxy Input Go App Prana Local Proxy Output
  • 40. Additional Features Kafka ● Global data replication ● Consistency metrics Key Iteration ● Cache warming ● Lost instance recovery ● Backup (and restore)
  • 41. Additional Features (Kafka) Global data replication Consistency metrics
  • 42. Region BRegion A APP APP Repl Proxy Repl Relay 1 mutate 2 send metadata 3 poll msg 5 https send m sg 6 mutate4 get data for set Kafka Repl Relay Kafka Repl Proxy Cross-Region Replication 7 read
  • 43. Region A APP Consistency Checker 1 mutate 2 send metadata 3 poll msg Kafka Consistency Metrics 4 pull data Atlas (Metrics Backend) 5 report Client Dashboards
  • 44. Additional Features (Key Iteration) Cache warming Lost instance recovery Backup (and restore)
  • 45. Cache Warming Cache Warmer (Spark) Application Client Library Client Control S3 Data Flow Metadata Flow Control Flow
  • 46. Lost Instance Recovery Cache Warmer (Spark) Application Client Library Client S3 Partial Data Flow Metadata Flow Control Flow Control Zone A Zone B Data Flow
  • 47. Backup (and Restore) Cache Warmer (Spark) Application Client Library Client Control S3Data Flow Control Flow
  • 48. What is EVCache bad at? Perfect consistency High security Hot keys Very large objects
  • 50. Moneta Moneta: The Goddess of Memory Juno Moneta: The Protectress of Funds for Juno ● Evolution of the EVCache server ● Cost optimization ● EVCache on SSD ● Ongoing lower EVCache cost per stream ● Takes advantage of global request patterns
  • 51. Old Server ● Stock Memcached and Prana (Netflix sidecar) ● All data stored in RAM in Memcached ● Expensive with global expansion / N+1 architecture Memcached Prana external
  • 52. Optimization ● Global data means many copies ● Access patterns are heavily region-oriented ● In one region: ○ Hot data is used often ○ Cold data is almost never touched ● Keep hot data in RAM, cold data on SSD ● Size RAM for working set, SSD for overall dataset
  • 53. New Server ● Adds Rend and Mnemonic ● Still looks like Memcached ● Unlocks cost-efficient storage & server-side intelligence Rend Prana Memcached (RAM) Mnemonic (SSD) external internal
  • 55. Rend ● High-performance Memcached proxy & server ● Written in Go ○ Powerful concurrency primitives ○ Productive and fast ● Manages the L1/L2 relationship ● Tens of thousands of connections
  • 56. Rend ● Modular to allow future changes / expansion of scope ○ Set of libraries and a default main() ● Manages connections, request orchestration, and communication ● Low-overhead metrics library ● Multiple orchestrators ● Parallel locking for data integrity Server Loop Request Orchestration Backend Handlers M E T R I C S Connection Management Protocol
  • 58. Mnemonic ● Manages data storage on SSD ● Reuses Rend server libraries ● Maps Memcached ops to RocksDB ops Rend Server Core Lib (Go) Mnemonic Op Handler (Go) Mnemonic Core (C++) RocksDB (C++)
  • 59. Why RocksDB? ● Fast at medium to high write load ○ Goal was 99% read latency below 20ms ● Log-Structured Merge minimizes random writes to SSD ○ Writes are buffered in memory ● Immutable Static Sorted Tables memtable Record A Record B SST memtable memtable SST SST . . .
  • 60. How we use RocksDB ● Records sharded across many RocksDBs per instance ○ Reduces number of files checked, decreasing latency ● FIFO "Compaction" ○ More suitable for our precompute use cases ● Bloom filters and indices pinned in memory ○ Trade L1 space for faster L2 R ... Mnemonic Core Key: ABC Key: XYZ R R R
  • 61. FIFO Limitation ● FIFO compaction not suitable for all use cases ○ Very frequently updated records may push out valid records ● Future: custom compaction or level compaction SST Record A2 Record B1 Record B2 Record A3 Record A1 Record A2 Record B1 Record B2 Record A3 Record A1 Record B3Record B3 Record C Record D Record E Record F Record G Record H time SSTSST
  • 62. Moneta in Production ● Serving all of our personalization data ● Rend runs with two ports: ○ One for standard users (read heavy or active data management) ○ Another for async and batch users: Replication and Precompute ● Maintains working set in RAM ● Optimized for precomputes ○ Smartly replaces data in L1 external internal Prana Memcached (RAM) Mnemonic (SSD) Std Batch
  • 63. Get: 229 μs (139 μs) avg Percentiles: ● 50th : 140 μs (113 μs) ● 75th : 199 μs (139 μs) ● 90th : 318 μs (195 μs) ● 95th : 486 μs (231 μs) ● 99th : 1.67 ms (508 μs) ● 99.9th : 9.68 ms (1.97 ms) Moneta Performance in Production Set: 367 μs (200 μs) avg Percentiles: ● 50th : 227 μs (172 μs) ● 75th : 301 μs (202 μs) ● 90th : 411 μs (243 μs) ● 95th : 579 μs (295 μs) ● 99th : 3.25 ms (502 μs) ● 99.9th : 18.5 ms (4.73 ms) Latencies: peak (trough)
  • 67.
  • 68. Failure Resilience in Client ● Operation Fast Failure ● Tunable Retries ● Operation Queues ● Tunable Latch for Mutations ● Async Replication through Kafka