SlideShare a Scribd company logo
1 of 28
Download to read offline
Daniel Hochman, Engineer
1,000 2,000 Instances and Beyond
Agenda
- From the ground up
- Provisioning
- Clustering
- Maintaining high availability
- Handling system failures
- Observability
- Load testing
- Roadmap
Case study: scaling a geospatial index
Operating Redis on the Lyft platform
RedisConf 2017
By the numbers
2017
50 clusters
750 instances
15M QPS peak
Twemproxy
2018
64 clusters (+14)
2,000 instances (+1,250)
25M QPS peak (+10M)
Envoy
Migrated entire Redis infrastructure.
Consistency?
- Lyft runs with no replication
- No AOF, no RDB
- "Best-effort"
- No consistency guarantees
- If an instance is lost, data is gone
Real-time nature of service means most data is dynamic and refreshed often.
From the ground up
Provisioning clusters
- Every Redis cluster is an EC2 autoscaling group
- Each service defines and deploys its own cluster
asg.present:
- name: locationsredis
- image: ubuntu16_base
- launch_config:
- cloud_init:
#!/bin/bash
NAME=locationsredis
SERVICE=redis
curl s3://provision.sh | sh
- instance_type: c5.large
- min_size: 60
- max_size: 60
Provisioning instances
- Central provisioning templates
- Include and override
include /etc/lyft/redis/redis-defaults.conf
# overrides
bind 0.0.0.0
save ""
port {{ port }}
maxmemory-policy {{ get(maxmemory_policy, 'allkeys-lru') }}
{% if environment == 'production' %}
rename-command KEYS ""
rename-command CONFIG CAREFULCONFIG
{% endif %}
Twemproxy (deprecated)
- Also known as Nutcracker
- Unmaintained, replaced with closed-source
- No active healthcheck
- No hot restart (config changes cause downtime)
- Difficult to extend (e.g. to integrate with instance discovery)
Commits
Envoy Proxy
- Open-source
- Built for edge and mesh networking
- Observability: stats, stats, stats
- Dynamic configuration
- Pluggable architecture
- Out-of-process
- Thriving ecosystem
- Redis, DynamoDB, MongoDB codecs
Discovery
discovery
GET /members/locationsredis
POST /members/locationsredis
Membership is eventually consistent.
…
30s
60s
locationsredis:
- 10.0.0.1:6379, 40s ago
- 10.0.0.2:6379, 23s ago
...
- 10.0.0.9:6379, 12s ago
Active healthchecking
> PING
"PONG"
> EXISTS _maintenance_
(integer) 0
> SET _maintenance_ true
OK
> EXISTS _maintenance_
(integer) 1
Send a command periodically to check for a healthy response.
healthcheck:
unhealthy_threshold: 3
healthy_threshold: 2
interval: 5s
interval_jitter: 0.1s
Passive healthchecking
Monitor the success and failure of operations and eject outliers.
outlier_detection:
consecutive_failure: 30
success_rate_stdev: 1
interval: 3s
base_ejection_time: 3s
Panic routing thresholds ensures that we don't eject everything.
Consistent hashing
cluster:
name: locationsredis
lb_policy: ring_hash
Ketama algorithm
Initialization: Hash each server n times to an integer
e.g. hash( 10.0.0.1_1) = 15
Request:
1. Hash a key to an integer
e.g. GET lyft ➝ hash(lyft) = 10
2. Search for the range that
contains the key
Larger n?
- Better distribution
- Longer ring initialization
- Longer search time
1
15
Partitioning
localhost:6379
…
SET msg hello
INCR comm
MGET lyft hello
SET msg hello
GET hello
INCR comm
GET lyft
OK
1
nil
To the application, the proxy looks like a single instance of Redis.
Unsupported commands
Any command with multiple keys is generally unsupported.
Example:
SUNION key1 key2
Solution:
"Hash tagging" designates a portion of the key for hashing.
SUNION {key}1 {key}2
Maintaining
High Availability
Recovering from failure
When an instance is lost, rebuild the ring
When a new instance takes its place, rebuild the ring
t0 t1
t2
Consistent hashing only re-allocates a portion of the keyspace.
More rebuilding
When an instance is lost, rebuild the ring
When a new instance takes its place, rebuild the ring
When active healthcheck fails, rebuild the ring
When outlier detection eject, rebuild the ring
Optimization required!
B U S Y
Consistent hashing
Maglev hashing algorithm
- 10x faster ring build
- 5x faster selection
- Less variance between hosts
- Slightly higher key redistribution
on membership change
Fault injection
Now
- Chaos Monkey
- Envoy HTTP fault injection
- Latency
- Error
TODO
- TCP
- Redis-specific
- Target certain commands
openfip / redfi
Stats
Mix of stats from Envoy and Redis
- Per-backend RPS
- Command RPS
- CPU
- Memory
- Network
- Hit rate
- Key count
- Connection count
{% macro redis_cluster_stats(redis_cluster_name, alarm_thresholds) %}
redis-look
$ redis-look-monitor.py -n 2 --estimate-throughput
^C 32072 commands in 2.54 seconds (12605.22 cmd/s)
* top by key
count avg/s % key
136 53.45 0.4 count:1033422222177010026
136 53.45 0.4 count:1004894103322111029
* top by command
count avg/s % command
8198 3222.05 25.6 GET
6746 2651.37 21.0 ZREMRANGEBYSCORE
* top by command and key
count avg/s % command and key
115 45.20 0.4 GET healthcheck
115 45.20 0.4 GET params
* top by est. throughput
est. bytes count throughput throughput/s key
1MB 72 72MB 32MB attr:1004893923555550610
434B 99 42.0K 16.5K attr:1004897644432010001
Throughput cost of large keys is real.
redis-cli --bigkeys can identify
large keys, but sampled and without
frequency.
danielhochman / redis-look
Serialization
Benefits of smaller format
- Lower memory consumption, I/O
- Lower network I/O
- Lower serialization cost
708 bytes
70%
1012 bytes
(original)
190 bytes
18%
Load testing
- Injecting extra bytes
- Oplog replay at higher speed (difficult)
- Simulated Rides
- Practical load test in production
- Test business logic and infrastructure
- Weekly cadence
RPS
Time
Real
Simulated
Total
System during Load Test
Spectre
- First week of January
- 25%+ performance loss
- Identified required migrations with load testing
- Migrated half of fleet from C4 to C5
- Migration completed in 3 days
- 20% performance gain
CPU
Spectre week
Week before Spectre
Migrate to C5
Week over week Redis CPU
Time
Roadmap
- Envoy has feature parity with Nutcracker (except hash tagging)
- Documentation on minimal configuration for Envoy as Redis proxy
- Replication
- Request and response dumping (i.e. oplog)
Q&A
- Thanks!
- @danielhochman on GitHub and Twitter
- Participate in Envoy open source! envoyproxy / envoy
- Lyft is hiring. Talk to me or visit https://www.lyft.com/jobs.

More Related Content

What's hot

MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKYoungHeon (Roy) Kim
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...OpenStack Korea Community
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseAltinity Ltd
 
Hunting for Credentials Dumping in Windows Environment
Hunting for Credentials Dumping in Windows EnvironmentHunting for Credentials Dumping in Windows Environment
Hunting for Credentials Dumping in Windows EnvironmentTeymur Kheirkhabarov
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAltinity Ltd
 
NGINX: High Performance Load Balancing
NGINX: High Performance Load BalancingNGINX: High Performance Load Balancing
NGINX: High Performance Load BalancingNGINX, Inc.
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Ltd
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Load Balancing and Scaling with NGINX
Load Balancing and Scaling with NGINXLoad Balancing and Scaling with NGINX
Load Balancing and Scaling with NGINXNGINX, Inc.
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackRadhika Puthiyetath
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAltinity Ltd
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요Jo Hoon
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustAltinity Ltd
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovZeroTurnaround
 

What's hot (20)

MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
High Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouseHigh Performance, High Reliability Data Loading on ClickHouse
High Performance, High Reliability Data Loading on ClickHouse
 
SQL Database Mirroring setup
SQL Database Mirroring setupSQL Database Mirroring setup
SQL Database Mirroring setup
 
Hunting for Credentials Dumping in Windows Environment
Hunting for Credentials Dumping in Windows EnvironmentHunting for Credentials Dumping in Windows Environment
Hunting for Credentials Dumping in Windows Environment
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
Adventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree EngineAdventures with the ClickHouse ReplacingMergeTree Engine
Adventures with the ClickHouse ReplacingMergeTree Engine
 
NGINX: High Performance Load Balancing
NGINX: High Performance Load BalancingNGINX: High Performance Load Balancing
NGINX: High Performance Load Balancing
 
Altinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdfAltinity Quickstart for ClickHouse-2202-09-15.pdf
Altinity Quickstart for ClickHouse-2202-09-15.pdf
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Load Balancing and Scaling with NGINX
Load Balancing and Scaling with NGINXLoad Balancing and Scaling with NGINX
Load Balancing and Scaling with NGINX
 
Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Troubleshooting Apache Cloudstack
Troubleshooting Apache CloudstackTroubleshooting Apache Cloudstack
Troubleshooting Apache Cloudstack
 
All about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdfAll about Zookeeper and ClickHouse Keeper.pdf
All about Zookeeper and ClickHouse Keeper.pdf
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
왜 쿠버네티스는 systemd로 cgroup을 관리하려고 할까요
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTOClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
ClickHouse on Kubernetes, by Alexander Zaitsev, Altinity CTO
 
JVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir IvanovJVM JIT compilation overview by Vladimir Ivanov
JVM JIT compilation overview by Vladimir Ivanov
 

Similar to RedisConf18 - 2,000 Instances and Beyond

Using SLOs for Continuous Performance Optimizations of Your k8s Workloads
Using SLOs for Continuous Performance Optimizations of Your k8s WorkloadsUsing SLOs for Continuous Performance Optimizations of Your k8s Workloads
Using SLOs for Continuous Performance Optimizations of Your k8s WorkloadsScyllaDB
 
Zabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix
 
Windows Remote Management - EN
Windows Remote Management - ENWindows Remote Management - EN
Windows Remote Management - ENKirill Nikolaev
 
Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016Matt Warren
 
Training Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLITraining Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLIContinuent
 
4Developers: Time series databases
4Developers: Time series databases4Developers: Time series databases
4Developers: Time series databasesPROIDEA
 
Handy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemHandy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemSneha Inguva
 
Mod03 linking and accelerating
Mod03 linking and acceleratingMod03 linking and accelerating
Mod03 linking and acceleratingPeter Haase
 
A Guide to Event-Driven SRE-inspired DevOps
A Guide to Event-Driven SRE-inspired DevOpsA Guide to Event-Driven SRE-inspired DevOps
A Guide to Event-Driven SRE-inspired DevOpsAndreas Grabner
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityRicardo Jimenez-Peris
 
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]Chris Suszyński
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffTimescale
 
Spca2014 advanced share point troubleshooting hessing
Spca2014 advanced share point troubleshooting hessingSpca2014 advanced share point troubleshooting hessing
Spca2014 advanced share point troubleshooting hessingNCCOMMS
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan finalpreethaappan
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementRicardo Jimenez-Peris
 

Similar to RedisConf18 - 2,000 Instances and Beyond (20)

Using SLOs for Continuous Performance Optimizations of Your k8s Workloads
Using SLOs for Continuous Performance Optimizations of Your k8s WorkloadsUsing SLOs for Continuous Performance Optimizations of Your k8s Workloads
Using SLOs for Continuous Performance Optimizations of Your k8s Workloads
 
Zabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshop
 
Banv
BanvBanv
Banv
 
SAP consulting results
SAP consulting resultsSAP consulting results
SAP consulting results
 
Windows Remote Management - EN
Windows Remote Management - ENWindows Remote Management - EN
Windows Remote Management - EN
 
Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016Performance and how to measure it - ProgSCon London 2016
Performance and how to measure it - ProgSCon London 2016
 
Training Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLITraining Slides: 153 - Working with the CLI
Training Slides: 153 - Working with the CLI
 
Time series databases
Time series databasesTime series databases
Time series databases
 
4Developers: Time series databases
4Developers: Time series databases4Developers: Time series databases
4Developers: Time series databases
 
Handy Networking Tools and How to Use Them
Handy Networking Tools and How to Use ThemHandy Networking Tools and How to Use Them
Handy Networking Tools and How to Use Them
 
Mod03 linking and accelerating
Mod03 linking and acceleratingMod03 linking and accelerating
Mod03 linking and accelerating
 
Load Data Fast!
Load Data Fast!Load Data Fast!
Load Data Fast!
 
A Guide to Event-Driven SRE-inspired DevOps
A Guide to Event-Driven SRE-inspired DevOpsA Guide to Event-Driven SRE-inspired DevOps
A Guide to Event-Driven SRE-inspired DevOps
 
LeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo UniversityLeanXcale Presentation - Waterloo University
LeanXcale Presentation - Waterloo University
 
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
You need Event Mesh, not Service Mesh - Chris Suszynski [WJUG 301]
 
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade OffDatabases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off
 
Redis acc
Redis accRedis acc
Redis acc
 
Spca2014 advanced share point troubleshooting hessing
Spca2014 advanced share point troubleshooting hessingSpca2014 advanced share point troubleshooting hessing
Spca2014 advanced share point troubleshooting hessing
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
 
The End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional ManagementThe End of a Myth: Ultra-Scalable Transactional Management
The End of a Myth: Ultra-Scalable Transactional Management
 

More from Redis Labs

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Labs
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Redis Labs
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...Redis Labs
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020Redis Labs
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Redis Labs
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis Labs
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Redis Labs
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Redis Labs
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Redis Labs
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...Redis Labs
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Redis Labs
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Redis Labs
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Redis Labs
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020Redis Labs
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020Redis Labs
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Redis Labs
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Redis Labs
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Redis Labs
 

More from Redis Labs (20)

Redis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redisRedis Day Bangalore 2020 - Session state caching with redis
Redis Day Bangalore 2020 - Session state caching with redis
 
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
Protecting Your API with Redis by Jane Paek - Redis Day Seattle 2020
 
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
The Happy Marriage of Redis and Protobuf by Scott Haines of Twilio - Redis Da...
 
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
SQL, Redis and Kubernetes by Paul Stanton of Windocks - Redis Day Seattle 2020
 
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
Rust and Redis - Solving Problems for Kubernetes by Ravi Jagannathan of VMwar...
 
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of OracleRedis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
Redis for Data Science and Engineering by Dmitry Polyakovsky of Oracle
 
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
Practical Use Cases for ACLs in Redis 6 by Jamie Scott - Redis Day Seattle 2020
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
Leveraging Redis for System Monitoring by Adam McCormick of SBG - Redis Day S...
 
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
JSON in Redis - When to use RedisJSON by Jay Won of Coupang - Redis Day Seatt...
 
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
Highly Available Persistent Session Management Service by Mohamed Elmergawi o...
 
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
Anatomy of a Redis Command by Madelyn Olson of Amazon Web Services - Redis Da...
 
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
Building a Multi-dimensional Analytics Engine with RedisGraph by Matthew Goos...
 
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
RediSearch 1.6 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
RedisGraph 2.0 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
RedisTimeSeries 1.2 by Pieter Cailliau - Redis Day Bangalore 2020
 
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
RedisAI 0.9 by Sherin Thomas of Tensorwerk - Redis Day Bangalore 2020
 
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
Rate-Limiting 30 Million requests by Vijay Lakshminarayanan and Girish Koundi...
 
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
Three Pillars of Observability by Rajalakshmi Raji Srinivasan of Site24x7 Zoh...
 
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
Solving Complex Scaling Problems by Prashant Kumar and Abhishek Jain of Myntr...
 

Recently uploaded

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

RedisConf18 - 2,000 Instances and Beyond

  • 1. Daniel Hochman, Engineer 1,000 2,000 Instances and Beyond
  • 2. Agenda - From the ground up - Provisioning - Clustering - Maintaining high availability - Handling system failures - Observability - Load testing - Roadmap
  • 3. Case study: scaling a geospatial index Operating Redis on the Lyft platform RedisConf 2017
  • 4. By the numbers 2017 50 clusters 750 instances 15M QPS peak Twemproxy 2018 64 clusters (+14) 2,000 instances (+1,250) 25M QPS peak (+10M) Envoy Migrated entire Redis infrastructure.
  • 5. Consistency? - Lyft runs with no replication - No AOF, no RDB - "Best-effort" - No consistency guarantees - If an instance is lost, data is gone Real-time nature of service means most data is dynamic and refreshed often.
  • 7. Provisioning clusters - Every Redis cluster is an EC2 autoscaling group - Each service defines and deploys its own cluster asg.present: - name: locationsredis - image: ubuntu16_base - launch_config: - cloud_init: #!/bin/bash NAME=locationsredis SERVICE=redis curl s3://provision.sh | sh - instance_type: c5.large - min_size: 60 - max_size: 60
  • 8. Provisioning instances - Central provisioning templates - Include and override include /etc/lyft/redis/redis-defaults.conf # overrides bind 0.0.0.0 save "" port {{ port }} maxmemory-policy {{ get(maxmemory_policy, 'allkeys-lru') }} {% if environment == 'production' %} rename-command KEYS "" rename-command CONFIG CAREFULCONFIG {% endif %}
  • 9. Twemproxy (deprecated) - Also known as Nutcracker - Unmaintained, replaced with closed-source - No active healthcheck - No hot restart (config changes cause downtime) - Difficult to extend (e.g. to integrate with instance discovery) Commits
  • 10. Envoy Proxy - Open-source - Built for edge and mesh networking - Observability: stats, stats, stats - Dynamic configuration - Pluggable architecture - Out-of-process - Thriving ecosystem - Redis, DynamoDB, MongoDB codecs
  • 11. Discovery discovery GET /members/locationsredis POST /members/locationsredis Membership is eventually consistent. … 30s 60s locationsredis: - 10.0.0.1:6379, 40s ago - 10.0.0.2:6379, 23s ago ... - 10.0.0.9:6379, 12s ago
  • 12. Active healthchecking > PING "PONG" > EXISTS _maintenance_ (integer) 0 > SET _maintenance_ true OK > EXISTS _maintenance_ (integer) 1 Send a command periodically to check for a healthy response. healthcheck: unhealthy_threshold: 3 healthy_threshold: 2 interval: 5s interval_jitter: 0.1s
  • 13. Passive healthchecking Monitor the success and failure of operations and eject outliers. outlier_detection: consecutive_failure: 30 success_rate_stdev: 1 interval: 3s base_ejection_time: 3s Panic routing thresholds ensures that we don't eject everything.
  • 14. Consistent hashing cluster: name: locationsredis lb_policy: ring_hash Ketama algorithm Initialization: Hash each server n times to an integer e.g. hash( 10.0.0.1_1) = 15 Request: 1. Hash a key to an integer e.g. GET lyft ➝ hash(lyft) = 10 2. Search for the range that contains the key Larger n? - Better distribution - Longer ring initialization - Longer search time 1 15
  • 15. Partitioning localhost:6379 … SET msg hello INCR comm MGET lyft hello SET msg hello GET hello INCR comm GET lyft OK 1 nil To the application, the proxy looks like a single instance of Redis.
  • 16. Unsupported commands Any command with multiple keys is generally unsupported. Example: SUNION key1 key2 Solution: "Hash tagging" designates a portion of the key for hashing. SUNION {key}1 {key}2
  • 18. Recovering from failure When an instance is lost, rebuild the ring When a new instance takes its place, rebuild the ring t0 t1 t2 Consistent hashing only re-allocates a portion of the keyspace.
  • 19. More rebuilding When an instance is lost, rebuild the ring When a new instance takes its place, rebuild the ring When active healthcheck fails, rebuild the ring When outlier detection eject, rebuild the ring Optimization required! B U S Y
  • 20. Consistent hashing Maglev hashing algorithm - 10x faster ring build - 5x faster selection - Less variance between hosts - Slightly higher key redistribution on membership change
  • 21. Fault injection Now - Chaos Monkey - Envoy HTTP fault injection - Latency - Error TODO - TCP - Redis-specific - Target certain commands openfip / redfi
  • 22. Stats Mix of stats from Envoy and Redis - Per-backend RPS - Command RPS - CPU - Memory - Network - Hit rate - Key count - Connection count {% macro redis_cluster_stats(redis_cluster_name, alarm_thresholds) %}
  • 23. redis-look $ redis-look-monitor.py -n 2 --estimate-throughput ^C 32072 commands in 2.54 seconds (12605.22 cmd/s) * top by key count avg/s % key 136 53.45 0.4 count:1033422222177010026 136 53.45 0.4 count:1004894103322111029 * top by command count avg/s % command 8198 3222.05 25.6 GET 6746 2651.37 21.0 ZREMRANGEBYSCORE * top by command and key count avg/s % command and key 115 45.20 0.4 GET healthcheck 115 45.20 0.4 GET params * top by est. throughput est. bytes count throughput throughput/s key 1MB 72 72MB 32MB attr:1004893923555550610 434B 99 42.0K 16.5K attr:1004897644432010001 Throughput cost of large keys is real. redis-cli --bigkeys can identify large keys, but sampled and without frequency. danielhochman / redis-look
  • 24. Serialization Benefits of smaller format - Lower memory consumption, I/O - Lower network I/O - Lower serialization cost 708 bytes 70% 1012 bytes (original) 190 bytes 18%
  • 25. Load testing - Injecting extra bytes - Oplog replay at higher speed (difficult) - Simulated Rides - Practical load test in production - Test business logic and infrastructure - Weekly cadence RPS Time Real Simulated Total System during Load Test
  • 26. Spectre - First week of January - 25%+ performance loss - Identified required migrations with load testing - Migrated half of fleet from C4 to C5 - Migration completed in 3 days - 20% performance gain CPU Spectre week Week before Spectre Migrate to C5 Week over week Redis CPU Time
  • 27. Roadmap - Envoy has feature parity with Nutcracker (except hash tagging) - Documentation on minimal configuration for Envoy as Redis proxy - Replication - Request and response dumping (i.e. oplog)
  • 28. Q&A - Thanks! - @danielhochman on GitHub and Twitter - Participate in Envoy open source! envoyproxy / envoy - Lyft is hiring. Talk to me or visit https://www.lyft.com/jobs.