SlideShare a Scribd company logo
1 of 17
Download to read offline
Redis Cluster
a pragmatic approach to distribution
All nodes are directly
connected with a
service channel.
TCP baseport+4000,
example 6379 ->
10379.
Node to Node protocol
is binary, optimized for
bandwidth and speed.
Clients talk to nodes as
usually, using ascii
protocol, with minor
additions.
Nodes don't proxy
queries.
What nodes talk about?



PING: are you ok dude?              PONG: Sure I'm ok dude!
I'm master for XYZ hash slots.      I'm master for XYZ hash slots.
Config is FF89X1JK                  Config is FF89X1JK

Gossip: this are info about         Gossip: I want to share with
other nodes I'm in touch with:      you some info about random
                                    nodes:
A replies to my ping, I think its
state is OK.                        C and D are fine and replied in
B is idle, I guess it's having      time.
problems but I need some            But B is idle for me as well!
ACK.                                IMHO it's down!.
Hash slots
keyspace is divided into 4096 hash slots. But in this example
we'll assume they are just ten, from 0 to 9 ;)

Different nodes will hold a subset of hash slots.




A given key "foo" is at slot:

  slot = crc16("foo") mod NUMER_SLOTS
Master and Slave nodes
Nodes are all connected and functionally equivalent, but
actually there are two kind of nodes: slave and master nodes:
Redundancy
In the example there are two replicas per every master node, so
up to two random nodes can go down without issues.



                                    Working with two nodes
                                    down is guaranteed, but in
                                    the best case the cluster
                                    will continue to work as
                                    long as there is at least
                                    one node for every hash
                                    slot.
What this means so far?
  Every key only exists in a single instance, plus N replicas
  that will never receive writes. So there is no merge, nor
  application-side inconsistency resolution.

  The price to pay is not resisting to net splits that are bigger
  than replicas-per-hashslot nodes down.

  Master and Slave nodes use the Redis Replication you
  already know.

  Every physical server will usually hold multiple nodes, both
  slaves and masters, but the redis-trib cluster manager
  program will try to allocate slaves and masters so that the
  replicas are in different physical servers.
Client requests - dummy client




1.   Client => A: GET foo
2.   A => Client: -MOVED 8 192.168.5.21:6391
3.   Client => B: GET foo
4.   B => Client: "bar"

-MOVED 8 ... this error means that hash slot 8 is located at
the specified IP/port, and the client should reissue the query
there.
Client requests - smart client




1.   Client => A: CLUSTER HINTS
2.   A => Client: ... a map of hash slots -> nodes
3.   Client => B: GET foo
4.   B => Client: "bar"
Client requests
  Dummy, single-connection clients, will work with minimal
  modifications to existing client code base. Just try a random
  node among a list, then reissue the query if needed.

  Smart clients will take persistent connections to many
  nodes, will cache hashslot -> node info, and will update the
  table when they receive a -MOVED error.

  This schema is always horizontally scalable, and low
  latency if the clients are smart.

  Especially in large clusters where clients will try to have
  many persistent connections to multiple nodes, the Redis
  client object should be shared.
Re-sharding




 We are experiencing too much load. Let's add a new server.
 Node C marks his slot 7 as "MOVING to D"
 Every time C receives a request about slot 7, if the key is
 actually in C, it replies, otherwise it replies with -ASK D
 -ASK is like -MOVED but the difference is that the client
 should retry against D only this query, not next queries.
 That means: smart clients should not update internal state.
Re-sharding - moving data




 All the new keys for slot 7 will be created / updated in D.
 All the old keys in C will be moved to D by redis-trib using
 the MIGRATE command.
 MIGRATE is an atomic command, it will transfer a key from
 C to D, and will remove the key in C when we get the OK
 from D. So no race is possible.
 p.s. MIGRATE is an exported command. Have fun...
 Open problem: ask C the next key in hash slot N, efficiently.
Re-sharding with failing nodes
  Nodes can fail while resharding. It's slave
  promotion as usually.

  The redis-trib utility is executed by the
  sysadmin. Will exit and warn when
  something is not ok as will check the
  cluster config continuously while
  resharding.
Fault tolerance
  All nodes continuously
  ping other nodes...

  A node marks another
  node as possibly failing
  when there is a timeout
  longer than N seconds.

  Every PING and PONG
  packet contain a gossip
  section: information
  about other nodes idle
  times, from the point of
  view of the sending
  node.
Fault tolerance - failing nodes
  A guesses B is failing, as the latest PING request timed out.
  A will not take any action without any other hint.

  C sends a PONG to A, with the gossip section containing
  information about B: C also thinks B is failing.

  At this point A marks B as failed, and notifies the
  information to all the other nodes in the cluster, that will
  mark the node as failing.

  If B will ever return back, the first time he'll ping any node of
  the cluster, it will be notified to shut down ASAP, as
  intermitting clients are not good for the clients.

  Only way to rejoin a Redis cluster after massive crash
  is: redis-trib by hand.
Redis-trib - the Redis Cluster Manager

  It is used to setup a new cluster, once you start N blank
  nodes.

  it is used to check if the cluster is consistent. And to fix it if
  the cluster can't continue, as there are hash slots without a
  single node.

  It is used to add new nodes to the cluster, either as slaves
  of an already existing master node, or as blank nodes
  where we can re-shard a few hash slots to lower other
  nodes load.
It's more complex than this...

  there are many details that can't fit a 20 minutes
  presentation...
  Ping/Pong packets contain enough information for the
  cluster to restart after graceful stop. But the sysadmin can
  use CLUSTER MEET command to make sure nodes will
  engage if IP changed and so forth.
  Every node has a unique ID, and a cluster config file.
  Everytime the config changes the cluster config file is
  saved.
  The cluster config file can't be edited by humans.
  The node ID never changes for a given node.
  Questions?

More Related Content

What's hot

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Redis Introduction
Redis IntroductionRedis Introduction
Redis IntroductionAlex Su
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introductionchrislusf
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSHSage Weil
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack PresentationAmr Alaa Yassen
 
ELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementEl Mahdi Benzekri
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
 

What's hot (20)

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Redis and it's data types
Redis and it's data typesRedis and it's data types
Redis and it's data types
 
Redis Introduction
Redis IntroductionRedis Introduction
Redis Introduction
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
SeaweedFS introduction
SeaweedFS introductionSeaweedFS introduction
SeaweedFS introduction
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Redis Persistence
Redis  PersistenceRedis  Persistence
Redis Persistence
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
ELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log Management
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 

Viewers also liked

Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...NoSQLmatters
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askCarlos Abalde
 
Redis/Lessons learned
Redis/Lessons learnedRedis/Lessons learned
Redis/Lessons learnedTit Petric
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. RedisTim Lossen
 
美团点评技术沙龙010-Redis Cluster运维实践
美团点评技术沙龙010-Redis Cluster运维实践美团点评技术沙龙010-Redis Cluster运维实践
美团点评技术沙龙010-Redis Cluster运维实践美团点评技术团队
 
Redis 常见使用模式分析
Redis 常见使用模式分析Redis 常见使用模式分析
Redis 常见使用模式分析vincent253
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in PracticeNoah Davis
 
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路美团点评技术团队
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
美团点评沙龙012-从零到千万量级的实时物流平台架构实践
美团点评沙龙012-从零到千万量级的实时物流平台架构实践美团点评沙龙012-从零到千万量级的实时物流平台架构实践
美团点评沙龙012-从零到千万量级的实时物流平台架构实践美团点评技术团队
 
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化美团点评技术团队
 
深入了解Redis
深入了解Redis深入了解Redis
深入了解Redisiammutex
 
美团点评技术沙龙09 - 外卖O2O的用户画像实践
美团点评技术沙龙09 - 外卖O2O的用户画像实践美团点评技术沙龙09 - 外卖O2O的用户画像实践
美团点评技术沙龙09 - 外卖O2O的用户画像实践美团点评技术团队
 
Managing Redis with Kubernetes - Kelsey Hightower, Google
Managing Redis with Kubernetes - Kelsey Hightower, GoogleManaging Redis with Kubernetes - Kelsey Hightower, Google
Managing Redis with Kubernetes - Kelsey Hightower, GoogleRedis Labs
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Redis Labs
 
Redis replication dcshi
Redis replication dcshiRedis replication dcshi
Redis replication dcshidcshi
 
Redis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsRedis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsDave Nielsen
 
Redis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Labs
 
Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Labs
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecaseKris Jeong
 

Viewers also liked (20)

Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
Salvatore Sanfilippo – How Redis Cluster works, and why - NoSQL matters Barce...
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
 
Redis/Lessons learned
Redis/Lessons learnedRedis/Lessons learned
Redis/Lessons learned
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. Redis
 
美团点评技术沙龙010-Redis Cluster运维实践
美团点评技术沙龙010-Redis Cluster运维实践美团点评技术沙龙010-Redis Cluster运维实践
美团点评技术沙龙010-Redis Cluster运维实践
 
Redis 常见使用模式分析
Redis 常见使用模式分析Redis 常见使用模式分析
Redis 常见使用模式分析
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in Practice
 
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路
美团点评沙龙 飞行中换引擎--美团配送业务系统的架构演进之路
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
美团点评沙龙012-从零到千万量级的实时物流平台架构实践
美团点评沙龙012-从零到千万量级的实时物流平台架构实践美团点评沙龙012-从零到千万量级的实时物流平台架构实践
美团点评沙龙012-从零到千万量级的实时物流平台架构实践
 
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化
美团点评技术沙龙09 - 美团外卖中的单量预估及列表优化
 
深入了解Redis
深入了解Redis深入了解Redis
深入了解Redis
 
美团点评技术沙龙09 - 外卖O2O的用户画像实践
美团点评技术沙龙09 - 外卖O2O的用户画像实践美团点评技术沙龙09 - 外卖O2O的用户画像实践
美团点评技术沙龙09 - 外卖O2O的用户画像实践
 
Managing Redis with Kubernetes - Kelsey Hightower, Google
Managing Redis with Kubernetes - Kelsey Hightower, GoogleManaging Redis with Kubernetes - Kelsey Hightower, Google
Managing Redis with Kubernetes - Kelsey Hightower, Google
 
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
Dynomite: A Highly Available, Distributed and Scalable Dynamo Layer--Ioannis ...
 
Redis replication dcshi
Redis replication dcshiRedis replication dcshi
Redis replication dcshi
 
Redis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale AppsRedis Functions, Data Structures for Web Scale Apps
Redis Functions, Data Structures for Web Scale Apps
 
Redis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs Talks
 
Redis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis LabsRedis Day Keynote Salvatore Sanfillipo Redis Labs
Redis Day Keynote Salvatore Sanfillipo Redis Labs
 
This is redis - feature and usecase
This is redis - feature and usecaseThis is redis - feature and usecase
This is redis - feature and usecase
 

Similar to Redis cluster

Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterAnne Nicolas
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandraaaronmorton
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking VN
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATSThe Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATSApcera
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS NATS
 
The Zen of High Performance Messaging with NATS (Strange Loop 2016)
The Zen of High Performance Messaging with NATS (Strange Loop 2016)The Zen of High Performance Messaging with NATS (Strange Loop 2016)
The Zen of High Performance Messaging with NATS (Strange Loop 2016)wallyqs
 
lecture8.ppt
lecture8.pptlecture8.ppt
lecture8.pptImXaib
 
Oslo.Messaging new 0mq driver proposal
Oslo.Messaging new 0mq driver proposalOslo.Messaging new 0mq driver proposal
Oslo.Messaging new 0mq driver proposaldavanum
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...idsecconf
 
Meetup docker using software defined networks
Meetup docker   using software defined networksMeetup docker   using software defined networks
Meetup docker using software defined networksOCTO Technology
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksAdrien Blind
 

Similar to Redis cluster (20)

Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATSThe Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS
 
The Zen of High Performance Messaging with NATS (Strange Loop 2016)
The Zen of High Performance Messaging with NATS (Strange Loop 2016)The Zen of High Performance Messaging with NATS (Strange Loop 2016)
The Zen of High Performance Messaging with NATS (Strange Loop 2016)
 
IP Multicast on ec2
IP Multicast on ec2IP Multicast on ec2
IP Multicast on ec2
 
lecture8.ppt
lecture8.pptlecture8.ppt
lecture8.ppt
 
Oslo.Messaging new 0mq driver proposal
Oslo.Messaging new 0mq driver proposalOslo.Messaging new 0mq driver proposal
Oslo.Messaging new 0mq driver proposal
 
Layer2&arp
Layer2&arpLayer2&arp
Layer2&arp
 
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
 
R bernardino hand_in_assignment_week_1
R bernardino hand_in_assignment_week_1R bernardino hand_in_assignment_week_1
R bernardino hand_in_assignment_week_1
 
Neutron Deep Dive
Neutron Deep Dive Neutron Deep Dive
Neutron Deep Dive
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Meetup docker using software defined networks
Meetup docker   using software defined networksMeetup docker   using software defined networks
Meetup docker using software defined networks
 
Docker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined Networks
 
Computer Networking Assignment Help
Computer Networking Assignment HelpComputer Networking Assignment Help
Computer Networking Assignment Help
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 

More from iammutex

Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagramiammutex
 
Redis深入浅出
Redis深入浅出Redis深入浅出
Redis深入浅出iammutex
 
NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析iammutex
 
MongoDB 在盛大大数据量下的应用
MongoDB 在盛大大数据量下的应用MongoDB 在盛大大数据量下的应用
MongoDB 在盛大大数据量下的应用iammutex
 
8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slide8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slideiammutex
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Modelsiammutex
 
Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告iammutex
 
redis 适用场景与实现
redis 适用场景与实现redis 适用场景与实现
redis 适用场景与实现iammutex
 
Introduction to couchdb
Introduction to couchdbIntroduction to couchdb
Introduction to couchdbiammutex
 
What every data programmer needs to know about disks
What every data programmer needs to know about disksWhat every data programmer needs to know about disks
What every data programmer needs to know about disksiammutex
 
redis运维之道
redis运维之道redis运维之道
redis运维之道iammutex
 
Realtime hadoopsigmod2011
Realtime hadoopsigmod2011Realtime hadoopsigmod2011
Realtime hadoopsigmod2011iammutex
 
[译]No sql生态系统
[译]No sql生态系统[译]No sql生态系统
[译]No sql生态系统iammutex
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbaseiammutex
 
Redis cluster
Redis clusterRedis cluster
Redis clusteriammutex
 
Hadoop introduction berlin buzzwords 2011
Hadoop introduction   berlin buzzwords 2011Hadoop introduction   berlin buzzwords 2011
Hadoop introduction berlin buzzwords 2011iammutex
 
Couchdb and me
Couchdb and meCouchdb and me
Couchdb and meiammutex
 

More from iammutex (20)

Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagram
 
Redis深入浅出
Redis深入浅出Redis深入浅出
Redis深入浅出
 
NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析NoSQL误用和常见陷阱分析
NoSQL误用和常见陷阱分析
 
MongoDB 在盛大大数据量下的应用
MongoDB 在盛大大数据量下的应用MongoDB 在盛大大数据量下的应用
MongoDB 在盛大大数据量下的应用
 
8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slide8 minute MongoDB tutorial slide
8 minute MongoDB tutorial slide
 
skip list
skip listskip list
skip list
 
Thoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency ModelsThoughts on Transaction and Consistency Models
Thoughts on Transaction and Consistency Models
 
Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告Rethink db&tokudb调研测试报告
Rethink db&tokudb调研测试报告
 
redis 适用场景与实现
redis 适用场景与实现redis 适用场景与实现
redis 适用场景与实现
 
Introduction to couchdb
Introduction to couchdbIntroduction to couchdb
Introduction to couchdb
 
What every data programmer needs to know about disks
What every data programmer needs to know about disksWhat every data programmer needs to know about disks
What every data programmer needs to know about disks
 
Ooredis
OoredisOoredis
Ooredis
 
Ooredis
OoredisOoredis
Ooredis
 
redis运维之道
redis运维之道redis运维之道
redis运维之道
 
Realtime hadoopsigmod2011
Realtime hadoopsigmod2011Realtime hadoopsigmod2011
Realtime hadoopsigmod2011
 
[译]No sql生态系统
[译]No sql生态系统[译]No sql生态系统
[译]No sql生态系统
 
Couchdb + Membase = Couchbase
Couchdb + Membase = CouchbaseCouchdb + Membase = Couchbase
Couchdb + Membase = Couchbase
 
Redis cluster
Redis clusterRedis cluster
Redis cluster
 
Hadoop introduction berlin buzzwords 2011
Hadoop introduction   berlin buzzwords 2011Hadoop introduction   berlin buzzwords 2011
Hadoop introduction berlin buzzwords 2011
 
Couchdb and me
Couchdb and meCouchdb and me
Couchdb and me
 

Redis cluster

  • 1. Redis Cluster a pragmatic approach to distribution
  • 2. All nodes are directly connected with a service channel. TCP baseport+4000, example 6379 -> 10379. Node to Node protocol is binary, optimized for bandwidth and speed. Clients talk to nodes as usually, using ascii protocol, with minor additions. Nodes don't proxy queries.
  • 3. What nodes talk about? PING: are you ok dude? PONG: Sure I'm ok dude! I'm master for XYZ hash slots. I'm master for XYZ hash slots. Config is FF89X1JK Config is FF89X1JK Gossip: this are info about Gossip: I want to share with other nodes I'm in touch with: you some info about random nodes: A replies to my ping, I think its state is OK. C and D are fine and replied in B is idle, I guess it's having time. problems but I need some But B is idle for me as well! ACK. IMHO it's down!.
  • 4. Hash slots keyspace is divided into 4096 hash slots. But in this example we'll assume they are just ten, from 0 to 9 ;) Different nodes will hold a subset of hash slots. A given key "foo" is at slot: slot = crc16("foo") mod NUMER_SLOTS
  • 5. Master and Slave nodes Nodes are all connected and functionally equivalent, but actually there are two kind of nodes: slave and master nodes:
  • 6. Redundancy In the example there are two replicas per every master node, so up to two random nodes can go down without issues. Working with two nodes down is guaranteed, but in the best case the cluster will continue to work as long as there is at least one node for every hash slot.
  • 7. What this means so far? Every key only exists in a single instance, plus N replicas that will never receive writes. So there is no merge, nor application-side inconsistency resolution. The price to pay is not resisting to net splits that are bigger than replicas-per-hashslot nodes down. Master and Slave nodes use the Redis Replication you already know. Every physical server will usually hold multiple nodes, both slaves and masters, but the redis-trib cluster manager program will try to allocate slaves and masters so that the replicas are in different physical servers.
  • 8. Client requests - dummy client 1. Client => A: GET foo 2. A => Client: -MOVED 8 192.168.5.21:6391 3. Client => B: GET foo 4. B => Client: "bar" -MOVED 8 ... this error means that hash slot 8 is located at the specified IP/port, and the client should reissue the query there.
  • 9. Client requests - smart client 1. Client => A: CLUSTER HINTS 2. A => Client: ... a map of hash slots -> nodes 3. Client => B: GET foo 4. B => Client: "bar"
  • 10. Client requests Dummy, single-connection clients, will work with minimal modifications to existing client code base. Just try a random node among a list, then reissue the query if needed. Smart clients will take persistent connections to many nodes, will cache hashslot -> node info, and will update the table when they receive a -MOVED error. This schema is always horizontally scalable, and low latency if the clients are smart. Especially in large clusters where clients will try to have many persistent connections to multiple nodes, the Redis client object should be shared.
  • 11. Re-sharding We are experiencing too much load. Let's add a new server. Node C marks his slot 7 as "MOVING to D" Every time C receives a request about slot 7, if the key is actually in C, it replies, otherwise it replies with -ASK D -ASK is like -MOVED but the difference is that the client should retry against D only this query, not next queries. That means: smart clients should not update internal state.
  • 12. Re-sharding - moving data All the new keys for slot 7 will be created / updated in D. All the old keys in C will be moved to D by redis-trib using the MIGRATE command. MIGRATE is an atomic command, it will transfer a key from C to D, and will remove the key in C when we get the OK from D. So no race is possible. p.s. MIGRATE is an exported command. Have fun... Open problem: ask C the next key in hash slot N, efficiently.
  • 13. Re-sharding with failing nodes Nodes can fail while resharding. It's slave promotion as usually. The redis-trib utility is executed by the sysadmin. Will exit and warn when something is not ok as will check the cluster config continuously while resharding.
  • 14. Fault tolerance All nodes continuously ping other nodes... A node marks another node as possibly failing when there is a timeout longer than N seconds. Every PING and PONG packet contain a gossip section: information about other nodes idle times, from the point of view of the sending node.
  • 15. Fault tolerance - failing nodes A guesses B is failing, as the latest PING request timed out. A will not take any action without any other hint. C sends a PONG to A, with the gossip section containing information about B: C also thinks B is failing. At this point A marks B as failed, and notifies the information to all the other nodes in the cluster, that will mark the node as failing. If B will ever return back, the first time he'll ping any node of the cluster, it will be notified to shut down ASAP, as intermitting clients are not good for the clients. Only way to rejoin a Redis cluster after massive crash is: redis-trib by hand.
  • 16. Redis-trib - the Redis Cluster Manager It is used to setup a new cluster, once you start N blank nodes. it is used to check if the cluster is consistent. And to fix it if the cluster can't continue, as there are hash slots without a single node. It is used to add new nodes to the cluster, either as slaves of an already existing master node, or as blank nodes where we can re-shard a few hash slots to lower other nodes load.
  • 17. It's more complex than this... there are many details that can't fit a 20 minutes presentation... Ping/Pong packets contain enough information for the cluster to restart after graceful stop. But the sysadmin can use CLUSTER MEET command to make sure nodes will engage if IP changed and so forth. Every node has a unique ID, and a cluster config file. Everytime the config changes the cluster config file is saved. The cluster config file can't be edited by humans. The node ID never changes for a given node. Questions?