SlideShare a Scribd company logo
1 of 23
Download to read offline
Sharding: patterns and 
antipatterns 
Konstantin Osipov (Mail.Ru, Tarantool) 
Alexey Rybak (Badoo)
Big picture: scalable databases 
● replication 
● sharding and re-sharding 
● distributed queries & jobs, Map/Reduce 
● DDL 
● will focus on sharding/re-sharding only
Contents 
I. sharding function 
II. routing 
III.re-sharding
I. Sharding function
Selecting a good shard key 
● the identified object 
should be small 
● some data you won’t be 
able to shard (and have to 
duplicate in each shard) 
● don’t store the key if you 
don’t have to
Good and bad shard keys 
● good: user session, shopping order 
● maybe: user (if user data isn’t too thick) 
● bad: inventory item, order date
Garage sharding: numbers 
● replication based doubling (2, 4, 8, out of 
cash) 
● the magic number 48 (2✕3✕4)
Garage sharding thru hashing 
● good: remainders 
o f(key) ≡ key % n_srv 
o f(key) ≡ crc32(key) % n_srv 
● bad: first login letter
Sharding for grown-ups 
● table function 
● consistent hashing
Table functions 
● virtual buckets: key -> bucket -> shard 
o “key -> bucket” function, “bucket -> shard” table 
o “key -> bucket” table, “bucket -> shard” table
Consistent hashing 
● Danny Lewin RIP 
● Kinda ring and like... 
uhm... points, you 
know ... 
● Libraries: Ketama
Guava/Sumbur 
● f(key, n_servers) => server_id 
● strictly uniform key-to-server mapping 
● recurrence formula (15 lines of code)
II. Routing
Routing types 
● smart client 
● coordinator 
● proxy 
● local proxy on every app server 
● intra-database routing
Smart Client 
● no extra hops 
● all clients 
(PHP/Python/C...) 
should implement 
● resharding is hard
Proxy 
● encapsulates routing logic 
● extra hop, traffic 
● +1 service 
● SPOF 
=> local proxy
Coordinator 
● centralized 
knowledge 
● SPOF
Intra-database routing 
● too many nodes 
● redundancy is high 
● ad-hoc requests
III.Re-sharding
Re-sharding is a pain 
● redistribution impacts: 
o clients 
o network performance 
o consistency 
=> maintenance time window 
● forget about it on petabyte scale
Best practice: no data redistribution 
● update is a move 
● data expiration (new data on new servers) 
● new data on selected servers
DDL 
● upgrade your app 
● upgrade your database 
● update your app and remove any trace of old 
schema
Thank you! Questions? 
kostja@tarantool.org 
fisher@corp.badoo.com

More Related Content

What's hot

Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelSri Ambati
 
.NET Memory Primer (Martin Kulov)
.NET Memory Primer (Martin Kulov).NET Memory Primer (Martin Kulov)
.NET Memory Primer (Martin Kulov)ITCamp
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Taiwan User Group
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokesGagan Bajpai
 
Tms training
Tms trainingTms training
Tms trainingChi Lee
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsSpeedment, Inc.
 
Druid beginner performance tips
Druid beginner performance tipsDruid beginner performance tips
Druid beginner performance tipsvishnu rao
 
Analytical data processing
Analytical data processingAnalytical data processing
Analytical data processingPolad Saruxanov
 
Data Lessons Learned at Scale
Data Lessons Learned at ScaleData Lessons Learned at Scale
Data Lessons Learned at ScaleCharlie Reverte
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodbDeep Kapadia
 

What's hot (19)

Web scale monitoring
Web scale monitoringWeb scale monitoring
Web scale monitoring
 
Hadoop @ eBuddy
Hadoop @ eBuddyHadoop @ eBuddy
Hadoop @ eBuddy
 
Geo data analytics
Geo data analyticsGeo data analytics
Geo data analytics
 
Caffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noelCaffe + H2O - By Cyprien noel
Caffe + H2O - By Cyprien noel
 
Cimagraphi8
Cimagraphi8Cimagraphi8
Cimagraphi8
 
.NET Memory Primer (Martin Kulov)
.NET Memory Primer (Martin Kulov).NET Memory Primer (Martin Kulov)
.NET Memory Primer (Martin Kulov)
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
 
Scalability broad strokes
Scalability   broad strokesScalability   broad strokes
Scalability broad strokes
 
Tms training
Tms trainingTms training
Tms training
 
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMsJava one2015 - Work With Hundreds of Hot Terabytes in JVMs
Java one2015 - Work With Hundreds of Hot Terabytes in JVMs
 
Druid beginner performance tips
Druid beginner performance tipsDruid beginner performance tips
Druid beginner performance tips
 
Analytical data processing
Analytical data processingAnalytical data processing
Analytical data processing
 
Demonstration
DemonstrationDemonstration
Demonstration
 
Data Lessons Learned at Scale
Data Lessons Learned at ScaleData Lessons Learned at Scale
Data Lessons Learned at Scale
 
Redis Beyond
Redis BeyondRedis Beyond
Redis Beyond
 
Mongodb meetup
Mongodb meetupMongodb meetup
Mongodb meetup
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
 
Quick overview of MongoDB
Quick overview of MongoDBQuick overview of MongoDB
Quick overview of MongoDB
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 

Similar to Sharding: patterns and antipatterns (Osipov, Rybak, HighLoad'2014)

Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)RichardWarburton
 
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Alexey Zinoviev
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Sv big datascience_cliffclick_5_2_2013
Sv big datascience_cliffclick_5_2_2013Sv big datascience_cliffclick_5_2_2013
Sv big datascience_cliffclick_5_2_2013Sri Ambati
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchEdward Capriolo
 
Which DBMS and Why?
Which DBMS and Why?Which DBMS and Why?
Which DBMS and Why?Majid Azimi
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingMartinStrycek
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013Andrew Dunstan
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDBPingCAP
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningFromDual GmbH
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...DevOpsDays Tel Aviv
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3 Omid Vahdaty
 
Austin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_dataAustin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_dataAlex Pinkin
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 

Similar to Sharding: patterns and antipatterns (Osipov, Rybak, HighLoad'2014) (20)

Caching in
Caching inCaching in
Caching in
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
Python's slippy path and Tao of thick Pandas: give my data, Rrrrr...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Druid
DruidDruid
Druid
 
Caching in
Caching inCaching in
Caching in
 
2013 05 ny
2013 05 ny2013 05 ny
2013 05 ny
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Sv big datascience_cliffclick_5_2_2013
Sv big datascience_cliffclick_5_2_2013Sv big datascience_cliffclick_5_2_2013
Sv big datascience_cliffclick_5_2_2013
 
Web-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batchWeb-scale data processing: practical approaches for low-latency and batch
Web-scale data processing: practical approaches for low-latency and batch
 
Which DBMS and Why?
Which DBMS and Why?Which DBMS and Why?
Which DBMS and Why?
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processing
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...
Large Scale NoSql DB Migration Under Fire - Ido Barkan - DevOpsDays Tel Aviv ...
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
 
Austin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_dataAustin bdug 2011_01_27_small_and_big_data
Austin bdug 2011_01_27_small_and_big_data
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 

Recently uploaded

Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Recently uploaded (20)

Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Sharding: patterns and antipatterns (Osipov, Rybak, HighLoad'2014)

  • 1. Sharding: patterns and antipatterns Konstantin Osipov (Mail.Ru, Tarantool) Alexey Rybak (Badoo)
  • 2. Big picture: scalable databases ● replication ● sharding and re-sharding ● distributed queries & jobs, Map/Reduce ● DDL ● will focus on sharding/re-sharding only
  • 3. Contents I. sharding function II. routing III.re-sharding
  • 5. Selecting a good shard key ● the identified object should be small ● some data you won’t be able to shard (and have to duplicate in each shard) ● don’t store the key if you don’t have to
  • 6. Good and bad shard keys ● good: user session, shopping order ● maybe: user (if user data isn’t too thick) ● bad: inventory item, order date
  • 7. Garage sharding: numbers ● replication based doubling (2, 4, 8, out of cash) ● the magic number 48 (2✕3✕4)
  • 8. Garage sharding thru hashing ● good: remainders o f(key) ≡ key % n_srv o f(key) ≡ crc32(key) % n_srv ● bad: first login letter
  • 9. Sharding for grown-ups ● table function ● consistent hashing
  • 10. Table functions ● virtual buckets: key -> bucket -> shard o “key -> bucket” function, “bucket -> shard” table o “key -> bucket” table, “bucket -> shard” table
  • 11. Consistent hashing ● Danny Lewin RIP ● Kinda ring and like... uhm... points, you know ... ● Libraries: Ketama
  • 12. Guava/Sumbur ● f(key, n_servers) => server_id ● strictly uniform key-to-server mapping ● recurrence formula (15 lines of code)
  • 14. Routing types ● smart client ● coordinator ● proxy ● local proxy on every app server ● intra-database routing
  • 15. Smart Client ● no extra hops ● all clients (PHP/Python/C...) should implement ● resharding is hard
  • 16. Proxy ● encapsulates routing logic ● extra hop, traffic ● +1 service ● SPOF => local proxy
  • 17. Coordinator ● centralized knowledge ● SPOF
  • 18. Intra-database routing ● too many nodes ● redundancy is high ● ad-hoc requests
  • 20. Re-sharding is a pain ● redistribution impacts: o clients o network performance o consistency => maintenance time window ● forget about it on petabyte scale
  • 21. Best practice: no data redistribution ● update is a move ● data expiration (new data on new servers) ● new data on selected servers
  • 22. DDL ● upgrade your app ● upgrade your database ● update your app and remove any trace of old schema
  • 23. Thank you! Questions? kostja@tarantool.org fisher@corp.badoo.com

Editor's Notes

  1. Если мы будем обсуждать тему за пивом, то шардинг будем обсуждать в широком смысле: и мы одновременно поднимем кучу других тем: как выбрать ключ по которому шарить, собственно шардинг, как выбрать функцию шардинга, то есть алгоритм разбиения данных по серверам как поддерживать систему: решардить данные при добавлении нод или замене выбывших DDL, то есть обслуживание схемы данных и эволюция схемы данных распаралеливание запрсоов (запрашивать данные с нескольких нод, менять данные консистентно и т.д. Сегодня мы сфокусируемся на одной области, чтобы попытаться раскрыть её.
  2. То есть, мы смотрим конкретно на тему шардинга. Какие тут главные вопросы? Мы утверждаем, что это: то как мы разбиваем данные на кластер - функция шардинга как мы находин нужный при запросах, то есть адресация и роутинга как всем этим управлять, т.е. добавлять новые ноды
  3. Шардинг функция почему мы об этом говорим сначала Всё три части взаимосвязаны, но естественно когда данные перестают помещаться на одну машину, первое о чём мы думаем, это как их поделить. И это принципиальный вопрос - поделишь - получишь неработоспособную архитектуру, дорогую подддержку на долгие годы вперёд (т.к. downtime недопустим)
  4. Во-первых, имейте в виду, что размер объекта должын быть достаточно мал, чтобы шардинг был равномерным (тебе не повезло, ты на шарде с Джастином Бибером). Во-вторых, часто оказывается, что для определенных случаев вам либо нужно постоянно делать запросы к разным нодам, либо дублировать данные. Не бойтесь дублировать данные. В этом мире у нормальных форм не такая ценность, как в теории, забейте на нормальные формы, постройте всё вокруг сценария использования данных. Наконец, может оказаться, что размер ключа - это половина размера объекта, поэтому совершенно не обязательно ключ должен храниться в самих данных.
  5. Давайте рассмотрим примеры. Что скоре всего имеет маленький размер и размажется равномерно? Сессия, заказ. А данные пользователя? Уже не во всех случаях (комментарии к постам Джастина Бибера, но есть нюансы - если в соцсети по нескольку джастинов биберов на шард, то ок). Равномерность нужна не только для всех данных, но и для горячей части. Поэтому есть и совсем прохие примеры выбора ключа - например, дата заказа/поста, в этом случае данные размазываются равномерно, но горячие данные либо сидят на совсем небольшой части кластера, либо почти любая операция должна подгружать данные со многих нод. Реальная история из твиттера и выборов обамы - добавляли по несколько нод в день на новые твиты, порвали два баяна во время выборов Обамы, в итоге сменили схему шардинга.
  6. Есть парочка “олдскульных” рабочих способов, которые обеспечат шардинг в разумных пределах (условно, от 1 до 50 серверов). деление на двойку рулит, потому что решардинг только половины данных с каждой ноды каждый раз пиздец приходит на больших числах, т.к. нужно закупать много железа очень рабочий и удобный вариант когда вы знаете что в пределе не может быть данных больше чем X X < 50 узлов replication based doubling - это йогурт для админов завиточки к предыдущей схеме - 2*3*4 - фишеру не забыть их рассказать
  7. примеры на предыдущем слайде это уже функция хэширования - например выраженная в виде crc + остаток от деления, либо first login letter Ага! Идея - что мы ещё можем использовать как функцию хэширования Например first login letter - это как раз совершенно не гарантирует равномерность распределения, крайне неудобен в поддержке, когда окажется что одна буква не влезает в несколько серверов
  8. если предполагается полностью эластичный рост на тысячи серверов если нужно решение “из коробки” методы
  9. мы конфигурируем отображение ключа на шард заданное с помощью таблицы ключ на бакет отображается с помощью хэш функции бакет на шард отражается с помощью таблицы соответствия вопрсо: где мы храним эту таблицу? На центральном серевере либо на координаторе. В любом случае, возникает проблема распространения конфиугурации при её изменении максимально р может быть задано распределение центральный сервер как только ты используешь математику, ты теряешь в свободе = ты не можешь конкретный ключ при желании положить в конкретное место (это если есть промежуточные бакеты). Так что от бакетов иногда имеет смысл отказаться
  10. главная проблема - минимизировать количество ребалансировки при добавлении шарда 9/11 first victim, one of the founders of AKAMAI, consistent hashing and merkle trees - for load balancing of content delivery network you need a lot of virtual points otherwise you don’t have sufficient randomness библиотеки есть в open source - профит решардинг - при больших данных - положит вам сеть в любом случае,потому что заливка шарда всё равно положит роутер в стойке в которой находится этот шард таким образом, нерезиновый
  11. однозначная функция, принимает два числа и выдаёт server_id < n_servers очень ровно режет диапазон не имеет состояния (удобно разместить как на клиенте, так и на сервере, так и на прокси - всц равно где) минусы - долго работает при большом количестве шардов, сложность - N^2
  12. forget about it on petabyte scale patterns of avoiding resharding: update is a move expire