Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Sharding MySQL with Vitess

Sharding MySQL sakila database with Vitess. You can find demo materials below github repository

https://github.com/harunkucuk5/sakiladb-vitess

  • Be the first to comment

Sharding MySQL with Vitess

  1. 1. Sharding MySQL with Vitess Harun Küçük
  2. 2. What is Sharding? Sharding is a type of database partitioning that separates very large databases into smaller,faster and more easily managed parts called data shards.
  3. 3. • Non-Scalable Master Why we need Sharding? Sample Traditional MySQL Replication • Scalable App Layer • Scalable Replicas
  4. 4. Vitess • Started 2010 , youtube https://vitess.io/ • Open source since 2011 https://github.com/vitessio/vitess • Incubating project in CNCF https://www.cncf.io/projects/
  5. 5. Vitess Architecture • Lightweight proxy server • Routes traffic to correct vttablet • Returns consolidated results back to the clients
  6. 6. Vitess Architecture • Proxy server that sits in front of MySQL instance • Protect MySQL from harmful queries • Connection Pooling • Query rewriting • Hot row protection
  7. 7. Vitess Architecture • Stores metadata (running servers,sharding schema,Replication Graph) • Etcd, Apache Zookeeper or consul could be used for topology
  8. 8. Vitess Architecture • Vtctl is command line tool, Vtctld is an HTTP server that lets you browse the information stored in the topology.
  9. 9. Vitess Architecture • Replica tablets: candidates for master tablet , Readonly tables: for batch jobs, resharding,bigdata,backups etc.
  10. 10. Vitess Key Adaptors • Started 2010 at Youtube and It has been serving all Youtube database traffic since 2011. Youtube had 256 shards and each shards had between 80 and 120 replicas across 20 datacenters all around the world. (Approx. 256K instance) • JD.com is the 2nd largest retailer company in China. JD.com has more than 10.000 instance (master,replicas) in Vitess on kubernetes cluster • Square Cash App fully runs on Vitess. Square has more than 64 shards. • Slack migrated 40% database traffic to Vitess and their goal is 100% • Pinterest’s all of advertising campaign management fully runs on Vitess
  11. 11. Example: Sakila DVD Rental Company Database Lets suppose we have a DVD rental company and our database diagram live below
  12. 12. Day 1: Table Rows Query:
  13. 13. 30 days later… Query:
  14. 14. 6 months later… Query:
  15. 15. 2 years later… Query:
  16. 16. Entity Group for Sharding Payment,Rental and Customer tables have customer_id column. So these three tables should be sharded horizontally by customer_id
  17. 17. Step 1: Vertical Sharding
  18. 18. Step 2: Horizontal Sharding
  19. 19. Step 3: Resharding
  20. 20. Prerequisites for Demo • Vitess (https://github.com/vitessio/vitess) • Kubernetes Cluster or Minicube • Etcd operator for topology cluster • Helm for vitess helm charts (vitess/helm/vitess) • NFS client provisioner for persistent NFS volumes (https://github.com/helm/charts/tree/master/stable/nfs-client-provisioner)
  21. 21. 1.) initiate new cluster, Concepts File: initial_cluster.yaml • Cell : Zone, availability zone, datacenter • KeySpace : Logical Database • Vschema : Vitess Schema, contains metadata about how tables are organized across keyspaces and shards • Vindex : index to find shards
  22. 22. 1.) initiate new cluster File: initial_cluster.yaml
  23. 23. 1.) initiate new cluster
  24. 24. 2.) Create New Keyspace File:create_keyspace.yaml
  25. 25. 2.) Create New Keyspace File:create_keyspace.yaml
  26. 26. 3.) Split Keyspace Schema File:split_keyspace_schema.yaml
  27. 27. 3.) Split Keyspace Schema File:split_keyspace_schema.yaml
  28. 28. 3.) Split KeySpace Schema
  29. 29. 4.) Vertical Split Clone File:vertical_split_clone.yaml
  30. 30. 4.) Vertical Split Clone File:vertical_split_clone.yaml
  31. 31. 4.) Vertical Split Clone
  32. 32. 4.) Migrate File: migrate_readonly_replica.yaml migrate_master.yaml
  33. 33. 4.) Migrate ReadOnly and Replica
  34. 34. 4.) Migrate Master
  35. 35. 5.) Drop Blacklisted Tables
  36. 36. VtGate Quey Routing
  37. 37. VtGate Quey Routing
  38. 38. Horizontal Sharding
  39. 39. 6.) Primary Vindex for Horizontal Sharding File: create_vindex.yaml
  40. 40. Sharding by Hash Function
  41. 41. Sharding by Hash Function 1001 1000 1002 1003 1004 1005 1006 1007
  42. 42. Vitess Hash Algorithm (3des hash) 4000 = 4000000000000000 8000 = 8000000000000000
  43. 43. Vitess Hash Based Sharding (3des hash) 264 combination. theoretically allows us to have an infinite number of shards
  44. 44. Vitess Hash Based Sharding (3des hash)
  45. 45. 7.) Create Horizontal Shards File: create_horizontal_shards.yaml
  46. 46. 7.) Create Horizontal Shards
  47. 47. 7.) Horizontal SplitClone File: horizontal_split_clone.yaml
  48. 48. 7.) Horizontal SplitClone File: horizontal_split_clone.yaml
  49. 49. 7.) Horizontal SplitClone File: horizontal_split_clone.yaml
  50. 50. 8.) Migrate RdOnly,Replica and Master File: 13_migrate_readonly_replica.yaml 14_migrate_master.yaml
  51. 51. 8.) Migrate RdOnly,Replica and Master File: 13_migrate_readonly_replica.yaml 14_migrate_master.yaml
  52. 52. Querying Vitess Cluster - Counts
  53. 53. Querying Vitess Cluster - Routing
  54. 54. Querying Vitess Cluster - Routing
  55. 55. Querying Vitess Cluster – @replica,@rdonly
  56. 56. Querying Vitess Cluster – Cross Shard Join
  57. 57. Querying Vitess Cluster – Cross Shard Join
  58. 58. Querying Vitess Cluster – Cross Shard Join = + N * N = row count of 2. query’s result 1 2 3
  59. 59. VReplication Files:copy_schema_shard.sh,vreplication.sh,ap ply_routing_rules.sh,add_reference_table.sh
  60. 60. Querying Vitess Cluster –Cross Shard Join,VReplication
  61. 61. Querying Vitess Cluster –Scatter Query
  62. 62. Querying Vitess Cluster –Scatter Query LookupVindex
  63. 63. Drawbacks: Application Transactions • Best Effort Commit -> Consistency Problems • 2 Phase Commit -> Performance Cost
  64. 64. Drawbacks: Distributed Deadlocks Shard Level Deadlock
  65. 65. Drawbacks: Distributed Deadlocks Database Level Deadlock
  66. 66. Application Transactions
  67. 67. Q&A github.com/vitessio/vitess vitess.io vitess.slack.com
  68. 68. Thank You

×