
Migrating 500 Nodes from Rackspace to Google Cloud with Zero Downtime

METRONOM, a multinational B2B supermarket, migrated around 80 clusters with over 500 nodes from Rackspace in the UK to Google Cloud in Belgium, saving money and surviving Brexit. This is the story of how we managed this with zero production downtime, and the problems and solutions we encountered along the way. It is an example of using Cassandra DCs to migrate data across geographical boundaries.



  1. 1. Migrating 500 Nodes from Rackspace to Google With Zero Downtime
  2. 2. Gilberto Müller • Engineering Manager • 17 YoE • XP - Infrastructure and datastores • METRONOM for 2.5 years • Previously HSBC, Wipro, MasterCard • SRE enthusiast
  3. 3. Paul Chandler • Independent Cassandra Consultant • First used Cassandra in 2014 • Designed this Google move process • Historically based in the travel industry: British Airways, Avis, TUI, etc.
  4. 4. METRO • Leading international wholesale and retail food specialist company • 50+ years old • 35 countries • 764 stores (in 25 countries) • 150,000 people worldwide • ~24mn customers • €36.5bn in sales for fiscal year 2017/18
  5. 5. METRONOM • The biggest software company you've never heard of (from our CEO) • Digital transformation started in 2015 • Platform as a Service and Dev • Cassandra started as the only option • 8 Platform teams (changing over time) • Multiple DCs in different countries, hybrid-cloud (EU, CH, and RU*) • 100+ application development teams • MCC is the main customer
  6. 6. NoSQL Team • 9 people from 10 different places • Agile: Dash • Shared responsibility • Consultancy • SRE • DevOps • Infrastructure as Code • Provisioning, patch, upgrade • Support • Migrations • We offer a platform, not a DBA service • Service wrapper (whole platform) • Backup and restore (whole platform) • On-call
  7. 7. Products • Apache Cassandra • DataStax Enterprise • Apache Solr (Solr Cloud) • DSE Search • Apache Spark • HDFS* DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache, Apache Cassandra, Cassandra, Apache Solr, Apache Spark, Spark, Apache Zookeeper, Zookeeper, Apache Hadoop, and Hadoop are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
  8. 8. Technologies and Numbers • Zookeeper • HAProxy • Nginx • OpsCenter • Grafana • PostgreSQL • Puppet • Jenkins • Java • Linux • 1200+ servers • 300+ clusters • 165+ C* (both flavours) • 80+ Solr
  9. 9. Implementation
  10. 10. Steady State - 1 Datacenter RS_UK • Multiple Clusters • Move 1 cluster at a time • No Downtime allowed
  11. 11. RS_UK • Local consistency types for reading and writing: LOCAL_ONE / LOCAL_QUORUM • Application driver needs a DC-aware load balancing policy • Lightweight Transactions (LWT) must use LOCAL_SERIAL Application Prerequisites
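     A minimal cqlsh sketch of the read/write consistency prerequisite above; the host variable and the demo.users table are hypothetical, and applications would set the same level through their driver:
       # Sketch: exercise LOCAL_QUORUM from cqlsh against an RS_UK node (placeholder host).
       cqlsh "$RS_UK_NODE" <<'EOF'
       CONSISTENCY LOCAL_QUORUM;
       SELECT * FROM demo.users WHERE id = 1;
       EOF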
  12. 12. RS_UK ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3}; Keyspaces: system_auth, system_schema, dse_leases, system_distributed, dse_perf, system_traces, dse_security Step 1 – Alter system keyspaces
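     A sketch of scripting Step 1 over the keyspaces listed on the slide (the cqlsh host is a placeholder; system_schema is left out of the loop because it uses LocalStrategy and is not normally alterable):
       # Sketch: add GL_EU as a replication target for the system-level keyspaces.
       for ks in system_auth dse_leases system_distributed dse_perf system_traces dse_security; do
         cqlsh "$RS_UK_NODE" -e "ALTER KEYSPACE $ks WITH replication = \
           {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3};"
       done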
  13. 13. RS_UK GL_EU • Can be a different number of nodes • Only system keyspaces automatically migrated • Should be quick Step 2 - Create Nodes in New Datacenter
  14. 14. RS_UK GL_EU cassandra.yaml • cluster_name: must be the same for both datacenters • seeds: should point to seeds in RS_UK • cassandra-rackdc.properties • dc: should be the new datacenter • Continue using GossipingPropertyFileSnitch Step 2 - Create Nodes in New Datacenter
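     A sketch of those settings on a new GL_EU node; the cluster name is a placeholder and the seed addresses are the RS_UK nodes shown later in the nodetool status output:
       # cassandra.yaml (excerpt) - sketch, values are placeholders
       cluster_name: 'production_cluster_01'       # must match the existing RS_UK cluster exactly
       endpoint_snitch: GossipingPropertyFileSnitch
       seed_provider:
         - class_name: org.apache.cassandra.locator.SimpleSeedProvider
           parameters:
             - seeds: "10.29.30.29,10.29.30.33,10.29.30.34"   # existing RS_UK seeds while joining

       # cassandra-rackdc.properties - sketch
       dc=GL_EU
       rack=rack1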
  15. 15. RS_UK GL_EU Nodes created and system keyspaces copied
  16. 16. RS_UK GL_EU • Must still connect to RS_UK • No Data in GL_EU Nodes created and system keyspaces copied
  17. 17. RS_UK GL_EU • ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3}; • ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3}; • ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'RS_UK': 3, 'GL_EU': 3}; Step 3 – Alter Replication for User Keyspaces
  18. 18. RS_UK GL_EU At this point: • Newly inserted data is replicated • Old data not replicated (yet) • Applications still must not connect to GL_EU • Lots of data missing Keyspaces Replicated
  19. 19. RS_UK GL_EU On each new node run in turn: • nodetool rebuild RS_UK • This will take some time; best to script this section (see the sketch below) Step 4 – Rebuild Nodes
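     One possible shape for that script; SSH access and a gl_eu_nodes.txt host list are assumptions for this example, and the rebuilds run one node at a time to limit streaming pressure:
       # Sketch: rebuild each GL_EU node from the RS_UK datacenter, sequentially.
       while read -r node; do
         echo "Rebuilding $node from RS_UK..."
         ssh "$node" 'nodetool rebuild RS_UK'
       done < gl_eu_nodes.txt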
  20. 20. RS_UK GL_EU Nodes gain data one node at a time Step 4 – Rebuild Nodes
  21. 21. RS_UK GL_EU Fully functioning cluster: • Connect to either DC • Data flows automatically Nodes Rebuilt
  22. 22. RS_UK GL_EU • cassandra.yaml: change seed nodes to be nodes in GL_EU • Point all applications to the new datacenter • Full repair on all nodes in the new datacenter (see the sketch below) Prepare for Decommission
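     A sketch of that repair step, run sequentially across the GL_EU nodes; the host list and SSH access are the same assumptions as in the rebuild sketch:
       # Sketch: full (not incremental) repair on every GL_EU node after rebuild.
       while read -r node; do
         ssh "$node" 'nodetool repair -full'
       done < gl_eu_nodes.txt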
  23. 23. RS_UK GL_EU Prepare for Decommission
  24. 24. RS_UK GL_EU • ALTER KEYSPACE user_keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3}; • ALTER KEYSPACE user_keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3}; • ALTER KEYSPACE user_keyspace3 WITH replication = {'class': 'NetworkTopologyStrategy', 'GL_EU': 3}; • Plus system keyspaces Alter Replication to one Datacenter for ALL keyspaces
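     This step can be scripted the same way as Step 1; a sketch that drops RS_UK from the user and system keyspaces named on the slides (host and user keyspace names are placeholders):
       # Sketch: restrict replication to GL_EU only, ahead of decommissioning RS_UK.
       for ks in user_keyspace1 user_keyspace2 user_keyspace3 \
                 system_auth dse_leases system_distributed dse_perf system_traces dse_security; do
         cqlsh "$GL_EU_NODE" -e "ALTER KEYSPACE $ks WITH replication = \
           {'class': 'NetworkTopologyStrategy', 'GL_EU': 3};"
       done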
  25. 25. RS_UK GL_EU Data now Disconnected
  26. 26. RS_UK GL_EU • Stop each node in RS_UK • Decommission each node in turn • nodetool removenode xxxxxxxxxxxxxxxx Decommission RS_UK nodes
  27. 27. RS_UK GL_EU Decommission RS_UK nodes
      Datacenter: RS_UK
      ===================
      Status=Up/Down  |/ State=Normal/Leaving/Joining/Moving
      --  Address        Load      Tokens  Owns  Host ID                               Rack
      UN  10.29.30.29    11.66 GB  256     ?     ab479afd-c754-47f7-92fb-47790d734ac9  rack1
      UN  10.29.30.33    12.32 GB  256     ?     9aa1c5c5-c6cd-4267-ba68-c6bd8b2ac460  rack2
      UN  10.29.30.34    12.16 GB  256     ?     db454258-ac73-4a8a-9c75-226108c66889  rack3
      Datacenter: GL_EU
      ===================
      Status=Up/Down  |/ State=Normal/Leaving/Joining/Moving
      --  Address        Load      Tokens  Owns  Host ID                               Rack
      UN  10.131.134.35  13.19 GB  256     ?     114b4a37-7d69-40e5-988b-a4c998e7a02a  rack1
      UN  10.131.134.39  12.14 GB  256     ?     4173fc2a-e65c-43aa-baa4-a5eefe0ceb60  rack2
      UN  10.131.134.42  12.97 GB  256     ?     8b5dde02-1ff1-48cc-9900-6d8f2bb339bf  rack3
      nodetool removenode ab479afd-c754-47f7-92fb-47790d734ac9
  28. 28. RS_UK GL_EU • Data successfully moved • Old Datacenter decommissioned Movement Complete
  29. 29. What Possibly Could Go Wrong ?
  30. 30. Network Performance Test the network performance between Datacenters
  31. 31. Network Performance • Enough bandwidth for the streaming • Without the streaming stealing all the bandwidth from production traffic
  32. 32. iperf3 • iperf3 -s (server side) • iperf3 -c xxx.xxx.xxx.xxxx (client side) • iperf3 -c xxx.xxx.xxx.xxxx -b 10G (target bandwidth) • iperf3 -c xxx.xxx.xxx.xxxx -C yeah (congestion control algorithm) Sample output: [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 17.1 GBytes 14.7 Gbits/sec
  33. 33. Tuning knobs: • net.ipv4.tcp_congestion_control=yeah • nodetool setinterdcstreamthroughput xxx
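     A sketch of applying the kernel setting; the YeAH module must be available on the host, and persisting it via /etc/sysctl.d is an assumption about the environment:
       # Sketch: switch TCP congestion control to YeAH for the long, high-latency link.
       sudo modprobe tcp_yeah                                  # load the module if not built in
       sudo sysctl -w net.ipv4.tcp_congestion_control=yeah     # apply immediately
       echo 'net.ipv4.tcp_congestion_control=yeah' | sudo tee /etc/sysctl.d/90-tcp-yeah.conf   # persist across reboots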
  34. 34. Views (pre 5.0.12 only)
  35. 35. Views • Materialized views are rebuilt, not streamed • The rebuild uses selects on the base table = tombstone trouble
  36. 36. Memory Heavy use of Heap memory
  37. 37. Heap Size • Streaming and compaction use up memory • Heap size can be increased temporarily (see the sketch below) • No need to worry about GC pauses while no applications are connected to the new DC • Change back before connecting applications
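     A sketch of a temporary heap bump, assuming the layout where heap settings live in cassandra-env.sh (newer versions can set the same limits in jvm.options); the 16 GB value is only an example, not the value used in the migration:
       # Sketch: temporarily raise the heap on the GL_EU nodes during rebuild.
       # Revert these values before pointing applications at the new datacenter.
       # In /etc/cassandra/cassandra-env.sh:
       MAX_HEAP_SIZE="16G"
       HEAP_NEWSIZE="4G"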
  38. 38. Compaction Throughput • Large amount of data streamed • Compaction lags behind • Lots of small SSTables • Update compaction throughput: nodetool setcompactionthroughput xxxxx
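     A sketch of raising and then checking the cap; the 256 MB/s figure is only an illustrative value:
       # Sketch: raise the compaction throughput cap while compaction catches up, then verify.
       nodetool setcompactionthroughput 256    # MB/s; 0 would remove the cap entirely
       nodetool getcompactionthroughput        # confirm the running value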
  39. 39. Streaming Throughput • Reduce pressure if needed • Reduce only the streaming between datacenters: nodetool setinterdcstreamthroughput xxxxx
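     The same pattern again; the 100 Mb/s figure is only an illustrative value:
       # Sketch: cap only the cross-DC streaming traffic, leaving local streams alone.
       nodetool setinterdcstreamthroughput 100   # megabits per second
       nodetool getinterdcstreamthroughput       # confirm the running value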
  40. 40. Application Latency
  41. 41. RS_UK GL_EU select column from table where id = 1 • 3 nodes holding data per DC Multi DC Replication
  42. 42. RS_UK GL_EU Needs responses from 2 of: Node3, Node4, Node5 LOCAL_QUORUM
  43. 43. RS_UK GL_EU Needs responses from 4 of: Node3, Node4, Node5, Node8, Node9, Node10 • At least one in the 2nd DC • 250 miles, ~22 ms QUORUM
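     The quorum arithmetic behind those two slides, using Cassandra's standard quorum definition and the replication factor of 3 per DC used throughout the deck:
       quorum = floor(sum of replication factors / 2) + 1
       LOCAL_QUORUM (one DC, RF 3):   floor(3 / 2) + 1 = 2 local replicas must answer
       QUORUM       (RF 3 + 3 = 6):   floor(6 / 2) + 1 = 4 replicas must answer,
                                      so at least one answer crosses the ~22 ms inter-DC link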
  44. 44. Lightweight Transactions (LWT) insert into table (id, name) values (1, 'Name') IF NOT EXISTS • Uses the Paxos algorithm • Uses a different consistency level for Paxos: SERIAL or LOCAL_SERIAL
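     A cqlsh sketch of keeping the Paxos round local while both DCs exist; the host variable and the demo.users table are hypothetical, and drivers expose the same serial consistency setting per statement:
       # Sketch: make LWTs use LOCAL_SERIAL so Paxos does not span both datacenters.
       cqlsh "$RS_UK_NODE" <<'EOF'
       CONSISTENCY LOCAL_QUORUM;
       SERIAL CONSISTENCY LOCAL_SERIAL;
       INSERT INTO demo.users (id, name) VALUES (1, 'Name') IF NOT EXISTS;
       EOF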
  45. 45. RS_UK GL_EU select column from table where id = 1 • Without a DC-aware policy the driver may route requests to coordinators in the remote DC, adding WAN latency Load Balancing Policy
  46. 46. Implementation • DB of cluster and node names • Automatic scripts to create cloud instances • Scale clusters up or down • Puppet • Jenkins jobs • Rebuild stage • Decommission stage • Service wrapper to protect integrity of cluster
  47. 47. Conclusion
  48. 48. Success • 91 Clusters moved • Solr migration (not covered here) • No C* cluster downtime • Incorrect consistency sometimes caused application downtime • April 2018 - October 2018 • One cluster delayed until February 2019 • Padding 0s with compression • Automation is a must
  49. 49. Process can also be used for • Splitting clusters (e.g. multi-tenant) • Updating non-trivial configuration • num_tokens • Upgrading the underlying operating system • Ubuntu upgrades (upstart -> systemd)
  50. 50. Thank You More details can be found at: https://bit.ly/2Lnosw6 Paul Chandler, Gilberto Müller Any Questions?
