SlideShare a Scribd company logo
1 of 17
Download to read offline
eCommerce data migration
in moving systems across
Data Centres
Regu, Sharad
Flipkart in recent times
● Leading eCommerce player in India
○ 10M page visits, 2M shipments a day
○ 30 million products across more than 70 categories
○ Big Billion Days ($300M sales, top ranked app on Google
Play Store)
○ Ping - social collab in eCommerce
○ Progressive web app - native like experience on browser
(debuted at #chromedevsummit 2015)
Data at Flipkart - User & Order path
Data at Flipkart - Data Platform
● 6 TB new data ingested daily
○ 30 TB on sale days
● 1100 Raw Streams
● 3 Billion Raw events in a day
● 0.6 PB data processed daily
● 10K Hadoop jobs daily
● 3000 Report views daily
DC Landscape
DC2
(Chennai)
DC1
(Chennai)
DC3
(Chennai)
1 Gbps
2 x 10 Gbps
Primary : All User,
Order & FDP
systems
Secondary : Few batch
processing systems
(UIE, Reco, Ads)
New : All User & FDP
systems
Data migration needs, challenges
● User path systems
○ Minimize downtime. Site & App downtime is visible
■ Data - mostly eventually consistent
○ Session Data - Avoid User logout, Service scale : 250K RPS
○ Promise Data (Stock, Serviceability) - Avoid OOS, Over-booking (consistency matters)
○ Live Orders - Accept orders, Let customers checkout & pay (consistency matters)
○ User Accounts - No data loss. Change velocity is not much
● Order path systems
○ Availability not a constraint, Throughput and Data durability is
■ Data - strong durability, consistent
○ Current orders being fulfilled
○ Warehouse stock inwarding, movement
Data migration needs, challenges
● User Insights
○ Inter DC data bandwidth limitations (1Gbps shared link)
○ 130 TB (snappy compressed) data in HBase, Derived data(Insights) much smaller though
● MySQL instance footprint - 600+
● Flipkart Data Platform
○ Data publishers/consumers not moving together
■ Data consumers could move earlier than the publishers, vice-versa
○ Migrating couple of PB data not feasible over network
○ Consistency for raw, prepared and reporting data
Migration planning and execution
● Most useful tool - Google spreadsheets & docs!
○ Inventory of systems in each business cluster - split by service, backing data store
○ Defined data migration recipes and SME group for each data store type
■ Advise on IaaS constructs - instance types, PaaS integration - service discovery, Data
migration strategy (export vs live replication), built tooling
○ Create cutover sequence and interdependencies
■ for e.g. Catalog → Search → Cart/CO → Mobile apps
○ Wrote playbook for each cutover activity - including checklists, verification of data
export/restore
● Program managed a plan that touched 1000+ systems and many of the 1000
member engineering org
Knowledge base on 3rd party data stores, packages
Hacks, Tools and Utilities
● “Never underestimate the bandwidth of a station wagon full of tapes hurtling
down the highway” -- Andrew S. Tanenbaum (Computer Networks, 4th ed., p. 91)
○ We used disks instead to move User Insights data (stored in HBase)
■ Moves snapshots of derived/computed data over wire (relatively small)
■ Avoided HBase export. Instead transferred HFiles into disks using custom ‘distcp’ like
tool which knapsack'ed ~40K files into 6 disks. Open sourced as : https://github.
com/flipkart-incubator/blueshift
■ Disked shipped to new DC
■ Transferred HFiles into HDFS using Blueshift
■ Imported HFiles into HBase using HBase Bulk Load
Migrating live User sessions - dual write
● Cold data in HBase (9TB - compressed), hot in Memcached(1TB)
● Live read-writes on Memcached, async batched writes to HBase
● Migration via Dual writes
○ Fresh Memcached cluster in new DC
○ Added this cluster as another batched write destination in old DC
○ Data move initiated 21 days before actual-cutover to allow for catchup
○ Hbase data was exported using standard snapshotting and incremental copy table periodically
○ Batch interval reduced from 10 minutes to 1 minutes during cutover for aggressive copy
● No user logout, session loss after cutover
Migrating Product catalog data
● Data modelled as Entities & Relationships : clients have “Views” of this data
● Views expressed as JSON DSL
● Raw data exported from HBase, Elastic Search and copied to new DC
● Required a solution that could migrate updates after initial move
○ Developed JSON diff library that could work over 100 million views. Open sourced : https:
//github.com/flipkart-incubator/zjsonpatch
■ Diffs are applied in order - important for DC move
○ Bandwidth consumption for applying updates dropped from 800 Mbps to 13-14 Mbps
MySQL migration utility
Application Relay bridge over Kafka queues
● RESTBus :
orchestrates all Order
fulfillment systems
● Pattern : Locally
committed messages
in MySQL, relayed
over Kafka to Http
endpoints
● Bridge over 2 DCs with
destinations resolved
from ELB endpoints
2 way sync of Kafka Streams
Mirror
Old DC New DC
Copying data across clusters
● Only copied raw data about 200TB compressed
○ All prepared and reports data generated from raw data
● Verification utilities to check correctness in data in both clusters
● Ran the full data platform stack in both places for over 2 weeks till all data
publishers and consumers move
Thank You

More Related Content

What's hot

Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKKriangkrai Chaonithi
 
DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysRahul Agarwal
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...confluent
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudShubham Tagra
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioAlluxio, Inc.
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudMichael Stack
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaObjectRocket
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevAltinity Ltd
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevAltinity Ltd
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at HotstarKafkaZone
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkVinoth Chandar
 
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...ScyllaDB
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 

What's hot (20)

Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
DC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion DaysDC Migration and Hadoop Scale For Big Billion Days
DC Migration and Hadoop Scale For Big Billion Days
 
RubiX
RubiXRubiX
RubiX
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
Alluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the CloudAlluxio Data Orchestration Platform for the Cloud
Alluxio Data Orchestration Platform for the Cloud
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander ZaitsevWebinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
Webinar 2017. Supercharge your analytics with ClickHouse. Alexander Zaitsev
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 

Viewers also liked

Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uidAjit Dadresa
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaarRegunath B
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome themsaipriyadonthula
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantomRegunath B
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Ali Raw
 

Viewers also liked (10)

Unique identification authority of india uid
Unique identification authority of india   uidUnique identification authority of india   uid
Unique identification authority of india uid
 
What database
What databaseWhat database
What database
 
Srikanth Nadhamuni
Srikanth NadhamuniSrikanth Nadhamuni
Srikanth Nadhamuni
 
Aadhaar
AadhaarAadhaar
Aadhaar
 
Uid
UidUid
Uid
 
Hadoop at aadhaar
Hadoop at aadhaarHadoop at aadhaar
Hadoop at aadhaar
 
practical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome thempractical risks in aadhaar project and measures to overcome them
practical risks in aadhaar project and measures to overcome them
 
Building the Flipkart phantom
Building the Flipkart phantomBuilding the Flipkart phantom
Building the Flipkart phantom
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 
Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)Authentication(pswrd,token,certificate,biometric)
Authentication(pswrd,token,certificate,biometric)
 

Similar to E commerce data migration in moving systems across data centres

Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsNguyen Cao
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processJampp
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Dataconomy Media
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...Maya Lumbroso
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Data Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysData Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysWout Scheepers
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 

Similar to E commerce data migration in moving systems across data centres (20)

Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the process
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc..."An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
"An introduction to Kx Technology - a Big Data solution", Kyra Coyne, Data Sc...
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Data Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysData Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM Exellys
 
Big data
Big dataBig data
Big data
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
Lecture1
Lecture1Lecture1
Lecture1
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

E commerce data migration in moving systems across data centres

  • 1. eCommerce data migration in moving systems across Data Centres Regu, Sharad
  • 2. Flipkart in recent times ● Leading eCommerce player in India ○ 10M page visits, 2M shipments a day ○ 30 million products across more than 70 categories ○ Big Billion Days ($300M sales, top ranked app on Google Play Store) ○ Ping - social collab in eCommerce ○ Progressive web app - native like experience on browser (debuted at #chromedevsummit 2015)
  • 3. Data at Flipkart - User & Order path
  • 4. Data at Flipkart - Data Platform ● 6 TB new data ingested daily ○ 30 TB on sale days ● 1100 Raw Streams ● 3 Billion Raw events in a day ● 0.6 PB data processed daily ● 10K Hadoop jobs daily ● 3000 Report views daily
  • 5. DC Landscape DC2 (Chennai) DC1 (Chennai) DC3 (Chennai) 1 Gbps 2 x 10 Gbps Primary : All User, Order & FDP systems Secondary : Few batch processing systems (UIE, Reco, Ads) New : All User & FDP systems
  • 6. Data migration needs, challenges ● User path systems ○ Minimize downtime. Site & App downtime is visible ■ Data - mostly eventually consistent ○ Session Data - Avoid User logout, Service scale : 250K RPS ○ Promise Data (Stock, Serviceability) - Avoid OOS, Over-booking (consistency matters) ○ Live Orders - Accept orders, Let customers checkout & pay (consistency matters) ○ User Accounts - No data loss. Change velocity is not much ● Order path systems ○ Availability not a constraint, Throughput and Data durability is ■ Data - strong durability, consistent ○ Current orders being fulfilled ○ Warehouse stock inwarding, movement
  • 7. Data migration needs, challenges ● User Insights ○ Inter DC data bandwidth limitations (1Gbps shared link) ○ 130 TB (snappy compressed) data in HBase, Derived data(Insights) much smaller though ● MySQL instance footprint - 600+ ● Flipkart Data Platform ○ Data publishers/consumers not moving together ■ Data consumers could move earlier than the publishers, vice-versa ○ Migrating couple of PB data not feasible over network ○ Consistency for raw, prepared and reporting data
  • 8. Migration planning and execution ● Most useful tool - Google spreadsheets & docs! ○ Inventory of systems in each business cluster - split by service, backing data store ○ Defined data migration recipes and SME group for each data store type ■ Advise on IaaS constructs - instance types, PaaS integration - service discovery, Data migration strategy (export vs live replication), built tooling ○ Create cutover sequence and interdependencies ■ for e.g. Catalog → Search → Cart/CO → Mobile apps ○ Wrote playbook for each cutover activity - including checklists, verification of data export/restore ● Program managed a plan that touched 1000+ systems and many of the 1000 member engineering org
  • 9. Knowledge base on 3rd party data stores, packages
  • 10. Hacks, Tools and Utilities ● “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway” -- Andrew S. Tanenbaum (Computer Networks, 4th ed., p. 91) ○ We used disks instead to move User Insights data (stored in HBase) ■ Moves snapshots of derived/computed data over wire (relatively small) ■ Avoided HBase export. Instead transferred HFiles into disks using custom ‘distcp’ like tool which knapsack'ed ~40K files into 6 disks. Open sourced as : https://github. com/flipkart-incubator/blueshift ■ Disked shipped to new DC ■ Transferred HFiles into HDFS using Blueshift ■ Imported HFiles into HBase using HBase Bulk Load
  • 11. Migrating live User sessions - dual write ● Cold data in HBase (9TB - compressed), hot in Memcached(1TB) ● Live read-writes on Memcached, async batched writes to HBase ● Migration via Dual writes ○ Fresh Memcached cluster in new DC ○ Added this cluster as another batched write destination in old DC ○ Data move initiated 21 days before actual-cutover to allow for catchup ○ Hbase data was exported using standard snapshotting and incremental copy table periodically ○ Batch interval reduced from 10 minutes to 1 minutes during cutover for aggressive copy ● No user logout, session loss after cutover
  • 12. Migrating Product catalog data ● Data modelled as Entities & Relationships : clients have “Views” of this data ● Views expressed as JSON DSL ● Raw data exported from HBase, Elastic Search and copied to new DC ● Required a solution that could migrate updates after initial move ○ Developed JSON diff library that could work over 100 million views. Open sourced : https: //github.com/flipkart-incubator/zjsonpatch ■ Diffs are applied in order - important for DC move ○ Bandwidth consumption for applying updates dropped from 800 Mbps to 13-14 Mbps
  • 14. Application Relay bridge over Kafka queues ● RESTBus : orchestrates all Order fulfillment systems ● Pattern : Locally committed messages in MySQL, relayed over Kafka to Http endpoints ● Bridge over 2 DCs with destinations resolved from ELB endpoints
  • 15. 2 way sync of Kafka Streams Mirror Old DC New DC
  • 16. Copying data across clusters ● Only copied raw data about 200TB compressed ○ All prepared and reports data generated from raw data ● Verification utilities to check correctness in data in both clusters ● Ran the full data platform stack in both places for over 2 weeks till all data publishers and consumers move