Good afternoon everyone, welcome to the session on Geo-Distributed storage for Big Data. Hope you have enjoyed the conference so far.
I would like to start our session today with a couple of “show of hands” questions:
How many of you have clusters in more than one region/GEO?
How many of you use distcp today?
Other tools apart from “distcp”?
About us…
1. Today’s applications generate and consume data from all across the globe. But Hadoop has mostly remained confined to a single datacenter.
2. Hadoop has grown and matured around HDFS as its filesystem. HDFS did a great job of providing a petabyte-scale, reliable storage system built on commodity hardware.
3. Its storage layer is yet to evolve into a global filesystem with characteristics like geo-access, active-active access, and selective replication.
4. Most vendor tools have relied on distcp-based mechanisms…
----------
New-age applications such as mobile, social, and IoT apps create and consume data across the globe.
This gives rise to multiple apps creating silos of data that are accessed via different protocols like S3, NFS, Swift, etc., depending on the application. This data is then further copied into other silos specifically for analytics purposes. This leads to overall operational complexity as well as higher storage costs.
[Animated]
Spend a lot of time on this slide. This will be framed as “desirable characteristics of a general geo-hadoop platform”. Stress active-active access and strong consistency.
Not ECS specific yet.
Reiterate the concept from previous slide –
Mention WAN distances
Spend time here. Hopefully, HCFS has already been explained before.
Vishrut can take over from here.
I will try to do a humorous hand off. Something like “Now, my technical colleague will show you that this is not all smoke and mirrors. He has also graciously agreed to field all hard questions”.
[Animated] Point the audience to some material available for further reading about this partnership.
Yet to decide if we want to move this inside our video demo, or do this via a screenshot-based walkthrough.
[Animated]
Simple animation to ensure that the audience understands “global hadoop” only applies to storage. Apps (MR/HBase) are not geo-accessible.
It also demonstrates that this solution replaces only the storage layer (HDFS).