Hive/Spark/HBase on S3 & NFS - No More HDFS Operation

Hive/Spark/HBase on S3 & NFS
– No More HDFS Operation
( / Yifeng Jiang)
@uprush
March 14th, 2019

Yifeng Jiang
Yifeng Jiang / / ( )
• APJ Solution Architect, Data Science @ Pure Storage
• Big data, machine learning, cloud, PaaS, web systems
Prior to Pure
• Hadooper since 2009
• HBook author
• Software engineer, PaaS, Cloud

Agenda
• Separate compute and storage
• Hive & Spark on S3 with S3A Committers
• HBase on NFS
• Demo, benchmark and case study

Hadoop Common Pain Points
• Hardware complexity: racks of server, cables, power
• Availability SLA’s / HDFS operation
• Unbalanced CPU and storage demands
• Software complexity: 20+ components
• Performance Tuning

Separate Compute & Storage
• Hardware complexity: racks of server, cables, power
• Availability SLA’s / HDFS operation
• Unbalanced CPU and storage demands
• Software complexity: 20+ components
• Performance Tuning
Virtual machine
S3/NFS
Scale independently

Modernizing Data Analytics
DATA INGEST
Search
NFS
ETL
S3
QueryStore
Deep Learning
Storage Backend

Cluster Topology
node1
NAS/Object Storage
NFSS3
SAN Block Storage
iSCSI/FC
node2 nodeN…
Hadoop/Spark Cluster
On Virtual Machine
• OS
• Hadoop binary, HDFS
• SAN volume
• HTTP only
• Data lake, file
• Spark, Hive
• Same mount point
on nodes
• HBase
S3A NFSXFS/Ext4

Hadoop S3a Library
Hadoop DFS protocol
• The communication protocol between NN, DN and client
• Default implementation: HDFS
• Other implementations: S3a, Azure, local FS, etc.
Hadoop S3A library
• Hadoop DFS protocol implementation for S3 compatible
storage like Amazon S3, Pure Storage FlashBlade.
• Enable the Hadoop ecosystem (Spark, Hive, MR, etc.) to
store and process data in S3 object storage.
• Several years in production, heavily used on cloud.

THE ALL-FLASH DATA HUB FOR MODERN ANALYTICS
10+ PBs
/ RACK
DENSITY
FILE & OBJECT
CONVERGED
75 GB/S BW
7.5 M+ NFS OPS
BIG + FAST
JUST ADD
A BLADE!
SIMPLE ELASTIC SCALE

Hadoop Ecosystem and S3
How it works?
• Hadoop ecosystem (Spark, Hive, MR, etc.) uses HDFS client internally.
• Spark executor -> HDFS client -> storage
• Hive on Tez container -> HDFS client -> storage
• HDFS client speaks Hadoop DFS protocol.
• Client automatically choose proper implementation to use base on schema.
• /user/joe/data -> HDFS
• file:///user/joe/data -> local FS (including NFS)
• s3a://user/joe/data -> S3A
• Exception: HBase (details covered later)

Spark on S3
Spark submit
• Changes: use s3a:// as input/output.
• Temporary data I/O: YARN -> HDFS
• User data I/O: Spark executors -> S3A -> S3
YARN RM
spark-
submit
YARN
Container
HDFS
S3
temp data
val flatJSON =
sc.textFile("s3a://deephub/tweets/")

Hadoop on S3 Challenges
Consistency model
• Amazon S3: eventual consistency, use S3Guard
• S3 compatible storage (e.g. FlashBlade S3) supports strong consistency
Slow “rename”
• “rename” is critical in HDFS to support atomic commits, like Linux “mv”
• S3 does not support “rename” natively.
• S3A simulates “rename” as LIST – COPY - DELETE

Slow S3 “Rename”
Image source: https://stackoverflow.com/questions/42822483/extremely-slow-s3-write-times-from-emr-spark/42835927

Hadoop on S3 Updates
Make S3A cloud native
• Hundreds of JIRAs
• Robustness, scale and
performance
• S3Guard
• Zero-rename S3A committers.
Use Hadoop 3.1~

S3A Committers
Originally S3A use FileOutputCommitter, which relays on “rename”
Zero-rename, cloud-native S3A Committers
• Staging committer: directory & partitioned
• Magic committer

S3A Committers
Staging Committer
• Does not require strong consistency.
• Proven to work at Netflix.
• Requires large local FS space.
Magic committer
• Require strong consistency. S3Guard on
AWS, FlashBlade S3, etc.
• Faster. Use less local local FS space.
• Less stable/tested than staging
committer.
Common Key Points
• Fast, no “rename”
• Both leverage S3 transactional multi-parts upload

S3A Committer Benchmark
• 1TB MR random text writes
• FlashBlade S3
• 15 blades
• Plenty compute nodes
0
200
400
600
800
1000
1200
file directory magic
time(s)
Output Committer
1TB Genwords Bench

Teragen Benchmark
Magic committer
faster
File committer
slow
Staging committer
fast
S3 ReadLocal FS ReadLess local FS
Read

HBase on NFS
The peace of volcano

HBase & HDFS
What HBase want from HDFS?
• HFile & WAL durability
• Scale & performance
• Mostly latency, but also
throughput
What HBase does NOT want from HDFS?
• Noisy neighbors
• Co-locating compute: YARN, Spark
• Co-locating data: Hive data
warehouse
• Complexity: operation & upgrade

HBase on NFS
How it works?
• Use NFS as HBase root and staging directory.
• Same mount NFS point on all RegionServer nodes
• Point HBase to store data in that mount point
• Leverage HDFS local FS implementation (file:///mnt/hbase)
No change on application.
• Clients only see HBase tables.
HMasterclient
Region
Server
NFS
Table API

HFile Durability
• HDFS uses 3x replication to protect HFile.
• HFile replication is not necessary in enterprise NFS
• Erasure coding or RAID like data protection within/across storage
array
• Amazon EFS stores data within and across multiple AZs
• FlashBlade supports N+2 data durability and high availability

HBase WAL Durability
• HDFS uses “hflush/hsync” API to ensure WAL is safely flushed to multiple
data nodes before acknowledging clients.
• Not necessary in enterprise NFS
• FlashBlade acknowledges writes after data is persisted in NVRAM on 3
blades.

NFS Performance for HBase
• Depend on NFS implementation.
• NFS is generally good for random access.
• Also check throughput.
• Flash storage is ideal for HBase.
• All-flash scale-out NFS such as Pure Storage FlashBlade

HBase PE on FlashBlade NFS
random writes
1RS, 1M rows/client,
10 clients
7 blades
random reads
1RS, 100K rows/client,
20 clients
7 blades
Memstore flush
storm?
Block cache affects
result
Latency seen by
storage, stable

HBase PE on Amazon EFS
random writes
1RS, 1M rows/client,
10 clients
1024MB/s provisioned
throughput EFS
random reads
1RS, 100K rows/client,
20 clients
1024MB/s provisioned
throughput EFS
Region too busy,
memstore flush is slow

Key Takeaways
• Storage options for cloud-era Hadoop/Spark
• Hive & Spark on S3 with cloud-native S3A Committers
• HBase on enterprise NFS
• Available in cloud and on premise (Pure Storage FlashBlade)
• Additional benefits: always-on compression, encryption, etc.
• Proven to work
• Simple, reliable, performant
• No more HDFS operation
• Virtualize your Hadoop/Spark cluster

Hive/Spark/HBase on S3 & NFS - No More HDFS Operation

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Hive/Spark/HBase on S3 & NFS - No More HDFS Operation

Similar to Hive/Spark/HBase on S3 & NFS - No More HDFS Operation (20)

More from Yifeng Jiang

More from Yifeng Jiang (20)

Recently uploaded

Recently uploaded (20)

Hive/Spark/HBase on S3 & NFS - No More HDFS Operation