Alluxio - Virtual Unified File System

Li Haoyuan – Founder and CEO at Alluxio
haoyuan@alluxio.com

The Global Datasphere will grow from 33 ZB in 2018 to 175 ZB by 2025.
China’s Datasphere is expected to grow 30% on average over the next 7 years and will be the largest Datasphere of all regions by 2025.
Source: IDC White Paper – #US44413318

We are in the era where data is your biggest asset

Extracting maximum value from your data
The Data Ecosystem Evolution

Data Ecosystem - Beta
Data Ecosystem 1.0
[Diagram: COMPUTE and STORAGE blocks for each ecosystem generation]

Data Ecosystem 1.0: The Challenges
[Diagram: COMPUTE and STORAGE]
• Complex
• Low performance
• Expensive

3 big trends driving the need for a new architecture
• Separation of compute & storage
• Hybrid and multi-cloud environments
• Self-service data across the enterprise

The Data Architecture for the Digital Future

Core requirements of the 2.0 data ecosystem
• Unified
• Memory-first
• Native APIs
• Multi/hybrid cloud

Virtual Unified File System
Client-side interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
Storage drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver

Key Innovations of the Virtual Unified File System
• Unified Namespace: bring all files into a single interface
• API Translation: interact with data using any API
• Intelligent Multi-tiering: accelerate & tier data transparently

Unified Namespace: Global Data Accessibility
FUSE Interface makes all enterprise data available locally
SUPPORTS
• HDFS
• NFS
• OpenStack
• Ceph
• Amazon S3
• Azure
• Google Cloud
IT OPS FRIENDLY
• Storage mounted into Alluxio by central IT
• Security in Alluxio mirrors source data
• Authentication through LDAP/AD
• Wireline encryption
[Diagram: HDFS #1, Object Store, NFS and HDFS #2 as underlying stores]
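The idea of mounting several stores into one namespace can be sketched in a few lines. This is a minimal illustration, not Alluxio's implementation or API: the `MountTable` class, its `mount` and `resolve` methods, and the example URIs are all hypothetical names chosen for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Hypothetical mount table: maps namespace prefixes to storage URIs."""

    mounts: dict[str, str] = field(default_factory=dict)

    def mount(self, prefix: str, storage_uri: str) -> None:
        # Normalize so "/hdfs1/" and "/hdfs1" name the same mount point.
        self.mounts[prefix.rstrip("/") or "/"] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mounted prefix that contains the path.
        best = None
        for prefix in self.mounts:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(f"no storage mounted for {path}")
        remainder = path[len(best):].lstrip("/")
        base = self.mounts[best]
        return f"{base}/{remainder}" if remainder else base


# Example URIs are made up for illustration.
table = MountTable()
table.mount("/hdfs1", "hdfs://namenode-a:8020/warehouse")
table.mount("/objects", "s3://example-bucket/data")
print(table.resolve("/hdfs1/sales/2018.parquet"))
```

An application only ever sees paths such as `/hdfs1/...` or `/objects/...`; which store actually holds the bytes is decided by the table that central IT maintains.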

Server-side API Translation: From legacy to modern
Convert from client-side interface to native storage interface
Client-side interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
Storage drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
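One way to picture API translation is two different client-facing interfaces that both reduce to the same small driver contract. The sketch below assumes a made-up `StorageDriver` contract and uses an in-memory stand-in instead of a real HDFS or S3 driver; none of these class or method names come from Alluxio.

```python
from __future__ import annotations

from typing import Protocol


class StorageDriver(Protocol):
    """Hypothetical driver contract; a real driver would call HDFS, S3, etc."""

    def read(self, path: str) -> bytes: ...

    def write(self, path: str, data: bytes) -> None: ...


class InMemoryDriver:
    """Stand-in driver so the sketch runs without any storage cluster."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data


class FileStyleClient:
    """File-like front end: callers think in terms of paths."""

    def __init__(self, driver: StorageDriver) -> None:
        self._driver = driver

    def write_file(self, path: str, data: bytes) -> None:
        self._driver.write(path, data)

    def read_file(self, path: str) -> bytes:
        return self._driver.read(path)


class ObjectStyleClient:
    """Object-like front end: callers think in terms of bucket and key."""

    def __init__(self, driver: StorageDriver) -> None:
        self._driver = driver

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        # Translate (bucket, key) into the path form the driver expects.
        self._driver.write(f"/{bucket}/{key}", body)

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._driver.read(f"/{bucket}/{key}")


driver = InMemoryDriver()
FileStyleClient(driver).write_file("/logs/day1.txt", b"hello")
print(ObjectStyleClient(driver).get_object("logs", "day1.txt"))
```

Data written through the file-style interface is readable through the object-style one because the translation happens in one place, in front of the driver, rather than in each application.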

Intelligent Multi-tiering: Get high-value data faster
Local performance from remote data using multi-tier storage
• Hot data: RAM
• Warm data: SSD
• Cold data: HDD
• Read & write buffering
• Transparent to the application
• Policies for pinning, promotion/demotion, TTL
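The policies named above (pinning, promotion/demotion, TTL) can be illustrated with a toy three-tier store. This is only a sketch under simple assumptions: capacities are counted in entries, demotion is least-recently-used, and every read promotes to the top tier. The `TieredStore` class and its methods are hypothetical and do not reflect Alluxio's actual policies or code.

```python
from __future__ import annotations

import time
from collections import OrderedDict

TIERS = ["RAM", "SSD", "HDD"]  # hot, warm, cold


class TieredStore:
    """Hypothetical tiered cache: LRU demotion, promotion on read, pinning, TTL."""

    def __init__(self, capacity: dict[str, int], clock=time.monotonic) -> None:
        self.capacity = capacity
        self.clock = clock
        self.tiers = {name: OrderedDict() for name in TIERS}
        self.pinned: set[str] = set()
        self.expires: dict[str, float] = {}

    def put(self, key: str, value, ttl: float | None = None, pin: bool = False) -> None:
        self._remove(key)
        if pin:
            self.pinned.add(key)
        if ttl is not None:
            self.expires[key] = self.clock() + ttl
        self._insert(0, key, value)  # new data starts in the hottest tier

    def get(self, key: str):
        if key in self.expires and self.clock() >= self.expires[key]:
            self._remove(key)  # TTL elapsed: drop the entry
            return None
        for name in TIERS:
            if key in self.tiers[name]:
                value = self.tiers[name].pop(key)
                self._insert(0, key, value)  # promote on access
                return value
        return None

    def tier_of(self, key: str) -> str | None:
        return next((n for n in TIERS if key in self.tiers[n]), None)

    def _remove(self, key: str) -> None:
        for tier in self.tiers.values():
            tier.pop(key, None)
        self.pinned.discard(key)
        self.expires.pop(key, None)

    def _insert(self, level: int, key: str, value) -> None:
        tier = self.tiers[TIERS[level]]
        tier[key] = value
        while len(tier) > self.capacity[TIERS[level]]:
            # Demote the least recently used entry that is not pinned.
            victim = next((k for k in tier if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; this sketch tolerates overflow
            demoted = tier.pop(victim)
            if level + 1 < len(TIERS):
                self._insert(level + 1, victim, demoted)
            else:
                self.expires.pop(victim, None)  # evicted from the coldest tier


store = TieredStore({"RAM": 1, "SSD": 1, "HDD": 1})
for name in ("a", "b", "c"):
    store.put(name, name.upper())
print({k: store.tier_of(k) for k in ("a", "b", "c")})
```

The application only calls `put` and `get`; where a block currently lives is an internal detail, which is the sense in which tiering is transparent to the app.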

Popular Technical Use Cases
Use cases: Virtual Data Lake, Machine Learning Productivity, Self-service data across hybrid cloud
• Accelerate batch, micro-batch & streaming jobs
• Slowly transition to lower cost object stores
• Run in hybrid cloud environment with compute in the cloud
• Accelerate ML jobs running on object stores or file systems
• Provide consistent performance to data scientists
• Provide unified interface to access all data
• Accelerate & tier data transparently across storage tiers
• Co-locate remote data with compute for performance

100+ Known Production Deployments
Massive clusters deployed, many with 500+ nodes

Financial Services Case Study
Machine Learning Use Case
Challenge:
• Gain end-to-end view of business with large volume of data
• Queries were slow / not interactive, resulting in operational inefficiency
Solution:
• ETL data from Teradata to Alluxio
Impact:
• Faster time to market: “Now we don’t have to work Sundays”
[Diagram: SPARK and TERADATA]

Retail Case Study
Customer Analytics Use Case
Challenge:
• Bottleneck in trend analysis of mission-critical daily sales and inventory management
• Queries were slow / not interactive, resulting in operational inefficiency
Solution:
• With Alluxio, data queries are 10X faster
Impact:
• Higher operational efficiency
[Diagram: SPARK and HDFS]

Telecom Case Study
Customer 360 Insights
Challenge:
• Desired a central view of consumer information in near real time for proactive support.
• Many HDFS clusters on different distributions and many incompatible versions, on-prem & cloud, integrated through heavy ETL.
Solution:
• Alluxio integrates data into a central catalog for fast access to consumer interaction records.
Impact:
• Reduced integration time
• Faster data speed & freshness
[Diagram: HADOOP and ML compute over multiple HDFS clusters (HDP, CDH, MAPR), with ETL]

Machine Learning / Deep Learning
Maximizes GPU investment:
• Self-serve data access for data scientists
• Rapid integration of new data sources
• Improved memory management & performance

Incredible Open Source Momentum with a growing community
• 920+ contributors & growing
• 3760+ GitHub stars
• Apache 2.0 licensed
• Hundreds of thousands of downloads
Download Alluxio today @ www.alluxio.org

Thank You
Join the Alluxio Community
www.alluxio.org | www.alluxio.com | Twitter: @alluxio
