SlideShare a Scribd company logo
1 of 39
Download to read offline
Unified Big Data Analytics:
Any stack, Any Cloud
1
2019/01/22 @ CloudHealth by VMware
Follow us | @alluxio
Download Alluxio | www.alluxio.org
Questions? | http://alluxio.org/slack
About Me
• Bin Fan
• PhD CS@CMU
• Founding Engineer@Alluxio
2
Email: binfan@alluxio.com
Github: apc999
Twitter: @binfan
Company
Overview
• Founded Feb. 2015 – Haoyuan Li
• PhD research at UC Berkeley AMPLab
• Initially Tachyon Nexus
• Venture Backed
• Andreessen Horowitz, Seven Seas etc.
• Open Source Business Model
• Tachyon Open Sourced in Dec. 2012
• Open source v1.0 released Feb. 2016
• Commercial product released Oct. 2016
• Office in San Mateo, CA
• Team: Google, Palantir, Vmware, AMD, Cisco…
Agenda Technology Trends1
Data Access Layer2
Alluxio Architecture3
Use Cases4
Agenda Technology Trends1
Data Access Layer2
Alluxio Architecture3
Use Cases4
Data Transformation
6
• Pressure in all industries to be
“data driven”
• Majority of companies still figuring out
the transformation
• Increased collection of numerous,
low-value data
• Challenge of overcoming data silos to
convert data into business value
• Limited success of Data Warehouse,
Mart, and Lakes – cost of
copying/moving data is substantial
• Single Data Plane for Business
value
Migration to Cloud
7
• Decoupling of compute and
storage
• Enterprise move from turnkey
solution to self managed data
platforms on IaaS
• Lacking agility at Data Storage
level
• Requires Storage Abstraction
Rise of Artificial Intelligence
8
• New workload targeting the
same data used for Hadoop
OLAP
• Potentially can be as
valuable or more valuable
than existing OLAP
workloads
• Challenge of adapting
existing architectures to
this new workload
Agenda Technology Trends1
Data Access Layer2
Alluxio Architecture3
Use Cases4
The Data Access Layer
10
• Abstraction layer between applications and storage systems
• Present a stable storage interface to applications, including
semantics, security, and performance
• Eliminate weakness of data silos instead of data silos
themselves
• Enable transparent migration of underlying storage systems
• Enable application API to storage API translation in a single
layer
Data Access Layer
Data Access Layer
Security Standard APIsHigh Performance
Compatibility Decoupling
Transparent
Migration
11
Alluxio
12
• Our implementation of the data access layer – a virtual
distributed file system
• Open source project with over 900 contributors from 100s of
organizations worldwide
• Deployed in many top internet and financial companies
100+ Known Production Deployments
AND MORE!
11/16/18 13
Agenda Technology Trends1
Data Access Layer2
Alluxio Architecture3
Use Cases4
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Highest performance in
Memory
• No Lock in
Alluxio, a Virtual Distributed File System (VDFS)
Java File API
HDFS
Interface
S3 Interface REST API
HDFS Driver S3 Driver Swift Driver NFS Driver
FUSE
Interface
15
Alluxio Architecture
Alluxio
Master
Zookeeper
Standby
Master
Alluxio
Worker
Alluxio
Worker Under Store
RAM / SSD / HDD
RAM / SSD / HDD
Control Path
Data Path
16
Read Data not Cached in Alluxio + Caching
17
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
WorkerUnder Store 12
3
4
4
Read Cached Data in Alluxio
Alluxio
Worker
RAM / SSD / HDD
Application
Alluxio
Client
18
1
2
3
Write data only to Alluxio
Alluxio
Worker
RAM / SSD / HDD
Application
Alluxio
Client
19
1
2
3
Write to Alluxio and Under Store Synchronously
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
Worker
Under Store
20
12
2
3
A Common File System Abstraction
21
• Common interface across apps
• HDFS-compatible interface:
change hdfs://foo/bar to
alluxio://foo/bar
• Other interfaces: Native Alluxio Java
FS, POSIX and S3.
• Cloud storage becomes “hidden”
to apps
• Less vendor lock-in!
Compute Zone
Standalone or managed with Mesos or Yarn
Storage in Different Availability Zone
Either on-prem or cloud
TensorflowPrestoMR
HDFS API POSIX API
Data Path: Improved I/O Performance
22
• A New Tier Above Cloud Storage for Compute
• Distributed buffer cache
• Restore locality to compute
• Read:
• Cache-hit read: served by Alluxio workers (local worker preferred)
• Cache-miss read: served by cloud storage, then cache to Alluxio worker
• Write:
• Burst buffer, then async propagate to S3 (Alluxio 2.0)
• Challenges:
• Locality: expose location information to applications; serve local apps
through ramdisk (rather than network)
Data Path: Async Persist to S3 (Alluxio 2.0)
23
RAM / SSD / HDD
Application
Alluxio
Client
Alluxio
Master
Alluxio
Worker
Under Store
• Async Writes
• Step1: App writes to Alluxio
• Step2: Alluxio writes to UFS
• Benefits
• Apps writes in Alluxio speed
• Data gets persisted
• Challenges
• File rename/delete before
persist: 2PC
• Fault-tolerance: journal async
requests
Metadata Path: Familiar Semantics
24
• Listing / renaming on object store can be expensive
• Common operations for batch or SQL analytics
• Overwriting Put is eventually consistent
• Alluxio loads and manages metadata in master
• Apps can continue assuming HDFS-like semantics and performance
implication
• Challenges
• Data modification bypassing Alluxio: when and how to re-sync
• Slow lists in object store: batch operations
• Too many objects: off-heap metadata (Alluxio 2.0)
Metadata Path: Efficient Renames
25
• Rename files on S3 can be expensive
• Common operations for MR in commit phase
• Write results to tmp paths
• Rename tmp files to final paths (another copy, slow)
• Rename with Alluxio async writes
• t0: writes to tmp paths in Alluxio: near-compute, fast writes
• t1: rename tmp paths to final path in Alluxio: cheap renames
• t2: persist files in final paths in Alluxio to S3: 2PC to avoid partial data
• Speculative execution allowed
Agenda Technology Trends1
Data Access Layer2
Alluxio Architecture3
Use Cases4
Sandbox: TPCDS + Alluxio + S3
27
• Setup
• TPCDS reads from S3, vs
• TPCDS reads from Alluxio backed by S3
• To play with Sandbox, make a request:
https://www.alluxio.org/sandbox/request
Case Study:
- Digital marketing SaaS platform
- Hive metastore: Input files from S3
- ~100 TB data on S3
- Pain Point:
- Slow to list files
- Limited EC2<->S3 bw
- No compute side caching
- No data locality
https://www.slideshare.net/ThaiBui7/hybrid-collaborative-tiered-storage-with-alluxio
28
Solution: Hot & Warm Data on Alluxio
29
Result
- 5 - 10x read improvement
- Enable easier debug with feedback loop for data analysts
30
Case Study:
- Leading Online Retailer (NASDAQ: JD)
- Building Ad-hoc SQL Query Engine
- Pain Point:
- Presto workers may read remotely from HDFS datanodes
- Large query variance
https://www.slideshare.net/Alluxio/alluxio-in-jd
31
Solution: Colocate Alluxio with Presto
32
Query Time
33
34
Query Time
Case Study:
35
- Leading Online Gaming Service Company (NASDAQ: NTES)
- Partner with Blizzard to operate service of “WoW”, “Hearthstone”
- Coming “Diablo Immortal”
- Building Ad-hoc SQL Query Engine
- Large data volume: ~30 TB raw data daily
- A separate satellite compute cluster
- Pain Point:
- Requirement in response time: < 15s
- Large startup latency on submitting SQL jobs as YARN app
Solution: Presto + Alluxio
36
Result: Smoother Response During Peak Time
37
Response time (ms)
Presto w/ Alluxio
Presto w/o Alluxio
- Alluxio: A New Data Access Layer
- Between compute and storage
- Transparent to bigdata analytics (HDFS-compatible, POSIX)
- Improve data and metadata performance on cloud storage
- Architecture and Data Flow
- Master, Worker, Under Storage
- Cache-{hit, miss} reads, Sync/Async writes
- Use Cases on Alluxio
Conclusion
38
zhuanlan.zhihu.com/alluxio
www.alluxio.com
info@alluxio.com
twitter.com/alluxio
linkedIn.com/alluxio
Thank you
binfan@alluxio.com

More Related Content

What's hot

StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoAlluxio, Inc.
 
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioImproving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioAlluxio, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsSimplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsAlluxio, Inc.
 
Embracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloadsEmbracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloadsAlluxio, Inc.
 
Hands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data ManagementHands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data ManagementAlluxio, Inc.
 
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio, Inc.
 
Alluxio Architecture and Performance
Alluxio Architecture and PerformanceAlluxio Architecture and Performance
Alluxio Architecture and PerformanceAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
 
Accelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph ObjectsAccelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph ObjectsAlluxio, Inc.
 
Practical PaaS presentation
Practical PaaS presentationPractical PaaS presentation
Practical PaaS presentationhmalphettes
 
Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens Alluxio, Inc.
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3Alluxio, Inc.
 
Spark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with AlluxioSpark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with AlluxioAlluxio, Inc.
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraAlluxio, Inc.
 
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Lucas Jellema
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)Kay Lerch
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DAlessandro Pilotti
 
Comparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing PlatformsComparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing Platformsijsrd.com
 

What's hot (20)

StorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and PrestoStorageQuery: federated querying on object stores, powered by Alluxio and Presto
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
 
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using AlluxioImproving Data Locality for Spark Jobs on Kubernetes Using Alluxio
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi CloudsSimplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
Simplified Data Preparation for Machine Learning in Hybrid and Multi Clouds
 
Embracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloadsEmbracing hybrid cloud for data-intensive analytic workloads
Embracing hybrid cloud for data-intensive analytic workloads
 
Hands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data ManagementHands-on with Alluxio Structured Data Management
Hands-on with Alluxio Structured Data Management
 
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
 
Alluxio Architecture and Performance
Alluxio Architecture and PerformanceAlluxio Architecture and Performance
Alluxio Architecture and Performance
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
 
Accelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph ObjectsAccelerating Data Computation on Ceph Objects
Accelerating Data Computation on Ceph Objects
 
Practical PaaS presentation
Practical PaaS presentationPractical PaaS presentation
Practical PaaS presentation
 
Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens Reducing large S3 API costs using Alluxio at Datasapiens
Reducing large S3 API costs using Alluxio at Datasapiens
 
What's New in Alluxio 2.3
What's New in Alluxio 2.3What's New in Alluxio 2.3
What's New in Alluxio 2.3
 
Spark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with AlluxioSpark Pipelines in the Cloud with Alluxio
Spark Pipelines in the Cloud with Alluxio
 
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud EraModernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
Modernizing Your Data Platform for Analytics and AI in the Hybrid Cloud Era
 
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
 
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
AWS User Group Meetup Berlin - Kay Lerch on Apache NiFi (2016-04-19)
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2D
 
Comparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing PlatformsComparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing Platforms
 

Similar to Unified Big Data Analytics: Any Stack, Any Cloud

Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudAlluxio, Inc.
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsAlluxio, Inc.
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio, Inc.
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Community
 
Unify Data at Memory Speed
Unify Data at Memory SpeedUnify Data at Memory Speed
Unify Data at Memory SpeedAlluxio, Inc.
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Jiří Šimša
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio, Inc.
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIAlluxio, Inc.
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreAlluxio, Inc.
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Alluxio, Inc.
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioAlluxio, Inc.
 
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio, Inc.
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Data Con LA
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
 

Similar to Unified Big Data Analytics: Any Stack, Any Cloud (20)

Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Accelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud EraAccelerate Analytics and ML in the Hybrid Cloud Era
Accelerate Analytics and ML in the Hybrid Cloud Era
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
Unify Data at Memory Speed
Unify Data at Memory SpeedUnify Data at Memory Speed
Unify Data at Memory Speed
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AI
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
 

More from Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Recently uploaded

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 

Recently uploaded (20)

5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Unified Big Data Analytics: Any Stack, Any Cloud

  • 1. Unified Big Data Analytics: Any stack, Any Cloud 1 2019/01/22 @ CloudHealth by VMware Follow us | @alluxio Download Alluxio | www.alluxio.org Questions? | http://alluxio.org/slack
  • 2. About Me • Bin Fan • PhD CS@CMU • Founding Engineer@Alluxio 2 Email: binfan@alluxio.com Github: apc999 Twitter: @binfan
  • 3. Company Overview • Founded Feb. 2015 – Haoyuan Li • PhD research at UC Berkeley AMPLab • Initially Tachyon Nexus • Venture Backed • Andreessen Horowitz, Seven Seas etc. • Open Source Business Model • Tachyon Open Sourced in Dec. 2012 • Open source v1.0 released Feb. 2016 • Commercial product released Oct. 2016 • Office in San Mateo, CA • Team: Google, Palantir, Vmware, AMD, Cisco…
  • 4. Agenda Technology Trends1 Data Access Layer2 Alluxio Architecture3 Use Cases4
  • 5. Agenda Technology Trends1 Data Access Layer2 Alluxio Architecture3 Use Cases4
  • 6. Data Transformation 6 • Pressure in all industries to be “data driven” • Majority of companies still figuring out the transformation • Increased collection of numerous, low-value data • Challenge of overcoming data silos to convert data into business value • Limited success of Data Warehouse, Mart, and Lakes – cost of copying/moving data is substantial • Single Data Plane for Business value
  • 7. Migration to Cloud 7 • Decoupling of compute and storage • Enterprise move from turnkey solution to self managed data platforms on IaaS • Lacking agility at Data Storage level • Requires Storage Abstraction
  • 8. Rise of Artificial Intelligence 8 • New workload targeting the same data used for Hadoop OLAP • Potentially can be as valuable or more valuable than existing OLAP workloads • Challenge of adapting existing architectures to this new workload
  • 9. Agenda Technology Trends1 Data Access Layer2 Alluxio Architecture3 Use Cases4
  • 10. The Data Access Layer 10 • Abstraction layer between applications and storage systems • Present a stable storage interface to applications, including semantics, security, and performance • Eliminate weakness of data silos instead of data silos themselves • Enable transparent migration of underlying storage systems • Enable application API to storage API translation in a single layer
  • 11. Data Access Layer Data Access Layer Security Standard APIsHigh Performance Compatibility Decoupling Transparent Migration 11
  • 12. Alluxio 12 • Our implementation of the data access layer – a virtual distributed file system • Open source project with over 900 contributors from 100s of organizations worldwide • Deployed in many top internet and financial companies
  • 13. 100+ Known Production Deployments AND MORE! 11/16/18 13
  • 14. Agenda Technology Trends1 Data Access Layer2 Alluxio Architecture3 Use Cases4
  • 15. Data Ecosystem with Alluxio • Apps only talk to Alluxio • Simple Add/Remove • No App Changes • Highest performance in Memory • No Lock in Alluxio, a Virtual Distributed File System (VDFS) Java File API HDFS Interface S3 Interface REST API HDFS Driver S3 Driver Swift Driver NFS Driver FUSE Interface 15
  • 16. Alluxio Architecture Alluxio Master Zookeeper Standby Master Alluxio Worker Alluxio Worker Under Store RAM / SSD / HDD RAM / SSD / HDD Control Path Data Path 16
  • 17. Read Data not Cached in Alluxio + Caching 17 RAM / SSD / HDD Application Alluxio Client Alluxio WorkerUnder Store 12 3 4 4
  • 18. Read Cached Data in Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client 18 1 2 3
  • 19. Write data only to Alluxio Alluxio Worker RAM / SSD / HDD Application Alluxio Client 19 1 2 3
  • 20. Write to Alluxio and Under Store Synchronously RAM / SSD / HDD Application Alluxio Client Alluxio Worker Under Store 20 12 2 3
  • 21. A Common File System Abstraction 21 • Common interface across apps • HDFS-compatible interface: change hdfs://foo/bar to alluxio://foo/bar • Other interfaces: Native Alluxio Java FS, POSIX and S3. • Cloud storage becomes “hidden” to apps • Less vendor lock-in! Compute Zone Standalone or managed with Mesos or Yarn Storage in Different Availability Zone Either on-prem or cloud TensorflowPrestoMR HDFS API POSIX API
  • 22. Data Path: Improved I/O Performance 22 • A New Tier Above Cloud Storage for Compute • Distributed buffer cache • Restore locality to compute • Read: • Cache-hit read: served by Alluxio workers (local worker preferred) • Cache-miss read: served by cloud storage, then cache to Alluxio worker • Write: • Burst buffer, then async propagate to S3 (Alluxio 2.0) • Challenges: • Locality: expose location information to applications; serve local apps through ramdisk (rather than network)
  • 23. Data Path: Async Persist to S3 (Alluxio 2.0) 23 RAM / SSD / HDD Application Alluxio Client Alluxio Master Alluxio Worker Under Store • Async Writes • Step1: App writes to Alluxio • Step2: Alluxio writes to UFS • Benefits • Apps writes in Alluxio speed • Data gets persisted • Challenges • File rename/delete before persist: 2PC • Fault-tolerance: journal async requests
  • 24. Metadata Path: Familiar Semantics 24 • Listing / renaming on object store can be expensive • Common operations for batch or SQL analytics • Overwriting Put is eventually consistent • Alluxio loads and manages metadata in master • Apps can continue assuming HDFS-like semantics and performance implication • Challenges • Data modification bypassing Alluxio: when and how to re-sync • Slow lists in object store: batch operations • Too many objects: off-heap metadata (Alluxio 2.0)
  • 25. Metadata Path: Efficient Renames 25 • Rename files on S3 can be expensive • Common operations for MR in commit phase • Write results to tmp paths • Rename tmp files to final paths (another copy, slow) • Rename with Alluxio async writes • t0: writes to tmp paths in Alluxio: near-compute, fast writes • t1: rename tmp paths to final path in Alluxio: cheap renames • t2: persist files in final paths in Alluxio to S3: 2PC to avoid partial data • Speculative execution allowed
  • 26. Agenda Technology Trends1 Data Access Layer2 Alluxio Architecture3 Use Cases4
  • 27. Sandbox: TPCDS + Alluxio + S3 27 • Setup • TPCDS reads from S3, vs • TPCDS reads from Alluxio backed by S3 • To play with Sandbox, make a request: https://www.alluxio.org/sandbox/request
  • 28. Case Study: - Digital marketing SaaS platform - Hive metastore: Input files from S3 - ~100 TB data on S3 - Pain Point: - Slow to list files - Limited EC2<->S3 bw - No compute side caching - No data locality https://www.slideshare.net/ThaiBui7/hybrid-collaborative-tiered-storage-with-alluxio 28
  • 29. Solution: Hot & Warm Data on Alluxio 29
  • 30. Result - 5 - 10x read improvement - Enable easier debug with feedback loop for data analysts 30
  • 31. Case Study: - Leading Online Retailer (NASDAQ: JD) - Building Ad-hoc SQL Query Engine - Pain Point: - Presto workers may read remotely from HDFS datanodes - Large query variance https://www.slideshare.net/Alluxio/alluxio-in-jd 31
  • 32. Solution: Colocate Alluxio with Presto 32
  • 35. Case Study: 35 - Leading Online Gaming Service Company (NASDAQ: NTES) - Partner with Blizzard to operate service of “WoW”, “Hearthstone” - Coming “Diablo Immortal” - Building Ad-hoc SQL Query Engine - Large data volume: ~30 TB raw data daily - A separate satellite compute cluster - Pain Point: - Requirement in response time: < 15s - Large startup latency on submitting SQL jobs as YARN app
  • 36. Solution: Presto + Alluxio 36
  • 37. Result: Smoother Response During Peak Time 37 Response time (ms) Presto w/ Alluxio Presto w/o Alluxio
  • 38. - Alluxio: A New Data Access Layer - Between compute and storage - Transparent to bigdata analytics (HDFS-compatible, POSIX) - Improve data and metadata performance on cloud storage - Architecture and Data Flow - Master, Worker, Under Storage - Cache-{hit, miss} reads, Sync/Async writes - Use Cases on Alluxio Conclusion 38