Alluxio Data Orchestration for
Machine Learning
Lu Qiu, Bin Fan @ Alluxio
04/27/2021
1
About Us – Lu Qiu
● Software Engineer @ Alluxio
● Email: lu@alluxio.com
● Master Data Science @ GWU
● Areas: Alluxio fault-tolerant system, journal
system, metrics system, POSIX API, and
Alluxio integration with cloud
2
About Us – Bin Fan
● Founding Engineer, VP Open Source @ Alluxio
● Email: binfan@alluxio.com
● PhD in CS @ CMU
3
Agenda
● What is Alluxio POSIX API
● How to Use Alluxio via POSIX API
● Latest Work and Roadmap
4
What is Alluxio POSIX API
5
What is POSIX?
https://en.wikipedia.org/wiki/POSIX
- Portable Operating System Interface
- Defines APIs, command-line shells, and utility interfaces for software
compatibility with variants of Unix and other operating systems
- A standard for maintaining compatibility between operating systems
6
Apps Connecting to Alluxio via POSIX API
7
Accessing Remote/Distributed Data as
Local Directories
8
[Diagram: under storage systems (HDFS #1, HDFS #2, Obj Store, NFS) behind Alluxio servers, accessed by model training jobs]
Connecting to
• HDFS
• Amazon S3
• Azure
• Google Cloud
• Ceph
• NFS
• Many more
Distributed Caching w/ Unified Namespace
[Diagram: model training jobs reading /path1/file1 and /path2/file2, with blocks A, B, and C cached across Alluxio servers]
9
Under the Hood: FUSE
https://en.wikipedia.org/wiki/Filesystem_in_Userspace
- Filesystem in Userspace
- A software interface for Unix and Unix-like computer operating systems
that lets non-privileged users create their own file systems without editing
kernel code.
10
Under the Hood: FUSE (Cont.)
libfuse is the userspace side of FUSE:
https://github.com/libfuse/libfuse
A FUSE file system is typically implemented as a standalone application that
links with libfuse.
Example: https://github.com/libfuse/libfuse/blob/master/example/hello.c
- Defines read/write/ls/…
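The callback structure that a FUSE file system fills in can be sketched without mounting anything. The following is a minimal pure-Python illustration, not libfuse code: the class `HelloFs` and its method names are hypothetical stand-ins loosely modeled on the getattr/readdir/read callbacks in hello.c, and it serves a single in-memory file.

```python
import errno
import stat


class HelloFs:
    """In-memory stand-in for a FUSE file system with one read-only file."""

    def __init__(self):
        # Path -> file content; a single file, as in the hello example.
        self.files = {"/hello": b"Hello World!\n"}

    def getattr(self, path):
        # Return a small dict of attributes, or raise ENOENT for unknown paths.
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path in self.files:
            return {
                "st_mode": stat.S_IFREG | 0o444,
                "st_nlink": 1,
                "st_size": len(self.files[path]),
            }
        raise OSError(errno.ENOENT, "No such file or directory", path)

    def readdir(self, path):
        # Only the root directory exists in this sketch.
        if path != "/":
            raise OSError(errno.ENOENT, "No such file or directory", path)
        return [".", ".."] + [name.lstrip("/") for name in self.files]

    def read(self, path, size, offset):
        # Slice the requested byte range; reads past the end return b"".
        if path not in self.files:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        return self.files[path][offset:offset + size]


fs = HelloFs()
print(fs.readdir("/"))
print(fs.read("/hello", 5, 0))
```

In a real FUSE file system these callbacks would be registered with libfuse and invoked by the kernel on behalf of applications; here they are called directly.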
11
12
Alluxio-FUSE limitations
3/25/19
● Because Alluxio is a write-once/read-many file system, the mounted
file system does not support all POSIX workloads.
Files can be written only once, only sequentially, and can never be
modified. Vim is not supported because it uses append
internally. cp fails when the destination file already exists.
● Alluxio has no concept of hard links or soft links, so
commands like ln are not supported, and the hard-link count
is not displayed in ll output.
● Performance is worse than using the Alluxio Java client directly
Limitations of Alluxio POSIX API
13
● Because Alluxio is a write-once/read-many file system, the mounted file
system does not support all POSIX workloads.
Files can be written only once, only sequentially, and can never be modified.
Vim is not supported because it uses append internally. cp fails when the
destination file already exists.
● Alluxio has no concept of hard links or soft links, so commands
like ln are not supported, and the hard-link count is not displayed in ll
output.
● Performance is bounded by FUSE and the Alluxio client
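One way to stay within these limits is to create each output file exactly once and write it sequentially. The sketch below assumes a hypothetical helper `write_once`; it uses Python's exclusive-create mode so that an existing destination is reported up front instead of being overwritten. It runs against a temporary directory standing in for a FUSE mount point, so behavior on an actual Alluxio mount is not verified here.

```python
import os
import tempfile


def write_once(path, data):
    """Create `path` and write `data` sequentially; refuse to touch existing files."""
    try:
        # "xb" creates the file exclusively and fails if it already exists.
        with open(path, "xb") as f:
            f.write(data)
        return True
    except FileExistsError:
        return False


# A temporary directory stands in for a mount point such as /mnt/alluxio.
with tempfile.TemporaryDirectory() as mount_point:
    target = os.path.join(mount_point, "part-0000")
    first = write_once(target, b"training shard")
    second = write_once(target, b"overwrite attempt")
    print(first, second)
```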
How to Use Alluxio POSIX API
14
Launching Standalone Fuse
15
Mount Alluxio service as a local FS path:
integration/fuse/bin/alluxio-fuse mount -o [mount_options] mount_point [alluxio_path]
Check out local Alluxio mount points:
integration/fuse/bin/alluxio-fuse stat
pid mount_point alluxio_path
80846 /mnt/people /people
80847 /mnt/sales /sales
Unmount Alluxio service:
integration/fuse/bin/alluxio-fuse unmount mount_point
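For scripting, the table printed by the stat command can be turned into records. This is a minimal sketch that assumes the whitespace-separated three-column layout shown on this slide; the function name `parse_fuse_stat` is hypothetical, and real output may differ between Alluxio versions.

```python
def parse_fuse_stat(output):
    """Parse `alluxio-fuse stat` style output into a list of dicts."""
    lines = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    mounts = []
    for row in rows:
        record = dict(zip(header, row))
        record["pid"] = int(record["pid"])  # pid is numeric in the sample
        mounts.append(record)
    return mounts


# Sample text copied from the slide, not captured from a live system.
sample = """pid mount_point alluxio_path
80846 /mnt/people /people
80847 /mnt/sales /sales
"""
mounts = parse_fuse_stat(sample)
print(mounts)
```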
Accessing Alluxio Service via POSIX API
16
Bash
cat /mnt/alluxio/myInput
TensorFlow
python classify_image.py --model_dir /mnt/fuse/imagenet/
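Because the mount behaves like a local directory, application code needs no Alluxio-specific client. A minimal sketch, assuming a hypothetical helper `read_in_chunks` and using a temporary file in place of a path such as /mnt/alluxio/myInput:

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield the file's bytes sequentially in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# A temporary directory stands in for the FUSE mount point.
with tempfile.TemporaryDirectory() as mount_point:
    path = os.path.join(mount_point, "myInput")
    with open(path, "wb") as f:
        f.write(b"x" * 2500)
    sizes = [len(c) for c in read_in_chunks(path, chunk_size=1000)]
    print(sizes)
```

Sequential reads like this match the access pattern the mounted file system is designed for.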
Demo
17
A New JNI-based FUSE Impl
(available since 2.5.0)
18
Integrating libfuse (in C) with Java Client
19
● Previously based on a third-party JNR-based FUSE library
● Now based on a new first-party JNI-based FUSE library
○ Built directly on libfuse to enable more optimizations
○ Close to native libfuse performance
○ Supports high concurrency
JNR-FUSE Hard to debug
20
● JNR-FUSE has many dependencies, which makes it hard to debug and fix.
● It did not support callback functions well: when a native thread calls into the JVM, it
must attach to the JVM, which is relatively expensive.
Community Collaboration
● Community-driven collaboration
○ Contributors from NJU, Alibaba, Tencent, Alluxio
● Already used in production by Microsoft
21
Performance
22
Target Scenarios
23
● Multi-node, multi-thread machine learning/deep learning workloads.
● The read path sees larger performance benefits than the write path
● Medium to large files perform better than small files
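A multi-threaded read pattern of the kind targeted here can be sketched with a thread pool. The helper name `read_all`, the worker count, and the temporary directory standing in for the mount point are assumptions for illustration; the sketch says nothing about actual throughput.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def _read(path):
    with open(path, "rb") as f:
        return f.read()


def read_all(paths, workers=4):
    """Read every file concurrently and return {path: size_in_bytes}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        contents = pool.map(_read, paths)  # results come back in input order
        return {p: len(data) for p, data in zip(paths, contents)}


# A temporary directory stands in for the FUSE mount point.
with tempfile.TemporaryDirectory() as mount_point:
    paths = []
    for i in range(8):
        p = os.path.join(mount_point, f"sample-{i}.bin")
        with open(p, "wb") as f:
            f.write(b"\0" * (i + 1))
        paths.append(p)
    sizes = read_all(paths)
    print(sorted(sizes.values()))
```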
Local RPC Elimination
(available soon in 2.6.0)
24
Idea:
● Motivated by training workloads reading many small files
○ Standalone Alluxio-FUSE process is a long-running client translating
FUSE API calls to Alluxio client RPCs
○ RPCs are required to communicate with workers, even on a cache hit
● Combine Alluxio-FUSE functionality into the Alluxio worker
25
Launching Fuse on Worker
26
● Configure alluxio-site.properties on worker nodes:
alluxio.worker.fuse.enabled=true
alluxio.worker.fuse.mount.point=/mnt/alluxio-service
alluxio.worker.fuse.mount.options=kernel_cache,entry_timeout=7200,attr_timeout=7200
● Then start the worker process; the Alluxio namespace can then be accessed
via the local path /mnt/alluxio-service
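To sanity-check such a configuration before starting a worker, the properties can be parsed with a few lines of code. This sketch assumes simple key=value lines and uses hypothetical helper names; it is not part of Alluxio.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        props[key.strip()] = value.strip()
    return props


def parse_mount_options(value):
    """Split a comma-separated option string into {name: value or True}."""
    options = {}
    for item in value.split(","):
        name, sep, val = item.partition("=")
        options[name] = val if sep else True
    return options


# Property values copied from the slide.
site = """
alluxio.worker.fuse.enabled=true
alluxio.worker.fuse.mount.point=/mnt/alluxio-service
alluxio.worker.fuse.mount.options=kernel_cache,entry_timeout=7200,attr_timeout=7200
"""
props = parse_properties(site)
options = parse_mount_options(props["alluxio.worker.fuse.mount.options"])
print(options)
```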
Other Optimizations
● DONE: Modularized JNI-Fuse library (github repo)
● TODO: Optimize gRPC performance on remote cache hit
● TODO: Support libfuse 3.x (issue ticket)
● And many more coming...
Join Alluxio weekly community sync to create solutions together!
27
Reference
● Using Alluxio to Optimize and Improve Performance of
Kubernetes-Based Deep Learning in the Cloud (link)
● ALLUXIO POSIX API documentation (English or Chinese)
● Turn Cloud Storage or HDFS Into Your Local File System for
Faster AI Model Training With TensorFlow (link)
● Fuse realization theory (Chinese TBT link)
28
Questions?
Welcome to join the Alluxio Community!
www.alluxio.io/slack | @alluxio
29

More Related Content

What's hot

Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed StorageAlluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed StorageAlluxio, Inc.
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in KubernetesDeep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in KubernetesAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAlluxio, Inc.
 
Building an external CPI for CloudStack
Building an external CPI for CloudStackBuilding an external CPI for CloudStack
Building an external CPI for CloudStackGuillaume Berche
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...Alluxio, Inc.
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebula Project
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a productCarlo Daffara
 
Openstack CPI cloudfoundry
Openstack CPI cloudfoundryOpenstack CPI cloudfoundry
Openstack CPI cloudfoundryYitao Jiang
 
Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelArthur Berezin
 
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...NETWAYS
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...NETWAYS
 
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05Lenz Grimmer
 
XCP-ng - past, present and future
XCP-ng - past, present and futureXCP-ng - past, present and future
XCP-ng - past, present and futureShapeBlue
 
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebulaOpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebulaOpenNebula Project
 
presentation el cluster0
presentation el cluster0presentation el cluster0
presentation el cluster0Dennis Mungai
 
Enabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaEnabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaNETWAYS
 

What's hot (20)

Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed StorageAlluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
 
Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in KubernetesDeep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified Namespace
 
Building an external CPI for CloudStack
Building an external CPI for CloudStackBuilding an external CPI for CloudStack
Building an external CPI for CloudStack
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
 
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
OpenNebulaConf2017EU: Hyper converged infrastructure with OpenNebula and Ceph...
 
Making clouds: turning opennebula into a product
Making clouds: turning opennebula into a productMaking clouds: turning opennebula into a product
Making clouds: turning opennebula into a product
 
Openstack CPI cloudfoundry
Openstack CPI cloudfoundryOpenstack CPI cloudfoundry
Openstack CPI cloudfoundry
 
Openstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - IsraelOpenstack platform -Red Hat Pizza and technology event - Israel
Openstack platform -Red Hat Pizza and technology event - Israel
 
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...
OpenNebula Conf 2014 | Lightning talk: OpenNebula Puppet Module - Norman Mess...
 
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
OpenNebula Conf 2014 | Using Ceph to provide scalable storage for OpenNebula ...
 
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05
Ceph and Storage Management with openATTIC - FOSDEM 2017-02-05
 
XCP-ng - past, present and future
XCP-ng - past, present and futureXCP-ng - past, present and future
XCP-ng - past, present and future
 
Big Data on DC/OS
Big Data on DC/OSBig Data on DC/OS
Big Data on DC/OS
 
What's new in openstack ocata
What's new in openstack ocata What's new in openstack ocata
What's new in openstack ocata
 
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebulaOpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebula
OpenNebula TechDay Boston 2015 - Hyperconvergence and OpenNebula
 
presentation el cluster0
presentation el cluster0presentation el cluster0
presentation el cluster0
 
Enabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaEnabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebula
 

Similar to Alluxio data orchestration for machine learning

Webinar: Open Source on the Modern Mainframe
Webinar: Open Source on the Modern MainframeWebinar: Open Source on the Modern Mainframe
Webinar: Open Source on the Modern MainframeOpen Mainframe Project
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014Puppet
 
Introduce of open swoole
Introduce of open swooleIntroduce of open swoole
Introduce of open swooleThanh Tai
 
Lenovo system management solutions
Lenovo system management solutionsLenovo system management solutions
Lenovo system management solutionsinside-BigData.com
 
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...Ontico
 
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With Rust
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With RustMuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With Rust
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With RustJitendra Bafna
 
WSO2 Enterprise Service Bus - Product Overview
WSO2 Enterprise Service Bus - Product OverviewWSO2 Enterprise Service Bus - Product Overview
WSO2 Enterprise Service Bus - Product OverviewWSO2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrCumulus Networks
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanAmbassador Labs
 
198970820 p-oooooooooo
198970820 p-oooooooooo198970820 p-oooooooooo
198970820 p-oooooooooohomeworkping4
 
Serverless Pune Meetup 1
Serverless Pune Meetup 1Serverless Pune Meetup 1
Serverless Pune Meetup 1Vishal Biyani
 
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integration
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket IntegrationImplementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integration
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integrationsandeepmenon62
 
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's nextUyuni Project
 
The new WPE API
The new WPE APIThe new WPE API
The new WPE APIIgalia
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Alluxio, Inc.
 
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio, Inc.
 

Similar to Alluxio data orchestration for machine learning (20)

Webinar: Open Source on the Modern Mainframe
Webinar: Open Source on the Modern MainframeWebinar: Open Source on the Modern Mainframe
Webinar: Open Source on the Modern Mainframe
 
The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014The Switch as a Server - PuppetConf 2014
The Switch as a Server - PuppetConf 2014
 
Project Fuji/OpenESB Aquarium Paris
Project Fuji/OpenESB Aquarium ParisProject Fuji/OpenESB Aquarium Paris
Project Fuji/OpenESB Aquarium Paris
 
Introduce of open swoole
Introduce of open swooleIntroduce of open swoole
Introduce of open swoole
 
Lenovo system management solutions
Lenovo system management solutionsLenovo system management solutions
Lenovo system management solutions
 
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...
Подталкиваем PHP к пределу возможностей, Michael Armstrong (lite speed techno...
 
Chef vs puppet
Chef vs puppetChef vs puppet
Chef vs puppet
 
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With Rust
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With RustMuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With Rust
MuleSoft Surat Meetup#44 - Anypoint Flex Gateway Custom Policies With Rust
 
WSO2 Enterprise Service Bus - Product Overview
WSO2 Enterprise Service Bus - Product OverviewWSO2 Enterprise Service Bus - Product Overview
WSO2 Enterprise Service Bus - Product Overview
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Switch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie CarrSwitch as a Server - PuppetConf 2014 - Leslie Carr
Switch as a Server - PuppetConf 2014 - Leslie Carr
 
Code One 2018 maven
Code One 2018   mavenCode One 2018   maven
Code One 2018 maven
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
 
198970820 p-oooooooooo
198970820 p-oooooooooo198970820 p-oooooooooo
198970820 p-oooooooooo
 
Serverless Pune Meetup 1
Serverless Pune Meetup 1Serverless Pune Meetup 1
Serverless Pune Meetup 1
 
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integration
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket IntegrationImplementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integration
Implementing CloudHub 2.0 CI/CD Pipeline with Bitbucket Integration
 
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next
2020-09-25 Uyuni Communit Hours: 2020.09 news and what's next
 
The new WPE API
The new WPE APIThe new WPE API
The new WPE API
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open SourceAlluxio Community Office Hour: Getting Started with Alluxio Open Source
Alluxio Community Office Hour: Getting Started with Alluxio Open Source
 

More from Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio data orchestration for machine learning

  • 1. Alluxio Data Orchestration for Machine Learning. Lu Qiu, Bin Fan @ Alluxio, 04/27/2021
  • 2. About Us – Lu Qiu ● Software Engineer @ Alluxio ● Email: lu@alluxio.com ● Master Data Science @ GWU ● Areas: Alluxio fault tolerant system, journal system, metrics system, and POSIX API; Alluxio integration with Cloud
  • 3. About Us – Bin Fan ● Founding Engineer, VP Open Source @ Alluxio ● Email: binfan@alluxio.com ● PhD in CS @ CMU
  • 4. Agenda ● What is Alluxio POSIX API ● How to Use Alluxio via POSIX API ● Latest Work and Roadmap
  • 5. What is Alluxio POSIX API
  • 6. What is POSIX? https://en.wikipedia.org/wiki/POSIX - Portable Operating System Interface - Defines APIs, command line shells, and utility interfaces for software compatibility with variants of Unix and other operating systems - Maintains compatibility between operating systems - A standard that keeps software compatible across operating systems
  • 7. Apps Connecting to Alluxio via POSIX API
  • 8. Accessing Remote/Distributed Data as Local Directories (diagram: HDFS #1, Obj Store, NFS, HDFS #2) Connecting to: • HDFS • Amazon S3 • Azure • Google Cloud • Ceph • NFS • Many more
  • 9. Distributed Caching w/ Unified Namespace (diagram: Alluxio servers caching blocks A, B, C; paths /path1/file1 and /path2/file2; Model Training clients)
  • 10. Under the Hood: FUSE https://en.wikipedia.org/wiki/Filesystem_in_Userspace - Filesystem in Userspace - A software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code.
  • 11. Under the Hood: FUSE (Cont.) The userspace side of FUSE is the libfuse library: https://github.com/libfuse/libfuse A FUSE file system is typically implemented as a standalone application that links with libfuse. Example: https://github.com/libfuse/libfuse/blob/master/example/hello.c - Define read/write/ls/…
  • 12. Alluxio-FUSE limitations (3/25/19) ● Because Alluxio is a write-once/read-many file system, the mounted file system does not support all POSIX workloads. Files can be written only once and only sequentially, and can never be modified. Vim is not supported since it uses append internally. cp fails when the destination file already exists. ● Alluxio has no hard-link or soft-link concepts, so commands like ln are not supported, and the hard-link count is not displayed in ll output. ● Performance is worse than using the Alluxio Java client directly
  • 13. Limitations of Alluxio POSIX API ● Because Alluxio is a write-once/read-many file system, the mounted file system does not support all POSIX workloads. Files can be written only once and only sequentially, and can never be modified. Vim is not supported since it uses append internally. cp fails when the destination file already exists. ● Alluxio has no hard-link or soft-link concepts, so commands like ln are not supported, and the hard-link count is not displayed in ll output. ● Performance is bound by FUSE and the Alluxio client
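The write-once, sequential-only constraint above can be sketched from the application side. This is a minimal illustration, not Alluxio code: the helper names `write_once` and `replace_file` are hypothetical, the demo runs against a local temporary directory rather than a real mount, and it assumes that deleting a file is permitted on the mounted path. The exact error an Alluxio-FUSE mount returns for an overwrite may differ from the local `FileExistsError` shown here.

```python
import os
import tempfile


def write_once(path, chunks):
    """Create `path` and write `chunks` strictly in order, refusing to overwrite."""
    # Mode "x" raises FileExistsError if the destination already exists, so the
    # "destination exists" case is caught before any data is written.
    with open(path, "xb") as f:
        for chunk in chunks:
            f.write(chunk)  # sequential writes only; never seek back


def replace_file(path, chunks):
    """Emulate an update as delete-then-rewrite instead of modifying in place."""
    # Assumption: the mount allows deleting files.
    if os.path.exists(path):
        os.remove(path)
    write_once(path, chunks)


# Demo on a local temp dir standing in for a hypothetical mount point.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "part-0000.bin")

write_once(demo_path, [b"abc", b"def"])
try:
    write_once(demo_path, [b"xyz"])
    overwrite_refused = False
except FileExistsError:
    overwrite_refused = True

replace_file(demo_path, [b"xyz"])
```

Tools that edit in place or append (the slide names Vim) do not fit this pattern; producing a new file and swapping it in does.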
  • 14. How to Use Alluxio POSIX API
  • 15. Launching Standalone Fuse
    ● Mount Alluxio service as a local FS path: integration/fuse/bin/alluxio-fuse mount -o [mount_options] mount_point [alluxio_path]
    ● Check out local Alluxio mount points: integration/fuse/bin/alluxio-fuse stat (output columns: pid, mount_point, alluxio_path; e.g. 80846 /mnt/people /people and 80847 /mnt/sales /sales)
    ● Unmount Alluxio service: integration/fuse/bin/alluxio-fuse unmount mount_point
  • 16. Accessing Alluxio Service via POSIX API
    ● Bash: cat /mnt/alluxio/myInput
    ● Tensorflow: python classify_image.py --model_dir /mnt/fuse/imagenet/
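Because the mount looks like a local directory, ordinary file APIs are enough to read from it. The sketch below walks a directory tree and reads each file sequentially in fixed-size chunks, which matches the sequential access pattern the deck describes. The function names and the 1 MiB chunk size are illustrative choices, and the demo uses a local temporary directory; in practice `root` would be a path under the mount point, such as /mnt/alluxio.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=1 << 20):
    """Yield the file's contents sequentially in chunks of up to chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def dataset_size(root):
    """Read every file under `root` front to back and return the total byte count."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            for chunk in iter_chunks(os.path.join(dirpath, name)):
                total += len(chunk)
    return total


# Demo on a local temp dir standing in for a path under the mount point.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "train"))
with open(os.path.join(demo_root, "train", "a.bin"), "wb") as f:
    f.write(b"1234")
with open(os.path.join(demo_root, "b.bin"), "wb") as f:
    f.write(b"567")

demo_total = dataset_size(demo_root)
```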
  • 18. A New JNI-based FUSE Impl (available since 2.5.0)
  • 19. Integrating libfuse (in C) with Java Client ● Previously based on a 3rd party JNR-based FUSE library ● Now based on a new 1st party JNI-based FUSE library ○ Built on libfuse directly to enable more optimizations ○ Close to native libfuse performance ○ Supports high concurrency
  • 20. JNR-FUSE: Hard to debug ● JNR-FUSE has many dependencies and is hard to debug and fix. ● It did not support callback functions well. When a native thread calls into the JVM, it must attach to the JVM, which is relatively expensive.
  • 21. Community Collaboration ● Community-driven collaboration ○ Contributors from NJU, Alibaba, Tencent, Alluxio ● Already in use by Microsoft in production
  • 23. Target Scenarios ● Multi-node, multi-thread machine learning/deep learning workloads. ● The read path benefits more than the write path ● Medium to large files perform better than small files
  • 24. Local RPC Elimination (available soon in 2.6.0)
  • 25. Idea: ● Motivated by training workloads reading many small files ○ The standalone Alluxio-FUSE process is a long-running client translating FUSE API calls to Alluxio client RPCs ○ RPCs are required to communicate with workers, even on a cache hit ● Combine Alluxio-FUSE functionality into the Alluxio worker
  • 26. Launching Fuse on Worker
    ● Configure alluxio-site.properties on worker nodes:
      alluxio.worker.fuse.enabled=true
      alluxio.worker.fuse.mount.point=/mnt/alluxio-service
      alluxio.worker.fuse.mount.options=kernel_cache,entry_timeout=7200,attr_timeout=7200
    ● Then start the worker process; the Alluxio namespace can then be accessed via the local path /mnt/alluxio-service
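Since the mount only appears once the worker process is up, a job that reads from the local path may want to wait for it first. The following is a small sketch of such a check using only the Python standard library; `wait_for_mount` is a hypothetical helper, not part of Alluxio, and the timeout and polling interval are arbitrary defaults.

```python
import os
import time


def wait_for_mount(mount_point, timeout_s=30.0, poll_s=0.5):
    """Return True once mount_point is a mount point, or False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        # os.path.ismount reports whether the path is the root of a mounted
        # file system, which is the case for an active FUSE mount.
        if os.path.ismount(mount_point):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A training script could call `wait_for_mount("/mnt/alluxio-service")` before opening its input files and exit with a clear error if it returns False.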
  • 27. Other Optimizations ● DONE: Modularized JNI-Fuse library (github repo) ● TODO: Optimize gRPC performance on remote cache hit ● TODO: Support libfuse 3.x (issue ticket) ● And many more coming. Join the Alluxio weekly community sync to create solutions together!
  • 28. Reference ● Using Alluxio to Optimize and Improve Performance of Kubernetes-Based Deep Learning in the Cloud (link) ● ALLUXIO POSIX API documentation (English or Chinese) ● Turn Cloud Storage or HDFS Into Your Local File System for Faster AI Model Training With TensorFlow (link) ● Fuse realization theory (Chinese TBT link)
  • 29. Questions? Welcome to join the Alluxio Community! www.alluxio.io/slack | @alluxio