Alluxio Monthly Webinar
Feb. 27, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Tarik Bennett (Senior Solutions Engineer, Alluxio)
As GenAI and AI continue to transform businesses, scaling these workloads requires optimized underlying infrastructure. A multi-cloud architecture allows organizations to leverage different cloud services to meet diverse workload demands while maximizing efficiency, reducing costs, and avoiding vendor lock-in. However, achieving a multi-cloud vision can be challenging.
In this webinar, Tarik will share how an agnostic data layer, like Alluxio, allows you to embrace the separation of storage from compute and simplify the adoption of multi-cloud for AI.
- Learn why leveraging multiple cloud providers is critical for balancing performance, scalability, and cost of your AI platform
- Discover how an agnostic data layer like Alluxio provides seamless data access in multi-cloud that bridges storage and compute without data replication
- Gain insights into real-world examples and best practices for deploying AI across on-prem, hybrid, and multi-cloud environments
5. Source: Gartner 2023
1. By 2028, the adoption of AI will culminate in over 50% of cloud compute resources… up from less than 10% in 2023.
2. Global spending on public cloud services is forecast to increase 20.4% in 2024… the source of growth will be a combination of cloud vendor price increases and increased utilization.
3. Deep learning models fed by images, internet-scale applications or even telemetry data have ever-growing data requirements.
AI Adoption is Ballooning Cloud Costs
6. ● Efficient distributed computing
● Workload scheduling
● Modernizing or reducing legacy storage
● Minimizing data movement
● Improving data access
● Increasing scalability
Efficiencies via Platform Improvements
7. Source: Gartner 2023
According to the survey, almost half (47%) of C-suite executives don’t feel prepared for the accelerating rate of technological change.
Further, only 27% claim their organizations are ready to scale up generative AI, and 44% say it will take more than six months to do so and take advantage of the potential benefits.
Scalability and Cloud Agility
8. Technical
● Improves scalability
● Enables hybrid cloud
● Expanded access to GPUs
● Best-of-breed AI tools available
Non-Technical
● Leverage in cloud negotiations
● Security and governance, privacy, etc.
● Service resilience
● Flexible access to the most cost-effective resources
Why Multi-Cloud?
9. Agility Comes with Some Overhead
● Data replication between DCs or regions
Multi-Cloud Challenges
Source: Alluxio
10. Agility Comes with Some Overhead
● Data replication between DCs or regions
● Disruptive, costly or prolonged migrations to upgrade
[Diagram labels: HDFS, Object Store]
Multi-Cloud Challenges
11. Agility Comes with Some Overhead
● Data replication between DCs or regions
● Disruptive, costly or prolonged migrations to upgrade
● Overlapping resources in cloud + on-prem
[Diagram labels: compute, compute, compute]
Multi-Cloud Challenges
12. Agility Comes with Some Overhead
● Data replication between DCs or regions
● Disruptive, costly or prolonged migrations to upgrade
● Overlapping resources in cloud + on-prem
● Need to address non-technical requirements within CSPs
Multi-Cloud Challenges
13. Given Multi-Cloud Benefits for AI, You Can Optimize
● Simplify wherever possible
● Reduce replication wherever possible
● Find cost efficiencies via caching or other means
● Increase data locality
● Unify data access
● Increase throughput of commodity storage
● Reduce bandwidth congestion
Best Practices
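As a rough illustration of the "Unify data access" practice above, the sketch below resolves paths in one logical namespace to different backing stores with a longest-prefix match. The MountTable class, the mount points, and the bucket and host names are all hypothetical and chosen only for this example; this is not Alluxio's API.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Hypothetical mount table: unified-namespace prefix -> backing store URI."""

    mounts: dict = field(default_factory=dict)

    def mount(self, prefix: str, backing_uri: str) -> None:
        # Normalize trailing slashes so "/datasets" and "/datasets/" are equivalent.
        self.mounts[prefix.rstrip("/") or "/"] = backing_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mounted prefix that covers the path.
        best = None
        for prefix in self.mounts:
            covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {path}")
        # Keep the part of the path below the mount point.
        suffix = path if best == "/" else path[len(best):]
        return self.mounts[best] + suffix


# Example usage with made-up (assumed) store locations.
table = MountTable()
table.mount("/datasets", "gs://example-training-data")
table.mount("/datasets/onprem", "hdfs://example-namenode:8020/warehouse")
print(table.resolve("/datasets/images/a.jpg"))
print(table.resolve("/datasets/onprem/t1/part-0"))
```

With a table like this, training code refers only to the logical path, so moving a dataset between clouds changes a mount entry instead of every job configuration.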
14. ● Multi-Cloud architecture
○ Google Cloud Platform (GCP)
○ Oracle® Cloud Infrastructure (OCI)
● Data orchestration and caching
Uber Multi-Cloud Architecture (Future)
Source: Uber Jing Zhao 2024
20. Some data cannot be persisted in the cloud. Security teams will often approve an ephemeral cache, while other options will be denied.
[Diagram labels: High Performance Data Access; sensitive model training data; data evicted from the cache]
Benefits of Caching for Sensitive Data
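The idea of an ephemeral cache can be sketched as a size-bounded, in-memory, read-through cache: data is fetched from the source on a miss, held only in memory, and evicted when space runs out. The EphemeralCache class, its least-recently-used eviction policy, and the fetch callback are assumptions made for illustration and do not describe how Alluxio implements caching.

```python
from collections import OrderedDict
from typing import Callable, List


class EphemeralCache:
    """In-memory read-through cache with LRU eviction; nothing is written to disk."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch = fetch              # hypothetical loader for the remote source
        self._capacity = capacity_bytes  # upper bound on cached bytes
        self._used = 0
        self._entries = OrderedDict()    # key -> bytes, oldest entry first

    def read(self, key: str) -> bytes:
        if key in self._entries:
            # Cache hit: mark as most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        data = self._fetch(key)
        # Objects larger than the whole cache are served but never cached.
        if len(data) <= self._capacity:
            self._entries[key] = data
            self._used += len(data)
            self._evict()
        return data

    def _evict(self) -> None:
        # Drop least recently used entries until the cache fits its budget.
        while self._used > self._capacity:
            _, old = self._entries.popitem(last=False)
            self._used -= len(old)

    def cached_keys(self) -> List[str]:
        return list(self._entries)

    def clear(self) -> None:
        # Discard everything, for example when a training job finishes.
        self._entries.clear()
        self._used = 0
```

Because the cache holds no durable copy, clearing it or stopping the process leaves the sensitive data only in its original location.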
21. Standalone Cluster
High Performance Data Access Layer
Data from multiple sources served to GPU nodes
Virtual Caching Across Local GPU Storage
Data source synced to Virtual Alluxio Storage and shared between GPU nodes
Alluxio Deployment Options for AI
23. BUSINESS BENEFIT: 2 - 4X faster time-to-market
TECH BENEFIT: Increase GPU utilization from 50% to 93%
[Diagram labels: File System; Training Data; Models; Model Training; Model Deployment; Model Inference; Model Update; Downstream Applications; Training Clouds, Offline Cloud, Online Cloud]
APAC Quora CASE STUDY: High Performance AI Platform for LLM
Before Alluxio: (1) Low GPU Utilization, (2) Overloaded Storage, (3) Network Congestion & Slow Model Refresh