Ray (https://github.com/ray-project/ray) is a framework developed at UC Berkeley and maintained by Anyscale for building distributed AI applications. Over the last year, the broader machine learning ecosystem has been rapidly adopting Ray as the primary framework for distributed execution. In this talk, we give an overview of how libraries such as Horovod (https://horovod.ai/), XGBoost, and Hugging Face Transformers have integrated with Ray. We then showcase how Uber leverages Ray and these ecosystem integrations to simplify critical production workloads. This is a joint talk between Anyscale and Uber.
33. Ecosystem - Horovod
- Fast and easy distributed training for any framework
- Run Horovod on Ray
- Any cloud provider or k8s with Ray cluster launcher
- Hyperparameter search integration for Horovod
- Benefits of ecosystem (data processing, serving)
- Integration released in Horovod 0.20
- ~400 lines of code
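A minimal sketch of running Horovod on Ray, assuming the `RayExecutor` interface described in the Horovod documentation. The `num_workers` argument follows later Horovod releases (the 0.20 release may expect different sizing arguments), and the names `train_fn` and `run_on_ray` are hypothetical, chosen only for illustration.

```python
def train_fn():
    """Runs on every Horovod worker and returns that worker's rank."""
    # Imported inside the function so the import resolves on the remote worker.
    import horovod.torch as hvd

    hvd.init()
    return hvd.rank()


def run_on_ray(num_workers=2, use_gpu=False):
    """Launch train_fn on `num_workers` Ray actors and collect the ranks."""
    # Imported lazily so this module loads even where Ray/Horovod are absent.
    import ray
    from horovod.ray import RayExecutor

    # Starts a local Ray instance; pass address="auto" to join a running cluster.
    ray.init(ignore_reinit_error=True)

    settings = RayExecutor.create_settings(timeout_s=30)
    executor = RayExecutor(settings, num_workers=num_workers, use_gpu=use_gpu)
    executor.start()
    try:
        # Executes train_fn on all workers at once; one result per worker.
        return executor.run(train_fn)
    finally:
        executor.shutdown()


if __name__ == "__main__":
    print(sorted(run_on_ray()))
```

The training function itself contains ordinary Horovod code; only the launcher is Ray-specific, which is consistent with the small size of the integration.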
34. Ecosystem - Ludwig
- Code-free deep learning (Auto ML)
- Given inputs and outputs, Ludwig builds the right model for any task
36. Ludwig: scalability challenges
- Single worker for preprocessing
- Whole dataset must fit in-memory (Pandas)
- Hyperparameter Optimization
  - Optimize over preprocessing (feature engineering)
  - Optimize over model params
  - Optimize over model architecture (encoders / decoders)
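The three optimization axes can be sketched as one nested search space. This is a plain-Python random search, not Ludwig's or Ray Tune's API; every parameter name and the `toy_objective` function are hypothetical stand-ins for a real training run.

```python
import random

# Hypothetical search space covering the three axes from the slide.
SEARCH_SPACE = {
    "preprocessing": {
        "lowercase_text": [True, False],
        "numeric_normalization": ["zscore", "minmax"],
    },
    "model": {
        "learning_rate": [1e-4, 1e-3, 1e-2],
        "dropout": [0.0, 0.1, 0.3],
    },
    "architecture": {
        "text_encoder": ["rnn", "cnn", "transformer"],
    },
}


def sample_config(space, rng):
    """Draw one value per parameter, keeping the group structure."""
    return {
        group: {name: rng.choice(options) for name, options in params.items()}
        for group, params in space.items()
    }


def toy_objective(config):
    """Stand-in for a validation loss; a real run would train a model."""
    loss = config["model"]["learning_rate"] * 10 + config["model"]["dropout"]
    if config["architecture"]["text_encoder"] == "transformer":
        loss -= 0.05
    return loss


def random_search(space, objective, num_trials, seed=0):
    """Sample num_trials configs and return the one with the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_config(space, rng) for _ in range(num_trials)]
    return min(trials, key=objective)


if __name__ == "__main__":
    print(random_search(SEARCH_SPACE, toy_objective, num_trials=20))
```

Because every trial may change the preprocessing as well as the model, each trial potentially reruns the whole pipeline, which is why single-worker preprocessing becomes a bottleneck.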
38. Challenges with ML workflows
- Rewrite major sections of the code:
  - Pandas -> Spark transformers
  - Maintain two distinct code paths
- Each step heavyweight, allocates heterogeneous infra
  - Airflow
- But what about hyperparameter optimization?
  - Dynamic process
  - Difficult to model using static workflow definitions
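To illustrate why hyperparameter optimization is a dynamic process, here is a minimal successive-halving sketch in plain Python. It is one common early-stopping scheme, used here only as an example and not necessarily the method used in the talk; the function name and the `evaluate(config, budget)` signature are assumptions.

```python
def successive_halving(configs, evaluate, rounds=3):
    """Keep the better half of the configs after each round.

    evaluate(config, budget) returns a loss; lower is better.
    Returns the best config and the (budget, trials_run) pair per round.
    """
    survivors = list(configs)
    budget = 1
    schedule = []
    for _ in range(rounds):
        # Which trials continue is decided from intermediate results,
        # so the set of work items is not known before the run starts.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        schedule.append((budget, len(survivors)))
        survivors = scored[: max(1, len(scored) // 2)]
        budget *= 2  # surviving trials get more training budget
        if len(survivors) == 1:
            break
    return survivors[0], schedule


if __name__ == "__main__":
    # Toy objective: configs are integers, and the optimum is 3.
    best, schedule = successive_halving(
        [5, 1, 9, 3], lambda c, budget: abs(c - 3) / budget
    )
    print(best, schedule)
```

A static workflow definition would have to fix in advance which trials run at each stage, whereas here that choice depends on losses observed during the run.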
39. Examining the Ray ecosystem
Dask
- Near drop-in replacement for Pandas (Dask DataFrame mirrors much of the Pandas API)
- Pure-Python data processing (low overhead, easy debugging)
- GPU acceleration with RAPIDS / cuDF
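A small sketch of how Pandas code carries over to Dask, assuming `dask` and `pandas` are installed. The column names `key` and `value` and both function names are hypothetical; only `dd.from_pandas`, `groupby`, `mean`, and `compute` are library calls.

```python
def mean_by_key_pandas(pdf):
    """Group-by mean on an in-memory Pandas DataFrame."""
    return pdf.groupby("key")["value"].mean()


def mean_by_key_dask(pdf, npartitions=2):
    """Same computation with Dask: the frame is split into partitions."""
    # Imported lazily so this module loads even where Dask is absent.
    import dask.dataframe as dd

    ddf = dd.from_pandas(pdf, npartitions=npartitions)
    # Operations build a lazy task graph; compute() executes it and
    # returns a regular Pandas object.
    return ddf.groupby("key")["value"].mean().compute()


if __name__ == "__main__":
    import pandas as pd

    pdf = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]})
    print(mean_by_key_pandas(pdf))
    print(mean_by_key_dask(pdf))
```

The group-by expression is identical in both functions; the differences are the partitioned frame and the explicit `compute()` call.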
Horovod
- Framework-agnostic distributed training (TensorFlow, PyTorch, MXNet)
- Supports fault tolerance and auto-scaling
- Flexible: no restrictions on the structure of the training code
Ray
- Brings everything together as a single infra layer
- Provides scalable hyperparameter optimization and serving natively