As machine learning matures, the standard supervised learning setup is no longer sufficient. Instead of making and serving a single prediction as a function of a data point, machine learning applications increasingly must operate in dynamic environments, react to changes in those environments, and take sequences of actions to accomplish a goal. These modern applications are better framed within the context of reinforcement learning (RL), which deals with learning to operate within an environment. RL-based applications have already led to remarkable results, such as Google's AlphaGo beating the Go world champion, and are finding their way into self-driving cars, UAVs, and surgical robotics.
These applications have very demanding computational requirements: at the high end, they may need to execute millions of tasks per second with millisecond-level latencies, and they must support heterogeneous and dynamic computation graphs. In this talk, we present Ray, a new cluster computing framework that meets these requirements, give some application examples, and discuss how Ray can be integrated with Apache Spark.
19. RL Application Pattern
• Process inputs from different sensors in parallel and in real time
• Execute large numbers of simulations, e.g., up to 100s of millions
• Rollout outcomes are used to update the policy (e.g., via SGD)
• Policies are often implemented by DNNs
• Most RL algorithms are developed in Python
[Diagram: agent/environment loop in which the policy maps observations to actions and the environment returns observations and rewards]
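The loop these bullets describe (a policy maps observations to actions, the environment returns observations and rewards, and each rollout is scored) can be sketched with a toy environment. `ToyEnv`, `policy`, and `rollout` are illustrative names for this sketch, not Ray or talk APIs.

```python
# Minimal sketch of the agent/environment rollout loop described above.
# ToyEnv and the trivial policy are made-up illustrations.

class ToyEnv:
    """1-D corridor: state counts down toward 0; reward 1.0 on reaching it."""
    def __init__(self, start=5):
        self.state = start

    def step(self, action):
        # action: -1 to move toward the goal, +1 to move away from it
        self.state += action
        done = self.state <= 0
        reward = 1.0 if done else 0.0
        return self.state, reward, done  # observation, reward, done

def policy(observation):
    # Trivial deterministic policy: always move toward 0.
    return -1

def rollout(env, max_steps=100):
    """Run one episode and return the total reward (the rollout's score)."""
    total = 0.0
    obs = env.state
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

In the pattern above, many such rollouts run in parallel and their scores feed a policy update.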
20. RL Application Requirements
• Handle dynamic task graphs whose tasks have heterogeneous durations and heterogeneous computations
• Schedule millions of tasks per second
• Make it easy to parallelize ML algorithms written in Python
21. Ray API - remote functions

import numpy as np

def zeros(shape):
    return np.zeros(shape)

def dot(a, b):
    return np.dot(a, b)
22. Ray API - remote functions

import numpy as np
import ray

@ray.remote
def zeros(shape):
    return np.zeros(shape)

@ray.remote
def dot(a, b):
    return np.dot(a, b)
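In Ray, calling a remote function (e.g., `zeros.remote(shape)`) immediately returns a future-like object ID, and `ray.get` retrieves the result. For readers without a Ray install, here is a rough single-machine analogy using only the standard library's `concurrent.futures`; the pure-Python `zeros` and `dot` stand in for the NumPy versions, and the semantics differ from Ray's (no distributed shared object store, and futures must be resolved manually before being passed on).

```python
# Rough single-machine analogy for the remote-function pattern above.
# submit() plays the role of .remote() (returns a future immediately)
# and result() plays the role of ray.get().
from concurrent.futures import ThreadPoolExecutor

def zeros(shape):
    # Pure-Python stand-in for np.zeros: a nested list of zeros.
    rows, cols = shape
    return [[0.0] * cols for _ in range(rows)]

def dot(a, b):
    # Naive matrix multiply, stand-in for np.dot.
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)] for row in a]

def demo():
    with ThreadPoolExecutor() as pool:
        a = pool.submit(zeros, (2, 3))   # like zeros.remote((2, 3))
        b = pool.submit(zeros, (3, 2))
        # Resolve the futures before chaining; Ray instead lets you pass
        # object IDs directly and resolves them in the object store.
        c = pool.submit(dot, a.result(), b.result())
        return c.result()                # like ray.get(c_id)
```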
37. Ray architecture
[Architecture diagram, built up over slides 37-41: each node runs a driver or workers alongside a shared-memory Object Store and a Local Scheduler; tasks that cannot be scheduled locally are handed to a replicated Global Scheduler; a sharded Global Control Store holds the system's control state, with the debugging tools, profiling tools, and web UI layered on top of it.]
52. Pseudocode

class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation and return reward
        ...

workers = [Worker() for i in range(20)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    rewards = [workers[j].do_simulation(policy, seeds[j])
               for j in range(20)]
    policy = compute_update(policy, rewards, seeds)
57. Pseudocode

@ray.remote
class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation and return reward
        ...

workers = [Worker.remote() for i in range(20)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    rewards = [workers[j].do_simulation.remote(policy, seeds[j])
               for j in range(20)]
    policy = compute_update(policy, ray.get(rewards), seeds)
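The actor pattern in this pseudocode (stateful workers whose method calls return futures that are gathered with `ray.get`) can be approximated on one machine with a thread pool. This is only a sketch: the toy "simulation" below is made up, `submit()` stands in for `.remote()`, and `result()` stands in for `ray.get`.

```python
# Single-machine analogy for the actor-based training loop above:
# each Worker keeps its own state, and method calls are submitted
# as futures and gathered before the policy update.
from concurrent.futures import ThreadPoolExecutor
import random

class Worker:
    def do_simulation(self, policy, seed):
        # Toy "simulation": a seeded random perturbation of the policy value.
        rng = random.Random(seed)
        return policy + rng.uniform(-1.0, 1.0)

def run(num_workers=4, num_iters=3):
    workers = [Worker() for _ in range(num_workers)]
    policy = 0.0
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for i in range(num_iters):
            seeds = [i * num_workers + j for j in range(num_workers)]
            futures = [pool.submit(workers[j].do_simulation, policy, seeds[j])
                       for j in range(num_workers)]
            rewards = [f.result() for f in futures]  # like ray.get(rewards)
            policy = sum(rewards) / len(rewards)     # stand-in for compute_update
    return policy
```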
58. Evolution strategies on Ray

Simulator steps per second:

             10 nodes   20 nodes   30 nodes   40 nodes   50 nodes
  Reference     97K       215K       202K       N/A        N/A
  Ray          152K       285K       323K       476K       571K

The Ray implementation takes half the amount of code and was implemented in a couple of hours.
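For context on what the benchmark runs, the core evolution-strategies update perturbs the policy parameters with Gaussian noise, scores each perturbation with a rollout, and moves the parameters along the reward-weighted noise directions. The sketch below is a generic ES step on a toy quadratic objective, with illustrative hyperparameters; it is not the talk's implementation.

```python
# Sketch of the evolution-strategies update benchmarked above:
# sample noise, score perturbed parameters, step along reward-weighted noise.
import numpy as np

def es_step(theta, objective, npop=50, sigma=0.1, alpha=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((npop, theta.size))  # one perturbation per rollout
    rewards = np.array([objective(theta + sigma * n) for n in noise])
    # Normalize rewards, then step along the reward-weighted noise directions.
    advantage = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return theta + alpha / (npop * sigma) * noise.T @ advantage

# Toy objective, maximized at theta = [3, -2]; stands in for a rollout's reward.
target = np.array([3.0, -2.0])
objective = lambda th: -np.sum((th - target) ** 2)

theta = np.zeros(2)
rng = np.random.default_rng(42)
for _ in range(300):
    theta = es_step(theta, objective, rng=rng)
```

In the distributed setting, the `objective` evaluations are exactly the simulations that the actors above execute in parallel.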
60. Ray + Apache Spark
● Complementary
○ Spark handles data processing and "classic" ML algorithms
○ Ray handles emerging AI algorithms, e.g., reinforcement learning (RL)
● Interoperability through an object store based on Apache Arrow
○ Common data layout
○ Supports multiple languages
61. Ray is a system for AI applications
● Ray is open source! https://github.com/ray-project/ray
● We have a v0.1 release: pip install ray
● We’d love your feedback

Philipp, Ion, Alexey, Stephanie, Johann, Richard, William, Mehrdad, Mike, Robert