Apache Horn is a neuron-centric programming model and execution framework, inspired by Google's DistBelief, that supports both data and model parallelism for training large models on massive datasets.
1. Apache Horn (Incubating)
a Large-scale Deep Learning Platform
Edward J. Yoon @eddieyoon
Oct 15, 2015 @ R3 Diva-Hall, Samsung Electronics
2. I am ..
● Member of Apache Software Foundation
● PMC member, committer, or mentor of
○ Apache Incubator,
○ Apache Hama, Apache Horn, Apache MRQL,
○ Apache Rya, and Apache BigTop.
● Cloud Tech Lab, Software R&D Center.
○ HPC Cloud (Network Analysis, ML & DNN)
3. What’s Apache Horn?
Horn [hɔ:n]: sounds like the Korean word 혼 (魂), meaning “mind” or “spirit”
● Horn is a clone project of Google’s DistBelief that supports
both data and model parallelism.
○ Apache Incubator Project (Since Sep 2015)
○ The 9 initial members are from Samsung Electronics, Microsoft, Cldi Inc.,
LINE Plus, TUM, KAIST, etc.
4. Google’s DistBelief
● GPUs are expensive, both to buy and to rent.
● Most GPUs can only hold a relatively small amount of data in
memory and CPU-to-GPU data transfer is very slow.
○ Therefore, the training speed-up is small when the model
does not fit in GPU memory.
● DistBelief is a framework for training deep neural networks
that avoids a GPU-only approach (for the above reasons) and
scales to problems with a large number of examples and
dimensions (e.g., high-resolution images).
5. Google’s DistBelief
● It supports both Data and Model Parallelism (see the toy
sketch after this list):
○ Data Parallelism: the training data is partitioned
across several machines, each having its own replica
of the model. Each replica trains on its partition of
the data in parallel.
○ Model Parallelism: the layers of each model
replica are distributed across machines.
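To make the distinction concrete, below is a minimal, self-contained toy sketch of the data-parallel idea in Java; it is not Horn's or DistBelief's API. Each "replica" computes the gradient of a small linear model on its own data partition, and the averaged gradient updates the shared weights; in DistBelief the replicas would run on separate machines against a parameter server.

import java.util.Arrays;

// Toy data parallelism: each "replica" computes a gradient on its own
// partition, and the shared weights are updated with the average.
public class DataParallelSketch {

  // Mean-squared-error gradient of a linear model w . x on one partition.
  static double[] gradient(double[] w, double[][] xs, double[] ys) {
    double[] g = new double[w.length];
    for (int i = 0; i < xs.length; i++) {
      double pred = 0;
      for (int j = 0; j < w.length; j++) pred += w[j] * xs[i][j];
      double err = pred - ys[i];
      for (int j = 0; j < w.length; j++) g[j] += err * xs[i][j] / xs.length;
    }
    return g;
  }

  public static void main(String[] args) {
    double[] w = new double[2];              // shared model parameters
    double[][][] parts = {                   // data split across 2 replicas
        {{1, 0}, {0, 1}}, {{1, 1}, {2, 1}}};
    double[][] labels = {{1, 2}, {3, 4}};    // targets (true w = [1, 2])

    for (int step = 0; step < 500; step++) {
      double[] avg = new double[w.length];
      // In DistBelief each replica would compute this in parallel remotely.
      for (int r = 0; r < parts.length; r++) {
        double[] g = gradient(w, parts[r], labels[r]);
        for (int j = 0; j < w.length; j++) avg[j] += g[j] / parts.length;
      }
      for (int j = 0; j < w.length; j++) w[j] -= 0.1 * avg[j];
    }
    System.out.println(Arrays.toString(w));  // approaches [1.0, 2.0]
  }
}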
7. What’s BSP?
● Bulk Synchronous Parallel
It was developed by Leslie Valiant of
Harvard University during the 1980s.
● Iteratively (see the Hama sketch below):
a. Local Computation
b. Communication (Message Passing)
c. Global Barrier Synchronization
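For concreteness, here is a minimal sketch of one superstep written against Apache Hama's BSP API; the broadcast-then-sum message flow is just an illustrative choice, and the generic parameters are the input key/value, output key/value, and message types.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hama.bsp.BSP;
import org.apache.hama.bsp.BSPPeer;
import org.apache.hama.bsp.sync.SyncException;

// One BSP superstep: local computation, message passing, global barrier.
public class SuperstepSketch extends
    BSP<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> {

  @Override
  public void bsp(
      BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> peer)
      throws IOException, SyncException, InterruptedException {

    // a. Local computation: here, just a dummy per-task value.
    double localValue = Math.random();

    // b. Communication: send the local result to every peer.
    for (String peerName : peer.getAllPeerNames()) {
      peer.send(peerName, new DoubleWritable(localValue));
    }

    // c. Global barrier synchronization: no task proceeds until all
    //    tasks reach this point and all messages are delivered.
    peer.sync();

    // Messages sent before the barrier are now available locally.
    double sum = 0;
    DoubleWritable msg;
    while ((msg = peer.getCurrentMessage()) != null) {
      sum += msg.get();
    }
  }
}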
8. DistBelief: Batch Optimization
The Coordinator 1) finds stragglers
(slow tasks) for better load
balancing and resource usage,
similar to Google MapReduce’s
“Backup Tasks”, and 2) reduces
communication overhead between
the central Parameter Server and
the workers, much like Aggregators.
9. As a result:
● A CPU cluster can train deep networks significantly faster
than a GPU, without a limit on the maximum model size.
○ The CPU cluster is 10x faster than a GPU.
● Trained a model with over 1 billion parameters and
achieved better than state-of-the-art performance on the
ImageNet challenge.
Nov 2012: IBM simulates 530 billion neurons, 100 trillion synapses
* 1,572,864 processor cores, 1.5 PB memory, and 6,291,456 threads.
10. Wait, .. Why do we need this?
● Deep learning is likely to spur other applications beyond
speech and image recognition in the near term.
○ e.g., medicine, manufacturing, and transportation.
11. and, it’s a Closed Source Software
● We need to solve the size problems (the size of the training
set and the size of the neural network), but many OSS
projects such as Caffe, DeepDist, Spark MLlib,
Deeplearning4j, and NeuralGiraph are data-parallel or
model-parallel only.
● So we started to clone Google’s DistBelief, calling it
Apache Horn (Incubating).
12. The key idea of implementation
● .. is to use existing OSS distributed systems:
○ Apache Hadoop: Distributed File System and Resource
Manager.
○ Apache Hama: a general-purpose BSP computing
engine on top of Hadoop, which can be used for
both data-parallel and graph-parallel workloads in a
flexible way.
13. Apache Hama: BSP framework
[Diagram: Task 1 … Task N running on the BSP framework (on Hama or
YARN) over Hadoop HDFS]
Like MapReduce, the Apache Hama
BSP framework schedules tasks
according to the distance between
a task’s input data and the
requesting nodes.
BSP tasks are globally
synchronized after performing
computations on local data and
communication actions.
14. Global Regional Synchronization
[Diagram: Tasks 1–6 split into two groups on the BSP framework (on
Hama or YARN) over Hadoop HDFS]
All tasks within the same group are
synchronized with each other. Each
group works asynchronously as an
independent BSP job.
15. Async mini-batches using Regional Synchronization
[Diagram: two task groups exchanging parameters with the Parameter
Servers via Parameter Swapping, over Hadoop HDFS]
Each group performs a mini-batch in
the BSP paradigm and interacts with
the Parameter Server asynchronously.
16. Async mini-batches using Regional Synchronization (cont.)
[Diagram: the same setup, with one of the groups working as a
Coordinator]
One of the groups works as a
Coordinator; each group performs a
mini-batch in the BSP paradigm and
interacts with the Parameter Server
asynchronously, as sketched below.
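The per-group loop described on these slides might look roughly like the following; this is purely an illustrative sketch, and ParameterServerClient with its pull/push methods is a hypothetical name rather than Horn's actual API.

// Illustrative only: ParameterServerClient and its methods are
// hypothetical, not part of Apache Horn.
public class GroupWorkerLoop {

  // Hypothetical client for the Parameter Server.
  interface ParameterServerClient {
    double[] pullLatestParameters();       // fetch current global weights
    void pushGradient(double[] gradient);  // async update, no global barrier
  }

  void run(ParameterServerClient ps, int numMiniBatches) {
    for (int batch = 0; batch < numMiniBatches; batch++) {
      double[] weights = ps.pullLatestParameters();

      // 1. Local computation: forward/backward pass on this task's
      //    slice of the mini-batch (model parallelism inside the group).
      double[] localGradient = computeLocalGradient(weights);

      // 2. Regional synchronization: barrier only within this group
      //    (e.g., peer.sync() in Hama); other groups are not involved.
      double[] groupGradient = aggregateWithinGroup(localGradient);

      // 3. Asynchronous parameter swap: push the group's gradient
      //    without waiting for any other group.
      ps.pushGradient(groupGradient);
    }
  }

  // Stubs standing in for the real training computation.
  private double[] computeLocalGradient(double[] w) { return new double[w.length]; }
  private double[] aggregateWithinGroup(double[] g) { return g; }
}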
17. Neuron-centric Programming APIs
User-defined neuron-centric
programming APIs:
the activation and cost
functions compute the
propagated information or
error messages and send
their updates to the Parameter
Server (but the API is not fully
designed yet).
Similar to Google’s Pregel.
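Since the API was not fully designed at the time of the talk, the following is only a hypothetical sketch of what a Pregel-style, neuron-centric interface could look like; Neuron, Message, and every method name here are illustrative, not Horn's final API.

// Hypothetical neuron-centric API: each neuron reacts to messages
// arriving over its synapses, like a vertex in Pregel.
public abstract class Neuron {

  // Weighted input from one incoming synapse.
  public static class Message {
    public final double input;
    public final double weight;
    public Message(double input, double weight) {
      this.input = input;
      this.weight = weight;
    }
  }

  // Forward pass: called once per superstep with all incoming messages.
  public abstract void forward(Iterable<Message> messages);

  // Backward pass: receives error messages and updates local weights.
  public abstract void backward(Iterable<Message> deltas);

  // Framework hooks (stubs here) for sending output downstream and
  // reporting weight updates to the Parameter Server.
  protected void propagate(double output) { /* send to next layer */ }
  protected void push(double weightUpdate) { /* send to Parameter Server */ }
}

// Example: a sigmoid neuron under the sketched API.
class SigmoidNeuron extends Neuron {
  @Override
  public void forward(Iterable<Message> messages) {
    double sum = 0;
    for (Message m : messages) sum += m.input * m.weight;
    propagate(1.0 / (1.0 + Math.exp(-sum)));
  }

  @Override
  public void backward(Iterable<Message> deltas) { /* ... */ }
}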
18. Job Configuration APIs
/*
 * Sigmoid Activation Function
 */
public static class Sigmoid extends ActivationFunction {
  public double apply(double input) {
    return 1.0 / (1.0 + Math.exp(-input));
  }
}
...
public static void main(String[] args) {
  ANNJob ann = new ANNJob();
  // Initialize the topology of the model: each layer is given its
  // size, its activation function, and the number of tasks it is
  // partitioned across.
  ann.addLayer(featureDimension, Sigmoid.class, numOfTasks);
  ann.addLayer(featureDimension, Step.class, numOfTasks);
  ann.addLayer(featureDimension, Tanh.class, numOfTasks);
  …
  ann.setCostFunction(CrossEntropy.class);
  ..
}
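Since each addLayer call takes the layer's size, its activation function, and a task count, the degree of parallelism is presumably configurable per layer.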
19. Job Submission Flow
[Diagram: the user’s ANN Job is submitted through the Apache Horn
Client and Web UI to the BSP framework on Apache Hama or YARN
clusters, over Hadoop HDFS; Tasks 1–9 form worker groups arranged
along data-parallel and model-parallel axes, one worker group works
as a Coordinator, and the groups exchange parameters with the
Parameter Servers via Parameter Swapping]