Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark

We took TensorFlow, Google’s recently open-sourced deep learning library, which runs on a single machine, and built a distributed implementation on Spark, using an abstraction layer called “Distributed DataFrame” (DDF). TensorFlow started as an internal research project at Google and is now used across the organization. Google open-sourced the project in Nov. 2015 but kept the distributed version closed. Other deep learning frameworks have been implemented on Spark, but TensorFlow’s API is elegant and powerful and has proven to work at Google scale. Putting TensorFlow on Spark makes it easier for companies to apply deep learning in their own businesses without sacrificing scalability.

KEY TAKE-AWAYS: (1) Google’s TensorFlow release is a single-machine implementation, (2) We have built a distributed implementation in Spark which allows TensorFlow to scale horizontally.
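
As a rough illustration of the data-parallel pattern the slides outline (workers compute gradients, a parameter server updates the parameters, and "obsolete" gradients are dropped), here is a minimal single-process sketch in plain Python. It is not the authors' Spark or TensorFlow code: the `ParameterServer` class, the `max_staleness` threshold, and the toy linear model are all assumptions made for illustration.

```python
import random


class ParameterServer:
    """Hypothetical in-memory parameter server (no Spark, no TensorFlow)."""

    def __init__(self, n_params, learning_rate=0.05, max_staleness=2):
        self.params = [0.0] * n_params
        self.version = 0                    # incremented on every accepted update
        self.learning_rate = learning_rate
        self.max_staleness = max_staleness  # assumed threshold for "obsolete"
        self.dropped = 0

    def pull(self):
        """Return a copy of the parameters and the version they belong to."""
        return list(self.params), self.version

    def push(self, grad, based_on_version):
        """Apply a gradient unless it was computed on parameters that are too old."""
        if self.version - based_on_version > self.max_staleness:
            self.dropped += 1
            return False
        self.params = [p - self.learning_rate * g
                       for p, g in zip(self.params, grad)]
        self.version += 1
        return True


def gradient(params, batch):
    """Mean-squared-error gradient for the toy model y = w * x + b."""
    w, b = params
    grad_w = grad_b = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        grad_w += 2.0 * err * x
        grad_b += 2.0 * err
    return [grad_w / len(batch), grad_b / len(batch)]


def train(n_workers=4, rounds=300, batch_size=8, seed=0):
    rng = random.Random(seed)
    # Noise-free toy data generated from y = 3x + 1.
    data = [(x, 3.0 * x + 1.0)
            for x in (rng.uniform(-1.0, 1.0) for _ in range(400))]
    shards = [data[i::n_workers] for i in range(n_workers)]  # one shard per worker
    server = ParameterServer(n_params=2)
    for _ in range(rounds):
        # All workers pull the same snapshot first, then push one at a time,
        # so each later push in the round is one version staler than the last.
        snapshots = [server.pull() for _ in shards]
        for shard, (params, version) in zip(shards, snapshots):
            batch = rng.sample(shard, batch_size)
            server.push(gradient(params, batch), version)
    return server


if __name__ == "__main__":
    server = train()
    print("params:", server.params, "dropped:", server.dropped)
```

With four workers and `max_staleness=2`, the fourth push in every round is three versions behind and is dropped. The fixed push order is a simplification; in a real cluster, staleness would vary with network and compute timing.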

This talk was presented at Spark Summit East 2016.

  1. DISTRIBUTED TENSORFLOW: SCALING GOOGLE'S DEEP LEARNING LIBRARY ON SPARK. Christopher Nguyen, PhD. | Chris Smith | Ushnish De | Vu Pham | Nanda Kishore. @pentagoniac @arimoinc #SparkSummit #DDF
  2. Feb 2016: Distributed TensorFlow on Spark (Spark Summit East) | June 2016: TensorFlow Ensembles Spark (Hadoop Summit) | Nov 2015: Distributed DL on Spark + Tachyon (Strata NYC)
  3. Outline: 1. Motivation 2. Architecture 3. Experimental Parameters 4. Results 5. Lessons Learned 6. Summary
  4. Motivation
  5. 1. Google TensorFlow 2. Spark Deployments
  6. TensorFlow Review
  7. Computational Graph
  8. Graph of a Neural Network
  9. Why TensorFlow? Google uses TensorFlow for the following: ★ Search ★ Advertising ★ Speech Recognition ★ Photos ★ Maps ★ StreetView: blurring out house numbers ★ Translate ★ YouTube
  10. DistBelief: 1. Compute Gradients 2. Update Params (Large Scale Distributed Deep Networks, Jeff Dean et al, NIPS 2012)
  11. Arimo’s 3 Pillars | App Development | BIG APPS | PREDICTIVE ANALYTICS | NATURAL INTERFACES | COLLABORATION | Big Data + Big Compute
  12. Architecture
  13. TensorSpark Architecture
  14. Spark & Tachyon Architectural Options ★ Spark-Only: Model as Broadcast Variable ★ Tachyon-Storage: Model Stored as Tachyon File ★ Param-Server: Model Hosted in HTTP Server ★ Tachyon-CoProc: Model Stored and Updated by Tachyon
  15. Experimental Parameters
  16. Datasets
      Data Set  | Size               | Parameters | Hidden Layers | Hidden Units
      MNIST     | 50K x 784 = 39.2M  | 1.8M       | 2             | 1024
      Higgs     | 10M x 28 = 280M    | 8.5M       | 3             | 2048
      Molecular | 150K x 2871 = 430M | 14.2M      | 3             | 2048
  17. Experimental Dimensions: 1. Dataset Size 2. Model Size 3. Compute Size 4. CPU vs. GPU
  18. Results
  21. [Result charts: MNIST, Higgs, Molecular]
  24. Lessons Learned
  25. GPU vs CPU ★ GPU is 8–10x faster on local; lower gains with Spark, but better for larger data sets ★ On local: GPU 8–10x faster ★ On distributed: GPU 2–3x faster, more with larger datasets ★ GPU memory is limited; performance suffers if the limit is hit ★ AWS has old GPUs; need to build TF from source
  26. Distributed Gradients ★ Initial warmup phase on param server needed ★ “Wobble” effect may still happen with convergence due to mini-batches ✦ Drop “obsolete” gradients ✦ Reduce learning rate and initial parameters ✦ Reduce mini-batch size ★ Work to maximize network throughput, e.g., model compression
  27. Deployment Configuration ★ Should run only 1 TF instance per machine ★ Compress gradients/parameters (e.g., as binaries) to reduce network overhead ★ Batch-size tradeoff: convergence vs. communication overhead
  28. Conclusions
  29. Conclusions ★ This implementation best for large datasets & small models ✦ For large models, we’re looking into a) model parallelism b) model compression ★ If data fits on single machine, single-machine GPU would be best (if you have it) ★ For large datasets, leverage both speed and scale using Spark cluster with GPUs ★ Future Work: a) Tachyon b) Ensembles (Hadoop Summit - June 2016)
  30. Top 10 World’s Most Innovative in Data Science | VISIT US BOOTH K8 | We’re OPEN SOURCING our work at
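
The "Size" and "Parameters" columns on slide 16 can be roughly reproduced with a little arithmetic. The slides do not state the layer type or the output sizes, so the fully connected layers and the `n_outputs` values below (10 for MNIST's digit classes, 1 for the other two) are assumptions, and `dense_param_count` is a hypothetical helper rather than anything from the talk.

```python
def dense_param_count(n_inputs, hidden_layers, hidden_units, n_outputs):
    """Weights plus biases of a fully connected network (assumed architecture)."""
    sizes = [n_inputs] + [hidden_units] * hidden_layers + [n_outputs]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Rows, features, hidden layers and hidden units are taken from slide 16.
# n_outputs is NOT on the slide; the values here are assumptions.
DATASETS = {
    "MNIST":     dict(rows=50_000,     features=784,  hidden_layers=2,
                      hidden_units=1024, n_outputs=10),
    "Higgs":     dict(rows=10_000_000, features=28,   hidden_layers=3,
                      hidden_units=2048, n_outputs=1),
    "Molecular": dict(rows=150_000,    features=2871, hidden_layers=3,
                      hidden_units=2048, n_outputs=1),
}


def summarize():
    """Return {name: (dataset cells, estimated parameter count)}."""
    summary = {}
    for name, d in DATASETS.items():
        cells = d["rows"] * d["features"]
        params = dense_param_count(d["features"], d["hidden_layers"],
                                   d["hidden_units"], d["n_outputs"])
        summary[name] = (cells, params)
    return summary


if __name__ == "__main__":
    for name, (cells, params) in summarize().items():
        print(f"{name}: {cells / 1e6:.1f}M cells, {params / 1e6:.2f}M parameters")
```

Under these assumptions the estimates land within a few percent of the slide's 1.8M, 8.5M, and 14.2M figures.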
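
Slide 27's suggestion to send gradients as binaries rather than text can be sketched with the Python standard library. This is an illustration under stated assumptions, not the wire format used in the talk: the `pack_gradients` and `unpack_gradients` helpers, the float32 encoding, and the zlib step are all choices made here.

```python
import json
import random
import struct
import zlib


def pack_gradients(grads):
    """Encode floats as little-endian float32 and deflate the bytes.

    float32 is lossy for Python floats (which are float64); this sketch
    assumes that precision is acceptable for gradients.
    """
    raw = struct.pack(f"<{len(grads)}f", *grads)
    return zlib.compress(raw)


def unpack_gradients(blob):
    """Inverse of pack_gradients: inflate, then decode float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def size_report(n=10_000, seed=0):
    """Compare a JSON text payload with the packed binary payload."""
    rng = random.Random(seed)
    grads = [rng.gauss(0.0, 0.01) for _ in range(n)]  # synthetic gradients
    as_text = json.dumps(grads).encode("utf-8")
    as_binary = pack_gradients(grads)
    return len(as_text), len(as_binary), grads


if __name__ == "__main__":
    text_len, binary_len, _ = size_report()
    print(f"text: {text_len} bytes, binary: {binary_len} bytes")
```

Most of the saving in this sketch comes from the fixed 4-byte encoding rather than from zlib, since near-random float bytes compress poorly.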