Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark

•

34 likes•27,599 views

We took Google’s recently open sourced deep learning library called “TensorFlow”, which is single-machine, and built a distributed implementation on Spark. We did this by using the abstraction layer called “Distributed DataFrame”, or DDF. TensorFlow started as an internal research project at Google and has since spread to be used across the organization. Google open sourced the project in Nov. 2015, but kept the distributed version of it closed. There are other deep learning frameworks that have been implemented on Spark, but TensorFlow’s API is elegant and powerful and has proven to work at Google scale. Putting TensorFlow on Spark makes it easier for companies to harness the power of deep learning in their own businesses without sacrificing scalability. KEY TAKE-AWAYS: (1) Google’s TensorFlow release is a single-machine implementation, (2) We have built a distributed implementation in Spark which allows TensorFlow to scale horizontally. This talk was presented at the Spark Summit East 2016; watch it here: https://www.youtube.com/watch?v=-QtcP3yRqyM&index=5&list=PLMpF-Q2WyoGudT3z-VPGs-6nnv45cod_R

Technology

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Christopher Nguyen, PhD. | Chris Smith | Ushnish De | Vu Pham | Nanda Kishore
DISTRIBUTED TENSORFLOW:
SCALING GOOGLE'S
DEEP LEARNING LIBRARY ON SPARK

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Feb 2016
Distributed
TensorFlow
on Spark
Spark Summit East
June 2016
TensorFlow
Ensembles
Spark
Hadoop Summit
Nov 2015
Distributed
DL on Spark
+ Tachyon
Strata NYC

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
1. Motivation
2. Architecture
3. Experimental Parameters
4. Results
5. Lesson Learned
6. Summary
Outline

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Motivation

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Google TensorFlow
1
Spark Deployments
2

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
TensorFlow Review

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Computational Graph

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Graph of a Neural Network

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Google uses TensorFlow for the following:
★ Search
★ Advertising
★ Speech Recognition
★ Photos
★ Maps
★ StreetView—Blurring out house numbers
★ Translate
★ YouTube
Why TensorFlow?

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
DistBelief
Large Scale Distributed Deep Networks, Jeff Dean et al, NIPS 2012
1. Compute
Gradients
2. Update Params  
(Descent)

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Arimo’s 3 Pillars
App Development
BIG APPS
PREDICTIVE
ANALYTICS
NATURAL
INTERFACES
COLLABO-
RATION
Big Data + Big Compute

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Architecture

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
TensorSpark Architecture

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Spark & Tachyon Architectural Options
Model as Broadcast
Variable
Spark-Only
Model Stored
as
Tachyon File
Tachyon-
Storage
Model Hosted
in
HTTP Server
Param-
Server
Model Stored
and Updated
by Tachyon
Tachyon-
CoProc

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Experimental
Parameters

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Data Set Size Parameters Hidden  
Layers
Hidden 
Units
MNIST 50K x 784
=39.2M
1.8M 2 1024
Higgs
10M x 28
=280M 8.5M 3 2048
Molecular
150K x 2871
=430M 14.2M 3 2048
Datasets

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Experimental Dimensions
Dataset Size1
Model Size2
Compute Size3
CPU vs. GPU4

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Results

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
MNIST
Higgs
Molecular

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Lessons Learned

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
GPU vs CPU
★ GPU is 8-10x faster on local; lower gains with Spark but better for larger
data sets
★ On local: GPU 8–10x faster
★ On distributed: GPU 2–3x faster, more with larger datasets
★ GPU memory is limited > Performance impact if hit
★ AWS has old GPUs, need to build TF from source

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Distributed Gradients
★ Initial warmup phase on param server needed
★ “Wobble” effect may still happen with convergence due to mini-batches
✦ Drop “obsolete” gradients
✦ Reduce learning rate and initial parameters
✦ Reduce mini-batch size
★ Work to maximize network throughput, e.g., model compression

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Deployment Configuration
★ Should run only 1 TF instance per machine
★ Compress gradients/parameters (e.g., as binaries) to reduce network overhead
★ Batch-size tradeoff: convergence vs. communication overhead

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Conclusions

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Conclusions
★ This implementation best for large datasets & small models
✦ For large models, we’re looking into  
a) model parallelism b) model compression
★ If data fits on single machine, single-machine GPU would be best (if you have it)
★ For large datasets, leverage both speed and scale using Spark cluster with GPUs
★ Future Work: a) Tachyon b) Ensembles (Hadoop Summit - June 2016)

@pentagoniac @arimoinc #SparkSummit #DDF arimo.com
Top 10 World’s 
Most Innovative  
Companies  
in Data Science
https://github.com/adatao/tensorspark
VISIT US
BOOTH
K8 We’re OPEN SOURCING
our work at

Recently uploaded

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero

What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett

Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software

Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro

Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang

Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson

Artificial intelligence in cctv survelliance.pptxhariprasad279825

Gen AI in Business - Global Trends Report 2024.pdfAddepto

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

"ML in Production",Oleksandr BaganFwdays

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada

Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge

Anypoint Exchange: It’s Not Just a Repo!Manik S Magar

Recently uploaded (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost

"Debugging python applications inside k8s environment", Andrii Soldatenko

SIP trunking in Janus @ Kamailio World 2024

What's New in Teams Calling, Meetings and Devices March 2024

Connect Wave/ connectwave Pitch Deck Presentation

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation

Unraveling Multimodality with Large Language Models.pdf

Ensuring Technical Readiness For Copilot in Microsoft 365

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)

Are Multi-Cloud and Serverless Good or Bad?

Artificial intelligence in cctv survelliance.pptx

Gen AI in Business - Global Trends Report 2024.pdf

DevEX - reference for building teams, processes, and platforms

"ML in Production",Oleksandr Bagan

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

Scanning the Internet for External Cloud Exposures via SSL Certs

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024

Designing IA for AI - Information Architecture Conference 2024

Anypoint Exchange: It’s Not Just a Repo!

Featured

AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork

Skeleton Culture CodeSkeleton Technologies

PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley

Content Methodology: A Best Practices Report (Webinar)contently

How to Prepare For a Successful Job Search for 2024Albert Qian

Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)

Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal

5 Public speaking tips from TED - Visualized summarySpeakerHub

ChatGPT and the Future of Work - Clark Boyd Clark Boyd

Getting into the tech field. what next Tessa Mero

Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray

How to have difficult conversations Rajiv Jayarajah, MAppComm, ACC

Introduction to Data ScienceChristy Abraham Joy

Time Management & Productivity - Best PracticesVit Horky

The six step guide to practical project managementMindGenius

Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36

Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools

12 Ways to Increase Your Influence at WorkGetSmarter

ChatGPT webinar slidesAlireza Esmikhani

More than Just Lines on a Map: Best Practices for U.S Bike RoutesProject for Public Spaces & National Center for Biking and Walking

Featured (20)

AI Trends in Creative Operations 2024 by Artwork Flow.pdf

Skeleton Culture Code

PEPSICO Presentation to CAGNY Conference Feb 2024

Content Methodology: A Best Practices Report (Webinar)

How to Prepare For a Successful Job Search for 2024

Social Media Marketing Trends 2024 // The Global Indie Insights

Trends In Paid Search: Navigating The Digital Landscape In 2024

5 Public speaking tips from TED - Visualized summary

ChatGPT and the Future of Work - Clark Boyd

Getting into the tech field. what next

Google's Just Not That Into You: Understanding Core Updates & Search Intent

How to have difficult conversations

Introduction to Data Science

Time Management & Productivity - Best Practices

The six step guide to practical project management

Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...

Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...

12 Ways to Increase Your Influence at Work

ChatGPT webinar slides

More than Just Lines on a Map: Best Practices for U.S Bike Routes

Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark

1. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Christopher Nguyen, PhD. | Chris Smith | Ushnish De | Vu Pham | Nanda Kishore DISTRIBUTED TENSORFLOW: SCALING GOOGLE'S DEEP LEARNING LIBRARY ON SPARK

2. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Feb 2016 Distributed TensorFlow on Spark Spark Summit East June 2016 TensorFlow Ensembles Spark Hadoop Summit Nov 2015 Distributed DL on Spark + Tachyon Strata NYC

3. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com 1. Motivation 2. Architecture 3. Experimental Parameters 4. Results 5. Lesson Learned 6. Summary Outline

4. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Motivation

5. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Google TensorFlow 1 Spark Deployments 2

6. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com TensorFlow Review

7. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Computational Graph

8. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Graph of a Neural Network

9. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Google uses TensorFlow for the following: ★ Search ★ Advertising ★ Speech Recognition ★ Photos ★ Maps ★ StreetView—Blurring out house numbers ★ Translate ★ YouTube Why TensorFlow?

10. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com DistBelief Large Scale Distributed Deep Networks, Jeff Dean et al, NIPS 2012 1. Compute Gradients 2. Update Params   (Descent)

11. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Arimo’s 3 Pillars App Development BIG APPS PREDICTIVE ANALYTICS NATURAL INTERFACES COLLABO- RATION Big Data + Big Compute

12. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Architecture

13. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com TensorSpark Architecture

14. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Spark & Tachyon Architectural Options Model as Broadcast Variable Spark-Only Model Stored as Tachyon File Tachyon- Storage Model Hosted in HTTP Server Param- Server Model Stored and Updated by Tachyon Tachyon- CoProc

15. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Experimental Parameters

16. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Data Set Size Parameters Hidden   Layers Hidden  Units MNIST 50K x 784 =39.2M 1.8M 2 1024 Higgs 10M x 28 =280M 8.5M 3 2048 Molecular 150K x 2871 =430M 14.2M 3 2048 Datasets

17. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Experimental Dimensions Dataset Size1 Model Size2 Compute Size3 CPU vs. GPU4

18. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Results

19. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com

20. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com

21. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com MNIST Higgs Molecular

22. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com

23. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com

24. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Lessons Learned

25. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com GPU vs CPU ★ GPU is 8-10x faster on local; lower gains with Spark but better for larger data sets ★ On local: GPU 8–10x faster ★ On distributed: GPU 2–3x faster, more with larger datasets ★ GPU memory is limited > Performance impact if hit ★ AWS has old GPUs, need to build TF from source

26. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Distributed Gradients ★ Initial warmup phase on param server needed ★ “Wobble” effect may still happen with convergence due to mini-batches ✦ Drop “obsolete” gradients ✦ Reduce learning rate and initial parameters ✦ Reduce mini-batch size ★ Work to maximize network throughput, e.g., model compression

27. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Deployment Configuration ★ Should run only 1 TF instance per machine ★ Compress gradients/parameters (e.g., as binaries) to reduce network overhead ★ Batch-size tradeoff: convergence vs. communication overhead

28. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Conclusions

29. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Conclusions ★ This implementation best for large datasets & small models ✦ For large models, we’re looking into   a) model parallelism b) model compression ★ If data fits on single machine, single-machine GPU would be best (if you have it) ★ For large datasets, leverage both speed and scale using Spark cluster with GPUs ★ Future Work: a) Tachyon b) Ensembles (Hadoop Summit - June 2016)

30. @pentagoniac @arimoinc #SparkSummit #DDF arimo.com Top 10 World’s  Most Innovative   Companies   in Data Science https://github.com/adatao/tensorspark VISIT US BOOTH K8 We’re OPEN SOURCING our work at

Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark

Recommended

Recommended

More Related Content

Recently uploaded

Recently uploaded (20)

Featured

Featured (20)

Distributed TensorFlow: Scaling Google's Deep Learning Library on Spark