1. Building Intelligent Systems with
Large Scale Deep Learning
Jeff Dean
Google Brain team
g.co/brain
Presenting the work of many people at Google
2. Google Brain Team Mission:
Make Machines Intelligent.
Improve People’s Lives.
3. How do we do this?
● Conduct long-term research (>200 papers, see g.co/brain & g.co/brain/papers)
○ Unsupervised learning of cats, Inception, word2vec, seq2seq, DeepDream, image
captioning, neural translation, Magenta, ML for robotics control, healthcare, …
● Build and open-source systems like TensorFlow (see tensorflow.org and
https://github.com/tensorflow/tensorflow)
● Collaborate with others at Google and Alphabet to get our work into
the hands of billions of people (e.g., RankBrain for Google Search, GMail
Smart Reply, Google Photos, Google speech recognition, Google Translate, Waymo, …)
● Train new researchers through internships and the Google Brain
Residency program
4. Main Research Areas
● General Machine Learning Algorithms and Techniques
● Computer Systems for Machine Learning
● Natural Language Understanding
● Perception
● Healthcare
● Robotics
● Music and Art Generation
10. Growth of Deep Learning at Google
[Chart: number of directories containing model description files at Google, growing rapidly over time — and many more uses beyond those shown]
11. Experiment Turnaround Time and Research Productivity
● Minutes, Hours:
○ Interactive research! Instant gratification!
● 1-4 days
○ Tolerable
○ Interactivity replaced by running many experiments in parallel
● 1-4 weeks
○ High value experiments only
○ Progress stalls
● >1 month
○ Don’t even try
12. Google Confidential + Proprietary (permission granted to share within NIST)
Build the right tools
13. Open, standard software for
general machine learning
Great for Deep Learning in
particular
First released Nov 2015
Apache 2.0 license
http://tensorflow.org/
and
https://github.com/tensorflow/tensorflow
14. TensorFlow Goals
Establish common platform for expressing machine learning ideas and systems
Make this platform the best in the world for both research and production use
Open source it so that it becomes a platform for everyone, not just Google
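TensorFlow's core idea is expressing a computation as a dataflow graph that is built first and executed later. As a rough illustration of that deferred-execution model, here is a minimal toy graph evaluator in pure Python — this is not the TensorFlow API; every name below is invented for the sketch:

```python
# Toy dataflow-graph sketch illustrating deferred execution
# (not the TensorFlow API; all names here are made up).

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

def constant(v):
    return Node("const", value=v)

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node):
    """Evaluate a graph node by recursively evaluating its inputs."""
    if node.op == "const":
        return node.value
    args = [run(i) for i in node.inputs]
    if node.op == "add":
        return args[0] + args[1]
    if node.op == "mul":
        return args[0] * args[1]
    raise ValueError(node.op)

# Build the graph first (no computation happens yet)...
x = constant(3.0)
y = constant(4.0)
z = add(mul(x, x), y)   # represents x*x + y

# ...then execute it, analogous to running a session.
print(run(z))  # 13.0
```

Separating graph construction from execution is what lets a system like TensorFlow optimize, distribute, and retarget the same program to different hardware.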
22. ML is done in many places
TensorFlow GitHub stars by GitHub user profiles w/ public locations
Source: http://jrvis.com/red-dwarf/?user=tensorflow&repo=tensorflow
23. TensorFlow: A Vibrant Open-Source Community
● Rapid development, many outside contributors
○ 475+ non-Google contributors to TensorFlow 1.0
○ 15,000+ commits in 15 months
○ Many community created tutorials, models, translations, and projects
■ ~7,000 GitHub repositories with ‘TensorFlow’ in the title
● Direct engagement between community and TensorFlow team
○ 5000+ Stack Overflow questions answered
○ 80+ community-submitted GitHub issues responded to weekly
● Growing use in ML classes: Toronto, Berkeley, Stanford, ...
25. Reuse same model for completely
different problems
Same basic model structure
trained on different data,
useful in completely different contexts
Example: given image → predict interesting pixels
28. We have tons of vision problems
Image search, StreetView, Satellite Imagery,
Translation, Robotics, Self-driving Cars,
www.google.com/sunroof
29.
Computers can now see
Large implications for healthcare
32. Performance on par with or slightly better than the median of 8 U.S.
board-certified ophthalmologists (F-score of 0.95 vs. 0.91).
http://research.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html
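The F-score quoted above is the harmonic mean of precision and recall. A quick illustration of the formula — the counts below are invented for the example, not taken from the retinopathy study:

```python
# F-score = 2 * precision * recall / (precision + recall),
# computed from true positives (tp), false positives (fp),
# and false negatives (fn).

def f_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, purely for illustration:
print(round(f_score(tp=90, fp=5, fn=10), 3))  # 0.923
```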
33.
Computers can now see
Large implications for robotics
34. Combining Vision with Robotics
“Deep Learning for Robots: Learning
from Large-Scale Interaction”, Google
Research Blog, March, 2016
“Learning Hand-Eye Coordination for
Robotic Grasping with Deep Learning
and Large-Scale Data Collection”,
Sergey Levine, Peter Pastor, Alex
Krizhevsky, & Deirdre Quillen,
arXiv, arxiv.org/abs/1603.02199
37.
Scientific Applications of ML
38. Predicting Properties of Molecules
Toxic? Bind with a given protein? Quantum properties?
[Figure: a Message Passing Neural Net applied to a molecule (aspirin)]
● Chemical space is too big, so chemists often rely on virtual screening.
● Machine Learning can help search this large space.
● Molecules are graphs: nodes = atoms, edges = bonds (plus other features)
● Message Passing Neural Nets unify and extend many neural net models
that are invariant to graph symmetries
● State of the art results predicting output of expensive quantum chemistry
calculations, but ~300,000 times faster
https://research.googleblog.com/2017/04/predicting-properties-of-molecules-with.html and
https://arxiv.org/abs/1702.05532 and https://arxiv.org/abs/1704.01212 (latter to appear in ICML 2017)
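To make the graph view of molecules concrete, here is a single message-passing round on a toy 3-atom chain, sketched in plain Python. This is a heavy simplification: real MPNNs use learned neural message and update functions and edge features, and the numbers below are purely illustrative:

```python
# One round of message passing on a tiny molecular graph (toy sketch;
# real MPNNs use learned message/update networks and edge features).

# A 3-atom chain: each atom carries a 2-dim feature vector.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 1), (1, 2)]   # bonds, treated as undirected

def neighbors(n):
    return [b for a, b in edges if a == n] + [a for a, b in edges if b == n]

def message_pass(features):
    """Each node's new state = old state + sum of its neighbors' states."""
    new = {}
    for n, h in features.items():
        msg = [0.0, 0.0]
        for m in neighbors(n):
            msg = [msg[i] + features[m][i] for i in range(2)]
        new[n] = [h[i] + msg[i] for i in range(2)]
    return new

updated = message_pass(features)
print(updated[1])  # [2.0, 2.0]: the middle atom aggregated both neighbors
```

Because the same update is applied at every node, the computation is invariant to relabeling the atoms — the graph-symmetry property the bullet above refers to.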
45. [Image: human iPSC neurons, phase contrast — nuclei (blue), dendrites (green), and axons (red)]
46.
Scaling language understanding models
47. Sequence-to-Sequence Model
[Diagram: a deep LSTM encoder reads the input sequence (A B C ...) and a decoder emits the target sequence (X Y Z ...)]
[Sutskever & Vinyals & Le, NIPS 2014]
48. Sequence-to-Sequence Model: Machine Translation
[Diagram: the model reads the input sentence "Quelle est votre taille? <EOS>" and emits the first target word: "How"]
[Sutskever & Vinyals & Le, NIPS 2014]
49. Sequence-to-Sequence Model: Machine Translation
[Diagram: conditioned on "How", the model emits the next target word: "tall"]
50. Sequence-to-Sequence Model: Machine Translation
[Diagram: conditioned on "How tall", the model emits "are"]
51. Sequence-to-Sequence Model: Machine Translation
[Diagram: conditioned on "How tall are", the model emits "you?", completing the translation]
52. Sequence-to-Sequence Model: Machine Translation
At inference time: beam search to choose the most probable
over possible output sequences
[Diagram: beam search over completions of the input "Quelle est votre taille? <EOS>"]
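The beam search mentioned above keeps only the k highest-scoring partial output sequences at each decoding step, rather than all possible continuations. A self-contained sketch with an invented toy distribution standing in for the trained decoder (the probabilities and vocabulary below are made up for illustration):

```python
# Beam search over a toy per-step token distribution (illustrative only;
# a real decoder would score tokens with the trained sequence model).
import math

# Made-up next-token probabilities for each decoding step:
STEPS = [
    {"How": 0.7, "Quelle": 0.3},
    {"tall": 0.6, "big": 0.4},
    {"<EOS>": 0.9, "are": 0.1},
]

def next_token_probs(prefix):
    """Stand-in for the model: distribution depends only on prefix length."""
    return STEPS[len(prefix)]

def beam_search(beam_width=2, max_len=3):
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<EOS>":
                candidates.append((tokens, score))  # finished hypothesis
                continue
            for tok, p in next_token_probs(tokens).items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the beam_width highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search())  # ['How', 'tall', '<EOS>']
```

With beam_width=1 this degrades to greedy decoding; wider beams trade computation for a better approximation of the most probable output sequence.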
61. BACKTRANSLATION FROM JAPANESE (en->ja->en)
Google Neural Machine Translation (new system):
Kilimanjaro is a mountain of 19,710 feet covered with snow, which is
said to be the highest mountain in Africa. The summit of the west is
called "Ngaje Ngai," God's house in Masai language. There is a dried
and frozen carcass of a leopard near the summit of the west. No one
can explain what the leopard was seeking at that altitude.
Phrase-Based Machine Translation (old system):
Kilimanjaro is 19,710 feet of the mountain covered with snow, and
it is said that the highest mountain in Africa. Top of the west,
"Ngaje Ngai" in the Maasai language, has been referred to as the
house of God. The top close to the west, there is a dry, frozen
carcass of a leopard. Whether the leopard had what the demand at
that altitude, there is no that nobody explained.
62.
Automated machine learning
(“learning to learn”)
64. Current:
Solution = ML expertise + data + computation
Can we turn this into:
Solution = data + 100X computation
???
65. Early encouraging signs
Trying multiple different approaches:
(1) RL-based architecture search
(2) Model architecture evolution
(3) Learn how to optimize
66. Idea: model-generating model trained via RL
(1) Generate ten models
(2) Train them for a few hours
(3) Use loss of the generated models as reinforcement
learning signal
Appeared in ICLR 2017
arxiv.org/abs/1611.01578
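The three-step loop above can be caricatured in a few lines: a controller samples candidate architectures, a (here, simulated) training run scores each one, and the loss becomes a reward that shifts the controller's preferences. Everything below is a deliberately tiny stand-in — the "architecture" is just a depth choice, and proxy_loss fakes the expensive training step:

```python
# Toy caricature of RL-based architecture search: sample an architecture,
# "train" it (faked by a proxy loss), and feed the loss back as reward.
import math
import random

random.seed(0)
CHOICES = [1, 2, 3, 4]               # candidate network depths
prefs = {c: 0.0 for c in CHOICES}    # controller preference (logit) per choice

def sample():
    weights = [math.exp(prefs[c]) for c in CHOICES]
    return random.choices(CHOICES, weights=weights)[0]

def proxy_loss(depth):
    """Stand-in for 'train for a few hours': pretend depth 3 is best."""
    return abs(depth - 3) + 0.1

for _ in range(200):
    depth = sample()                     # (1) generate a model
    reward = -proxy_loss(depth)          # (2)-(3) its loss is the RL signal
    # Bandit-style update against a fixed baseline of -1.0:
    prefs[depth] += 0.1 * (reward - (-1.0))

print(max(prefs, key=prefs.get))  # the controller converges on depth 3
```

The real system trains a recurrent controller with policy gradients over full architecture descriptions; the point here is only the shape of the feedback loop.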
68. Penn Tree Bank Language Modeling Task
[Figure: a "normal" LSTM cell vs. the cell discovered by architecture search]
69. Learn2Learn: Learn the Optimization Update Rule
Neural Optimizer Search using Reinforcement Learning,
Irwan Bello, Barret Zoph, Vijay Vasudevan, and Quoc Le. To appear in ICML 2017
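One family of update rules reported from this line of work scales the gradient step up when the current gradient agrees in sign with its running average, and down when they disagree. Below is a sketch of such a rule applied to a 1-D quadratic; treat the exact form (alpha raised to the product of signs) as an illustrative simplification, not the paper's definitive rule:

```python
# Sketch of a "learned" optimizer update rule on f(x) = x^2:
# step size is scaled by alpha^(sign(g) * sign(m)), where m is a
# running average of gradients (illustrative simplification).
import math

x, m = 5.0, 0.0
lr, alpha, beta = 0.1, 2.0, 0.9

def grad(x):
    return 2.0 * x   # gradient of f(x) = x^2

for _ in range(100):
    g = grad(x)
    m = beta * m + (1 - beta) * g          # running average of gradients
    # Agreeing signs -> multiply step by alpha; disagreeing -> divide.
    scale = alpha ** (math.copysign(1, g) * math.copysign(1, m))
    x -= lr * scale * g

print(abs(x) < 1e-3)  # True: converged near the minimum at x = 0
```

The appeal of such rules is that sign agreement is a cheap proxy for confidence in the descent direction.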
71.
More computational power needed
Deep learning is transforming how we
design computers
74. Tensor Processing Unit v2
Google-designed device for neural net training and inference
● 180 teraflops of computation, 64 GB of memory
Revealed in May at Google I/O
76. Will be available through Google Cloud
Cloud TPU - virtual machine w/ 180 TFLOPS TPUv2 device attached
Programmed via TensorFlow
Same program will run with only minor
modifications on CPUs, GPUs, & TPUs
77. Making 1000 Cloud TPUs available for free to top researchers who are
committed to open machine learning research
We’re excited to see what researchers will do with much more computation!
g.co/tpusignup
78. Machine Learning in Google Cloud
Custom ML models: Machine Learning Engine, TensorFlow
Pre-trained ML models: Vision API, Translation API, Natural Language API, Speech API, Jobs API, Video Intelligence API
79.
Machine Learning for
Higher Performance Machine Learning
Models
80. Device Placement with Reinforcement Learning
Placement model (trained via RL) gets the graph + set of devices as input,
and outputs a device placement for each graph node
Measured time per step gives the RL reward signal
Device Placement Optimization with Reinforcement Learning,
Azalia Mirhoseini, Hieu Pham, Quoc Le, Mohammad Norouzi, Samy Bengio, Benoit Steiner, Yuefeng Zhou,
Naveen Kumar, Rasmus Larsen, and Jeff Dean, to appear in ICML 2017, arxiv.org/abs/1706.04972
+19.7% faster vs. expert human placement for InceptionV3
+19.3% faster vs. expert human placement for NMT model
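To see what the placement model is optimizing, here is a brute-force toy version of the same objective: assign each op in a small graph to one of two devices to minimize a made-up step-time model. The real system replaces the exhaustive search with a learned RL policy, and the ops, costs, and communication penalty below are invented for illustration:

```python
# Toy stand-in for learned device placement: exhaustively search
# assignments of 4 graph ops to 2 devices, minimizing a fake step-time
# model (the real system trains an RL policy instead of brute force).
from itertools import product

OPS = ["embed", "lstm", "attention", "softmax"]           # hypothetical ops
COST = {"embed": 2.0, "lstm": 4.0, "attention": 3.0, "softmax": 1.0}
EDGES = [("embed", "lstm"), ("lstm", "attention"), ("attention", "softmax")]
COMM = 0.5   # penalty per edge that crosses devices

def step_time(placement):
    """Devices run in parallel: max per-device compute + communication."""
    per_device = {0: 0.0, 1: 0.0}
    for op, dev in placement.items():
        per_device[dev] += COST[op]
    comm = sum(COMM for a, b in EDGES if placement[a] != placement[b])
    return max(per_device.values()) + comm

best = min(
    (dict(zip(OPS, devs)) for devs in product([0, 1], repeat=len(OPS))),
    key=step_time,
)
print(step_time(best))  # 6.5: balanced load beats the 10.0 single-device plan
```

Brute force is fine for 4 ops but explodes for real graphs with thousands of nodes, which is exactly why a learned placement policy is attractive.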
83. Example queries of the future
Which of these eye
images shows
symptoms of diabetic
retinopathy?
Please fetch me a cup
of tea from the kitchen
Describe this video
in Spanish
Find me documents related to
reinforcement learning for
robotics and summarize them
in German
84. Conclusions
Deep neural networks are making significant strides in
speech, vision, language, search, robotics, healthcare, …
If you’re not considering how to use deep neural nets to solve
your problems, you almost certainly should be