Strata London - Deep Learning 05-2015
1. Deep learning made doubly easy with reusable deep features
Carlos Guestrin
CEO, Dato
Amazon Professor of Machine Learning, University of Washington
2. Successful apps in 2015 must be intelligent
Machine learning is key to next-gen apps:
• Recommenders
• Fraud detection
• Ad targeting
• Financial models
• Personalized medicine
• Churn prediction
• Smart UX (video & text)
• Personal assistants
• IoT
• Social nets
• …
Last decade: data management. Last 5 years: traditional analytics. Now: intelligent apps.
3. The ML pipeline circa 2013
DATA → ML algorithm → "My curve is better than your curve" → write a paper
4. 2015: Production ML pipeline
DATA → data cleaning & feature engineering → ML algorithm → offline eval & parameter search → deploy model → your web service or intelligent app
Phases: data engineering → data intelligence → deployment
Using deep learning
Goal: a platform to help implement, manage, and optimize the entire pipeline
7. Simple example: spam filtering
• A user walks into an email…
  - Will she think it's spam?
• What's the probability the email is spam?
Input x (text of email, user info, source info) → MODEL → output: probability of y ("Yes!" / "No")
8. Feature engineering:
the painful black art of transforming raw inputs into useful inputs for an ML algorithm
• E.g., important words, stemming text, complex transformations of inputs, … (a toy sketch follows below)
Input x (text of email, user info, source info) → feature extraction → features Φ(x) → MODEL → output: probability of y
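As a concrete, hypothetical illustration of such a feature-extraction step, here is a tiny Python sketch of a Φ(x) for the email example. The word list, field names, and the `extract_features` helper are invented for illustration and do not come from the talk.

```python
# Hypothetical sketch of a feature extractor Phi(x) for the spam example.
# The "important words" list and the source feature are illustrative only.
import re

IMPORTANT_WORDS = ["free", "winner", "offer", "meeting", "invoice"]

def extract_features(email_text, sender_domain):
    """Turn a raw email into a fixed-length feature vector Phi(x)."""
    tokens = re.findall(r"[a-z']+", email_text.lower())
    counts = [tokens.count(w) for w in IMPORTANT_WORDS]   # word-count features
    extra = [
        len(tokens),                                        # email length
        1.0 if sender_domain.endswith(".example") else 0.0, # crude source feature
    ]
    return counts + extra

phi = extract_features("Free offer! You are a winner", "promo.example")
print(phi)
```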
10. Linear classifiers
• The most common classifiers:
  - Logistic regression
  - SVMs
  - …
• Decisions correspond to a hyperplane (a line in high-dimensional space): predict one class when w0 + w1 x1 + w2 x2 > 0, the other when w0 + w1 x1 + w2 x2 < 0 (sketch below)
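A minimal sketch of this idea using scikit-learn (an assumption; the slide does not tie it to a particular library). The toy data is invented; the point is that the fitted model's decision is exactly the sign of w0 + w1 x1 + w2 x2.

```python
# Minimal sketch: a linear classifier's decision boundary is a hyperplane.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0, 0.2], [1.0, 1.1], [0.2, 0.1], [0.9, 1.4]])
y = np.array([0, 1, 0, 1])

clf = LogisticRegression().fit(X, y)
w0 = clf.intercept_[0]
w1, w2 = clf.coef_[0]

x_new = np.array([0.8, 1.0])
score = w0 + w1 * x_new[0] + w2 * x_new[1]   # w0 + w1*x1 + w2*x2
print("predict 1" if score > 0 else "predict 0")
print(clf.predict([x_new]))                   # same decision via scikit-learn
```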
11. Graph representation of a classifier: useful for defining neural networks
Inputs x1, x2, …, xd (plus a constant 1) feed a single output node y:
if w0 + w1 x1 + w2 x2 + … + wd xd > 0, output 1; if < 0, output 0
12. What can a linear classifier represent?
x1 OR x2: y = 1 when -0.5 + x1 + x2 > 0 (bias -0.5, weights 1 and 1)
x1 AND x2: y = 1 when -1.5 + x1 + x2 > 0 (bias -1.5, weights 1 and 1)
(A small code sketch of these two units follows.)
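The two units above can be written out directly. The weights and biases are the ones on the slide; the small Python wrappers are just for illustration.

```python
# The OR and AND units from the slide, written as thresholded linear models.
def threshold_unit(w0, w1, w2, x1, x2):
    """Output 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

def OR(x1, x2):   # slide weights: bias -0.5, weights 1, 1
    return threshold_unit(-0.5, 1, 1, x1, x2)

def AND(x1, x2):  # slide weights: bias -1.5, weights 1, 1
    return threshold_unit(-1.5, 1, 1, x1, x2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "OR:", OR(a, b), "AND:", AND(a, b))
```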
13. What can't a simple linear classifier represent?
XOR: the counterexample to everything.
Need non-linear features.
14. Solving the XOR problem: adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
Hidden units (thresholded to 0 or 1):
z1 = 1 when -0.5 + x1 - x2 > 0   (x1 AND NOT x2)
z2 = 1 when -0.5 - x1 + x2 > 0   (NOT x1 AND x2)
Output: y = 1 when -0.5 + z1 + z2 > 0   (see the sketch below)
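Wiring those weights into code gives a working XOR; this small Python sketch uses exactly the slide's weights, with illustrative function names.

```python
# The two-layer XOR network from the slide, with its hand-wired weights.
def step(v):
    return 1 if v > 0 else 0

def xor(x1, x2):
    z1 = step(-0.5 + 1 * x1 - 1 * x2)    # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)    # NOT x1 AND x2
    return step(-0.5 + 1 * z1 + 1 * z2)  # z1 OR z2

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```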
15. A neural network
• Layers and layers and layers of linear models and non-linear transformations
• Around for about 50 years
  - Fell into "disfavor" in the 90s
• Big resurgence in the last few years
  - Impressive accuracy on several benchmark problems
  - Powered by huge datasets, GPUs, and modeling/learning algorithm improvements
Diagram: inputs x1, x2 (plus constant 1) → hidden units z1, z2 (plus constant 1) → output y
17. Image features
• Features = local detectors
  - Combined to make a prediction
  - (In reality, features are more low-level)
Example: eye + eye + nose + mouth detectors → Face!
18. Many hand-created features exist…
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
(Slide credit: Honglak Lee)
19. Standard image classification approach
Input → extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH; slide credit: Honglak Lee) → use simple classifier (e.g., logistic regression, SVMs) → Car?
20. Many hand-created features exist…
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH (slide credit: Honglak Lee)
… but they are very painful to design.
21. Use a neural network to learn features
Each layer learns features at a different level of abstraction.
Deep Learning = Learning Hierarchical Representations (Y. LeCun, M.A. Ranzato): it's deep if it has more than one stage of non-linear feature transformation.
Low-level features → mid-level features → high-level features → trainable classifier
Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]
22. Many tricks needed to work well…
• Different types of layers, connections, … are needed for high accuracy (Krizhevsky et al. '12)
30. Deep learning score card
Pros:
• Enables learning of features rather than hand tuning
• Impressive performance gains on:
  - Computer vision
  - Speech recognition
  - Some text analysis
• Potential for much more impact
Cons: (filled in on slide 32)
31. Deep learning workflow
Lots of labeled data → split into training set (80%) and validation set (20%) → learn deep neural net model → validate (miniature example below)
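A runnable miniature of that workflow, under obvious simplifications: scikit-learn's digits dataset and a small MLPClassifier stand in for "lots of labeled data" and a deep net; these are assumptions, not the talk's setup.

```python
# Miniature of the workflow: labeled data, 80/20 split, train a (small)
# neural net, validate. Digits + MLPClassifier are stand-ins for scale.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)        # 80% train / 20% validation

net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(X_train, y_train)                       # learn the model
print("validation accuracy:", net.score(X_val, y_val))  # validate
```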
32. Deep learning score card
Pros:
• Enables learning of features rather than hand tuning
• Impressive performance gains on:
  - Computer vision
  - Speech recognition
  - Some text analysis
• Potential for much more impact
Cons:
• Computationally really expensive
• Requires a lot of data for high accuracy
• Extremely hard to tune:
  - Choice of architecture
  - Parameter types
  - Hyperparameters
  - Learning algorithm
  - …
• Computational cost + so many choices = incredibly hard to tune
34. Change the image classification approach?
Input → extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH; slide credit: Honglak Lee) → use simple classifier (e.g., logistic regression, SVMs) → Car?
Can we learn features from data, even when we don't have enough data or time?
35. Transfer learning: use data from one domain to help learn on another
Lots of data → learn neural net → great accuracy on cat vs. dog
Some data → neural net as feature extractor + simple classifier → great accuracy on 101 categories
Old idea, explored for deep learning by Donahue et al. '14
36. What's learned in a neural net
Neural net trained for Task 1: cat vs. dog
• Earlier layers: more generic, can be used as a feature extractor
• Last layers: very specific to Task 1, should be ignored for other tasks
37. Transfer learning in more detail…
Neural net trained for Task 1: cat vs. dog
• The last layers are very specific to Task 1 and should be ignored for other tasks
• The earlier layers are more generic and can be used as a feature extractor — keep their weights fixed!
For Task 2 (predicting 101 categories), learn only the end part: use a simple classifier, e.g., logistic regression or SVMs → Class?
38. Careful where you cut…
The last few layers tend to be too specific.
(Figure repeated from slide 21: low-level → mid-level → high-level features → trainable classifier; feature visualization of a convolutional net trained on ImageNet from [Zeiler & Fergus 2013].)
The high-level/classifier layers are too specific for car detection; use the earlier feature layers instead ("Use these!").
39. Transfer learning with deep features
Some labeled data → extract features with a neural net trained on a different task → split into training set (80%) and validation set (20%) → learn a simple model → validate → deploy in production (see the sketch below)
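A sketch of that workflow in code. The talk demonstrates it with GraphLab Create's deep feature extractor; here a pretrained torchvision ResNet is swapped in as the feature extractor (an assumption), and `images`/`labels` are hypothetical, already-preprocessed inputs.

```python
# Sketch of "neural net as feature extractor + simple classifier".
# `images` (an (N, 3, 224, 224) float tensor) and `labels` are assumed given.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

resnet = models.resnet18(weights="IMAGENET1K_V1")
resnet.fc = torch.nn.Identity()      # cut off the task-specific last layer
resnet.eval()                        # keep the pretrained weights fixed

with torch.no_grad():
    feats = resnet(images).numpy()   # deep features from the earlier layers

clf = LogisticRegression(max_iter=1000).fit(feats, labels)  # learn only the end part
```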
42. Simple text classification with bag of words
One "feature" per word, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, …, gas 1, …, oil 1, …, Zaire 0
Use a simple classifier, e.g., logistic regression or SVMs → Class? (minimal example below)
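A minimal bag-of-words classifier along these lines, assuming scikit-learn; the four-document corpus and its labels are made up for illustration.

```python
# Bag-of-words text classification: one count feature per word, then a
# simple linear classifier. The tiny corpus and labels are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["oil prices rise in Africa", "gas and oil exports from Zaire",
        "apple releases new phone", "anxious markets await apple earnings"]
labels = [1, 1, 0, 0]                 # 1 = energy, 0 = tech (made-up classes)

vec = CountVectorizer()               # one "feature" per word
X = vec.fit_transform(docs)
clf = LogisticRegression().fit(X, labels)
print(clf.predict(vec.transform(["oil fields in Africa"])))
```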
43. Word2Vec: neural network for finding a high-dimensional representation per word (Mikolov et al. '13)
Skip-gram model: from a word, predict nearby words in the sentence.
Example: each word in "Awesome deep learning talk at Strata" is mapped by the neural net to a 300-dim representation, which can be viewed as deep features. (A small sketch follows.)
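A small sketch of training skip-gram word vectors, assuming gensim's Word2Vec (4.x API) rather than the original word2vec tool; the two-sentence corpus is invented, so the resulting vectors are meaningless beyond illustration.

```python
# Learning skip-gram word vectors with gensim (sg=1 selects skip-gram).
from gensim.models import Word2Vec

sentences = [["awesome", "deep", "learning", "talk", "at", "strata"],
             ["deep", "neural", "networks", "learn", "features"]]

model = Word2Vec(sentences, vector_size=300, window=2, sg=1, min_count=1)
vec = model.wv["deep"]                        # 300-dim representation of "deep"
print(vec.shape)                              # (300,)
print(model.wv.most_similar("deep", topn=3))  # nearby words in the learned space
```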
44. Related words are placed nearby in the high-dimensional space
Projecting the 300-dim space into 2 dimensions with PCA (Mikolov et al. '13)
45. Text classification with word embeddings
Start from the word counts (aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, …, gas 1, …, oil 1, …, Zaire 0), embed each word into the 300-dim space, and feed the result to a classifier, e.g., logistic regression or SVMs with 300 × number_of_words parameters → Class? (One possible realization is sketched below.)
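One common way to realize this (an assumption here; the slide does not spell out the exact construction) is to average each document's 300-dim word vectors and feed that to a simple classifier. The sketch reuses `wv` (a word-to-vector lookup such as gensim's `model.wv` from the previous sketch) and the `docs`/`labels` from the bag-of-words sketch.

```python
# Averaged word embeddings as document features (one common simplification).
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(tokens, wv, dim=300):
    """Average the embeddings of the words we have vectors for."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# `wv`, `docs`, `labels` are assumed available from the earlier sketches.
X = np.vstack([doc_vector(d.lower().split(), wv) for d in docs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)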
49. Deployment?
Data scientist: DATA → ML algorithm → custom model
• Write a spec; another team (data engineers, data architects, DevOps, app developers) implements it in a "production" language behind an API for the app:
  - 6-12 months of lag
  - Stale/irrelevant model or approach
  - 2 teams maintaining 2 systems
50. ML deployment requirements
• Easy to integrate: REST API
• Scalable
• Fault tolerant
• Flexible: any model, any Python
(Architecture diagram: the app calls a load balancer (LB) in front of API + cache nodes, each serving GLC models, Dato models, or arbitrary Python.)
51. Do-it-yourself
• Web service layer: Tornado, Flask, Keen, Django, …
• Caching layer: Redis, Cassandra, Memcached, DynamoDB, MySQL, …
• Logs: Logback, LogStash, Splunk, Loggly, …
• Metrics: AWS CloudWatch, Mixpanel, Librato, …
(Same architecture as the previous slide — LB in front of API + cache nodes serving GLC models / Python — but assembled by hand. A bare-bones sketch of the web-service layer follows.)
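For the web-service layer, here is a bare-bones Flask example of what "do it yourself" looks like; the model path, endpoint, and JSON schema are hypothetical, and a production setup would add the caching, logging, and metrics layers listed above.

```python
# Minimal DIY prediction service: Flask serving a pickled model over REST.
# The file path and request format are hypothetical.
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("model.pkl", "rb") as f:          # hypothetical path to a trained model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [[0.8, 1.0]]}
    return jsonify(predictions=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=5000)
```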
52. … or use Dato Predictive Services
Your web service or intelligent app ↔ Dato Predictive Services (caching layer + predictive object server) hosting your ML model.
Serves predictions in a robust, scalable, incremental fashion (e.g., rolling out a better ML model).
Serve any model: GraphLab Create, scikit-learn, Python, …
53. Dato Platform
• Out-of-core computation
• Tools for feature engineering
• Rich data type support
• Models built for scale
• App-oriented toolkits
• Advanced ML & extensible
• Deploy models as low-latency REST services
• Same code for distributed computation
• Elastically scale up or out with one command
• Job monitoring & model management
• Deploy existing Python code & models
• Run on AWS EC2 or Hadoop YARN
Components: GraphLab Create (SFrame, SGraph, Canvas, Create Engine, Machine Learning Toolkits, SDK), Dato Distributed (Distributed Engine, Direct/Job Client, Job Mgmt), Dato Predictive Services (Predictive Engine, REST Client/Direct, Model Mgmt)
55. Deep learning made easy with deep features
Deep learning: an exciting ML development, but slow, requires lots of tuning, and needs lots of data.
Deep features: reuse deep models for new domains.
• Needs less data
• Faster training times
• Much simpler tuning
• Can still achieve excellent performance
Editor's Notes
So I got started with ML by taking a class: data goes into an ML algorithm, and then you generate a plot.
Of course this isn't how actual applications are written, but this is often where customers start when taking ML to production.