This document discusses using deep learning and deep features to build an app that finds similar images. It begins with an overview of deep learning and how neural networks can learn complex patterns in data. The document then discusses how pre-trained neural networks can be used as feature extractors for other domains through transfer learning. This reduces data and tuning requirements compared to training new deep learning models. The rest of the document focuses on building an image similarity service using these techniques, including training a model with GraphLab Create and deploying it as a web service with Dato Predictive Services.
Learn to Build an App to Find Similar Images using Deep Learning - Piotr Teterwak
1. Learn to Build an App to Find Similar Images using Deep Learning
Piotr Teterwak
Dato, Machine Learning Engineer
2. First things first, installation
http://bit.ly/dato_pydata
# Smoke test: check that GraphLab Create imports and can build an SFrame and an SArray
import graphlab
sf = graphlab.SFrame()
sa = graphlab.SArray(range(100)).apply(lambda x: x)
4. GraphLab Create: Production ML Pipeline
Data -> Data cleaning & feature engineering -> ML algorithm -> Offline evaluation & parameter search -> Deploy model -> Your web service or intelligent app
Stages: Data engineering, Data intelligence, Deployment
Goal: Platform to help implement, manage, and optimize the entire pipeline
9. Simple example: Spam filtering
• A user opens an email…
- Will she think it's spam?
• What's the probability that the email is spam?
Input x (text of email, user info, source info) -> MODEL -> Output: probability of y (Yes! / No)
10. Feature engineering:
the painful black art of transforming raw inputs
into useful inputs for an ML algorithm
• E.g., important words, complex transformations of the input, …
Input x (text of email, user info, source info) -> Feature extraction -> Features: Φ(x) -> MODEL -> Output: probability of y (Yes! / No)
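A minimal sketch of the feature extraction step Φ(x), assuming a simple word-count representation over a small hand-picked vocabulary. The vocabulary, the function name extract_features, and the sample email are made up for illustration and do not come from the talk.

```python
import re
from collections import Counter

# Hypothetical list of "important words"; chosen only for illustration.
VOCABULARY = ["free", "winner", "meeting", "invoice"]


def extract_features(email_text):
    """Map raw email text x to a fixed-length feature vector phi(x).

    Each entry counts how often one vocabulary word occurs in the text.
    """
    words = re.findall(r"[a-z']+", email_text.lower())
    counts = Counter(words)
    return [counts[word] for word in VOCABULARY]


sample_email = "FREE entry! You are a winner, claim your free prize"
print(extract_features(sample_email))  # [2, 1, 0, 0]
```

The model then sees only the numeric vector, never the raw text, which is why the choice of features matters so much.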
17. Graph representation of classifier:
Useful for defining neural networks
Input nodes 1, x1, x2, …, xd connect to the output node y with weights w0, w1, w2, …, wd
Score: w0 + w1 x1 + w2 x2 + … + wd xd
> 0, output 1
< 0, output 0
18. What can a linear classifier represent?
x1 OR x2: inputs 1, x1, x2 with weights -0.5, 1, 1 -> y
x1 AND x2: inputs 1, x1, x2 with weights -1.5, 1, 1 -> y
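The OR and AND units can be checked with a few lines of Python. This is a sketch that reads the slide's numbers as a bias on the constant input 1 followed by one weight per input; the function name linear_unit is an assumption, and a score of exactly 0 (which the slides leave undefined) is mapped to 0 here.

```python
from itertools import product


def linear_unit(bias, weights, inputs):
    """Return 1 if bias + sum(w_i * x_i) > 0, otherwise 0."""
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if score > 0 else 0


# (bias, weights) as read off the slide.
OR_UNIT = (-0.5, (1, 1))
AND_UNIT = (-1.5, (1, 1))

all_inputs = list(product((0, 1), repeat=2))
or_table = {x: linear_unit(OR_UNIT[0], OR_UNIT[1], x) for x in all_inputs}
and_table = {x: linear_unit(AND_UNIT[0], AND_UNIT[1], x) for x in all_inputs}

for x in all_inputs:
    print(x, "OR:", or_table[x], "AND:", and_table[x])
```

Only the bias differs between the two units: lowering it from -0.5 to -1.5 means one active input is no longer enough to push the score above zero.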
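19. Solving the XOR problem: Adding a layer
XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
Hidden unit z1 (x1 AND NOT x2): inputs 1, x1, x2 with weights -0.5, 1, -1
Hidden unit z2 (NOT x1 AND x2): inputs 1, x1, x2 with weights -0.5, -1, 1
Output y (z1 OR z2): inputs 1, z1, z2 with weights -0.5, 1, 1
Thresholded to 0 or 1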
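A sketch of the two-layer XOR network, assuming the slide's weights are assigned as follows: z1 computes x1 AND NOT x2 (bias -0.5, weights 1 and -1), z2 computes NOT x1 AND x2 (bias -0.5, weights -1 and 1), and y computes z1 OR z2 (bias -0.5, weights 1 and 1). The function names are illustrative.

```python
def step(score):
    """Threshold a score to 0 or 1."""
    return 1 if score > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: each unit is a thresholded linear classifier.
    z1 = step(-0.5 + 1 * x1 - 1 * x2)  # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)  # NOT x1 AND x2
    # Output layer: OR of the two hidden units.
    return step(-0.5 + 1 * z1 + 1 * z2)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))
```

No single thresholded linear unit can produce this truth table, but composing three of them does.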
21. Deep Neural Networks
• Can model any function with enough hidden units.
• This is tremendously powerful: given enough units, it is
possible to train a neural network to solve arbitrarily
difficult problems.
• But also very difficult to train: too many parameters
means too much memory and computation time.
22. Neural Nets and GPUs
• Many operations in neural net training can happen in
parallel
• Training reduces to matrix operations, many of which can be
easily parallelized on a GPU.
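To illustrate why this parallelizes well, here is a sketch of one layer's forward pass over a batch, written as a matrix product in plain Python (no GPU involved). The batch and weight values are made up. Every entry of the result is an independent dot product, which is the kind of work a GPU can spread across many cores.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    columns = list(zip(*b))
    # Each output entry depends on one row of a and one column of b only,
    # so all n * m entries could be computed at the same time.
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


# 3 examples with 2 inputs each (made-up numbers).
batch = [[1.0, 2.0],
         [0.0, 1.0],
         [3.0, -1.0]]

# Weights connecting 2 inputs to 3 hidden units (made-up numbers).
weights = [[0.5, -1.0, 2.0],
           [1.0, 0.0, -0.5]]

pre_activations = matmul(batch, weights)
for row in pre_activations:
    print(row)
```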
34. Image features
• Features = local detectors
- Combined to make prediction
- (in reality, features are more low-level)
Eye, Eye, Nose, Mouth -> Face!
35. Standard image classification approach
Input -> Extract features -> Use simple classifier (e.g., logistic regression, SVMs) -> Face
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Slide Credit: Honglak Lee
36. Many hand crafted features exist…
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Slide Credit: Honglak Lee
… but very painful to design
37. Change image classification approach?
Input -> Extract features -> Use simple classifier (e.g., logistic regression, SVMs) -> Face
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Slide Credit: Honglak Lee
Can we learn features from data?
38. Use neural network to learn features
Input -> Learned hierarchy -> Output
Lee et al. ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’ ICML 2009
39. Sample results
• Traffic sign recognition
(GTSRB)
- 99.2% accuracy
• House number recognition
(Google)
- 94.3% accuracy
40. Krizhevsky et al. ’12:
60M parameters, won 2012 ImageNet competition
44. Deep learning score card
Pros
• Enables learning of features rather
than hand tuning
• Impressive performance gains on
- Computer vision
- Speech recognition
- Some text analysis
• Potential for much more impact
Cons
45. Deep learning workflow
Lots of labeled data -> split into Training set (80%) and Validation set (20%)
Training set -> Learn deep neural net model
Validation set -> Validate
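The 80% / 20% split in this workflow can be sketched in plain Python as below. The function name and the fixed seed are illustrative choices, not part of the talk; GraphLab Create's SFrame has a random_split method that serves the same purpose.

```python
import random


def train_validation_split(examples, train_fraction=0.8, seed=0):
    """Shuffle the examples and cut them into a training and a validation set."""
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


labeled_data = list(range(100))  # stand-in for 100 labeled examples
training_set, validation_set = train_validation_split(labeled_data)
print(len(training_set), len(validation_set))  # 80 20
```

Shuffling before cutting matters: if the data is sorted by label or by date, an unshuffled split would validate on a different distribution than it trained on.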
46. Deep learning score card
Pros
• Enables learning of features rather
than hand tuning
• Impressive performance gains on
- Computer vision
- Speech recognition
- Some text analysis
• Potential for much more impact
Cons
• Computationally really expensive
• Requires a lot of data for high
accuracy
• Extremely hard to tune
- Choice of architecture
- Parameter types
- Hyperparameters
- Learning algorithm
- …
• Computational + so many choices =
incredibly hard to tune
47. Can we do better?
Input -> Learned hierarchy -> Output
Lee et al. ‘Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations’ ICML 2009
49. Transfer learning:
Use data from one domain to help learn on another
Lots of data: Learn neural net -> Great accuracy
Some data: Neural net as feature extractor + Simple classifier -> Great accuracy on new problem
Old idea, explored for deep learning by Donahue et al. ’14
50. What's learned in a neural net
Neural net trained for Task 1
More generic part (can be used as feature extractor) vs. part that is very specific to Task 1
51. Transfer learning in more detail…
Neural net trained for Task 1
More generic part: can be used as feature extractor. Keep weights fixed!
Part very specific to Task 1: for Task 2, learn only the end part
Use simple classifier (e.g., logistic regression, SVMs) -> Class?
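The recipe (keep the extractor's weights fixed, learn only a simple classifier on top) can be sketched in plain Python. Everything here is a toy stand-in: the "pre-trained" hidden layer uses hand-picked weights rather than weights learned on a real Task 1, Task 2 is the XOR labeling of two binary inputs, and all names are hypothetical.

```python
import math

# Stand-in for the generic part of a pre-trained network.
# These weights are frozen: nothing below ever updates them.
# They are hand-picked for illustration, not learned from real data.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
FROZEN_BIASES = [0.0, 0.0, -1.0]


def extract_features(x):
    """Frozen hidden layer with ReLU activations."""
    return [max(0.0, b + sum(w * xi for w, xi in zip(ws, x)))
            for ws, b in zip(FROZEN_WEIGHTS, FROZEN_BIASES)]


def train_logistic_regression(features, labels, epochs=500, learning_rate=0.5):
    """Learn only the end part: a logistic regression on the fixed features."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for phi, y in zip(features, labels):
            score = bias + sum(w * f for w, f in zip(weights, phi))
            error = 1.0 / (1.0 + math.exp(-score)) - y
            # Stochastic gradient step on the log loss.
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, phi)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, phi):
    return 1 if bias + sum(w * f for w, f in zip(weights, phi)) > 0 else 0


# Task 2: a small labeled set (XOR of the two inputs).
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

deep_features = [extract_features(x) for x in inputs]
weights, bias = train_logistic_regression(deep_features, labels)
print([predict(weights, bias, phi) for phi in deep_features])
```

A logistic regression on the raw inputs could not fit these labels, because XOR is not linearly separable. On top of the fixed features it can, which is the point of using a network as a feature extractor.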
52. Using ImageNet-trained network as extractor for
general features
• Using the classic AlexNet architecture pioneered by Alex Krizhevsky
et al. in ImageNet Classification with Deep Convolutional Neural
Networks
• It turns out that a neural network trained on ~1 million images of
about 1000 classes makes a surprisingly general feature extractor
• First illustrated by Donahue et al. in DeCAF: A Deep Convolutional
Activation Feature for Generic Visual Recognition
53. Transfer learning with deep features
Some labeled data -> Extract features with neural net trained on different task -> split into Training set (80%) and Validation set (20%)
Training set -> Learn simple model
Validation set -> Validate -> Deploy in production
62. Deep learning made easy with deep features
Deep learning: exciting ML development
Slow, lots of tuning, needs lots of data
Can still achieve excellent performance
Deep features: reuse deep models for new domains
Needs less data, faster training times, much simpler tuning
63. Next…
Build a Deep Learning model with GLC
Creating an Image Similarity Service
Deploying with Predictive Services
64. First things first, installation
http://bit.ly/dato_pydata
# Smoke test: check that GraphLab Create imports and can build an SFrame and an SArray
import graphlab
sf = graphlab.SFrame()
sa = graphlab.SArray(range(100)).apply(lambda x: x)
Editor's Notes
Dress similarity instead, then explanation (a couple more slides), then jump to the notebook.
Distance – distance between the extracted features. Each set of extracted features for an image forms a vector.
Images whose deep visual features are similar have similar sets of extracted features.
We can measure quantitatively how similar two images are by computing the Euclidean distance between their feature vectors.
Explain nearest neighbors.
- Each image has the same number of deep features.
- This creates a space, where each dress is a point.
- More similar images are closer together, distance-wise, in that space.
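A sketch of the nearest neighbors idea from these notes, using Euclidean distance between feature vectors. The image names and the 3-dimensional vectors are made up for illustration; real deep features would be much longer vectors produced by the network, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query_id, features_by_id, k=2):
    """Return the ids of the k images closest to the query image."""
    query = features_by_id[query_id]
    distances = [(euclidean_distance(query, vector), image_id)
                 for image_id, vector in features_by_id.items()
                 if image_id != query_id]
    return [image_id for _, image_id in sorted(distances)[:k]]


# Made-up feature vectors: each dress is a point in the same space.
dress_features = {
    "red_gown": [0.9, 0.1, 0.3],
    "crimson_gown": [0.8, 0.2, 0.3],
    "blue_sundress": [0.1, 0.9, 0.7],
    "navy_sundress": [0.2, 0.8, 0.6],
}

print(nearest_neighbors("red_gown", dress_features, k=2))
# ['crimson_gown', 'navy_sundress']
```

This brute-force search compares the query against every image, which is fine for a demo; a large catalog would call for an indexed nearest neighbors structure instead.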
What we’re going to talk about today:
Introduction to Deep Learning. What is a deep learning model? What are its uses?
Using GraphLab Create to build a deep learning model
Using the MNIST dataset
Learn about network topology
How to evaluate and improve a neural network
Using transfer learning to build an Image Similarity Service, with the dresses data set shown in the Keynote Demo.
What makes images similar
Creating an application using transfer learning and nearest neighbors.
Deploying it as a predictive service
We want you to walk away from this session with a better intuition on using GraphLab Create to work with Deep Neural Networks, and have some great ideas on how to use it for your products and services.