This document discusses analyzing video data with GraphLab Create. It introduces Dato's products for ingesting and transforming data and for building and deploying machine learning models on unstructured data such as images, text and graphs, as well as tabular data. It then outlines a demo that uses computer vision and face recognition to match actors' faces in movie frames to subtitles and screenplay text. Instructions for installing GraphLab Create and links to additional resources are provided.
4. Dato Confidential
Business must be intelligent
Machine learning applications:
• Recommenders
• Fraud detection
• Ad targeting
• Financial models
• Personalized medicine
• Churn prediction
• Smart UX (video & text)
• Personal assistants
• IoT
• Social networks
• Log analysis
Last decade: Data management
Last 5 years: Traditional analytics
Now: Intelligent apps
Creating a model pipeline using Dato products
Unstructured data → Ingest → Transform → Model → Deploy
• SFrame Engine (free, open source)
• GraphLab Create (scalable machine learning Python library, 4K/machine/year)
• Predictive Services (serving + load balancing + A/B testing, 10K/machine/year)
What will we cover today?
1. Match a movie’s screenplay with its subtitles.
- Now we know who says what and when.
2. Extract frames, then actors’ faces, from the movie.
- We’ll use OpenCV for video manipulation and face detection.
3. Train a face recognition model over the faces.
- What’s the smallest portion of the movie we can get good results from?
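Step 1 can be sketched in plain Python. Subtitles carry timing but no speaker, while a screenplay carries the speaker but no timing, so fuzzy-matching each subtitle to a nearby screenplay line attaches a speaker to each time span. This is a minimal sketch, not the demo's actual code: the hypothetical align() and normalize() helpers, the (start, end, text) and (speaker, text) tuple layouts, the 0.6 similarity threshold and the 5-line search window are all assumptions, and difflib.SequenceMatcher stands in for whatever matcher the demo really uses.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(cleaned.split())


def align(subtitles, screenplay, threshold=0.6, window=5):
    """Attach a speaker to each subtitle by fuzzy-matching it against screenplay dialogue.

    Hypothetical helper (not part of GraphLab Create or OpenCV).
    subtitles:  list of (start_sec, end_sec, text) in playback order
    screenplay: list of (speaker, text) in script order
    Returns a list of (start_sec, end_sec, speaker or None, text).
    """
    result = []
    pos = 0  # assumption: dialogue appears in the same order in both sources
    for start, end, text in subtitles:
        best_ratio, best_idx = 0.0, None
        # Only look a few screenplay lines ahead of the last match.
        for idx in range(pos, min(pos + window, len(screenplay))):
            ratio = SequenceMatcher(None, normalize(text), normalize(screenplay[idx][1])).ratio()
            if ratio > best_ratio:
                best_ratio, best_idx = ratio, idx
        if best_idx is not None and best_ratio >= threshold:
            result.append((start, end, screenplay[best_idx][0], text))
            # Stay on the matched line: one screenplay line may be split across subtitles.
            pos = best_idx
        else:
            result.append((start, end, None, text))  # e.g. sound effects, ad-libs
    return result
```

The forward-only window keeps short, repeated lines such as "Yes." from matching the wrong place in the script; subtitles that merge or split screenplay lines more aggressively would need a proper sequence alignment (for example, dynamic programming over both lists).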
Python vs. Anaconda
• You can download Python for free from python.org.
- Python with its standard library.
• Or you can download the Anaconda distribution.
- Python + tons of installed packages + package managers.
• It’s the same Python, but Anaconda includes both pip and its own package manager, conda.
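To check which of the two you are actually running, inspecting sys is enough. In this sketch describe_python() is a hypothetical helper, and the conda check is a heuristic assumption rather than an official API: conda-managed environments keep their package metadata in a conda-meta directory under sys.prefix, which a plain python.org install does not have.

```python
import os
import sys


def describe_python():
    """Summarize the running interpreter (hypothetical helper).

    The conda check is a heuristic: conda-managed environments keep package
    metadata in a 'conda-meta' directory under sys.prefix.
    """
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "prefix": sys.prefix,
        "looks_like_conda": os.path.isdir(os.path.join(sys.prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print("%s: %s" % (key, value))
```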
pip vs. conda vs. virtualenv
pip – install Python packages.
conda – install Python packages plus any non-Python dependencies they need to work (C libraries, etc.).
$ conda install -c menpo opencv3=3.1.0
virtualenv – create isolated environments, each with its own site-packages directory (activated by putting the environment’s bin/ directory first on $PATH), so packages from different projects won’t break each other.
You can have multiple Python versions on the same machine and give each environment the Python version it needs.
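The slide names virtualenv; since Python 3.3 the standard library ships a similar tool, venv, which this sketch uses as a stand-in (on the command line, roughly python -m venv demo-env instead of virtualenv demo-env). make_env() and in_virtual_env() are hypothetical helpers. The detection relies on venv setting sys.prefix to the environment while sys.base_prefix keeps pointing at the base install, and on older virtualenv releases setting sys.real_prefix.

```python
import os
import sys
import tempfile
import venv


def in_virtual_env():
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv: sys.prefix points at the environment, sys.base_prefix at the base install.
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def make_env(path):
    """Create an isolated environment at `path`; skip pip to keep creation fast."""
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")  # venv writes its settings here


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "demo-env")
    config = make_env(target)
    print("created %s (pyvenv.cfg present: %s)" % (target, os.path.exists(config)))
    print("running inside a virtual env: %s" % in_virtual_env())
```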
Look Deeper!
1) Building a Face Recognition System with OpenCV in the blink of an Eye
• https://github.com/rragundez/PyData
• Live video from webcam, online analytics
2) Using mxnet for deep feature extraction
• https://github.com/dmlc/mxnet/blob/master/example/notebooks/predict-with-pretrained-model.ipynb
• mxnet is now integrated into GraphLab!
3) mxnet-face
• https://github.com/tornadomeet/mxnet-face
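Deep-feature approaches like those linked above typically turn each face crop into a fixed-length feature vector and then compare vectors. Assuming the vectors have already been extracted (for example with a pretrained network), a minimal nearest-neighbor identifier using cosine similarity might look like this. identify(), cosine_similarity(), the (label, vector) gallery layout and the 0.8 threshold are hypothetical choices, not mxnet or GraphLab Create APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, gallery, min_similarity=0.8):
    """Return the label of the most similar known face, or None if nothing is close enough.

    Hypothetical helper. gallery: list of (label, feature_vector) pairs for
    faces whose identity is already known.
    """
    best_label, best_sim = None, min_similarity
    for label, features in gallery:
        sim = cosine_similarity(query, features)
        if sim >= best_sim:
            best_label, best_sim = label, sim
    return best_label
```

Returning None below the threshold lets unknown faces (extras, background crowds) stay unlabeled instead of being forced onto the nearest known actor.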
The company began 7 years ago at Carnegie Mellon University as an open-source project.
Now a company with 50+ employees and a recently opened EMEA office here in Israel.
Customers
Yes, we are selling
(100+ paying customers, brand names)
Intelligent apps are predictive
From analytics (queries over known data) to predictive (discovering the unknown).
Supported data types
# end of corporate slides
GLC in a line
Steps in the model pipeline creation
From inspiration to production
The tools we are making and what they do in this pipeline.
My goal today is to get you to install it.