Yinyin Liu presents at SD Robotics Meetup on November 8th, 2016. Deep learning has achieved great success in image understanding, speech and text recognition, and natural language processing. Deep learning also has tremendous potential to tackle challenges in robotic vision and in sensorimotor learning in a robotic learning environment. This talk covers how current and future deep learning technologies can be applied to robotic applications.
4. Nervana Systems Proprietary
• What is Deep Learning and What Can It Do Today?
• How Does DL Help Robotics?
• Deep Reinforcement Learning
• Finding the Right Frameworks For You
A method for extracting features at multiple levels of abstraction
• Features are discovered from data
• Performance improves with more data
• Network can express complex transformations
• High degree of representational power
• What is Deep Learning and What Can It Do Today?
• How Does DL Help Robotics?
• Deep Reinforcement Learning
• Finding the Right Frameworks For You
• But most consumer robots either do not move or only move around on a base
• Home robots are still far from providing home services, e.g. cooking, cleaning, and taking care of people
• Robot movement is a difficult problem
• It is challenging for a robot to know how to interact with objects, let alone match human-level dexterity
• Trying to tackle the problem of robotic grasping
• Up to 14 separate robots collect data in parallel; 800k grasp attempts collected over two months
• Each grasp consists of T time steps. At the end of the T steps, grasp success is evaluated. Then T samples of (image, current pose, success label) data are collected
• No human labelling needed!
Levine et al. (2016)
https://research.googleblog.com/2016/03/deep-learning-for-robots-learning-from.html
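The self-labelling step described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `GraspSample` class, the `label_grasp_attempt` helper, the file names, and the 3-number pose are all hypothetical. It only shows how one end-of-grasp outcome becomes the label for all T time steps.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GraspSample:
    """One training sample (hypothetical structure for illustration)."""
    image: str                        # placeholder for a camera frame, e.g. a file name
    pose: Tuple[float, float, float]  # gripper pose at this time step (assumed 3 numbers)
    success: bool                     # label copied from the end-of-grasp outcome


def label_grasp_attempt(
    images: Sequence[str],
    poses: Sequence[Tuple[float, float, float]],
    grasp_succeeded: bool,
) -> List[GraspSample]:
    """Turn one grasp attempt of T time steps into T labelled samples."""
    if len(images) != len(poses):
        raise ValueError("need one pose per image")
    # Every time step of the attempt receives the same success label,
    # which is measured once at the end, so no human labelling is involved.
    return [GraspSample(img, pose, grasp_succeeded) for img, pose in zip(images, poses)]


# A made-up attempt with T = 3 time steps that ended in a successful grasp.
samples = label_grasp_attempt(
    images=["t0.png", "t1.png", "t2.png"],
    poses=[(0.0, 0.0, 0.3), (0.0, 0.1, 0.2), (0.0, 0.1, 0.1)],
    grasp_succeeded=True,
)
print(len(samples), samples[0])
```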
Prediction network: a CNN learns to predict the outcome of a grasp, given
• An image before grasp begins
• An image at current time
• A motor command - 3D translation vector
https://arxiv.org/pdf/1603.02199v4.pdf
Servoing mechanism:
• Use the prediction network
• From a pool of sampled motor commands, choose the one with the best score
[Figure: the prediction network outputs a score for each sampled motor command]
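The "sample a pool, score each, pick the best" idea on this slide might look like the sketch below. Everything here is an assumption for illustration: `toy_score` stands in for the trained prediction network (the real one also takes the two images as input), and the uniform sampling range and pool size are arbitrary.

```python
import numpy as np


def toy_score(command: np.ndarray) -> float:
    """Stand-in for the trained prediction network (hypothetical).

    It returns a value in (0, 1] that is highest when the 3D translation
    is close to an arbitrary made-up target.
    """
    target = np.array([0.1, -0.2, 0.05])
    return float(np.exp(-np.sum((command - target) ** 2)))


def choose_command(score_fn, num_samples: int = 64, scale: float = 0.3, seed: int = 0):
    """Sample candidate 3D translations and return the best-scoring one."""
    rng = np.random.default_rng(seed)
    # Pool of candidate motor commands, one 3D translation vector per row.
    candidates = rng.uniform(-scale, scale, size=(num_samples, 3))
    # Score every candidate with the predictor.
    scores = np.array([score_fn(c) for c in candidates])
    best = int(np.argmax(scores))
    return candidates[best], float(scores[best])


best_command, best_score = choose_command(toy_score)
print(best_command, best_score)
```

In a servoing loop this selection would be repeated at every time step with a fresh camera image; the sketch shows a single selection only.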
• End-to-end learning
- what are objects vs. the gripper
- what is the right orientation to grasp
- what is the right motor command
• Learn from repeated trials
• A useful training paradigm is RL
• What is Deep Learning and What Can It Do Today?
• How Does DL Help Robotics?
• Deep Reinforcement Learning
• Finding the Right Frameworks For You
• RL – defines the goal, reward, training paradigm
• DL – gives the mechanics
• RL + DL = AI*
http://icml.cc/2016/tutorials/deep_rl_tutorial.pdf
* By David Silver
• Because one network approximates the Q value and its output layer holds one value per action, the algorithm handles only small, discrete, finite sets of actions.
• Apply actor-critic architecture to continuous action space
• Add BatchNorm, which helps generalize to different problems
• High-dimensional tasks simulated in MuJoCo.
• Racing game simulated using TORCS.
Lillicrap et al. (DeepMind, ICLR 2016)
https://arxiv.org/pdf/1509.02971v5.pdf
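To make the contrast on this slide concrete, here is a small sketch under heavy simplifying assumptions: random linear maps stand in for trained networks, and the dimensions are arbitrary. It is not the algorithm from the paper. It only shows why a Q-network with one output per action implies a discrete argmax, while an actor-critic pair can emit and score a continuous action.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_ACTIONS, ACTION_DIM = 4, 3, 2  # arbitrary sizes for illustration

# Discrete case: the output layer has one Q value per action.
W_q = rng.normal(size=(NUM_ACTIONS, STATE_DIM))


def greedy_discrete_action(state: np.ndarray) -> int:
    """Pick the action whose Q value is largest (only works for a finite set)."""
    q_values = W_q @ state
    return int(np.argmax(q_values))


# Continuous case: an actor proposes an action, a critic scores (state, action).
W_actor = rng.normal(size=(ACTION_DIM, STATE_DIM))
w_critic = rng.normal(size=STATE_DIM + ACTION_DIM)


def actor(state: np.ndarray) -> np.ndarray:
    """Map a state directly to a continuous action, bounded to [-1, 1] by tanh."""
    return np.tanh(W_actor @ state)


def critic(state: np.ndarray, action: np.ndarray) -> float:
    """Estimate the value of taking `action` in `state`."""
    return float(w_critic @ np.concatenate([state, action]))


state = rng.normal(size=STATE_DIM)
discrete_action = greedy_discrete_action(state)
continuous_action = actor(state)
value = critic(state, continuous_action)
print(discrete_action, continuous_action, value)
```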
• What is Deep Learning and What Can It Do Today?
• How Does DL Help Robotics?
• Deep Reinforcement Learning
• Finding the Right Frameworks For You
To make progress on robotics:
• Need a lot of data to improve on executing tasks
• Need interaction with the environment
- costly for real world experiments
- need simulator for a variety of tasks
• Need benchmarks
- ImageNet drove a lot of progress for the vision problems in supervised learning
- lack of standardized environment, tasks, or metrics for RL publications and comparison
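As a rough picture of what "interaction with the environment" through a simulator means, here is a toy agent-environment loop. The `LineWorld` class and its `reset`/`step` interface are hypothetical and far simpler than any real robotics simulator; the point is only the loop in which the agent acts, the environment responds, and a reward comes back.

```python
class LineWorld:
    """Hypothetical toy simulator: walk along a line to reach the last cell."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self) -> int:
        """Start a new episode at the left end and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action: int):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps: int = 100):
    """Run one episode and return (total reward, number of steps taken)."""
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        state, reward, done = env.step(policy(state))
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


def always_right(state: int) -> int:
    """A fixed policy that always moves right."""
    return 1


total_reward, steps = run_episode(LineWorld(), always_right)
print(total_reward, steps)
```

A shared interface of this kind is also what makes standardized tasks and metrics possible: different agents can be run against the same environment and compared on total reward.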
I would like to see a show of hands:
who has heard of some or most of these terms?
who uses DL in their day-to-day work?
In this talk, we will cover several topics as follows. Start with ….
After each of the topics, I will be happy to pause to take any questions you have.
In the last four years, deep learning has made its way into the heart of almost every AI application.
And across all these domains, deep learning architectures have been very successful, blowing away the competition.
On the surface, it seems almost unreasonably effective across these domains.
First image: Tumor positive, negative
End-to-end object localization system.
Model on the left is able to learn to map each pixel to a category.
Model on the right is able to generate bounding boxes for the precise location of the detected objects.
Video recognition using 3D convolution.
End to end Speech Recognition using Deep Learning
Why is this cool?
- ASR systems have been around for a long time
- End to end means starting with raw spectrogram data, a deep neural net can produce text
- without any hand-engineering of features
- network learns features from data
By combining a deep network with RL, a network is able to learn to play games from scratch. Similar technology, developed further by DeepMind, became the algorithm behind AlphaGo.
Things like HOG and SIFT features. Nearest neighbor clustering, L2 distance.
We are able to replace the nonlinear classifier with a linear classifier, because the multilayer model itself is nonlinear.
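A tiny worked example of that last point, using XOR as a stand-in problem (my choice, not from the talk). The weights below are set by hand rather than learned, purely to show that once a nonlinear hidden layer has re-represented the input, a plain linear readout is enough, whereas a linear model on the raw inputs is not.

```python
import numpy as np

# XOR: the classic problem that no linear classifier on the raw inputs can solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear fit (with bias) on the raw inputs: it predicts 0.5 for every point.
X_aug = np.hstack([X, np.ones((4, 1))])
coef = np.linalg.lstsq(X_aug, y, rcond=None)[0]
linear_predictions = X_aug @ coef

# One hidden layer with ReLU; weights are hand-picked for illustration.
W_hidden = np.array([[1.0, 1.0],   # unit 1 computes relu(x1 + x2)
                     [1.0, 1.0]])  # unit 2 computes relu(x1 + x2 - 1)
b_hidden = np.array([0.0, -1.0])
hidden = np.maximum(0.0, X @ W_hidden.T + b_hidden)

# A purely linear readout on the hidden features reproduces XOR exactly.
w_out = np.array([1.0, -2.0])
predictions = hidden @ w_out

print(linear_predictions)  # linear model on raw inputs
print(predictions)         # linear readout on nonlinear features
```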
Historical perspective
Input → designed features → output
Input → designed features → SVM → output
Input → learned features → SVM → output
Input → levels of learned features → output
One key feature of DL is that the model tends to be an end-to-end system. For example, if we want to classify an image, we pass the NxN raw pixels directly to a deep neural network as input and provide the desired output labels; if all goes well, the model will automatically find the right features from the data.
* conceptual shift in thinking, from "how do you engineer the best features?" to "how do you guide the model towards finding the best features?"
but many old practices from machine learning still apply!
The end-to-end nature of deep learning gives it a wide range of applicability. One could substitute videos for images, for example, or speech, or text. The output could be object identity, hair style, or a set of optimal moves in a video game.
With DL: Features are learned from data rather than hand-engineered
- at multiple levels of representation
- L1 lines/bands
- L3 intricate patterns (honeycomb)
- L5 faces
Here are just some of the examples where the Nervana platform has been applied to real-world problems: detecting tumors in healthcare, counting plants in agricultural robotics, finding oil-rich regions in seismic data, building better speech interfaces in cars, building a time-series search engine for finance, and engineering better organisms through amino acid sequence analysis.
The DL progress in computer vision obviously helps robotic vision as well. Models that can do image segmentation and object localization definitely help a robot or an agent understand a scene and navigate around an environment.
Inevitably, natural language processing is an instrumental piece as well. A module in the robot should be able to understand a human command and execute it.
Because one network approximates the Q value and its output layer holds one value per action, the algorithm handles only discrete, finite sets of actions.
Code makes it look real
So where do the various frameworks map onto these attributes?
Training is the pain point, and we are 2-3x faster than any other framework out there. That is today; when our new chip comes out we are going to be 10-20x faster than the nearest competitor. Models that originally would take you 1 month to train will take <2 days to train. This is a game changer. Training speed = performance: the more iterations you can fit, the better tuned your model will be.
In order to leverage this speed, you need resources for both training and inference. There is a reason the industry is moving to a cloud solution: it is much more cost- and labor-efficient than trying to build an internal on-premise compute cluster, especially one that can scale with your computing and inference demands and return results with low latency. For example, imagine you wanted to deploy 100 trained models. You can buy 100 GPU boxes, one per model, to do inference, or you can use a DL cloud solution like ours to flexibly handle the resources and scale with demand.
We also provide enterprise-level customer support, which is actually very important. You are paying us; if you run into issues getting your model to train, or your model needs a bug fix, we are here to provide fast support and guidance. Or if you need a speed-up and there is a GPU kernel we can write that can increase your speed by 50%, we are here. Good luck submitting an issue to an open-source framework such as Caffe or waiting for Google to get back to you. This will save customers a lot of time and headache in the long run, especially folks who are relatively new to deep learning.
don’t have to make a model from scratch
- many examples of pre-trained models
mention Yinyin-Fast-RCNN, Sathish C3D, bAbI