Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
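To make that definition concrete, here is a minimal Python sketch of gradient descent on a one-variable function whose gradient is written out by hand. The function f(x) = (x - 3)^2, the learning rate, and the step count are arbitrary choices for illustration, not taken from any library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a function given its gradient `grad`, starting from x0."""
    x = x0
    for _ in range(steps):
        # Take a step proportional to the *negative* gradient at the current point.
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient f'(x) = 2(x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # → 3.0
```

Each step here shrinks the distance to the minimum by a constant factor (0.8 with these settings), which is why a modest number of steps is enough.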
ಠ_ಠ
I wanted to learn more about machine learning.
The talks I watched on machine learning were either way too high-level or way too low-level for me.
I see lots of charts that look like this, and talks that cover things like linear regression, gradient descent, etc.
With definitions that look like this.
And this summarizes how I feel.
I have two main goals with this talk:
I like practicality, so I want to take a problem, solve it, and share the experience so others can do it as well.
And I want to show something in the context of a real-world use case. Really, by the end of my talk, I hope some audience members go, “Oh, that’s neat, looks easy, I’ll try it out myself.”
I’m going to do my best to balance the practical and the technical, and as a result I might end up oversimplifying some things. To be honest, this is meant to be a really high-level overview. Hopefully, this will fit my goal of keeping things practical.
Let’s start with a brief experiment.
I want you all to tell me whether or not something is food.
Or, more specifically, “Is it a photograph of food?”, since some people are quick to point out the fallacy in my question. Easy enough, right? Let’s hope this will be a fruitful experiment.
Is this food? Well, of course it is. It’s a fruit and therefore food. But how do we truly know?
For many of us, the answer is simple: we’ve probably eaten an apple before.
But that doesn’t really get to the meat of our problem, now does it?
Is this food?
I speculate that not everyone here has specifically eaten yakitori, but it’s fairly likely you’ve eaten something similar to it. Therefore, we can probably identify this as food easily, and most likely we all recognized it as food instantaneously. Pretty sweet, huh?
Is this food?
I think we’d all say yes, but this represents an interesting edge case. No food is actually visible in this picture, and yet when I see it I immediately associate it with food. However,
When I see this, I’m just a little sad. But I also don’t think of this as food; when I see this, I clearly think of garbage. And again, this is an interesting edge case. Alright, just a couple more examples.
Okay, nothing really new here. I’m pretty sure we’ve all had a hard-shelled taco before and can therefore clearly identify this as food, but what about
this.
I really hope no one said yes.
This, in many ways, was core to our survival: being able to see these patterns and take specific actions in response. It’s one of the key factors that let us develop language, develop tools, escape predators, etc. But while we’re exceptional at it, we’re slow, and we hate menial tasks.
So, like many other problems, we want to automate this one. But automating it is difficult: as humans, we can recall information in an extraordinarily fuzzy way and draw similarities between past experiences to extract new information, but programming this kind of process is very difficult.
Machine learning can be used as an approximation of this kind of behavior.
Clustering: for example, if you have an image, we can group its colors based on their proximity to one another to determine the dominant color in the image.
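A minimal sketch of that clustering idea, assuming pixels arrive as plain (R, G, B) tuples: k-means, one common clustering algorithm chosen here for illustration, groups the colors, and the largest group's average is taken as the dominant color. The pixel values are made up for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10, seed=0):
    """Group points into k clusters; return (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def dominant_color(pixels, k=2):
    centroids, clusters = kmeans(pixels, k)
    biggest = max(range(k), key=lambda i: len(clusters[i]))
    return tuple(round(c) for c in centroids[biggest])


# Hypothetical image: five reddish pixels and two bluish ones.
pixels = [(250, 10, 10), (245, 20, 15), (255, 5, 0), (240, 15, 20),
          (250, 0, 10), (10, 10, 250), (0, 20, 240)]
print(dominant_color(pixels))  # → (248, 10, 11)
```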
Classification: we provide buckets, or categories, for the data to fall into, as well as examples to train with, so that new data can be classified into one of those categories.
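One simple way to classify new data from labeled examples is k-nearest neighbors, sketched below. The two numeric features and the "food" / "not food" training points are hypothetical; a real image classifier would derive its features from the pixels.

```python
import math
from collections import Counter


def knn_classify(examples, query, k=3):
    """examples: list of (feature_vector, label).
    Return the majority label among the k examples closest to `query`."""
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features (two made-up image scores) with category labels.
training = [
    ((0.9, 0.8), "food"), ((0.8, 0.9), "food"), ((0.85, 0.7), "food"),
    ((0.1, 0.2), "not food"), ((0.2, 0.1), "not food"), ((0.15, 0.3), "not food"),
]
print(knn_classify(training, (0.75, 0.8)))  # → food
print(knn_classify(training, (0.2, 0.25)))  # → not food
```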
Train as you go: provide some positive reinforcement that you feed back into the system so it can improve over iterations.
Using machine learning is one means of solving this problem.
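One of the simplest instances of this feedback loop is an epsilon-greedy multi-armed bandit, sketched below purely as an illustration; the talk doesn't name a specific algorithm. The reward probabilities are invented: the learner isn't told which action is best and has to discover it from the rewards alone.

```python
import random


def epsilon_greedy(reward_fn, n_actions, rounds=1000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from rewards fed back over time."""
    rng = random.Random(seed)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions  # running average reward per action
    for _ in range(rounds):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit the best so far
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Feed the reward back: incremental update of that action's running average.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: action 2 succeeds 80% of the time, the others 20%.
def reward_fn(action, rng):
    p = 0.8 if action == 2 else 0.2
    return 1.0 if rng.random() < p else 0.0


estimates, counts = epsilon_greedy(reward_fn, n_actions=3)
best = max(range(3), key=lambda a: counts[a])
print(best, counts)
```

The epsilon parameter trades off exploring untried actions against exploiting the one that has paid off best so far.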
The perceptron algorithm was invented in 1957 by Frank Rosenblatt, and the machine built to run it was designed for image recognition. One of the problems with this style of image recognition was that a single-layer perceptron can only learn linearly separable patterns, so it couldn’t learn an XOR function.
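The XOR limitation is easy to reproduce. Below is a minimal sketch of the classic perceptron learning rule on two-input Boolean functions: it learns AND, which is linearly separable, but cannot get all four XOR cases right no matter how long it trains.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: samples are ((x1, x2), target) with target 0 or 1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - prediction
            # Only misclassified samples (error != 0) change the weights.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def accuracy(w, b, samples):
    correct = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return correct / len(samples)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(inputs, [0, 0, 0, 1]))
XOR = list(zip(inputs, [0, 1, 1, 0]))

print(accuracy(*train_perceptron(AND), AND))  # → 1.0
print(accuracy(*train_perceptron(XOR), XOR))  # below 1.0: no straight line separates XOR
```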
This limitation actually caused research and development of neural networks to stagnate quite a bit. However, let’s fast-forward to 2015.
In 2015, we get the first public release of TensorFlow.
And thus ends a brief history of artificial neural networks.
TensorFlow was developed by Google as a system capable of building and training neural networks, which are represented as something called a data flow graph.
Which might look something like this. Each layer in this graph takes in a tensor, performs some operation on it, and returns a tensor.
Looking at this image, one might assume the answer to “What is a tensor?” is simply “You will never know,” but the simple answer is that a tensor is just an n-dimensional array of values.
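As a toy model (not TensorFlow's API), a tensor can be represented as nested Python lists, with its shape read off one dimension at a time, and a data flow graph as a chain of operations that each take a tensor and return a tensor. Real graphs can branch and merge; a straight chain keeps the sketch short.

```python
def shape(tensor):
    """Shape of a tensor stored as nested lists: () for a scalar, (n,) for a vector, ..."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)


def scale(tensor, factor):
    # Elementwise multiply, recursing through every dimension.
    if isinstance(tensor, list):
        return [scale(t, factor) for t in tensor]
    return tensor * factor


def relu(tensor):
    # Elementwise max(x, 0), a common neural-network operation.
    if isinstance(tensor, list):
        return [relu(t) for t in tensor]
    return max(tensor, 0)


def run_graph(nodes, tensor):
    # Each node takes a tensor in and hands a tensor on to the next node.
    for node in nodes:
        tensor = node(tensor)
    return tensor


print(shape(5), shape([1, 2, 3]), shape([[1, 2, 3], [4, 5, 6]]))  # → () (3,) (2, 3)
graph = [lambda t: scale(t, 2), relu]
print(run_graph(graph, [[1, -2], [-3, 4]]))  # → [[2, 0], [0, 8]]
```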
Now, this particular data flow graph represents a trained neural network called Inception.
Specifically, Inception V3.
Inception is a prebuilt data flow graph useful for categorizing images.
Specifically, it was designed to categorize images from ImageNet. <<read description>> You can also download whole sets of data from ImageNet, which you could use for training purposes yourself.
Let’s return to our problem: is it food? I mentioned at the start of my talk that I wanted to focus on a real-world use case.
At Cookpad where I work, we have this exact problem. We want to be able to tell which photos are actually photos of cooked recipes.
We want to do this for a number of reasons. First, we want to make sure the content we’re showing is what the user expects to see. Users love to break systems, and some love to be malicious.
We need to protect ourselves and our users from these kinds of abuses of the system.
So I want to recreate some of the functionality our system employs. When I started, I decided I was going to build a Rails app that could do this, but I made a few mistakes along the way.
I started by trying to use tensorflow.rb, but I wasn’t able to get it to build properly on my machine running OS X. My guess is that this is an issue with Clang, but I didn’t have any luck trying to compile it with GCC either. One of the suggestions given in the README is to use Docker.
After setting up the Docker image, I tried to compile a program that would let me retrain Inception V3 on my image set, but I couldn’t get that program to compile either. After many attempts, I ran into my favorite “Docker” issue.
So I’d like to talk about “my” road to success, with one small note:
So let’s start with installation. Getting everything set up can actually be pretty simple.
Use Python. <<speak about why Python/Ruby>>
The next step was to take Inception and figure out how to retrain it.
In order to retrain Inception, we want to do something called transfer learning. This is where we reuse most of what the neural network has already learned and retrain only the last layer.
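A toy illustration of that idea: the earlier layers are frozen and only the last layer's weights are learned. Here frozen_features is a hypothetical stand-in for Inception's frozen layers, the "images" are short lists of pixel intensities, and the trainable last layer is a single logistic-regression unit trained with gradient descent on the cross-entropy loss.

```python
import math


def frozen_features(image):
    """Hypothetical stand-in for the pretrained layers: a fixed function whose
    parameters are never updated during training."""
    brightness = sum(image) / len(image)
    contrast = max(image) - min(image)
    return [brightness, contrast]


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def train_last_layer(images, labels, epochs=500, lr=0.5):
    """Train only the output layer's weights on top of the frozen features."""
    feats = [frozen_features(img) for img in images]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Cross-entropy gradient step, applied to the last layer only.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
            b -= lr * (p - y)
    return w, b


def predict(w, b, image):
    x = frozen_features(image)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical data: label 1 for bright "images", 0 for dark ones.
images = [[0.9, 0.8, 1.0], [0.7, 0.9, 0.8], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
labels = [1, 1, 0, 0]
w, b = train_last_layer(images, labels)
print(predict(w, b, [0.8, 0.9, 1.0]) > 0.5)  # → True
print(predict(w, b, [0.0, 0.1, 0.2]) > 0.5)  # → False
```

The design point is that frozen_features never changes, so the expensive part of the network only has to run once per image.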
In order to do this, we need a bunch of images for each of the cases in our problem. This was easy for me because I could just reuse the images my company already used, but if you need to collect your own images, it could take a while. To be honest, this is probably the hardest part.
Your data set is extremely important. You not only need good examples for things that match your desired categories, but you also need good examples of things that aren’t. If all you’ve ever seen is food, then everything looks like food.
Each folder has between 1,000 and 2,000 images (except the text folder, which has around 600). The images can be small, since Inception will end up resizing them to 299x299 px before running any operations on them. If you have non-square images, you might want to do the resizing yourself to make sure the subject is fully in the picture.
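If you do the resizing yourself, one option that keeps the whole subject in frame is to scale the longer side down to 299 px and pad the shorter side (letterboxing) instead of cropping or stretching. This sketch only computes the target size and the padding; an image library such as Pillow would do the actual resampling. The function name and the padding order (left, top, right, bottom) are choices made for this example.

```python
def letterbox(width, height, target=299):
    """Fit a width x height image inside target x target without cropping or
    distorting it. Returns ((new_w, new_h), (left, top, right, bottom)) padding."""
    scale = target / max(width, height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = target - new_w
    pad_y = target - new_h
    # Split the padding evenly; an odd leftover pixel goes to the right/bottom.
    return (new_w, new_h), (pad_x // 2, pad_y // 2, pad_x - pad_x // 2, pad_y - pad_y // 2)


print(letterbox(1200, 800))  # → ((299, 199), (0, 50, 0, 50))
```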
I was never able to get the C++ program to work, but Python comes to the rescue again. I don’t want to go through the retrain.py script line by line, since it’s around 1,000 lines, but in general it downloads Inception, pops off the last layer, and retrains it on the images in our directory, using each folder’s name as the label for the images inside it.
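The folder-as-label convention can be sketched in a few lines. This is not code from retrain.py; it's a hypothetical helper showing how a directory tree with subfolders such as food/ and text/ maps to labels, assuming images end in .jpg, .jpeg, or .png.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def images_by_label(root):
    """Map each subfolder name (used as the label) to the sorted image files inside it."""
    labels = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            files = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
            if files:  # skip folders with no usable images
                labels[folder.name] = files
    return labels
```

Calling images_by_label on a hypothetical food_photos directory would return something like {"food": [...], "text": [...]}, one entry per non-empty subfolder.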
“Bottleneck” is an informal term for the output of the layer just before the final one, the layer we are retraining. Since training will look at each image many times, we want to cache these values. The output above is from running the retraining with the bottlenecks already calculated, so it was fast.
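Caching bottlenecks amounts to memoizing an expensive, deterministic computation per image. In this sketch, fake_bottleneck is a hypothetical stand-in for running an image through the frozen layers, and the cache lives in memory; writing the values to disk instead would let them survive between runs.

```python
import hashlib


class BottleneckCache:
    """Compute each image's bottleneck vector once and reuse it afterwards."""

    def __init__(self, compute):
        self.compute = compute
        self.values = {}
        self.misses = 0  # how many times the expensive computation actually ran

    def get(self, image_path):
        if image_path not in self.values:
            self.misses += 1
            self.values[image_path] = self.compute(image_path)
        return self.values[image_path]


def fake_bottleneck(image_path):
    # Hypothetical stand-in for running the image through the frozen layers.
    digest = hashlib.sha256(image_path.encode()).digest()
    return [byte / 255 for byte in digest[:4]]


cache = BottleneckCache(fake_bottleneck)
training_images = ["food/1.jpg", "food/2.jpg", "text/1.jpg"]
for epoch in range(100):  # training revisits every image many times
    for path in training_images:
        vector = cache.get(path)
print(cache.misses)  # → 3
```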
The retrain script splits your data three ways: training, validation, and testing (70:20:10 here). Training data is what you tune your model on. Testing data is a final set your model never sees during training, used to measure accuracy. Validation data is used to avoid overfitting, making sure improvements in training accuracy actually show up on an unseen dataset. Cross entropy is your loss metric: it’s the quantity the model is trying to minimize, rather than focusing directly on accuracy. That’s why it’s not a percentage.
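Two of those ideas in a short sketch. The split below is assigned by hashing each file name, one way (assumed here, not necessarily what the script does) to keep an image in the same set across runs, with percentages matching the 70:20:10 ratio. The cross-entropy function shows why the loss isn't a percentage: it's the negative log of the probability given to the correct class, so it runs from 0 upward without a fixed ceiling.

```python
import hashlib
import math


def assign_split(file_name, validation_pct=20, testing_pct=10):
    """Deterministically place a file in 'training', 'validation' or 'testing'
    by hashing its name, so the same image always lands in the same set."""
    bucket = int(hashlib.sha1(file_name.encode()).hexdigest(), 16) % 100
    if bucket < validation_pct:
        return "validation"
    if bucket < validation_pct + testing_pct:
        return "testing"
    return "training"


def cross_entropy(predicted, true_index):
    """Loss for one example: -log of the probability assigned to the correct class."""
    return -math.log(predicted[true_index])


# A confident correct prediction has low loss; a confident wrong one has high loss.
print(round(cross_entropy([0.9, 0.05, 0.05], 0), 3))  # → 0.105
print(round(cross_entropy([0.05, 0.9, 0.05], 0), 3))  # → 2.996
```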
AFTER: There is a label_image application written in C++, but I had no luck with that one
until with tf.Session()
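Whichever labeling program you end up using, its last step is turning the retrained layer's output into ranked labels. The sketch below assumes raw scores (logits) and applies softmax before ranking; if your graph already ends in a softmax, skip that step and sort the probabilities directly. The scores and label list are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; the results sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_labels(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical raw scores from the retrained final layer for one photo.
labels = ["food", "not food", "text"]
for label, p in top_labels([2.0, 0.5, -1.0], labels):
    print(f"{label}: {p:.3f}")
```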
And that’s the end. Hopefully that was some food for thought for everyone.