Fizz Buzz in TensorFlow
Joel Grus
Research Engineer, AI2
@joelgrus
About me
Research engineer at AI2
we're hiring!
(in Seattle)
(where normal people can afford to buy a house)
(sort of)
Previously SWE at Google, data science at
VoloMetrix, Decide, Farecast/Microsoft
Wrote a book
Fizz Buzz, in case you're not familiar
Write a program that prints the numbers 1 to 100, except that
if the number is divisible by 3, instead print "fizz"
if the number is divisible by 5, instead print "buzz"
if the number is divisible by 15, instead print "fizzbuzz"
weed-out problem
the backstory
Saw an online discussion about the stupidest way to solve fizz buzz
Thought, "I bet I can come up with a stupider way"
Came up with a stupider way
Blog post went viral
Sort of a frivolous thing to use up my 15 minutes of fame on, but so be it
super simple solution
haskell
fizzBuzz :: Integer -> String
fizzBuzz i
  | i `mod` 15 == 0 = "fizzbuzz"
  | i `mod` 5  == 0 = "buzz"
  | i `mod` 3  == 0 = "fizz"
  | otherwise       = show i

main :: IO ()
main = mapM_ (putStrLn . fizzBuzz) [1..100]
ok, then python
def fizz_buzz(i):
    if i % 15 == 0: return "fizzbuzz"
    elif i % 5 == 0: return "buzz"
    elif i % 3 == 0: return "fizz"
    else: return str(i)

for i in range(1, 101):
    print(fizz_buzz(i))
taking on fizz buzz as a machine learning problem
outputs
given a number, there are four mutually exclusive cases
1. output the number itself
2. output "fizz"
3. output "buzz"
4. output "fizzbuzz"
so one natural representation of the output is a vector of length 4
representing the predicted probability of each case
ground truth
def fizz_buzz_encode(i):
    if i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5 == 0: return np.array([0, 0, 1, 0])
    elif i % 3 == 0: return np.array([0, 1, 0, 0])
    else: return np.array([1, 0, 0, 0])
feature selection - cheating clever
import numpy as np

def x(i):
    # features: a bias term, "divisible by 3", and "divisible by 5"
    return np.array([1, i % 3 == 0, i % 5 == 0])

def predict(x):
    # hand-picked weights: argmax selects the correct one of the four classes
    return np.dot(x, np.array([[ 1,  0,  0, -1],
                               [-1,  1, -1,  1],
                               [-1, -1,  1,  1]]))

for i in range(1, 101):
    prediction = np.argmax(predict(x(i)))
    print([i, "fizz", "buzz", "fizzbuzz"][prediction])
It's hard to imagine an interviewer who wouldn't be impressed by even this simple solution.
feature selection - cheating clever

                      divisible by 3    not divisible by 3
divisible by 5        "fizzbuzz"        "buzz"
not divisible by 5    "fizz"            the number
what if we aren't that clever?
binary encoding, say 10 digits (up to 1023)
1 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
3 -> [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
and so on
in comments, someone suggested one-hot decimal encoding the digits, say up to 999
315 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
and so on
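The two encodings above are easy to sketch in numpy (a sketch, not the talk's exact code; binary is least-significant-bit first as in the examples, and `decimal_encode` is a hypothetical name for the one-hot decimal scheme):

```python
import numpy as np

def binary_encode(i, num_digits=10):
    # least-significant bit first: 3 -> [1, 1, 0, ...]
    return np.array([i >> d & 1 for d in range(num_digits)])

def decimal_encode(i, num_digits=3):
    # one block of 10 per digit, most-significant digit first:
    # 315 -> 1s at positions 3, 11, 25
    out = np.zeros(10 * num_digits, dtype=int)
    for pos, d in enumerate(str(i).zfill(num_digits)):
        out[10 * pos + int(d)] = 1
    return out
```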
training data
need to generate fizz buzz for 1 to 100, so don't want to train on those
binary: train on 101 - 1023
one-hot decimal digits: train on 101 - 999
then use 1 to 100 as the test data
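Under those choices, generating the train/test split might look like this (a sketch; `binary_encode` is restated so the snippet is self-contained, and `fizz_buzz_encode` is the function from the earlier slide):

```python
import numpy as np

def binary_encode(i, num_digits=10):
    # least-significant bit first
    return np.array([i >> d & 1 for d in range(num_digits)])

def fizz_buzz_encode(i):
    if i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5 == 0: return np.array([0, 0, 1, 0])
    elif i % 3 == 0: return np.array([0, 1, 0, 0])
    else: return np.array([1, 0, 0, 0])

# train on 101..1023: everything 10 binary digits can represent, minus the test range
trX = np.array([binary_encode(i) for i in range(101, 1024)])
trY = np.array([fizz_buzz_encode(i) for i in range(101, 1024)])

# test on 1..100, the numbers we actually have to print
teX = np.array([binary_encode(i) for i in range(1, 101)])
```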
tensorflow in one slide
import numpy as np
import tensorflow as tf

X = tf.placeholder("float", [None, input_dim])
Y = tf.placeholder("float", [None, output_dim])

beta = tf.Variable(tf.random_normal(beta_shape, stddev=0.01))

def model(X, beta):
    # some function of X and beta
    ...

p_yx = model(X, beta)
cost = some_cost_function(p_yx, Y)
train_op = tf.train.SomeOptimizer().minimize(cost)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(num_epochs):
        sess.run(train_op, feed_dict={X: trX, Y: trY})
(slide annotations, in order: standard imports; placeholders for our data; parameters to learn; some parametric model, applied to the symbolic variables; train by minimizing some cost function; create session and initialize variables; train using data. This is the extent of what I know about TensorFlow.)
Visualizing the results (a hard problem by itself)

black + red = predictions
black + tan = actuals

(figure: outputs for 1 to 100, with examples labeled: correct "11", correct "fizz", incorrect "buzz" where the actual is "fizzbuzz", predicted "fizz" where the actual is "buzz")

[[30, 11, 6, 2],
 [12, 8, 4, 1],
 [ 4, 3, 2, 3],
 [ 4, 2, 0, 0]]
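The 4x4 grids on these slides are counts of (actual, predicted) class pairs; tallying one is straightforward (a sketch, not the author's plotting code; `fizz_buzz_class` and `confusion` are hypothetical helpers):

```python
import numpy as np

def fizz_buzz_class(i):
    # 0 = the number itself, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
    return 3 if i % 15 == 0 else 2 if i % 5 == 0 else 1 if i % 3 == 0 else 0

def confusion(actual, predicted, num_classes=4):
    # m[a, p] counts how often actual class a was predicted as class p
    m = np.zeros((num_classes, num_classes), dtype=int)
    for a, p in zip(actual, predicted):
        m[a, p] += 1
    return m

# a perfect model's matrix is diagonal: 53 numbers, 27 fizz, 14 buzz, 6 fizzbuzz
truth = [fizz_buzz_class(i) for i in range(1, 101)]
perfect = confusion(truth, truth)
```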
linear regression
def model(X, w, b):
    return tf.matmul(X, w) + b

py_x = model(data.X, w, b)
cost = tf.reduce_mean(tf.pow(py_x - data.Y, 2))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

binary:
[[54, 27, 14, 6],
 [ 0, 0, 0, 0],
 [ 0, 0, 0, 0],
 [ 0, 0, 0, 0]]

decimal:
[[54, 27, 0, 0],
 [ 0, 0, 0, 0],
 [ 0, 0, 14, 6],
 [ 0, 0, 0, 0]]
logistic regression
def model(X, w, b):
    return tf.matmul(X, w) + b

py_x = model(data.X, w, b)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, data.Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

binary:
[[54, 27, 14, 6],
 [ 0, 0, 0, 0],
 [ 0, 0, 0, 0],
 [ 0, 0, 0, 0]]

decimal:
[[54, 27, 0, 0],
 [ 0, 0, 0, 0],
 [ 0, 0, 14, 6],
 [ 0, 0, 0, 0]]
multilayer perceptron
def model(X, w_h, w_o, b_h, b_o):
    h = tf.nn.relu(tf.matmul(X, w_h) + b_h)  # 1 hidden layer with ReLU activation
    return tf.matmul(h, w_o) + b_o

py_x = model(data.X, w_h, w_o, b_h, b_o)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, data.Y))
train_op = tf.train.RMSPropOptimizer(learning_rate=0.0003,
                                     decay=0.8,
                                     momentum=0.4).minimize(cost)
from here on, no more decimal encoding: it's really good at "divisible by 5" and really bad at "divisible by 3"

by # of hidden units (after 1000s of epochs)
hidden unit counts tried: 5, 10, 25, 50, 100, 200

[[52, 2, 1, 0],
 [ 0, 25, 0, 0],
 [ 1, 0, 13, 0],
 [ 0, 0, 0, 6]]

[[45, 16, 3, 0],
 [ 8, 11, 1, 0],
 [ 0, 0, 10, 0],
 [ 0, 0, 0, 6]]
deep learning
def model(X, w_h1, w_h2, w_o, b_h1, b_h2, b_o, keep_prob):
    h1 = tf.nn.dropout(tf.nn.relu(tf.matmul(X, w_h1) + b_h1), keep_prob)
    h2 = tf.nn.relu(tf.matmul(h1, w_h2) + b_h2)
    return tf.matmul(h2, w_o) + b_o

def py_x(keep_prob):
    return model(data.X, w_h1, w_h2, w_o, b_h1, b_h2, b_o, keep_prob)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x(keep_prob=0.5),
                                                              data.Y))
train_op = tf.train.RMSPropOptimizer(learning_rate=0.0003, decay=0.8,
                                     momentum=0.4).minimize(cost)
predict_op = tf.argmax(py_x(keep_prob=1.0), 1)
HIDDEN LAYERS (50% dropout in 1st hidden layer)

[100, 100]: will sometimes get it 100% right, but not reliably
[2000, 2000]: seems to get it exactly right every time (in ~200 epochs)
But how does it work?
the 25-hidden-neuron shallow net was the simplest interesting model
in particular, it gets all the "divisible by 15" cases exactly right
not obvious to me how to learn "divisible by 15" from binary inputs
[[45, 16, 3, 0],
 [ 8, 11, 1, 0],
 [ 0, 0, 10, 0],
 [ 0, 0, 0, 6]]
which inputs produce the largest "fizz buzz" values?
(120, array([ -4.51552565, -11.66495565, -17.10086776, 0.32237191])),
(240, array([ -5.04136949, -12.02974626, -17.35017639, 0.07112655])),
(90, array([ -4.52364648, -11.48799399, -16.91179542, -0.20747044])),
(465, array([ -4.95231711, -11.88604214, -17.5155363 , -0.34996536])),
(210, array([ -5.04364677, -11.85627498, -17.17183826, -0.4049097 ])),
(720, array([ -4.98066528, -11.68684173, -17.01117473, -0.46671827])),
(345, array([ -4.49738021, -11.34621705, -16.88004503, -0.4713167 ])),
(600, array([ -4.48999048, -11.30909995, -16.70980522, -0.53889132])),
(360, array([ -9.32991992, -15.18924931, -17.8993147 , -4.35817601])),
(480, array([ -9.79430086, -15.72038142, -18.51560547, -4.38727747])),
(450, array([ -9.80194752, -15.54985676, -18.32664509, -4.89815184])),
(330, array([ -9.34660544, -15.01537882, -17.69651957, -4.95658813])),
(960, array([ -9.74109305, -15.37921101, -18.16552369, -4.95677615])),
(840, array([ -9.31266483, -14.83212949, -17.49181923, -5.26606825])),
(105, array([ -8.73320381, -11.08279653, -9.31921242, -5.52620068])),
(225, array([ -9.22702329, -11.50045288, -9.64725618, -5.76014854])),
(585, array([ -8.62907369, -10.84616688, -9.23592859, -5.79517941])),
(705, array([ -9.12030976, -11.2651869 , -9.56738927, -6.02974533])),
the last column only needs to be larger than the other columns, but in this case it works out -- these are all divisible by 15

notice that they cluster into similar outputs

notice also that we have pairs of numbers that differ by 120
a stray observation
If two numbers differ by a multiple of 15, they have the same fizz buzz output.
If a network could ignore differences that are multiples of 15 (or 30, or 45, and so on), that could be a good start.
Then it only has to learn the correct output for each equivalence class.
And very few of those equivalence classes are "fizz buzz".
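That observation is easy to check numerically (a quick sketch; `fizz_buzz_class` is a hypothetical helper, not from the talk):

```python
def fizz_buzz_class(i):
    # 0 = the number itself, 1 = "fizz", 2 = "buzz", 3 = "fizzbuzz"
    return 3 if i % 15 == 0 else 2 if i % 5 == 0 else 1 if i % 3 == 0 else 0

# numbers that differ by a multiple of 15 always land in the same class
assert all(fizz_buzz_class(i) == fizz_buzz_class(i + 15) for i in range(1, 1009))

# so there are exactly 15 equivalence classes, and only one is "fizzbuzz"
one_period = [fizz_buzz_class(i) for i in range(15, 30)]
```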
two-bit SWAPS that are congruent mod 15

-8 +128 = +120
120 [0 0 0 1 1 1 1 0 0 0]
240 [0 0 0 0 1 1 1 1 0 0]

+2 -32 = -30 (from 120/240)
90  [0 1 0 1 1 0 1 0 0 0]
210 [0 1 0 0 1 0 1 1 0 0]

-32 +512 = +480 (from 120/240)
600 [0 0 0 1 1 0 1 0 0 1]
720 [0 0 0 0 1 0 1 1 0 1]

+1 -256 = -255 (from 600/720)
345 [1 0 0 1 1 0 1 0 1 0]
465 [1 0 0 0 1 0 1 1 1 0]

-8 +128 = +120
360 [0 0 0 1 0 1 1 0 1 0]
480 [0 0 0 0 0 1 1 1 1 0]
330 [0 1 0 1 0 0 1 0 1 0]
450 [0 1 0 0 0 0 1 1 1 0]
840 [0 0 0 1 0 0 1 0 1 1]
960 [0 0 0 0 0 0 1 1 1 1]
105 [1 0 0 1 0 1 1 0 0 0]
225 [1 0 0 0 0 1 1 1 0 0]

-32 +512 = +480
585 [1 0 0 1 0 0 1 0 0 1]
705 [1 0 0 0 0 0 1 1 0 1]
any neuron with the same weight on those two inputs will produce the same outcome if they're swapped

if you want to drive yourself mad, spend a few hours staring at the neuron weights themselves!
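Why do these particular swaps preserve the fizz buzz class? Powers of 2 repeat mod 15 with period 4 (2^4 = 16 ≡ 1 mod 15), so two bit positions that differ by a multiple of 4 contribute the same amount mod 15, and swapping those bits leaves the residue mod 15 unchanged. A quick check (a sketch; `swap_bits` is a hypothetical helper, not from the talk):

```python
def swap_bits(i, a, b):
    # exchange bits a and b of i
    ba, bb = (i >> a) & 1, (i >> b) & 1
    i &= ~((1 << a) | (1 << b))
    return i | (ba << b) | (bb << a)

# 2**7 = 128 ≡ 8 = 2**3 (mod 15), so the "-8 +128" swap is a no-op mod 15
assert (2**7 - 2**3) % 15 == 0
assert swap_bits(120, 3, 7) == 240
assert swap_bits(120, 3, 7) % 15 == 120 % 15
```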
lessons learned
It's hard to turn a joke blog post into a talk
Feature selection is important (we already knew that)
Stupid problems sometimes contain really interesting subtleties
Sometimes "black box" models actually reveal those subtleties if you look at them the right way

sorry for not being just a joke talk!
thanks!
code: github.com/joelgrus
blog: joelgrus.com
twitter: @joelgrus
(will tweet out link to slides, so go follow!)
book: (might add a chapter about slides, so go buy just in case!)

WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Fizz buzz in tensorflow using neural networks

  • 1. Fizz buzz in tensorflow Joel Grus Research Engineer, AI2 @joelgrus
  • 3. About me

Research engineer at AI2 -- we're hiring! (in Seattle) (where normal people can afford to buy a house) (sort of)
Previously SWE at Google, data science at VoloMetrix, Decide, Farecast/Microsoft
Wrote a book ------->
  • 4. Fizz Buzz, in case you're not familiar

Write a program that prints the numbers 1 to 100, except that
if the number is divisible by 3, instead print "fizz"
if the number is divisible by 5, instead print "buzz"
if the number is divisible by 15, instead print "fizzbuzz"
  • 7. the backstory

Saw an online discussion about the stupidest way to solve fizz buzz
Thought, "I bet I can come up with a stupider way"
Came up with a stupider way
Blog post went viral
Sort of a frivolous thing to use up my 15 minutes of fame on, but so be it
  • 10. super simple solution

haskell:

    fizzBuzz :: Integer -> String
    fizzBuzz i
      | i `mod` 15 == 0 = "fizzbuzz"
      | i `mod` 5  == 0 = "buzz"
      | i `mod` 3  == 0 = "fizz"
      | otherwise       = show i

    mapM_ (putStrLn . fizzBuzz) [1..100]

ok, then python:

    def fizz_buzz(i):
        if i % 15 == 0: return "fizzbuzz"
        elif i % 5 == 0: return "buzz"
        elif i % 3 == 0: return "fizz"
        else: return str(i)

    for i in range(1, 101):
        print(fizz_buzz(i))
  • 11. taking on fizz buzz as a machine learning problem
  • 12. outputs

given a number, there are four mutually exclusive cases:
1. output the number itself
2. output "fizz"
3. output "buzz"
4. output "fizzbuzz"

so one natural representation of the output is a vector of length 4 representing the predicted probability of each case
  • 13. ground truth

    def fizz_buzz_encode(i):
        if i % 15 == 0: return np.array([0, 0, 0, 1])
        elif i % 5 == 0: return np.array([0, 0, 1, 0])
        elif i % 3 == 0: return np.array([0, 1, 0, 0])
        else: return np.array([1, 0, 0, 0])
  • 16. feature selection - cheating clever

    def x(i):
        return np.array([1, i % 3 == 0, i % 5 == 0])

    def predict(x):
        return np.dot(x, np.array([[ 1,  0,  0, -1],
                                   [-1,  1, -1,  1],
                                   [-1, -1,  1,  1]]))

    for i in range(1, 101):
        prediction = np.argmax(predict(x(i)))
        print([i, "fizz", "buzz", "fizzbuzz"][prediction])

It's hard to imagine an interviewer who wouldn't be impressed by even this simple solution.
  • 17. feature selection - cheating clever

(diagram: the four quadrants of divisible / not divisible by 3 against divisible / not divisible by 5)
  • 18. what if we aren't that clever?

binary encoding, say 10 digits (up to 1023):

    1 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    2 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    3 -> [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

and so on

in comments, someone suggested one-hot encoding the decimal digits, say up to 999:

    315 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
            0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

and so on
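The binary encoding above can be sketched as a small helper (least-significant bit first, matching the slide's examples; `binary_encode` is the name used here for illustration):

```python
def binary_encode(i, num_digits=10):
    """Encode i as num_digits bits, least-significant bit first."""
    return [(i >> d) & 1 for d in range(num_digits)]

binary_encode(3)  # first two bits set, since 3 = 1 + 2
```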
  • 19. training data

need to generate fizz buzz for 1 to 100, so don't want to train on those
binary: train on 101 - 1023
one-hot decimal digits: train on 101 - 999
then use 1 to 100 as the test data
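Building the binary training set described above might look like this (a minimal sketch combining the slide's `fizz_buzz_encode` with a bit-level encoder):

```python
import numpy as np

def fizz_buzz_encode(i):
    # one-hot over the four cases: number / fizz / buzz / fizzbuzz
    if i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5 == 0: return np.array([0, 0, 1, 0])
    elif i % 3 == 0: return np.array([0, 1, 0, 0])
    else: return np.array([1, 0, 0, 0])

def binary_encode(i, num_digits=10):
    return np.array([(i >> d) & 1 for d in range(num_digits)])

# train on 101..1023 so the test range 1..100 is never seen during training
trX = np.array([binary_encode(i) for i in range(101, 1024)])
trY = np.array([fizz_buzz_encode(i) for i in range(101, 1024)])
```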
  • 20. tensorflow in one slide (the extent of what I know about it)

    # standard imports
    import numpy as np
    import tensorflow as tf

    # placeholders for our data
    X = tf.placeholder("float", [None, input_dim])
    Y = tf.placeholder("float", [None, output_dim])

    # parameters to learn
    beta = tf.Variable(tf.random_normal(beta_shape, stddev=0.01))

    # some parametric model applied to the symbolic variables
    def model(X, beta):
        # some function of X and beta

    p_yx = model(X, beta)

    # train by minimizing some cost function
    cost = some_cost_function(p_yx, Y)
    train_op = tf.train.SomeOptimizer.minimize(cost)

    # create session and initialize variables, then train using data
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        for _ in range(num_epochs):
            sess.run(train_op, feed_dict={X: trX, Y: trY})
  • 21. Visualizing the results (a hard problem by itself)

(plot of 1 to 100: black + red = predictions, black + tan = actuals; callouts include a
correct "11", a correct "fizz", an incorrect "buzz" where the actual is "fizzbuzz",
and a predicted "fizz" where the actual is "buzz")

    [[30, 11,  6,  2],
     [12,  8,  4,  1],
     [ 4,  3,  2,  3],
     [ 4,  2,  0,  0]]
  • 22. linear regression

    def model(X, w, b):
        return tf.matmul(X, w) + b

    py_x = model(data.X, w, b)
    cost = tf.reduce_mean(tf.pow(py_x - data.Y, 2))
    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

binary:                    decimal:

    [[54, 27, 14,  6],         [[54, 27,  0,  0],
     [ 0,  0,  0,  0],          [ 0,  0,  0,  0],
     [ 0,  0,  0,  0],          [ 0,  0, 14,  6],
     [ 0,  0,  0,  0]]          [ 0,  0,  0,  0]]

black + red = predictions, black + tan = actuals
  • 23. logistic regression

    def model(X, w, b):
        return tf.matmul(X, w) + b

    py_x = model(data.X, w, b)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, data.Y))
    train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)

binary:                    decimal:

    [[54, 27, 14,  6],         [[54, 27,  0,  0],
     [ 0,  0,  0,  0],          [ 0,  0,  0,  0],
     [ 0,  0,  0,  0],          [ 0,  0, 14,  6],
     [ 0,  0,  0,  0]]          [ 0,  0,  0,  0]]

black + red = predictions, black + tan = actuals
  • 24. multilayer perceptron

    def model(X, w_h, w_o, b_h, b_o):
        # 1 hidden layer with ReLU activation
        h = tf.nn.relu(tf.matmul(X, w_h) + b_h)
        return tf.matmul(h, w_o) + b_o

    py_x = model(data.X, w_h, w_o, b_h, b_o)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(py_x, data.Y))
    train_op = tf.train.RMSPropOptimizer(learning_rate=0.0003,
                                         decay=0.8, momentum=0.4).minimize(cost)

from here on, no more decimal encoding, it's really good at "divisible by 5" and really bad at
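For reference, the same one-hidden-layer forward pass sketched in plain NumPy (the sizes and the untrained random weights are illustrative, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative shapes: 10-bit binary input, 100 hidden units, 4 output classes
w_h = rng.normal(scale=0.01, size=(10, 100))
b_h = np.zeros(100)
w_o = rng.normal(scale=0.01, size=(100, 4))
b_o = np.zeros(4)

def model(X, w_h, w_o, b_h, b_o):
    h = np.maximum(0, X @ w_h + b_h)   # ReLU hidden layer
    return h @ w_o + b_o               # logits, one per class

# binary-encode a few inputs and run them through the net
X = np.array([[(i >> d) & 1 for d in range(10)] for i in range(1, 5)])
logits = model(X, w_h, w_o, b_h, b_o)
```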
  • 25. by # of hidden units (after 1000's of epochs): 5, 10, 25, 50, 100, 200

two of the resulting matrices:

    [[52,  2,  1,  0],         [[45, 16,  3,  0],
     [ 0, 25,  0,  0],          [ 8, 11,  1,  0],
     [ 1,  0, 13,  0],          [ 0,  0, 10,  0],
     [ 0,  0,  0,  6]]          [ 0,  0,  0,  6]]

black + red = predictions, black + tan = actuals
  • 26. deep learning

    def model(X, w_h1, w_h2, w_o, b_h1, b_h2, b_o, keep_prob):
        h1 = tf.nn.dropout(tf.nn.relu(tf.matmul(X, w_h1) + b_h1), keep_prob)
        h2 = tf.nn.relu(tf.matmul(h1, w_h2) + b_h2)
        return tf.matmul(h2, w_o) + b_o

    def py_x(keep_prob):
        return model(data.X, w_h1, w_h2, w_o, b_h1, b_h2, b_o, keep_prob)

    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(py_x(keep_prob=0.5), data.Y))
    train_op = tf.train.RMSPropOptimizer(learning_rate=0.0003,
                                         decay=0.8, momentum=0.4).minimize(cost)
    predict_op = tf.argmax(py_x(keep_prob=1.0), 1)
  • 27. HIDDEN LAYERS (50% dropout in 1st hidden layer)

[100, 100] will sometimes get it 100% right, but not reliably
[2000, 2000] seems to get it exactly right every time (in ~200 epochs)

black + red = predictions, black + tan = actuals
  • 28. But how does it work?

The 25-hidden-neuron shallow net was the simplest interesting model.
In particular, it gets all the "divisible by 15" cases exactly right.
Not obvious to me how to learn "divisible by 15" from binary.

    [[45, 16,  3,  0],
     [ 8, 11,  1,  0],
     [ 0,  0, 10,  0],
     [ 0,  0,  0,  6]]

black + red = predictions, black + tan = actuals
  • 29. which inputs produce largest "fizz buzz" values?

    (120, array([ -4.51552565, -11.66495565, -17.10086776,   0.32237191])),
    (240, array([ -5.04136949, -12.02974626, -17.35017639,   0.07112655])),
    ( 90, array([ -4.52364648, -11.48799399, -16.91179542,  -0.20747044])),
    (465, array([ -4.95231711, -11.88604214, -17.5155363 ,  -0.34996536])),
    (210, array([ -5.04364677, -11.85627498, -17.17183826,  -0.4049097 ])),
    (720, array([ -4.98066528, -11.68684173, -17.01117473,  -0.46671827])),
    (345, array([ -4.49738021, -11.34621705, -16.88004503,  -0.4713167 ])),
    (600, array([ -4.48999048, -11.30909995, -16.70980522,  -0.53889132])),
    (360, array([ -9.32991992, -15.18924931, -17.8993147 ,  -4.35817601])),
    (480, array([ -9.79430086, -15.72038142, -18.51560547,  -4.38727747])),
    (450, array([ -9.80194752, -15.54985676, -18.32664509,  -4.89815184])),
    (330, array([ -9.34660544, -15.01537882, -17.69651957,  -4.95658813])),
    (960, array([ -9.74109305, -15.37921101, -18.16552369,  -4.95677615])),
    (840, array([ -9.31266483, -14.83212949, -17.49181923,  -5.26606825])),
    (105, array([ -8.73320381, -11.08279653,  -9.31921242,  -5.52620068])),
    (225, array([ -9.22702329, -11.50045288,  -9.64725618,  -5.76014854])),
    (585, array([ -8.62907369, -10.84616688,  -9.23592859,  -5.79517941])),
    (705, array([ -9.12030976, -11.2651869 ,  -9.56738927,  -6.02974533])),

The last column only needs to be larger than the other columns, but in this case it works out: these are all divisible by 15.
Notice that they cluster into similar outputs.
Notice also that we have pairs of numbers that differ by 120.
  • 30. a stray observation

If two numbers differ by a multiple of 15, they have the same fizz buzz output.
If a network could ignore differences that are multiples of 15 (or 30, or 45, and so on), that could be a good start.
Then it only has to learn the correct output for each equivalence class.
Very few "fizz buzz" equivalence classes.
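The observation checks out directly (`fizz_buzz_class` is an illustrative helper that returns the output category rather than the printed string):

```python
def fizz_buzz_class(i):
    """Which of the four output cases applies to i."""
    if i % 15 == 0: return "fizzbuzz"
    if i % 5 == 0: return "buzz"
    if i % 3 == 0: return "fizz"
    return "number"

# numbers that differ by a multiple of 15 always land in the same category
assert all(fizz_buzz_class(i) == fizz_buzz_class(i + 15 * k)
           for i in range(1, 101) for k in range(1, 5))
```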
  • 31. two-bit SWAPS that are congruent mod 15

-8 +128 = +120

    120 [0 0 0 1 1 1 1 0 0 0]
    240 [0 0 0 0 1 1 1 1 0 0]

+2 -32 = -30 (from 120/240)

     90 [0 1 0 1 1 0 1 0 0 0]
    210 [0 1 0 0 1 0 1 1 0 0]

-32 +512 = +480 (from 120/240)

    600 [0 0 0 1 1 0 1 0 0 1]
    720 [0 0 0 0 1 0 1 1 0 1]

+1 -256 = -255 (from 600/720)

    345 [1 0 0 1 1 0 1 0 1 0]
    465 [1 0 0 0 1 0 1 1 1 0]
  • 32. two-bit SWAPS that are congruent mod 15 (the previous pairs, plus:)

-8 +128

    360 [0 0 0 1 0 1 1 0 1 0]
    480 [0 0 0 0 0 1 1 1 1 0]

    330 [0 1 0 1 0 0 1 0 1 0]
    450 [0 1 0 0 0 0 1 1 1 0]

    840 [0 0 0 1 0 0 1 0 1 1]
    960 [0 0 0 0 0 0 1 1 1 1]
  • 33. two-bit SWAPS that are congruent mod 15 (the previous pairs, plus:)

    105 [1 0 0 1 0 1 1 0 0 0]
    225 [1 0 0 0 0 1 1 1 0 0]

-32 +512

    585 [1 0 0 1 0 0 1 0 0 1]
    705 [1 0 0 0 0 0 1 1 0 1]

Any neuron with the same weight on those two inputs will produce the same outcome if they're swapped.
If you want to drive yourself mad, spend a few hours staring at the neuron weights themselves!
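Why these two-bit swaps preserve fizz buzz: since 2**4 = 16 ≡ 1 (mod 15), bit d and bit d+4 carry the same weight mod 15, so moving a set bit four positions changes the number by a multiple of 15. A quick check against the slides' pairs:

```python
# bit d and bit d+4 are interchangeable mod 15, because 16 ≡ 1 (mod 15)
assert all((2 ** (d + 4) - 2 ** d) % 15 == 0 for d in range(6))

# every pair from the slides differs by a multiple of 15 (in fact, by 120)
pairs = [(120, 240), (90, 210), (600, 720), (345, 465),
         (360, 480), (330, 450), (840, 960), (105, 225), (585, 705)]
assert all((b - a) % 15 == 0 for a, b in pairs)
```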
  • 34. lessons learned

It's hard to turn a joke blog post into a talk.
Feature selection is important (we already knew that).
Stupid problems sometimes contain really interesting subtleties.
Sometimes "black box" models actually reveal those subtleties if you look at them the right way.
  • 35. sorry for not being just a joke talk!
  • 36. thanks! code: github.com/joelgrus blog: joelgrus.com twitter: @joelgrus (will tweet out link to slides, so go follow!) book: ---------------------------> (might add a chapter about slides, so go buy just in case!)