3. Who Are We?
• Jon "Natty" Natkins (@nattyice)
• Field Engineer at WibiData
• Before that, Cloudera Software Engineer
• Before that, Vertica Software/Field Engineer
• Juliet Hougland (@JulietHougland)
• Platform Engineer at WibiData
• MS in Applied Math and BA in Math-Physics
4. What is Kiji?
The Kiji Project is a modular, open-source framework that enables developers and analysts to collect, analyze and use data in real-time applications.
• kiji.org
• github.com/kijiproject
7. Modeling with KijiMR
Producers
• Operate on a single row in a table.
• Generate derived data:
o Apply a classifier
o Assign a user to a cluster or segment
o Recommend new items
Gatherers
• Mapper with KijiTable input.
• Used when training models.
14. Fresheners Compute Lazily
[Diagram: a client reads a column through the KijiScoring API, which gets the value from HBase. A Freshness Policy asks "Fresh?"; if yes, the value is returned to the client.]
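15. Fresheners Compute Lazily
[Diagram: a client reads a column through the KijiScoring API, which gets the value from HBase. A Freshness Policy asks "Fresh?"; if yes, the value is returned to the client. If not, a Producer freshens the value, and the result is cached for next time.]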
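The lazy read path in the diagram can be sketched in plain Python. This is an illustration only and does not use the KijiScoring API: the `FreshenedColumn` class, the dict standing in for HBase, and the age-based freshness rule (`max_age_seconds`) are all assumptions made for the example.

```python
import time


class FreshenedColumn:
    """Toy stand-in for one column read through a freshening layer."""

    def __init__(self, store, producer, max_age_seconds, clock=time.time):
        self.store = store            # dict standing in for HBase: key -> (value, written_at)
        self.producer = producer      # callable that recomputes the value for a key
        self.max_age_seconds = max_age_seconds  # assumed policy: a value is fresh while young enough
        self.clock = clock

    def is_fresh(self, written_at):
        """Assumed freshness policy: compare the cell's age to a fixed limit."""
        return self.clock() - written_at <= self.max_age_seconds

    def read(self, key):
        cell = self.store.get(key)
        if cell is not None and self.is_fresh(cell[1]):
            return cell[0]            # fresh: return the stored value to the client
        value = self.producer(key)    # stale or missing: run the producer now
        self.store[key] = (value, self.clock())  # cache for next time
        return value
```

The producer only runs when a client actually reads a stale or missing value, so rows that nobody reads never cost any computation.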
16. How can we make "freshenable" models?
Population interests change slowly
Individual interests change quickly
17. How can we make "freshenable" models?
Population interests change slowly
Individual interests change quickly
Models don't need to be retrained frequently
Application of a model should be fast
18. How can we make "freshenable" models?
Individual interests change quickly
Application of a model should be fast
• Train a model over your entire data set
• Save fitted model parameters to a file, or another table
• Access the model parameters through a KeyValueStore when scoring new data with a producer.
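The three steps above can be sketched as follows. This is plain Python, not KijiMR: the "model" (mean rating per item), the JSON file, and the function names `train`, `save_params`, `open_store`, and `score_row` are hypothetical choices for illustration, with a loaded dict standing in for a KeyValueStore.

```python
import json


def train(rows):
    """Batch step: fit parameters over the entire data set (here, mean rating per item)."""
    totals = {}
    for row in rows:
        rating_sum, count = totals.get(row["item"], (0.0, 0))
        totals[row["item"]] = (rating_sum + row["rating"], count + 1)
    return {item: rating_sum / count for item, (rating_sum, count) in totals.items()}


def save_params(params, path):
    """Persist the fitted parameters to a file."""
    with open(path, "w") as f:
        json.dump(params, f)


def open_store(path):
    """Stand-in for opening a KeyValueStore: load parameters for key lookups."""
    with open(path) as f:
        return json.load(f)


def score_row(row, store, default=0.0):
    """Producer-like step: score one row using only lookups into the store."""
    return {item: store.get(item, default) for item in row["candidates"]}
```

Training touches every row but runs rarely, while `score_row` does a handful of lookups per row, which is what keeps applying the model fast.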
19. More Modeling with KijiMR
KeyValueStores
• Allow access to external data in Producers and Gatherers.
• Support various file formats as well as tables.
• Make joining datasets together very easy.
• The mechanism for accessing fitted model parameters when freshening.
20. KijiShopping
• A real-time product recommendation system
• Content-based model using product descriptions and TF-IDF
[Diagram: Users, KijiShopping Web Application, KijiSchema (Avro, HBase), KijiMR (MapReduce), KijiScoring]
21. KijiShopping Data Collection
• User Logins
• Product Information
o Names, descriptions, SKU information
• User Ratings
o Explicit ratings from users
How do we go from data to recommendations?
23. TF-IDF
• Term Frequency
o How often does this term appear in this document?
• Document Frequency
o How many documents does this term appear in?
• TF-IDF
o How important is this term to this document?
• In KijiShopping, each is a separate job
24. Computing Term Frequency
• Written as a Producer
o Executed on the Product table as a Map-only job
o WordCount on a per-record basis
[Diagram (HBase): Read Product Description → Count Words in Product Description → Write Word Counts Back]
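A per-record word count of this kind might look like the sketch below. It is plain Python rather than a KijiMR Producer, and the function name and the tokenization rule (lowercase, split on anything that is not a letter, digit, or apostrophe) are assumptions for the example.

```python
import re
from collections import Counter


def term_frequencies(description):
    """Count how often each word appears in one product description."""
    words = re.findall(r"[a-z0-9']+", description.lower())
    return dict(Counter(words))
```

For example, `term_frequencies("Gourmet knife, gourmet block")` counts "gourmet" twice and "knife" and "block" once each. Because each row is processed on its own, no reduce step is needed.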
25. Computing Document Frequency
• Written as a Gatherer
o Executed on the Product table as a MapReduce job
o Groups by words
[Diagram: HBase: Read Term Frequencies → Map: Emit (Word, 1) → Reduce: Group By Word → HDFS: Write Document Frequencies]
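The map and reduce steps can be imitated in memory as below. This is a sketch, not a KijiMR Gatherer: the function names are hypothetical, and a dict of per-product term counts stands in for the Product table.

```python
from collections import defaultdict


def map_row(term_counts):
    """Map step: emit (word, 1) once for every distinct word in one product row."""
    for word in term_counts:
        yield (word, 1)


def reduce_counts(pairs):
    """Reduce step: group by word and sum the ones."""
    totals = defaultdict(int)
    for word, one in pairs:
        totals[word] += one
    return dict(totals)


def document_frequencies(table):
    """Run map over every row of the table, then reduce the emitted pairs."""
    pairs = (pair for term_counts in table.values() for pair in map_row(term_counts))
    return reduce_counts(pairs)
```

Emitting 1 per distinct word, rather than the word's count, is what makes the result a document frequency instead of a corpus-wide term count.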
26. Computing TF-IDF
• Written as a Producer
o Executed on the Product table as a Map-only job
o Pulls in Document Frequencies as a KVStore
[Diagram: HBase: Read Term Frequencies; HDFS: Read Document Frequencies via KVStore → Divide TF by DF → Write TF-IDFs Back]
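The "Divide TF by DF" step could be sketched as follows, with a plain dict standing in for the document-frequency KVStore. The function name is hypothetical, and skipping words that have no document frequency is a choice made for this example.

```python
def tf_idf(term_counts, doc_freqs):
    """Divide each word's term frequency by its document frequency.

    Words missing from doc_freqs (or with a zero count) are skipped.
    """
    return {
        word: count / doc_freqs[word]
        for word, count in term_counts.items()
        if doc_freqs.get(word)
    }
```

This follows the slide's plain TF / DF ratio. Many TF-IDF formulations instead multiply TF by log(N / DF), where N is the number of documents; that variant is not what is shown here.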
27. Associating Words with Products
• Batch training process
• Associations stored in a model table
[Diagram: the words "gourmet" and "knife" point to "gourmet" Products and "knife" Products, with weights tfidf_gourmet and tfidf_knife]
28. Determine a User's Preferred Words
• Stored in a user table
[Diagram: user Natty points to the words "gourmet" and "knife", with weights w_gourmet and w_knife]
29. Combining User Ratings and Models
• Producers incorporate models using KeyValueStores
[Diagram: user Natty points to the words "gourmet" and "knife" (weights w_gourmet, w_knife), which point to "gourmet" Products and "knife" Products (weights tfidf_gourmet, tfidf_knife)]
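One way to combine the two sets of weights is sketched below. The slides do not state the scoring formula, so the weighted sum (user word weight times product TF-IDF, added up over words) is an assumption, as are the function name `recommend` and the nested-dict layout standing in for the model table.

```python
def recommend(user_word_weights, word_to_products, top_n=3):
    """Score each product as the sum over words of w_word * tfidf(word, product).

    user_word_weights: {word: weight} for one user.
    word_to_products: {word: {product: tfidf}}, standing in for the model table.
    Returns the top_n (product, score) pairs, best first, ties broken by name.
    """
    scores = {}
    for word, weight in user_word_weights.items():
        for product, tfidf in word_to_products.get(word, {}).items():
            scores[product] = scores.get(product, 0.0) + weight * tfidf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

Scoring one user needs only lookups for that user's preferred words, which fits the requirement that applying a model be fast enough to run at read time.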
34. Want to know more?
• The Kiji Project
o kiji.org
o github.com/kijiproject
• KijiShopping
o github.com/wibidata/kiji-shopping
Questions about this presentation?
o juliet@wibidata.com
o natty@wibidata.com
35. Want to know more?
• Come see us at the WibiData booth
• Join us at KijiCon tomorrow