SlideShare a Scribd company logo
1 of 42
Download to read offline
Customer Behaviour Analytics:
Billions of Events to
one Customer-Product Graph
Strata London, 12th November 2013
Presented by Paul Lam
About Paul Lam
Joined uSwitch.com as first Data Scientist in 2010
• developed internal data products
• built distributed data architecture
• team of 3 with a developer and a statistician
Code contributor to various open source tools
• Cascalog, a big data processing library built on top of
Cascading (comparable to Apache Pig)
• Incanter, a statistical computing platform in Clojure
Author of Web Usage Mining: Data Mining Visitor Patterns
From Web Server Logs* to be published in late 2014
* tentative title
What is it
Customer-Product Graph
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}

[:BUY {:timestamp 2013-11-01 13:00:00}]
bought
viewed
Question: Who bought an iPhone?
{:name “Bob”}
{:product “iPhone”}
{:name “Tom”}

{:name “Emily”}
{:product “American Express”}
{:name “Lily”}
bought
viewed
Query: Who bought an iPhone?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}
START

x=node:node_auto_index(product='iPhone')Express”}
{:product “American
{:name “Lily”}
MATCH (person)-[:BUY]->(x)

RETURN person
Question: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: What else did they buy?
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

START x=node:node_auto_index(product='iPhone')
{:name
MATCH “Emily”}

{:product “American Express”}

(x)<-[:BUY]-(person)-[:BUY]->(y)
{:name “Lily”}

RETURN y
Hypothesis: People that buy X has interest in Y
{:name “Bob”}

X
{:product “iPhone”}

{:name “Tom”}

{:name “Emily”}

Y
{:product “American Express”}

{:name “Lily”}
bought
viewed
Query: Who to recommend Y
{:name “Bob”}

X

{:product “iPhone”}
START
{:name “Tom”}
x=node:node_auto_index(product='iPhone'),
y=node:node_auto_index(product='American
Express')
{:name “Emily”}
Y Looked
{:product “American Express”}
MATCH (p)-[:BUY]->(x),
at AE

(p)-[:VIEW]->(y)
WHERE NOT (p)-[:BUY]->(y)
{:name “Lily”}

RETURN p

Haven’t
bought AE
Product Recommendation by Reasoning Example

Interactive demo at http://bit.ly/customer_graph

1. Start with an idea
2. Trace to connected nodes
3. Identify patterns from viewpoint of those nodes
4. Repeat from #1 until discovering actionable item
5. Apply pattern
Challenge: Event Data to Graph Data

User ID

Product ID Action

Bob

iPhone

Bought

Tom

iPhone

Bought

Emily

iPhone

Bought

Bob

AE

Bought

Emily

AE

Viewed

Lily

AE

Bought
Why should you care
Customer Journey

14
Purchase Funnel

Interest

Desire

Action
15
Understanding Customer Intents

Interest

Desire

Action

16
Customer Experience as a Graph

17
http://insideintercom.io/criticism-and-two-way-streets/
18
A Feedback System to Drive Profit

19
Minimise effort between Q & A

Question
Answer

Data Interrogation
One Approach: Make data querying easier

Query = Function(Data) [1]
~ Function(Data Structure)

[1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
Data Structure: Relations versus Relations
a Edges
ak
Sale ID

User ID

Product ID Profit

1

1

1

£100

2

1

2

£50

{:profit £100}
{:profit £50}

User ID

Name

Product ID Name

1

Bob

1

iPhone

2

Emily

2

A.E.
Using the right database for the right task

RDBMS

Graph DB

Data

Attributes

Entities and relations

Model

Record-based

Associative

Relation

By-product of
normalisation

First class citizen

Example use

Reporting

Reasoning
How does it work
User actions as time-stamped records

time

HDFS

Paul Ingles, “User as Data”, Euroclojure 2012
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
From Input to Output with Hadoop, aka ETL step
1. interface
3. output data

2. input data

4... The 80%
HDFS

Reshape

Graph Database
Hadoop interface to Neo4J
•
•
•
•

Cascading-Neo4j tap [1]
Faunus Hadoop binaries [2]
CSV files*
etc.
[1] http://github.com/pingles/cascading.neo4j
[2] http://thinkaurelius.github.io/faunus/

HDFS

Reshape

Graph Database
Input data stored on HDFS

time

1

2

3

4

User

1
2

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

4

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

3

Timestamp
Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Nodes and Edges CSVs to go into a property graph
Node ID

Properties

1

{:name “Paul”, email: “paul.lam@uswitch.com”}

2

{:domain “google.com”}

3

{:page “/homepage/”}

...

...

5

{:product “iPhone”}

From To

Type

Properties

1

2

:SOURCE

{:timestamp “2013-11-01 13:00”}

1

3

:VIEWED

{:timestamp “2013-11-01 13:00”}

...

...

...

...

1

5

:BOUGHT

{:timestamp “2013-11-01 13:05”}
Records to Graph in 3 Steps

^ impo
rtable C

SV

1. Design graph
2. Extract Nodes
3. Build Relations

31
Step 1: Designing your graph
{:name “Paul”
:email “paul.lam@uswitch.com}
:viewed {:timestamp ... }
{:page “homepage”}

:bought {:timestamp ... }
:viewed { ... }
{:product “iPhone”}

:viewed { ... }
{:page “blog”}

:source {:timestamp ... }

{:domain “google.com”}
32
Step 2: Extract list of entity nodes
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Step 3: Building node-to-node relations
User

Timestamp

Viewed Page

Referrer

Paul

2013-11-01 13:00

/homepage/

google.com

Paul

2013-11-01 13:01

/blog/

/homepage/

User

Timestamp

Viewed Product

Price

Referrer

Paul

2013-11-01 13:04

iPhone

£500

/blog/

User

Timestamp

Purchased

Paid

Attrib.

Paul

2013-11-01 13:05

iPhone

£500

google.com

User

Landed

Referral

Email

Paul

2013-11-01 13:00

google.com

paul.lam@uswitch.com
Do this across all customers and products
Use your data processing tool of choice:
Apache Hive
• Apache Pig
• Cascading
and more ...
• Scalding
• Cascalog
• Spark
• your favourite programming language
•

Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

36
Cascalog code to build user nodes
145 lines of Cascalog code in production
• a couple hundred lines more of utility functions
• build entity nodes and meta nodes
• sink data into database with Cascading-Neo4j Tap
•

37
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

38
Code to build user to product click relations
160 lines of Cascalog code in production
• + utility functions
• build direct and categoric relations
• sink data with Cascading-Neo4j Tap
•

39
Our User Event to Graph Data Pipeline

time

HDFS

Reshape

Graph Database
Summary

HDFS

Reshape

Graph
Contact
Paul Lam, data scientist at uSwitch.com
Email: paul@quantisan.com
Twitter: @Quantisan

More Related Content

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConf
Hans Scharler
 

Similar to Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph (20)

Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc...
 
AD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With AnalyticsAD306 - Turbocharge Your Enterprise Social Network With Analytics
AD306 - Turbocharge Your Enterprise Social Network With Analytics
 
Driving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConfDriving the Internet of Things with Mobile Apps at tiConf
Driving the Internet of Things with Mobile Apps at tiConf
 
User Story Mapping in Practice
User Story Mapping in PracticeUser Story Mapping in Practice
User Story Mapping in Practice
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
Eposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things eraEposition - the Online Object ID system for the Internet of things era
Eposition - the Online Object ID system for the Internet of things era
 
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
Our Data, Ourselves: The Data Democracy Deficit (EMF CAmp 2014)
 
Logs & Visualizations at Twitter
Logs & Visualizations at TwitterLogs & Visualizations at Twitter
Logs & Visualizations at Twitter
 
Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery Build an App with JavaScript & jQuery
Build an App with JavaScript & jQuery
 
Tactical Information Gathering
Tactical Information GatheringTactical Information Gathering
Tactical Information Gathering
 
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
IOT Paris Seminar 2015 - Connected Objects makers, How to deal with Data?
 
Data Collection and Consumption
Data Collection and ConsumptionData Collection and Consumption
Data Collection and Consumption
 
Data Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! ResearchData Cloud - Yury Lifshits - Yahoo! Research
Data Cloud - Yury Lifshits - Yahoo! Research
 
yogesh CV
yogesh CVyogesh CV
yogesh CV
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
 
Opportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to CollaborateOpportunities for IT and SLA Professionals to Collaborate
Opportunities for IT and SLA Professionals to Collaborate
 
Preparing for Release to the App Store
Preparing for Release to the App StorePreparing for Release to the App Store
Preparing for Release to the App Store
 
Building Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and SalesforceBuilding Social Enterprise with Ruby and Salesforce
Building Social Enterprise with Ruby and Salesforce
 
IT-Security@Contemporary Life
IT-Security@Contemporary LifeIT-Security@Contemporary Life
IT-Security@Contemporary Life
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologies
 

More from Paul Lam (9)

Mozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and TechnologyMozambique, Smallholder Farming, and Technology
Mozambique, Smallholder Farming, and Technology
 
When a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a barWhen a machine learning researcher and a software engineer walk into a bar
When a machine learning researcher and a software engineer walk into a bar
 
Evolution of Our Software Architecture
Evolution of Our Software ArchitectureEvolution of Our Software Architecture
Evolution of Our Software Architecture
 
A gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojureA gentle introduction to functional programming through music and clojure
A gentle introduction to functional programming through music and clojure
 
Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)
 
Clojure in US vs Europe
Clojure in US vs EuropeClojure in US vs Europe
Clojure in US vs Europe
 
2014 docker boston fig for developing microservices
2014 docker boston   fig for developing microservices2014 docker boston   fig for developing microservices
2014 docker boston fig for developing microservices
 
Composing re-useable ETL on Hadoop
Composing re-useable ETL on HadoopComposing re-useable ETL on Hadoop
Composing re-useable ETL on Hadoop
 
An agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log dataAn agile approach to knowledge discovery on web log data
An agile approach to knowledge discovery on web log data
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Customer Behaviour Analytics: Billions of Events to one Customer-Product Property Graph

  • 1. Customer Behaviour Analytics: Billions of Events to one Customer-Product Graph Strata London, 12th November 2013 Presented by Paul Lam
  • 2. About Paul Lam Joined uSwitch.com as first Data Scientist in 2010 • developed internal data products • built distributed data architecture • team of 3 with a developer and a statistician Code contributor to various open source tools • Cascalog, a big data processing library built on top of Cascading (comparable to Apache Pig) • Incanter, a statistical computing platform in Clojure Author of Web Usage Mining: Data Mining Visitor Patterns From Web Server Logs* to be published in late 2014 * tentative title
  • 4. Customer-Product Graph {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} [:BUY {:timestamp 2013-11-01 13:00:00}] bought viewed
  • 5. Question: Who bought an iPhone? {:name “Bob”} {:product “iPhone”} {:name “Tom”} {:name “Emily”} {:product “American Express”} {:name “Lily”} bought viewed
  • 6. Query: Who bought an iPhone? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} START x=node:node_auto_index(product='iPhone')Express”} {:product “American {:name “Lily”} MATCH (person)-[:BUY]->(x) RETURN person
  • 7. Question: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 8. Query: What else did they buy? {:name “Bob”} X {:product “iPhone”} {:name “Tom”} START x=node:node_auto_index(product='iPhone') {:name MATCH “Emily”} {:product “American Express”} (x)<-[:BUY]-(person)-[:BUY]->(y) {:name “Lily”} RETURN y
  • 9. Hypothesis: People that buy X has interest in Y {:name “Bob”} X {:product “iPhone”} {:name “Tom”} {:name “Emily”} Y {:product “American Express”} {:name “Lily”} bought viewed
  • 10. Query: Who to recommend Y {:name “Bob”} X {:product “iPhone”} START {:name “Tom”} x=node:node_auto_index(product='iPhone'), y=node:node_auto_index(product='American Express') {:name “Emily”} Y Looked {:product “American Express”} MATCH (p)-[:BUY]->(x), at AE (p)-[:VIEW]->(y) WHERE NOT (p)-[:BUY]->(y) {:name “Lily”} RETURN p Haven’t bought AE
  • 11. Product Recommendation by Reasoning Example Interactive demo at http://bit.ly/customer_graph 1. Start with an idea 2. Trace to connected nodes 3. Identify patterns from viewpoint of those nodes 4. Repeat from #1 until discovering actionable item 5. Apply pattern
  • 12. Challenge: Event Data to Graph Data User ID Product ID Action Bob iPhone Bought Tom iPhone Bought Emily iPhone Bought Bob AE Bought Emily AE Viewed Lily AE Bought
  • 19. A Feedback System to Drive Profit 19
  • 20. Minimise effort between Q & A Question Answer Data Interrogation
  • 21. One Approach: Make data querying easier Query = Function(Data) [1] ~ Function(Data Structure) [1] Figure 1.3 from Big Data (preview v11) by Nathan Marz and James Warren
  • 22. Data Structure: Relations versus Relations a Edges ak Sale ID User ID Product ID Profit 1 1 1 £100 2 1 2 £50 {:profit £100} {:profit £50} User ID Name Product ID Name 1 Bob 1 iPhone 2 Emily 2 A.E.
  • 23. Using the right database for the right task RDBMS Graph DB Data Attributes Entities and relations Model Record-based Associative Relation By-product of normalisation First class citizen Example use Reporting Reasoning
  • 24. How does it work
  • 25. User actions as time-stamped records time HDFS Paul Ingles, “User as Data”, Euroclojure 2012
  • 26. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 27. From Input to Output with Hadoop, aka ETL step 1. interface 3. output data 2. input data 4... The 80% HDFS Reshape Graph Database
  • 28. Hadoop interface to Neo4J • • • • Cascading-Neo4j tap [1] Faunus Hadoop binaries [2] CSV files* etc. [1] http://github.com/pingles/cascading.neo4j [2] http://thinkaurelius.github.io/faunus/ HDFS Reshape Graph Database
  • 29. Input data stored on HDFS time 1 2 3 4 User 1 2 Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User 4 Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User 3 Timestamp Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 30. Nodes and Edges CSVs to go into a property graph Node ID Properties 1 {:name “Paul”, email: “paul.lam@uswitch.com”} 2 {:domain “google.com”} 3 {:page “/homepage/”} ... ... 5 {:product “iPhone”} From To Type Properties 1 2 :SOURCE {:timestamp “2013-11-01 13:00”} 1 3 :VIEWED {:timestamp “2013-11-01 13:00”} ... ... ... ... 1 5 :BOUGHT {:timestamp “2013-11-01 13:05”}
  • 31. Records to Graph in 3 Steps ^ impo rtable C SV 1. Design graph 2. Extract Nodes 3. Build Relations 31
  • 32. Step 1: Designing your graph {:name “Paul” :email “paul.lam@uswitch.com} :viewed {:timestamp ... } {:page “homepage”} :bought {:timestamp ... } :viewed { ... } {:product “iPhone”} :viewed { ... } {:page “blog”} :source {:timestamp ... } {:domain “google.com”} 32
  • 33. Step 2: Extract list of entity nodes User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 34. Step 3: Building node-to-node relations User Timestamp Viewed Page Referrer Paul 2013-11-01 13:00 /homepage/ google.com Paul 2013-11-01 13:01 /blog/ /homepage/ User Timestamp Viewed Product Price Referrer Paul 2013-11-01 13:04 iPhone £500 /blog/ User Timestamp Purchased Paid Attrib. Paul 2013-11-01 13:05 iPhone £500 google.com User Landed Referral Email Paul 2013-11-01 13:00 google.com paul.lam@uswitch.com
  • 35. Do this across all customers and products Use your data processing tool of choice: Apache Hive • Apache Pig • Cascading and more ... • Scalding • Cascalog • Spark • your favourite programming language • Paco Nathan, “The Workflow Abstraction”, Strata SC, 2013.
  • 36. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 36
  • 37. Cascalog code to build user nodes 145 lines of Cascalog code in production • a couple hundred lines more of utility functions • build entity nodes and meta nodes • sink data into database with Cascading-Neo4j Tap • 37
  • 38. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 38
  • 39. Code to build user to product click relations 160 lines of Cascalog code in production • + utility functions • build direct and categoric relations • sink data with Cascading-Neo4j Tap • 39
  • 40. Our User Event to Graph Data Pipeline time HDFS Reshape Graph Database
  • 42. Contact Paul Lam, data scientist at uSwitch.com Email: paul@quantisan.com Twitter: @Quantisan