Wanderu: Lessons Learned
Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j
Eddy Wong
CTO, Wanderu.com
@eddywongch
About Wanderu.com
Search Engine for (Intercity) Buses and Trains
Demo
From point A to point B
A: Boston; B: DC; NYC in between
Nomenclature: Stations, Trips
Amtrak, $101, 09/26/2013
Bolt, $25, 09/26/2013
Mega, $24, 09/26/2013
From point A to point B
A: Cambridge, MA; B: Brooklyn, NY
Stations shown: South Station, Boston; 31st St & 9th Ave, NYC; 28th St & 7th Ave, NYC; 34th St & 8th Ave, NYC
Our Story
• Tech work started more than a year ago
• Beta in March, launch in August
• Knew nothing about Neo4j when we started (June 2012)
• Did not like the relational model: wanted schema-less data and no self-joins
• Wanted a graph model
Relational vs. Graph
Lessons: Learned, Unlearned, Idea
• Architectural
• Modeling
• Geo
Architectural Lessons
Art: MC Escher
Our Story
• Started with MongoDB as a general store: easy to manipulate and organize data
• Wanted a db that could preserve the graph model
• Debated: document vs. graph
• Could not find a single db that could do both: general store + graph
Workflow
Bus Websites -> Scraping -> Non-uniform Data (JSON) -> Store (Uniform Data) -> Server
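The scraping step is where non-uniform data becomes uniform. A minimal Python sketch of that normalization, assuming two hypothetical carrier sites (`carrier_a`, `carrier_b`) whose raw field names and date formats are made up for illustration. The target fields `depart_id`, `arrive_id`, `carrier` and `price` follow the deck's Trips documents; `depart_time` is an assumed field.

```python
from datetime import datetime

# Hypothetical raw records as two different carrier sites might expose them.
# All raw field names ("from", "origin", "fare", "cost", ...) are illustrative.
RAW = [
    {"site": "carrier_a", "from": "BOS", "to": "NYC", "fare": "$24.00",
     "leaves": "2013-09-26 07:00"},
    {"site": "carrier_b", "origin": "NYC", "destination": "DC", "cost": 25,
     "departure": "09/26/2013 13:30"},
]

def normalize(raw):
    """Map one scraped record onto a single uniform trip schema."""
    if raw["site"] == "carrier_a":
        return {
            "carrier": raw["site"],
            "depart_id": raw["from"],
            "arrive_id": raw["to"],
            "price": float(raw["fare"].lstrip("$")),
            "depart_time": datetime.strptime(raw["leaves"], "%Y-%m-%d %H:%M"),
        }
    if raw["site"] == "carrier_b":
        return {
            "carrier": raw["site"],
            "depart_id": raw["origin"],
            "arrive_id": raw["destination"],
            "price": float(raw["cost"]),
            "depart_time": datetime.strptime(raw["departure"], "%m/%d/%Y %H:%M"),
        }
    raise ValueError(f"unknown site: {raw['site']}")

uniform = [normalize(r) for r in RAW]
```

Keeping one small adapter per site means a change in one carrier's page only touches its own branch.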
NoSQL
• You need to choose one NoSQL database
• You need ONE (centralized) database
• The word “database” is a loaded term
• Lots of (very different) NoSQL database options
Our Situation
• Data is written only in one direction
• Users search for paths, then segments
• Searches are done by date
• Needed online capability
• Trip info (price/avail) could change on some
Our Solution
• Use Both: MongoDB + Neo4j
• “Docugraph” = Document + Graph
• Syncing two kinds of databases
• Eventual consistency
Pipeline
Bus Websites -> Scraping -> Non-uniform Data (JSON) -> Uniform Data in MongoDB -> Mongo Connector (replica mechanism) -> Nodes & Edges in Neo4j
MongoConnector
• MongoDB Labs project: open source, unsupported
• Uses the replication mechanism: the oplog
• Eventually consistent (not real time)
• Written in Python
• Main methods: upsert and delete, each passed the document
• We implemented DocMgr -> Neo4jDocMgr on top of py2neo
• Other implementations: MongoDocMgr, SolrDocMgr, ESDocMgr
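To make the upsert/delete flow concrete, here is a minimal Python sketch of a doc manager that turns trip documents into graph edges. It is hypothetical: `Neo4jDocMgrSketch` and `InMemoryGraph` are illustrative names, a dict stands in for Neo4j, and oplog entries are simplified to `{"op": "i" | "d", "o": doc}`. Real oplog update entries (`"u"`) carry an update spec plus an `"o2"` selector, and mongo-connector's actual DocManager interface has more methods whose signatures vary by version.

```python
class InMemoryGraph:
    """Stand-in for Neo4j: station keys as nodes, 'DEPART-ARRIVE' keys as edges."""
    def __init__(self):
        self.nodes = set()
        self.edges = {}  # edge key -> set of trip _ids

class Neo4jDocMgrSketch:
    """Hypothetical doc manager: translates trip documents into graph edges."""
    def __init__(self, graph):
        self.graph = graph
        self.doc_index = {}  # _id -> edge key, so deletes can find their edge

    def upsert(self, doc):
        # True upsert: drop any previous version of this document first.
        self.remove(doc["_id"])
        key = f"{doc['depart_id']}-{doc['arrive_id']}"
        self.graph.nodes.update((doc["depart_id"], doc["arrive_id"]))
        # Only keys go into the graph; the full document stays in MongoDB.
        self.graph.edges.setdefault(key, set()).add(doc["_id"])
        self.doc_index[doc["_id"]] = key

    def remove(self, doc_id):
        key = self.doc_index.pop(doc_id, None)
        if key is None:
            return
        trips = self.graph.edges[key]
        trips.discard(doc_id)
        if not trips:
            del self.graph.edges[key]

def replay(entries, mgr):
    """Apply simplified oplog-style entries in order (eventually consistent)."""
    for entry in entries:
        if entry["op"] == "i":
            mgr.upsert(entry["o"])
        elif entry["op"] == "d":
            mgr.remove(entry["o"]["_id"])
```

Because the graph only ever sees replayed oplog entries, it lags MongoDB slightly, which is the eventual consistency the slide mentions.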
Populating Neo4j (2)
• Created our own way of creating edges
• Auto node creation when an edge is created: could add stations (nodes) on the fly
• py2neo requires two “node refs” to create an edge, i.e. creating one edge might need two round trips to Neo4j
Edge Creator P-code
hashtable allStations = load_stations()
w_create_edge(station_id a, station_id b, otherdata):
    look up a in allStations
    if found -> ref_a = allStations.get(a)
    if not found ->
        ref_a = py2neo.create_node(a)
        add (a -> ref_a) to allStations
    ...
    py2neo.create_edge(ref_a, ref_b, ...)
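A runnable Python version of the edge-creator idea, assuming a fake client in place of py2neo (`FakeGraphClient` and its `create_node`/`create_edge` methods are hypothetical, not py2neo's API). It counts round trips to show what the station cache saves: once a station's node ref is cached, later edges touching that station skip the node call.

```python
class FakeGraphClient:
    """Stand-in for a Neo4j client; counts round trips instead of calling a server."""
    def __init__(self):
        self.round_trips = 0
        self.edges = []

    def create_node(self, station_id):
        self.round_trips += 1
        return {"ref": station_id}  # pretend node reference

    def create_edge(self, ref_a, ref_b, props):
        self.round_trips += 1
        self.edges.append((ref_a["ref"], ref_b["ref"], props))

class EdgeCreator:
    """Caches node refs so each station costs at most one extra round trip."""
    def __init__(self, client, known_stations=None):
        self.client = client
        self.all_stations = dict(known_stations or {})  # station_id -> node ref

    def _node_ref(self, station_id):
        ref = self.all_stations.get(station_id)
        if ref is None:
            # Create the station node on the fly and remember its ref.
            ref = self.client.create_node(station_id)
            self.all_stations[station_id] = ref
        return ref

    def create_edge(self, a, b, **props):
        ref_a = self._node_ref(a)
        ref_b = self._node_ref(b)
        self.client.create_edge(ref_a, ref_b, props)
```

The first BOS-NYC edge costs three calls (two nodes, one edge); a later NYC-BOS edge costs only one.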
Pipeline
Bus Websites -> Scraping -> Non-uniform Data (JSON) -> MongoDB -> Mongo Connector (replica mechanism) -> Nodes & Edges in Neo4j -> REST Server
Example edges: BOS-NYC, BOS-PHL, NYC-DC, NYC-PHL
Modeling Lessons
Art: MC Escher
Our Story
• We tried to “dump” all data into Neo4j
• Stations -> Nodes, Trips -> Edges
• Problem: edges had dates -> too many edges -> “super node”
• Query performance was terrible (over a minute) and got worse as the number of edges increased
Our Story (2)
• Went from Cypher to Gremlin, thinking that would improve performance
• Needed range queries on edges
Our Solution
• Don’t store everything in Neo4j, only metadata
• Use Neo4j as an index
• Don’t store entities in nodes, only keys
• Don’t store heavy properties in edges
Neo4j Model
Source: Tobias Lindaaker, Wes Freeman
Neo4j Runtime Model
• Relationships are stored in a linked list
• Properties are stored in a linked list
• Therefore: there is NO random access to relationships or properties
• A range query over relationships requires a full scan
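A simplified Python model of why a date-range query over relationships degrades into a scan: each relationship record points to the next one for the same node, so filtering by date means walking the whole chain. This is an illustrative sketch, not Neo4j's actual record layout; a singly linked list is enough to show the cost.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Rel:
    """One relationship record; 'next' points to the node's next relationship."""
    other: str
    travel_date: date
    next: Optional["Rel"] = None

def add_rel(head, other, travel_date):
    # New relationships are prepended, so the chain is not ordered by date.
    return Rel(other, travel_date, head)

def range_query(head, start, end):
    """Return matching relationships and how many records had to be visited."""
    visited, hits = 0, []
    rel = head
    while rel is not None:
        visited += 1
        if start <= rel.travel_date <= end:
            hits.append(rel)
        rel = rel.next
    return hits, visited
```

With one edge per trip per date, a month of daily BOS-NYC trips means thirty records visited to find a single day, and the cost grows with every date stored: the super-node effect described above.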
Our Solution (2)
• Needed the ability to do range queries on edges
• Serve paths from Neo4j, segments from MongoDB
• The one thing we tried to avoid, we ended up doing: joins
• Came up with the “Docugraph” approach
Docugraph
• MongoDB Collections for Nodes and Edges
• Neo4j: Only keys for nodes
• Neo4j: Only Properties relevant for queries
Nodes & Edges
• Collection for Stations (nodes)
  {id: "BOS", name: "Boston South Station", address: "Summer St", ...}
• Collection for Trips (edges)
  {depart_id: "BOS", arrive_id: "NYC", carrier: "Megabus", price: 24.0, ...}
Modeling
• Storing info in two or more dbs
• Doing a “join” across multiple dbs
Joins across DBs
MongoDB: Stations    Neo4j: Nodes
BOS                  BOS
NYC                  NYC
DC                   DC
...                  ...

MongoDB: Trips       Neo4j: Edges
BOS-NYC              BOS-NYC
BOS-DC               BOS-DC
NYC-DC               NYC-DC
...                  ...

• Forget the sequential ids generated by the dbs
• Use a human-created long string for the id
• Convert the (depart, arrive) pair into the id: depart-arrive
• For example: BOS-NYC
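The shared-key idea can be sketched in Python with dicts standing in for both stores (the sample data and the `trips_by_key` grouping are assumptions for illustration): the graph holds only `depart-arrive` keys, and the join back to MongoDB is a lookup on the same string.

```python
from collections import defaultdict

def edge_key(depart_id, arrive_id):
    """Human-readable id shared by a trip document and its graph edge."""
    return f"{depart_id}-{arrive_id}"

# Hypothetical sample data: a list stands in for the MongoDB Trips collection,
# a set for the key-only edges held in Neo4j.
mongo_trips = [
    {"depart_id": "BOS", "arrive_id": "NYC", "carrier": "Megabus", "price": 24.0},
    {"depart_id": "BOS", "arrive_id": "NYC", "carrier": "Bolt", "price": 25.0},
    {"depart_id": "NYC", "arrive_id": "DC", "carrier": "Bolt", "price": 25.0},
]
neo4j_edges = {"BOS-NYC", "BOS-DC", "NYC-DC"}

# Group trip documents by their shared key once, so each join is a dict lookup.
trips_by_key = defaultdict(list)
for trip in mongo_trips:
    trips_by_key[edge_key(trip["depart_id"], trip["arrive_id"])].append(trip)

def join_edge(key):
    """Resolve a graph edge key to the full trip documents on the document side."""
    return trips_by_key.get(key, [])
```

One caveat of a hyphen-delimited key: it assumes station ids never contain a hyphen themselves.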
Indexing Technique
• Index Trips by {origin-dest, datetime}
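Why the `{origin-dest, datetime}` index helps: sorting on the compound key places all trips for one pair together in time order, so a date range becomes two binary searches instead of a scan. A toy Python model of that, with `od` and `depart_time` as assumed field names:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

class TripIndex:
    """Toy compound index: entries sorted by (origin-dest key, departure datetime)."""

    def __init__(self, trips):
        self.trips = trips
        self.entries = sorted(
            ((t["od"], t["depart_time"]), i) for i, t in enumerate(trips)
        )
        self.keys = [key for key, _ in self.entries]

    def range(self, od, start, end):
        """All trips for one origin-dest pair departing within [start, end]."""
        lo = bisect_left(self.keys, (od, start))
        hi = bisect_right(self.keys, (od, end))
        return [self.trips[i] for _, i in self.entries[lo:hi]]

# Hypothetical sample: four BOS-NYC departures and one NYC-DC departure.
trips = [{"od": "BOS-NYC", "depart_time": datetime(2013, 9, 26, h)} for h in (18, 7, 13, 9)]
trips.append({"od": "NYC-DC", "depart_time": datetime(2013, 9, 26, 8)})
index = TripIndex(trips)
window = index.range("BOS-NYC", datetime(2013, 9, 26, 8), datetime(2013, 9, 26, 13))
```

In MongoDB itself the analogous step would be a compound index, e.g. pymongo's `create_index([("od", ASCENDING), ("depart_time", ASCENDING)])`, queried with `$gte`/`$lte` on `depart_time`; again, the field names are hypothetical.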
Querying
• REST API in node.js
• Assemble results from two sources
• Paths from Neo4j
• Segments from MongoDB
• Sort by price, duration
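The assembly step can be sketched as follows. The deck's API was written in Node.js; this sketch uses Python to stay consistent with the other examples, and the segment fields, the connection check and the `min_connection` parameter are assumptions. Paths come from the graph, segments per leg from the document store, and results are sorted by price, then duration:

```python
from datetime import datetime
from itertools import product

def itineraries(paths, segments_for, min_connection=0):
    """Join graph paths with per-leg segments; sort by price, then duration.

    paths:          station-id lists, as a graph query might return them
    segments_for:   callable(depart_id, arrive_id) -> list of segment dicts
                    (hypothetical stand-in for the MongoDB lookup)
    min_connection: minimum layover in seconds (assumed parameter)
    """
    results = []
    for path in paths:
        if len(path) < 2:
            continue
        legs = [segments_for(a, b) for a, b in zip(path, path[1:])]
        for combo in product(*legs):
            # Every connecting segment must leave after the previous one arrives.
            if any((nxt["depart"] - prev["arrive"]).total_seconds() < min_connection
                   for prev, nxt in zip(combo, combo[1:])):
                continue
            results.append({
                "path": path,
                "segments": combo,
                "price": sum(s["price"] for s in combo),
                "duration": combo[-1]["arrive"] - combo[0]["depart"],
            })
    results.sort(key=lambda r: (r["price"], r["duration"]))
    return results

# Hypothetical sample data for 09/26/2013.
def at(hour, minute=0):
    return datetime(2013, 9, 26, hour, minute)

SEGMENTS = {
    ("BOS", "DC"): [{"price": 101.0, "depart": at(7), "arrive": at(14)}],
    ("BOS", "NYC"): [{"price": 24.0, "depart": at(7), "arrive": at(11)},
                     {"price": 20.0, "depart": at(12), "arrive": at(16)}],
    ("NYC", "DC"): [{"price": 25.0, "depart": at(13), "arrive": at(17, 30)}],
}
PATHS = [["BOS", "DC"], ["BOS", "NYC", "DC"]]

results = itineraries(PATHS, lambda a, b: SEGMENTS.get((a, b), []))
```

The graph keeps the number of candidate paths small, so the per-path product over segments stays cheap; the cheaper 12:00 BOS-NYC bus is dropped because it arrives after the only NYC-DC departure.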
Geo Lessons
Art: MC Escher
Our Story
• Wanted to mix public transport data with intercity data
• Did not want to host all public transport data
• Created a hybrid solution
Our Solution
• Hybrid:
  • Google Autocomplete
  • Google Maps
  • In-house station geo lookup
Geo
• Neo4j geo functionality was not available out of the box
• Required a jar install
• Required running a Java program to build the index
• Needed better documentation
• Ended up using MongoDB geo instead
• Make geo functionality work out of the box
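The in-house station geo lookup boils down to "which station is closest to this point". A minimal Python sketch using the haversine great-circle distance; the station coordinates are approximate and for illustration only. In MongoDB, the same question is answered server-side with a `$near` query against a `2dsphere` index.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Approximate (lat, lon) coordinates, for illustration only.
STATIONS = {
    "BOS": (42.3523, -71.0552),  # South Station, Boston
    "NYC": (40.7506, -73.9935),  # Penn Station area, New York
    "DC": (38.8973, -77.0063),   # Union Station, Washington
}

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_station(point, stations=STATIONS):
    """Station id with the smallest great-circle distance to the point."""
    return min(stations, key=lambda sid: haversine_km(point, stations[sid]))
```

A linear scan is fine for a few hundred stations; past that, a spatial index (such as MongoDB's) avoids computing every distance.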
Conclusions
• Even with a join across dbs, the solution is better than relational
• 10s of paths x 100s of segments vs. 500k x 500k
• Glad to have picked Neo4j: now doing content generation and more geo features
• The graph model will be useful for future analytics -> Big Data
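To put the size comparison in numbers, taking 50 paths and 500 segments as illustrative values for "10s" and "100s" (they are not figures from the talk):

```latex
\underbrace{50 \times 500}_{\text{paths} \times \text{segments}} = 2.5 \times 10^{4}
\qquad\text{vs.}\qquad
(5 \times 10^{5}) \times (5 \times 10^{5}) = 2.5 \times 10^{11},
\qquad
\frac{2.5 \times 10^{11}}{2.5 \times 10^{4}} = 10^{7}
```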
Useful Links
• Neo4j Internals: slideshare.net/thobe/an-overview-of-neo4j-internals
• Aseem’s Lessons Learned with Neo4j: http://aseemk.com/talks/neo4j-lessons-learned#/14
• Wes Freeman, Neo4j Internals: http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf
• MongoConnector: blog.mongodb.org/post/29127828146/introducing-mongo-connector

 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Wanderu - Lessons from Building a Travel Site with Neo4j

  • 1. Wanderu: Lessons Learned • Lessons Learned and Unlearned from Building a Travel Site with Graphs and Neo4j • Eddy Wong, CTO, Wanderu.com, @eddywongch
  • 2. About Wanderu.com Search Engine for (Intercity) Buses and Trains
  • 4. From Point A to Point B • A: Boston; B: DC; NYC in between • Nomenclature: Stations, Trips • Amtrak, $101, 09/26/2013 • Bolt, $25, 09/26/2013 • Mega, $24, 09/26/2013
  • 5. From Point A to Point B • A: Cambridge, MA; B: Brooklyn, NY • Stations shown: South Station, Boston; 31st St & 9th Ave, NYC; 28th St & 7th Ave, NYC; 34th St & 8th Ave, NYC
  • 6. Our Story • Tech work started a little over a year ago • Beta in March, launch in August • Knew nothing about Neo4j when we started (June 2012) • Did not like the relational model: wanted schema-less storage and no self-joins • Wanted a graph model
  • 10. Our Story • Started with MongoDB as a general store: easy to manipulate and organize data • Wanted a DB that could preserve the graph model • Debated: document vs. graph • Could not find a single DB that could do both: general store + graph
  • 11. Workflow • Bus Websites → Scraping (JSON) → Non-uniform Data → Uniform Data → Store → Server
  • 12. NoSQL • You need to choose one NoSQL database • You need ONE (centralized) database • The word “database” is a loaded term • Lots of (very different) NoSQL database options
  • 13. Our Situation • Data is written in only one direction • Users search for paths, then segments • Searches are done by date • Needed online capability • Trip info (price/availability) could change on some
  • 14. Our Solution • Use Both: MongoDB + Neo4j • “Docugraph” = Document + Graph • Syncing two kinds of databases • Eventual consistency
  • 15. Pipeline • Bus Websites → Scraping (JSON) → Non-uniform Data → Uniform Data → MongoDB → Mongo Conn (Replica Mechanism) → Nodes & Edges → Neo4j
  • 16. MongoConnector • MongoDB Lab project, open source, unsupported • Uses the replication mechanism: the oplog • Eventually consistent (not real time) • Written in Python • Main methods: upsert and delete, each passed the document • Implementation: DocMgr → Neo4jDocMgr → py2neo • Other implementations: MongoDocMgr, SolrDocMgr, ESDocMgr
  • 17. Populating Neo4j (2) • Created our own way of creating edges • Nodes are created automatically when an edge is created, so stations (nodes) can be added on the fly • py2neo requires two node references to create an edge, i.e., it might need two round trips to Neo4j
  • 18. Edge Creator (pseudocode) • hashtable allStations = load_stations • w_create_edge(station_id a, station_id b, otherdata): look up a in allStations; if found, ref_a = allStations.get(a); if not found, ref_a = py2neo.create_node(a) and add a to allStations; ... py2neo.create_edge(ref_a, ref_b, ...)
  • 19. Pipeline • Bus Websites → Scraping (JSON) → Non-uniform Data → MongoDB → Mongo Conn (Replica Mechanism) → Nodes & Edges → Neo4j → REST Server • Example edges: (BOS, NYC), (BOS, PHL), (NYC, DC), (NYC, PHL)
  • 21. Our Story • We tried to “dump” all data into Neo4j • Stations → nodes, trips → edges • Problem: edges carried dates, so there were too many edges, creating a “super node” • Query performance was terrible (1+ minutes) and got worse as the number of edges increased
  • 22. Our Story (2) • Went from Cypher to Gremlin, thinking that would improve performance • Needed range queries on edges
  • 23. Our Solution • Don’t store everything in Neo4j, only metadata • Use Neo4j as an index • Don’t store entities in nodes, only keys • Don’t store heavy properties in edges
  • 25. Neo4j Runtime Model • Relationships are in a linked list • Properties are in a linked list • Therefore: there is NO random access to relationships or properties • A range query over relationships requires a full scan
  • 26. Our Solution (2) • Needed ability to do range queries on Edges • Serve paths from Neo4j, segments from MongoDB • The one thing we tried to avoid we ended up doing: Joins • Came up with “Docugraph” approach
  • 27. Docugraph • MongoDB Collections for Nodes and Edges • Neo4j: Only keys for nodes • Neo4j: Only Properties relevant for queries
  • 28. Nodes & Edges • Collection for Stations (nodes) {id: “BOS”, name: “Boston South Station”, address: “Summer St”, ...} • Collection for Trips (edges) {depart_id: “BOS”, arrive_id: “NYC”, carrier: “Megabus”, price: 24.0, ...}
  • 29. Modeling • Storing info in two or more dbs • Doing a “join” across multiple dbs
  • 30. Joins across DBs • MongoDB Stations ↔ Neo4j Nodes: BOS, NYC, DC, ... • MongoDB Trips ↔ Neo4j Edges: BOS-NYC, BOS-DC, NYC-DC, ... • Forget sequential IDs generated by the DBs • Use a human-created long string as the ID • Convert each station pair into an ID: depart-arrive • For example: BOS-NYC
  • 31. Indexing Technique • Index Trips by {origin-dest, datetime}
  • 32. Querying • REST API in Node.js • Assemble results from two sources • Paths from Neo4j • Segments from MongoDB • Sort by price, duration
  • 34. Our Story • Wanted to mix public transport data with intercity data • Did not want to host all public transport data • Created a hybrid solution
  • 35. Our Solution • Hybrid: • Google Autocomplete • Google Maps • In-house station geo lookup
  • 36. Geo • Neo4j geo functionality was not available out of the box • Required a JAR install • Had to run a Java program to build the index • Needed better documentation • Ended up using MongoDB geo instead • Make geo functionality available out of the box
  • 37. Conclusions • Even with a join across DBs, the solution is better than relational • 10s of paths × 100s of segments vs. 500k × 500k • Glad to have picked Neo4j: doing content generation and more geo features now • The graph model will be useful for future analytics and Big Data
  • 38. Useful Links • Neo4j Internals: slideshare.net/thobe/an-overview-of-neo4j-internals • Aseem’s Lessons Learned with Neo4j: http://aseemk.com/talks/neo4j-lessons-learned#/14 • Wes Freeman, Neo4j Internals: http://wes.skeweredrook.com/graphdb-meetup-may-2013.pdf • MongoConnector: blog.mongodb.org/post/29127828146/introducing-mongo-connector
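Slide 16 describes a doc manager that receives upserts and deletes from MongoDB's oplog and mirrors them into Neo4j. The sketch below illustrates that idea under loud assumptions: `InMemoryGraph`, `Neo4jDocManagerSketch` and `replay` are hypothetical names, the graph is an in-memory stand-in for Neo4j, and the events use a simplified, oplog-like shape (`op` of `"i"`, `"u"` or `"d"`, `ns` as `"db.collection"`, the full document in `o`). It is not mongo-connector's actual DocManager interface.

```python
from typing import Any, Dict, Iterable, Set, Tuple


class InMemoryGraph:
    """Hypothetical stand-in for Neo4j: station ids as nodes, one edge per
    (depart_id, arrive_id) pair holding the ids of the trips behind it."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: Dict[Tuple[str, str], Set[Any]] = {}


class Neo4jDocManagerSketch:
    """Hypothetical doc manager mirroring MongoDB documents into the graph.
    Only echoes mongo-connector's upsert/remove idea, not its real interface."""

    def __init__(self, graph: InMemoryGraph) -> None:
        self.graph = graph

    def upsert(self, collection: str, doc: Dict[str, Any]) -> None:
        if collection == "stations":
            # Only the key goes to the graph; the full document stays in MongoDB.
            self.graph.nodes.add(doc["id"])
        elif collection == "trips":
            a, b = doc["depart_id"], doc["arrive_id"]
            # Endpoint nodes are created on the fly if they are missing.
            self.graph.nodes.update((a, b))
            # Re-upserting the same trip is idempotent because ids go in a set.
            self.graph.edges.setdefault((a, b), set()).add(doc["_id"])

    def remove(self, collection: str, doc: Dict[str, Any]) -> None:
        if collection == "stations":
            sid = doc["id"]
            self.graph.nodes.discard(sid)
            # Drop every edge that touches the removed station.
            for pair in [p for p in self.graph.edges if sid in p]:
                del self.graph.edges[pair]
        elif collection == "trips":
            pair = (doc["depart_id"], doc["arrive_id"])
            trip_ids = self.graph.edges.get(pair)
            if trip_ids is not None:
                trip_ids.discard(doc["_id"])
                if not trip_ids:  # last trip for this pair is gone
                    del self.graph.edges[pair]


def replay(events: Iterable[Dict[str, Any]], manager: Neo4jDocManagerSketch) -> None:
    """Apply simplified oplog-like events: 'i'/'u' upsert, 'd' removes.
    Assumes each event carries the full document in 'o' (real oplog
    deletes and updates carry less than that)."""
    for event in events:
        collection = event["ns"].split(".", 1)[1]  # "db.collection" -> "collection"
        if event["op"] in ("i", "u"):
            manager.upsert(collection, event["o"])
        elif event["op"] == "d":
            manager.remove(collection, event["o"])
```

Keeping a set of trip ids per pair makes re-delivered updates harmless, which matters under eventual consistency. A trip whose endpoints change would also need its old pair cleaned up; the sketch ignores that case.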
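Slide 18's pseudocode caches node references so an edge can be created without first fetching both endpoint nodes. A runnable sketch of that caching, assuming a hypothetical `FakeGraphClient` that counts each call as one round trip; its `create_node`/`create_edge` names mirror the pseudocode, not py2neo's real API:

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeGraphClient:
    """Hypothetical stand-in for a remote graph client: every call counts as
    one round trip. Method names are illustrative, not py2neo's actual API."""

    def __init__(self) -> None:
        self.round_trips = 0
        self._next_ref = 0
        self.edges: List[Tuple[int, int, Dict[str, Any]]] = []

    def create_node(self, station_id: str) -> int:
        self.round_trips += 1
        self._next_ref += 1
        return self._next_ref  # opaque node reference

    def create_edge(self, ref_a: int, ref_b: int, props: Dict[str, Any]) -> None:
        self.round_trips += 1
        self.edges.append((ref_a, ref_b, props))


class EdgeCreator:
    def __init__(self, client: FakeGraphClient,
                 known_stations: Optional[Dict[str, int]] = None) -> None:
        self.client = client
        # The slide's allStations hashtable: station id -> node reference.
        self.all_stations: Dict[str, int] = dict(known_stations or {})

    def _ref(self, station_id: str) -> int:
        ref = self.all_stations.get(station_id)
        if ref is None:
            # Unknown station: create the node once and cache its reference.
            ref = self.client.create_node(station_id)
            self.all_stations[station_id] = ref
        return ref

    def w_create_edge(self, a: str, b: str, **other_data: Any) -> None:
        # Both endpoints resolve from the cache when possible, so a known
        # pair costs a single round trip (the edge creation itself).
        self.client.create_edge(self._ref(a), self._ref(b), other_data)
```

Passing a preloaded `known_stations` dict plays the role of the slide's `load_stations`; once both stations are cached, each new edge costs one round trip instead of up to three.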
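Slides 21 and 23 contrast storing every dated trip as an edge with using Neo4j as an index of key-only edges. The functions below are hypothetical; any counts you run through them (for example three carriers with a daily departure for a year) are made-up illustrations of how the first approach turns a busy station into a super node, not figures from the talk.

```python
from typing import Dict, Iterable, List, Set, Tuple

Trip = Dict[str, str]


def dated_edges(trips: Iterable[Trip]) -> List[Tuple[str, str, str]]:
    """First attempt: one edge per trip, carrying its date as a property."""
    return [(t["depart_id"], t["arrive_id"], t["date"]) for t in trips]


def index_edges(trips: Iterable[Trip]) -> Set[Tuple[str, str]]:
    """Neo4j as an index: one key-only edge per station pair. Dates, prices
    and other heavy properties stay in the document store."""
    return {(t["depart_id"], t["arrive_id"]) for t in trips}


def degree(edges: Iterable[Tuple[str, ...]], station: str) -> int:
    """Number of edges touching a station (how 'super' the node is)."""
    return sum(1 for e in edges if station in e[:2])
```

With dated edges, a station's degree grows with every day and carrier added; with index edges it only grows with the number of distinct destinations.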
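Slide 25 explains why range queries on edges were slow: a node's relationships are reached by following a chain, so there is no way to jump straight to a date. The toy model below (hypothetical `Rel`, `build_chain`, `range_query`; a simplification, not Neo4j's real storage layout) counts how many records a date-range query has to visit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Rel:
    """Toy relationship record: a date property plus a pointer to the next
    relationship of the same node. Not Neo4j's on-disk format."""
    date: str  # ISO date string, so string order matches date order
    target: str
    next: Optional["Rel"] = None


def build_chain(rels: Iterable[Tuple[str, str]]) -> Optional[Rel]:
    """Link (date, target) pairs into a chain that preserves their order."""
    head: Optional[Rel] = None
    for date, target in reversed(list(rels)):
        head = Rel(date, target, head)
    return head


def range_query(head: Optional[Rel], start: str, end: str) -> Tuple[List[Rel], int]:
    """Return relationships with start <= date <= end, plus how many records
    were visited. With only `next` pointers there is no way to jump to
    `start`, so every record in the chain is read."""
    hits: List[Rel] = []
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if start <= node.date <= end:
            hits.append(node)
        node = node.next
    return hits, visited
```

Even a one-day query visits every record, which is why the cost grew with the number of dated edges on slide 21.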
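Slide 30's cross-database join relies on both stores sharing human-readable ids such as `BOS-NYC`. A minimal sketch of deriving those ids from a graph path and looking up the matching trip documents; the function names are hypothetical and the dict `trips_by_edge` stands in for a MongoDB query on the pair id:

```python
from typing import Dict, List


def edge_id(depart_id: str, arrive_id: str) -> str:
    """Human-readable pair id shared by both stores, e.g. 'BOS-NYC'.
    Assumes station ids never contain '-' themselves."""
    return f"{depart_id}-{arrive_id}"


def path_edge_ids(path: List[str]) -> List[str]:
    """Turn a station path returned by the graph into the ids of its legs."""
    return [edge_id(a, b) for a, b in zip(path, path[1:])]


def join_path(path: List[str], trips_by_edge: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """The cross-DB 'join': fetch the trip documents stored under each leg id."""
    return {eid: trips_by_edge.get(eid, []) for eid in path_edge_ids(path)}
```

Because neither database generates the id, either store can be rebuilt or resynced without breaking the join.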
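Slide 31 indexes trips by {origin-dest, datetime}. In MongoDB that would roughly be a compound index, for example with PyMongo `trips.create_index([("origin_dest", 1), ("datetime", 1)])`, where the field names are assumptions. The in-memory sketch below (hypothetical `CompoundIndex`) shows why the key order matters: with entries sorted by (pair, datetime), one pair's trips in a date range form a single contiguous slice found with two binary searches.

```python
import bisect
from typing import List, Tuple


class CompoundIndex:
    """In-memory stand-in for an index on (origin_dest, datetime).
    Datetimes are ISO-8601 strings, so string order equals time order."""

    def __init__(self) -> None:
        self._keys: List[Tuple[str, str]] = []  # sorted (origin_dest, datetime)
        self._ids: List[str] = []               # trip ids, parallel to _keys

    def add(self, origin_dest: str, when: str, trip_id: str) -> None:
        # Insert while keeping both lists sorted by the compound key.
        i = bisect.bisect_right(self._keys, (origin_dest, when))
        self._keys.insert(i, (origin_dest, when))
        self._ids.insert(i, trip_id)

    def find_range(self, origin_dest: str, start: str, end: str) -> List[str]:
        """Trip ids for one pair with start <= datetime <= end."""
        lo = bisect.bisect_left(self._keys, (origin_dest, start))
        hi = bisect.bisect_right(self._keys, (origin_dest, end))
        return self._ids[lo:hi]
```

Putting the pair first and the datetime second is what makes "all BOS-NYC trips on a given day" a single range rather than a scan.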
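Slide 32 assembles results from two sources: paths from Neo4j and segments from MongoDB, sorted by price and duration. A sketch of that assembly step, assuming hypothetical segment fields (`price`, and `depart`/`arrive` in minutes) and an assumed connection rule that each segment departs no earlier than the previous one arrives:

```python
from itertools import product
from typing import Any, Dict, List

Segment = Dict[str, Any]  # hypothetical fields: price, depart, arrive (minutes)


def legs(path: List[str]) -> List[str]:
    """Leg ids for a station path, using the depart-arrive convention."""
    return [f"{a}-{b}" for a, b in zip(path, path[1:])]


def assemble(paths: List[List[str]],
             segments_by_leg: Dict[str, List[Segment]]) -> List[Dict[str, Any]]:
    """Combine graph paths with document-store segments into itineraries,
    sorted by total price and then by total duration."""
    itineraries = []
    for path in paths:
        if len(path) < 2:
            continue  # a single station is not a trip
        choices = [segments_by_leg.get(leg, []) for leg in legs(path)]
        # One candidate per combination of segments; a leg with none yields nothing.
        for combo in product(*choices):
            # Assumed rule: each segment must depart after the previous one arrives.
            if any(nxt["depart"] < prev["arrive"] for prev, nxt in zip(combo, combo[1:])):
                continue
            itineraries.append({
                "path": path,
                "segments": list(combo),
                "price": sum(s["price"] for s in combo),
                "duration": combo[-1]["arrive"] - combo[0]["depart"],
            })
    itineraries.sort(key=lambda it: (it["price"], it["duration"]))
    return itineraries
```

The `product` over each path's legs is the part whose size the conclusions slide describes as 10s of paths × 100s of segments.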
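Slide 36 notes that the team ended up using MongoDB's geo features instead. A sketch of the kind of station lookup involved, assuming a hypothetical `location` field holding a GeoJSON Point with a `2dsphere` index; `$near` returns matches sorted from nearest to farthest, and GeoJSON coordinates put longitude first:

```python
from typing import Any, Dict


def near_stations_filter(lng: float, lat: float, max_meters: float) -> Dict[str, Any]:
    """Build a MongoDB filter for stations near a point, nearest first.
    Assumes a hypothetical `location` field with a GeoJSON Point and a 2dsphere index."""
    return {
        "location": {
            "$near": {
                # GeoJSON coordinates are [longitude, latitude].
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                # With a GeoJSON point, $maxDistance is in meters.
                "$maxDistance": max_meters,
            }
        }
    }


# Usage with PyMongo (needs a running server, so shown as comments):
# from pymongo import MongoClient, GEOSPHERE
# stations = MongoClient().travel.stations   # hypothetical database name
# stations.create_index([("location", GEOSPHERE)])
# nearby = list(stations.find(near_stations_filter(lng, lat, 2000)))
```

Unlike the Neo4j setup described on the slide, this needs no extra JAR or separate indexing program, only the index declaration.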
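Slide 37 compares 10s of paths × 100s of segments with 500k × 500k. Reading "10s" and "100s" as orders of magnitude, and 500k × 500k as pairing every row of two 500,000-row sets (our interpretation, not stated on the slide), the gap works out to at least six orders of magnitude:

```latex
\begin{aligned}
\text{relational:}\quad & (5\times10^{5}) \times (5\times10^{5}) = 2.5\times10^{11} \\
\text{docugraph:}\quad & \text{between } 10^{1}\times10^{2} = 10^{3} \text{ and roughly } 10^{2}\times10^{3} = 10^{5} \\
\text{ratio (at least):}\quad & \frac{2.5\times10^{11}}{10^{5}} = 2.5\times10^{6}
\end{aligned}
```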