SlideShare a Scribd company logo
1 of 60
Download to read offline
Importing data quickly
and easily
Mark Needham @markhneedham
Michael Hunger @mesirii
The data set
The data set
‣ Stack Exchange API
‣ Stack Exchange Data Dump
Our goal
Sprinkle
some magic
import dust
Stack Exchange API
{ "items": [{
"question_id": 24620768,
"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
"title": "Neo4j cypher query: get last N elements",
"answer_count": 1,
"score": 1,
.....
"creation_date": 1404771217,
"body_markdown": "I have a graph....How can I do that?",
"tags": ["neo4j", "cypher"],
"owner": {
"reputation": 815,
"user_id": 1212067,
....
"link": "http://stackoverflow.com/users/1212067/"
},
"answers": [{
"owner": {
"reputation": 488,
"user_id": 737080,
"display_name": "Chris Leishman",
....
},
"answer_id": 24620959,
"share_link": "http://stackoverflow.com/a/24620959",
....
"body_markdown": "The simplest would be to use an ... some discussion on this here:...",
"title": "Neo4j cypher query: get last N elements"
}]
}
JSON to CSV
JSON ??? CSV
LOAD
CSV
Initial Model
{ "items": [{
"question_id": 24620768,
"link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements",
"title": "Neo4j cypher query: get last N elements",
"answer_count": 1,
"score": 1,
.....
"creation_date": 1404771217,
"body_markdown": "I have a graph....How can I do that?",
"tags": ["neo4j", "cypher"],
"owner": {
"reputation": 815,
"user_id": 1212067,
....
"link": "http://stackoverflow.com/users/1212067/"
},
"answers": [{
"owner": {
"reputation": 488,
"user_id": 737080,
"display_name": "Chris Leishman",
....
},
"answer_id": 24620959,
"share_link": "http://stackoverflow.com/a/24620959",
....
"body_markdown": "The simplest would be to use an ... some discussion on this here:...",
"title": "Neo4j cypher query: get last N elements"
}]
}
jq: Converting JSON to CSV
jq: Converting questions to CSV
jq -r '.[] | .items[] |
[.question_id,
.title,
.up_vote_count,
.down_vote_count,
.creation_date,
.last_activity_date,
.owner.user_id,
.owner.display_name,
(.tags | join(";"))] |
@csv ' so.json
jq: Converting questions to CSV
$ head -n5 questions.csv
question_id,title,up_vote_count,down_vote_count,creation_date,last_activity_date,
owner_user_id,owner_display_name,tags
33023306,"How to delete multiple nodes by specific ID using Cypher",
0,0,1444328760,1444332194,260511,"rayman","jdbc;neo4j;cypher;spring-data-neo4j"
33020796,"How do a general search across string properties in my nodes?",
1,0,1444320356,1444324015,1429542,"osazuwa","ruby-on-rails;neo4j;neo4j.rb"
33018818,"Neo4j match nodes related to all nodes in collection",
0,0,1444314877,1444332779,1212463,"lmazgon","neo4j;cypher"
33018084,"Problems upgrading to Spring Data Neo4j 4.0.0",
0,0,1444312993,1444312993,1528942,"Grégoire Colbert","neo4j;spring-data-neo4j"
jq: Converting answers to CSV
jq -r '.[] | .items[] |
{ question_id: .question_id, answer: .answers[]? } |
[.question_id,
.answer.answer_id,
.answer.title,
.answer.owner.user_id,
.answer.owner.display_name,
(.answer.tags | join(";")),
.answer.up_vote_count,
.answer.down_vote_count] |
@csv'
jq: Converting answers to CSV
$ head -n5 answers.csv
question_id,answer_id,answer_title,owner_id,owner_display_name,tags,up_vote_count,
down_vote_count
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher",
3248864,"FylmTM","",0,0
33020796,33021958,"How do a general search across string properties in my nodes?",
2920686,"FrobberOfBits","",0,0
33018818,33020068,"Neo4j match nodes related to all nodes in collection",158701,"
Stefan Armbruster","",0,0
33018818,33024273,"Neo4j match nodes related to all nodes in collection",974731,"
cybersam","",0,0
Time to import into Neo4j...
Introducing Cypher
‣ The Graph Query Language
‣ Declarative language (think SQL) for graphs
‣ ASCII art based
‣ CREATE create a new pattern in the graph
Cypher primer
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
‣ CREATE create a new pattern in the graph
Cypher primer
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
CREATE (user:User {name:"Michael Hunger"})
Node
‣ CREATE create a new pattern in the graph
Cypher primer
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
CREATE (user:User {name:"Michael Hunger"})
Label
‣ CREATE create a new pattern in the graph
Cypher primer
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
CREATE (user:User {name:"Michael Hunger"})
Property
‣ CREATE create a new pattern in the graph
Cypher primer
CREATE (user:User {name:"Michael Hunger"})
CREATE (question:Question {title: "..."})
CREATE (answer:Answer {text: "..."})
CREATE (user)-[:PROVIDED]->(answer)
CREATE (answer)-[:ANSWERS]->(question);
CREATE (user)-[:PROVIDED]->(answer)
Relationship
‣ MATCH find a pattern in the graph
Cypher primer
MATCH (answer:Answer)<-[:PROVIDED]-(user:User),
(answer)-[:ANSWERS]->(question)
WHERE user.display_name = "Michael Hunger"
RETURN question, answer;
‣ MERGE find pattern if it exists,
create it if it doesn’t
MERGE (user:User {name:"Mark Needham"})
MERGE (question:Question {title: "..."})
MERGE (answer:Answer {text: "..."})
MERGE (user)-[:PROVIDED]->(answer)
MERGE (answer)-[:ANSWERS]->(question);
Cypher primer
Import using LOAD CSV
‣ LOAD CSV iterates CSV files applying the
provided query line by line
LOAD CSV [WITH HEADERS] FROM [URI/File path]
AS row
CREATE ...
MERGE ...
MATCH ...
LOAD CSV: The naive version
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
MERGE (question:Question {
id:row.question_id,
title: row.title,
up_vote_count: row.up_vote_count,
creation_date: row.creation_date})
MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name})
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
Tip: Start with a sample
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {
id:row.question_id,
title: row.title,
up_vote_count: row.up_vote_count,
creation_date: row.creation_date})
MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name})
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
Tip: MERGE on a key
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date
MERGE (owner:User {id:row.owner_user_id})
ON CREATE SET owner.display_name = row.owner_display_name
MERGE (owner)-[:ASKED]->(question)
FOREACH (tagName IN split(row.tags, ";") |
MERGE (tag:Tag {name:tagName})
MERGE (question)-[:TAGGED]->(tag));
Tip: Index/Constrain those keys
CREATE INDEX ON :Label(property);
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE;
Tip: Index those keys
CREATE INDEX ON :Label(property);
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE;
CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE;
CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE;
CREATE INDEX ON :Question(title);
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date;
Tip: One MERGE per statement
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MERGE (owner:User {id:row.owner_user_id})
ON CREATE SET owner.display_name = row.owner_display_name;
Tip: One MERGE per statement
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
MATCH (question:Question {id:row.question_id})
MATCH (owner:User {id:row.owner_user_id})
MERGE (owner)-[:ASKED]->(question);
Tip: One MERGE per statement
Tip: Use DISTINCT
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
WITH row LIMIT 100
UNWIND split(row.tags, ";") AS tag
WITH distinct tag
MERGE (:Tag {name: tag});
Tip: Use periodic commit
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "questions.csv" AS row
MERGE (question:Question {id:row.question_id})
ON CREATE SET question.title = row.title,
question.up_vote_count = row.up_vote_count,
question.creation_date = row.creation_date;
Periodic commit
‣ Neo4j keeps all transaction state in memory
which is problematic for large CSV files
Periodic commit
‣ Neo4j keeps all transaction state in memory
which is problematic for large CSV files
‣ USING PERIODIC COMMIT flushes the
transaction after a certain number of rows
Periodic commit
‣ Neo4j keeps all transaction state in memory
which is problematic for large CSV files
‣ USING PERIODIC COMMIT flushes the
transaction after a certain number of rows
‣ Currently only works with LOAD CSV
Tip: Script your import commands
Tip: Use neo4j-shell to load script
$ ./neo4j-enterprise-3.0.0-RC1/bin/neo4j-shell
-file import.cql [-path so.db]
LOAD CSV: Summary
‣ ETL power tool
‣ Built into Neo4J since version 2.1
‣ Can load data from any URL
‣ Good for medium size data (up to 10M rows)
Procedures
‣ Neo4j 3.0.0 sees the introduction of
procedures
Procedures
‣ Neo4j 3.0.0 sees the introduction of
procedures
‣ Procedures allow you to write your own
custom code and call it from Cypher
We’ve got a present for you...
Awesome Procedures (apoc)
https://github.com/neo4j-contrib/neo4j-apoc-procedures
Introducing LOAD JSON
CALL dbms.procedures()
YIELD name AS name, signature AS signature
WITH name, signature
WHERE name = "apoc.load.json"
RETURN name, signature
+==============+=================================================+
|name |signature |
+==============+=================================================+
|apoc.load.json|apoc.load.json(url :: STRING?) :: (value :: MAP?)|
+--------------+-------------------------------------------------+
CALL apoc.load.json("http://api.stackexchange.com/2.2/questions?
pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow") YIELD value
UNWIND value.items AS row
RETURN row.question_id, row.title
╒═══════════════╤═══════════════════════════════════════════════════════════════════════════════════════
═════════════╕
│row.question_id│row.title │
╞═══════════════╪═══════════════════════════════════════════════════════════════════════════════════════
═════════════╡
│36819705 │Searching nth-level friends in neo4j.rb in bi-directional relationship graph │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────
─────────────┤
│36816608 │Copy Neo4J database using Python │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────
─────────────┤
│36815876 │Deploying neo4j-server to cloud foundry. │
├───────────────┼───────────────────────────────────────────────────────────────────────────────────────
LOAD JSON
CALL apoc.load.json("http://api.stackexchange.com/2.2/questions?
pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow") YIELD value
UNWIND value.items AS row
MERGE (question:Question {id: row.question_id})
ON CREATE SET question.title = row.title
...
LOAD JSON
Even more procedures...
NoSQL Polyglot Persistence: Tools and
Integrations
William Lyon, Neo Technology
4.30pm - 5.00pm in here
Bulk loading an initial data set
Bulk loading an initial data set
‣ Introducing the Neo4j Import Tool
Bulk loading an initial data set
‣ Introducing the Neo4j Import Tool
‣ Used to large sized initial data sets
Bulk loading an initial data set
‣ Introducing the Neo4j Import Tool
‣ Used to large sized initial data sets
‣ Skips the transactional layer of Neo4j and
concurrently writes store files directly
Importing into Neo4j
:ID(Crime) :LABEL description
export NEO=neo4j-enterprise-3.0.0
$NEO/bin/neo4j-import 
--into stackoverflow.db 
--id-type string 
--nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz 
--nodes:User extracted/Users_header.csv,extracted/Users.csv.gz 
--nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz 
--relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz 
--relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz
--relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz 
--relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz
Expects files in a certain format
:ID(Crime) :LABEL descriptionpostId:ID(Post) title body
Nodes
userId:ID(User) displayname views
Rels
:START_ID(User) :END_ID(Post)
<?xml version="1.0" encoding="utf-16"?>
<posts>
...
<row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667"
Score="358" ViewCount="24247" Body="..." OwnerUserId="8" LastEditorUserId="451518"
LastEditorDisplayName="Rich B" LastEditDate="2014-07-28T10:02:50.557" LastActivityDate="
2015-08-01T12:55:11.380" Title="When setting a form's opacity should I use a decimal or
double?" Tags="&lt;c#&gt;&lt;winforms&gt;&lt;type-conversion&gt;&lt;opacity&gt;"
AnswerCount="13" CommentCount="1" FavoriteCount="28" CommunityOwnedDate="2012-10-31T16:42:
47.213" />
...
</posts>
What do we have?
<posts>
...
<row Id="4" PostTypeId="1"
AcceptedAnswerId="7" CreationDate="
2008-07-31T21:42:52.667" Score="358"
ViewCount="24247" Body="..."
OwnerUserId="8" LastEditorUserId="
451518" LastEditorDisplayName="Rich
B" LastEditDate="2014-07-28T10:02:
50.557" LastActivityDate="2015-08-
01T12:55:11.380" Title="When setting
a form's opacity should I use a
decimal or double?"
XML to CSV
Java program
The generated files
$ cat extracted/Posts_header.csv
"postId:ID(Post)","title","postType:INT",
"createdAt","score:INT","views:INT","answers:INT",
"comments:INT","favorites:INT","updatedAt"
The generated files
$ gzcat extracted/Posts.csv.gz | head -n2
"4","When setting a forms opacity should I use a
decimal or double?","1","2008-07-31T21:42:52.667","
358","24247","13","1","28","2014-07-28T10:02:50.557"
"6","Why doesn’t the percentage width child in
absolutely positioned parent work?","1","2008-07-
31T22:08:08.620","156","11840","5","0","7","2015-04-
26T14:37:49.673"
The generated files
$ cat extracted/PostsRels_header.csv
":START_ID(Post)",":END_ID(Post)"
Importing into Neo4j
:ID(Crime) :LABEL description
export NEO=neo4j-enterprise-3.0.0
$NEO/bin/neo4j-import 
--into stackoverflow.db 
--id-type string 
--nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz 
--nodes:User extracted/Users_header.csv,extracted/Users.csv.gz 
--nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz 
--relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz 
--relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz
--relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz 
--relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz 
IMPORT DONE in 3m 10s 661ms. Imported:
31138574 nodes
77930024 relationships
218106346 properties
Tip: Make sure your data is clean
:ID(Crime) :LABEL description
‣ Use a consistent line break style
‣ Ensure headers are consistent with data
‣ Quote Special characters
‣ Escape stray quotes
‣ Remove non-text characters
Even more tips
:ID(Crime) :LABEL description
‣ Get the fastest disk you can
‣ Use separate disk for input and output
‣ Compress your CSV files
‣ The more cores the better
‣ Separate headers from data
The End
‣ https://github.com/mdamien/stackoverflow-neo4j
‣ http://neo4j.com/blog/import-10m-stack-overflow-questions/
‣ http://neo4j.com/blog/cypher-load-json-from-url/
‣ https://github.com/neo4j-contrib/neo4j-apoc-procedures
Michael Hunger @mesirii
Mark Needham @markhneedham

More Related Content

What's hot

NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices -  Michael HacksteinNoSQL meets Microservices -  Michael Hackstein
NoSQL meets Microservices - Michael Hacksteindistributed matters
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Neo4j
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...MongoDB
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013Server Density
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityCurtis Mosters
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBWilliam Candillon
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQLLuca Garulli
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationJoe Drumgoole
 
Html5 drupal7 with mandakini kumari(1)
Html5 drupal7 with mandakini kumari(1)Html5 drupal7 with mandakini kumari(1)
Html5 drupal7 with mandakini kumari(1)Mandakini Kumari
 
Benchx: An XQuery benchmarking web application
Benchx: An XQuery benchmarking web application Benchx: An XQuery benchmarking web application
Benchx: An XQuery benchmarking web application Andy Bunce
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryWilliam Candillon
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 
Webinar: From Relational Databases to MongoDB - What You Need to Know
Webinar: From Relational Databases to MongoDB - What You Need to KnowWebinar: From Relational Databases to MongoDB - What You Need to Know
Webinar: From Relational Databases to MongoDB - What You Need to KnowMongoDB
 
From sql server to mongo db
From sql server to mongo dbFrom sql server to mongo db
From sql server to mongo dbRyan Hoffman
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkTyler Brock
 

What's hot (20)

NoSQL meets Microservices - Michael Hackstein
NoSQL meets Microservices -  Michael HacksteinNoSQL meets Microservices -  Michael Hackstein
NoSQL meets Microservices - Michael Hackstein
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture
 
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
Benefits of Using MongoDB Over RDBMS (At An Evening with MongoDB Minneapolis ...
 
NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013NoSQL Infrastructure - Late 2013
NoSQL Infrastructure - Late 2013
 
OrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionalityOrientDB vs Neo4j - Comparison of query/speed/functionality
OrientDB vs Neo4j - Comparison of query/speed/functionality
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
 
Scalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDBScalable XQuery Processing with Zorba on top of MongoDB
Scalable XQuery Processing with Zorba on top of MongoDB
 
OrientDB introduction - NoSQL
OrientDB introduction - NoSQLOrientDB introduction - NoSQL
OrientDB introduction - NoSQL
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Html5 drupal7 with mandakini kumari(1)
Html5 drupal7 with mandakini kumari(1)Html5 drupal7 with mandakini kumari(1)
Html5 drupal7 with mandakini kumari(1)
 
Benchx: An XQuery benchmarking web application
Benchx: An XQuery benchmarking web application Benchx: An XQuery benchmarking web application
Benchx: An XQuery benchmarking web application
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Cutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQueryCutting Edge Data Processing with PHP & XQuery
Cutting Edge Data Processing with PHP & XQuery
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Webinar: From Relational Databases to MongoDB - What You Need to Know
Webinar: From Relational Databases to MongoDB - What You Need to KnowWebinar: From Relational Databases to MongoDB - What You Need to Know
Webinar: From Relational Databases to MongoDB - What You Need to Know
 
From sql server to mongo db
From sql server to mongo dbFrom sql server to mongo db
From sql server to mongo db
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 

Similar to GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger

Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyMark Needham
 
Scalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンScalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンHiroshi Ito
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.GeeksLab Odessa
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDBrogerbodamer
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practiceJano Suchal
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWaveData Works MD
 
The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202Mahmoud Samir Fayed
 
Cassandra for Python Developers
Cassandra for Python DevelopersCassandra for Python Developers
Cassandra for Python DevelopersTyler Hobbs
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchPedro Franceschi
 
Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaCassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaDataStax Academy
 
Cassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsCassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsChristopher Batey
 
Boost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringBoost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringMiro Wengner
 
EWD 3 Training Course Part 26: Event-driven Indexing
EWD 3 Training Course Part 26: Event-driven IndexingEWD 3 Training Course Part 26: Event-driven Indexing
EWD 3 Training Course Part 26: Event-driven IndexingRob Tweed
 
Database queries
Database queriesDatabase queries
Database querieslaiba29012
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181Mahmoud Samir Fayed
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3Eric Evans
 
Python Code Camp for Professionals 4/4
Python Code Camp for Professionals 4/4Python Code Camp for Professionals 4/4
Python Code Camp for Professionals 4/4DEVCON
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 

Similar to GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger (20)

Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
 
Scalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーンScalaで実装してみる簡易ブロックチェーン
Scalaで実装してみる簡易ブロックチェーン
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
 
PostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practicePostgreSQL: Advanced features in practice
PostgreSQL: Advanced features in practice
 
Introducing DataWave
Introducing DataWaveIntroducing DataWave
Introducing DataWave
 
DataMapper
DataMapperDataMapper
DataMapper
 
The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202The Ring programming language version 1.8 book - Part 50 of 202
The Ring programming language version 1.8 book - Part 50 of 202
 
Cassandra for Python Developers
Cassandra for Python DevelopersCassandra for Python Developers
Cassandra for Python Developers
 
Fazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearchFazendo mágica com ElasticSearch
Fazendo mágica com ElasticSearch
 
Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and JavaCassandra Day London 2015: Getting Started with Apache Cassandra and Java
Cassandra Day London 2015: Getting Started with Apache Cassandra and Java
 
Cassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsCassandra Day London: Building Java Applications
Cassandra Day London: Building Java Applications
 
Boost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineeringBoost delivery stream with code discipline engineering
Boost delivery stream with code discipline engineering
 
EWD 3 Training Course Part 26: Event-driven Indexing
EWD 3 Training Course Part 26: Event-driven IndexingEWD 3 Training Course Part 26: Event-driven Indexing
EWD 3 Training Course Part 26: Event-driven Indexing
 
Php summary
Php summaryPhp summary
Php summary
 
Database queries
Database queriesDatabase queries
Database queries
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3
 
Python Code Camp for Professionals 4/4
Python Code Camp for Professionals 4/4Python Code Camp for Professionals 4/4
Python Code Camp for Professionals 4/4
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 

More from Neo4j

From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxNeo4j
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNeo4j
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 

More from Neo4j (20)

From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
 
Novo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMsNovo Nordisk: When Knowledge Graphs meet LLMs
Novo Nordisk: When Knowledge Graphs meet LLMs
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 

Recently uploaded

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Recently uploaded (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger

  • 1. Importing data quickly and easily Mark Needham @markhneedham Michael Hunger @mesirii
  • 3. The data set ‣ Stack Exchange API ‣ Stack Exchange Data Dump
  • 5. Stack Exchange API { "items": [{ "question_id": 24620768, "link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements", "title": "Neo4j cypher query: get last N elements", "answer_count": 1, "score": 1, ..... "creation_date": 1404771217, "body_markdown": "I have a graph....How can I do that?", "tags": ["neo4j", "cypher"], "owner": { "reputation": 815, "user_id": 1212067, .... "link": "http://stackoverflow.com/users/1212067/" }, "answers": [{ "owner": { "reputation": 488, "user_id": 737080, "display_name": "Chris Leishman", .... }, "answer_id": 24620959, "share_link": "http://stackoverflow.com/a/24620959", .... "body_markdown": "The simplest would be to use an ... some discussion on this here:...", "title": "Neo4j cypher query: get last N elements" }] }
  • 6. JSON to CSV JSON ??? CSV LOAD CSV
  • 7. Initial Model { "items": [{ "question_id": 24620768, "link": "http://stackoverflow.com/questions/24620768/neo4j-cypher-query-get-last-n-elements", "title": "Neo4j cypher query: get last N elements", "answer_count": 1, "score": 1, ..... "creation_date": 1404771217, "body_markdown": "I have a graph....How can I do that?", "tags": ["neo4j", "cypher"], "owner": { "reputation": 815, "user_id": 1212067, .... "link": "http://stackoverflow.com/users/1212067/" }, "answers": [{ "owner": { "reputation": 488, "user_id": 737080, "display_name": "Chris Leishman", .... }, "answer_id": 24620959, "share_link": "http://stackoverflow.com/a/24620959", .... "body_markdown": "The simplest would be to use an ... some discussion on this here:...", "title": "Neo4j cypher query: get last N elements" }] }
  • 9. jq: Converting questions to CSV jq -r '.[] | .items[] | [.question_id, .title, .up_vote_count, .down_vote_count, .creation_date, .last_activity_date, .owner.user_id, .owner.display_name, (.tags | join(";"))] | @csv ' so.json
  • 10. jq: Converting questions to CSV $ head -n5 questions.csv question_id,title,up_vote_count,down_vote_count,creation_date,last_activity_date, owner_user_id,owner_display_name,tags 33023306,"How to delete multiple nodes by specific ID using Cypher", 0,0,1444328760,1444332194,260511,"rayman","jdbc;neo4j;cypher;spring-data-neo4j" 33020796,"How do a general search across string properties in my nodes?", 1,0,1444320356,1444324015,1429542,"osazuwa","ruby-on-rails;neo4j;neo4j.rb" 33018818,"Neo4j match nodes related to all nodes in collection", 0,0,1444314877,1444332779,1212463,"lmazgon","neo4j;cypher" 33018084,"Problems upgrading to Spring Data Neo4j 4.0.0", 0,0,1444312993,1444312993,1528942,"Gr&#233;goire Colbert","neo4j;spring-data-neo4j"
  • 11. jq: Converting answers to CSV jq -r '.[] | .items[] | { question_id: .question_id, answer: .answers[]? } | [.question_id, .answer.answer_id, .answer.title, .answer.owner.user_id, .answer.owner.display_name, (.answer.tags | join(";")), .answer.up_vote_count, .answer.down_vote_count] | @csv'
  • 12. jq: Converting answers to CSV $ head -n5 answers.csv question_id,answer_id,answer_title,owner_id,owner_display_name,tags,up_vote_count, down_vote_count 33023306,33024189,"How to delete multiple nodes by specific ID using Cypher", 3248864,"FylmTM","",0,0 33020796,33021958,"How do a general search across string properties in my nodes?", 2920686,"FrobberOfBits","",0,0 33018818,33020068,"Neo4j match nodes related to all nodes in collection",158701," Stefan Armbruster","",0,0 33018818,33024273,"Neo4j match nodes related to all nodes in collection",974731," cybersam","",0,0
  • 13. Time to import into Neo4j...
  • 14. Introducing Cypher ‣ The Graph Query Language ‣ Declarative language (think SQL) for graphs ‣ ASCII art based
  • 15. ‣ CREATE create a new pattern in the graph Cypher primer CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question);
  • 16. ‣ CREATE create a new pattern in the graph Cypher primer CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question); CREATE (user:User {name:"Michael Hunger"}) Node
  • 17. ‣ CREATE create a new pattern in the graph Cypher primer CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question); CREATE (user:User {name:"Michael Hunger"}) Label
  • 18. ‣ CREATE create a new pattern in the graph Cypher primer CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question); CREATE (user:User {name:"Michael Hunger"}) Property
  • 19. ‣ CREATE create a new pattern in the graph Cypher primer CREATE (user:User {name:"Michael Hunger"}) CREATE (question:Question {title: "..."}) CREATE (answer:Answer {text: "..."}) CREATE (user)-[:PROVIDED]->(answer) CREATE (answer)-[:ANSWERS]->(question); CREATE (user)-[:PROVIDED]->(answer) Relationship
  • 20. ‣ MATCH find a pattern in the graph Cypher primer MATCH (answer:Answer)<-[:PROVIDED]-(user:User), (answer)-[:ANSWERS]->(question) WHERE user.display_name = "Michael Hunger" RETURN question, answer;
  • 21. ‣ MERGE find pattern if it exists, create it if it doesn’t MERGE (user:User {name:"Mark Needham"}) MERGE (question:Question {title: "..."}) MERGE (answer:Answer {text: "..."}) MERGE (user)-[:PROVIDED]->(answer) MERGE (answer)-[:ANSWERS]->(question); Cypher primer
  • 22. Import using LOAD CSV ‣ LOAD CSV iterates CSV files applying the provided query line by line LOAD CSV [WITH HEADERS] FROM [URI/File path] AS row CREATE ... MERGE ... MATCH ...
  • 23. LOAD CSV: The naive version LOAD CSV WITH HEADERS FROM "questions.csv" AS row MERGE (question:Question { id:row.question_id, title: row.title, up_vote_count: row.up_vote_count, creation_date: row.creation_date}) MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name}) MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 24. Tip: Start with a sample LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question { id:row.question_id, title: row.title, up_vote_count: row.up_vote_count, creation_date: row.creation_date}) MERGE (owner:User {id:row.owner_user_id, display_name: row.owner_display_name}) MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 25. Tip: MERGE on a key LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date MERGE (owner:User {id:row.owner_user_id}) ON CREATE SET owner.display_name = row.owner_display_name MERGE (owner)-[:ASKED]->(question) FOREACH (tagName IN split(row.tags, ";") | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag));
  • 26. Tip: Index/Constrain those keys CREATE INDEX ON :Label(property); CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE;
  • 27. Tip: Index those keys CREATE INDEX ON :Label(property); CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE; CREATE CONSTRAINT ON (q:Question) ASSERT q.id IS UNIQUE; CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE; CREATE INDEX ON :Question(title);
  • 28. LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date; Tip: One MERGE per statement
  • 29. LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MERGE (owner:User {id:row.owner_user_id}) ON CREATE SET owner.display_name = row.owner_display_name; Tip: One MERGE per statement
  • 30. LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 MATCH (question:Question {id:row.question_id}) MATCH (owner:User {id:row.owner_user_id}) MERGE (owner)-[:ASKED]->(question); Tip: One MERGE per statement
  • 31. Tip: Use DISTINCT LOAD CSV WITH HEADERS FROM "questions.csv" AS row WITH row LIMIT 100 UNWIND split(row.tags, ";") AS tag WITH distinct tag MERGE (:Tag {name: tag});
  • 32. Tip: Use periodic commit USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "questions.csv" AS row MERGE (question:Question {id:row.question_id}) ON CREATE SET question.title = row.title, question.up_vote_count = row.up_vote_count, question.creation_date = row.creation_date;
  • 33. Periodic commit ‣ Neo4j keeps all transaction state in memory which is problematic for large CSV files
  • 34. Periodic commit ‣ Neo4j keeps all transaction state in memory which is problematic for large CSV files ‣ USING PERIODIC COMMIT flushes the transaction after a certain number of rows
  • 35. Periodic commit ‣ Neo4j keeps all transaction state in memory which is problematic for large CSV files ‣ USING PERIODIC COMMIT flushes the transaction after a certain number of rows ‣ Currently only works with LOAD CSV
  • 36. Tip: Script your import commands
  • 37. Tip: Use neo4j-shell to load script $ ./neo4j-enterprise-3.0.0-RC1/bin/neo4j-shell -file import.cql [-path so.db]
  • 38. LOAD CSV: Summary ‣ ETL power tool ‣ Built into Neo4J since version 2.1 ‣ Can load data from any URL ‣ Good for medium size data (up to 10M rows)
  • 39. Procedures ‣ Neo4j 3.0.0 sees the introduction of procedures
  • 40. Procedures ‣ Neo4j 3.0.0 sees the introduction of procedures ‣ Procedures allow you to write your own custom code and call it from Cypher
  • 41. We’ve got a present for you... Awesome Procedures (apoc) https://github.com/neo4j-contrib/neo4j-apoc-procedures
  • 42. Introducing LOAD JSON CALL dbms.procedures() YIELD name AS name, signature AS signature WITH name, signature WHERE name = "apoc.load.json" RETURN name, signature +==============+=================================================+ |name |signature | +==============+=================================================+ |apoc.load.json|apoc.load.json(url :: STRING?) :: (value :: MAP?)| +--------------+-------------------------------------------------+
  • 43. CALL apoc.load.json("http://api.stackexchange.com/2.2/questions? pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow") YIELD value UNWIND value.items AS row RETURN row.question_id, row.title ╒═══════════════╤═══════════════════════════════════════════════════════════════════════════════════════ ═════════════╕ │row.question_id│row.title │ ╞═══════════════╪═══════════════════════════════════════════════════════════════════════════════════════ ═════════════╡ │36819705 │Searching nth-level friends in neo4j.rb in bi-directional relationship graph │ ├───────────────┼─────────────────────────────────────────────────────────────────────────────────────── ─────────────┤ │36816608 │Copy Neo4J database using Python │ ├───────────────┼─────────────────────────────────────────────────────────────────────────────────────── ─────────────┤ │36815876 │Deploying neo4j-server to cloud foundry. │ ├───────────────┼─────────────────────────────────────────────────────────────────────────────────────── LOAD JSON
  • 44. CALL apoc.load.json("http://api.stackexchange.com/2.2/questions? pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow") YIELD value UNWIND value.items AS row MERGE (question:Question {id: row.question_id}) ON CREATE SET question.title = row.title ... LOAD JSON
  • 45. Even more procedures... NoSQL Polyglot Persistence: Tools and Integrations William Lyon, Neo Technology 4.30pm - 5.00pm in here
  • 46. Bulk loading an initial data set
  • 47. Bulk loading an initial data set ‣ Introducing the Neo4j Import Tool
  • 48. Bulk loading an initial data set ‣ Introducing the Neo4j Import Tool ‣ Used to large sized initial data sets
  • 49. Bulk loading an initial data set ‣ Introducing the Neo4j Import Tool ‣ Used to large sized initial data sets ‣ Skips the transactional layer of Neo4j and concurrently writes store files directly
  • 50. Importing into Neo4j :ID(Crime) :LABEL description export NEO=neo4j-enterprise-3.0.0 $NEO/bin/neo4j-import --into stackoverflow.db --id-type string --nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz --nodes:User extracted/Users_header.csv,extracted/Users.csv.gz --nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz --relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz --relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz --relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz --relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz
  • 51. Expects files in a certain format :ID(Crime) :LABEL descriptionpostId:ID(Post) title body Nodes userId:ID(User) displayname views Rels :START_ID(User) :END_ID(Post)
  • 52. <?xml version="1.0" encoding="utf-16"?> <posts> ... <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate="2008-07-31T21:42:52.667" Score="358" ViewCount="24247" Body="..." OwnerUserId="8" LastEditorUserId="451518" LastEditorDisplayName="Rich B" LastEditDate="2014-07-28T10:02:50.557" LastActivityDate=" 2015-08-01T12:55:11.380" Title="When setting a form's opacity should I use a decimal or double?" Tags="&lt;c#&gt;&lt;winforms&gt;&lt;type-conversion&gt;&lt;opacity&gt;" AnswerCount="13" CommentCount="1" FavoriteCount="28" CommunityOwnedDate="2012-10-31T16:42: 47.213" /> ... </posts> What do we have?
  • 53. <posts> ... <row Id="4" PostTypeId="1" AcceptedAnswerId="7" CreationDate=" 2008-07-31T21:42:52.667" Score="358" ViewCount="24247" Body="..." OwnerUserId="8" LastEditorUserId=" 451518" LastEditorDisplayName="Rich B" LastEditDate="2014-07-28T10:02: 50.557" LastActivityDate="2015-08- 01T12:55:11.380" Title="When setting a form's opacity should I use a decimal or double?" XML to CSV Java program
  • 54. The generated files $ cat extracted/Posts_header.csv "postId:ID(Post)","title","postType:INT", "createdAt","score:INT","views:INT","answers:INT", "comments:INT","favorites:INT","updatedAt"
  • 55. The generated files $ gzcat extracted/Posts.csv.gz | head -n2 "4","When setting a forms opacity should I use a decimal or double?","1","2008-07-31T21:42:52.667"," 358","24247","13","1","28","2014-07-28T10:02:50.557" "6","Why doesn’t the percentage width child in absolutely positioned parent work?","1","2008-07- 31T22:08:08.620","156","11840","5","0","7","2015-04- 26T14:37:49.673"
  • 56. The generated files $ cat extracted/PostsRels_header.csv ":START_ID(Post)",":END_ID(Post)"
  • 57. Importing into Neo4j :ID(Crime) :LABEL description export NEO=neo4j-enterprise-3.0.0 $NEO/bin/neo4j-import --into stackoverflow.db --id-type string --nodes:Post extracted/Posts_header.csv,extracted/Posts.csv.gz --nodes:User extracted/Users_header.csv,extracted/Users.csv.gz --nodes:Tag extracted/Tags_header.csv,extracted/Tags.csv.gz --relationships:PARENT_OF extracted/PostsRels_header.csv,extracted/PostsRels.csv.gz --relationships:ANSWERS extracted/PostsAnswers_header.csv,extracted/PostsAnswers.csv.gz --relationships:HAS_TAG extracted/TagsPosts_header.csv,extracted/TagsPosts.csv.gz --relationships:POSTED extracted/UsersPosts_header.csv,extracted/UsersPosts.csv.gz IMPORT DONE in 3m 10s 661ms. Imported: 31138574 nodes 77930024 relationships 218106346 properties
  • 58. Tip: Make sure your data is clean :ID(Crime) :LABEL description ‣ Use a consistent line break style ‣ Ensure headers are consistent with data ‣ Quote Special characters ‣ Escape stray quotes ‣ Remove non-text characters
  • 59. Even more tips :ID(Crime) :LABEL description ‣ Get the fastest disk you can ‣ Use separate disk for input and output ‣ Compress your CSV files ‣ The more cores the better ‣ Separate headers from data
  • 60. The End ‣ https://github.com/mdamien/stackoverflow-neo4j ‣ http://neo4j.com/blog/import-10m-stack-overflow-questions/ ‣ http://neo4j.com/blog/cypher-load-json-from-url/ ‣ https://github.com/neo4j-contrib/neo4j-apoc-procedures Michael Hunger @mesirii Mark Needham @markhneedham