The openCypher Project - An Open Graph Query Language

The openCypher project
Michael Hunger

Topics
• Property Graph Model
• Cypher - A language for querying graphs
• Cypher History
• Cypher Demo
• Current implementation in Neo4j
• User Feedback
• Opening up - The openCypher project
• Governance, Contribution Process
• Planned Deliverables

The Property-Graph-Model
You know it, right?

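[Figure: example property graph. PERSON node (name: “Dan”, born: May 29, 1970, twitter: “@dan”), PERSON node (name: “Ann”, born: Dec 5, 1975), CAR node (brand: “Volvo”, model: “V70”); one relationship carries the property since: Jan 10, 2011.]
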
Labeled Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes connected by two LOVES relationships and one LIVES WITH relationship]
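
The components listed above can be sketched as plain data structures. This is a minimal Python illustration, not how Neo4j stores graphs; the class names, the `outgoing` helper, and the directions chosen for the LOVES and LIVES WITH relationships are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    """A graph object: zero or more labels plus name-value properties."""
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection from `start` to `end` with its own properties."""
    start: Node
    type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


# Node data as shown in the slide's figure.
dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumption: relationship directions are chosen for illustration only;
# LIVES WITH is written LIVES_WITH to follow the ALL_CAPS convention.
relationships: List[Relationship] = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
]


def outgoing(node: Node, rel_type: str) -> List[Node]:
    """Follow relationships of one type in their stored direction."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

Calling `outgoing(dan, "LOVES")` returns the node for Ann, which is roughly what the pattern `(dan)-[:LOVES]->(x)` would match.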

Relational Versus Graph Models
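[Figure: the same data in both models. Relational model: tables Person, Person-Friend, Person, with rows for Andreas, Delia, Tobias and Mica. Graph model: nodes Andreas, Tobias, Mica and Delia connected by KNOWS relationships.]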

Cypher Query Language
Why, How, When?

Why Yet Another Query Language (YAQL)?
• SQL and SPARQL hurt our brains
• Our brains crave patterns
• It's all about patterns
• Creating a query language is fun (and hard work)

What is Cypher?
• A graph query language that allows for expressive and efficient
querying of graph data
• Intuitive, powerful and easy to learn
• Write graph queries by describing patterns in your data
• Focus on your domain, not the mechanics of data access
• Designed to be a human-readable query language
• Suitable for developers and operations professionals

What is Cypher?
• Cypher is declarative, which means it lets users express what
data to retrieve, not how to retrieve it
• The guiding principle behind Cypher is to make simple things
easy and complex things possible
• A humane query language
• Stolen from SQL (common keywords), SPARQL (pattern
matching), Python and Haskell (collection semantics)

Why Cypher?
Compared to:
• SPARQL (Cypher came from real-world use, not academia)
• Gremlin (declarative vs imperative)
• SQL (graph-specific vs set-specific)
(Cypher)-[:LOVES]->(ASCII Art)
A language should be readable, not just writable. You will read your code
dozens more times than you write it. Regular expressions, for example, are write-only.

Querying the Graph
Some Examples With Cypher

Basic Query: Who do people report to?
MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})
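[Figure: the pattern annotated. Each parenthesized part is a NODE with a LABEL (Employee) and a PROPERTY (firstName); the nodes Steven and Andrew are connected by a REPORTS_TO relationship.]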

Basic Query Comparison: Who do people report to?
SQL:
SELECT *
FROM Employee AS e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)
Cypher:
MATCH (e:Employee)-[:REPORTS_TO]->(mgr:Employee)
RETURN *

Cypher Syntax
Only Tip of the Iceberg

Syntax: Patterns
( )-->( )
(node:Label {key:value})
(node1)-[rel:REL_TYPE {key:value}]->(node2)
(node1)-[:REL_TYPE1]->(node2)<-[:REL_TYPE2]-(node3)
(node1)-[:REL_TYPE*m..n]->(node2)
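
To make the variable-length form `*m..n` concrete, here is a small Python sketch of what such a pattern asks for: every path that follows between m and n relationships of one type, without reusing a relationship within a path. The `REPORTS_TO` data and the function name are hypothetical, and a real engine evaluates this very differently.

```python
from typing import Dict, List

# Hypothetical REPORTS_TO edges: employee -> list of managers.
REPORTS_TO: Dict[str, List[str]] = {
    "Robert": ["Steven"],
    "Steven": ["Andrew"],
    "Andrew": [],
}


def var_length_paths(start: str, min_hops: int, max_hops: int) -> List[List[str]]:
    """Paths from `start` using min_hops..max_hops REPORTS_TO edges,
    in the spirit of (start)-[:REPORTS_TO*m..n]->(x)."""
    results: List[List[str]] = []

    def walk(path: List[str], used: frozenset) -> None:
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(path)
        if hops == max_hops:
            return
        here = path[-1]
        for nxt in REPORTS_TO.get(here, []):
            edge = (here, nxt)
            if edge not in used:  # a relationship appears at most once per path
                walk(path + [nxt], used | {edge})

    walk([start], frozenset())
    return results
```

With the sample data, `var_length_paths("Robert", 1, 2)` yields the one-hop path to Steven and the two-hop path to Andrew.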

Patterns are used in
• (OPTIONAL) MATCH
• CREATE, MERGE
• shortestPath()
• Predicates
• Expressions
• (Comprehensions)

Syntax: Structure
(OPTIONAL) MATCH <patterns>
WHERE <predicates>
RETURN <expression> AS <name>
ORDER BY <expression>
SKIP <offset> LIMIT <size>

Syntax: Automatic Aggregation
MATCH <patterns>
RETURN <expr>, collect([distinct] <expression>) AS <name>,
count(*) AS freq
ORDER BY freq DESC
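
"Automatic" here means there is no GROUP BY clause: the non-aggregated expressions in RETURN act as the grouping key. The Python sketch below imitates that behaviour on made-up rows; the data and the function name are assumptions, and the collected values are sorted only to make the output deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows produced by a MATCH: (employee, product) pairs.
rows: List[Tuple[str, str]] = [
    ("Steven", "Chai"), ("Steven", "Tofu"), ("Steven", "Chai"), ("Andrew", "Chai"),
]


def aggregate(matched: List[Tuple[str, str]]) -> List[Tuple[str, List[str], int]]:
    """Roughly: RETURN employee, collect(DISTINCT product), count(*) AS freq
    ORDER BY freq DESC, with `employee` as the implicit grouping key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for employee, product in matched:
        groups[employee].append(product)
    result = [
        (employee, sorted(set(products)), len(products))
        for employee, products in groups.items()
    ]
    return sorted(result, key=lambda r: r[2], reverse=True)
```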

Data Flow: WITH
WITH <expression> AS <name>, ....
• controls data flow between query segments
• separates reads from writes
• can also aggregate, sort and paginate
• replacement for HAVING
• as many WITHs as you like

Structure: Writes
CREATE <pattern>
MERGE <pattern> ON CREATE ... ON MATCH ...
(DETACH) DELETE <entity>
SET <property,label>
REMOVE <property,label>
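
MERGE behaves like "match, or create if nothing matches", with ON CREATE and ON MATCH selecting which updates apply. A toy Python sketch of that behaviour follows; the dictionary store, the `merge_person` function and the property names `created` and `lastSeen` are hypothetical.

```python
from typing import Any, Dict, Tuple

# A toy node store keyed by (label, name); a stand-in for a real graph.
store: Dict[Tuple[str, str], Dict[str, Any]] = {}


def merge_person(name: str, now: int) -> Dict[str, Any]:
    """Sketch of:
    MERGE (p:Person {name: $name})
    ON CREATE SET p.created = $now
    ON MATCH SET p.lastSeen = $now
    """
    key = ("Person", name)
    if key not in store:
        # No match: create the node and apply the ON CREATE part.
        store[key] = {"name": name, "created": now}
    else:
        # Match found: apply only the ON MATCH part.
        store[key]["lastSeen"] = now
    return store[key]
```

Calling `merge_person` twice with the same name leaves one entry in the store; the second call only adds `lastSeen`.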

Data Import
[USING PERIODIC COMMIT <count>]
LOAD CSV [WITH HEADERS] FROM "URL" AS row
... any Cypher clauses, mostly match + updates ...

Collections
UNWIND (range(1,10) + [11,12,13]) AS x
WITH collect(x) AS coll
WHERE any(x IN coll WHERE x % 2 = 0)
RETURN size(coll), coll[0], coll[1..-1] ,
reduce(a = 0, x IN coll | a + x),
extract(x IN coll | x*x), filter(x IN coll WHERE x > 10),
[x IN coll WHERE x > 10 | x*x ]
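
For readers who know Python better than Cypher, the collection expressions above map closely onto Python's list operations. The sketch below mirrors the query line by line; the variable names are chosen for the example. Note that Cypher's `range(1,10)` includes the upper bound.

```python
from functools import reduce

# UNWIND (range(1,10) + [11,12,13]) AS x ... collect(x) AS coll
# Cypher's range() is inclusive, so this is 1..10 followed by 11..13.
coll = list(range(1, 11)) + [11, 12, 13]

has_even = any(x % 2 == 0 for x in coll)         # any(x IN coll WHERE x % 2 = 0)
size = len(coll)                                 # size(coll)
first = coll[0]                                  # coll[0]
middle = coll[1:-1]                              # coll[1..-1]
total = reduce(lambda a, x: a + x, coll, 0)      # reduce(a = 0, x IN coll | a + x)
squares = [x * x for x in coll]                  # extract(x IN coll | x*x)
large = [x for x in coll if x > 10]              # filter(x IN coll WHERE x > 10)
large_squares = [x * x for x in coll if x > 10]  # [x IN coll WHERE x > 10 | x*x]
```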

Maps & Entities
WITH {age: 42, name: "John", male: true} AS data
WHERE exists(data.name) AND data["age"] = 42
CREATE (n:Person) SET n += data
RETURN [k IN keys(n) WHERE k CONTAINS "a"
| {key: k, value: n[k]}]

Optional Schema
CREATE INDEX ON :Label(property)
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE
CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)
CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2)
ASSERT exists(r.property)

And much more ...
neo4j.com/docs/stable/cypher-refcard

Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
(report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate,
count(report) AS Total;
SQL Query
[Figure: the equivalent SQL query]

Who is in Robert’s (direct, upwards) reporting chain?
MATCH
path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE
sub.firstName = 'Robert'
RETURN
path;

Product Cross-Sell
MATCH
(choc:Product {productName: 'Chocolade'})
<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
(employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN
employee.firstName, other.productName, count(distinct o2) as count
ORDER BY
count DESC
LIMIT 5;

Neo4j's Cypher Implementation

History of Cypher
• 1.4 - Cypher initially added to Neo4j
• 1.6 - Cypher becomes part of REST API
• 1.7 - Collection functions, global search, pattern predicates
• 1.8 - Write operations
• 1.9 - Type system, traversal matcher, caches, string functions, more powerful WITH, laziness, profiling, execution plan
• 2.0 - Label support, label-based indexes and constraints, MERGE, transactional HTTP endpoint, literal maps, slices, new parser, OPTIONAL MATCH
• 2.1 - LOAD CSV, COST planner, reduced eagerness, UNWIND, versioning
• 2.2 - COST planner by default, EXPLAIN, PROFILE, visual query plan, IDP
• 2.3 -

APIs
• Embedded
• graphDb.execute(query, params);
• HTTP – transactional Cypher endpoint
• :POST /db/data/transaction[/commit] {statements:[{statement: "query", parameters: params, resultDataContents:["row"], includeStats:true},....]}
• Bolt – binary protocol
• Driver driver = GraphDatabase.driver( "bolt://localhost" );
Session session = driver.session();
Result rs = session.run("CREATE (n) RETURN n");

Cypher Today - Neo4j Implementation
• Convert the input query into an abstract syntax tree (AST)
• Optimise and normalise the AST (alias expansion, constant folding etc)
• Create a query graph - a high-level, abstract representation of the query -
from the normalised AST
• Create a logical plan, consisting of logical operators, from the query graph,
using the statistics store to calculate the cost. The cheapest logical plan is
selected using IDP (iterative dynamic programming)
• Create an execution plan from the logical plan by choosing a physical
implementation for logical operators
• Execute the query
http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
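
The cost-based step in the pipeline above can be illustrated with a toy example: enumerate candidate plans, estimate a cost for each from statistics, and keep the cheapest. The Python sketch below does this exhaustively for two candidates. It is not IDP and not Neo4j's cost model; the statistics, the cost formula and the function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Plan:
    name: str
    cost: float  # estimated rows touched; a made-up cost unit


def candidate_plans(stats: Dict[str, float]) -> List[Plan]:
    """Two ways to find (e:Employee {firstName: ...}).
    All numbers are hypothetical statistics, not a real cost model."""
    plans = [Plan("NodeByLabelScan + Filter", stats["employees"])]
    if stats.get("has_index"):
        # An index seek only touches the rows expected to match.
        plans.append(Plan("NodeIndexSeek", stats["employees"] * stats["selectivity"]))
    return plans


def cheapest(plans: List[Plan]) -> Plan:
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: p.cost)
```

With 1000 employees and a selectivity of 0.01, the index seek is estimated at 10 rows against 1000 for the scan, so it wins whenever an index exists.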

Neo4j Query Planner
Cost-based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer finds: non-optimal queries, Cartesian products, missing indexes, global scans, typos, massive fan-out

openCypher
An open graph query language

Why?
We love Cypher!
Our users love Cypher.
We want to make everyone happy through using it.
And have Cypher run on their data(base).
We want to collaborate with community and industry partners to
create the best graph query language possible!

Future of (open)Cypher
• Decouple the language from Neo4j
• Open up and make the language design process transparent
• Encourage use within databases, tools, highlighters, etc.
• Delivery of language docs, tools and implementation
• Governed by the Cypher Language Group (CLG)

CIP (Cypher Improvement Proposal)
• A CIP is a semi-formal specification providing a rationale for new language features and constructs
• Contributions are welcome: submit either a CIP (as a pull request) or a feature request (as an issue) at the openCypher GitHub repository
• See "Resources" for accepted CIPs, the contribution process and the template
github.com/opencypher/openCypher

CIP structure
• Sections include: motivation, background, proposal (including the syntax and semantics), alternatives, interactions with existing features, benefits, drawbacks
• Example of the “STARTS WITH / ENDS WITH / CONTAINS” CIP

Deliverables
✔ Improvement Process
✔ Governing Body
✔ Language grammar (Jan-2016)
Technology Compatibility Kit (TCK)
Cypher Reference Documentation
Cypher language specification
Reference implementation (under Apache 2.0)
Cypher style guide
Opening up the CLG

Cypher language specification
• EBNF Grammar
• Railroad diagrams
• Semantic specification
• Licensed under a Creative Commons license

Language Grammar (RELEASED Jan-30-2016)
…
Match = ['OPTIONAL', SP], 'MATCH', SP, Pattern, {Hint}, [Where] ;
Unwind = 'UNWIND', SP, Expression, SP, 'AS', SP, Variable ;
Merge = 'MERGE', SP, PatternPart, {SP, MergeAction} ;
MergeAction = ('ON', SP, 'MATCH', SP, SetClause)
| ('ON', SP, 'CREATE', SP, SetClause);
...
github.com/opencypher/openCypher/blob/master/grammar.ebnf

Technology Compatibility Kit (TCK)
• Validates a Cypher implementation
• Certifies that it complies with a given version of Cypher
• Based on a given dataset
• Executes a set of queries and
• Verifies expected outputs
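
The idea behind such a kit can be sketched in a few lines: feed each query to the implementation under test and compare what comes back with the expected rows. The Python harness below is a hypothetical illustration only, not the format or tooling of the openCypher TCK; `run_tck`, `fake_engine` and the scenarios are made up.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Scenario = Tuple[str, List[Row]]  # (query text, expected rows)


def run_tck(execute: Callable[[str], List[Row]],
            scenarios: List[Scenario]) -> Dict[str, bool]:
    """Run every query through `execute` and record whether the returned
    rows equal the expected rows, ignoring row order."""
    def normalise(rows: List[Row]) -> List[str]:
        return sorted(repr(sorted(row.items())) for row in rows)

    report: Dict[str, bool] = {}
    for query, expected in scenarios:
        try:
            report[query] = normalise(execute(query)) == normalise(expected)
        except Exception:
            # An implementation that cannot run the query fails the scenario.
            report[query] = False
    return report


def fake_engine(query: str) -> List[Row]:
    """Hypothetical implementation under test; supports one query."""
    if query == "RETURN 1 AS n":
        return [{"n": 1}]
    raise ValueError("unsupported query")


scenarios: List[Scenario] = [
    ("RETURN 1 AS n", [{"n": 1}]),
    ("RETURN 2 AS n", [{"n": 2}]),
]
```

Running `run_tck(fake_engine, scenarios)` marks the first scenario as passed and the second as failed, since the fake engine rejects it.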

Cypher Reference Documentation
• Style Guide
• User documentation describing the use of Cypher
• Example datasets with queries
• Tutorials
• GraphGists

Style Guide
• Labels are CamelCase
• Properties and functions are lowerCamelCase
• Keywords and relationship types are ALL_CAPS
• Patterns should be complete and left to right
• Put anchored nodes first
• .... to be released ...
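
The naming rules above lend themselves to a simple automated check. The Python sketch below encodes one possible reading of them as regular expressions; since the guide itself had not been released, the exact patterns and the `check_name` helper are assumptions, and the CamelCase pattern deliberately rejects names with consecutive capitals.

```python
import re

# One possible reading of the naming rules (assumed, not official).
PATTERNS = {
    "label": re.compile(r"(?:[A-Z][a-z0-9]+)+"),                    # CamelCase
    "property": re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*"),   # lowerCamelCase
    "rel_type": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),       # ALL_CAPS
}


def check_name(kind: str, name: str) -> bool:
    """Return True if `name` follows the convention for `kind`
    ('label', 'property' or 'rel_type')."""
    return PATTERNS[kind].fullmatch(name) is not None
```

For example, `check_name("rel_type", "REPORTS_TO")` passes, while `check_name("rel_type", "reportsTo")` does not.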

Reference implementation (ASL 2.0)
• A fully functional implementation of key parts of the stack
needed to support Cypher inside a platform or tool
• First deliverable: parser taking a Cypher statement and parsing
it into an AST (abstract syntax tree)
• Future deliverables:
• Rule-based query planner
• Query runtime
• Distributed under the Apache 2.0 license
• Can be used as an example or as an implementation foundation

The Cypher Language Group (CLG)
• The steering committee for language evolution
• Reviews feature requests and proposals (CIP)
• Caretakers of the language
• Focus on guiding principles
• Long term focus, no quick fixes & hacks
• Currently a group of Cypher authors, developers and users
• Publishes meeting minutes: opencypher.github.io/meeting-minutes/

Some people like it
“Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption. We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making it easier for masses to access query graph processing.”
- Ion Stoica, CEO & Founder, Databricks
“Lots of software systems could be improved by using a graph datastore. One thing holding back the category has been the lack of a widely supported, standard graph query language. We see the appearance of openCypher as an important step towards the broader use of graphs across the industry.”
- Rebecca Parsons, ThoughtWorks, CTO

Resources
• http://www.opencypher.org/
• https://github.com/opencypher/openCypher
• https://github.com/opencypher/openCypher/blob/master/CONTRIBUTING.adoc
• https://github.com/opencypher/openCypher/tree/master/cip
• https://github.com/opencypher/openCypher/pulls
• http://groups.google.com/group/openCypher
• @openCypher

Please contribute
Feedback, Ideas, Proposals
Implementations
Thank You!
Questions?
