We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make it available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology compatibility kit (TCK), (iii) Reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
2. Topics
• Property Graph Model
• Cypher - A language for querying graphs
• Cypher History
• Cypher Demo
• Current implementation in Neo4j
• User Feedback
• Opening up - The openCypher project
• Governance, Contribution Process
• Planned Deliverables
4. CAR
name: “Dan”
born: May 29, 1970
twitter: “@dan”
name: “Ann”
born: Dec 5, 1975
since:
Jan 10, 2011
brand: “Volvo”
model: “V70”
Labeled Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
LOVES
LOVES
LIVES WITH
PERSON PERSON
5. Relational Versus Graph Models
Relational Model Graph Model
KNOWS
ANDREAS
TOBIAS
MICA
DELIA
Person PersonPerson-Friend
ANDREAS
DELIA
TOBIAS
MICA
7. Why Yet Another Query Language (YAQL)?
• SQL and SparQL hurt our brains
• Our brains crave patterns
• It‘s all about patterns
• Creating a query language is fun (and hard work)
8. What is Cypher?
• A graph query language that allows for expressive and efficient
querying of graph data
• Intuitive, powerful and easy to learn
• Write graph queries by describing patterns in your data
• Focus on your domain not the mechanics of data access.
• Designed to be a human-readable query language
• Suitable for developers and operations professionals
9. What is Cypher?
• Cypher is declarative, which means it lets users express what
data to retrieve
• The guiding principle behind Cypher is to make simple things
easy and complex things possible
• A humane query language
• Stolen from SQL (common keywords), SPARQL (pattern
matching), Python and Haskell (collection semantics)
10. Why Cypher?
Compared to:
• SPARQL (Cypher came from real-world use, not academia)
• Gremlin (declarative vs imperative)
• SQL (graph-specific vs set-specific)
(Cypher)-[:LOVES]->(ASCII Art)
A language should be readable, not just writable. You will read your code
dozens more times than you write it. Regex for example are write-only.
12. Basic Query: Who do people report to?
MATCH (:Employee {firstName:”Steven”} ) -[:REPORTS_TO]-> (:Employee {firstName:“Andrew”} )
REPORTS_TO
Steven Andrew
LABEL PROPERTY
NODE NODE
LABEL PROPERTY
13. Basic Query Comparison: Who do people report to?
SELECT *
FROM Employee as e
JOIN Employee_Report AS er ON (e.id = er.manager_id)
JOIN Employee AS sub ON (er.sub_id = sub.id)
MATCH
(e:Employee)-[:REPORTS_TO]->(mgr:Employee)
RETURN
*
18. Patterns are used in
• (OPTIONAL) MATCH
• CREATE, MERGE
• shortestPath()
• Predicates
• Expressions
• (Comprehensions)
19. Syntax: Structure
(OPTIONAL) MATCH <patterns>
WHERE <predicates>
RETURN <expression> AS <name>
ORDER BY <expression>
SKIP <offset> LIMIT <size>
20. Syntax: Automatic Aggregation
MATCH <patterns>
RETURN <expr>, collect([distinct] <expression>) AS <name>,
count(*) AS freq
ORDER BY freq DESC
21. DataFlow: WITH
WITH <expression> AS <name>, ....
• controls data flow between query segments
• separates reads from writes
• can also
• aggregate
• sort
• paginate
• replacement for HAVING
• as many WITHs as you like
23. Data Import
[USING PERODIC COMMIT <count>]
LOAD CSV [WITH HEADERS] FROM „URL“ AS row
... any Cypher clauses, mostly match + updates ...
24. Collections
UNWIND (range(1,10) + [11,12,13]) AS x
WITH collect(x) AS coll
WHERE any(x IN coll WHERE x % 2 = 0)
RETURN size(coll), coll[0], coll[1..-1] ,
reduce(a = 0, x IN coll | a + x),
extract(x IN coll | x*x), filter(x IN coll WHERE x > 10),
[x IN coll WHERE x > 10 | x*x ]
25. Maps & Entities
WITH {age:42, name: „John“, male:true} as data
WHERE exists(data.name) AND data[„age“] = 42
CREATE (n:Person) SET n += data
RETURN [k in keys(n) WHERE k CONTAINS „a“
| {key: k, value: n[k] } ]
26. Optional Schema
CREATE INDEX ON :Label(property)
CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE
CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)
CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2)
ASSERT exists(r.property)
27. And much more ...
neo4j.com/docs/stable/cypher-refcard
30. Who is in Robert’s (direct, upwards) reporting chain?
MATCH
path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE
sub.firstName = 'Robert'
RETURN
path;
31. Who is in Robert’s (direct, upwards) reporting chain?
32. Product Cross-Sell
MATCH
(choc:Product {productName: 'Chocolade'})
<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
(employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN
employee.firstName, other.productName, count(distinct o2) as count
ORDER BY
count DESC
LIMIT 5;
38. Cypher Today - Neo4j Implementation
• Convert the input query into an abstract syntax tree (AST)
• Optimise and normalise the AST (alias expansion, constant folding etc)
• Create a query graph - a high-level, abstract representation of the query -
from the normalised AST
• Create a logical plan, consisting of logical operators, from the query graph,
using the statistics store to calculate the cost. The cheapest logical plan is
selected using IDP (iterative dynamic programming)
• Create an execution plan from the logical plan by choosing a physical
implementation for logical operators
• Execute the query
http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
40. Neo4j Query Planner
Cost based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
• Non optimal queries
• Cartesian Product
• Missing Indexes, Global Scans
• Typos
• Massive Fan-Out
42. Why ?
We love Cypher!
Our users love Cypher.
We want to make everyone happy through using it.
And have Cypher run on their data(base).
We want to collaborate with community and industry partners to
create the best graph query language possible!
44. Future of (open)Cypher
• Decouple the language from Neo4j
• Open up and make the language design process transparent
• Encourage use within of databases/tools/highlighters/etc
• Delivery of language docs, tools and implementation
• Governed by the Cypher Language Group (CLG)
45. CIP (Cypher Improvement Proposal)
• A CIP is a semi-formal specification
providing a rationale for new language
features and constructs
• Contributions are welcome:
submit either a CIP (as a pull request)
or a feature request (as an issue) at
the openCypher GitHub repository
• See „Ressources“ for
• accepted CIPs
• Contribution Process
• Template
github.com/opencypher/openCypher
46. CIP structure
• Sections include:
• motivation,
• background,
• proposal (including the
syntax and semantics),
• alternatives,
• interactions with existing
features,
• benefits,
• drawbacks
• Example of the
“STARTS WITH / ENDS
WITH / CONTAINS” CIP
47. Deliverables
✔ Improvement Process
✔ Governing Body
✔ Language grammar (Jan-2016)
Technology certification kit (TCK)
Cypher Reference Documentation
Cypher language specification
Reference implementation (under Apache 2.0)
Cypher style guide
Opening up the CLG
48. Cypher language specification
• EBNF Grammar
• Railroad diagrams
• Semantic specification
• Licensed under a Creative Commons license
50. Technology Compliance Kit (TCK)
● Validates a Cypher implementation
● Certifies that it complies with a given version of Cypher
● Based on given dataset
● Executes a set of queries and
● Verifies expected outputs
51. Cypher Reference Documentation
• Style Guide
• User documentation describing the use of Cypher
• Example datasets with queries
• Tutorials
• GraphGists
52. Style Guide
• Label are CamelCase
• Properties and functions are lowerCamelCase
• Keywords and Relationship-Types are ALL_CAPS
• Patterns should be complete and left to right
• Put anchored nodes first
• .... to be released ...
53. Reference implementation (ASL 2.0)
• A fully functional implementation of key parts of the stack
needed to support Cypher inside a platform or tool
• First deliverable: parser taking a Cypher statement and parsing
it into an AST (abstract syntax tree)
• Future deliverables:
• Rule-based query planner
• Query runtime
• Distributed under the Apache 2.0 license
• Can be used as example or as a implementation foundation
54. The Cypher Language Group (CLG)
• The steering committee for language evolution
• Reviews feature requests and proposals (CIP)
• Caretakers of the language
• Focus on guiding principles
• Long term focus, no quick fixes & hacks
• Currently group of Cypher authors, developers and users
• Publish Meeting Minutes -> opencypher.github.io/meeting-minutes/
55. “Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher
query language has greatly accelerated graph database adoption.
We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark
stack, making it easier for masses to access query graph processing.”
- Ion Stoica, CEO & Founder Databricks
“Lots of software systems could be improved by using a graph datastore. One thing holding back the
category has been the lack of a widely supported, standard graph query language. We see the
appearance of openCypher as an important step towards the broader use of graphs across the
industry.”
- Rebecca Parsons, ThoughtWorks, CTO
Some people like it
Cypher query execution in Neo4j:
Convert the input query into an abstract syntax tree (AST)
Optimise and normalise the AST (alias expansion, constant folding etc)
Create a query graph - a high-level, abstract representation of the query - from the normalised AST
Create a logical plan, consisting of logical operators, from the query graph, using the statistics store to calculate the cost. The cheapest logical plan is selected using IDP (iterative dynamic programming)
Create an execution plan from the logical plan by choosing a physical implementation for logical operators
Execute the query
you are the first to see it. permissive license Apache / CC
In the near future, many of your apps will be driven by data relationships and not transactions
You can unlock value from business relationships with Neo4j