8. TinkerPop 0.x
○ “Making stuff for the fun of it...”
○ From RDF to the property graph data model
○ A Turing complete path language for graphs
○ Ripple!
○ Oh, and Gremlin
○ Blueprints
○ “JDBC for graphs”
○ RDF ←→ PG support added early on
9. TinkerPop 0.x
○ Rexster
○ Server for Blueprints-enabled graphs
○ Predecessor of Gremlin Server
○ Pipes
○ Pull-based dataflow framework
○ Frames
○ Object-oriented graph interfaces using Java annotations
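Pipes itself is a Java library. Python generators can illustrate its pull-based idea: each stage asks its upstream stage for the next element only when its own consumer asks. This is a minimal sketch under that assumption. The toy graph and the names `out`, `filter_pipe`, and `pipeline` are hypothetical and are not the Pipes API.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy graph as an adjacency list; not a Blueprints graph.
GRAPH: dict[str, list[str]] = {
    "marko": ["vadas", "josh"],
    "josh": ["ripple", "lop"],
    "vadas": [],
    "ripple": [],
    "lop": [],
}

Pipe = Callable[[Iterable[str]], Iterator[str]]

def out(graph: dict[str, list[str]]) -> Pipe:
    """Stage that emits each incoming vertex's out-neighbors, lazily."""
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        for v in upstream:        # pull one element from the previous stage
            yield from graph[v]   # hand its neighbors downstream one at a time
    return stage

def filter_pipe(pred: Callable[[str], bool]) -> Pipe:
    """Stage that passes through only the elements satisfying pred."""
    def stage(upstream: Iterable[str]) -> Iterator[str]:
        for v in upstream:
            if pred(v):
                yield v
    return stage

def pipeline(source: Iterable[str], *stages: Pipe) -> Iterator[str]:
    """Chain stages; nothing runs until the caller pulls from the result."""
    it: Iterator[str] = iter(source)
    for stage in stages:
        it = stage(it)
    return it

# Vertices two out-hops from marko, excluding "lop".
names = list(pipeline(["marko"], out(GRAPH), out(GRAPH), filter_pipe(lambda v: v != "lop")))
print(names)  # ['ripple']
```

Because every stage is a generator, calling `pipeline(...)` by itself does no work. Elements flow only when `list()` or `next()` pulls them through the chain.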
13. TinkerPop 2.x
○ Furnace
○ Algorithms package built for property graphs
○ Predecessor of graph OLAP in TinkerPop3
○ New language ecosystems
○ Expansion of functionality on top of Blueprints
15. TinkerPop 3.x
○ Complete rewrite of TinkerPop
○ Focus on scale and performance
○ Symmetry between OLTP and OLAP
○ Gremlin becomes more central
○ Git mono-repo
○ Interfaces with not only graph DBs, but graph processors
16. Apache TinkerPop
○ Not-only-JVM
○ Gremlin in native programming languages
○ Now dozens of graph systems implementing TinkerPop
○ Third-party managed libraries and tools
20. Escape from the JVM
○ TinkerPop originally 100% Java + Groovy
○ Still very JVM-heavy
○ Gremlin Server is Java-only
○ How to achieve parity across languages?
○ Ideally: complete Gremlin VM in every language ecosystem
○ Code generation?
○ How to generate both:
○ Clean APIs
○ Efficient runtime code
○ ...that fit together?
21. Making life easier for graph providers
○ Creating TinkerPop implementations
○ Currently a monolithic effort for each language / environment
○ How do we:
○ Ensure consistency across implementations?
○ Reduce the workload?
○ Thoughtful test suite
○ Rigorous in terms of correct operations
○ Does not force functionality that may not fit
○ Types and constraints may help
22. Network serialization formats
○ GraphML (XML)
○ Widely supported
○ Graphs only
○ GraphSON (JSON)
○ TinkerPop-specific
○ Graphs, elements, paths, etc.
○ Versions 1.0, 2.0, 3.0
○ GraphBinary
○ TinkerPop-specific
○ Graphs, elements, paths, etc.
○ Good forward-compatibility
○ Gryo (Kryo)
○ JVM only
○ Bit of a format zoo
○ One format to rule them all?
○ Mappings between formats?
○ Will schemas help?
○ How about common RPC formats
○ Thrift, Protobuf, Avro, etc.
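Part of what makes mapping between these formats hard is that each one encodes types differently. GraphSON 2.0 and 3.0, for example, wrap values that plain JSON cannot distinguish in an `@type`/`@value` envelope. The sketch below encodes a small subset of Python values in that style (`g:Int32`, `g:Int64`, `g:Double`, and the 3.0 collection types `g:List` and `g:Map`). The function name is hypothetical, graph elements are left out, and the TinkerPop IO reference should be treated as authoritative for the exact wire format.

```python
import json
from typing import Any

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def to_graphson3(value: Any) -> Any:
    """Encode a Python value in GraphSON 3.0 typed style (small subset only)."""
    if isinstance(value, (bool, str)):
        return value  # JSON-native types are written without a type wrapper
    if isinstance(value, int):
        if INT32_MIN <= value <= INT32_MAX:
            return {"@type": "g:Int32", "@value": value}
        if INT64_MIN <= value <= INT64_MAX:
            return {"@type": "g:Int64", "@value": value}
        raise OverflowError("integers wider than 64 bits are not handled in this sketch")
    if isinstance(value, float):
        return {"@type": "g:Double", "@value": value}
    if isinstance(value, list):
        return {"@type": "g:List", "@value": [to_graphson3(v) for v in value]}
    if isinstance(value, dict):
        # g:Map flattens entries into an alternating key, value, key, value... array.
        flat: list[Any] = []
        for k, v in value.items():
            flat.extend([to_graphson3(k), to_graphson3(v)])
        return {"@type": "g:Map", "@value": flat}
    raise TypeError(f"no GraphSON mapping in this sketch for {type(value).__name__}")

doc = to_graphson3({"name": "marko", "age": 29, "weights": [0.5, 1.0]})
print(json.dumps(doc, indent=2))
```

A mapping to GraphBinary or Protobuf would need its own table from these same Python types, which is one way to see why a shared schema could help.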
24. Schemas in TinkerPop
○ Property graphs:
○ Strong on intuitiveness
○ Historically weak on schema
○ Lightweight property graph schemas
○ E.g. in JanusGraph, Neo4j, basic Graph.Features
○ Stronger graph schemas
○ RDF triple stores, hypergraph databases, object databases, etc.
○ Schemas facilitate composability of data and queries
○ ...enabling optimizations, mappings, migration, other good stuff
○ What’s the best fit for TinkerPop?
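To make the lightweight end of this spectrum concrete, here is a hypothetical per-label schema that fixes which property keys a vertex label may carry, their value types, and which keys are required. It is a sketch, not the schema API of JanusGraph, Neo4j, or TinkerPop. A stronger schema would also constrain edges, cardinality, and identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VertexLabelSchema:
    """Hypothetical lightweight schema for one vertex label."""
    label: str
    properties: dict[str, type]          # property key -> expected value type
    required: frozenset[str] = frozenset()

def validate(schema: VertexLabelSchema, label: str, props: dict[str, object]) -> list[str]:
    """Return a list of violations; an empty list means the vertex conforms."""
    errors: list[str] = []
    if label != schema.label:
        errors.append(f"expected label {schema.label!r}, got {label!r}")
    for key in sorted(k for k in schema.required if k not in props):
        errors.append(f"missing required property {key!r}")
    for key, value in props.items():
        expected = schema.properties.get(key)
        if expected is None:
            errors.append(f"unknown property {key!r}")
        elif not isinstance(value, expected):
            errors.append(f"{key!r} should be {expected.__name__}, got {type(value).__name__}")
    return errors

person = VertexLabelSchema("person", {"name": str, "age": int}, frozenset({"name"}))
ok = validate(person, "person", {"name": "marko", "age": 29})
bad = validate(person, "person", {"age": "29", "nickname": "m"})
print(ok)
print(bad)
```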
25. Getting transactions right
○ How to support diverse transactional models?
○ Neo4j is different than JanusGraph is different than...
○ Is there a unified approach to:
○ Threads + queries + transactions?
○ Transactional scope?
○ Transaction failures?
○ Nested transactions?
○ etc.
○ Will functional approaches to concurrency help?
26. Static analysis for traversals
○ Stop supporting opaque traversals
○ Security issues
○ Portability issues
○ Need a replacement for closures/lambdas
○ “Just write Gremlin”
○ What additional features are required?
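Why lambdas block static analysis: once a traversal is plain data (step names plus arguments), a server can inspect it before running it, but a closure is just executable code. The hypothetical `Step` type and `check` function below flag unknown steps and any step whose arguments include a callable. They are an illustration under that assumption, not gremlin-python's bytecode or any TinkerPop traversal strategy.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Step:
    """Hypothetical traversal step: a name plus inspectable arguments."""
    name: str
    args: tuple[Any, ...] = ()

# Hypothetical allow-list standing in for "steps the server knows how to run".
ALLOWED_STEPS = frozenset({"V", "hasLabel", "has", "out", "in", "filter", "values", "count"})

def check(traversal: list[Step]) -> list[str]:
    """Statically reject unknown steps and steps that carry opaque callables."""
    problems: list[str] = []
    for i, step in enumerate(traversal):
        if step.name not in ALLOWED_STEPS:
            problems.append(f"step {i}: unknown step {step.name!r}")
        if any(callable(a) for a in step.args):
            problems.append(f"step {i}: {step.name!r} takes an opaque lambda")
    return problems

portable = [Step("V"), Step("hasLabel", ("person",)), Step("out", ("knows",)), Step("values", ("name",))]
opaque = [Step("V"), Step("filter", (lambda v: v.age > 30,)), Step("count")]
print(check(portable))
print(check(opaque))
```

The same `filter` step passes the check when its argument is data rather than a closure, which is the sense in which “just write Gremlin” keeps traversals analyzable.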
27. Graph stream processing
○ Much of the world’s data is streaming
○ Much of that data describes entities and relationships
○ Decades of research on relational stream processing
○ 10+ years on continuous SPARQL
○ What is continuous Gremlin?
○ (RDF)-[:betterThan]->(PG) for streaming
○ RDF stream := unbounded sequence of triples
○ Property graph stream := ?
○ Need schemas, global identifiers, set operations on graphs
28. The 10¹⁰ foot view
○ Abstractions
○ Data models, query languages, formal inference
○ Transformations, embeddings
○ Graph + relational model, graph + streams, ...
○ Human and machine knowledge
○ Knowledge graphs: enterprise, personal, collaborative
○ Mental representations, representation learning
○ Visualization and HCI, ...
○ Processing and performance
○ Graph ingestion, generation, partitioning, compression
○ Concurrent systems: parallel, distributed
○ Graph analytics, hardware acceleration
○ Benchmarks and metrics, ...
30. From Graph.Features to a real type system
○ No existing standard for property graphs
○ Recent community efforts
○ W3C Workshop on Web Standardization for Graph Data (March 2019)
○ Property Graph Schema Working Group (PGSWG)
○ Graph Query Language (GQL)
○ Don’t forget about external data models
○ Relational model
○ RDF and other graph models
○ Data interchange formats (Protocol Buffers, Thrift, Avro, etc.)
○ OO, ER, and semistructured data models
33. Algebraic Property Graphs
○ Last year at Data Day...
○ A Graph is a Graph is a Graph
○ Composable and bidirectional mappings
○ Formal property graph data model
○ Taxonomy of graph elements
○ Use category theory for the model
○ Developed with Ryan Wisnesky (Conexus AI)
○ Implementations in Haskell and CQL
○ Minimal cover for enterprise data
○ Analogous features in graph and non-graph data models
36. Building structure APIs
○ Vertices, edges, and properties
○ Special cases that can be derived from the type system
○ Graphs are different
○ Not described in terms of types
○ Graph API is often redundant in TinkerPop3
○ Structure APIs currently written by hand
○ In each language, for each Gremlin Language Variant
○ We can generate consistent interfaces across GLVs
○ Some tooling already exists
○ Build new tools if we want to make it easier
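A sketch of generating structure APIs: given a small schema, emit source for typed vertex wrappers from a template. Only a Python template is shown. Each GLV would get its own template fed by the same schema, and that shared input is what would keep the generated interfaces consistent. The schema format, `SCHEMA`, and `generate_class` are hypothetical and do not describe existing TinkerPop tooling.

```python
# Hypothetical schema: vertex label -> {property key: Python type name}.
SCHEMA: dict[str, dict[str, str]] = {
    "person": {"name": "str", "age": "int"},
    "software": {"name": "str", "lang": "str"},
}

def generate_class(label: str, props: dict[str, str]) -> str:
    """Return Python source for one typed vertex wrapper class."""
    cls = label.capitalize()
    params = ", ".join(f"{key}: {typ}" for key, typ in props.items())
    body = [f"        self.{key} = {key}" for key in props] or ["        pass"]
    lines = [
        f"class {cls}:",
        f"    label = {label!r}",
        "",
        f"    def __init__(self, {params}) -> None:",
        *body,
    ]
    return "\n".join(lines) + "\n"

source = "\n".join(generate_class(label, props) for label, props in SCHEMA.items())
print(source)

# Load the generated code to confirm that it is valid Python.
namespace: dict = {}
exec(source, namespace)
marko = namespace["Person"]("marko", 29)
```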
37. Building process APIs
○ Need abstractions for graph processing
○ Steps, constraints, traversals
○ Freebie: every traversal has a graph representation
○ Graph programs as graph data
○ Generate process APIs for each GLV
○ Using a schema; analogous to generating structure APIs
○ Possible to also generate process implementations?
○ That would be great, but... TBD
○ Code gen options: Haskell? Idris? LLVM? Custom code...
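The freebie can be sketched directly. Each step becomes a vertex, consecutive steps are joined by `next` edges, and a nested traversal passed as an argument hangs off its parent step by an `arg` edge. The `Step` type, the edge labels, and `to_graph` are hypothetical. They only show that the encoding is mechanical, not how TinkerPop would represent traversals.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Step:
    """Hypothetical step: a name plus arguments; a list argument is a nested traversal."""
    name: str
    args: tuple = ()

def to_graph(traversal, ids=None, vertices=None, edges=None):
    """Return (first vertex id, vertices, edges) encoding the traversal as graph data."""
    ids = ids if ids is not None else count()
    vertices = vertices if vertices is not None else []  # (id, step name)
    edges = edges if edges is not None else []           # (out id, label, in id)
    first = prev = None
    for step in traversal:
        vid = next(ids)
        vertices.append((vid, step.name))
        if prev is None:
            first = vid
        else:
            edges.append((prev, "next", vid))
        for arg in step.args:
            if isinstance(arg, list):  # nested traversal, e.g. the argument of where(...)
                child, _, _ = to_graph(arg, ids, vertices, edges)
                if child is not None:
                    edges.append((vid, "arg", child))
        prev = vid
    return first, vertices, edges

# Roughly g.V().where(out("knows")).count(), written as data.
t = [Step("V"), Step("where", ([Step("out", ("knows",))],)), Step("count")]
first, vertices, edges = to_graph(t)
print(vertices)
print(edges)
```

Once a traversal is in this form, the same graph tooling used for data could in principle query, validate, or rewrite programs.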
38. Abstractions for graph processing
○ Gremlin traversals are “like” monadic composition
○ Let’s make them properly monadic
○ Pure functional encapsulation of:
○ Side-effects, transactions, exception handling
○ Learn from existing functional approaches to Gremlin
○ Gremlin-Scala, Greskell, Gremlin-Haskell
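A minimal sketch of the analogy, assuming a step is modeled as a function from one element to a list of results (a Kleisli arrow in the list monad). Chaining steps then becomes Kleisli composition, and the monad laws can be checked directly. The toy graph and the names `unit`, `bind`, and `then` are hypothetical. This is not how Gremlin-Scala, Greskell, or Gremlin-Haskell define traversals, and the side effects, transactions, and exceptions from the slide are not modeled.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

def unit(x: A) -> list[A]:
    """Lift one element into a traversal result (the list monad's return)."""
    return [x]

def bind(xs: list[A], f: Callable[[A], list[B]]) -> list[B]:
    """Feed every result into the next step and flatten (the list monad's bind)."""
    return [y for x in xs for y in f(x)]

def then(f: Callable[[A], list[B]], g: Callable[[B], list[C]]) -> Callable[[A], list[C]]:
    """Kleisli composition: the analogue of chaining two Gremlin steps."""
    return lambda x: bind(f(x), g)

# Hypothetical toy graph with two edge labels.
KNOWS = {"marko": ["vadas", "josh"]}
CREATED = {"marko": ["lop"], "josh": ["ripple", "lop"]}

def out_knows(v: str) -> list[str]:
    return KNOWS.get(v, [])

def out_created(v: str) -> list[str]:
    return CREATED.get(v, [])

def name_upper(v: str) -> list[str]:
    return [v.upper()]

# Roughly g.V("marko").out("knows").out("created")
created_by_friends = then(out_knows, out_created)
print(created_by_friends("marko"))  # ['ripple', 'lop']
```

Associativity of `then` means steps can be regrouped without changing results, which is one property a traversal optimizer could rely on.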
40. Transforming graph data and operations
○ Need a language for schema mappings
○ In theory, that gives us:
○ Automated query rewriting
○ Automated data migration
○ Mix-and-match operations
○ Easy, right...?
41. Making a smooth transition
○ (TP3 → TP4) ≠ (TP2 → TP3)
○ Large user base, good support for TinkerPop3
○ Q: how do we:
○ Make new features useful to the current community
○ Make the migration to TinkerPop4 as seamless as possible
○ A: we try stuff out
○ “The revolution will be A/B tested”
○ Get involved!
○ gremlin-users@googlegroups.com
○ dev@tinkerpop.apache.org