GraphChain: A Distributed Database with Explicit Semantics and Chained RDF Graphs
Mirek Sopek, Przemysław Grądzki, Witold Kosowski, Dominik Kuziński, Rafał Trójczak, Robert Trypuz
3rd Workshop on Linked Data & Distributed Ledgers (LD-DL), Lyon, April 24, 2018
THE TALK
AGENDA
• The Motivation
• Related Works
• GraphChain defined
• GraphChain visualized
• GraphChain challenges
• GraphChain Ontology
• The Implementations
• The future of GraphChain
GraphChain
WHY GraphChain?
• What is the LEI system?
• The LEI (Legal Entity Identifier) is the most universal, global
and trustworthy identification mechanism for companies
from all countries of the world.
• It is a 20-character alphanumeric, globally unique identifier for
businesses created as Legal Entities.
• LEI.INFO (https://lei.info) is the LEI Linked Data Resolver.
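To make the identifier's structure concrete: under ISO 17442 the last two characters of an LEI are check digits computed with ISO 7064 MOD 97-10 (the scheme also used for IBANs). The Python sketch below is only an illustration and is not taken from LEI.INFO; the function names are hypothetical.

```python
import re

# ISO 17442 layout: 18 alphanumeric characters followed by 2 numeric check digits.
LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")


def _to_digits(code: str) -> str:
    """Map 0-9 to themselves and A-Z to 10-35, as ISO 7064 MOD 97-10 requires."""
    return "".join(str(int(ch, 36)) for ch in code)


def lei_check_digits(base: str) -> str:
    """Compute the two check digits for an 18-character LEI prefix (hypothetical helper)."""
    base = base.upper()
    if not re.fullmatch(r"[0-9A-Z]{18}", base):
        raise ValueError("LEI prefix must be 18 alphanumeric characters")
    remainder = int(_to_digits(base) + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Accept an LEI only if the layout is right and the whole code is 1 mod 97."""
    lei = lei.upper()
    if not LEI_PATTERN.fullmatch(lei):
        return False
    return int(_to_digits(lei)) % 97 == 1


if __name__ == "__main__":
    base = "ABCDEF0123456789XY"  # made-up prefix, not a real LEI
    print(is_valid_lei(base + lei_check_digits(base)))  # True
```

A format check like this only catches typing errors; whether an LEI is actually issued still has to be looked up in the LEI system itself.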
We have been working with the LEI system
and have discovered its shortcomings …
WHY GraphChain?
• Lack of explicit data semantics
• While the LEI system is distributed in nature, the technology
used to support it is old-fashioned and inherently insecure
• So, using LEI.INFO as the starting point, we created the
concept of a Blockchained LEI system and built a
number of POCs demonstrating its usefulness.
LEI system shortcomings …
WHY GraphChain?
• It was very hard to combine the three features the LEI system
required:
• Explicit data semantics
• Linked Open Data/Semantic Web (SW) data model, and
• Blockchain security model
• In our Blockchain-based POCs, the extensibility of Legal Entity
data was poor (unless we used simple textual serialization for
the LEI records)
• Queries across the entire LEI dataset were very hard or
impossible
What we really needed was a standard RDF Graph
database protected by a Blockchain.
However, despite our efforts …
Ontologies and LD data models for Blockchain
• BLONDiE - a generic Blockchain ontology developed by
Héctor Ugarte in the Semantic Blockchain project.
“BLONDiE is the most developed vocabulary for representing blockchain
concepts, with the most potential to enable reusable modelling across different
distributed ledgers in the future.” - Allan Third and John Domingue. 2017.
Linked Data Indexing of Distributed Ledgers
• EthOn is an ontology developed by ConsenSys
(https://consensys.net/) that is intended to be a semantic
counterpart of the Ethereum Blockchain framework
(http://ethon.consensys.net). We noticed some problems with
its taxonomy.
• Flex (Web) Ledger is a graph data model and a protocol for
decentralized ledgers developed by Digital Bazaar. From the
data model perspective, Flex Ledger assumes the use of
generic JSON objects encapsulated in the ledger. “The ledger
data model and syntax make no assumption about which
ontology is used.”
Bring Blockchain to legacy databases
• BigchainDB – Blockchain database built using MongoDB
“Rather than trying to enhance blockchain technology,
BigchainDB starts with a big data distributed database and
then adds Blockchain characteristics - decentralized control,
immutability and the transfer of digital assets.”
( https://www.bigchaindb.com/features/ )
• MongoDB Blockchained – essentially the same
development, but from the MongoDB perspective:
“A blockchain-enabled MongoDB that wraps the core
database (MongoDB) and implements the three blockchain
characteristics of decentralization, immutability, and assets.“
(“Building Enterprise-Grade Blockchain Databases with MongoDB”,
A MongoDB White Paper)
In both cases, the essence lies in the ability of the distributed
database to use legacy access methods (querying,
scalability, operationalizability).
The birth of GraphChain
Rather than trying to add Graphs and Ontologies to Blockchain,
GraphChain starts with an RDF database and then adds
Blockchain features to the system.
GraphChain defined
• An RDF graph is an unordered set of RDF triples, and
a named RDF graph is an RDF graph that is assigned
a name in the form of a URI
• GraphChain is thus defined as:
• A linked chain of named RDF graphs specified by the
GraphChain ontology and an ontology for the data
graph part of the GraphChain.
• A set of general mechanisms for calculating a digest
of the named RDF graphs.
• A set of network mechanisms that are responsible
for distributing the named RDF graphs among
the distributed peers and for achieving
consensus.
The main idea behind GraphChain is
to use Blockchain mechanisms
on top of an abstract RDF graph data type.
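The first two parts of the definition can be sketched as a data structure: each block names a data graph, records a digest of that graph, and stores the hash of the previous block. In this minimal Python sketch the block fields, the `urn:example:` graph names and the order-independent digest (a DotHash-style sum of per-triple SHA-256 values) are assumptions for illustration, not the GraphChain ontology's actual terms; networking and consensus are left out.

```python
import hashlib
from dataclasses import dataclass

# A triple of N-Triples-style terms, e.g. ("<http://ex.org/s>", "<http://ex.org/p>", '"o"').
Triple = tuple[str, str, str]
GENESIS_PREVIOUS = "0" * 64


def graph_digest(triples: set[Triple]) -> str:
    """Order-independent digest (assumed): sum of per-triple SHA-256 values mod 2**256."""
    total = 0
    for s, p, o in triples:
        line = f"{s} {p} {o} .".encode("utf-8")
        total = (total + int.from_bytes(hashlib.sha256(line).digest(), "big")) % 2**256
    return f"{total:064x}"


@dataclass(frozen=True)
class Block:
    index: int
    data_graph_name: str  # URI naming the data graph (hypothetical field)
    data_digest: str      # digest of the data graph's triples
    previous_hash: str    # hash of the previous block

    def block_hash(self) -> str:
        header = f"{self.index}|{self.data_graph_name}|{self.data_digest}|{self.previous_hash}"
        return hashlib.sha256(header.encode("utf-8")).hexdigest()


def append_block(chain: list[Block], graph_name: str, triples: set[Triple]) -> Block:
    """Link a new named graph to the current head of the chain."""
    previous = chain[-1].block_hash() if chain else GENESIS_PREVIOUS
    block = Block(len(chain), graph_name, graph_digest(triples), previous)
    chain.append(block)
    return block


def chain_is_valid(chain: list[Block], graphs: dict[str, set[Triple]]) -> bool:
    """Recompute every data digest and every back-link from the first block onward."""
    previous = GENESIS_PREVIOUS
    for position, block in enumerate(chain):
        if block.index != position or block.previous_hash != previous:
            return False
        triples = graphs.get(block.data_graph_name)
        if triples is None or block.data_digest != graph_digest(triples):
            return False
        previous = block.block_hash()
    return True
```

After appending two graphs with `append_block`, `chain_is_valid` returns `True`. Replacing a triple in the first graph makes it return `False`, because the recomputed data digest no longer matches the one recorded in the block, and editing an earlier block's fields changes its hash and breaks the next back-link.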
The GraphChain Architecture
• A single node of the GraphChain consists of several
parts:
• a web interface for communication with clients (via
the HTTP protocol),
• a WebSocket endpoint for communication with
other nodes,
• a cryptography module for digest
calculation,
• a triple store repository manager for storing blocks
as sets of triples and obtaining blocks from the
repository,
• and services which bind all these parts together.
A single node perspective
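How the services might bind the other parts together can be sketched in-process, leaving out the HTTP and WebSocket layers. In this Python sketch the class and method names, the `http://example.org/bc#` vocabulary and the `urn:example:` graph names are hypothetical; the repository is an in-memory stand-in for a triple store, and the crypto module hashes sorted N-Triples lines, which is only a sound digest for graphs without blank nodes.

```python
import hashlib

EX = "http://example.org/bc#"  # hypothetical vocabulary, not the real GraphChain ontology


class CryptoModule:
    """Digest calculation: SHA-256 of sorted N-Triples lines (valid only without blank nodes)."""

    def digest(self, triples) -> str:
        lines = sorted({f"{s} {p} {o} ." for s, p, o in triples})
        return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


class RepositoryManager:
    """In-memory stand-in for a triple store: one named set of triples per graph."""

    def __init__(self):
        self._graphs = {}

    def store(self, graph_name: str, triples) -> None:
        self._graphs[graph_name] = frozenset(triples)

    def load(self, graph_name: str) -> frozenset:
        return self._graphs[graph_name]


class NodeService:
    """Binds the parts: digests a data graph, stores it, then stores the block as triples."""

    def __init__(self, crypto: CryptoModule, repository: RepositoryManager):
        self.crypto = crypto
        self.repository = repository
        self.head = "0" * 64  # digest of the latest block graph
        self.height = 0

    def add_block(self, data_graph_name: str, triples) -> str:
        data_digest = self.crypto.digest(triples)
        self.repository.store(data_graph_name, triples)
        block_name = f"urn:example:block:{self.height}"
        block_triples = {
            (f"<{block_name}>", f"<{EX}dataGraph>", f"<{data_graph_name}>"),
            (f"<{block_name}>", f"<{EX}dataHash>", f'"{data_digest}"'),
            (f"<{block_name}>", f"<{EX}previousHash>", f'"{self.head}"'),
        }
        self.repository.store(block_name, block_triples)
        self.head = self.crypto.digest(block_triples)
        self.height += 1
        return block_name

    def verify(self, data_graph_name: str, expected_digest: str) -> bool:
        return self.crypto.digest(self.repository.load(data_graph_name)) == expected_digest
```

Keeping the block itself as a named graph means the ledger can be queried with the same triple-store machinery as the data graphs.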
The GraphChain process flow
The artistic visualization of the GraphChain
Watch it at: https://youtu.be/C7_mB_myo5w
The implementation challenges
The implementation of GraphChain brings a number of challenges
that must be addressed before a production-grade alternative to
the existing Blockchain implementations can be offered.
The implementation challenges
• Performance of programmatic
access to RDF graphs.
• Performance and quality of the RDF
graph serialization used for
broadcasting the named graphs to
other nodes.
• Computation of the RDF digests.
The most important challenges
The computation of the RDF digests
• Blank node identifiers are implementation-dependent,
i.e. they may change when the same graph is transferred
between different implementations, triple stores or other
means of instantiating it.
• Every RDF graph is equivalent to an unordered set
of triples, so one and the same graph, even in
the same syntax, can be serialized in many
different ways.
• An RDF graph serialization can use different character
encodings, and hashing functions are encoding-sensitive.
The issues
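The second and third issues are easy to reproduce. The Python sketch below hashes two N-Triples serializations of the same made-up two-triple graph, differing only in line order, each in UTF-8 and UTF-16: all four SHA-256 digests differ even though the graph is the same.

```python
import hashlib

# Two N-Triples serializations of the same graph: only the line order differs.
serialization_a = (
    '<http://example.org/a> <http://example.org/name> "A" .\n'
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n'
)
serialization_b = (
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n'
    '<http://example.org/a> <http://example.org/name> "A" .\n'
)


def sha256_hex(text: str, encoding: str = "utf-8") -> str:
    """Hash the serialized text in the given character encoding."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


digests = {
    "order A, UTF-8": sha256_hex(serialization_a),
    "order B, UTF-8": sha256_hex(serialization_b),
    "order A, UTF-16": sha256_hex(serialization_a, "utf-16"),
    "order B, UTF-16": sha256_hex(serialization_b, "utf-16"),
}

for label, value in digests.items():
    print(f"{label}: {value[:16]}...")

# Same graph, four different digests.
assert len(set(digests.values())) == 4
```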
The computation of the RDF digests
• “Canonicalization” — calculation of the digest as
the hash of the canonical graph serialization
• “DotHash” — calculation of the digest as the result
of the combining operation on the hashes of the
individual triples
• “Interwoven DotHash” — calculation of the digest
as the result of the combining operation on the
hashes of the individual triples and the triples
linked by blank nodes.
The proposed solutions
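The first two proposals can be compared on a graph without blank nodes. In the Python sketch below, "canonicalization" is reduced to sorting UTF-8 N-Triples lines, which is only a canonical form when the graph has no blank nodes (full RDF canonicalization must also relabel blank nodes deterministically), and the DotHash combining operation is assumed to be addition of SHA-256 values modulo 2^256. Both digests are then independent of triple order.

```python
import hashlib

Triple = tuple[str, str, str]


def ntriples_line(t: Triple) -> str:
    s, p, o = t
    return f"{s} {p} {o} ."


def canonical_digest(triples: list[Triple]) -> str:
    """Hash of a canonical serialization: sorted, de-duplicated UTF-8 N-Triples lines.
    Only canonical when the graph contains no blank nodes."""
    lines = sorted({ntriples_line(t) for t in triples})
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def dothash_digest(triples: list[Triple]) -> str:
    """Combine per-triple hashes; addition mod 2**256 is an assumed combining operation."""
    total = 0
    for line in {ntriples_line(t) for t in triples}:  # a graph is a set: ignore duplicates
        total += int.from_bytes(hashlib.sha256(line.encode("utf-8")).digest(), "big")
    return f"{total % 2**256:064x}"
```

One design consequence of an additive combining operation: adding a triple to a graph only requires adding that triple's hash to the stored total (and removing one means subtracting it), whereas the canonicalization approach has to re-sort and re-hash the whole serialization.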
Interwoven DotHash
• If the triple does not contain a blank node, compute its hash
from its N3 serialization.
• If the triple contains a blank node as its Subject, then
compute its hash as the sum of the hashes of the N3
serialized Predicate and Object and the hashes of the non-blank
nodes of all triples in which that blank node appears
as the Object.
• If the triple contains a blank node as its Object, then
compute its hash as the sum of the hashes of the N3
serialized Subject and Predicate and the hashes of the non-blank
nodes of all triples in which that blank node appears
as the Subject.
• If the triple contains blank nodes as both Subject and Object,
apply the above rules twice, first for the Subject, then for
the Object.
The algorithm
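The rules above can be sketched in Python as follows. Several details are assumptions rather than part of the algorithm as stated: terms are N-Triples-style strings with blank nodes written `_:label`, each term and each blank-node-free triple is hashed with SHA-256 over its UTF-8 text, the "sum" is taken modulo 2^256, and the graph digest is the sum of all triple hashes. Because blank-node labels are never hashed, renaming blank nodes leaves the digest unchanged.

```python
import hashlib
from collections import defaultdict

Triple = tuple[str, str, str]
MOD = 2**256


def h(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def non_blank_sum(terms) -> int:
    """Sum of the hashes of the non-blank terms only."""
    return sum(h(x) for x in terms if not is_blank(x))


def interwoven_dothash(triples: set[Triple]) -> str:
    # Index the triples in which each blank node appears as Object / as Subject.
    as_object = defaultdict(list)
    as_subject = defaultdict(list)
    for t in triples:
        s, _, o = t
        if is_blank(o):
            as_object[o].append(t)
        if is_blank(s):
            as_subject[s].append(t)

    total = 0
    for s, p, o in triples:
        if not is_blank(s) and not is_blank(o):
            # Rule 1: no blank node, hash the serialized triple.
            triple_hash = h(f"{s} {p} {o} .")
        else:
            # The triple's own non-blank terms (Predicate plus non-blank Subject/Object).
            triple_hash = non_blank_sum((s, p, o))
            if is_blank(s):
                # Rule 2: add non-blank nodes of triples where s appears as Object.
                triple_hash += sum(non_blank_sum(t) for t in as_object[s])
            if is_blank(o):
                # Rule 3: add non-blank nodes of triples where o appears as Subject.
                triple_hash += sum(non_blank_sum(t) for t in as_subject[o])
            # Rule 4 (blank Subject and Object) is covered by applying both branches.
        total = (total + triple_hash) % MOD
    return f"{total:064x}"
```

In use, a graph containing `_:b1` and the same graph with `_:b1` renamed to `_:z9` produce the same digest, while changing a literal attached to the blank node changes it.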
The GraphChain Ontology
• The GraphChain ontology resembles the
sequentially ordered and cryptographically
secured chain structure presented in the original
Bitcoin paper by Nakamoto.
• From the ontological perspective, a GraphChain
block (unit) is a reified 7-ary relation
• In the reification pattern each n-ary relation
between resources is represented as a separate
OWL class and each instance of the relation (an
n-tuple) as an instance of the class plus n
additional binary relations providing links to
each argument of the n-tuple.
An OWL ontology of chained
named RDF graphs
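The reification pattern can be illustrated by turning one 7-tuple into triples: one node typed with the relation's class plus one binary property per argument. In the Python sketch below, the namespace `http://example.org/bc#`, the class name `Block` and the seven role names (one object property and six data properties) are hypothetical placeholders, not the terms defined in the GraphChain ontology.

```python
EX = "http://example.org/bc#"  # hypothetical namespace, not the real ontology IRI
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical argument roles of the 7-ary block relation:
# one object property followed by six data properties.
ROLES = ("previousBlock", "hash", "previousHash", "dataGraphIRI",
         "dataHash", "index", "timeStamp")


def reify_block(block_iri: str, args: dict) -> list:
    """Represent one 7-tuple as an instance of the relation class plus 7 binary links."""
    missing = set(ROLES) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    subject = f"<{block_iri}>"
    triples = [(subject, RDF_TYPE, f"<{EX}Block>")]
    for role in ROLES:
        value = args[role]
        if role == "previousBlock":
            obj = f"<{value}>"                      # object property: link to a resource
        elif role == "index":
            obj = f'"{value}"^^<{XSD}integer>'     # data properties: (typed) literals
        elif role == "timeStamp":
            obj = f'"{value}"^^<{XSD}dateTime>'
        else:
            obj = f'"{value}"'
        triples.append((subject, f"<{EX}{role}>", obj))
    return triples


if __name__ == "__main__":
    block = reify_block("urn:example:block:1", {
        "previousBlock": "urn:example:block:0",
        "hash": "3f1a",            # shortened, made-up digests
        "previousHash": "9c07",
        "dataGraphIRI": "urn:example:data:1",
        "dataHash": "b2e4",
        "index": 1,
        "timeStamp": "2018-04-24T10:00:00Z",
    })
    for s, p, o in block:
        print(s, p, o, ".")
```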
The GraphChain Ontology
• The GraphChain ontology is a lightweight OWL
ontology:
• it consists of 2 classes, 1 object property and
6 data properties.
• It has ALQ(D) expressivity and 11 restrictions
on classes.
• Its main purpose is to describe GraphChain
constructs.
• GraphChain places no restrictions on the
ontological description of the data graphs.
http://ontologies.makolab.com/bc
Simple implementations of the GraphChain
• .NET/C#
The C# implementation uses the .NET Core platform. The node itself is an
ASP.NET Core web application with a REST API for client-node
communication and a WebSocket layer for peer-to-peer transmission.
• Java
The Java implementation was created as a Spring-managed web
application. It uses the RDF4J library for semantic operations
and can store RDF graphs in both the RDF4J triple store and the
AllegroGraph triple store. Adding a new storage method is easy.
• JavaScript/node.js
We have also developed a third, simple, illustrative implementation of
the node in JavaScript, based on Naivechain. It
offers an HTTP API and P2P communication between nodes. There are some
differences between our implementation and Naivechain, though.
The simple implementations
Web interface for the GraphChain
• Uses the YASGUI client (http://about.ayasgui.org)
• Allows SPARQL querying of both the GraphChain ledger and the
data graphs
• Connects to multiple nodes of the GraphChain
http://binsem.makolab.pl/gcgui/
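Assuming a node exposes a standard SPARQL 1.1 Protocol endpoint (which is what a YASGUI client talks to), a query can be sent as the URL-encoded `query` parameter of an HTTP GET request. The Python sketch below only builds such a request with the standard library; the endpoint URL and the `http://example.org/bc#` terms in the query are hypothetical, not the actual GraphChain endpoints or ontology terms.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of one GraphChain node.
ENDPOINT = "http://localhost:8080/sparql"

# Hypothetical vocabulary: list blocks and the data graphs they point to.
QUERY = """
PREFIX bc: <http://example.org/bc#>
SELECT ?block ?index ?dataGraph WHERE {
  ?block a bc:Block ;
         bc:index ?index ;
         bc:dataGraphIRI ?dataGraph .
}
ORDER BY ?index
"""


def build_sparql_get(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


if __name__ == "__main__":
    req = build_sparql_get(ENDPOINT, QUERY)
    print(req.full_url)
    # To execute against a running node:
    # import json, urllib.request
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["results"]["bindings"])
```

Because every block is stored as triples, the same request shape can target the ledger or a data graph; only the query text changes.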
Production-grade implementation of the GraphChain
• Hyperledger Indy is a distributed ledger purpose-built
for decentralized identity.
• Used by the Sovrin Foundation (https://sovrin.org/) for
digital identity.
• An ideal solution for our LEI applications
• “Observer” nodes in the second half of 2018
• We are analyzing the results of our POC on the use
of the GraphChain and Hyperledger Indy and are
working on the production system.
Using Hyperledger Indy
The plans for the project development
• Finalization of the Hyperledger Indy implementation
• In parallel: selection of alternative triple store
implementations
• Testing other graph “models” and engines (like
Neo4j) and other “non-SQL” database models
• Creating a solution for the global LEI system
• Considering using QRL (Quantum Resistant Ledger)
code as part of the GraphChain
What’s next?
PLEASE CONTACT US!
DR. MIREK SOPEK
CTO & founder
sopek@makolab.com
Poland:
MakoLab SA,
Demokratyczna 46,
93430 Lodz, Poland
Phone: +48 600 814 537,
www.makolab.com
USA:
2153 S.E. Hawthorne Road,
Suite 205,
Gainesville, Florida 32641
Phone: +1 551 226 5488
us.makolab.com
DR. ROBERT TRYPUZ
MakoLab SA (DC Lublin)
KUL University
robert.trypuz@makolab.com
PRZEMYSŁAW GRĄDZKI
MakoLab SA (DC Lublin)
KUL University
przemyslaw.gradzki@makolab.com
WITOLD KOSOWSKI
MakoLab SA
witold.kosowski@makolab.com
DOMINIK KUZIŃSKI
MakoLab SA
dominik.kuzinski@makolab.com
RAFAŁ TRÓJCZAK
MakoLab SA (DC Lublin)
KUL University
Rafal.trojczak@makolab.com