SlideShare a Scribd company logo
1 of 45
Download to read offline
PARALLEL AND INCREMENTAL MATERIALISATION OF
RDF/DATALOG IN RDFOX
Boris Motik
University of Oxford
March 20, 2015
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
Introduction
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
Features:
RAM-based storage of RDF data
Currently centralised, but a distributed system is in the works
Datalog reasoning via materialisation
Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL
Very effective parallelisation
Efficient reasoning with owl:sameAs via rewriting
Known and widely-used technique, but correctness not trivial
The Backward-Forward (B/F) incremental maintenance algorithm
Considerably improves on DRed
Compatible with rewriting of owl:sameAs
SPARQL query answering
Most of SPARQL 1.0 and some of SPARQL 1.1
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 1/13
Parallel Materialisation of Datalog
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 1/13
Parallel Materialisation of Datalog
MAIN CHALLENGES
1 Assign workload to threads evenly
Rules are generally not independent due to recursion
Static assignment of rule instances can be affected due to data skew
⇒ Dynamic assignment with low overhead needed
2 Efficiently interleave . . .
. . . querying (during evaluation of rule bodies)
. . . updates (during updates of derived facts)
3 Provide indexes for efficient rule body evaluation
Crucial for elimination of duplicate triples ⇒ ensures termination
Usually sorted (and clustered) to allow for merge joins
Hash indexes can also be used
Individual (i.e., not bulk) index updates are inefficient
B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in
Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 2/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery:
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
⇒ R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(a)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
⇒ R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(a)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
⇒ R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(b)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
⇒ R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(b)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
⇒ A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(a,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
⇒ R(c,f)
R(c,g)
A(b)
A(c)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(c)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
⇒ R(c,g)
A(b)
A(c)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: A(c)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
⇒ A(b)
A(c)
A(d)
A(e)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(b,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
⇒ A(c)
A(d)
A(e)
A(f)
A(g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(c,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
⇒ A(d)
A(e)
A(f)
A(g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(d,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
⇒ A(e)
A(f)
A(g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(e,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
A(e)
⇒ A(f)
A(g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(f,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(b)
A(c)
A(d)
A(e)
A(f)
⇒ A(g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Current subquery: R(g,y)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
Parallel Materialisation of Datalog
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of facts
ensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 4/13
Parallel Materialisation of Datalog
SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The critically algorithm depends on:
matching atoms t1, t2, t3 with ti a constant or variable
continuous concurrent updates
Our RDF storage data structure:
Hash-based indexes ⇒ naturally parallel data structure
‘Mostly’ lock-free: at least one thread makes progress at most of the time
compare: if a thread acquire a lock and dies, other threads are blocked
main benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
When A writes to a location cached by B, the cache of B is invalidated
Our updates ensure that threads (typically) write to different locations
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 5/13
Parallel Materialisation of Datalog
EVALUATION I: PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
8 16 24 32
2
4
6
8
10
12
14
16
18
20
ClarosL
ClarosLE
DBpediaL
DBpediaLE
LUBML 01K
LUBMU 01K
8 16 24 32
2
4
6
8
10
12
14
16
18
20
UOBML 01K
UOBMU 010
LUBMLE 01K
LUBML 05K
LUBMLE 05K
LUBMU 05K
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
⇒ hyperthreading and parallelism can compensate CPU cache misses
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 6/13
Parallel Materialisation of Datalog
EVALUATION II: ORACLE’S SPARC T5
Machine specification:
4 TB of RAM
128 physical cores that support 1024 virtual cores via hyperthreading
Name Triples Time (s) Speedup Inference
Initial Resulting Threads rate
1 1024 (triples/s)
ClarosLE 18.8 M 533.7 M 7484 74 101 7.2 M
LUBML-1K 133.6 M 182.4 M 511 10 51 4.9 M
LUBMLE -1K 133.6 M 332.6 M 5267 37 142 5.4 M
LUBML-140k: 8 G triples, materialised to 10.9 G triples
20 threads: 2000 s, inference rate 1.45 M triples/s
128 threads: 599 s, inference rate 4.84 M triples/s
Materialised dataset used about half of RAM (2 TB)
Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 7/13
Handling owl:sameAs via Rewriting
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 7/13
Handling owl:sameAs via Rewriting
HANDLING owl:sameAs VIA REWRITING
Rewriting: replace all equal constants with one representative
Well known approach; used in graphDB, Oracle, WebPIE, . . .
Much more efficient than direct materialisation
Open question I: Effective parallelisation
Lock-free maintenance of representatives
Care needed to ensure correctness and nonrepetition of derivations
Open question II: Query evaluation
Could expand rewritten data before query evaluation, but that is inefficient
Better: evaluate queries on rewritten data and expand the answer
Such a straightforward approach is incorrect:
Result cardinalities might be wrong
The presence of FILTER and BIND can make results incorrect
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 8/13
Handling owl:sameAs via Rewriting
EVALUATION OF REWRITING
UOBM 279 4 2.2M
AX 36M 1.2 332M 16,152M
REW 9.4M 9.7M 0.4 33.8M 4,256M 686
factor 3.2x 3.2x 9.9x 3.8x
Table 3: Materialisation Times with Axiomatisation and Rewriting
Test Claros DBpedia OpenCyc
Threads AX REW AX
REW
AX REW AX
REW
AX REW AX
REW
sec spd sec spd sec spd sec spd sec spd sec spd
1 2042.9 1.0 65.8 1.0 31.1 219.8 1.0 31.7 1.0 6.9 2093.7 1.0 119.9 1.0 17.5
2 969.7 2.1 35.2 1.9 27.6 114.6 1.9 17.6 1.8 6.5 1326.5 1.6 78.3 1.5 16.9
4 462.0 4.4 18.1 3.6 25.5 66.3 3.3 10.7 3.0 6.2 692.6 3.0 40.5 3.0 17.1
8 237.2 8.6 9.9 6.7 24.1 36.1 6.1 5.2 6.0 6.9 351.3 6.0 23.0 5.2 15.2
12 184.9 11.1 7.9 8.3 23.3 31.9 6.9 4.1 7.7 7.7 291.8 7.2 56.2 2.1 5.5
16 153.4 13.3 6.9 9.6 22.3 27.5 8.0 3.6 8.8 7.7 254.0 8.2 52.3 2.3 4.9
Test UniProt UOBM
Threads AX REW AX
REW
AX REW AX
REW
sec spd sec spd sec spd sec spd
1 370.6 1.0 143.4 1.0 2.6 2696.7 1.0 1152.7 1.0 2.3
2 232.3 1.6 86.7 1.7 2.7 1524.6 1.8 599.6 1.9 2.5
4 129.2 2.9 46.5 3.1 2.8 813.3 3.3 318.3 3.6 2.6
8 74.7 5.0 25.1 5.7 3.0 439.9 6.1 177.7 6.5 2.5
12 61.0 6.1 19.9 7.2 3.1 348.9 7.7 152.7 7.6 2.3
16 61.9 6.0 17.1 8.4 3.6 314.4 8.6 137.9 8.4 2.3
mode takes less than ten seconds, these results are difficult
o measure and are susceptible to skew.
Our results confirm that rewriting can significantly reduce
materialisation times. RDFox was consistently faster in the
REW mode than in the AX mode even on UniProt, where the
eduction in the number of triples is negligible. This is due to
he reduction in the number of derivations, mainly involving
ules (⇡1)–(⇡5), which is still significant on UniProt. In all
cases, the speedup of rewriting is typically much larger than
connected by the :hasSameHomeTownWith property. Thi
property is also symmetric and transitive so, for each pai
of connected resources, the number of times each triple i
derived by the transitivity rule is quadratic in the number o
connected resources. This leads to a large number of dupli
cate derivations that do not involve equality. Thus, althoug
it is helpful, rewriting does not reduce the number of deriva
tion in the same way as, for example, on Claros, which ex
plains the relatively modest speedup of REW over AX.
Speedup is bigger than the reduction in the number of triples
The number of derivations is the determining factor
Contrary to popular belief!
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 9/13
Incremental Materialisation Maintenance
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 9/13
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state of the art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x)
A(a)
B(a)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state of the art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state of the art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Delete A(a) using DRed:
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state of the art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Delete A(a) using DRed:
1 Delete all facts with a derivation from A(a)
C0(x)D ← A(x)D
C0(x)D ← B(x)D
Ci (x)D ← Ci−1(x)D for 1 ≤ i ≤ n
C0(x)D ← Cn(x)D
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state of the art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Delete A(a) using DRed:
1 Delete all facts with a derivation from A(a)
C0(x)D ← A(x)D
C0(x)D ← B(x)D
Ci (x)D ← Ci−1(x)D for 1 ≤ i ≤ n
C0(x)D ← Cn(x)D
2 Rederive facts that have an alternative derivation
C0(x) ← C0(x)D ∧ A(x)
C0(x) ← C0(x)D ∧ B(x)
Ci (x) ← Ci (x)D ∧ Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← C0(x)D ∧ Cn(x)
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ?
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a) ?
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a) ?
C0(a) ?
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a) ?
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit so it is derivable
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit so it is derivable
6 So C0(a) is derivable too
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit so it is derivable
6 So C0(a) is derivable too
7 Stop propagation and terminate
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
Incremental Materialisation Maintenance
EVALUATION OF THE B/F ALGORITHM
Table 2: Experimental results
Dataset |E | |I  I0
|
Rematerialise DRed B/F
Time Derivations Time Derivations Time Derivations
(s) Fwd (s) |D| DR2 DR4 DR5 (s) |C| Bwd Sat Del Prop
LUBM-1k-L
100 113 139.4 212.5M 0.0 1.0k 1.1k 0.8k 1.0k 0.0 0.5k 0.2k 0.3k 0.2k
|E| = 133.6M 5.0k 5.5k 101.8 212.5M 0.2 55.5k 67.2k 46.9k 59.8k 0.2 23.0k 9.3k 13.7k 7.4k
|I| = 182.4M 2.5M 2.7M 138.5 208.8M 39.4 10.3M 15.2M 6.6M 11.5M 32.8 10.0M 4.1M 5.6M 3.7M
Mt = 121.5s 5.0M 5.5M 91.8 205.0M 54.8 17.8M 26.3M 10.5M 18.9M 62.3 18.8M 7.8M 10.1M 7.5M
Md = 212.5M 7.5M 8.3M 89.2 201.3M 71.5 24.3M 35.5M 13.6M 24.3M 85.4 26.7M 11.0M 14.0M 11.2M
10.0M 11.0M 99.5 197.5M 127.9 30.0M 43.1M 15.9M 28.1M 102.2 34.1M 14.0M 17.4M 15.0M
UOBM-1k-Uo
100 160 3482.0 3.6G 8797.6 1.8G 2.6G 53.2M 2.6G 5.4 0.8k 0.5k 1.3k 0.5k
|E| = 254.8M 5.0k 85.2k 3417.8 3.6G 9539.3 1.8G 2.6G 53.2M 2.6G 28.2 105.9k 17.9k 42.1k 104.1k
|I| = 2.2G 17.0M 130.9M 3903.1 3.4G 8934.3 1.8G 2.7G 63.7M 2.5G 988.8 175.8M 47.6M 104.0M 196.7M
Mt = 5034.0s 34.0M 269.0M 4084.1 3.2G 9492.5 1.9G 2.8G 68.4M 2.4G 1877.2 340.7M 87.5M 182.3M 401.1M
Md = 3.6G 51.0M 422.8M 4010.0 3.0G 10659.3 1.9G 2.9G 71.5M 2.2G 2772.7 513.7M 125.2M 246.8M 622.0M
68.0M 581.4M 3981.9 2.8G 11351.6 1.9G 2.9G 73.3M 2.1G 3737.3 687.0M 162.5M 289.5M 848.6M
Claros-L
100 212 62.9 128.6M 0.0 0.8k 1.0k 0.2k 0.5k 0.0 0.6k 0.3k 0.7k 0.5k
|E| = 18.8M 5.0k 11.3k 62.8 128.6M 0.4 37.8k 50.7k 10.9k 23.9k 0.4 29.1k 18.8k 35.3k 26.8k
|I| = 74.2M 0.6M 1.3M 62.3 125.6M 32.3 4.1M 5.5M 1.1M 2.5M 14.9 3.1M 2.0M 3.6M 3.0M
Mt = 78.9s 1.2M 2.6M 61.2 122.6M 53.2 7.8M 10.8M 2.0M 4.8M 33.6 6.1M 3.8M 6.7M 6.0M
Md = 128.6M 1.7M 4.0M 60.5 119.5M 73.6 11.4M 15.9M 2.8M 6.8M 47.8 8.9M 5.6M 9.5M 9.1M
2.3M 5.5M 60.0 116.3M 91.0 14.8M 20.9M 3.6M 8.6M 60.6 11.7M 7.3M 12.0M 12.3M
Claros-LE
100 0.5k 3992.8 12.6G 0.0 1.3k 2.0k 0.3k 0.9k 0.0 1.0k 0.7k 1.0k 1.1k
|E| = 18.8M 2.5k 178.9k 5235.1 12.6G 8077.4 5.5M 11.7G 176.6k 11.7G 10.3 216.4k 161.2k 8.8M 320.0k
|I| = 533.7M 5.0k 427.5k 4985.1 12.6G 7628.2 6.0M 11.7G 186.0k 11.7G 16.5 485.6k 369.0k 8.9M 769.3k
Mt = 4024.5s 7.5k 609.6k 4855.0 12.6G 7419.1 6.5M 11.7G 193.9k 11.7G 19.5 683.4k 516.8k 9.0M 1.1M
Md = 12.9G 10.0k 780.8k 5621.3 12.6G 7557.9 6.8M 11.7G 207.6k 11.7G 3907.2 6.0M 723.0M 11.7G 16.9M
Test Datasets Table 2 summarises the properties of the
four datasets we used in our tests.
LUBM (Guo, Pan, and Heflin 2005) is a well-known RDF
benchmark. We extracted the datalog fragment of the LUBM
a congruence relation: derivations with owl:sameAs tend to
proliferate so an efficient incremental algorithm would have
to treat this property directly; we leave developing such an
extension to our future work. Please note that omitting the
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 12/13
Conclusion
TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 12/13
Conclusion
RESEARCH DIRECTIONS
Add a data/query/reasoning distribution layer:
Initial results very promising
Implementation in progress
Future work:
Investigate potential for data compression
Improve join cardinality estimation
Improve query planning
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 13/13

More Related Content

What's hot

RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeNational Institute of Informatics
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution Chen Wu
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebDaniele Dell'Aglio
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Makoto Yui
 
RDFox Poster
RDFox PosterRDFox Poster
RDFox PosterDBOnto
 
Analysis of Air Pollution in Nova Scotia Presentation
Analysis of Air Pollution in Nova Scotia PresentationAnalysis of Air Pollution in Nova Scotia Presentation
Analysis of Air Pollution in Nova Scotia PresentationCarlo Carandang
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Accelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudAccelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudSujit Pal
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Querying Linked Geospatial Data with Incomplete Information
Querying Linked Geospatial Data with  Incomplete InformationQuerying Linked Geospatial Data with  Incomplete Information
Querying Linked Geospatial Data with Incomplete InformationCharalampos (Babis) Nikolaou
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013Juan Sequeda
 
RSP-QL*: Querying Data-Level Annotations in RDF Streams
RSP-QL*: Querying Data-Level Annotations in RDF StreamsRSP-QL*: Querying Data-Level Annotations in RDF Streams
RSP-QL*: Querying Data-Level Annotations in RDF Streamskeski
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingJean-Paul Calbimonte
 
DGraph: Introduction To Basics & Quick Start W/Ratel
DGraph: Introduction To Basics & Quick Start W/RatelDGraph: Introduction To Basics & Quick Start W/Ratel
DGraph: Introduction To Basics & Quick Start W/RatelKnoldus Inc.
 
LarKC Tutorial at ISWC 2009 - Data Model
LarKC Tutorial at ISWC 2009 - Data ModelLarKC Tutorial at ISWC 2009 - Data Model
LarKC Tutorial at ISWC 2009 - Data ModelLarKC
 

What's hot (16)

RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Partitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph ExecutionPartitioning SKA Dataflows for Optimal Graph Execution
Partitioning SKA Dataflows for Optimal Graph Execution
 
Triplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the WebTriplewave: a step towards RDF Stream Processing on the Web
Triplewave: a step towards RDF Stream Processing on the Web
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
RDFox Poster
RDFox PosterRDFox Poster
RDFox Poster
 
Analysis of Air Pollution in Nova Scotia Presentation
Analysis of Air Pollution in Nova Scotia PresentationAnalysis of Air Pollution in Nova Scotia Presentation
Analysis of Air Pollution in Nova Scotia Presentation
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Accelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn CloudAccelerating NLP with Dask and Saturn Cloud
Accelerating NLP with Dask and Saturn Cloud
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Querying Linked Geospatial Data with Incomplete Information
Querying Linked Geospatial Data with  Incomplete InformationQuerying Linked Geospatial Data with  Incomplete Information
Querying Linked Geospatial Data with Incomplete Information
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
RSP-QL*: Querying Data-Level Annotations in RDF Streams
RSP-QL*: Querying Data-Level Annotations in RDF StreamsRSP-QL*: Querying Data-Level Annotations in RDF Streams
RSP-QL*: Querying Data-Level Annotations in RDF Streams
 
Querying Incomplete Geospatial Information in RDF
Querying Incomplete Geospatial Information in RDFQuerying Incomplete Geospatial Information in RDF
Querying Incomplete Geospatial Information in RDF
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream Processing
 
DGraph: Introduction To Basics & Quick Start W/Ratel
DGraph: Introduction To Basics & Quick Start W/RatelDGraph: Introduction To Basics & Quick Start W/Ratel
DGraph: Introduction To Basics & Quick Start W/Ratel
 
LarKC Tutorial at ISWC 2009 - Data Model
LarKC Tutorial at ISWC 2009 - Data ModelLarKC Tutorial at ISWC 2009 - Data Model
LarKC Tutorial at ISWC 2009 - Data Model
 

Similar to RDFox: Parallel and Incremental Materialisation of RDF and Datalog

Introduction to database-Normalisation
Introduction to database-NormalisationIntroduction to database-Normalisation
Introduction to database-NormalisationAjit Nayak
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesAlexandra Roatiș
 
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...DBOnto
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012taxonbytes
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidFederico Campoli
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRathachai Chawuthai
 
A Distributed Tableau Algorithm for Package-based Description Logics
A Distributed Tableau Algorithm for Package-based Description LogicsA Distributed Tableau Algorithm for Package-based Description Logics
A Distributed Tableau Algorithm for Package-based Description LogicsJie Bao
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsDr. Neil Brittliff
 
The Semantics of SPARQL
The Semantics of SPARQLThe Semantics of SPARQL
The Semantics of SPARQLOlaf Hartig
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataRoberto García
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterDataWorks Summit
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationVladimir Alexiev, PhD, PMP
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD VivaAidan Hogan
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013Luis Daniel Ibáñez
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 

Similar to RDFox: Parallel and Incremental Materialisation of RDF and Datalog (20)

Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Data in RDF
Data in RDFData in RDF
Data in RDF
 
Introduction to database-Normalisation
Introduction to database-NormalisationIntroduction to database-Normalisation
Introduction to database-Normalisation
 
Efficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF DatabasesEfficient Query Answering against Dynamic RDF Databases
Efficient Query Answering against Dynamic RDF Databases
 
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ...
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Compact Representation of Large RDF Data Sets for Publishing and Exchange
Compact Representation of Large RDF Data Sets for Publishing and ExchangeCompact Representation of Large RDF Data Sets for Publishing and Exchange
Compact Representation of Large RDF Data Sets for Publishing and Exchange
 
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
Franz et. al. 2012. Reconciling Succeeding Classifications, ESA 2012
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) Acid
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
A Distributed Tableau Algorithm for Package-based Description Logics
A Distributed Tableau Algorithm for Package-based Description LogicsA Distributed Tableau Algorithm for Package-based Description Logics
A Distributed Tableau Algorithm for Package-based Description Logics
 
A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
The Semantics of SPARQL
The Semantics of SPARQLThe Semantics of SPARQL
The Semantics of SPARQL
 
Triplificating and linking XBRL financial data
Triplificating and linking XBRL financial dataTriplificating and linking XBRL financial data
Triplificating and linking XBRL financial data
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic RepresentationGetty Vocabulary Program LOD: Ontologies and Semantic Representation
Getty Vocabulary Program LOD: Ontologies and Semantic Representation
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX tool
 

More from LDBC council

8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...LDBC council
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...LDBC council
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics EngineLDBC council
 
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...LDBC council
 
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network BenchmarkLDBC council
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...LDBC council
 
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...LDBC council
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force statusLDBC council
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...LDBC council
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...LDBC council
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting - LDBC council
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...LDBC council
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...LDBC council
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...LDBC council
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC council
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC council
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionLDBC council
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...LDBC council
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC council
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadLDBC council
 

More from LDBC council (20)

8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
 
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
8th TUC Meeting – George Fletcher (TU Eindhoven), gMark: Schema-driven data a...
 
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
8th TUC Meeting – Marcus Paradies (SAP) Social Network Benchmark
 
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
8th TUC Meeting - Sergey Edunov (Facebook). Generating realistic trillion-edg...
 
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
Weining Qian (ECNU). On Statistical Characteristics of Real-Life Knowledge Gr...
 
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
8th TUC Meeting - Peter Boncz (CWI). Query Language Task Force status
 
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
8th TUC Meeting | Lijun Chang (University of New South Wales). Efficient Subg...
 
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
8th TUC Meeting - Eugene I. Chong (Oracle USA). Balancing Act to improve RDF ...
 
8th TUC Meeting -
8th TUC Meeting - 8th TUC Meeting -
8th TUC Meeting -
 
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
8th TUC Meeting - David Meibusch, Nathan Hawes (Oracle Labs Australia). Frapp...
 
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
8th TUC Meeting - Martin Zand University of Rochester Clinical and Translatio...
 
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
8th TUC Meeting - Tim Hegeman (TU Delft). Social Network Benchmark, Analytics...
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status update
 
LDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusionsLDBC 6th TUC Meeting conclusions
LDBC 6th TUC Meeting conclusions
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 

Recently uploaded

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

RDFox: Parallel and Incremental Materialisation of RDF and Datalog

  • 1. PARALLEL AND INCREMENTAL MATERIALISATION OF RDF/DATALOG IN RDFOX Boris Motik University of Oxford March 20, 2015
  • 2. TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
  • 3. Introduction TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
  • 4. Introduction RDFOX SUMMARY RDFox: a new RDF store and reasoner http://www.cs.ox.ac.uk/isg/tools/RDFox/ Features: RAM-based storage of RDF data Currently centralised, but a distributed system is in the works Datalog reasoning via materialisation Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL Very effective parallelisation Efficient reasoning with owl:sameAs via rewriting Known and widely-used technique, but correctness not trivial The Backward-Forward (B/F) incremental maintenance algorithm Considerably improves on DRed Compatible with rewriting of owl:sameAs SPARQL query answering Most of SPARQL 1.0 and some of SPARQL 1.1 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 1/13
  • 5. Parallel Materialisation of Datalog TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 1/13
  • 6. Parallel Materialisation of Datalog MAIN CHALLENGES 1 Assign workload to threads evenly Rules are generally not independent due to recursion Static assignment of rule instances can be affected due to data skew ⇒ Dynamic assignment with low overhead needed 2 Efficiently interleave . . . . . . querying (during evaluation of rule bodies) . . . updates (during updates of derived facts) 3 Provide indexes for efficient rule body evaluation Crucial for elimination of duplicate triples ⇒ ensures termination Usually sorted (and clustered) to allow for merge joins Hash indexes can also be used Individual (i.e., not bulk) index updates are inefficient B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 2/13
  • 7. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 8. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM ⇒ R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(a) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 9. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) ⇒ R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(a) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 10. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) ⇒ R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(b) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 11. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) ⇒ R(b,e) A(a) R(c,f) R(c,g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(b) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 12. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) ⇒ A(a) R(c,f) R(c,g) A(b) A(c) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(a,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 13. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) ⇒ R(c,f) R(c,g) A(b) A(c) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(c) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 14. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) ⇒ R(c,g) A(b) A(c) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: A(c) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 15. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) ⇒ A(b) A(c) A(d) A(e) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(b,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 16. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) ⇒ A(c) A(d) A(e) A(f) A(g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(c,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 17. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) A(c) ⇒ A(d) A(e) A(f) A(g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(d,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 18. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) A(c) A(d) ⇒ A(e) A(f) A(g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(e,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 19. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) A(c) A(d) A(e) ⇒ A(f) A(g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(f,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 20. Parallel Materialisation of Datalog SOLUTION PART I: ALGORITHM R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) A(c) A(d) A(e) A(f) ⇒ A(g) A(x) ∧ R(x, y) → A(y) For each fact: match the fact to all body atoms to obtain subqueries evaluate subqueries w.r.t. all previous facts add results to the table Current subquery: R(g,y) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 3/13
  • 21. Parallel Materialisation of Datalog PARALLELISING COMPUTATION Each thread extracts facts and evaluates subqueries independently The number of subqueries is determined by the number of facts ensures in practice that threads are equally loaded Requires no thread synchronisation ⇒ We partition rule instances dynamically and with little overhead Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 4/13
  • 22. Parallel Materialisation of Datalog SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY The critically algorithm depends on: matching atoms t1, t2, t3 with ti a constant or variable continuous concurrent updates Our RDF storage data structure: Hash-based indexes ⇒ naturally parallel data structure ‘Mostly’ lock-free: at least one thread makes progress at most of the time compare: if a thread acquire a lock and dies, other threads are blocked main benefit: performance is less susceptible to scheduling decisions Main technical challenge: reduce thread interference When A writes to a location cached by B, the cache of B is invalidated Our updates ensure that threads (typically) write to different locations Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 5/13
  • 23. Parallel Materialisation of Datalog EVALUATION I: PARALLELISATION SPEEDUP RDFox: an RDF store developed at Oxford University http://www.cs.ox.ac.uk/isg/tools/RDFox/ 8 16 24 32 2 4 6 8 10 12 14 16 18 20 ClarosL ClarosLE DBpediaL DBpediaLE LUBML 01K LUBMU 01K 8 16 24 32 2 4 6 8 10 12 14 16 18 20 UOBML 01K UOBMU 010 LUBMLE 01K LUBML 05K LUBMLE 05K LUBMU 05K Small concurrency overhead; parallelisation pays off already with two threads Speedup continues to increase after we exhaust all physical cores ⇒ hyperthreading and parallelism can compensate CPU cache misses Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 6/13
  • 24. Parallel Materialisation of Datalog EVALUATION II: ORACLE’S SPARC T5 Machine specification: 4 TB of RAM 128 physical cores that support 1024 virtual cores via hyperthreading Name Triples Time (s) Speedup Inference Initial Resulting Threads rate 1 1024 (triples/s) ClarosLE 18.8 M 533.7 M 7484 74 101 7.2 M LUBML-1K 133.6 M 182.4 M 511 10 51 4.9 M LUBMLE -1K 133.6 M 332.6 M 5267 37 142 5.4 M LUBML-140k: 8 G triples, materialised to 10.9 G triples 20 threads: 2000 s, inference rate 1.45 M triples/s 128 threads: 599 s, inference rate 4.84 M triples/s Materialised dataset used about half of RAM (2 TB) Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 7/13
  • 25. Handling owl:sameAs via Rewriting TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 7/13
  • 26. Handling owl:sameAs via Rewriting HANDLING owl:sameAs VIA REWRITING Rewriting: replace all equal constants with one representative Well known approach; used in graphDB, Oracle, WebPIE, . . . Much more efficient than direct materialisation Open question I: Effective parallelisation Lock-free maintenance of representatives Care needed to ensure correctness and nonrepetition of derivations Open question II: Query evaluation Could expand rewritten data before query evaluation, but that is inefficient Better: evaluate queries on rewritten data and expand the answer Such a straightforward approach is incorrect: Result cardinalities might be wrong The presence of FILTER and BIND can make results incorrect B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 8/13
  • 27. Handling owl:sameAs via Rewriting EVALUATION OF REWRITING UOBM 279 4 2.2M AX 36M 1.2 332M 16,152M REW 9.4M 9.7M 0.4 33.8M 4,256M 686 factor 3.2x 3.2x 9.9x 3.8x Table 3: Materialisation Times with Axiomatisation and Rewriting Test Claros DBpedia OpenCyc Threads AX REW AX REW AX REW AX REW AX REW AX REW sec spd sec spd sec spd sec spd sec spd sec spd 1 2042.9 1.0 65.8 1.0 31.1 219.8 1.0 31.7 1.0 6.9 2093.7 1.0 119.9 1.0 17.5 2 969.7 2.1 35.2 1.9 27.6 114.6 1.9 17.6 1.8 6.5 1326.5 1.6 78.3 1.5 16.9 4 462.0 4.4 18.1 3.6 25.5 66.3 3.3 10.7 3.0 6.2 692.6 3.0 40.5 3.0 17.1 8 237.2 8.6 9.9 6.7 24.1 36.1 6.1 5.2 6.0 6.9 351.3 6.0 23.0 5.2 15.2 12 184.9 11.1 7.9 8.3 23.3 31.9 6.9 4.1 7.7 7.7 291.8 7.2 56.2 2.1 5.5 16 153.4 13.3 6.9 9.6 22.3 27.5 8.0 3.6 8.8 7.7 254.0 8.2 52.3 2.3 4.9 Test UniProt UOBM Threads AX REW AX REW AX REW AX REW sec spd sec spd sec spd sec spd 1 370.6 1.0 143.4 1.0 2.6 2696.7 1.0 1152.7 1.0 2.3 2 232.3 1.6 86.7 1.7 2.7 1524.6 1.8 599.6 1.9 2.5 4 129.2 2.9 46.5 3.1 2.8 813.3 3.3 318.3 3.6 2.6 8 74.7 5.0 25.1 5.7 3.0 439.9 6.1 177.7 6.5 2.5 12 61.0 6.1 19.9 7.2 3.1 348.9 7.7 152.7 7.6 2.3 16 61.9 6.0 17.1 8.4 3.6 314.4 8.6 137.9 8.4 2.3 mode takes less than ten seconds, these results are difficult o measure and are susceptible to skew. Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the eduction in the number of triples is negligible. This is due to he reduction in the number of derivations, mainly involving ules (⇡1)–(⇡5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than connected by the :hasSameHomeTownWith property. Thi property is also symmetric and transitive so, for each pai of connected resources, the number of times each triple i derived by the transitivity rule is quadratic in the number o connected resources. This leads to a large number of dupli cate derivations that do not involve equality. Thus, althoug it is helpful, rewriting does not reduce the number of deriva tion in the same way as, for example, on Claros, which ex plains the relatively modest speedup of REW over AX. Speedup is bigger than the reduction in the number of triples The number of derivations is the determining factor Contrary to popular belief! Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 9/13
  • 28. Incremental Materialisation Maintenance TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 9/13
  • 29. Incremental Materialisation Maintenance THE DRED ALGORITHM AT A GLANCE Delete/Rederive (DRed): state of the art incremental maintenance algorithm EXAMPLE C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x) A(a) B(a) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
  • 30. Incremental Materialisation Maintenance THE DRED ALGORITHM AT A GLANCE Delete/Rederive (DRed): state of the art incremental maintenance algorithm EXAMPLE C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x) A(a) B(a) C0(a) C1(a) . . . Cn(a) Materialise initial facts Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
  • 31. Incremental Materialisation Maintenance THE DRED ALGORITHM AT A GLANCE Delete/Rederive (DRed): state of the art incremental maintenance algorithm EXAMPLE C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x) A(a) B(a) C0(a) C1(a) . . . Cn(a) Materialise initial facts Delete A(a) using DRed: Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
  • 32. Incremental Materialisation Maintenance THE DRED ALGORITHM AT A GLANCE Delete/Rederive (DRed): state of the art incremental maintenance algorithm EXAMPLE C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x) A(a) B(a) C0(a) C1(a) . . . Cn(a) Materialise initial facts Delete A(a) using DRed: 1 Delete all facts with a derivation from A(a) C0(x)D ← A(x)D C0(x)D ← B(x)D Ci (x)D ← Ci−1(x)D for 1 ≤ i ≤ n C0(x)D ← Cn(x)D Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
  • 33. Incremental Materialisation Maintenance THE DRED ALGORITHM AT A GLANCE Delete/Rederive (DRed): state of the art incremental maintenance algorithm EXAMPLE C0(x) ← A(x) C0(x) ← B(x) Ci (x) ← Ci−1(x) for 1 ≤ i ≤ n C0(x) ← Cn(x) A(a) B(a) C0(a) C1(a) . . . Cn(a) Materialise initial facts Delete A(a) using DRed: 1 Delete all facts with a derivation from A(a) C0(x)D ← A(x)D C0(x)D ← B(x)D Ci (x)D ← Ci−1(x)D for 1 ≤ i ≤ n C0(x)D ← Cn(x)D 2 Rederive facts that have an alternative derivation C0(x) ← C0(x)D ∧ A(x) C0(x) ← C0(x)D ∧ B(x) Ci (x) ← Ci (x)D ∧ Ci−1(x) for 1 ≤ i ≤ n C0(x) ← C0(x)D ∧ Cn(x) Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 10/13
  • 34. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) B(a) C0(a) C1(a) . . . Cn(a) B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 35. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) B(a) C0(a) C1(a) . . . Cn(a) Delete A(a) using B/F: B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 36. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) ? B(a) C0(a) C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 37. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) C0(a) C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 38. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) C0(a) ? C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete 3 As in DRed, identify C0(a) as derivable from A(a) B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 39. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) ? C0(a) ? C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete 3 As in DRed, identify C0(a) as derivable from A(a) 4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a) B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 40. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) C0(a) ? C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete 3 As in DRed, identify C0(a) as derivable from A(a) 4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a) 5 B(a) is explicit so it is derivable B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 41. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) C0(a) C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete 3 As in DRed, identify C0(a) as derivable from A(a) 4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a) 5 B(a) is explicit so it is derivable 6 So C0(a) is derivable too B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 42. Incremental Materialisation Maintenance IMPROVEMENT: THE B/F ALGORITHM In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step The Backward/Forward (B/F) algorithm: look for alternatives immediately A(a) × B(a) C0(a) C1(a) . . . Cn(a) Delete A(a) using B/F: 1 Is A(a) derivable in any other way? 2 No ⇒ delete 3 As in DRed, identify C0(a) as derivable from A(a) 4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a) 5 B(a) is explicit so it is derivable 6 So C0(a) is derivable too 7 Stop propagation and terminate B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015 Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 11/13
  • 43. Incremental Materialisation Maintenance EVALUATION OF THE B/F ALGORITHM Table 2: Experimental results Dataset |E | |I I0 | Rematerialise DRed B/F Time Derivations Time Derivations Time Derivations (s) Fwd (s) |D| DR2 DR4 DR5 (s) |C| Bwd Sat Del Prop LUBM-1k-L 100 113 139.4 212.5M 0.0 1.0k 1.1k 0.8k 1.0k 0.0 0.5k 0.2k 0.3k 0.2k |E| = 133.6M 5.0k 5.5k 101.8 212.5M 0.2 55.5k 67.2k 46.9k 59.8k 0.2 23.0k 9.3k 13.7k 7.4k |I| = 182.4M 2.5M 2.7M 138.5 208.8M 39.4 10.3M 15.2M 6.6M 11.5M 32.8 10.0M 4.1M 5.6M 3.7M Mt = 121.5s 5.0M 5.5M 91.8 205.0M 54.8 17.8M 26.3M 10.5M 18.9M 62.3 18.8M 7.8M 10.1M 7.5M Md = 212.5M 7.5M 8.3M 89.2 201.3M 71.5 24.3M 35.5M 13.6M 24.3M 85.4 26.7M 11.0M 14.0M 11.2M 10.0M 11.0M 99.5 197.5M 127.9 30.0M 43.1M 15.9M 28.1M 102.2 34.1M 14.0M 17.4M 15.0M UOBM-1k-Uo 100 160 3482.0 3.6G 8797.6 1.8G 2.6G 53.2M 2.6G 5.4 0.8k 0.5k 1.3k 0.5k |E| = 254.8M 5.0k 85.2k 3417.8 3.6G 9539.3 1.8G 2.6G 53.2M 2.6G 28.2 105.9k 17.9k 42.1k 104.1k |I| = 2.2G 17.0M 130.9M 3903.1 3.4G 8934.3 1.8G 2.7G 63.7M 2.5G 988.8 175.8M 47.6M 104.0M 196.7M Mt = 5034.0s 34.0M 269.0M 4084.1 3.2G 9492.5 1.9G 2.8G 68.4M 2.4G 1877.2 340.7M 87.5M 182.3M 401.1M Md = 3.6G 51.0M 422.8M 4010.0 3.0G 10659.3 1.9G 2.9G 71.5M 2.2G 2772.7 513.7M 125.2M 246.8M 622.0M 68.0M 581.4M 3981.9 2.8G 11351.6 1.9G 2.9G 73.3M 2.1G 3737.3 687.0M 162.5M 289.5M 848.6M Claros-L 100 212 62.9 128.6M 0.0 0.8k 1.0k 0.2k 0.5k 0.0 0.6k 0.3k 0.7k 0.5k |E| = 18.8M 5.0k 11.3k 62.8 128.6M 0.4 37.8k 50.7k 10.9k 23.9k 0.4 29.1k 18.8k 35.3k 26.8k |I| = 74.2M 0.6M 1.3M 62.3 125.6M 32.3 4.1M 5.5M 1.1M 2.5M 14.9 3.1M 2.0M 3.6M 3.0M Mt = 78.9s 1.2M 2.6M 61.2 122.6M 53.2 7.8M 10.8M 2.0M 4.8M 33.6 6.1M 3.8M 6.7M 6.0M Md = 128.6M 1.7M 4.0M 60.5 119.5M 73.6 11.4M 15.9M 2.8M 6.8M 47.8 8.9M 5.6M 9.5M 9.1M 2.3M 5.5M 60.0 116.3M 91.0 14.8M 20.9M 3.6M 8.6M 60.6 11.7M 7.3M 12.0M 12.3M Claros-LE 100 0.5k 3992.8 12.6G 0.0 1.3k 2.0k 0.3k 0.9k 0.0 1.0k 0.7k 1.0k 1.1k |E| = 18.8M 2.5k 178.9k 5235.1 12.6G 8077.4 5.5M 11.7G 176.6k 11.7G 10.3 216.4k 161.2k 8.8M 320.0k |I| = 533.7M 5.0k 427.5k 4985.1 12.6G 7628.2 6.0M 11.7G 186.0k 11.7G 16.5 485.6k 369.0k 8.9M 769.3k Mt = 4024.5s 7.5k 609.6k 4855.0 12.6G 7419.1 6.5M 11.7G 193.9k 11.7G 19.5 683.4k 516.8k 9.0M 1.1M Md = 12.9G 10.0k 780.8k 5621.3 12.6G 7557.9 6.8M 11.7G 207.6k 11.7G 3907.2 6.0M 723.0M 11.7G 16.9M Test Datasets Table 2 summarises the properties of the four datasets we used in our tests. LUBM (Guo, Pan, and Heflin 2005) is a well-known RDF benchmark. We extracted the datalog fragment of the LUBM a congruence relation: derivations with owl:sameAs tend to proliferate so an efficient incremental algorithm would have to treat this property directly; we leave developing such an extension to our future work. Please note that omitting the Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 12/13
  • 44. Conclusion TABLE OF CONTENTS 1 INTRODUCTION 2 PARALLEL MATERIALISATION OF DATALOG 3 HANDLING owl:sameAs VIA REWRITING 4 INCREMENTAL MATERIALISATION MAINTENANCE 5 CONCLUSION Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 12/13
  • 45. Conclusion RESEARCH DIRECTIONS Add a data/query/reasoning distribution layer: Initial results very promising Implementation in progress Future work: Investigate potential for data compression Improve join cardinality estimation Improve query planning Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 13/13