RDFox: Parallel and Incremental Materialisation of RDF and Datalog
1. PARALLEL AND INCREMENTAL MATERIALISATION OF
RDF/DATALOG IN RDFOX
Boris Motik
University of Oxford
March 20, 2015
2. TABLE OF CONTENTS
1 INTRODUCTION
2 PARALLEL MATERIALISATION OF DATALOG
3 HANDLING owl:sameAs VIA REWRITING
4 INCREMENTAL MATERIALISATION MAINTENANCE
5 CONCLUSION
Boris Motik Parallel and Incremental Materialisation of RDF/Datalog in RDFox 0/13
4. Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
Features:
RAM-based storage of RDF data
Currently centralised, but a distributed system is in the works
Datalog reasoning via materialisation
Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL
Very effective parallelisation
Efficient reasoning with owl:sameAs via rewriting
Known and widely-used technique, but correctness not trivial
The Backward-Forward (B/F) incremental maintenance algorithm
Considerably improves on DRed
Compatible with rewriting of owl:sameAs
SPARQL query answering
Most of SPARQL 1.0 and some of SPARQL 1.1
6. Parallel Materialisation of Datalog
MAIN CHALLENGES
1 Assign workload to threads evenly
Rules are generally not independent due to recursion
Static assignment of rule instances is vulnerable to data skew
⇒ Dynamic assignment with low overhead needed
2 Efficiently interleave . . .
. . . querying (during evaluation of rule bodies)
. . . updates (during updates of derived facts)
3 Provide indexes for efficient rule body evaluation
Crucial for elimination of duplicate triples ⇒ ensures termination
Usually sorted (and clustered) to allow for merge joins
Hash indexes can also be used
Individual (i.e., not bulk) index updates are inefficient
B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in
Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137
7. Parallel Materialisation of Datalog
SOLUTION PART I: ALGORITHM
R(a,b)
R(a,c)
R(b,d)
R(b,e)
A(a)
R(c,f)
R(c,g)
A(x) ∧ R(x, y) → A(y)
For each fact:
match the fact to all body atoms to obtain subqueries
evaluate subqueries w.r.t. all previous facts
add results to the table
Trace, processing the facts in table order:
Step  Current fact  Current subquery  Facts added
1     R(a,b)        A(a)              none
2     R(a,c)        A(a)              none
3     R(b,d)        A(b)              none
4     R(b,e)        A(b)              none
5     A(a)          R(a,y)            A(b), A(c)
6     R(c,f)        A(c)              none
7     R(c,g)        A(c)              none
8     A(b)          R(b,y)            A(d), A(e)
9     A(c)          R(c,y)            A(f), A(g)
10    A(d)          R(d,y)            none
11    A(e)          R(e,y)            none
12    A(f)          R(f,y)            none
13    A(g)          R(g,y)            none
A subquery sees only the facts that precede the current fact, so steps 6 and 7 add nothing even though A(c) is already in the table; the matching rule instances are found at step 9 instead.
Final table: R(a,b), R(a,c), R(b,d), R(b,e), A(a), R(c,f), R(c,g), A(b), A(c), A(d), A(e), A(f), A(g)
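The fact-at-a-time procedure can be sketched in a few lines of Python. This is a minimal, single-threaded illustration hard-wired to the rule A(x) ∧ R(x, y) → A(y); the function name `materialise` and the tuple encoding of facts are assumptions made for the example, not RDFox code.

```python
def materialise(facts):
    """Fact-at-a-time evaluation of the rule A(x) ∧ R(x, y) → A(y)."""
    table = list(facts)   # ordered table of facts; derived facts are appended
    seen = set(table)     # duplicate elimination, which ensures termination
    i = 0
    while i < len(table):
        fact = table[i]
        previous = table[:i]              # only facts before the current one
        derived = []
        if fact[0] == "R":                # current fact matches R(x, y)
            _, x, y = fact
            if ("A", x) in previous:      # subquery A(x)
                derived.append(("A", y))
        else:                             # current fact matches A(x)
            _, x = fact
            for other in previous:        # subquery R(x, y)
                if other[0] == "R" and other[1] == x:
                    derived.append(("A", other[2]))
        for new_fact in derived:
            if new_fact not in seen:
                seen.add(new_fact)
                table.append(new_fact)
        i += 1
    return table


initial = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
           ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
result = materialise(initial)
```

On the seven example facts this appends A(b), A(c), A(d), A(e), A(f) and A(g) in that order. Because each fact is joined only with earlier facts, every rule instance is considered once, by whichever of its two body facts comes later in the table.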
21. Parallel Materialisation of Datalog
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of facts,
which in practice keeps threads equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
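A rough sketch of the dynamic assignment: each worker thread claims the next unprocessed fact and evaluates its subqueries against the facts that precede it. Unlike RDFox, this version guards the shared table with one coarse lock (`threading.Condition`) to stay short and correct, so it illustrates only how work is handed out, not the lock-free data structures or any real speedup. All names are hypothetical.

```python
import threading


def materialise_parallel(facts, num_threads=4):
    """Workers claim facts one at a time; rule: A(x) ∧ R(x, y) → A(y)."""
    table = list(facts)            # shared, append-only table of facts
    seen = set(table)              # shared duplicate filter
    cond = threading.Condition()   # coarse lock (an assumption of this sketch)
    state = {"next": 0, "busy": 0}

    def derive(fact, previous):
        if fact[0] == "R":         # subquery A(x) for the fact R(x, y)
            return [("A", fact[2])] if ("A", fact[1]) in previous else []
        # subquery R(x, y) for the fact A(x)
        return [("A", r[2]) for r in previous if r[0] == "R" and r[1] == fact[1]]

    def worker():
        while True:
            with cond:
                # Wait until a fact is available or all work is finished.
                while state["next"] >= len(table):
                    if state["busy"] == 0:
                        cond.notify_all()
                        return
                    cond.wait()
                i = state["next"]
                state["next"] += 1
                state["busy"] += 1
                fact = table[i]
                previous = table[:i]   # snapshot of the facts before fact i
            derived = derive(fact, previous)   # evaluated outside the lock
            with cond:
                for new_fact in derived:
                    if new_fact not in seen:
                        seen.add(new_fact)
                        table.append(new_fact)
                state["busy"] -= 1
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table


initial = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
           ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
result = materialise_parallel(initial)
```

The order in which derived facts are appended can vary between runs, but the resulting set of facts does not: the prefix before any claimed position never changes, so each pair of body facts is joined exactly once.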
22. Parallel Materialisation of Datalog
SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The algorithm critically depends on:
matching atoms t1, t2, t3 with ti a constant or variable
continuous concurrent updates
Our RDF storage data structure:
Hash-based indexes ⇒ naturally parallel data structure
‘Mostly’ lock-free: at least one thread makes progress most of the time
compare: if a thread acquires a lock and dies, other threads are blocked
main benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
When A writes to a location cached by B, the cache of B is invalidated
Our updates ensure that threads (typically) write to different locations
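To make the matching requirement concrete, here is a small hash-based index that answers triple patterns (t1, t2, t3) in which each position is a constant or a variable. It is a naive, single-threaded sketch: it stores one hash entry per combination of bound positions, which is not how RDFox lays out its data, and it makes no attempt at lock-freedom. The class name `TripleIndex` and the use of `None` for variables are assumptions of the example.

```python
from collections import defaultdict
from itertools import product


class TripleIndex:
    """Hash indexes for triple patterns; None marks a variable position."""

    def __init__(self):
        self.triples = set()             # duplicate elimination
        self.index = defaultdict(list)   # pattern key -> matching triples

    def _keys(self, triple):
        # One key per choice of bound and unbound positions (8 in total).
        for mask in product((True, False), repeat=3):
            yield tuple(t if bound else None for t, bound in zip(triple, mask))

    def add(self, triple):
        """Insert a triple; return False if it was already present."""
        if triple in self.triples:
            return False
        self.triples.add(triple)
        for key in self._keys(triple):
            self.index[key].append(triple)
        return True

    def match(self, pattern):
        """Return all stored triples that agree with the bound positions."""
        return list(self.index.get(pattern, ()))


idx = TripleIndex()
idx.add((":a", ":R", ":b"))
idx.add((":a", ":R", ":c"))
idx.add((":b", ":R", ":d"))
duplicate_added = idx.add((":a", ":R", ":b"))
from_a = idx.match((":a", ":R", None))
into_d = idx.match((None, None, ":d"))
```

Every lookup is a single hash probe, and `add` doubles as the duplicate check that materialisation relies on for termination.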
23. Parallel Materialisation of Datalog
EVALUATION I: PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
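[Figure: two line plots of speedup against number of threads. Axis ticks: 8, 16, 24, 32 (horizontal) and 2 to 20 in steps of 2 (vertical). Left panel series: ClarosL, ClarosLE, DBpediaL, DBpediaLE, LUBML 01K, LUBMU 01K. Right panel series: UOBML 01K, UOBMU 010, LUBMLE 01K, LUBML 05K, LUBMLE 05K, LUBMU 05K.]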
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
⇒ hyperthreading and parallelism can compensate for CPU cache misses
24. Parallel Materialisation of Datalog
EVALUATION II: ORACLE’S SPARC T5
Machine specification:
4 TB of RAM
128 physical cores that support 1024 virtual cores via hyperthreading
Name        Triples                Time (s) by threads   Speedup   Inference rate
            Initial    Resulting   1        1024                   (triples/s)
ClarosLE    18.8 M     533.7 M     7484     74           101       7.2 M
LUBML-1K    133.6 M    182.4 M     511      10           51        4.9 M
LUBMLE-1K   133.6 M    332.6 M     5267     37           142       5.4 M
LUBML-140k: 8 G triples, materialised to 10.9 G triples
20 threads: 2000 s, inference rate 1.45 M triples/s
128 threads: 599 s, inference rate 4.84 M triples/s
Materialised dataset used about half of RAM (2 TB)
Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle
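The LUBML-140k inference rates can be reproduced with simple arithmetic, assuming that "inference rate" means newly derived triples divided by wall-clock time. That definition is an assumption, not something the slide states; it matches the LUBML-140k figures.

```python
def inference_rate(initial, resulting, seconds):
    """Newly derived triples per second (assumed definition)."""
    return (resulting - initial) / seconds


# LUBML-140k: 8 G triples materialised to 10.9 G triples.
rate_20 = inference_rate(8e9, 10.9e9, 2000)    # 20 threads
rate_128 = inference_rate(8e9, 10.9e9, 599)    # 128 threads

print(f"{rate_20 / 1e6:.2f} M triples/s")    # 1.45 M triples/s
print(f"{rate_128 / 1e6:.2f} M triples/s")   # 4.84 M triples/s
```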
26. Handling owl:sameAs via Rewriting
HANDLING owl:sameAs VIA REWRITING
Rewriting: replace all equal constants with one representative
Well-known approach; used in GraphDB, Oracle, WebPIE, . . .
Much more efficient than direct materialisation
Open question I: Effective parallelisation
Lock-free maintenance of representatives
Care needed to ensure correctness and nonrepetition of derivations
Open question II: Query evaluation
Could expand rewritten data before query evaluation, but that is inefficient
Better: evaluate queries on rewritten data and expand the answer
Such a straightforward approach is incorrect:
Result cardinalities might be wrong
The presence of FILTER and BIND can make results incorrect
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015
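The basic idea of rewriting can be illustrated with a union-find structure that maps every constant to a representative. The sketch below is sequential and deliberately simple: it does not attempt lock-free maintenance, and it does not address the cardinality or FILTER/BIND issues listed above. The class and function names, and the example constants, are hypothetical.

```python
class Representatives:
    """Union-find over constants; find(c) returns the representative of c."""

    def __init__(self):
        self.parent = {}

    def find(self, c):
        self.parent.setdefault(c, c)
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]   # path halving
            c = self.parent[c]
        return c

    def merge(self, c1, c2):
        r1, r2 = self.find(c1), self.find(c2)
        if r1 != r2:
            # Arbitrary choice: the smaller name becomes the representative.
            self.parent[max(r1, r2)] = min(r1, r2)


def rewrite(triples, same_as_pairs):
    """Replace every constant by its representative."""
    reps = Representatives()
    for c1, c2 in same_as_pairs:
        reps.merge(c1, c2)
    rewritten = {tuple(reps.find(t) for t in triple) for triple in triples}
    return rewritten, reps


def expand(constant, reps):
    """All known constants that are equal to the given one."""
    root = reps.find(constant)
    return sorted(c for c in reps.parent if reps.find(c) == root)


triples = {(":USA", ":hasCapital", ":Washington"), (":US", ":memberOf", ":NATO")}
same_as = [(":USA", ":US"), (":US", ":America")]
rewritten, reps = rewrite(triples, same_as)
equal_to_us = expand(":US", reps)
```

Here the three equal constants collapse to the single representative `:America`, so rules and queries run over two rewritten triples, and `expand` recovers the full equivalence class when answers are returned.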
27. Handling owl:sameAs via Rewriting
EVALUATION OF REWRITING
UOBM 279 4 2.2M
AX 36M 1.2 332M 16,152M
REW 9.4M 9.7M 0.4 33.8M 4,256M 686
factor 3.2x 3.2x 9.9x 3.8x
Table 3: Materialisation Times with Axiomatisation and Rewriting
(sec = time in seconds, spd = speedup over one thread, AX/REW = ratio of the AX time to the REW time)
Test     Claros                              DBpedia                             OpenCyc
         AX            REW           AX/REW  AX            REW           AX/REW  AX            REW           AX/REW
Threads  sec     spd   sec     spd           sec     spd   sec     spd           sec     spd   sec     spd
1        2042.9  1.0   65.8    1.0   31.1    219.8   1.0   31.7    1.0   6.9     2093.7  1.0   119.9   1.0   17.5
2        969.7   2.1   35.2    1.9   27.6    114.6   1.9   17.6    1.8   6.5     1326.5  1.6   78.3    1.5   16.9
4        462.0   4.4   18.1    3.6   25.5    66.3    3.3   10.7    3.0   6.2     692.6   3.0   40.5    3.0   17.1
8        237.2   8.6   9.9     6.7   24.1    36.1    6.1   5.2     6.0   6.9     351.3   6.0   23.0    5.2   15.2
12       184.9   11.1  7.9     8.3   23.3    31.9    6.9   4.1     7.7   7.7     291.8   7.2   56.2    2.1   5.5
16       153.4   13.3  6.9     9.6   22.3    27.5    8.0   3.6     8.8   7.7     254.0   8.2   52.3    2.3   4.9

Test     UniProt                             UOBM
         AX            REW           AX/REW  AX            REW           AX/REW
Threads  sec     spd   sec     spd           sec     spd   sec     spd
1        370.6   1.0   143.4   1.0   2.6     2696.7  1.0   1152.7  1.0   2.3
2        232.3   1.6   86.7    1.7   2.7     1524.6  1.8   599.6   1.9   2.5
4        129.2   2.9   46.5    3.1   2.8     813.3   3.3   318.3   3.6   2.6
8        74.7    5.0   25.1    5.7   3.0     439.9   6.1   177.7   6.5   2.5
12       61.0    6.1   19.9    7.2   3.1     348.9   7.7   152.7   7.6   2.3
16       61.9    6.0   17.1    8.4   3.6     314.4   8.6   137.9   8.4   2.3
. . . mode takes less than ten seconds, these results are difficult to measure and are susceptible to skew.
Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the reduction in the number of triples is negligible. This is due to the reduction in the number of derivations, mainly involving rules (≈1)–(≈5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than . . .
. . . connected by the :hasSameHomeTownWith property. This property is also symmetric and transitive so, for each pair of connected resources, the number of times each triple is derived by the transitivity rule is quadratic in the number of connected resources. This leads to a large number of duplicate derivations that do not involve equality. Thus, although it is helpful, rewriting does not reduce the number of derivations in the same way as, for example, on Claros, which explains the relatively modest speedup of REW over AX.
Speedup is bigger than the reduction in the number of triples
The number of derivations is the determining factor
Contrary to popular belief!
33. Incremental Materialisation Maintenance
THE DRED ALGORITHM AT A GLANCE
Delete/Rederive (DRed): state-of-the-art incremental maintenance algorithm
EXAMPLE
C0(x) ← A(x)
C0(x) ← B(x)
Ci(x) ← Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← Cn(x)
A(a)
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Materialise initial facts
Delete A(a) using DRed:
1 Delete all facts with a derivation from A(a)
C0^D(x) ← A^D(x)
C0^D(x) ← B^D(x)
Ci^D(x) ← Ci−1^D(x) for 1 ≤ i ≤ n
C0^D(x) ← Cn^D(x)
2 Rederive facts that have an alternative derivation
C0(x) ← C0^D(x) ∧ A(x)
C0(x) ← C0^D(x) ∧ B(x)
Ci(x) ← Ci^D(x) ∧ Ci−1(x) for 1 ≤ i ≤ n
C0(x) ← C0^D(x) ∧ Cn(x)
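The two DRed steps can be traced on this example with a short Python sketch. Since the example has a single constant a and every rule has one body atom, facts are encoded simply as predicate names and rules as (head, body) pairs. The function names are hypothetical, n = 3 is an arbitrary choice, and the rederivation loop folds DRed's rederivation and subsequent propagation into one fixpoint for brevity.

```python
def make_rules(n):
    """Rules of the example as (head, body) pairs."""
    rules = [("C0", "A"), ("C0", "B"), ("C0", f"C{n}")]
    rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
    return rules


def materialise(explicit, rules):
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body in facts and head not in facts:
                facts.add(head)
                changed = True
    return facts


def dred_delete(explicit, facts, rules, to_delete):
    # Step 1: overdelete every fact with some derivation from a deleted fact.
    deleted = set(to_delete)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body in deleted and head in facts and head not in deleted:
                deleted.add(head)
                changed = True
    # Facts untouched by overdeletion, plus the surviving explicit facts.
    kept = (facts - deleted) | (set(explicit) - set(to_delete))
    # Step 2: rederive overdeleted facts that still have a derivation.
    rederived = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head in deleted and head not in kept and body in kept:
                kept.add(head)
                rederived.add(head)
                changed = True
    return kept, deleted, rederived


n = 3
rules = make_rules(n)
explicit = {"A", "B"}
facts = materialise(explicit, rules)
kept, deleted, rederived = dred_delete(explicit, facts, rules, {"A"})
```

Deleting A(a) overdeletes A(a) and all of C0(a), . . ., Cn(a), and then rederives all n + 1 of the C facts from B(a), so the work grows with n even though only A(a) actually disappears.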
42. Incremental Materialisation Maintenance
IMPROVEMENT: THE B/F ALGORITHM
In RDF, a fact often has many alternative derivations
⇒ Many facts get deleted in the first step
The Backward/Forward (B/F) algorithm: look for alternatives immediately
A(a) ×
B(a)
C0(a)
C1(a)
. . .
Cn(a)
Delete A(a) using B/F:
1 Is A(a) derivable in any other way?
2 No ⇒ delete
3 As in DRed, identify C0(a) as derivable from A(a)
4 Apply the rules to C0(a) ‘backwards’ ⇒ by C0(x) ← B(x), we get B(a)
5 B(a) is explicit so it is derivable
6 So C0(a) is derivable too
7 Stop propagation and terminate
B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the
Backward/Forward Algorithm. AAAI 2015
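The steps above can be mimicked with a small Python sketch on the same example. It is a heavy simplification of B/F under stated assumptions: facts are predicate names over the single constant a, every rule has one body atom, the backward search restarts from scratch for each fact, and n = 3 is arbitrary. The function names are hypothetical.

```python
from collections import deque


def provable(fact, explicit, rules, path=frozenset()):
    """Backward search: can `fact` be derived from the explicit facts?"""
    if fact in explicit:
        return True
    if fact in path:                      # do not loop through rule cycles
        return False
    return any(provable(body, explicit, rules, path | {fact})
               for head, body in rules if head == fact)


def bf_delete(explicit, facts, rules, to_delete):
    explicit = set(explicit) - set(to_delete)
    facts = set(facts)
    queue = deque(to_delete)
    checked = []                          # facts examined, in order
    while queue:
        fact = queue.popleft()
        if fact not in facts:
            continue
        checked.append(fact)
        if provable(fact, explicit, rules):
            continue                      # alternative derivation: stop here
        facts.discard(fact)
        # Forward step: consequences of the deleted fact must be checked too.
        queue.extend(head for head, body in rules if body == fact)
    return facts, checked


n = 3
rules = [("C0", "A"), ("C0", "B"), ("C0", f"C{n}")]
rules += [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
all_facts = {"A", "B"} | {f"C{i}" for i in range(n + 1)}
remaining, checked = bf_delete({"A", "B"}, all_facts, rules, ["A"])
```

Only A(a) and C0(a) are examined: A(a) has no other derivation and is deleted, C0(a) is proved from the explicit fact B(a), and propagation stops, independently of n.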
45. Conclusion
RESEARCH DIRECTIONS
Add a data/query/reasoning distribution layer:
Initial results very promising
Implementation in progress
Future work:
Investigate potential for data compression
Improve join cardinality estimation
Improve query planning