OWL reasoning with WebPIE: calculating the closure of 100 billion triples
1. +
OWL reasoning with WebPIE:
calculating the closure of 100 billion
triples
Presented by:
Mahdi Atawna
2. +
Outline
Introduction.
Paper motivation.
Methodology.
MapReduce.
WebPIE and OWL challenges.
Experiment.
Results and conclusion.
Criticism.
3. +
About the paper
Authored by: Jacopo Urbani, Spyros Kotoulas, Jason Maassen,
Frank van Harmelen, and Henri Bal from Vrije Universiteit
Amsterdam.
An extension of a previously published paper, "Scalable
Distributed Reasoning using MapReduce" (2009), which
focused on handling the reasoning of RDFS data only.
This paper was published in 2010 to extend the approach
introduced in the earlier paper to handle the complexity
of OWL semantics.
4. +
Definitions
Semantic reasoner: a piece of software able to infer logical
consequences from a set of asserted facts or axioms.
MapReduce: a programming model that allows for massive
scalability across a large number of servers in a cluster.
5. +
Paper motivation
Most previous reasoning methods share a common problem:
they are all:
Centralized.
Dependent on improving the hardware and data structures
of a single computer for more performance, which reaches
its limits quickly on large data-sets.
6. +
Research problem
Develop a method to handle large-scale data.
The new method will use a scalable distributed approach
which performs the processing in parallel. With this
approach, performance can be scaled in two dimensions:
first by the hardware of each node, and second by the
number of nodes.
8. +
Methodology
The researchers present a new method to handle large scale
data by using scalable distributed approach which performs the
processing in parallel.
by using this approach the performance can be scaled in two
dimensions ,
hardware of each node ,
number of nodes .
To achieve this approach , they used MapReduce .
MapReduce : programming model that allows for massive
scalability across large number of servers in cluster .
9. +
MapReduce!
The term MapReduce refers to two separate tasks:
Map: takes a large set of data and breaks it down into
tuples (key/value pairs).
Reduce: performed after Map; takes the Map output as
input and reduces it into a smaller set of tuples by
combining tuples that share a key.
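The two phases above can be sketched with the classic word-count example. This is a minimal single-machine illustration of the model, not Hadoop itself; on a real cluster the same two phases run distributed, with the framework shuffling pairs so that all values for one key reach the same reducer.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: break the input down into (key, value) pairs."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group the pairs by key and combine the values of each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["owl owl rdf", "rdf triples"]
counts = reduce_phase(map_phase(docs))
print(counts)  # {'owl': 2, 'rdf': 2, 'triples': 1}
```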
14. +
The previous paper
focused on RDFS:
the closure of the RDF input can be computed, reaching a
fixpoint, by applying all rules repeatedly until no new data
is derived.
This is easy to implement for single-antecedent rules,
but for multi-antecedent rules the implementation is challenging
because it needs to perform a join between the related triples.
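A minimal single-machine sketch of this fixpoint computation, using the RDFS subClassOf transitivity rule (rdfs11) as the multi-antecedent example. The nested loop stands in for the join between related triples that is hard to distribute; the property name and class names are illustrative only.

```python
def closure(triples):
    """Apply the rules repeatedly until a fixpoint: no new triples derived."""
    triples = set(triples)
    while True:
        derived = set()
        # rdfs11: (a subClassOf b) and (b subClassOf c) => (a subClassOf c).
        # The two antecedents require a join on the shared term (o1 == s2).
        for (s1, p1, o1) in triples:
            for (s2, p2, o2) in triples:
                if p1 == p2 == "rdfs:subClassOf" and o1 == s2:
                    derived.add((s1, "rdfs:subClassOf", o2))
        if derived <= triples:
            return triples  # fixpoint reached: nothing new was derived
        triples |= derived

facts = {("Dog", "rdfs:subClassOf", "Mammal"),
         ("Mammal", "rdfs:subClassOf", "Animal")}
inferred = closure(facts)
# the derived triple (Dog, rdfs:subClassOf, Animal) is now included
```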
16. +
WebPIE reasoning engine
In this paper the researchers extend their previous work to
support OWL and
introduce a new massively scalable OWL reasoning engine
called "WebPIE", which deals with the complex OWL entailment
rules.
17. +
OWL challenges
OWL poses many challenges that WebPIE must overcome, such as:
1. No rule ordering.
2. Joins between multiple instance triples.
3. Duplicate derivations.
4. Multiple joins per rule.
18. +
OWL Horst fragment
The authors chose to work on the Horst fragment of the OWL
rule-set because:
it is the standard used in industry,
it can be expressed as a rule set,
it strikes a balance between OWL Full and the limited RDFS.
The OWL Horst rule-set (known as pD*) consists of two parts:
1. the RDFS rules (defined as D),
2. 16 additional rules (defined as p).
20. +
OWL Horst fragment
The researchers explored the p rule-set and noticed that:
Some rules can be implemented using the optimizations
introduced for RDFS reasoning.
Furthermore, they found that rules 1 and 2 are straightforward
to implement by partitioning on subject and predicate.
All other rules need a custom algorithm; these rules are:
transitivity, sameAs, someValuesFrom, and
allValuesFrom.
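To see why transitivity needs a custom treatment, here is an illustrative map/reduce sketch of a single join step for a transitive property (this is not WebPIE's actual optimized algorithm, and the property name is made up). The map emits each triple under two keys, its subject and its object, so that triples sharing the join term meet in the same reduce group; one such step derives one extra "hop", and the full closure requires iterating the step.

```python
from collections import defaultdict

def map_join(triples, prop):
    """Map: key each triple on both its object and its subject,
    so joinable triples land in the same reduce group."""
    for (s, p, o) in triples:
        if p == prop:
            yield (o, ("left", s))   # this triple can be the left side of a join
            yield (s, ("right", o))  # ... or the right side

def reduce_join(pairs, prop):
    """Reduce: within each group, join the left and right sides."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    derived = set()
    for _, values in groups.items():
        lefts = [v for side, v in values if side == "left"]
        rights = [v for side, v in values if side == "right"]
        # (s prop key) and (key prop o) => (s prop o)
        for s in lefts:
            for o in rights:
                if s != o:
                    derived.add((s, prop, o))
    return derived

triples = {("a", "ancestorOf", "b"), ("b", "ancestorOf", "c")}
derived = reduce_join(map_join(triples, "ancestorOf"), "ancestorOf")
# one step derives ("a", "ancestorOf", "c")
```

Note that a single pass only joins pairs of triples; chains longer than two hops need repeated passes, which is one reason duplicate derivations and rule ordering become serious problems at scale.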
23. +
Experiment
They ran their experiments on a Hadoop cluster of
46 nodes, each equipped with a
dual-core 2.4 GHz CPU, 4 GB RAM, a 250 GB hard disk, and a
Gigabit Ethernet interconnect.
They used three data-sets:
1. UniProt (1.51 billion triples),
2. LDSR (0.9 billion triples),
3. LUBM (up to 100 billion triples).
24. +
Results
The experimental results on the data-sets were as follows:
UniProt processing took 6.1 hours,
LDSR processing took 3.52 hours.
These results show that this implementation outperforms other
current systems in the same field.
25. +
Conclusions
After running the experiments, the researchers observed that:
throughput is higher (almost 0.30) for larger data-sets,
the execution time depends on the complexity of the input,
scalability is linear with respect to input size and number of nodes.
26. +
Criticism
LUMB rule-set result not found in result.
the method that they introduced can be easily implemented
using MapReduce on hadoop to paralyze the processing,.
but it will be expensive.