OWL reasoning with WebPIE: calculating the closure of 100 billion triples
1. +
OWL reasoning with WebPIE:
calculating the closure of 100 billion
triples
Presented by:
Mahdi Atawna
2. +
Outline
Introduction.
Paper motivation.
Methodology.
MapReduce.
WebPIE and OWL challenges.
Experiment.
Results and conclusion.
Criticism.
3. +
About the paper
Authored by: Jacopo Urbani, Spyros Kotoulas, Jason Maassen,
Frank van Harmelen, and Henri Bal from Vrije Universiteit
Amsterdam.
An extension of a previously published paper, "Scalable
Distributed Reasoning using MapReduce" (2009), which
focused on handling the reasoning of RDFS data only.
This paper was published in 2010 to extend the approach
introduced in the earlier paper to handle the complexity
of OWL semantics.
4. +
Definitions
Semantic reasoner: a piece of software able to infer logical
consequences from a set of asserted facts or axioms.
MapReduce: a programming model that allows for massive
scalability across a large number of servers in a cluster.
5. +
Paper motivation
Most previous reasoning methods share a common problem:
they are all:
Centralized.
Dependent on improving the hardware and data structures
of a single computer for more performance, which reaches
its limits quickly on large data-sets.
6. +
Research problem
Develop a method to handle large-scale data.
The new method will use a scalable distributed approach
which performs the processing in parallel. With this
approach, performance can be scaled in two dimensions:
first by the hardware of each node, and second by the
number of nodes.
8. +
Methodology
The researchers present a new method to handle large scale
data by using scalable distributed approach which performs the
processing in parallel.
by using this approach the performance can be scaled in two
dimensions ,
hardware of each node ,
number of nodes .
To achieve this approach , they used MapReduce .
MapReduce : programming model that allows for massive
scalability across large number of servers in cluster .
9. +
MapReduce!
The term MapReduce refers to two separate tasks:
Map: takes a large set of data and breaks it down into
tuples (key/value pairs).
Reduce: performed after Map; takes the Map output as
input and reduces it into a smaller set of tuples by
combining tuples that share a key.
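The two phases above can be sketched with the classic word-count example. This is a minimal single-machine illustration of the model, not Hadoop itself; on a real cluster the same two phases run distributed, with the framework shuffling pairs so that all values for one key reach the same reducer.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: break the input down into (key, value) pairs."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group the pairs by key and combine the values of each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

docs = ["owl owl rdf", "rdf triples"]
counts = reduce_phase(map_phase(docs))
print(counts)  # {'owl': 2, 'rdf': 2, 'triples': 1}
```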
14. +
The previous paper
focused on RDFS:
the closure of the RDF input can be computed, reaching a
fixpoint, by applying all rules repeatedly until no new data
is derived.
This is easy to implement for single-antecedent rules,
but for multi-antecedent rules the implementation is challenging
because it needs to perform a join between the related triples.
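A minimal single-machine sketch of this fixpoint computation, using the RDFS subClassOf transitivity rule (rdfs11) as the multi-antecedent example. The nested loop stands in for the join between related triples that is hard to distribute; the property name and class names are illustrative only.

```python
def closure(triples):
    """Apply the rules repeatedly until a fixpoint: no new triples derived."""
    triples = set(triples)
    while True:
        derived = set()
        # rdfs11: (a subClassOf b) and (b subClassOf c) => (a subClassOf c).
        # The two antecedents require a join on the shared term (o1 == s2).
        for (s1, p1, o1) in triples:
            for (s2, p2, o2) in triples:
                if p1 == p2 == "rdfs:subClassOf" and o1 == s2:
                    derived.add((s1, "rdfs:subClassOf", o2))
        if derived <= triples:
            return triples  # fixpoint reached: nothing new was derived
        triples |= derived

facts = {("Dog", "rdfs:subClassOf", "Mammal"),
         ("Mammal", "rdfs:subClassOf", "Animal")}
inferred = closure(facts)
# the derived triple (Dog, rdfs:subClassOf, Animal) is now included
```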
16. +
WebPIE reasoning engine
In this paper the researchers extend their previous work to
support OWL and
introduce a new massively scalable OWL reasoning engine
called "WebPIE", which deals with the complex OWL entailment
rules.
17. +
OWL challenges
OWL poses many challenges that WebPIE must overcome, such as:
1. No rule ordering.
2. Joins between multiple instance triples.
3. Duplicate derivations.
4. Multiple joins per rule.
18. +
OWL Horst fragment
The authors chose to work on the Horst fragment of the OWL
rule-set because:
it is the standard used in industry,
it can be expressed as a rule set,
it strikes a balance between OWL Full and the limited RDFS.
The OWL Horst rule-set (known as pD*) consists of two parts:
1. the RDFS rules (defined as D),
2. 16 additional rules (defined as p).
20. +
OWL Horst fragment
The researchers explored the p rule-set and noticed that:
Some rules can be implemented using the optimizations
introduced for RDFS reasoning.
Furthermore, they found that rules 1 and 2 are straightforward
to implement by partitioning on subject and predicate.
All other rules need a custom algorithm; these rules are:
transitivity, sameAs, someValuesFrom, and
allValuesFrom.
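To see why transitivity needs a custom treatment, here is an illustrative map/reduce sketch of a single join step for a transitive property (this is not WebPIE's actual optimized algorithm, and the property name is made up). The map emits each triple under two keys, its subject and its object, so that triples sharing the join term meet in the same reduce group; one such step derives one extra "hop", and the full closure requires iterating the step.

```python
from collections import defaultdict

def map_join(triples, prop):
    """Map: key each triple on both its object and its subject,
    so joinable triples land in the same reduce group."""
    for (s, p, o) in triples:
        if p == prop:
            yield (o, ("left", s))   # this triple can be the left side of a join
            yield (s, ("right", o))  # ... or the right side

def reduce_join(pairs, prop):
    """Reduce: within each group, join the left and right sides."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    derived = set()
    for _, values in groups.items():
        lefts = [v for side, v in values if side == "left"]
        rights = [v for side, v in values if side == "right"]
        # (s prop key) and (key prop o) => (s prop o)
        for s in lefts:
            for o in rights:
                if s != o:
                    derived.add((s, prop, o))
    return derived

triples = {("a", "ancestorOf", "b"), ("b", "ancestorOf", "c")}
derived = reduce_join(map_join(triples, "ancestorOf"), "ancestorOf")
# one step derives ("a", "ancestorOf", "c")
```

Note that a single pass only joins pairs of triples; chains longer than two hops need repeated passes, which is one reason duplicate derivations and rule ordering become serious problems at scale.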
23. +
Experiment
They ran their experiments on a Hadoop cluster of
46 nodes, each equipped with a
dual-core 2.4 GHz CPU, 4 GB RAM, a 250 GB hard disk, and a
Gigabit Ethernet interconnect.
They used three data-sets:
1. UniProt (1.51 billion triples),
2. LDSR (0.9 billion triples),
3. LUBM (up to 100 billion triples).
24. +
Results
The experimental results on the data-sets were as follows:
UniProt processing took 6.1 hours,
LDSR processing took 3.52 hours.
These results show that this implementation outperforms other
current systems in the same field.
25. +
Conclusions
After running the experiments, the researchers observed that:
throughput is higher (almost 0.30) for larger data-sets,
the execution time depends on the complexity of the input,
scalability is linear with respect to input size and number of nodes.
26. +
Criticism
LUMB rule-set result not found in result.
the method that they introduced can be easily implemented
using MapReduce on hadoop to paralyze the processing,.
but it will be expensive.