Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format to represent RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
Efficient RDF Interchange (ERI) Format for RDF Data Streams
1. Efficient RDF Interchange (ERI)
Format for RDF Data Streams
Javier D. Fernández, Alejandro Llaves, Oscar Corcho
Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid, Spain
2. Outline
Index
1. Introduction & Motivation
2. Background
3. Efficient RDF Interchange (ERI) Format
i. Basic Concepts
ii. ERI Streams
iii. Practical Deployment
4. Evaluation
5. Conclusions and Next steps
4. INTRODUCTION - Static data versus RDF data streams
[Figure: the static view of data: files, DBMS, spatial information, Web APIs and Linked Data discovery, connected through Extract Transform Load (ETL) processes.]
“Most semantic tools are focused on this static view”
12. INTRODUCTION - Static data versus RDF data streams
RDF streams: potentially unbounded sequences of timestamped
RDF statements or graphs.
user1_observation [t1]
weather1_observation [t1]
user2_observation [t3]
…
[Figure: a stream along a time axis t, interleaving weather observations w1, w2, w3 and user observations u1, u2, u3, u4.]
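To make the definition concrete, here is a minimal sketch of an RDF stream as an unbounded, lazily produced sequence of timestamped statements. The `TimestampedTriple` type, the integer logical timestamps and the `ex:` IRIs are hypothetical choices for illustration, not part of ERI.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedTriple:
    timestamp: int  # logical time t1, t2, ... (an assumption for illustration)
    triple: Triple


def observation_stream() -> Iterator[TimestampedTriple]:
    """Hypothetical unbounded stream: a weather observation at every t, a user observation at odd t."""
    for t in itertools.count(1):
        if t % 2 == 1:
            user = (t + 1) // 2
            yield TimestampedTriple(t, (f"ex:user{user}_observation", "rdf:type", "ex:UserObservation"))
        yield TimestampedTriple(t, (f"ex:weather{t}_observation", "rdf:type", "ex:WeatherObservation"))


# The stream never ends, so a consumer only ever reads a bounded prefix or window of it.
for item in itertools.islice(observation_stream(), 5):
    print(f"{item.triple[0]} [t{item.timestamp}]")
```

Using a generator rather than a list reflects the key difference from the static view: the producer never holds the whole dataset, and the consumer decides how much of it to read.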
15. INTRODUCTION - Motivation
Achieve efficient transmission of RDF streams, a necessary step to
ensure higher throughput for RDF Stream processors
[Figure: several stream sources feed a stream processor engine (e.g. C-SPARQL, SPARQLStream, morph-streams, CQELS Cloud, Ztreamy, …), which also has access to historic information and answers queries with continuous results.]
16. INTRODUCTION – Motivation - Requirements
Efficient transmission of RDF streams:
• Streamable
• Scalable
• Easy (fast) to process (create and parse)
• Compact
• Parametrizable (several compression/time trade-offs)
17. BACKGROUND
| | Plain: Turtle/TriG/JSON-LD | Plain + compression (e.g. gzip) | HDT | Streaming HDT | RDSZ | RDF/XML + EXI | ERI |
|---|---|---|---|---|---|---|---|
| Streamable | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Scalable | Limited | Yes | Yes | No | Yes | Yes | Yes |
| Easy (fast) to create and parse | Yes | Limited | Limited | Yes | Limited | Limited | Yes |
| Compact | No | Yes | Yes | Limited | Yes | Yes | Yes |
| Parametrizable (compression/time) | No | Limited | Yes | No | Limited | Limited | Yes |
19. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts
• (Assumption) Most RDF streams are well structured:
  • the structure is well-known by the data provider
  • the number of variations in the structure is limited
• The Efficient RDF Interchange (ERI) Format encodes the information at two levels:
  • A sliding dictionary of structures: the Structural Dictionary
  • The concrete value for each predicate
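The first level can be sketched as a bounded dictionary that assigns IDs to structures, so that a recurring structure is sent once and then referenced by ID. This is a minimal sketch only: the class name, the capacity and the least-recently-used (LRU) eviction policy are assumptions, and the encoder and decoder would have to apply the same policy to stay in sync.

```python
from collections import OrderedDict
from typing import Hashable, Tuple


class SlidingStructureDictionary:
    """Bounded dictionary of structures with least-recently-used (LRU) eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._ids: OrderedDict = OrderedDict()  # structure -> ID, least recently used first
        self._next_id = 0

    def encode(self, structure: Hashable) -> Tuple[int, bool]:
        """Return (structure ID, is_new); a new structure must also be sent to the receiver."""
        if structure in self._ids:
            self._ids.move_to_end(structure)  # mark as most recently used
            return self._ids[structure], False
        if len(self._ids) >= self.capacity:
            self._ids.popitem(last=False)  # evict the least recently used structure
        structure_id = self._next_id
        self._next_id += 1
        self._ids[structure] = structure_id
        return structure_id, True


# Level 1: the structure (predicates and number of objects); level 2: the concrete values.
temperature = (("a", 1), ("rdfs:label", 2), ("ex:CelsiusValue", 1))
dictionary = SlidingStructureDictionary(capacity=2)
print(dictionary.encode(temperature))  # (0, True): the structure is sent once in full
print(dictionary.encode(temperature))  # (0, False): later molecules only send the ID
```

IDs here grow monotonically for simplicity; a compact implementation would more likely recycle the IDs of evicted entries to keep them small.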
33. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts
• ERI processing model
• The minimal information unit is a molecule
• We initially restrict ourselves to subject molecules, i.e. a subject together with all the triples in which it appears as subject
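Grouping the triples of a block into subject molecules can be sketched as follows. This assumes, for illustration only, that all triples of a subject arrive within the same block; the `obs:` IRIs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def subject_molecules(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group the triples of one block by subject (dicts keep first-seen subject order)."""
    molecules: Dict[str, List[Tuple[str, str]]] = {}
    for subject, predicate, obj in triples:
        molecules.setdefault(subject, []).append((predicate, obj))
    return molecules


block = [
    ("obs:1", "a", "weather:TemperatureObservation"),
    ("obs:1", "ex:CelsiusValue", "7.7"),
    ("obs:2", "a", "weather:TemperatureObservation"),
    ("obs:2", "ex:CelsiusValue", "9.4"),
]
molecules = subject_molecules(block)
print(list(molecules))  # ['obs:1', 'obs:2']
print(molecules["obs:2"])  # [('a', 'weather:TemperatureObservation'), ('ex:CelsiusValue', '9.4')]
```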
34. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts
sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00
    a weather:TemperatureObservation ;
    rdfs:label "Air temperature at 6:55:00", "Verified" ;
    om-owl:observedProperty weather:_AirTemperature ;
    om-owl:procedure sens-obs:System_4UT01 ;
    om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_6_55_00 ;
    om-owl:samplingTime sens-obs:Instant_2003_3_31_6_55_00 ;
    ex:CelsiusValue "7.7"^^xsd:float .
sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00
    a weather:TemperatureObservation ;
    rdfs:label "Air temperature at 7:45:00", "Not Verified" ;
    om-owl:observedProperty weather:_AirTemperature ;
    om-owl:procedure sens-obs:System_4UT01 ;
    om-owl:result sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_7_45_00 ;
    om-owl:samplingTime sens-obs:Instant_2003_3_31_7_45_00 ;
    ex:CelsiusValue "9.4"^^xsd:float .
[Figure: each of the two descriptions above is one subject molecule; further molecules follow (…).]
36. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Basic Concepts
Air temperature observations of the sensor “System_4UT01”: both subject molecules above share the same structure, which is stored once in the Structural Dictionary (the number of objects of each predicate and, where it is fixed, the object itself):
Structure ID30 =
a (1, weather:TemperatureObservation)
rdfs:label (2)
om-owl:observedProperty (1, weather:_AirTemperature)
om-owl:procedure (1, sens-obs:System_4UT01)
om-owl:result (1)
om-owl:samplingTime (1)
ex:CelsiusValue (1)
…
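The mapping from a molecule to an entry such as Structure ID30 can be sketched as below. Which predicates keep their (constant) object inside the structure is an assumption here: the hypothetical `STRUCTURAL_PREDICATES` set stands in for whatever the data provider would declare, chosen to reproduce the entries shown on the slide.

```python
from typing import Dict, FrozenSet, List, Tuple

Molecule = List[Tuple[str, str]]  # (predicate, object) pairs sharing one subject

# Hypothetical provider configuration: predicates whose constant objects are
# folded into the structure, as in "a (1, weather:TemperatureObservation)".
STRUCTURAL_PREDICATES: FrozenSet[str] = frozenset({"a", "om-owl:observedProperty", "om-owl:procedure"})


def split_molecule(molecule: Molecule) -> Tuple[tuple, Dict[str, List[str]]]:
    """Split a molecule into a hashable structure key and its per-predicate values."""
    grouped: Dict[str, List[str]] = {}
    for predicate, obj in molecule:
        grouped.setdefault(predicate, []).append(obj)
    structure = []
    values: Dict[str, List[str]] = {}
    for predicate, objects in grouped.items():
        if predicate in STRUCTURAL_PREDICATES:
            # Constant objects become part of the structure itself.
            structure.append((predicate, len(objects), tuple(objects)))
        else:
            # Only the number of objects is structural; the objects go to value channels.
            structure.append((predicate, len(objects)))
            values[predicate] = objects
    return tuple(structure), values


def observation(clock: str, stamp: str, status: str, celsius: str) -> Molecule:
    """Build one air-temperature observation like those on the slide (abbreviated IRIs)."""
    return [
        ("a", "weather:TemperatureObservation"),
        ("rdfs:label", f"Air temperature at {clock}"),
        ("rdfs:label", status),
        ("om-owl:observedProperty", "weather:_AirTemperature"),
        ("om-owl:procedure", "sens-obs:System_4UT01"),
        ("om-owl:result", f"sens-obs:MeasureData_AirTemperature_4UT01_2003_3_31_{stamp}"),
        ("om-owl:samplingTime", f"sens-obs:Instant_2003_3_31_{stamp}"),
        ("ex:CelsiusValue", celsius),
    ]


key1, values1 = split_molecule(observation("6:55:00", "6_55_00", "Verified", "7.7"))
key2, values2 = split_molecule(observation("7:45:00", "7_45_00", "Not Verified", "9.4"))
print(key1 == key2)  # True: both molecules map to the same structure
print(values1["rdfs:label"])  # ['Air temperature at 6:55:00', 'Verified']
```

Because the structure key is hashable, it can be looked up directly in a dictionary of structures, while the remaining values are routed per predicate.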
38. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – ERI Streams
Based on: Efficient XML Interchange (EXI) format
[Figure: the stream is split into blocks of molecules. Each block is multiplexed into structural channels and value channels, and each channel is compressed separately (per-channel compression/decompression; demultiplexing on the receiver side). The resulting ERI stream consists of a stream header followed by the stream body: a sequence of blocks, each made of metadata followed by its compressed channels.]
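The block/channel layout can be sketched as a simple container: a header, then blocks made of metadata (the channel names) followed by each channel compressed on its own. The header bytes, the length-prefixed framing and the use of zlib for every channel are assumptions for illustration, not the ERI wire format.

```python
import struct
import zlib
from typing import Dict, List, Tuple

STREAM_HEADER = b"ERI-SKETCH\n"  # hypothetical header bytes, not the ERI specification


def write_block(channels: Dict[str, bytes]) -> bytes:
    """Multiplex named channels into one block: metadata first, then each channel compressed separately."""
    meta = "\n".join(channels).encode("utf-8")  # channel names in order (names must not contain "\n")
    parts = [struct.pack(">I", len(meta)), meta]
    for payload in channels.values():
        compressed = zlib.compress(payload)
        parts += [struct.pack(">I", len(compressed)), compressed]
    return b"".join(parts)


def read_block(data: bytes, offset: int) -> Tuple[Dict[str, bytes], int]:
    """Demultiplex one block starting at `offset`; return its channels and the next offset."""
    (meta_len,) = struct.unpack_from(">I", data, offset)
    offset += 4
    meta = data[offset:offset + meta_len].decode("utf-8")
    offset += meta_len
    channels: Dict[str, bytes] = {}
    for name in (meta.split("\n") if meta else []):
        (size,) = struct.unpack_from(">I", data, offset)
        offset += 4
        channels[name] = zlib.decompress(data[offset:offset + size])
        offset += size
    return channels, offset


def write_stream(blocks: List[Dict[str, bytes]]) -> bytes:
    return STREAM_HEADER + b"".join(write_block(block) for block in blocks)


def read_stream(data: bytes) -> List[Dict[str, bytes]]:
    if not data.startswith(STREAM_HEADER):
        raise ValueError("not an ERI-sketch stream")
    offset, blocks = len(STREAM_HEADER), []
    while offset < len(data):
        block, offset = read_block(data, offset)
        blocks.append(block)
    return blocks


blocks = [{"main_terms": b"obs:1\nobs:2", "id_structures": b"\x1e\x1e"}]
assert read_stream(write_stream(blocks)) == blocks
```

Compressing each channel separately keeps similar data together (all subjects, all IDs, all values of one predicate), which is what makes per-channel compression effective; a real streaming writer would emit each block as soon as it is full rather than building the whole byte string.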
39. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – ERI Streams
ERI follows an encoding procedure similar to that of the Efficient XML Interchange (EXI) format.
– Structural channels: they encode the subjects in each block and, for each one, the structural properties of the related triples, using the dynamic dictionary of structures.
  • Main Terms of Molecules: the subject of each molecule (the term by which its triples are grouped).
  • ID-Structures: the ID of the structure of each molecule in the block. The ID points to an entry in the Structural Dictionary.
  • New Structures: new entries in the Structural Dictionary.
– Value channels: they encode the concrete data values held by each predicate in the block in a compact fashion.
  • One channel per distinct predicate in the block.
  • Each channel either lists explicit values or uses IDs pointing to a sliding object dictionary.
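The split of a block into structural and value channels can be sketched as follows. This is a simplified illustration under stated assumptions: the structure is just the ordered list of predicates (repeats encode object counts), the dictionary of structures is an unbounded plain dict, value channels always list explicit values, and the channel names are hypothetical.

```python
from typing import Dict, List, Tuple

Molecule = Tuple[str, List[Tuple[str, str]]]  # (subject, [(predicate, object), ...])


def encode_block(molecules: List[Molecule], structure_ids: Dict[tuple, int]) -> Dict[str, list]:
    """Split one block of subject molecules into structural and value channels.

    `structure_ids` plays the role of the Structural Dictionary shared across
    blocks (unbounded here for brevity) and is updated in place.
    """
    channels: Dict[str, list] = {"main_terms": [], "id_structures": [], "new_structures": []}
    for subject, pairs in molecules:
        # Simplified structure: the ordered list of predicates (repeats encode object counts).
        structure = tuple(predicate for predicate, _ in pairs)
        if structure not in structure_ids:
            structure_ids[structure] = len(structure_ids)
            channels["new_structures"].append(structure)
        channels["main_terms"].append(subject)
        channels["id_structures"].append(structure_ids[structure])
        for predicate, obj in pairs:
            # One value channel per distinct predicate; values are listed explicitly here.
            channels.setdefault("values " + predicate, []).append(obj)
    return channels


dictionary: Dict[tuple, int] = {}
block = [
    ("obs:6_55", [("rdfs:label", "Air temperature at 6:55:00"), ("rdfs:label", "Verified"),
                  ("ex:CelsiusValue", "7.7")]),
    ("obs:7_45", [("rdfs:label", "Air temperature at 7:45:00"), ("rdfs:label", "Not Verified"),
                  ("ex:CelsiusValue", "9.4")]),
]
channels = encode_block(block, dictionary)
print(channels["id_structures"])  # [0, 0]
print(channels["values ex:CelsiusValue"])  # ['7.7', '9.4']
```

Encoding a second block with the same `dictionary` would produce an empty New Structures channel, since the structure is already known to the receiver.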
40. EFFICIENT RDF INTERCHANGE (ERI) FORMAT – Practical Deployment
[Figure: the example block encoded into channels.]
– Structural channels:
  • Main Terms of Molecules [Strings]: sens-obs:Observation_AirTemperature...55_00, sens-obs:Observation_AirTemperature...45_00, …
  • ID-Structures [IDs of Structures]: 30, 30, …
  • New Structure Marker [Bits]: 1, 0, …
  • New Structures [Encoded Structures]: ID-pred1 weather:TemperatureObservation, ID-pred2, ID-pred3 weather:_AirTemperature, ID-pred4 sens-obs:System_4UT01, ID-pred5, ID-pred6, ID-pred7 (numbers of objects: 1, 2, 1, 1, 1, 1, 1), …
  • New Predicates [Strings]: …, om-owl:samplingTime, ex:CelsiusValue, …
– Value channels:
  • ID-pred2 [Object Values; Meta: strings]: Air temperature at 6:55:00, Verified, Air temperature at 7:45:00, Not Verified, …
  • ID-pred5 and ID-pred6 [Term IDs; Meta: IDs]: term IDs such as 101, 245, … and 1, 2, …; New Object Marker [Bits]: 0, 1, … (ID-pred5) and 1, 1, … (ID-pred6)
  • New Terms [Strings]: sens-obs:MeasureData_Air…55_00, sens-obs:Instant_2003…55_00, sens-obs:MeasureData_Air…45_00, sens-obs:Instant_2003…55_00, …
  • ID-pred7 [Object Values; Meta: xsd:float]: 7.7, 9.4, …
Potential compression, depending on the channel: differential encoding, prefix compression, Zlib, Snappy, …
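Prefix compression, listed among the potential compressions for string channels such as Main Terms of Molecules, pays off because consecutive subjects share long prefixes. The sketch below is standard front coding (each term stored as the length of the prefix shared with the previous term plus the remaining suffix); the exact variant ERI uses is not specified on the slide.

```python
from typing import List, Tuple


def front_code(terms: List[str]) -> List[Tuple[int, str]]:
    """Prefix compression: store each term as (chars shared with the previous term, remaining suffix)."""
    encoded: List[Tuple[int, str]] = []
    previous = ""
    for term in terms:
        shared = 0
        limit = min(len(term), len(previous))
        while shared < limit and term[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the terms by reusing the shared prefix of the previously decoded term."""
    terms: List[str] = []
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        terms.append(previous)
    return terms


main_terms = [
    "sens-obs:Observation_AirTemperature_4UT01_2003_3_31_6_55_00",
    "sens-obs:Observation_AirTemperature_4UT01_2003_3_31_7_45_00",
]
encoded = front_code(main_terms)
print(encoded[1][1])  # 7_45_00: only the differing suffix is stored
assert front_decode(encoded) == main_terms
```

The output of front coding can still be passed through a general-purpose compressor such as Zlib, which is one way the per-channel options above could be combined.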
[Later builds of the same figure annotate the value channels: “Explicit list of values” (ID-pred2, ID-pred7), “IDs pointing to a sliding object dictionary” (ID-pred5, ID-pred6) and “Extraction of types” (the datatype metadata, e.g. Meta: xsd:float).]
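The value channels that use IDs pointing to a sliding object dictionary, together with the New Object Marker and New Terms channels in the deployment figure, suggest a scheme like the minimal sketch below. FIFO eviction, the capacity, and the exact split into marker bits, term IDs and new terms are assumptions for illustration, not ERI's documented algorithm; `A`, `B`, `C` are placeholder terms.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class SlidingObjectDictionary:
    """FIFO-bounded term dictionary; encoder and decoder each keep an identical copy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.term_to_id: Dict[str, int] = {}
        self.id_to_term: Dict[int, str] = {}
        self.order: Deque[int] = deque()  # IDs in insertion order, oldest first
        self.next_id = 0

    def add(self, term: str) -> int:
        if len(self.order) >= self.capacity:
            oldest = self.order.popleft()  # slide the window: forget the oldest term
            del self.term_to_id[self.id_to_term.pop(oldest)]
        term_id = self.next_id
        self.next_id += 1
        self.term_to_id[term] = term_id
        self.id_to_term[term_id] = term
        self.order.append(term_id)
        return term_id


def encode_channel(objects: List[str], d: SlidingObjectDictionary) -> Tuple[List[int], List[int], List[str]]:
    """Return (new-object marker bits, term IDs of known objects, new terms)."""
    markers, ids, new_terms = [], [], []
    for obj in objects:
        if obj in d.term_to_id:
            markers.append(0)
            ids.append(d.term_to_id[obj])
        else:
            markers.append(1)
            new_terms.append(obj)
            d.add(obj)
    return markers, ids, new_terms


def decode_channel(markers: List[int], ids: List[int], new_terms: List[str],
                   d: SlidingObjectDictionary) -> List[str]:
    """Mirror of encode_channel: a 1 bit reads a new term, a 0 bit resolves an ID."""
    ids_iter, new_iter = iter(ids), iter(new_terms)
    objects = []
    for bit in markers:
        if bit:
            term = next(new_iter)
            d.add(term)
        else:
            term = d.id_to_term[next(ids_iter)]
        objects.append(term)
    return objects


objects = ["A", "B", "A", "C", "A"]
markers, ids, new_terms = encode_channel(objects, SlidingObjectDictionary(capacity=2))
print(markers, ids, new_terms)  # [1, 1, 0, 1, 1] [0] ['A', 'B', 'C', 'A']
assert decode_channel(markers, ids, new_terms, SlidingObjectDictionary(capacity=2)) == objects
```

With a small window, a term that reappears after being evicted (the last `A`) has to be sent again as a new term; a larger capacity trades memory for fewer repeated strings, which relates to the observation in the evaluation that the object dictionary can add overhead.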
53. EVALUATION - COMPRESSION
ERI excels in space for streaming and statistical datasets.
RDSZ remains comparable to our approach.
The object dictionary can add overhead to the representation, although it always obtains comparable compression ratios.
58. EVALUATION - PARSING
ERI compression is always faster than RDSZ compression (3 and 3.8 times faster on average for ERI-4k and ERI-4k-Nodict, respectively).
ERI decompression is commonly slower (1.4 times on average in both ERI configurations), typically because several channels have to be decompressed.
Channels could be grouped (as in EXI).
62. EVALUATION – CONSUMING SCENARIO
In parsing (transmission + decompression), ERI-4k and ERI-4k-Nodict outperform the baseline, except for those datasets with fewer regularities in the structure or the data values.
63. EVALUATION – CONSUMING SCENARIO
In a scenario that also includes the compression time:
ERI-4k suffers an expected overhead, since the time to process the information is always included.
The time at which the client has received all the data in ERI is comparable to the baseline.
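One way to read the two consuming scenarios is through a simple cost model. This model is introduced here for illustration only: it assumes that transmission time is dominated by the compressed size S divided by the available bandwidth B, and it ignores latency and any overlap between processing and transmission; it is not the exact cost model of the evaluation.

```latex
$$T_{\text{parse}} = T_{\text{transmission}} + T_{\text{decompression}}, \qquad T_{\text{transmission}} \approx \frac{S}{B}$$
$$T_{\text{total}} = T_{\text{compression}} + T_{\text{transmission}} + T_{\text{decompression}}$$
```

Under this model, a smaller S lowers the transmission term, so a more compact format can offset slower (de)compression when bandwidth is limited, while adding the compression term explains the overhead observed once encoding time is included.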
66. Results
• Compressed, efficient RDF interchange (ERI) format
• Exploits the regularities of RDF data streams in both their structure and their data values
• Flexible and extensible ERI configurations
• Minimizes transmission costs in RDF stream processing
• State-of-the-art compression
• Remains efficient in performance
• Time overheads are relatively low and can be afforded in many scenarios.
67. Next steps
• Integration within RDF streaming engines
  • e.g. morph-streams, CQELS Cloud
  • 3 purposes:
    • scaling to higher input data rates
    • minimizing the data exchange among processing nodes
    • running a small set of operators directly on the compressed data
• Parallel compression/decompression
  • preliminary proposal on Storm
• Align the proposal with the results of the W3C RDF Stream Processing (RSP) Community Group regarding stream modeling and serialization
68. Efficient RDF Interchange (ERI)
Format for RDF Data Streams
Javier D. Fernández, Alejandro Llaves, Oscar Corcho
Ontology Engineering Group (OEG), Universidad Politécnica de Madrid, Spain
Electronic edition:
Research object: purl.org/net/ro-eri-ISWC14