This presentation was prepared for my faculty Christmas conference.
Abstract: For the last 11 months I have been working on a top secret project with a world-renowned Scandinavian industry partner. We are now moving into the exciting operational phase of this project. I have been granted an early lifting of the embargo that has stopped me talking about this work up until now. I will talk about the data science behind this big data project and how semantic web technology has enabled the delivery of Project X.
6. Big Data Problem
4 December 2015 Project X-Mas 6
Volume Velocity
Variety Veracity
http://i.kinja-img.com/gawker-media/image/upload/lvzm0afp8kik5dctxiya.jpg
Value
7. Linked Data Approach
1. Global ID: URI
2. Resolvable ID
3. Useful content
HTML for humans
RDF for machines
4. Link out
Like the Web, but for data!
http://blogs.splunk.com/wp-content/uploads/2014/12/Data-Tree.jpg
8. Publishing Open Data
Publish
Findable
Accessible
Interoperable
Reusable
Data
9. RDF: An Integration Dream
http://www.w3.org/TR/rdf11-primer/
10.
https://www.flickr.com/photos/mobilestreetlife/4179063482
“RDF and OWL do not solve the interoperability problem, they just lay it bare on the table!”
Frank van Harmelen
14. Wrong Presents
https://sickr.files.wordpress.com/2012/12/christmas_presents_lego.jpg
15. Santa Nav
Route planning
Linked data
Real-time location of aircraft
Real-time weather conditions
Sunset/sunrise information
http://www.rosshendrick.co.uk/wp-content/uploads/2013/10/santa-nav.jpg
20. Conclusions
Project X-Mas doesn’t exist
Solutions proposed do
FAIR Data Principles
Promote data reuse
Linked Data
Connecting data at scale
Agreeing what is the same
Santa Nav
Based on flooding defence work
Currently working in the W3C RDF Stream Processing Community Group
Christmas 2014 was an unmitigated disaster
Wrong items bought
Wrong presents delivered
Santa got lost, was almost spotted, and stayed out after sunrise – weather was a contributing factor
Who’s heard of Big Data? What is it? What are the Vs?
Deriving value from the data
Volume: More data than you can process – relative term; complexity of processing
Velocity: Data constantly being generated
Variety: Multiple sources, formats, models
Veracity: Accuracy of the data
Linked data offers a platform on which to do data science
Linked Data hugely successful since inception in 2006, revision 2009
About 1000 linked open datasets published
Wide range of topics: government, publications, life sciences, …
Requirement to openly publish research data – promote reuse
FAIR is a set of principles to achieve this
Findable: persistent global ID, rich metadata, registries
Accessible: dereferenceable ID, metadata
Interoperable: metadata, machine processable, self-describing
Reusable: rich metadata, license, detailed provenance
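As a rough illustration of the four principles above, the sketch below checks a dataset's metadata record for the fields each principle leans on. The record, the field names, and the mapping from principle to fields are all assumptions made for this example; a real FAIR assessment looks at far more than the presence of a few fields.

```python
# Hypothetical metadata record; field names are illustrative, not a standard.
record = {
    "identifier": "http://example.org/dataset/presents-2015",
    "title": "Present requests",
    "format": "text/turtle",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "provenance": None,  # left empty on purpose
}

# Which fields this sketch expects for each FAIR principle (an assumption).
FAIR_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["identifier"],
    "Interoperable": ["format"],
    "Reusable": ["license", "provenance"],
}


def missing_fair_fields(metadata):
    """Return {principle: [missing fields]} for principles with gaps."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        missing = [f for f in fields if not metadata.get(f)]
        if missing:
            gaps[principle] = missing
    return gaps


gaps = missing_fair_fields(record)
```

Here the record would be flagged under Reusable only, because its provenance is empty.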
RDF is one way of publishing FAIR data
Identify things with URIs
Reuse URIs
Explicit meaning to relationships
Links between datasets
Infer hidden meaning
They give us a common syntax
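The points above can be sketched in plain Python without an RDF library. Triples are modelled as (subject, predicate, object) tuples. Every URI under example.org, the two property names, and the infer_symmetric helper are invented for illustration; the helper is only a toy stand-in for real reasoning.

```python
# Hypothetical URIs, for illustration only.
EX = "http://example.org/"
KNOWS = EX + "knows"        # a relationship with an explicit, named meaning
LIVES_IN = EX + "livesIn"

# Two datasets published independently as sets of (subject, predicate, object).
dataset_a = {
    (EX + "santa", KNOWS, EX + "rudolph"),
}
dataset_b = {
    (EX + "santa", LIVES_IN, EX + "north-pole"),  # reuses the same URI for Santa
}


def merge(*datasets):
    """Merging is set union: triples that reuse a URI line up automatically."""
    merged = set()
    for triples in datasets:
        merged |= triples
    return merged


def infer_symmetric(triples, predicate):
    """Add (o, p, s) for every (s, p, o) that uses the given predicate."""
    inferred = {(o, p, s) for (s, p, o) in triples if p == predicate}
    return triples | inferred


def about(triples, subject):
    """Everything the graph says about one subject, as sorted (p, o) pairs."""
    return sorted((p, o) for (s, p, o) in triples if s == subject)


graph = infer_symmetric(merge(dataset_a, dataset_b), KNOWS)
```

Because both datasets reuse the same URI for Santa, about(graph, EX + "santa") returns facts from both sources, and the inferred triple makes Rudolph know Santa as well.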
Rest of the talk focuses on my work to address these challenges
How do we ensure we buy the right things?
Who has used a search engine to find a product?
Keep your hands up
Who was able to filter on features of the product?
How has Google got this information?
John Lewis example
Looks like a normal web page
Machine interpretable
Schema.org: loosely defined properties
Good for search, not great for data integration
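One common way to make a page machine interpretable is to embed schema.org data as JSON-LD in a script tag. The sketch below pulls product data out of such a page using only the Python standard library. The page, product name, brand, and price are invented; this is not the actual John Lewis markup, which may well use a different embedding.

```python
import json
from html.parser import HTMLParser

# Invented page: ordinary HTML for a person, schema.org data for a machine.
PAGE = """
<html><body>
<h1>Building Set</h1>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Product",
 "name": "Building Set", "brand": "ExampleBricks",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "GBP"}}
</script>
</body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.items.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False


def extract_products(html):
    """Return every embedded JSON-LD item whose type is Product."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return [item for item in parser.items if item.get("@type") == "Product"]


products = extract_products(PAGE)
```

A search engine can filter on fields such as price or brand once they are exposed like this, but the properties are loosely defined, so two shops may fill them in differently.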
If you don’t tell us what you want, we’ll just get you socks!
Natural language is ambiguous. When doing text-based search it is difficult to know the exact concept.
Not simply juxtaposing layers
Need integrated data to model with
This is a mock up that juxtaposes layers of data
Machines cannot process this
Need to integrate data models and reconcile data
Then it can be queried
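The difference between juxtaposing and integrating can be sketched as follows. Two sources describe the same places under different identifiers; a reconciliation table says which identifiers mean the same thing, and only then can one query span both. The identifiers, readings, and function names are all invented for this example.

```python
# Hypothetical sources: each uses its own identifier for the same place.
weather = [
    {"station": "wx:SOU", "wind_kts": 35},
    {"station": "wx:OSL", "wind_kts": 8},
]
sunrise = [
    {"city": "geo:southampton", "sunrise": "08:01"},
    {"city": "geo:oslo", "sunrise": "08:58"},
]

# Reconciliation: agreeing which identifiers denote the same thing.
SAME_AS = {"wx:SOU": "geo:southampton", "wx:OSL": "geo:oslo"}


def integrate(weather_rows, sunrise_rows, same_as):
    """Build one record per place, keyed by a single canonical identifier."""
    places = {}
    for row in weather_rows:
        key = same_as.get(row["station"], row["station"])
        places.setdefault(key, {})["wind_kts"] = row["wind_kts"]
    for row in sunrise_rows:
        places.setdefault(row["city"], {})["sunrise"] = row["sunrise"]
    return places


def calm_places(places, max_wind_kts):
    """Query the integrated data: places with wind at or below the limit."""
    return sorted(
        (place, facts["sunrise"])
        for place, facts in places.items()
        if "sunrise" in facts
        and facts.get("wind_kts", float("inf")) <= max_wind_kts
    )


places = integrate(weather, sunrise, SAME_AS)
```

Without the SAME_AS table the two lists are just layers side by side; with it, calm_places(places, 10) can answer a question that needs both sources.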