SlideShare a Scribd company logo
1 of 23
Download to read offline
5 June 2013
BBC Linked Data Platform
Using semantic technologies to make our content more connected and more discoverable
A (very) short history
✤ Dynamic Semantic Publishing
✤ BBC Sport - Transition from ‘static’ to ‘dynamic’
✤ Introduction of Semantic Technologies for World Cup 2010
✤ Raising the bar for Olympics 2012
✤ Linked Data Platform & The Creative Work
Olympics 2012
Athletes & Medals: from trackside to our audience
BBC Linked Data Platform
(our logo)
LDP:The CreativeWork
MinimalMetadata
Semantically
AggregatedMetadata
Triple Store
Website
Mobile
Apps
IPTV
Open API
CreativeWorks
✤ Minimal metadata
✤ Enough non-semantic metadata to support ‘rich links’ in a wide
range of applications
✤ Enough semantic metadata (tags) to support discovery through
semantic queries
✤ Full metadata requires a content-type-specific metadata API
✤ Access to content requires a content API
Some use-cases
✤ Automated index pages/feeds
✤ Semantic navigation
✤ Semantic search
✤ A typical query:
✤ Top 10, most recent, BBC News Items about Politicians who are
members of The Labour Party
Powered by LDP
BBC Sport
BBC Music
BBC Olympics 2012
BBC Knowledge & Learning Beta
BBC News Local Beta
BBC Sport Mobile App
CreativeWork Ontology
CreativeWorks in Code
case class CreativeWork(
locators: Set[Locator],
title: String,
modified: DateTime,
format: Option[FormatType.FormatType] = None,
created: Option[DateTime] = None,
uri: Option[String] = None,
primaryContentOf: List[PrimaryContentOf] = List(),
about: List[String] = List(),
mentions: List[String] = List(),
`type`: CreativeWorkType = CreativeWorkType.CreativeWork,
provenance: Option[CreativeWorkProvenance] = None,
thumbnails: List[Thumbnail] = List(),
audience: Option[AudienceType] = None,
category: Option[CreativeWorkCategory] = None
) {
private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1)
private val allLocatorsDistinct = locators.map(_.uri).size == locators.size
require(title.trim.isEmpty == false, "Creative Work has an empty title")
require(title.length <= CreativeWork.MaxTitleLength,
"Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength)
require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type")
require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs")
def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", ""))
}
object CreativeWork {
val Locator = "http://www.bbc.co.uk/ontologies/cms/locator"
val MaxTitleLength = 300
}
Creative
Work Query*
CONSTRUCT {
?creativeWork a cwork:CreativeWork ;
a ?type ;
cwork:title ?title ;
cwork:about ?about ;
cwork:mentions ?mentions ;
cwork:dateModified ?modified ;
?about bbc:preferredLabel ?aboutPreferredLabel .
?mentions bbc:preferredLabel ?mentionsPrefLabel .
}
WHERE {{
SELECT DISTINCT ?creativeWork
! WHERE {
! {{#about}}
! ! FILTER (?about = <{{about}}>) .
! ! ?creativeWork cwork:about ?about .
! {{/about}}
! {{#mentions}}
! ! FILTER (?mentions = <{{mentions}}>) .
! ! ?creativeWork cwork:mentions ?mentions .
! {{/mentions}}
! ?creativeWork a cwork:CreativeWork ;
! ! a ?type ;
! ! cwork:title ?title ;
! ! cwork:dateModified ?modified .
! }
! ORDER BY DESC(?modified)
! LIMIT 10
! {{#offset}}OFFSET {{offset}}{{/offset}}
}
?creativeWork a cwork:CreativeWork .
{
?creativeWork a cwork:CreativeWork ;
a ?type ;
! ! cwork:title ?title ;
! ! cwork:dateModified ?modified .
{
?type rdfs:subClassOf cwork:CreativeWork .
} UNION {
OPTIONAL {
?creativeWork cwork:about ?about .
OPTIONAL { ?about rdfs:label ?aboutLabel . }
OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . }
}
OPTIONAL {
?creativeWork cwork:mentions ?mentions .
OPTIONAL { ?mentions rdfs:label ?mentionsLabel . }
OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . }
}
}
}
} *Simplified
SPARQL CONSTRUCT
Inner SELECT
Parametisation
Pagination
Mustache-templated
Our principal challenge:
Data Management
4 Kinds of Data
✤ Creative Works
✤ Reference Data, managed in sets (Datasets)
✤ Reference Data, managed individually (Resources)
✤ Ontologies
99.99% Availability
Our own URIs
✤ Everything has a ‘Thing URI’:
✤ http://www.bbc.co.uk/things/{GUID}#ID
✤ Opaque ID, dereferencable*
✤ BBC controls identity, therefore quality & consistency
✤ bbc:sameAs to DBPedia, Wikidata, Freebase etc
*coming soon
Our own ontologies
✤ Core set of ontologies that are BBC owned
✤ Creative Work, BBC, (Organsational) Provenance, etc
✤ Ability to change regularly and unilaterally
✤ Provide ‘mappings’ to more widely used ontologies
(e.g. Schema.org)
✤ Domain ontologies can be shared or reused
✤ Sport, Politics, GeoLocation, etc
Open data
✤ Provided through Mashery
✤ ‘Connected Studio’ events will validate
our API
✤ Public beta to follow
✤ JSON-LD & Turtle
✤ Future
✤ Self-provisioned, cloud-based
triple stores
✤ Data Dumps
The Hard Problems...
Managing concepts across BBC
✤ Which domain ‘owns’ Arnold Schwarzenegger?
✤ News? Entertainment? History? Politics?
✤ Can domains ‘own’ predicates?
✤ Layering information over shared concepts
✤ High quality sub-sets vs. lower quality ‘long-tail’
✤ Synchronisation with external datasets
✤ Tools for creating and managing concepts
✤ Emerging, splitting & combining concepts
✤ Linked Data gives us a language to solve these problems
Metadata
Often subjective, never complete
✤ What is this TV programme about?
✤ Manual tag curation
✤ Subjective
✤ Long-term expense
✤ Inconsistent
✤ Automated tag generation
✤ Short-term expense
✤ Value in data or algorithm?
✤ Complex
✤ Relies on assumptions
✤ Our approach? Invest in both. Validate learnings.
When to reason?
✤ Our options...
✤ Before writing to the triple store
✤ Materialised in the triple store (Forward-chaining inference)
✤ Inferred by the SPARQL engine (Backward-chaining inference)
✤ After SPARQL results have returned
✤ None/some/all of the above
Maturity of SemanticTech
✤ From a Software Industry perspective, Semantic (RDF) Technology is
not mainstream and is therefore hard to sell
✤ Library/application immaturity can be a hinderance to innovation
✤ I believe the Sem Tech industry needs to focus on
simplicity and abstraction
✤ Semantic Technology is complex, but using it, need not be
Find out more
✤ Video from QCon London 2013:
✤ http://www.infoq.com/presentations/bbc-­‐data-­‐platform-­‐api
✤ BBC Internet Blog:
✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-­‐Data-­‐Connecting-­‐
together-­‐the-­‐BBCs-­‐Online-­‐Content
✤ david.rogers@bbc.co.uk
✤ @daverog

More Related Content

Similar to BBC Linked Data Platform (SemTechBiz San Fran 2013)

Similar to BBC Linked Data Platform (SemTechBiz San Fran 2013) (20)

Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy cabral   search marketing summit - scraping data-driven content (1)Jeremy cabral   search marketing summit - scraping data-driven content (1)
Jeremy cabral search marketing summit - scraping data-driven content (1)
 
Using MongoDB + Hadoop Together
Using MongoDB + Hadoop TogetherUsing MongoDB + Hadoop Together
Using MongoDB + Hadoop Together
 
GoralSoft
GoralSoftGoralSoft
GoralSoft
 
REST easy with API Platform
REST easy with API PlatformREST easy with API Platform
REST easy with API Platform
 
Inspire Helsinki 2019 - Keynote Bart De Lathouwer
Inspire Helsinki 2019 - Keynote Bart De LathouwerInspire Helsinki 2019 - Keynote Bart De Lathouwer
Inspire Helsinki 2019 - Keynote Bart De Lathouwer
 
Inspire Helsinki 2019 - Keynote Bart De Lathouwer
Inspire Helsinki 2019 - Keynote Bart De LathouwerInspire Helsinki 2019 - Keynote Bart De Lathouwer
Inspire Helsinki 2019 - Keynote Bart De Lathouwer
 
Inspire Helsinki 2019 Keynote by Bart De Lathouwer
Inspire Helsinki 2019 Keynote by Bart De LathouwerInspire Helsinki 2019 Keynote by Bart De Lathouwer
Inspire Helsinki 2019 Keynote by Bart De Lathouwer
 
Getting started with titanium
Getting started with titaniumGetting started with titanium
Getting started with titanium
 
NLP and the Web
NLP and the WebNLP and the Web
NLP and the Web
 
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...
 
HTML5: An Overview
HTML5: An OverviewHTML5: An Overview
HTML5: An Overview
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
Getting started with Appcelerator Titanium
Getting started with Appcelerator TitaniumGetting started with Appcelerator Titanium
Getting started with Appcelerator Titanium
 
Web of things introduction
Web of things introductionWeb of things introduction
Web of things introduction
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
Azure Media Services & Azure Search
Azure Media Services & Azure SearchAzure Media Services & Azure Search
Azure Media Services & Azure Search
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
Linked Open Data and Ontotext Projects
Linked Open Data and Ontotext ProjectsLinked Open Data and Ontotext Projects
Linked Open Data and Ontotext Projects
 
Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)Building search and discovery services for Schibsted (LSRS '17)
Building search and discovery services for Schibsted (LSRS '17)
 
Introduction to web scraping
Introduction to web scrapingIntroduction to web scraping
Introduction to web scraping
 

Recently uploaded

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 

Recently uploaded (20)

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 

BBC Linked Data Platform (SemTechBiz San Fran 2013)

  • 1. 5 June 2013 BBC Linked Data Platform Using semantic technologies to make our content more connected and more discoverable
  • 2. A (very) short history ✤ Dynamic Semantic Publishing ✤ BBC Sport - Transition from ‘static’ to ‘dynamic’ ✤ Introduction of Semantic Technologies for World Cup 2010 ✤ Raising the bar for Olympics 2012 ✤ Linked Data Platform & The Creative Work
  • 3. Olympics 2012 Athletes & Medals: from trackside to our audience
  • 4. BBC Linked Data Platform (our logo)
  • 6. CreativeWorks ✤ Minimal metadata ✤ Enough non-semantic metadata to support ‘rich links’ in a wide range of applications ✤ Enough semantic metadata (tags) to support discovery through semantic queries ✤ Full metadata requires a content-type-specific metadata API ✤ Access to content requires a content API
  • 7. Some use-cases ✤ Automated index pages/feeds ✤ Semantic navigation ✤ Semantic search ✤ A typical query: ✤ Top 10, most recent, BBC News Items about Politicians who are members of The Labour Party
  • 8. Powered by LDP BBC Sport BBC Music BBC Olympics 2012 BBC Knowledge & Learning Beta BBC News Local Beta BBC Sport Mobile App
  • 10. CreativeWorks in Code case class CreativeWork( locators: Set[Locator], title: String, modified: DateTime, format: Option[FormatType.FormatType] = None, created: Option[DateTime] = None, uri: Option[String] = None, primaryContentOf: List[PrimaryContentOf] = List(), about: List[String] = List(), mentions: List[String] = List(), `type`: CreativeWorkType = CreativeWorkType.CreativeWork, provenance: Option[CreativeWorkProvenance] = None, thumbnails: List[Thumbnail] = List(), audience: Option[AudienceType] = None, category: Option[CreativeWorkCategory] = None ) { private val oneLocatorPerType = locators.groupBy(_.`type`).forall(_._2.size == 1) private val allLocatorsDistinct = locators.map(_.uri).size == locators.size require(title.trim.isEmpty == false, "Creative Work has an empty title") require(title.length <= CreativeWork.MaxTitleLength, "Creative Work title exceeded the maximum length allowed of " + CreativeWork.MaxTitleLength) require(oneLocatorPerType, "Creative Work contained multiple Locators of the same type") require(allLocatorsDistinct, "Creative Work contained multiple identical Locator URNs") def guid = uri.map(_.replace("http://www.bbc.co.uk/things/", "")).map(_.replace("#id", "")) } object CreativeWork { val Locator = "http://www.bbc.co.uk/ontologies/cms/locator" val MaxTitleLength = 300 }
  • 11. Creative Work Query* CONSTRUCT { ?creativeWork a cwork:CreativeWork ; a ?type ; cwork:title ?title ; cwork:about ?about ; cwork:mentions ?mentions ; cwork:dateModified ?modified ; ?about bbc:preferredLabel ?aboutPreferredLabel . ?mentions bbc:preferredLabel ?mentionsPrefLabel . } WHERE {{ SELECT DISTINCT ?creativeWork ! WHERE { ! {{#about}} ! ! FILTER (?about = <{{about}}>) . ! ! ?creativeWork cwork:about ?about . ! {{/about}} ! {{#mentions}} ! ! FILTER (?mentions = <{{mentions}}>) . ! ! ?creativeWork cwork:mentions ?mentions . ! {{/mentions}} ! ?creativeWork a cwork:CreativeWork ; ! ! a ?type ; ! ! cwork:title ?title ; ! ! cwork:dateModified ?modified . ! } ! ORDER BY DESC(?modified) ! LIMIT 10 ! {{#offset}}OFFSET {{offset}}{{/offset}} } ?creativeWork a cwork:CreativeWork . { ?creativeWork a cwork:CreativeWork ; a ?type ; ! ! cwork:title ?title ; ! ! cwork:dateModified ?modified . { ?type rdfs:subClassOf cwork:CreativeWork . } UNION { OPTIONAL { ?creativeWork cwork:about ?about . OPTIONAL { ?about rdfs:label ?aboutLabel . } OPTIONAL { ?about bbc:preferredLabel ?aboutPreferredLabel . } } OPTIONAL { ?creativeWork cwork:mentions ?mentions . OPTIONAL { ?mentions rdfs:label ?mentionsLabel . } OPTIONAL { ?mentions bbc:preferredLabel ?mentionsPrefLabel . } } } } } *Simplified SPARQL CONSTRUCT Inner SELECT Parametisation Pagination Mustache-templated
  • 13. 4 Kinds of Data ✤ Creative Works ✤ Reference Data, managed in sets (Datasets) ✤ Reference Data, managed individually (Resources) ✤ Ontologies
  • 15. Our own URIs ✤ Everything has a ‘Thing URI’: ✤ http://www.bbc.co.uk/things/{GUID}#ID ✤ Opaque ID, dereferencable* ✤ BBC controls identity, therefore quality & consistency ✤ bbc:sameAs to DBPedia, Wikidata, Freebase etc *coming soon
  • 16. Our own ontologies ✤ Core set of ontologies that are BBC owned ✤ Creative Work, BBC, (Organsational) Provenance, etc ✤ Ability to change regularly and unilaterally ✤ Provide ‘mappings’ to more widely used ontologies (e.g. Schema.org) ✤ Domain ontologies can be shared or reused ✤ Sport, Politics, GeoLocation, etc
  • 17. Open data ✤ Provided through Mashery ✤ ‘Connected Studio’ events will validate our API ✤ Public beta to follow ✤ JSON-LD & Turtle ✤ Future ✤ Self-provisioned, cloud-based triple stores ✤ Data Dumps
  • 19. Managing concepts across BBC ✤ Which domain ‘owns’ Arnold Schwarzenegger? ✤ News? Entertainment? History? Politics? ✤ Can domains ‘own’ predicates? ✤ Layering information over shared concepts ✤ High quality sub-sets vs. lower quality ‘long-tail’ ✤ Synchronisation with external datasets ✤ Tools for creating and managing concepts ✤ Emerging, splitting & combining concepts ✤ Linked Data gives us a language to solve these problems
  • 20. Metadata Often subjective, never complete ✤ What is this TV programme about? ✤ Manual tag curation ✤ Subjective ✤ Long-term expense ✤ Inconsistent ✤ Automated tag generation ✤ Short-term expense ✤ Value in data or algorithm? ✤ Complex ✤ Relies on assumptions ✤ Our approach? Invest in both. Validate learnings.
  • 21. When to reason? ✤ Our options... ✤ Before writing to the triple store ✤ Materialised in the triple store (Forward-chaining inference) ✤ Inferred by the SPARQL engine (Backward-chaining inference) ✤ After SPARQL results have returned ✤ None/some/all of the above
  • 22. Maturity of SemanticTech ✤ From a Software Industry perspective, Semantic (RDF) Technology is not mainstream and is therefore hard to sell ✤ Library/application immaturity can be a hinderance to innovation ✤ I believe the Sem Tech industry needs to focus on simplicity and abstraction ✤ Semantic Technology is complex, but using it, need not be
  • 23. Find out more ✤ Video from QCon London 2013: ✤ http://www.infoq.com/presentations/bbc-­‐data-­‐platform-­‐api ✤ BBC Internet Blog: ✤ http://www.bbc.co.uk/blogs/internet/posts/Linked-­‐Data-­‐Connecting-­‐ together-­‐the-­‐BBCs-­‐Online-­‐Content ✤ david.rogers@bbc.co.uk ✤ @daverog