schema.org: Linked Data's Gateway Drug

schema.org
Linked Data’s Gateway Drug
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Drug",
  "name": "schema.org",
  "activeIngredient": "Linked data",
  "dosageForm": "Structured data",
  "recognizingAuthority": [{
    "@type": "Organization",
    "name": "Bing"
  },{
    "name": "Google"
  },{
    "name": "Yahoo"
  },{
    "name": "Yandex"
  }]
}
</script>
Aaron Bradley, Connected Data London 2018 ▪ #CDL2018 ▪ @aaranged

Electronic Arts
schema.org/worksFor

bit.ly/semsearch
schema.org
pending.schema.org/knowsAbout
bit.ly/sdataevents schema.org/WebSite

schema.org
pending.schema.org/knowsAbout

History and adoption
schema.org followed in the footsteps of other structured data initiatives, but appears to
have enjoyed much broader adoption

Structured data existed prior to schema.org, but often with little or no search engine support
The road to schema.org
Timeline: FOAF (2000), DCMI Terms (2003), Microformats (2004), Open Graph Protocol (2007), GoodRelations (2007), data-vocabulary.org (2009), schema.org (2011)
Legend: broad search engine support; partial search engine support; no explicit search engine support

A “collection of shared vocabularies … that can be understood by the major search engines”
schema.org in a nutshell
Structure
• A collection of schemas consisting of types, properties and
enumerations
• Types – classes and subclasses (e.g. “Book”)
• Properties – attributes expecting a value of a particular data type
(e.g. “sameAs”), or relations expecting an instance of a particular
type (e.g. “author”) or an enumeration member (e.g. “availability”)
• Enumerations – a class (e.g. “ItemAvailability”) whose members
are considered neither types nor properties (e.g. “InStock”)
Search engine support
• A joint initiative supported at launch by Bing, Google and
Yahoo, and soon after by Yandex
Supported encoding formats
• Microdata and RDFa supported at launch, with RDFa Lite and
JSON-LD support following
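The three building blocks listed under Structure can be sketched in a few lines of Python. This is only an illustration: the helper `make_book` and the book, author and URL values are hypothetical, while `Book`, `Person`, `Offer`, `sameAs`, `author`, `offers`, `availability`, `InStock` and `OutOfStock` are schema.org terms.

```python
import json

SCHEMA = "http://schema.org"


def make_book(name, author_name, same_as, in_stock):
    """Build a JSON-LD description of a book (hypothetical helper)."""
    # Enumeration member: a value of the ItemAvailability class.
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": SCHEMA,
        "@type": "Book",                     # type (class)
        "name": name,                        # property with a text value
        "sameAs": same_as,                   # property with a URL value
        "author": {                          # relation to another typed instance
            "@type": "Person",
            "name": author_name,
        },
        "offers": {
            "@type": "Offer",
            # property whose value is an enumeration member
            "availability": f"{SCHEMA}/{availability}",
        },
    }


# All values below are placeholders, not taken from the slides.
book = make_book(
    "An Example Book",
    "A. N. Author",
    "http://example.com/books/1",
    in_stock=True,
)
print(json.dumps(book, indent=2))
```

The same dictionary could be expressed as Microdata or RDFa attributes; JSON-LD is used here only because it maps directly onto a Python dictionary.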

Robust schema.org adoption data is hard to come by, but format use helps paint the picture
schema.org adoption as inferred from Web Data Commons data
Chart: Format Use as a Percentage of Sampled Domains (RDFa, Microdata, JSON-LD), Aug. 2012 to Nov. 2017, on a scale of 0% to 16%. All data from Web Data Commons.

What’s currently being encoded with these syntaxes is almost exclusively schema.org
For microdata and JSON-LD, it’s schema.org all the way down
Top Classes, Microdata, Nov. 2017
Top Classes, JSON-LD, Nov. 2017

Raw Web Data Commons format usage data belies the relative expressiveness of schema.org
A relatively large vocabulary results in more assertions
Chart: Format Use by Number of Domains in Sample, 2012 and 2017. All data from Web Data Commons.

<span class="author vcard">
<a href="http://www.seoskeptic.com/aaron-bradley/" class="url fn">Aaron Bradley</a>
</span>
“... OGP (Open Graph Protocol) and
microformat approaches can be found on
approximately as many sites as Schema.org,
but given their much smaller vocabularies,
they appear on fewer than half as
many pages and contain fewer than a quarter
as many logical assertions.”
Guha, Brickley and Macbeth, Dec. 2015

Such as they are
schema.org use by the numbers
Apr. 2014: 0.3% of domains (SearchMetrics, 500K domains; Microdata only?)
Dec. 2014: 22.0% of pages (Guha, Brickley, Macbeth, 10B pages)
Dec. 2015: 31.3% of pages (Guha, Brickley, Macbeth, 10B pages)
Nov. 2018: 21.9% JSON-LD and 15.6% Microdata, as % of websites (W3Techs, top 10M websites per Alexa)

The path to adoption
The vocabulary launched with a clear value proposition for webmasters, and has been
buoyed since by a collaborative vocabulary development model, a modified extension
mechanism and the added flexibility afforded by JSON-LD

Event
Recipe, AggregateRating
Product, AggregateRating
The search engines incentivized schema.org use right out of the gate with rich snippets
Rich results at launch

Rich results post-launch
The search engines have been steadily adding new search features as the vocabulary grows
Organization.logo, Organization.sameAs JobPosting ClaimReview

A living vocabulary
Over the course of time schema.org has become more and more expressive
Chart: Classes in schema.org, 2011-2018 (Core, Extensions, Pending), at Jun-11, Nov-15 and Nov-18, on a scale of 0 to 1000
Dates shown: 4 May 2016, 23 March 2017

schema.org provides multiple mechanisms for collaborative vocabulary development
Making vocabulary development a community affair
public-schemaorg W3C Mailing List
schema.org on GitHub
Partnerships

GS1’s SmartSearch is powered by a schema.org
external extension
schema.org’s extension mechanism was completely revamped in v2.0 (May 2015)
Extending schema.org with more specialized vocabulary
SmartSearch in action at Tesco

schema.org endorsed JSON-LD in 2013; Google started using it in 2014, with full support by 2016
JSON-LD: developer-friendly linked data
“…the whole point about it is, it is JSON first and RDF
second. And the fact that it carries RDF is simply
unimportant. And it's particularly unimportant to people
who are JSON users – which is basically every web
developer these days.
“People don't need to know everything, they can create
really cool applications, and if they find JSON-LD useful
– fantastic. If they don't know that it's RDF, I don't care.”
Phil Archer, Aug. 2014

Separation of the data and presentation layers makes life considerably easier for web developers
JSON-LD versus inline markup: no contest
Product Details Page: Before
{
  "@type": "Product",
  "name": "Bob's Best Basic T",
  "image": "bbbt-pink.jpg",
  "offers": {
    "@type": "Offer",
    "price": "28",
    "priceCurrency": "USD"
  },
  "aggregateRating": {
…
Product Details Page: After
{
  "@type": "Product",
  "name": "Bob's Best Basic T",
  "image": "bbbt-pink.jpg",
  "offers": {
    "@type": "Offer",
    "price": "28",
    "priceCurrency": "USD"
  },
  "aggregateRating": {
…
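One way to picture the separation of data and presentation is a page builder in which the JSON-LD is generated once from the product record and attached to whatever markup the page happens to use. The sketch below is an assumption-laden illustration, not the approach shown on the slide: `product_jsonld`, `render_before` and `render_after` are hypothetical helpers, and the two HTML layouts are invented.

```python
import json
from html import escape

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def product_jsonld(product):
    """Serialize a product record as a JSON-LD script element (hypothetical helper)."""
    data = {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": product["name"],
        "image": product["image"],
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
        },
    }
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data).replace("</", "<\\/")
    return SCRIPT_OPEN + payload + SCRIPT_CLOSE


def render_before(product):
    """Old page layout (invented for illustration)."""
    visible = f"<h1>{escape(product['name'])}</h1><p>{escape(product['price'])}</p>"
    return visible + product_jsonld(product)


def render_after(product):
    """Redesigned page layout (invented); the structured data is untouched."""
    visible = f"<div class='card'><h2>{escape(product['name'])}</h2></div>"
    return visible + product_jsonld(product)


PRODUCT = {
    "name": "Bob's Best Basic T",
    "image": "bbbt-pink.jpg",
    "price": "28",
    "currency": "USD",
}
```

With inline Microdata or RDFa, a redesign like the move from `render_before` to `render_after` would mean re-annotating the new markup; here the script element is identical in both versions.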

schema.org beyond search
Seemingly striking the right balance between expressiveness and complexity, the
vocabulary is being used for applications outside of search, and is increasingly the
starting point for ground-up linked data initiatives

Pinterest uses schema.org to populate Article, Product and Recipe Rich Pins
Leveraging structured data to enhance the presentation layer
Pinterest Product Rich Pin
Offer Information on Pin Source Page

When Google needed vocabulary for its Assistant it unsurprisingly turned to schema.org
Virtual assistants and schema.org

Amazon’s Alexa Meaning Representation Language is based on schema.org
Virtual assistants and schema.org
“The Alexa ontology utilized schema.org as
its base and has been updated to include
support for spoken language. In addition,
using schema.org as the base of the Alexa
Ontology means that it shares a vocabulary
used by more than 10 million websites, which
can be linked to the Alexa ontology”
Thomas Kollar et al, Jun. 2018

A New Zealand health insurance company used the vocabulary to kickstart product development
Bootstrapping development with schema.org
David Gibson, Feb. 2018

The vocabulary allows linked data practitioners to construct knowledge graphs with relative ease
“…the knowledge graph is implemented as a
triple store where the data has been
represented using a small number of
vocabularies (mostly schema.org with some
terms borrowed from TAXREF-LD and the
TDWG LSID vocabularies).”
Rod Page, Ozymandias

Chinese search engine Baidu appears to have based its knowledge graph on schema.org
Via Google Translate

Electronic Arts used the vocabulary as the basis for their domain ontology

Boundaries of the vocabulary
As schema.org is adopted for use in increasingly diverse domains, there are more and
more demands to add to the vocabulary: does it risk becoming too much of “an ontology of
everything”, or is it actually not expressive enough?

Just how much can we say about each entity?
Let’s play 20 questions using schema.org vocabulary!
Is it an animal? It’s a Thing. More expressive exceptions: Person, Product
Is it a vegetable? It’s a Thing. More expressive exception: Product
Is it a mineral? It’s a Thing. More expressive exception: Product

But there’s always a tension between adding to schema.org and referencing existing vocabularies
The “add animals and plants” discussion has recently reignited

Recent developments and future directions
At the same time that the improved ability of machines to understand content makes
structured data use less of an imperative, schema.org is increasingly proving useful
as a mechanism for serialized linked data

If machines are eventually able to parse content like humans, will structured data still be necessary?
Will AI and related technologies render schema.org obsolete?

Leveraging schema.org allows Google to improve the discoverability of datasets
Bridging the semantic gap with Dataset Search
Year of Birth No. of cases
1976 1
1977 1
1980 1
1981 2
1982 7
1983 8
1984 7
1985 7
1986 11
…
Total 89
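A table like the one above says little to a crawler about what it contains. As a rough sketch of the kind of description that could accompany it, the Python below builds a schema.org `Dataset` object. The helper `dataset_jsonld` and every value passed to it (name, description, URL) are hypothetical placeholders rather than details from the slide; `Dataset`, `name`, `description`, `url` and `variableMeasured` are schema.org terms.

```python
import json


def dataset_jsonld(name, description, url, variables):
    """Describe a data table as a schema.org Dataset (hypothetical helper)."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # The columns of the table, stated as plain text.
        "variableMeasured": list(variables),
    }


# Placeholder values: the slide gives the column headings but no title or URL.
dataset = dataset_jsonld(
    name="Example table of cases by year of birth",
    description="Number of cases broken down by year of birth.",
    url="http://example.com/datasets/cases-by-year-of-birth",
    variables=["Year of Birth", "No. of cases"],
)
print(json.dumps(dataset, indent=2))
```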

JSON-LD data feeds enable publishers to support user-initiated video or audio playback
Bridging the action gap with Google Media Actions
{
  "@context": ["http://schema.org",
    {"@language": "en"}],
  "@type": "Movie",
  "@id": "http://example.com/M",
  "url": "http://example.com/M",
  "name": "M",
  "potentialAction": {
    "@type": "WatchAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "http://example.com/M?autoplay=true",
      "inLanguage": "en",
      "actionPlatform": [
        "http://schema.org/DesktopWebPlatform",
        "http://schema.org/MobileWebPlatform",
        "http://schema.org/AndroidPlatform",
        "http://schema.org/IOSPlatform",
        "http://schema.googleapis.com/GoogleVideoCast"
      ]
…

This Google tool supports direct entry of ClaimReview data, which then appears on dataCommons.org
Bridging the markup gap with the Fact Check Markup Tool
...
"@type" : "DataFeedItem",
"dateModified" : "2018-10-24T15:00:14.238315+00:00",
"item" :
[
  {
    "@context" : "schema.org",
    "@type" : "ClaimReview",
    "author" :
    {
      "@type" : "Organization",
      "name" : "Sens3",
      "url" : "http://fct.sens3.com/"
    },
    "claimReviewed" : "I play the trumpet!",
    "datePublished" : "2018-10-09",
    "itemReviewed" :
    {
      "@type" : "Claim",
      "author" :
      {
        "@type" : "Person",
        "name" : "Paul McCartney"
      }
    },
    "reviewRating" :
...
"@type": "Rating",
"ratingValue": "2",
"alternateName" : "Mostly False",
"bestRating": "5",
"worstRating": "1"
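To show how a consumer of this feed might read the rating, the sketch below places a `Rating` on a 0 to 1 scale using its `worstRating` and `bestRating` bounds. The function `normalized_rating` is a hypothetical helper and the normalization is an illustrative choice, not something the Fact Check Markup Tool or Google is known to do. The fallback bounds of 1 and 5 follow schema.org's stated defaults for these two properties.

```python
def normalized_rating(rating):
    """Map a schema.org Rating onto a 0..1 scale (hypothetical helper)."""
    value = float(rating["ratingValue"])
    # schema.org assumes 5 and 1 when the bounds are omitted.
    best = float(rating.get("bestRating", 5))
    worst = float(rating.get("worstRating", 1))
    if best == worst:
        raise ValueError("bestRating and worstRating must differ")
    return (value - worst) / (best - worst)


# The rating values shown on the slide.
RATING = {
    "@type": "Rating",
    "ratingValue": "2",
    "alternateName": "Mostly False",
    "bestRating": "5",
    "worstRating": "1",
}

score = normalized_rating(RATING)
print(f'{RATING["alternateName"]}: {score}')
```

For the values on the slide this works out to (2 - 1) / (5 - 1) = 0.25, one step above the bottom of the publisher's scale.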

schema.org has established common ground on shared terminology: is it time to address identifiers?
Questions of identity
“Very early in the formation of schema.org we made a strong decision, which was not
to support canonical IDs, and I think it was an important thing because it would have
been very politically contentious at the time to support it, because we basically would
have had to pick somebody's ID system to have canonical IDs.
“I think the time has come for canonical IDs, so I would love to see schema.org or
some other organization take on canonical IDs.”
Steve Macbeth, Microsoft, Apr. 2018

Let’s keep the conversation going
Thanks!
<script type="application/ld+json">
{
  "@type": "CommunicateAction",
  "agent": {
    "@type": "Person",
    "name": "Aaron"
  },
  "recipient": {
    "@type": "PeopleAudience",
    "name": "CDL2018 Attendees"
  },
  "object": "Stay in touch!"
}
</script>
Twitter
@aaranged
LinkedIn
linkedin.com/in/aaranged/
Semantic Search Marketing
bit.ly/semsearch
