Web of Data - Introduction (english)
1. Web of data
Thomas Francart, sparna.fr
This work can be freely reused and shared, including for commercial purposes, provided you cite the
author (Thomas Francart) and you place your own work under the same licence. For more
information, see the licence.
Credits: This work remixes elements from Fabien Gandon, Serge Garlatti and Pierre-Yves Vandenbussche
The Man Who Mistook His Wife for a Hat :
And Other Clinical Tales by
In his most extraordinary book, "one of the great clinical writers of the 20th century" (The New
York Times) recounts the case histories of patients lost in the bizarre, apparently inescapable world
of neurological disorders. Oliver Sacks's The Man Who Mistook His Wife for a Hat tells the stories
of individuals afflicted with fantastic perceptual and intellectual aberrations: patients who have lost
their memories and with them the greater part of their pasts; who are no longer able to recognize
people and common objects; who are stricken with violent tics and grimaces or who shout
involuntary obscenities; whose limbs have become alien; who have been dismissed as retarded yet
are gifted with uncanny artistic or mathematical talents.
If inconceivably strange, these brilliant tales remain, in Dr. Sacks's splendid and sympathetic telling, deeply human. They
are studies of life struggling against incredible adversity, and they enable us to enter the world of the neurologically
impaired, to imagine with our hearts what it must be to live and feel as they do. A great healer, Sacks never loses sight of
medicine's ultimate responsibility: "the suffering, afflicted, fighting human subject."
Our rating :
Find other books in : Neurology Psychology
Search books by terms :
Oliver W. Sacks
Oliver Sacks
13. Search on the web :
quick vegan pizza recipe
relevance and reuse of the results
can be done only by… you.
What if I want to sort by cooking time ? By calories ?
What if I need to create an Excel spreadsheet of the recipes ?
15. More formal description
Tino’s pizza is a pizza recipe
Tino’s pizza has ingredient tomato
Tino’s pizza has ingredient mozzarella
Tino’s pizza has ingredient mushrooms
Tino’s pizza is in category easy
Tino’s pizza is prepared in 20 min
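As a minimal sketch, the statements above can be written as (subject, verb, complement) triples that a program can filter or sort. The plain-string identifiers and the helper name `values` are illustrative assumptions, not part of the original slides; the cooking time is stored as a number of minutes so it can be compared.

```python
# Each statement from the slide as a (subject, verb, complement) triple.
# Identifiers are plain strings here, chosen for illustration only.
triples = [
    ("Tino's pizza", "is a", "pizza recipe"),
    ("Tino's pizza", "has ingredient", "tomato"),
    ("Tino's pizza", "has ingredient", "mozzarella"),
    ("Tino's pizza", "has ingredient", "mushrooms"),
    ("Tino's pizza", "is in category", "easy"),
    ("Tino's pizza", "is prepared in (minutes)", 20),
]


def values(subject, verb):
    """Return every complement stated for the given subject and verb."""
    return [c for s, v, c in triples if s == subject and v == verb]


ingredients = values("Tino's pizza", "has ingredient")
cooking_time = values("Tino's pizza", "is prepared in (minutes)")[0]
print(ingredients)
print(cooking_time)
```

With descriptions in this form, sorting recipes by cooking time or exporting them to a spreadsheet becomes a matter of reading the right verb.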
16. Yes but…
how can we be
unambiguous
in these descriptions ?
« has ingredient », « contains », « a pour ingrédient »… ?
17. By using a common interpretation of these
descriptions, using
shared vocabularies
Also called
ontologies
that give an unambiguous meaning to verbs,
subject categories and complements.
18. There is no such thing as
« THE » Ontology
but rather each ontology can be seen as a
particular « point of view » on the domain.
And ontologies can be aligned, shared and
connected to make these « points of view »
interoperable.
26. • Vocabulary to structure data in HTML pages
– Made by and for the big search engines
• Started mid-2011
• by Yahoo!, Bing and Google.
• + Yandex (Russian)
• Working group led by Dan Brickley
• Relies on HTML5 (Microdata and RDFa)
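To illustrate what structuring data in an HTML page can look like, here is a rough sketch that reads Microdata `itemprop` attributes with Python's standard `html.parser`. The page fragment is hypothetical, and the collector is a deliberate simplification (it ignores nesting and `itemscope` boundaries), not a complete Microdata parser.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a recipe marked up with Microdata attributes
# (itemscope, itemtype, itemprop) using the schema.org Recipe type.
PAGE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Tino's pizza</h1>
  <span itemprop="recipeIngredient">tomato</span>
  <span itemprop="recipeIngredient">mozzarella</span>
  <span itemprop="recipeCategory">easy</span>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collect (property, text) pairs for elements carrying an itemprop."""

    def __init__(self):
        super().__init__()
        self.current = None  # itemprop of the element being read, if any
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        # Only keep non-blank text that directly follows an itemprop tag.
        if self.current and data.strip():
            self.pairs.append((self.current, data.strip()))
            self.current = None


collector = ItempropCollector()
collector.feed(PAGE)
pairs = collector.pairs
print(pairs)
```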
31. RDFa Lite vs. Microdata
Which one should I choose?
• Same number of attributes
• Same complexity
• 99% the same expressivity
• Same support in schema.org
32. RDFa Lite vs. Microdata
Which one should I choose?
• RDFa : compatible with RDF world (URIs, triples, parsers)
• RDFa : more stable, more widely deployed
• RDFa core : more possibilities
• Facebook does not support Microdata
• 99% of microdata markup encodes schema.org
33. By what means
do ontologies unambiguously identify
subjects, verbs
and complements ?
35. URL
Identifies
what exists
on the web
http://mon.site.fr
URI
Identifies,
on the web,
what exists
http://animaux.fr/mon-zebre
Fabien Gandon : http://fr.slideshare.net/fabien_gandon
36. URL : phone number
URI : social security number
Good practice : on the web of data,
every URI is also a URL
40. To
share data with partners,
applications, services…
41. What is the simplest mode of
communication ?
« peer to peer » « hub and spoke »
42. Publishing data ? Is it Open Data
then ?
http://5stardata.info
Open data, data in the web, linked data
Louvre is in Paris
http://fr.dbpedia.org/resource/Paris
Paris = Paris
43. Open Data and web of data
★ Data accessible on the web
(in any format, even PDF, or JPG)
★★ Structured data
(Excel file instead of JPG)
★★★ Non proprietary format
(CSV instead of Excel)
★★★★ Use URIs to identify resources inside
the data
★★★★★ Link data to other data sources
http://5stardata.info/
Open Data
Linked data –
web of data
49. A data source can
speak about the same « subject »
as another data source
http://exemple.com/Elvis
plays guitar
http://exemple.com/Elvis
lives in Las Vegas
50. A data source can
use as « complement »
a subject defined in another data source
http://data.insee.fr/Paris
is in France
Elvis is in concert in
http://data.insee.fr/Paris
51. A data source can
use a « verb »
defined in another data source
http://exemple.fr/meet
is a
property (linking 2 people)
Thomas
http://exemple.fr/meet
Oliver
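The linking patterns above can be sketched with plain triples: because sources reuse the same URIs, their statements can be combined without any extra matching step. The source names, the verb labels and the helper `describe` are hypothetical; the URIs are the ones shown on the slides.

```python
ELVIS = "http://exemple.com/Elvis"
PARIS = "http://data.insee.fr/Paris"

# Three hypothetical data sources, each publishing its own triples.
source_a = [(ELVIS, "plays", "guitar")]
source_b = [(ELVIS, "lives in", "Las Vegas"),
            (ELVIS, "is in concert in", PARIS)]
source_c = [(PARIS, "is in", "France")]


def describe(uri, *sources):
    """Gather every (verb, complement) stated about `uri` across sources."""
    return [(v, c) for source in sources for s, v, c in source if s == uri]


# Same subject in two sources: the statements simply add up.
elvis_facts = describe(ELVIS, source_a, source_b)

# A complement that is itself a URI can be followed into another source.
concert_place = dict(elvis_facts)["is in concert in"]
place_facts = describe(concert_place, source_c)
print(elvis_facts)
print(place_facts)
```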
52. From a web of
documents
identified by URLs and interlinked
by hypertext links…
53. … to a web of data
identified by URIs and interlinked
using triples
« subject verb complement »
56. wikipedia
dbpedia
Extraction software
Cultural GPS
Collections
access
teaching
accessibility
international
applications
Julien Cojan and Fabien Gandon : http://fr.slideshare.net/JulienCojan/dbpedia-cafein
58. Find a resource in DBPedia
1. Look up something in DBPedia
– « Jack Sparrow »
2. Note the URL of the Wikipedia page
– http://en.wikipedia.org/wiki/Jack_Sparrow
3. Replace the beginning of the URL with
« http://dbpedia.org/resource/ »
– http://dbpedia.org/resource/Jack_Sparrow
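The prefix replacement described above can be sketched as a small function. The function name is an assumption made for illustration, and the sketch only handles English Wikipedia URLs of the exact form shown on the slide.

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Swap the Wikipedia URL prefix for the DBpedia resource prefix."""
    if not url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia page URL: " + url)
    return DBPEDIA_PREFIX + url[len(WIKIPEDIA_PREFIX):]


dbpedia_uri = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Jack_Sparrow")
print(dbpedia_uri)
```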
60. Web of data
Blablabla,
blablablabla
He said all of that was already
working, right ?
Image background taken from the « blog des bits »: http://nurdcartoon.blogspot.com/
61. Find the common point between
- Pierre Curie: French physicist
- Boutros Boutros-Ghali: Egyptian diplomat
- Jackie Kennedy : JFK’s wife
64. for your data
1. Persistent Identifiers
2. Persistent access to data file
3. Data archival
4. Metadata publishing
– URIs and content negotiation
– OAI-PMH
– SPARQL endpoint
5. In the future… linking (to DBPedia) ?
67. 2. Access
• Data (embeddable in another website)
– http://www.nakala.fr/data/11280/1b2c0d4f
• Metadata
– Human or machine version
• http://www.nakala.fr/metadata/11280/1b2c0d4f
– Human version
• http://www.nakala.fr/page/data/11280/1b2c0d4f
– Machine version
• http://www.nakala.fr/data/data/11280/1b2c0d4f
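One common way to offer a human and a machine version behind a single URI is HTTP content negotiation, where the client states the format it wants in the `Accept` header. The sketch below only prepares the requests and sends nothing; whether this server honours these particular media types is an assumption, and `build_request` is a hypothetical helper.

```python
from urllib.request import Request

METADATA_URI = "http://www.nakala.fr/metadata/11280/1b2c0d4f"


def build_request(uri, machine):
    """Prepare (without sending) a request for the human or machine version."""
    # Assumed media types: HTML for people, RDF/XML for programs.
    accept = "application/rdf+xml" if machine else "text/html"
    return Request(uri, headers={"Accept": accept})


human_request = build_request(METADATA_URI, machine=False)
machine_request = build_request(METADATA_URI, machine=True)
print(human_request.get_header("Accept"))
print(machine_request.get_header("Accept"))
```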
68. 3. Harvest or query
• OAI-PMH publishing (your data only)
– https://www.nakala.fr/oai/11280/93ec8e76?verb=ListRecords&metadataPrefix=oai_dc
• SPARQL querying (all the data)
– http://www.nakala.fr/sparql
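Both access modes boil down to building a URL with the right parameters. The sketch below assembles the OAI-PMH harvesting URL shown above and a SPARQL query URL using the standard `query` parameter; it performs no network call, and the sample query is an illustrative assumption, not one taken from the slides.

```python
from urllib.parse import urlencode, urlparse, parse_qs

OAI_BASE = "https://www.nakala.fr/oai/11280/93ec8e76"
SPARQL_ENDPOINT = "http://www.nakala.fr/sparql"

# Harvesting: ask the OAI-PMH repository for all records in Dublin Core.
oai_url = OAI_BASE + "?" + urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
)

# Querying: send a SPARQL query as the `query` parameter of a GET request.
# The query itself is only an example: ten arbitrary triples.
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"
sparql_url = SPARQL_ENDPOINT + "?" + urlencode({"query": QUERY})

print(oai_url)
print(sparql_url)
```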
69. Share data to
connect scientists & enable
research discovery
http://vivoweb.org
70. What is VIVO ?
• A web portal that can be deployed in research
institutions…
• … and can be fed with data about
– Researchers
– Labs
– Publications
– Events
– And more…
• … and lets users search, navigate and edit that data…
• … and publishes the data back for others to reuse.
71. What is VIVO ?
• Example installations
– Meta-VIVO : http://vivo.vivoweb.org
– U. Florida : https://vivo.ufl.edu/
– Bournemouth : http://staffprofiles.bournemouth.ac.uk/
• (find others at vivoweb.org)
73. vivosearch.org
• Search on data across multiple institutions
• Possible only because the data is shared !
74. Interinstitutional collaboration dataviz
• http://xcite.hackerceo.org/VIVOviz/visualization.html
• Possible only because the data is shared…
• … and the data is talking about the same “thing” (here, the same publication)
75. Using data from the web to
enrich content reading
http://labs.sparna.fr
http://dev.presek-i.com/onmt_demo/
77. Create mashups
With data from the web
http://labs.antidot.net/museesdefrance
79. Use data from the web to
power an API
http://seevl.net
80. “The data seevl utilizes come from YouTube, Musicbrainz, Freebase, DBPedia, Google Plus,
and Facebook, and other sources”.
82. Digitized collections (2.5 M), BnF Archives & Manuscrits, Catalogue général (12 M)
Web pages for humans
Structured data for machines
http://www.rencontres-numeriques.org/2013/mediation/docs/rn2013-BNF-opendata.pptm
83. data.bnf.fr (October 2013) :
200 000 authors, 170 000 themes,
92 000 works
Objective : all the BnF catalogues by the end of 2015 ?
data.bnf.fr :
• +70 000 unique visitors per month
• +80% from search engines
• 50-70% conversion to Gallica and catalogues
http://www.rencontres-numeriques.org/2013/mediation/docs/rn2013-BNF-opendata.pptm