SlideShare a Scribd company logo
1 of 52
Download to read offline
An Ambitious Wikidata Tutorial
Emw
WikiConference USA
Washington, D.C.
2015-10-10 (Updated 2015-10-13)
Wikidata is a free knowledge base that can be
read and edited by
humans and machines.
Wikidata's goals
● Centralize interwiki links
● Centralize infoboxes
● Provide an interface for rich queries
● Structure the sum of all human knowledge
What you'll learn from this talk
● How to edit Wikidata
● Projects built with Wikidata
● Wikidata API programming
● Ontologies
● Wikidata vocabulary
● Where to find things
● Querying with SPARQL, etc.
● RDF and OWL exports
Elements of a Wikidata statement
Example: Washington, D.C. (Q61)
Items and properties
●
Each item and property has its own page
● Items
– Represent subjects: Barbara McClintock, Challenger disaster
– Have identifiers like Q199654, Q921090
– 14,875,838 items as of 2015-10-05
● Properties
– Represent attribute names: occupation, cause of
– Have identifiers like P106, P828
– 1,805 properties as of 2015-10-05
Statements and claims
● Claims
– Claims are “triplets”
● Formally: subject, predicate, object
● In Wikidata: item, property, value
● Example: Barbara McClintock, occupation, scientist
● Statements
– A claim is only part of a statement
– Statements also include:
● References
● Ranks
Qualifiers, ranks, references
● Qualifiers
– Qualifiers are properties used on claims rather than items
– “Bethesda population 56,527 point in time (P585) 1960”
● Ranks
– Preferred, normal, deprecated
– Useful to mark outdated claims
● References
– Source of claim; provenance
– “... stated in (P248) 1960 United States Census”
More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
Wikipedia articles have a Wikidata item link in the left
navigation panel.
Wikidata link on Wikipedia
Getting to Wikidata from Wikipedia
Instant search suggests items that have labels or
aliases matching your keyword.
Wikidata search
Search by label
Search by alias: “flu” -> influenza
Finding properties
● Is there a property for “number of windows”?
● What was the ID of that property, again?
● Search
– In main site search box, prefix search term with “P:”
– “P:number of”, “P:occupation”
– Instant search doesn't work for properties, only items
● Browse
– https://www.wikidata.org/wiki/Wikidata:List_of_properties
^ bookmark this!
Let's edit Wikidata.
Barbara McClintock
https://www.wikidata.org/wiki/Q199654
●
employer (P108):
Cold Spring Harbor Laboratory
start time (P580): December 1941
● member of (P463):
National Academy of Sciences
start time (P580): 1944
American Academy of Arts and Sciences
start time: 1959
Royal Society
start time: 1989
● award received (P166):
Nobel Prize in Physiology or Medicine
for work (P1686): mobile genetic elements
National Medal of Science
point in time (P585): 1971
● birth name (P1477): Eleanor McClintock
Area? Height? GDP per capita?
● Quantities with units recently made possible!
● area (P2046)
● height (P2048)
● mass (P2067)
● cost (P2130)
● GDP per capita (P2132)
● total debt (P2133)
Quantities: Lots of low-hanging fruit
● Not yet on Wikidata:
– Area of Washington, D.C.
– Height of Abraham Lincoln
– Height of United States Capitol
– Length of Mississippi River
– GDP per capita of the United States of America
^ Add these!
Built on Wikidata
● Histropedia
300,000 timelines and 1.5 million events
http://histropedia.com/timeline/1fr22b0p8s/Empires
● Reasonator
Wikidata knowledge tailored for readers
J.S. Bach: http://tools.wmflabs.org/reasonator/?q=Q1339
● Gene Wiki
Every human gene is now on Wikidata
http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub-for-linked-open-life-
science-data/
(More info today in Open Biomedical Knowledge session at 2:15 PM)
Histropedia
http://histropedia.com/timeline/1fr22b0p8s/Empires
Tools
– Wikidata API
● https://www.wikidata.org/w/api.php
– Querying:
● Autolist:
http://tools.wmflabs.org/autolist/autolist1.html
● Wikidata Query Service (new!):
https://query.wikidata.org
– Software framework: Wikidata Toolkit
● https://www.mediawiki.org/wiki/Wikidata_Toolkit
● https://github.com/Wikidata/Wikidata-Toolkit
Wikidata API
Quick Python demo
Querying in Wikidata
List of politicians who died of cancer
Pseudo-query:
occupation: politician AND cause of death: cancer
occupation: P106
politician: Q82955
cause of death: P509
cancer: Q12078
Wikidata query in Autolist:
claim[106:82955] AND claim[509:12078]
http://tools.wmflabs.org/autolist/autolist1.html?q=claim[106:82955]%20AND%20claim[509:12078]
Only 56 politicians have died of cancer?
Seems rather low...
Classification on Wikidata
● Taxonomy of knowledge
● Enables powerful inference, novel applications
● Interesting philosophical, design, and engineering issues
Tree of Porphyry
User:VoiceOfTheCommons, CC-BY-SA 3.0
Classes and instances
● Plato is a human is a animal
● Plato instance of human subclass of animal
● Instance: concrete object, individual
● Class: abstract object
Classification on Wikidata
● instance of (P31)
– rdf:type in RDF and OWL
– Most popular Wikidata property
● subclass of (P279)
– “all instances of A are also instances of B”
– rdfs:subClassOf in RDF and OWL
Examples
● USS Nimitz instance of Nimitz-class aircraft carrier
Nimitz-class aircraft carrier subclass of aircraft carrier
● 2012 Cannes Film Festival instance of Cannes Film Festival
Cannes Film Festival subclass of film festival
● an individual charm quark instance of charm quark
charm quark subclass of quark
^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances.
(There are no items about individual quarks on Wikidata!)
https://www.wikidata.org/wiki/Help:Basic_membership_properties
Bad smells
Item has many instance of or subclass of claims
Items typically satisfy a huge number of instance of claims:
● Fido instance of dog
● Fido instance of English Pointer
● Fido instance of faithful animal
● …
Solution: use one class for instance of, put other class knowledge into
normal properties
● Fido instance of dog
● Fido breed: English Pointer
● Fido known for: faithfulness
● ...
Bad smells
subclass of claim that is nonsensical when interpreted as “All instances of A
are also instances of B”
Example:
dog subclass of pet
But not all dogs are pets!
feral dog subclass of dog true
feral dog subclass of pet false
:. dog subclass of pet false
Solution: put “pet” knowledge about dogs into claim that does not apply to all
instances of dog. E.g. “dog has role pet”. (Has role would not be transitive.)
Classification on Wikidata
● Last but not least: part of (P361)
– Third basic membership property
– Top-level “part-whole” relation
● subclass of and part of are both transitive; instance of is not transitive
● Transitive relation:
A subclass of B
B subclass of C
:. A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
subclass of (P279)
enables machines to infer
conceptual hierarchy
Recall
Query reports that only 56 politicians have died of cancer
Problem
Only matches the precise claim “cause of death: cancer”
Omits results that have:
– cause of death: lung cancer
– cause of death: lymphoma
– cause of death: leukemia
Solution
Include causes of death that are a subclass of cancer
– In Autolist: http://tinyurl.com/ovgjqd8
– Also possible in SPARQL in new Wikidata Query Service
SPARQL
● SPARQL: semantic query language for databases
● Wikidata recently added official support
– Query UI: https://query.wikidata.org
– Examples:
https://www.mediawiki.org/wiki/Wikibase/Indexing
/SPARQL_Query_Examples
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?politician ?cause ?politician_label ?cause_of_death_label WHERE {
?politician wdt:P106 wd:Q82955 . # find items that have "occupation (P106): politician (Q82955)"
?politician wdt:P509 ?cause . # with a P509 (cause of death) claim
?cause wdt:P279* wd:Q12078 . # ... where the cause is a subclass of (P279*) cancer (Q12078)
# ?politician wdt:P39 wd:Q11696 . # Uncomment this line to include only U.S. Presidents
OPTIONAL {?politician rdfs:label ?politician_label filter (lang(?politician_label) = "en") .}
OPTIONAL {?cause rdfs:label ?cause_of_death_label filter (lang(?cause_of_death_label) = "en").}
}
ORDER BY ASC (?politician)
Example SPARQL query
List of politicians who died of cancer:
Live demo: http://tinyurl.com/nh7jc2p
http://tinyurl.com/nh7jc2p
?cause wdt:P279* wd:Q12078 .
Be sure to add an asterisk (*) to get the subclass tree!
Autolist vs. Wikidata Query Service
Autolist Wikidata Query Service
URL https://tools.wmflabs.org/autolist/ https://query.wikidata.org
Syntax WDQ (custom, but succinct) SPARQL (W3C standard)
Support status Unofficial Official, beta
Release date 2013-09-17 2015-09-07
Developer Magnus Manske Stas Malyshev
Source code https://bitbucket.org/magnusmanske
/wikidataquery
https://github.com/wikimedia/wikidata-
query-rdf/
License GPL 2+ Apache 2.0
Technology C++ Java, Blazegraph
How to:
Explore RDF/OWL dumps locally
● Get the most recent dumps:
http://tools.wmflabs.org/wikidata-exports/rdf/
● Small, interesting: wikidata-taxonomy.nt.gz
● Download and install Protege:
http://protege.stanford.edu/
Protege
● http://protege.stanford.edu/
● The browser of the Semantic Web
● Good for small- to medium-sized ontologies
Querying cancer types in wikidata-taxonomy.nt.gz
Open questions: Modeling causes
What caused or causes:
– The Space Shuttle Challenger explosion?
– The dinosaurs to die?
– Malaria? Cancer?
– The American Civil War?
Causation on Wikidata
● https://www.wikidata.org/wiki/Help:Modeling_causes
● has cause (P828) (alias underlying cause): thing that ultimately resulted in
the effect
● has immediate cause (P1478): thing that proximately resulted in the effect
● has contributing factor (P1479): thing that significantly influenced the
effect, but did not directly cause it
American Civil War (Q8676)
has cause:
slavery in the United States (Q118382) (preferred rank)
states' rights (Q48527) (deprecated rank)
...
has immediate cause:
Battle of Fort Sumter (Q543165)
United States presidential election, 1860 (Q698842)
...
has contributing factor:
caning of Charles Sumner (Q5032419)
Dred Scott v. Sandford (Q690462)
Bleeding Kansas (Q331377)
Uncle Tom's Cabin (Q2222)
...
What caused the
American Civil War?
Thank you!
https://www.wikidata.org/wiki/User:Emw

More Related Content

What's hot

Real-Time Stock Market Analysis using Spark Streaming
 Real-Time Stock Market Analysis using Spark Streaming Real-Time Stock Market Analysis using Spark Streaming
Real-Time Stock Market Analysis using Spark StreamingSigmoid
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Kaxil Naik
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 
data platform on kubernetes
data platform on kubernetesdata platform on kubernetes
data platform on kubernetes창언 정
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itBruno Faria
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020andyseaborne
 
RDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningRDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningPetar Ristoski
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDataWorks Summit
 
Data training tips and tricks
Data training tips and tricksData training tips and tricks
Data training tips and tricksSarah Jones
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software OverviewKNIMESlides
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)Ryan Blue
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaMetrics
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialAdonisDamian
 

What's hot (20)

SHACL by example
SHACL by exampleSHACL by example
SHACL by example
 
Real-Time Stock Market Analysis using Spark Streaming
 Real-Time Stock Market Analysis using Spark Streaming Real-Time Stock Market Analysis using Spark Streaming
Real-Time Stock Market Analysis using Spark Streaming
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 
data platform on kubernetes
data platform on kubernetesdata platform on kubernetes
data platform on kubernetes
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
 
SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020SHACL in Apache jena - ApacheCon2020
SHACL in Apache jena - ApacheCon2020
 
RDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data MiningRDF2Vec: RDF Graph Embeddings for Data Mining
RDF2Vec: RDF Graph Embeddings for Data Mining
 
Druid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best PracticesDruid and Hive Together : Use Cases and Best Practices
Druid and Hive Together : Use Cases and Best Practices
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
Data training tips and tricks
Data training tips and tricksData training tips and tricks
Data training tips and tricks
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software Overview
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
VictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - PreviewVictoriaLogs: Open Source Log Management System - Preview
VictoriaLogs: Open Source Log Management System - Preview
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
Semantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorialSemantic web meetup – sparql tutorial
Semantic web meetup – sparql tutorial
 

Viewers also liked

Wikidata for libraries and archives
Wikidata for libraries and archivesWikidata for libraries and archives
Wikidata for libraries and archives_Emw
 
Up and running with Wikidata
Up and running with WikidataUp and running with Wikidata
Up and running with Wikidata_Emw
 
Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Denny Vrandecic
 
2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk CambridgeMagnus Manske
 
Linked Open Data Publications through Wikidata & Persistent Identification...
Linked Open Data  Publications through  Wikidata &  Persistent Identification...Linked Open Data  Publications through  Wikidata &  Persistent Identification...
Linked Open Data Publications through Wikidata & Persistent Identification...PACKED vzw
 
위키데이터 개론
위키데이터 개론위키데이터 개론
위키데이터 개론Jeongmin KIM
 
The Knowledge Discovery Quest
The Knowledge Discovery Quest The Knowledge Discovery Quest
The Knowledge Discovery Quest Ontotext
 
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Simon Knight
 
Wikipedia
WikipediaWikipedia
Wikipediasimondw
 
Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Aleksandra Janus
 
A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)Dominic McDevitt-Parks
 
wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3raypun101
 
Wikimedia for civil servants
Wikimedia for civil servantsWikimedia for civil servants
Wikimedia for civil servantsSara Thomas
 
Philippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersPhilippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersFeverBee Limited
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationQuick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationSara Snyder
 
Présentation des projets Wikimedia
Présentation des projets WikimediaPrésentation des projets Wikimedia
Présentation des projets WikimediaWikimédia France
 
20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in BulgariaVassia Atanassova
 

Viewers also liked (20)

Wikidata for libraries and archives
Wikidata for libraries and archivesWikidata for libraries and archives
Wikidata for libraries and archives
 
Up and running with Wikidata
Up and running with WikidataUp and running with Wikidata
Up and running with Wikidata
 
Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012Wikidata presentation at SemTechBiz Berlin 2012
Wikidata presentation at SemTechBiz Berlin 2012
 
2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge2014-02-27 Wikidata talk Cambridge
2014-02-27 Wikidata talk Cambridge
 
Linked Open Data Publications through Wikidata & Persistent Identification...
Linked Open Data  Publications through  Wikidata &  Persistent Identification...Linked Open Data  Publications through  Wikidata &  Persistent Identification...
Linked Open Data Publications through Wikidata & Persistent Identification...
 
위키데이터 개론
위키데이터 개론위키데이터 개론
위키데이터 개론
 
The Knowledge Discovery Quest
The Knowledge Discovery Quest The Knowledge Discovery Quest
The Knowledge Discovery Quest
 
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
Wikimedia Strategy - making it impactful, measuring impact, and thinking abou...
 
Wikipedia
WikipediaWikipedia
Wikipedia
 
Music scoring on Wikipedia
Music scoring on WikipediaMusic scoring on Wikipedia
Music scoring on Wikipedia
 
Wikidata
WikidataWikidata
Wikidata
 
Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015Open Monuments Project at GLAM Wiki Conference 2015
Open Monuments Project at GLAM Wiki Conference 2015
 
A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)A Primer on Wikipedia Editathons (MCN 2014)
A Primer on Wikipedia Editathons (MCN 2014)
 
wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3wikipedia event_HerstoryEditathon 3-2016_v1.3
wikipedia event_HerstoryEditathon 3-2016_v1.3
 
Wikimedia for civil servants
Wikimedia for civil servantsWikimedia for civil servants
Wikimedia for civil servants
 
Philippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of VolunteersPhilippe Beaudette - How To Motivate Legions of Volunteers
Philippe Beaudette - How To Motivate Legions of Volunteers
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins PresentationQuick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
Quick Start Guide to Editing Wikipedia - #WikiHumanOrigins Presentation
 
Présentation des projets Wikimedia
Présentation des projets WikimediaPrésentation des projets Wikimedia
Présentation des projets Wikimedia
 
20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria20160601 Wiki Loves Earth 2016 in Bulgaria
20160601 Wiki Loves Earth 2016 in Bulgaria
 

Similar to An Ambitious Wikidata Tutorial

A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsDr. Neil Brittliff
 
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Onando Mulang'
 
Wikipedia and Civic Engagement
Wikipedia and Civic EngagementWikipedia and Civic Engagement
Wikipedia and Civic EngagementAndrew Lih
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionJennifer D'Souza
 
Mon domingue key_introduction to semantic
Mon domingue key_introduction to semanticMon domingue key_introduction to semantic
Mon domingue key_introduction to semanticeswcsummerschool
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Repo fringe - Wikidata
Repo fringe  - WikidataRepo fringe  - Wikidata
Repo fringe - WikidataEwan McAndrew
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsKrishna Sankar
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library DataRichard Wallis
 
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLVALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLJane Frazier
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 

Similar to An Ambitious Wikidata Tutorial (11)

A Little SPARQL in your Analytics
A Little SPARQL in your AnalyticsA Little SPARQL in your Analytics
A Little SPARQL in your Analytics
 
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
Arjun@WISE-2020 : Encoding Knowledge Graph Context in an Attentive Neural Net...
 
Wikipedia and Civic Engagement
Wikipedia and Civic EngagementWikipedia and Civic Engagement
Wikipedia and Civic Engagement
 
Entity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph CompletionEntity Linking, Link Prediction, and Knowledge Graph Completion
Entity Linking, Link Prediction, and Knowledge Graph Completion
 
Mon domingue key_introduction to semantic
Mon domingue key_introduction to semanticMon domingue key_introduction to semantic
Mon domingue key_introduction to semantic
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Repo fringe - Wikidata
Repo fringe  - WikidataRepo fringe  - Wikidata
Repo fringe - Wikidata
 
R, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science CompetitionsR, Data Wrangling & Kaggle Data Science Competitions
R, Data Wrangling & Kaggle Data Science Competitions
 
Web Driven Revolution For Library Data
Web Driven Revolution For Library DataWeb Driven Revolution For Library Data
Web Driven Revolution For Library Data
 
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQLVALA Tech Camp 2017: Intro to Wikidata & SPARQL
VALA Tech Camp 2017: Intro to Wikidata & SPARQL
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 

Recently uploaded

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 

Recently uploaded (20)

Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 

An Ambitious Wikidata Tutorial

  • 1. An Ambitious Wikidata Tutorial Emw WikiConference USA Washington, D.C. 2015-10-10 (Updated 2015-10-13)
  • 2. Wikidata is a free knowledge base that can be read and edited by humans and machines.
  • 3. Wikidata's goals ● Centralize interwiki links ● Centralize infoboxes ● Provide an interface for rich queries ● Structure the sum of all human knowledge
  • 4. What you'll learn from this talk ● How to edit Wikidata ● Projects built with Wikidata ● Wikidata API programming ● Ontologies ● Wikidata vocabulary ● Where to find things ● Querying with SPARQL, etc. ● RDF and OWL exports
  • 5. Elements of a Wikidata statement
  • 7. Items and properties ● Each item and property has its own page ● Items – Represent subjects: Barbara McClintock, Challenger disaster – Have identifiers like Q199654, Q921090 – 14,875,838 items as of 2015-10-05 ● Properties – Represent attribute names: occupation, cause of – Have identifiers like P106, P828 – 1,805 properties as of 2015-10-05
  • 8. Statements and claims ● Claims – Claims are “triplets” ● Formally: subject, predicate, object ● In Wikidata: item, property, value ● Example: Barbara McClintock, occupation, scientist ● Statements – A claim is only part of a statement – Statements also include: ● References ● Ranks
  • 9. Qualifiers, ranks, references ● Qualifiers – Qualifiers are properties used on claims rather than items – “Bethesda population 56,527 point in time (P585) 1960” ● Ranks – Preferred, normal, deprecated – Useful to mark outdated claims ● References – Source of claim; provenance – “... stated in (P248) 1960 United States Census”
  • 10. More on Wikidata vocabulary https://www.wikidata.org/wiki/Wikidata:Glossary
  • 11. Wikipedia articles have a Wikidata item link in the left navigation panel. Wikidata link on Wikipedia
  • 12. Getting to Wikidata from Wikipedia
  • 13. Instant search suggests items that have labels or aliases matching your keyword. Wikidata search
  • 15. Search by alias: “flu” -> influenza
  • 16. Finding properties ● Is there a property for “number of windows”? ● What was the ID of that property, again? ● Search – In main site search box, prefix search term with “P:” – “P:number of”, “P:occupation” – Instant search doesn't work for properties, only items ● Browse – https://www.wikidata.org/wiki/Wikidata:List_of_properties ^ bookmark this!
  • 18. Barbara McClintock https://www.wikidata.org/wiki/Q199654 ● employer (P108): Cold Spring Harbor Laboratory start time (P580): December 1941 ● member of (P463): National Academy of Sciences start time (P580): 1944 American Academy of Arts and Sciences start time: 1959 Royal Society start time: 1989 ● award received (P166): Nobel Prize in Physiology or Medicine for work (P1686): mobile genetic elements National Medal of Science point in time (P585): 1971 ● birth name (P1477): Eleanor McClintock
  • 19. Area? Height? GDP per capita? ● Quantities with units recently made possible! ● area (P2046) ● height (P2048) ● mass (P2067) ● cost (P2130) ● GDP per capita (P2132) ● total debt (P2133)
  • 20. Quantities: Lots of low-hanging fruit ● Not yet on Wikidata: – Area of Washington, D.C. – Height of Abraham Lincoln – Height of United States Capitol – Length of Mississippi River – GDP per capita of the United States of America ^ Add these!
  • 21. Built on Wikidata ● Histropedia 300,000 timelines and 1.5 million events http://histropedia.com/timeline/1fr22b0p8s/Empires ● Reasonator Wikidata knowledge tailored for readers J.S. Bach: http://tools.wmflabs.org/reasonator/?q=Q1339 ● Gene Wiki Every human gene is now on Wikidata http://blog.wikimedia.de/2014/10/22/establishing-wikidata-as-the-central-hub-for-linked-open-life- science-data/ (More info today in Open Biomedical Knowledge session at 2:15 PM)
  • 23. Tools – Wikidata API ● https://www.wikidata.org/w/api.php – Querying: ● Autolist: http://tools.wmflabs.org/autolist/autolist1.html ● Wikidata Query Service (new!): https://query.wikidata.org – Software framework: Wikidata Toolkit ● https://www.mediawiki.org/wiki/Wikidata_Toolkit ● https://github.com/Wikidata/Wikidata-Toolkit
  • 25. Querying in Wikidata List of politicians who died of cancer Pseudo-query: occupation: politician AND cause of death: cancer occupation: P106 politician: Q82955 cause of death: P509 cancer: Q12078 Wikidata query in Autolist: claim[106:82955] AND claim[509:12078]
  • 27. Only 56 politicians have died of cancer? Seems rather low...
  • 28. Classification on Wikidata ● Taxonomy of knowledge ● Enables powerful inference, novel applications ● Interesting philosophical, design, and engineering issues
  • 29.
  • 31. Classes and instances ● Plato is a human is a animal ● Plato instance of human subclass of animal ● Instance: concrete object, individual ● Class: abstract object
  • 32. Classification on Wikidata ● instance of (P31) – rdf:type in RDF and OWL – Most popular Wikidata property ● subclass of (P279) – “all instances of A are also instances of B” – rdfs:subClassOf in RDF and OWL
  • 33. Examples ● USS Nimitz instance of Nimitz-class aircraft carrier Nimitz-class aircraft carrier subclass of aircraft carrier ● 2012 Cannes Film Festival instance of Cannes Film Festival Cannes Film Festival subclass of film festival ● an individual charm quark instance of charm quark charm quark subclass of quark ^ Many “leaf nodes” in Wikidata's taxonomic hierarchy are not instances. (There are no items about individual quarks on Wikidata!) https://www.wikidata.org/wiki/Help:Basic_membership_properties
  • 34. Bad smells Item has many instance of or subclass of claims Items typically satisfy a huge number of instance of claims: ● Fido instance of dog ● Fido instance of English Pointer ● Fido instance of faithful animal ● … Solution: use one class for instance of, put other class knowledge into normal properties ● Fido instance of dog ● Fido breed: English Pointer ● Fido known for: faithfulness ● ...
  • 35. Bad smells subclass of claim that is nonsensical when interpreted as “All instances of A are also instances of B” Example: dog subclass of pet But not all dogs are pets! feral dog subclass of dog true feral dog subclass of pet false :. dog subclass of pet false Solution: put “pet” knowledge about dogs into claim that does not apply to all instances of dog. E.g. “dog has role pet”. (Has role would not be transitive.)
  • 36. Classification on Wikidata ● Last but not least: part of (P361) – Third basic membership property – Top-level “part-whole” relation ● subclass of and part of are both transitive; instance of is not transitive ● Transitive relation: A subclass of B B subclass of C :. A subclass of C https://www.wikidata.org/wiki/Help:Basic_membership_properties
  • 37. subclass of (P279) enables machines to infer conceptual hierarchy
  • 38. Recall Query reports that only 56 politicians have died of cancer Problem Only matches the precise claim “cause of death: cancer” Omits results that have: – cause of death: lung cancer – cause of death: lymphoma – cause of death: leukemia Solution Include causes of death that are a subclass of cancer – In Autolist: http://tinyurl.com/ovgjqd8 – Also possible in SPARQL in new Wikidata Query Service
  • 39. SPARQL ● SPARQL: semantic query language for databases ● Wikidata recently added official support – Query UI: https://query.wikidata.org – Examples: https://www.mediawiki.org/wiki/Wikibase/Indexing /SPARQL_Query_Examples
  • 40. PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?politician ?cause ?politician_label ?cause_of_death_label WHERE { ?politician wdt:P106 wd:Q82955 . # find items that have "occupation (P106): politician (Q82955)" ?politician wdt:P509 ?cause . # with a P509 (cause of death) claim ?cause wdt:P279* wd:Q12078 . # ... where the cause is a subclass of (P279*) cancer (Q12078) # ?politician wdt:P39 wd:Q11696 . # Uncomment this line to include only U.S. Presidents OPTIONAL {?politician rdfs:label ?politician_label filter (lang(?politician_label) = "en") .} OPTIONAL {?cause rdfs:label ?cause_of_death_label filter (lang(?cause_of_death_label) = "en").} } ORDER BY ASC (?politician) Example SPARQL query List of politicians who died of cancer: Live demo: http://tinyurl.com/nh7jc2p
  • 42. ?cause wdt:P279* wd:Q12078 . Be sure to add an asterisk (*) to get the subclass tree!
  • 43. Autolist vs. Wikidata Query Service Autolist Wikidata Query Service URL https://tools.wmflabs.org/autolist/ https://query.wikidata.org Syntax WDQ (custom, but succinct) SPARQL (W3C standard) Support status Unofficial Official, beta Release date 2013-09-17 2015-09-07 Developer Magnus Manske Stas Malyshev Source code https://bitbucket.org/magnusmanske /wikidataquery https://github.com/wikimedia/wikidata- query-rdf/ License GPL 2+ Apache 2.0 Technology C++ Java, Blazegraph
  • 44. How to: Explore RDF/OWL dumps locally ● Get the most recent dumps: http://tools.wmflabs.org/wikidata-exports/rdf/ ● Small, interesting: wikidata-taxonomy.nt.gz ● Download and install Protege: http://protege.stanford.edu/
  • 45.
  • 46. Protege ● http://protege.stanford.edu/ ● The browser of the Semantic Web ● Good for small- to medium-sized ontologies
  • 47.
  • 48. Querying cancer types in wikidata-taxonomy.nt.gz
  • 49. Open questions: Modeling causes What caused or causes: – The Space Shuttle Challenger explosion? – The dinosaurs to die? – Malaria? Cancer? – The American Civil War?
  • 50. Causation on Wikidata ● https://www.wikidata.org/wiki/Help:Modeling_causes ● has cause (P828) (alias underlying cause): thing that ultimately resulted in the effect ● has immediate cause (P1478): thing that proximately resulted in the effect ● has contributing factor (P1479): thing that significantly influenced the effect, but did not directly cause it
  • 51. American Civil War (Q8676) has cause: slavery in the United States (Q118382) (preferred rank) states' rights (Q48527) (deprecated rank) ... has immediate cause: Battle of Fort Sumter (Q543165) United States presidential election, 1860 (Q698842) ... has contributing factor: caning of Charles Sumner (Q5032419) Dred Scott v. Sandford (Q690462) Bleeding Kansas (Q331377) Uncle Tom's Cabin (Q2222) ... What caused the American Civil War?