Using Wikidata as an open, community-maintained database of biomedical knowledge
Andrew Su, Ph.D.
@andrewsu
http://sulab.org
July 23, 2017
#BOSC2017
Slides: slideshare.net/andrewsu
(open source tools for open data)
The Gene Wiki project, circa 2008
2
Huss, PLoS Biol, 2008
Data imported from structured databases
Summarized knowledge via crowdsourcing
3
Wikidata is to data as Wikipedia is to text
“Provide a database of the world’s [biomedical] knowledge that anyone can edit”
- Denny Vrandečić
Q13561329 (http://www.wikidata.org/wiki/Q13561329)
Subclass of (Property:P279): Protein (Q8054)
Regulates (Property:P128): Neural development (Q1345738)
Physically interacts with (Property:P129): VLDL receptor (Q1979313), Amyloid beta A4 (Q423510)
Decreased expression in (Property:P1910): Schizophrenia (Q41112), Bipolar disorder (Q131755)
Q13561329 Property:P279 Q8054
Q13561329 Property:P128 Q1345738
Q13561329 Property:P129 Q1979313
Q13561329 Property:P129 Q423510
Q13561329 Property:P1910 Q41112
Q13561329 Property:P1910 Q131755
https://www.wikidata.org/wiki/Special:EntityData/Q13561329.json
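The JSON behind the URL above nests each statement under its property ID. As a minimal sketch, assuming the standard Wikibase entity JSON layout (`entities`, then the item, then `claims`, then `mainsnak`), the item-valued statements can be flattened into identifier-only triples. The `sample` dictionary is a hand-trimmed illustration rather than the real response, and `item_statements` and `_item_snak` are hypothetical helper names.

```python
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]


def item_statements(entity_json: Dict, qid: str) -> Iterator[Triple]:
    """Yield (subject, property, object) for each item-valued statement of `qid`."""
    claims = entity_json["entities"][qid].get("claims", {})
    for pid, statements in claims.items():
        for statement in statements:
            snak = statement["mainsnak"]
            # "novalue" and "somevalue" snaks carry no datavalue, so skip them.
            if snak.get("snaktype") != "value":
                continue
            datavalue = snak["datavalue"]
            # Keep only statements whose value is another entity (a Q-number).
            if datavalue.get("type") == "wikibase-entityid":
                yield (qid, pid, datavalue["value"]["id"])


def _item_snak(pid: str, target: str) -> Dict:
    """Build a trimmed statement pointing at `target` (illustration only)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": pid,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "id": target},
            },
        }
    }


# Hand-built stand-in for a small part of Special:EntityData/Q13561329.json.
sample = {
    "entities": {
        "Q13561329": {
            "claims": {
                "P279": [_item_snak("P279", "Q8054")],
                "P1910": [
                    _item_snak("P1910", "Q41112"),
                    _item_snak("P1910", "Q131755"),
                ],
            }
        }
    }
}

triples = sorted(item_statements(sample, "Q13561329"))
for triple in triples:
    print(*triple)
```

To run the same function against live data, the dictionary would come from fetching the URL above and decoding the response as JSON.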
7
Qualifiers
References
8
EMA
GWAS Central
PubChem
Simple data retrieval
9
39 genes
gene geneLabel gene geneLabel gene geneLabel gene geneLabel
Q5013317 COL22A1 Q18027370 IGSF3 Q18053559 CDHR3 Q14903974 SMAD3
Q14912759 SLC22A5 Q18045382 HPSE2 Q18045669 ATG3 Q18033889 IL1RL1
Q14914243 PSAP Q18048437 IL33 Q18035037 RAD50 Q17917202 ERBB4
Q14907990 SLC30A8 Q18051900 PYHIN1 Q18036984 FBXL7 Q18027836 IL6R
Q18025002 GAB1 Q17709208 ACO1 Q18033919 XPR1 Q18030185 NOTCH4
Q18035589 C6orf10 Q18027822 IL2RB Q15326496 RORA Q18030409 PDE4D
Q18054256 GSDMA Q18030364 PBX2 Q18042132 GSDMB Q18045645 IKZF4
Q18058487 C5orf56 Q18037773 ABI3BP Q18029145 MKLN1 Q18039979 KLHL5
Q18030785 PRKG1 Q18039623 CTNNA3 Q18036729 RAP1GAP2 Q18026947 HLA-DQA1
Q18033424 IL18R1 Q18046350 ZNF665 Q14878303 IL13
“Retrieve genes with GWAS association with asthma”
http://bit.ly/bosc2017_wikidata
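A request like the one quoted above can be phrased as a SPARQL query against the public Wikidata Query Service. The sketch below is not the query behind the short link. It assumes that GWAS associations are stored with the "genetic association" property (taken here to be P2293) and that asthma is item Q35869. Both identifiers are assumptions to verify on wikidata.org, and `gwas_gene_query` and `run_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers; verify on wikidata.org before relying on them.
GENETIC_ASSOCIATION = "P2293"  # assumed: "genetic association" property
ASTHMA = "Q35869"              # assumed: item for asthma


def gwas_gene_query(disease_qid: str, assoc_pid: str = GENETIC_ASSOCIATION) -> str:
    """Build a query for genes directly associated with one disease item."""
    return f"""
SELECT ?gene ?geneLabel WHERE {{
  ?gene wdt:{assoc_pid} wd:{disease_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}""".strip()


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "bosc2017-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


query = gwas_gene_query(ASTHMA)
print(query)
```

Calling `run_query(query)` needs network access, and the number of genes returned today may differ from the count on the slide because Wikidata is continuously edited.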
Data integration
10
“Retrieve genes with GWAS association with asthma and gene product is localized to membrane”
gene geneLabel gene geneLabel gene geneLabel gene geneLabel
Q14912759 SLC22A5 Q18027370 IGSF3 Q18035037 RAD50 Q18027836 IL6R
Q14914243 PSAP Q18033424 IL18R1 Q18033919 XPR1 Q18030409 PDE4D
Q14907990 SLC30A8 Q18045382 HPSE2 Q18042132 GSDMB Q18030185 NOTCH4
Q18035589 C6orf10 Q18027822 IL2RB Q18036729 RAP1GAP2 Q18026947 HLA-DQA1
Q18054256 GSDMA Q18053559 CDHR3 Q18033889 IL1RL1
Q18030785 PRKG1 Q14903974 SMAD3 Q17917202 ERBB4
22 genes
http://bit.ly/bosc2017_wikidata
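The integration step joins data that came from different sources: GWAS associations on the gene, and Gene Ontology cellular component annotations on the gene product. A minimal sketch of that join follows. It assumes the properties "genetic association" (P2293), "encodes" (P688) and "cell component" (P681), and takes the item for the membrane term as a parameter. The value Q14349455 used in the example call is an assumption to verify, and `membrane_gene_query` is a hypothetical name.

```python
def membrane_gene_query(disease_qid: str, component_qid: str) -> str:
    """Build a query joining gene-disease links to product localization."""
    return f"""
SELECT DISTINCT ?gene ?geneLabel WHERE {{
  ?gene wdt:P2293 wd:{disease_qid} ;         # assumed: genetic association
        wdt:P688 ?protein .                  # assumed: encodes
  ?protein wdt:P681 ?component .             # assumed: cell component
  ?component wdt:P279* wd:{component_qid} .  # the term itself or any subclass
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}""".strip()


# Assumed items: Q35869 for asthma, Q14349455 for the membrane term.
integration_query = membrane_gene_query("Q35869", "Q14349455")
print(integration_query)
```

The `wdt:P279*` path follows zero or more "subclass of" links, so a product annotated to a more specific membrane term still matches.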
Computing on provenance
11
“Retrieve genes with GWAS association with asthma and gene product is localized to membrane (non-IEA)”
gene geneLabel gene geneLabel gene geneLabel
Q14912759 SLC22A5 Q18045382 HPSE2 Q17917202 ERBB4
Q14914243 PSAP Q18027822 IL2RB Q18027836 IL6R
Q14907990 SLC30A8 Q14903974 SMAD3 Q18030409 PDE4D
Q18027370 IGSF3 Q18035037 RAD50 Q18030185 NOTCH4
Q18033424 IL18R1 Q18036729 RAP1GAP2 Q18026947 HLA-DQA1
15 genes
http://bit.ly/bosc2017_wikidata
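IEA is the Gene Ontology evidence code "inferred from electronic annotation", so the non-IEA restriction is a filter on how each localization statement was determined. One way to express it, sketched below, is to walk through the statement node instead of the direct `wdt:` shortcut and test a qualifier. The sketch assumes the evidence code is stored as a "determination method" qualifier (P459) and that the IEA item is Q23190881. Both are assumptions to verify, and `non_iea_pattern` is a hypothetical name.

```python
def non_iea_pattern(iea_qid: str) -> str:
    """Return a graph pattern keeping localization statements not marked IEA."""
    return f"""
  ?protein p:P681 ?statement .            # statement node, not just the value
  ?statement ps:P681 ?component ;         # the annotated cell component
             pq:P459 ?method .            # assumed: determination method qualifier
  FILTER(?method != wd:{iea_qid})         # drop electronically inferred annotations
""".strip("\n")


# Assumed item for the IEA evidence code.
provenance_pattern = non_iea_pattern("Q23190881")
print(provenance_pattern)
```

Because the pattern requires a `pq:P459` qualifier, statements with no recorded determination method are excluded as well. A gene product that has both an IEA and a non-IEA statement is retained.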
Leveraging the Disease Ontology structure
12
“Retrieve genes with GWAS association with any respiratory disease and gene product is localized to membrane (non-IEA)”
31 genes / 8 diseases
diseaseGALabel | gene_counts | geneList
asthma | 15 | SMAD3, RAP1GAP2, IL18R1, HPSE2, SLC30A8, SLC22A5, PSAP, ERBB4, HLA-DQA1, IGSF3, IL2RB, IL6R, NOTCH4, PDE4D, RAD50
chronic obstructive pulmonary disease | 5 | HLA-C, SFTPD, ANXA5, ANXA11, ATP2C2
lung cancer | 3 | TGM5, VTI1A, PHACTR2
interstitial lung disease | 2 | DSP, ATP11A
non-small-cell lung carcinoma | 2 | NALCN, DLST
nasopharynx carcinoma | 2 | ITGA9, TNFRSF19
adenocarcinoma of the lung | 1 | BTNL2
pulmonary emphysema | 1 | BICD1
http://bit.ly/bosc2017_wikidata
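Widening the query from one disease to a whole branch of the Disease Ontology amounts to replacing the fixed disease item with any item reachable through "subclass of" (P279) links, then grouping the results per disease. The sketch below shows only that step and leaves out the membrane and evidence restrictions. The "genetic association" property (P2293) and the root item used in the example call (Q3286546, taken to be the respiratory disease class) are assumptions to verify, and `ontology_query` is a hypothetical name.

```python
def ontology_query(root_qid: str) -> str:
    """Build a per-disease gene summary for every subclass of `root_qid`."""
    return f"""
SELECT ?diseaseGA ?diseaseGALabel
       (COUNT(DISTINCT ?gene) AS ?gene_counts)
       (GROUP_CONCAT(DISTINCT ?symbol; separator=", ") AS ?geneList)
WHERE {{
  ?diseaseGA wdt:P279* wd:{root_qid} ;   # the root class or any descendant
             rdfs:label ?diseaseGALabel .
  ?gene wdt:P2293 ?diseaseGA ;           # assumed: genetic association
        rdfs:label ?symbol .
  FILTER(LANG(?diseaseGALabel) = "en" && LANG(?symbol) = "en")
}}
GROUP BY ?diseaseGA ?diseaseGALabel
ORDER BY DESC(?gene_counts)""".strip()


# Assumed root item for the respiratory disease branch.
disease_branch_query = ontology_query("Q3286546")
print(disease_branch_query)
```

The output variable names mirror the column headers on the slide. Labels are read through `rdfs:label` here because that combines predictably with `GROUP BY`.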
Opportunistic integration
13
diseaseGALabel | exposureLabel
lung cancer | arsenic pentoxide exposure
lung cancer | HN1 exposure
lung cancer | mechlorethamine exposure
lung cancer | HN3 exposure
asthma | Phenacyl chloride exposure
pulmonary emphysema | phosgene exposure
“Retrieve genes with GWAS association with any respiratory disease and gene product is localized to membrane (non-IEA) and show causative chemical hazards”
4 diseases / 6 chemical hazards
http://bit.ly/bosc2017_wikidata
Small data to big data
14
?
Chlambase.org for the Chlamydia research community
15
Community-specific knowledge
Genetic mutants, gene expression, host-pathogen interactions, orthologs, …
Domain-specific applications based on Wikidata
16
Chlambase
Open source
17
github.com/SuLab/GeneWikiCentral
github.com/SuLab/wikidataintegrator – Python module for Wikidata
github.com/SuLab/scheduled-bots – bot automation framework
github.com/SuLab/WikiGenomes.org
github.com/SuLab/ChlamBase.org
github.com/SuLab/Genewiki-ShEx – data models
github.com/SuLab/wdbiothings – wrapper for BioThings APIs
Expert interfaces
License
18
Crowd volunteers and partners
Andra Waagmeester (Micelio)
Lynn Schriml, Elvira Mitraka (U. Maryland, Baltimore)
Paul Pavlidis (UBC)
Kevin Hybiske (U. Washington)
Ben Good, Greg Stupp, Sebastian Burgstaller, Tim Putman, Ginger Tsueng, Nuria Queralt Rosinach
bit.ly/genewikidata
sulab.org
Join us!
