The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
Andrew Su, Ph.D.
@andrewsu
[[User:Andrew Su]]
http://sulab.org
August 23, 2017
WMF Research
Slides: slideshare.net/andrewsu
Crowd volunteers and partners
Andra Waagmeester (andrawaag)
Lynn Schriml
Elvira Mitraka (emitraka)
U. Maryland, Baltimore
Micelio
UBC
Paul Pavlidis
Ben Good (i9606)
Greg Stupp (gstupp)
Sebastian Burgstaller (sebotic)
Tim Putman (putmantime)
Ginger Tsueng
Nuria Queralt Rosinach
bit.ly/genewikidata
sulab.org
We are recruiting!
U. Washington
Kevin Hybiske
The biomedical literature is massive…
[Figure: number of new PubMed-indexed articles per year, 1985–2015; vertical axis from 0 to 1,400,000]
Filtering, extracting, and summarizing PubMed
[Diagram labels: Documents; Concepts; Review article about fibronectin]
The Gene Wiki project, circa 2008
Huss, PLoS Biol, 2008
Data imported from structured biomedical databases
Summarized knowledge via crowdsourcing
Biomedical databases
Applications
1. Text mining biological annotations
https://www.ncbi.nlm.nih.gov/pubmed/22165947
2. Editor engagement via peer-review dual publication model
https://www.ncbi.nlm.nih.gov/pubmed/24012870
3. Embedding structured data using Wikipedia templates
https://www.ncbi.nlm.nih.gov/pubmed/22434829
The expression of the protein has been found to be significantly lower in [[schizophrenia]] and psychotic...
The expression of the protein has been found to be significantly lower in {{SWL|type=decreased expression|target=schizophrenia}} and psychotic...
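The template example above can be read as a typed link embedded in prose. The following is a minimal sketch, assuming the template always has exactly the two fields shown on the slide (`type` and `target`) in that order. The helper names `extract_swl` and `render_plain` are hypothetical and are not part of any Gene Wiki tooling.

```python
import re

# Matches templates of the form {{SWL|type=...|target=...}} as shown on the slide.
# Assumption: exactly these two fields, in this order.
SWL_PATTERN = re.compile(
    r"\{\{SWL\|type=(?P<type>[^|}]+)\|target=(?P<target>[^|}]+)\}\}"
)

def extract_swl(wikitext):
    """Return a (type, target) pair for every SWL template in the wikitext."""
    return [
        (m.group("type").strip(), m.group("target").strip())
        for m in SWL_PATTERN.finditer(wikitext)
    ]

def render_plain(wikitext):
    """Replace each SWL template with an ordinary [[wikilink]] to its target."""
    return SWL_PATTERN.sub(
        lambda m: "[[" + m.group("target").strip() + "]]", wikitext
    )

text = (
    "The expression of the protein has been found to be significantly lower in "
    "{{SWL|type=decreased expression|target=schizophrenia}} and psychotic..."
)
print(extract_swl(text))
print(render_plain(text))
```

The extracted pair carries the relationship type that the plain wikilink lacks, which is the structured data the template is meant to embed.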
Wikidata is to data as Wikipedia is to text
"Provide a database of the world’s [biomedical] knowledge that anyone can edit" - Denny Vrandečić
Statements on item Q13561329 (http://www.wikidata.org/wiki/Q13561329):
Subclass of (Property:P279): Protein (Q8054)
Regulates (Property:P128): Neural development (Q1345738)
Physically interacts with (Property:P129): VLDL receptor (Q1979313); Amyloid beta A4 (Q423510)
Decreased expression in (Property:P1910): Schizophrenia (Q41112); Bipolar disorder (Q131755)
The same statements as identifiers only (item, property, value):
Q13561329 P279 Q8054
Q13561329 P128 Q1345738
Q13561329 P129 Q1979313
Q13561329 P129 Q423510
Q13561329 P1910 Q41112
Q13561329 P1910 Q131755
https://www.wikidata.org/wiki/Special:EntityData/Q13561329.json
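To make the identifier-only view concrete, here is a small sketch that stores the statements from the slide as (item, property, value) triples and builds the machine-readable URL in the form shown above. The function names `entity_data_url` and `objects_of` are illustrative assumptions. The pairing of properties with values follows the order of the labels on the slide.

```python
import re

ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"

def entity_data_url(qid, fmt="json"):
    """Build the Special:EntityData URL for an item identifier such as Q13561329."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError("not a valid item identifier: " + repr(qid))
    return ENTITY_DATA_BASE + qid + "." + fmt

# Statements from the slide, written as (item, property, value) triples.
STATEMENTS = [
    ("Q13561329", "P279", "Q8054"),      # subclass of: protein
    ("Q13561329", "P128", "Q1345738"),   # regulates: neural development
    ("Q13561329", "P129", "Q1979313"),   # physically interacts with: VLDL receptor
    ("Q13561329", "P129", "Q423510"),    # physically interacts with: amyloid beta A4
    ("Q13561329", "P1910", "Q41112"),    # decreased expression in: schizophrenia
    ("Q13561329", "P1910", "Q131755"),   # decreased expression in: bipolar disorder
]

def objects_of(statements, subject, prop):
    """Return every value linked to the subject by the given property."""
    return [o for s, p, o in statements if s == subject and p == prop]

print(entity_data_url("Q13561329"))
print(objects_of(STATEMENTS, "Q13561329", "P1910"))
```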
Qualifiers
References
EMA
GWAS Central
PubChem
Applications
1. Demonstrating integrative biomedical queries
Simple data retrieval
“Retrieve genes with genetic association with asthma”
39 genes
gene geneLabel gene geneLabel gene geneLabel gene geneLabel
Q5013317 COL22A1 Q18027370 IGSF3 Q18053559 CDHR3 Q14903974 SMAD3
Q14912759 SLC22A5 Q18045382 HPSE2 Q18045669 ATG3 Q18033889 IL1RL1
Q14914243 PSAP Q18048437 IL33 Q18035037 RAD50 Q17917202 ERBB4
Q14907990 SLC30A8 Q18051900 PYHIN1 Q18036984 FBXL7 Q18027836 IL6R
Q18025002 GAB1 Q17709208 ACO1 Q18033919 XPR1 Q18030185 NOTCH4
Q18035589 C6orf10 Q18027822 IL2RB Q15326496 RORA Q18030409 PDE4D
Q18054256 GSDMA Q18030364 PBX2 Q18042132 GSDMB Q18045645 IKZF4
Q18058487 C5orf56 Q18037773 ABI3BP Q18029145 MKLN1 Q18039979 KLHL5
Q18030785 PRKG1 Q18039623 CTNNA3 Q18036729 RAP1GAP2 Q18026947 HLA-DQA1
Q18033424 IL18R1 Q18046350 ZNF665 Q14878303 IL13
http://bit.ly/bosc2017_wikidata
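A retrieval like the one above could be phrased as a SPARQL query against the Wikidata Query Service. The sketch below only builds the query text and a request URL; it does not contact the service. The slide does not state the identifiers, so `P2293` (genetic association) and `Q35869` (asthma) are assumptions to check before use, and `build_gene_query` and `query_url` are hypothetical helper names. The output variables `?gene` and `?geneLabel` mirror the column headers in the result table.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_gene_query(association_property, disease_qid):
    """Build a SPARQL query for genes linked to a disease by one property."""
    return (
        "SELECT ?gene ?geneLabel WHERE {\n"
        f"  ?gene wdt:{association_property} wd:{disease_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )

def query_url(sparql):
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})

# Assumed identifiers, not stated on the slide: verify before use.
GENETIC_ASSOCIATION = "P2293"
ASTHMA = "Q35869"

query = build_gene_query(GENETIC_ASSOCIATION, ASTHMA)
print(query)
print(query_url(query))
```

Keeping the property and disease as parameters means the same builder can be reused for other statements, for example `build_gene_query("P1910", "Q41112")` with the identifiers that do appear earlier in the slides.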
Data integration
“Retrieve genes with genetic association with asthma and gene product is localized to cell membrane”
22 genes
gene geneLabel gene geneLabel gene geneLabel gene geneLabel
Q14912759 SLC22A5 Q18027370 IGSF3 Q18035037 RAD50 Q18027836 IL6R
Q14914243 PSAP Q18033424 IL18R1 Q18033919 XPR1 Q18030409 PDE4D
Q14907990 SLC30A8 Q18045382 HPSE2 Q18042132 GSDMB Q18030185 NOTCH4
Q18035589 C6orf10 Q18027822 IL2RB Q18036729 RAP1GAP2 Q18026947 HLA-DQA1
Q18054256 GSDMA Q18053559 CDHR3 Q18033889 IL1RL1
Q18030785 PRKG1 Q14903974 SMAD3 Q17917202 ERBB4
http://bit.ly/bosc2017_wikidata
Leveraging the Disease Ontology structure
“Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane”
31 genes / 8 diseases
diseaseGALabel | gene_counts | geneList
asthma | 15 | SMAD3, RAP1GAP2, IL18R1, HPSE2, SLC30A8, SLC22A5, PSAP, ERBB4, HLA-DQA1, IGSF3, IL2RB, IL6R, NOTCH4, PDE4D, RAD50
chronic obstructive pulmonary disease | 5 | HLA-C, SFTPD, ANXA5, ANXA11, ATP2C2
lung cancer | 3 | TGM5, VTI1A, PHACTR2
interstitial lung disease | 2 | DSP, ATP11A
non-small-cell lung carcinoma | 2 | NALCN, DLST
nasopharynx carcinoma | 2 | ITGA9, TNFRSF19
adenocarcinoma of the lung | 1 | BTNL2
pulmonary emphysema | 1 | BICD1
http://bit.ly/bosc2017_wikidata
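Matching "any respiratory disease" depends on walking the subclass-of hierarchy (Property:P279) from a parent class down to all of its descendants. In SPARQL this is usually written with a property path such as `wdt:P279*`. The sketch below shows the same idea in plain Python over a toy hierarchy. The edges are hypothetical and chosen only for illustration; they are not copied from the Disease Ontology.

```python
from collections import deque

# Toy subclass-of edges (child -> parents). Hypothetical, for illustration only.
SUBCLASS_OF = {
    "asthma": ["respiratory disease"],
    "lung cancer": ["respiratory disease"],
    "adenocarcinoma of the lung": ["lung cancer"],
    "schizophrenia": ["mental disorder"],
}

def descendants(root, subclass_of):
    """Return the root plus every class that reaches it through subclass-of edges."""
    # Invert the child -> parents mapping into parent -> children.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    # Breadth-first walk down the hierarchy.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(descendants("respiratory disease", SUBCLASS_OF)))
```

A query restricted to the returned set picks up associations recorded on narrower terms, which is how a single parent term can yield results for several specific diseases.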
Opportunistic integration
“Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane and show causative chemical hazards”
4 diseases / 6 chemical hazards
diseaseGALabel | exposureLabel
lung cancer | arsenic pentoxide exposure
lung cancer | HN1 exposure
lung cancer | mechlorethamine exposure
lung cancer | HN3 exposure
asthma | Phenacyl chloride exposure
pulmonary emphysema | phosgene exposure
http://bit.ly/bosc2017_wikidata
Applications
1. Demonstrating integrative biomedical queries
2. Building domain-specific web applications
Small data to big data
?
Chlambase.org for the Chlamydia research community
Community-specific structured knowledge
Genetic mutants, gene expression, host-pathogen interactions, orthologs, …
Domain-specific applications based on Wikidata
Chlambase
Thoughts for the future (TFTF)
https://www.pexels.com/photo/telescope-view-binoculars-viewpoint-4754/
TFTF #1: Need incentives for data owners → data contributors
Direct measures of usage
• SPARQL query logs
• Network interconnectedness
• Other ideas?
TFTF #2: Need functional integration of Wikipedia (WP) and Wikidata (WD) edit histories
1. Statement-level filtering
2. Across all sourced WD items (arbitrary access)
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine/Archive_92
TFTF #3: Need more expressive data modeling and reporting
• Defining data models and constraints (ShEx?)
• Visualizing and disseminating models
• Reporting violations and auto-suggesting fixes
https://github.com/SuLab/Genewiki-ShEx
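As a rough illustration of "defining data models" and "reporting violations", the sketch below checks one item's statements against a hand-written model. It is not ShEx and does not reflect the contents of the repository linked above. The model (one required property, three allowed ones) and the function name `find_violations` are hypothetical.

```python
def find_violations(item_statements, required, allowed):
    """Compare an item's (property, value) statements with a simple data model.

    required: properties every conforming item must use at least once.
    allowed:  additional properties an item may use.
    Returns a list of human-readable violation messages.
    """
    used = {prop for prop, _ in item_statements}
    violations = [
        "missing required property " + p for p in sorted(required - used)
    ]
    violations += [
        "unexpected property " + p for p in sorted(used - required - allowed)
    ]
    return violations

# Hypothetical model built from the property identifiers shown earlier in the slides.
REQUIRED = {"P279"}
ALLOWED = {"P128", "P129", "P1910"}

# A made-up item that lacks P279 and uses a property outside the model.
statements = [("P128", "Q1345738"), ("P999999", "Q1")]
for message in find_violations(statements, REQUIRED, ALLOWED):
    print(message)
```

A real constraint language would also cover value types, cardinalities, and references, which is the gap the slide points at.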
Open source
github.com/SuLab/GeneWikiCentral
github.com/SuLab/wikidataintegrator – Python module for Wikidata
github.com/SuLab/scheduled-bots – bot automation framework
github.com/SuLab/WikiGenomes.org
github.com/SuLab/ChlamBase.org
github.com/SuLab/Genewiki-ShEx – data models
github.com/SuLab/wdbiothings – wrapper for BioThings APIs
Expert interfaces
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge

  • 1. The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge Andrew Su, Ph.D. @andrewsu [[User:Andrew Su]] http://sulab.org August 23, 2017 WMF Research Slides: slideshare.net/andrewsu
  • 2. 2 Crowd volunteers and partners Andra Waagmeester (andrawaag) Lynn Schriml Elvira Mitraka (emitraka) U. Maryland, Baltimore MicelioUBC Paul Pavlidis Ben Good (i9606) Greg Stupp (gstupp) Sebastian Burgstaller (sebotic) Tim Putman (putmantime) Ginger Tsueng Nuria Queralt Rosinach bit.ly/genewikidata sulab.org We are recruiting! U. Washington Kevin Hybiske
  • 3. The biomedical literature is massive… 3 0 200,000 400,000 600,000 800,000 1,000,000 1,200,000 1,400,000 1985 1990 1995 2000 2005 2010 2015 Number of new PubMed-indexed articles
  • 4. Filtering, extracting, and summarizing PubMed Documents Concepts Review article about fibronectin
  • 5. Filtering, extracting, and summarizing PubMed Documents Concepts
  • 6. The Gene Wiki project, circa 2008 6 Huss, PLoS Biol, 2008 Data imported from structured biomedical databases Summarized knowledge via crowdsourcing
  • 9. Biomedical databases. Applications: 1. Text mining biological annotations (https://www.ncbi.nlm.nih.gov/pubmed/22165947)
  • 10. Biomedical databases. Applications: 1. Text mining biological annotations; 2. Editor engagement via peer-review dual publication model (https://www.ncbi.nlm.nih.gov/pubmed/24012870)
  • 11. Biomedical databases. Applications: 1. Text mining biological annotations; 2. Editor engagement via peer-review dual publication model; 3. Embedding structured data using Wikipedia templates (https://www.ncbi.nlm.nih.gov/pubmed/22434829). Plain wikilink: “The expression of the protein has been found to be significantly lower in [[schizophrenia]] and psychotic...” With template: “The expression of the protein has been found to be significantly lower in {{SWL|type=decreased expression|target=schizophrenia}} and psychotic...”
  • 12. [image] is to data; [image] is to text. biomedical. “Provide a database of the world’s knowledge that anyone can edit” - Denny Vrandečić
  • 13. Wikidata statements for item Q13561329 (http://www.wikidata.org/wiki/Q13561329). Subclass of (Property:P279): Protein (Q8054). Regulates (Property:P128): Neural development (Q1345738). Physically interacts with (Property:P129): VLDL receptor (Q1979313), Amyloid beta A4 (Q423510). Decreased expression in (Property:P1910): Schizophrenia (Q41112), Bipolar disorder (Q131755)
  • 16. Biomedical databases. Applications: 1. Text mining biological annotations; 2. Editor engagement via peer-review dual publication model; 3. Embedding structured data using Wikipedia templates
  • 18. Biomedical databases. Applications: 1. Text mining biological annotations; 2. Editor engagement via peer-review dual publication model; 3. Embedding structured data using Wikipedia templates. Applications: 1. Demonstrating integrative biomedical queries
  • 19. Simple data retrieval: “Retrieve genes with genetic association with asthma” (39 genes; http://bit.ly/bosc2017_wikidata). gene, geneLabel: Q5013317 COL22A1; Q18027370 IGSF3; Q18053559 CDHR3; Q14903974 SMAD3; Q14912759 SLC22A5; Q18045382 HPSE2; Q18045669 ATG3; Q18033889 IL1RL1; Q14914243 PSAP; Q18048437 IL33; Q18035037 RAD50; Q17917202 ERBB4; Q14907990 SLC30A8; Q18051900 PYHIN1; Q18036984 FBXL7; Q18027836 IL6R; Q18025002 GAB1; Q17709208 ACO1; Q18033919 XPR1; Q18030185 NOTCH4; Q18035589 C6orf10; Q18027822 IL2RB; Q15326496 RORA; Q18030409 PDE4D; Q18054256 GSDMA; Q18030364 PBX2; Q18042132 GSDMB; Q18045645 IKZF4; Q18058487 C5orf56; Q18037773 ABI3BP; Q18029145 MKLN1; Q18039979 KLHL5; Q18030785 PRKG1; Q18039623 CTNNA3; Q18036729 RAP1GAP2; Q18026947 HLA-DQA1; Q18033424 IL18R1; Q18046350 ZNF665; Q14878303 IL13
  • 20. Data integration: “Retrieve genes with genetic association with asthma and gene product is localized to cell membrane” (22 genes; http://bit.ly/bosc2017_wikidata). gene, geneLabel: Q14912759 SLC22A5; Q18027370 IGSF3; Q18035037 RAD50; Q18027836 IL6R; Q14914243 PSAP; Q18033424 IL18R1; Q18033919 XPR1; Q18030409 PDE4D; Q14907990 SLC30A8; Q18045382 HPSE2; Q18042132 GSDMB; Q18030185 NOTCH4; Q18035589 C6orf10; Q18027822 IL2RB; Q18036729 RAP1GAP2; Q18026947 HLA-DQA1; Q18054256 GSDMA; Q18053559 CDHR3; Q18033889 IL1RL1; Q18030785 PRKG1; Q14903974 SMAD3; Q17917202 ERBB4
  • 21. Leveraging the Disease Ontology structure: “Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane” (31 genes / 8 diseases; http://bit.ly/bosc2017_wikidata). diseaseGALabel (gene_counts): geneList. asthma (15): SMAD3, RAP1GAP2, IL18R1, HPSE2, SLC30A8, SLC22A5, PSAP, ERBB4, HLA-DQA1, IGSF3, IL2RB, IL6R, NOTCH4, PDE4D, RAD50; chronic obstructive pulmonary disease (5): HLA-C, SFTPD, ANXA5, ANXA11, ATP2C2; lung cancer (3): TGM5, VTI1A, PHACTR2; interstitial lung disease (2): DSP, ATP11A; non-small-cell lung carcinoma (2): NALCN, DLST; nasopharynx carcinoma (2): ITGA9, TNFRSF19; adenocarcinoma of the lung (1): BTNL2; pulmonary emphysema (1): BICD1
  • 22. Opportunistic integration: “Retrieve genes with genetic association with any respiratory disease and gene product is localized to cell membrane and show causative chemical hazards” (4 diseases / 6 chemical hazards; http://bit.ly/bosc2017_wikidata). diseaseGALabel: exposureLabel. lung cancer: arsenic pentoxide exposure; lung cancer: HN1 exposure; lung cancer: mechlorethamine exposure; lung cancer: HN3 exposure; asthma: Phenacyl chloride exposure; pulmonary emphysema: phosgene exposure
  • 23. Biomedical databases. Applications: 1. Text mining biological annotations; 2. Editor engagement via peer-review dual publication model; 3. Embedding structured data using Wikipedia templates. Applications: 1. Demonstrating integrative biomedical queries; 2. Building domain-specific web applications
  • 24. Small data to big data?
  • 25. Chlambase.org for the Chlamydia research community. Community-specific structured knowledge: genetic mutants, gene expression, host-pathogen interactions, orthologs, …
  • 26. Domain-specific applications based on Wikidata: Chlambase
  • 27. Thoughts for the future (TFTF) (image: https://www.pexels.com/photo/telescope-view-binoculars-viewpoint-4754/)
  • 28. TFTF #1: Need incentives for data owners → data contributors. Direct measures of usage: SPARQL query logs; network interconnectedness; other ideas?
  • 29. TFTF #2: Need functional integration of WP and WD edit histories: 1. Statement-level filtering; 2. Across all sourced WD items (arbitrary access) (https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine/Archive_92)
  • 30. TFTF #3: Need more expressive data modeling and reporting: defining data models and constraints (ShEx?); visualizing and disseminating models; reporting violations and auto-suggesting fixes (https://github.com/SuLab/Genewiki-ShEx)
  • 31. Crowd volunteers and partners (same acknowledgments as slide 2)
  • 33. Open source: github.com/SuLab/GeneWikiCentral; github.com/SuLab/wikidataintegrator (Python module for Wikidata); github.com/SuLab/scheduled-bots (bot automation framework); github.com/SuLab/WikiGenomes.org; github.com/SuLab/ChlamBase.org; github.com/SuLab/Genewiki-ShEx (data models); github.com/SuLab/wdbiothings (wrapper for BioThings APIs). Expert interfaces. License
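
The asthma queries on slides 19 and 20 are given only in plain English. As a rough illustration, they might translate into SPARQL along the following lines. This is a sketch, not the query behind the bit.ly link: the function name `build_gene_query` is hypothetical, and every Wikidata identifier in the constants (P2293, P688, P681, Q35869, Q29548) is an assumption that does not appear on the slides and should be checked against Wikidata before use. The direction of the genetic-association statement (gene to disease) is also assumed.

```python
from typing import Optional

# Assumed Wikidata identifiers; none of these appear on the slides.
P_GENETIC_ASSOCIATION = "P2293"  # assumed: "genetic association"
P_ENCODES = "P688"               # assumed: gene "encodes" protein
P_CELL_COMPONENT = "P681"        # assumed: protein "cell component"
Q_ASTHMA = "Q35869"              # assumed: asthma
Q_CELL_MEMBRANE = "Q29548"       # assumed: cell membrane


def build_gene_query(disease_qid: str, component_qid: Optional[str] = None) -> str:
    """Build a SPARQL query for genes genetically associated with a disease.

    When component_qid is given, additionally require that the gene's
    product is annotated with that cellular component (the slide 20 step).
    """
    # Slide 19: genes with a genetic association to the disease.
    patterns = [f"?gene wdt:{P_GENETIC_ASSOCIATION} wd:{disease_qid} ."]
    if component_qid is not None:
        # Slide 20: follow gene -> protein -> cellular component.
        patterns.append(f"?gene wdt:{P_ENCODES} ?protein .")
        patterns.append(f"?protein wdt:{P_CELL_COMPONENT} wd:{component_qid} .")
    body = "\n  ".join(patterns)
    return (
        "SELECT DISTINCT ?gene ?geneLabel WHERE {\n"
        f"  {body}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


if __name__ == "__main__":
    print(build_gene_query(Q_ASTHMA))                   # slide 19 style
    print(build_gene_query(Q_ASTHMA, Q_CELL_MEMBRANE))  # slide 20 style
```

The code only builds the query text and makes no network calls. The resulting string can be pasted into the Wikidata Query Service, where the `wd:`, `wdt:`, `wikibase:`, and `bd:` prefixes are predefined. Results today will likely differ from the 2017 counts on the slides, since Wikidata content changes over time.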