WikiGenomes and Chlambase: Microbial genomics data in Wikidata.
Tim E. Putman1, Sebastian Burgstaller-Muehlbacher1, Andra Waagmeester2, Chunlei Wu1,
Kevin Hybiske3, Benjamin M. Good1, and Andrew I. Su1
1 Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, USA; sulab.org
2 Micelio, Antwerp, Belgium
3 Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington
Motivation
Wikidata provides an extensible open framework ideal for aggregating
distributed data in a centralized database that supports:
• complex querying based on a semantic data model
• providing data for domain-specific web applications that allow the user to
both read and write data
Here, we describe the use of Wikidata to integrate microbial genomics data
using WikiGenomes and a Chlamydia-specific instance called Chlambase.
A) Semantic microbial data model consisting of a hierarchical taxonomic schema
and separate entities for gene and protein. The nodes are Wikidata ‘items’, and
‘properties’ define the relationships. B) Python-based ‘Bot’ software for gathering
data from different resources and reading and writing directly to Wikidata
(https://github.com/SuLab/WikidataIntegrator).
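As a minimal sketch of the data model described above, the items and properties can be held as plain subject/property/value statements. The item and property IDs are the ones shown in the data-model figure; which property links which pair of items is an assumption based on the property names, and the `Statement`, `objects`, and `describe` helpers are hypothetical and are not part of WikidataIntegrator.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    """One edge of the graph: subject item, property, value item."""
    subject: str  # Wikidata item ID (QID)
    prop: str     # Wikidata property ID (PID)
    value: str    # Wikidata item ID (QID)


# Labels as printed on the poster's data-model figure.
LABELS = {
    "Q131065": "C. trachomatis",
    "Q20800254": "C. trachomatis 434/BU",
    "Q21153861": "trpA",
    "Q21153984": "TRPA",
    "P171": "parent taxon",
    "P688": "encodes",
    "P702": "encoded by",
    "P703": "found in taxon",
}

# ASSUMPTION: the edges below are inferred from the property names;
# the poster text does not spell out each connection.
STATEMENTS = [
    Statement("Q20800254", "P171", "Q131065"),    # strain -> species
    Statement("Q21153861", "P703", "Q20800254"),  # gene -> strain
    Statement("Q21153984", "P703", "Q20800254"),  # protein -> strain
    Statement("Q21153861", "P688", "Q21153984"),  # gene -> protein
    Statement("Q21153984", "P702", "Q21153861"),  # protein -> gene
]


def objects(subject: str, prop: str) -> List[str]:
    """Return every value linked to `subject` through `prop`."""
    return [s.value for s in STATEMENTS if s.subject == subject and s.prop == prop]


def describe(statement: Statement) -> str:
    """Render a statement using the human-readable labels."""
    return (f"{LABELS[statement.subject]} --{LABELS[statement.prop]}--> "
            f"{LABELS[statement.value]}")


if __name__ == "__main__":
    for statement in STATEMENTS:
        print(describe(statement))
```

Keeping gene and protein as separate items, as the caption describes, lets each carry its own identifiers and annotations while the encodes / encoded by pair links them in both directions.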
Data model and implementation
A) Various data sources for microbial genetic data. B) Cumulative sum of bacterial
and eukaryotic genome assemblies submitted to NCBI GenBank by year.
Scope and diversity of microbial data
Modeling microbial interactions
Figure nodes and data sources:
• C. trachomatis genome (www.ncbi.nlm.nih.gov/genome/)
• indole (www.drugbank.ca/)
• Chlamydia trachomatis genes (www.ncbi.nlm.nih.gov/gene/)
• Human indoleamine 2,3-dioxygenase (www.uniprot.org/)
• tryptophanase (www.uniprot.org/)
• C. trachomatis tryptophan synthase alpha and beta (www.uniprot.org/)
• C. trachomatis tryptophan synthase (www.rhea-db.org)
• C. trachomatis trpRBA operon (www.operondb.jp/)
Akers et al. 2006
A) The interactions between host, pathogen,
microbiome, and small molecules that lead to
pathogen persistence during a chlamydial infection in
humans (originally hypothesized by Caldwell et al.
2003). Blue URLs indicate source of data and edges
are defined by properties in Wikidata. B) SPARQL
query results for organisms that are capable of
producing indole.
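The indole query itself is not reproduced in the poster text. As a sketch of the same kind of SPARQL access, the following builds a simpler query over two properties from the data model: genes found in a given taxon (P703) together with their Entrez IDs (P351). The function names and the User-Agent string are hypothetical; the endpoint is the public Wikidata Query Service, and the query is an illustration, not the one used for panel B.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def genes_in_taxon_query(taxon_qid: str, limit: int = 10) -> str:
    """Build a SPARQL query for items found in a taxon that have an Entrez ID."""
    if not re.fullmatch(r"Q[1-9]\d*", taxon_qid):
        raise ValueError(f"not a Wikidata item ID: {taxon_qid!r}")
    return f"""
SELECT ?gene ?geneLabel ?entrez WHERE {{
  ?gene wdt:P703 wd:{taxon_qid} ;
        wdt:P351 ?entrez .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the JSON result bindings.

    Requires network access; the User-Agent value is a placeholder.
    """
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikigenomes-poster-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Q20800254 is the C. trachomatis 434/BU strain item from the data model.
    print(genes_in_taxon_query("Q20800254", limit=5))
```

Calling `run_query(genes_in_taxon_query("Q20800254"))` would send the request to the live service; results depend on the current state of Wikidata.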
B. Organisms that produce indole
Acknowledgements
We would like to thank Lynn Schriml and Elvira Mitraka of the University of Maryland, the members
of The Apollo Project and the many members of the Wikidata community for valuable contributions
to this project.
References/Funding
Caldwell et al. 2003 (PMID:12782678)
Putman et al. 2016 (PMID:27022157)
Burgstaller-Muehlbacher et al. 2015 (PMID:26989148)
This work is supported by the National Institutes of
Health under grants GM089820 and GM114833.
Domain Specific Portals into Wikidata
WikiGenomes serves as a centralized and generalizable microbial genomics database
for the Long Tail of sequenced genomes. WikiGenomes engages domain experts by
providing integrated gene reports that are otherwise difficult or tedious to access.
WikiGenomes also provides an easy interface that supports community annotation,
which is then immediately written to Wikidata.
Data model figure, items:
• Bacteria (Q10876), domain
• C. trachomatis (Q131065), species
• C. trachomatis 434/BU (Q20800254), strain
• trpA (Q21153861), gene
• TRPA (Q21153984), protein
Data model figure, properties:
• parent taxon (P171)
• found in taxon (P703)
• subclass of (P279)
• encodes (P688)
• encoded by (P702)
• Entrez ID (P351)
• UniProt ID (P352)
• RefSeq ID (P637)
• locus tag (P2393)
• gen. start (P644)
• gen. stop (P645)
• molecular function (P680)
• biological process (P681)
• cell component (P682)
Interaction figure, additional nodes:
• L-tryptophan (www.drugbank.ca/)
• N-Formylkynurenine (www.drugbank.ca/)
Join the team!
bit.ly/genewikidata; sulab.org

More Related Content

What's hot

Creating an integrated Ondex knowledge base for comparative gene function ana...
Creating an integrated Ondex knowledge base for comparative gene function ana...Creating an integrated Ondex knowledge base for comparative gene function ana...
Creating an integrated Ondex knowledge base for comparative gene function ana...Catherine Canevet
 
GENOME DATA ANALYSIS
GENOME DATA ANALYSISGENOME DATA ANALYSIS
GENOME DATA ANALYSISAmeldaAkoijam
 
Cancer and wikimedia
Cancer and wikimediaCancer and wikimedia
Cancer and wikimediaRockpocket
 
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...Jonathan Eisen
 
Personalized models for Quantitative Systems Pharmacology
Personalized models for Quantitative Systems PharmacologyPersonalized models for Quantitative Systems Pharmacology
Personalized models for Quantitative Systems PharmacologyRutgers University
 
ADARSH JOSE_Resume
ADARSH JOSE_ResumeADARSH JOSE_Resume
ADARSH JOSE_ResumeAdarsh Jose
 
Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Sachin Kumar
 
Gcc talk baltimore july 2014
Gcc talk baltimore july 2014Gcc talk baltimore july 2014
Gcc talk baltimore july 2014pratikomics
 
Genomics2 Phenomics Complete
Genomics2 Phenomics CompleteGenomics2 Phenomics Complete
Genomics2 Phenomics CompleteInterpretOmics
 
STRING - Prediction of functionally associated proteins from heterogeneous ge...
STRING - Prediction of functionally associated proteins from heterogeneous ge...STRING - Prediction of functionally associated proteins from heterogeneous ge...
STRING - Prediction of functionally associated proteins from heterogeneous ge...Lars Juhl Jensen
 
Danita CV 2015 July
Danita CV 2015 JulyDanita CV 2015 July
Danita CV 2015 JulyDanita Mayer
 
How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data setsimprovemed
 
iEvoBio Hertweck abstract 2012
iEvoBio Hertweck abstract 2012iEvoBio Hertweck abstract 2012
iEvoBio Hertweck abstract 2012Kate Hertweck
 
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...Meningitis Research Foundation
 
Introducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not stringsIntroducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not stringsKeywan Hassani-Pak
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...GigaScience, BGI Hong Kong
 
Scott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire TalkScott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire TalkGigaScience, BGI Hong Kong
 

What's hot (20)

Creating an integrated Ondex knowledge base for comparative gene function ana...
Creating an integrated Ondex knowledge base for comparative gene function ana...Creating an integrated Ondex knowledge base for comparative gene function ana...
Creating an integrated Ondex knowledge base for comparative gene function ana...
 
GENOME DATA ANALYSIS
GENOME DATA ANALYSISGENOME DATA ANALYSIS
GENOME DATA ANALYSIS
 
Cancer and wikimedia
Cancer and wikimediaCancer and wikimedia
Cancer and wikimedia
 
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees...
 
Personalized models for Quantitative Systems Pharmacology
Personalized models for Quantitative Systems PharmacologyPersonalized models for Quantitative Systems Pharmacology
Personalized models for Quantitative Systems Pharmacology
 
iOmics
iOmicsiOmics
iOmics
 
ADARSH JOSE_Resume
ADARSH JOSE_ResumeADARSH JOSE_Resume
ADARSH JOSE_Resume
 
Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics
 
Gcc talk baltimore july 2014
Gcc talk baltimore july 2014Gcc talk baltimore july 2014
Gcc talk baltimore july 2014
 
Genomics2 Phenomics Complete
Genomics2 Phenomics CompleteGenomics2 Phenomics Complete
Genomics2 Phenomics Complete
 
STRING - Prediction of functionally associated proteins from heterogeneous ge...
STRING - Prediction of functionally associated proteins from heterogeneous ge...STRING - Prediction of functionally associated proteins from heterogeneous ge...
STRING - Prediction of functionally associated proteins from heterogeneous ge...
 
CDD poster
CDD posterCDD poster
CDD poster
 
Danita CV 2015 July
Danita CV 2015 JulyDanita CV 2015 July
Danita CV 2015 July
 
How to analyse large data sets
How to analyse large data setsHow to analyse large data sets
How to analyse large data sets
 
iEvoBio Hertweck abstract 2012
iEvoBio Hertweck abstract 2012iEvoBio Hertweck abstract 2012
iEvoBio Hertweck abstract 2012
 
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...
The MRF Genome Library: Epidemiology of meningococcal disease-causing lineage...
 
Introducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not stringsIntroducing the KnetMiner Knowledge Graph: things, not strings
Introducing the KnetMiner Knowledge Graph: things, not strings
 
Bioinformatics-2009-Moura-1096-8
Bioinformatics-2009-Moura-1096-8Bioinformatics-2009-Moura-1096-8
Bioinformatics-2009-Moura-1096-8
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
Scott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire TalkScott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
Scott Edmunds: GigaScience Datacite meeting Rapid Fire Talk
 

Similar to WikiGenomes Poster (ISMB)

Microbial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New CyberinfrastructureMicrobial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New CyberinfrastructureLarry Smarr
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Larry Smarr
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Andrew Su
 
Using Supercomputers and Supernetworks to Explore the Ocean of Life
Using Supercomputers and Supernetworks to Explore the Ocean of LifeUsing Supercomputers and Supernetworks to Explore the Ocean of Life
Using Supercomputers and Supernetworks to Explore the Ocean of LifeLarry Smarr
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Michel Dumontier
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Larry Smarr
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersLarry Smarr
 
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...jodischneider
 
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...Bioinformatics and its Applications in Agriculture/Sericulture and in other F...
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...mohd younus wani
 
Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013Fran Flores
 
IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.caGenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.cafionabrinkman
 
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Larry Smarr
 
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Larry Smarr
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...Human Variome Project
 
Forest Environment Analysis for the Pandemic Health
Forest Environment Analysis for the Pandemic HealthForest Environment Analysis for the Pandemic Health
Forest Environment Analysis for the Pandemic HealthJun Steed Huang
 
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...Larry Smarr
 

Similar to WikiGenomes Poster (ISMB) (20)

Microbial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New CyberinfrastructureMicrobial Metagenomics Drives a New Cyberinfrastructure
Microbial Metagenomics Drives a New Cyberinfrastructure
 
Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2Cross-Disciplinary Biomedical Research at Calit2
Cross-Disciplinary Biomedical Research at Calit2
 
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
Open data, compound repurposing, and rare diseases -- Point Loma Nazarene Uni...
 
Using Supercomputers and Supernetworks to Explore the Ocean of Life
Using Supercomputers and Supernetworks to Explore the Ocean of LifeUsing Supercomputers and Supernetworks to Explore the Ocean of Life
Using Supercomputers and Supernetworks to Explore the Ocean of Life
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics Researchers
 
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...
Modeling Alzheimer’s Disease research claims, evidence, and arguments from a ...
 
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...Bioinformatics and its Applications in Agriculture/Sericulture and in other F...
Bioinformatics and its Applications in Agriculture/Sericulture and in other F...
 
Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013
 
IJSRED-V2I1P5
IJSRED-V2I1P5IJSRED-V2I1P5
IJSRED-V2I1P5
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.caGenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
GenomeTrakr: Perspectives on linking internationally - Canada and IRIDA.ca
 
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
 
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
Analyzing the Human Gut Microbiome Dynamics in Health and Disease Using Super...
 
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
MseqDR consortium: a grass-roots effort to establish a global resource aimed ...
 
Forest Environment Analysis for the Pandemic Health
Forest Environment Analysis for the Pandemic HealthForest Environment Analysis for the Pandemic Health
Forest Environment Analysis for the Pandemic Health
 
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res...
 
gky1131.pdf
gky1131.pdfgky1131.pdf
gky1131.pdf
 

More from Andrew Su

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphAndrew Su
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesAndrew Su
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeAndrew Su
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...Andrew Su
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseAndrew Su
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Andrew Su
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchAndrew Su
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceAndrew Su
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceAndrew Su
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Andrew Su
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeAndrew Su
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6Andrew Su
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Andrew Su
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceAndrew Su
 
Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Andrew Su
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgAndrew Su
 
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...Andrew Su
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgAndrew Su
 

More from Andrew Su (20)

Building and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graphBuilding and mining a heterogeneous biomedical knowledge graph
Building and mining a heterogeneous biomedical knowledge graph
 
Wikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciencesWikidata as a FAIR knowledge graph for the life sciences
Wikidata as a FAIR knowledge graph for the life sciences
 
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledgeThe Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical knowledge
 
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
BOSC2017: Using Wikidata as an open, community-maintained database of biomedi...
 
The case for an open biomedical knowledgebase
The case for an open biomedical knowledgebaseThe case for an open biomedical knowledgebase
The case for an open biomedical knowledgebase
 
Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)Open data, compound repurposing, and rare diseases (ISCB)
Open data, compound repurposing, and rare diseases (ISCB)
 
Citizen Science and Rare Disease Research
Citizen Science and Rare Disease ResearchCitizen Science and Rare Disease Research
Citizen Science and Rare Disease Research
 
Open biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen scienceOpen biomedical knowledge using crowdsourcing and citizen science
Open biomedical knowledge using crowdsourcing and citizen science
 
Heart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen ScienceHeart BD2K, Biocuration, and Citizen Science
Heart BD2K, Biocuration, and Citizen Science
 
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
Panel on Citizen Science and Crowdsourcing Games - March 27, 2015
 
Using Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledgeUsing Citizen Science to organize biomedical knowledge
Using Citizen Science to organize biomedical knowledge
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
 
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
Crowdsourcing and Learning from Crowd Data (Tutorial @ PSB2015)
 
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
Microtask crowdsourcing for annotating diseases in PubMed abstracts (ASHG 2014)
 
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen ScienceCrowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
Crowdsourcing Biology: The Gene Wiki, BioGPS, and Citizen Science
 
Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)Centralized Model Organism Database (Biocuration 2014 poster)
Centralized Model Organism Database (Biocuration 2014 poster)
 
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
A Centralized Model Organism Database (CMOD) for the Long Tail of Sequenced G...
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
NCBO Webinar: Translating unstructured, crowdsourced content into structured ...
 
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.orgCrowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
Crowdsourcing Biology: The Gene Wiki, BioGPS and GeneGames.org
 

WikiGenomes Poster (ISMB)

WikiGenomes and Chlambase: Microbial genomics data in Wikidata.
Tim E. Putman1, Sebastian Burgstaller-Muehlbacher1, Andra Waagmeester2, Chunlei Wu1, Kevin Hybiske3, Benjamin M. Good1, and Andrew I. Su1
1 Department of Integrative, Structural and Computational Biology, the Scripps Research Institute, La Jolla, USA; sulab.org
2 Micelio, Antwerp, Belgium
3 Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington

Motivation
Wikidata provides an extensible open framework ideal for aggregating distributed data in a centralized database that supports:
• complex querying based on a semantic data model
• providing data for domain-specific web applications that allow the user to both read and write data
Here, we describe the use of Wikidata to integrate microbial genomics data using WikiGenomes and a Chlamydia-specific instance called Chlambase.

Data model and implementation
A) Semantic microbial data model consisting of a hierarchical taxonomic schema and separate entities for gene and protein. The nodes are Wikidata ‘items’ and ‘properties’ define the relationships.
B) Python-based ‘Bot’ software for gathering data from different resources and reading and writing directly to Wikidata (https://github.com/SuLab/WikidataIntegrator).
Items shown in panel A: Bacteria (Q10876), domain; C. trachomatis (Q131065), species; C. trachomatis 434/BU (Q20800254), strain; trpA (Q21153861), gene; TRPA (Q21153984), protein.
Properties shown in panel A: found in taxon (P703), parent taxon (P171), encodes (P688), encoded by (P702), subclass of (P279), Entrez ID (P351), gen. start (P644), gen. stop (P645), UniProt ID (P352), RefSeq ID (P637), molecular function (P680), locus tag (P2393), biological process (P681), cell component (P682).

Scope and diversity of microbial data
A) Various data sources for microbial genetic data.
B) Cumulative sum of bacterial and eukaryotic genome assemblies submitted to NCBI GenBank by year.

Modeling microbial interactions
A) The interactions between host, pathogen, microbiome, and small molecules that lead to pathogen persistence during a chlamydial infection in humans (originally hypothesized by Caldwell et al. 2003). Blue URLs indicate the source of the data, and edges are defined by properties in Wikidata.
B) SPARQL query results for organisms that are capable of producing indole (panel title: Organisms that produce indole).
Nodes shown in panel A, with their data sources:
• C. trachomatis genome (www.ncbi.nlm.nih.gov/genome/)
• Chlamydia trachomatis: genes (www.ncbi.nlm.nih.gov/gene/)
• C. trachomatis: trpRBA operon (www.operondb.jp/)
• C. trachomatis: trp. synth. alpha and beta (www.uniprot.org/)
• C. trachomatis: tryptophan synthase (www.rhea-db.org)
• Human: indoleamine 2,3-dioxygenase (www.uniprot.org/)
• tryptophanase (www.uniprot.org/)
• indole (www.drugbank.ca/)
• L-tryptophan (www.drugbank.ca/)
• N-Formylkynurenine (www.drugbank.ca/)
• Akers et al. 2006

Domain Specific Portals into Wikidata
WikiGenomes serves as a centralized and generalizable microbial genomics database for the Long Tail of sequenced genomes. WikiGenomes engages domain experts by providing integrated gene reports that are otherwise difficult or tedious to access. WikiGenomes also provides an easy interface that supports community annotation, which is then immediately written to Wikidata.

Acknowledgements
We would like to thank Lynn Schriml and Elvira Mitraka of the University of Maryland, the members of The Apollo Project and the many members of the Wikidata community for valuable contributions to this project.

References/Funding
Caldwell et al. 2003 (PMID:12782678)
Putman et al. 2016 (PMID:27022157)
Burgstaller-Muehlbacher et al. 2015 (PMID:26989148)
This work is supported by the National Institutes of Health under grants GM089820 and GM114833.

Join the team! bit.ly/genewikidata; sulab.org
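
The gene and protein data model described on the poster can be explored with SPARQL. The following is a minimal sketch, not the authors' code: it builds a query for gene items that are "found in taxon" (P703) a given strain and carry an Entrez ID (P351), optionally following "encodes" (P688) to the protein item. The property and item identifiers are taken from the poster; the function names, the User-Agent string, and the use of the public Wikidata Query Service endpoint are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed here; not named on the poster).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gene_query(taxon_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query for genes found in the given taxon.

    Hypothetical helper. Uses P703 (found in taxon), P351 (Entrez ID)
    and P688 (encodes), as listed in the poster's data model.
    """
    # Accept only identifiers of the form Q<digits> to keep the query well-formed.
    if not (taxon_qid.startswith("Q") and taxon_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {taxon_qid!r}")
    return (
        "SELECT ?gene ?geneLabel ?entrez ?protein WHERE {\n"
        f"  ?gene wdt:P703 wd:{taxon_qid} ;\n"
        "        wdt:P351 ?entrez .\n"
        "  OPTIONAL { ?gene wdt:P688 ?protein . }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the result bindings.

    Hypothetical helper; requires network access.
    """
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "wikigenomes-poster-example/0.1 (illustrative)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_gene_query("Q20800254"))` would request genes for C. trachomatis 434/BU, the strain item shown on the poster. What comes back depends on the current state of Wikidata, so no expected output is given here.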