Biohackathon 2015
Europe PubMed Central and Linked Data
Jee-Hyub Kim
0000-0002-0359-2887
Nagasaki 13 Sep 2015
Contents
● Europe PubMed Central
● Linking Literature
● Mining Identifiers
● Publishing Mined Identifiers on RDF
● Web Annotation Data Model
● Use Case for Database Curation
Europe PubMed Central
● Europe PMC is a literature database
○ Abstracts: 30 million PubMed, Agricola and patent
records, updated daily
○ Full text articles: over 3 million full text articles, of
which over 900,000 are free to read and reuse,
updated daily
Services in Europe PMC
● RESTful web service:
○ http://europepmc.org/RestfulWebService
○ Text-mined terms, metadata, full text
● ORCID article claiming tool
● Embassy Cloud for 3rd party contents providers
● BioJS literature module: http://biojs.io/d/biojs-vis-pmccitation
● RSS
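
As a rough illustration of the RESTful web service listed above, the following sketch only builds a search URL and does not call the service. The base URL and the parameter names (`query`, `format`, `pageSize`) are assumptions not stated on the slide and should be checked against the service documentation.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST service (not given on the slide).
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def build_search_url(query: str, result_format: str = "json", page_size: int = 25) -> str:
    """Return a search URL for a free-text query; no request is sent."""
    params = {"query": query, "format": result_format, "pageSize": page_size}
    return f"{BASE_URL}/search?{urlencode(params)}"


url = build_search_url("inflammatory bowel disease")
print(url)
```

Building the URL separately from sending the request keeps the example testable offline; the returned string can be passed to any HTTP client.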
Linking Literature
● Europe PMC provides various types of linking methods
○ By external links: to any URL (e.g., database,
Wikipedia, press release, etc.)
○ By text mining
■ Biological entities
■ Identifiers (e.g., accession numbers)
○ By ORCID (article claims)
● 24 external links providers, 1 ORCID, 9 cross-reference
DBs, 20 DB identifiers, 6 named entity types
Linking Examples
To        | By                      | Relation    | REST API
Wikipedia | Provider                | Mention     | labsLinks
Publons   | Provider                | Review      | labsLinks
UniProt   | Curator                 | Citation    | databaseLinks
ORCID     | Provider                | Author      | search
EFO       | Named entity tagger     | Recognition | textMinedTerms
PDB       | Accession number tagger | Mention     | textMinedTerms
Mining Identifiers in Free Text
● Motivation
○ Started for cross-linking with EBI databases
○ Data citation, impact analysis
○ Now moving towards linked data
● We use patterns from identifiers.org and link back to it.
● An information extraction (IE) problem: ID matching + NER for resource names
● Some ambiguities
○ PDB: "4min" matches the PDB ID pattern but can also mean 4 minutes
○ OMIM and ERC funding id: both 6-digit numbers
○ Resource name variations: UniProt, Swiss-Prot, etc.
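
The combination of ID matching and resource-name recognition described above can be sketched as follows. This is a minimal illustration, not the Europe PMC pipeline: the two patterns are simplified stand-ins for those published by identifiers.org, and the cue lists are hypothetical.

```python
import re

# Simplified, illustrative ID patterns (stand-ins for identifiers.org patterns).
ID_PATTERNS = {
    "pdb": re.compile(r"\b[0-9][A-Za-z0-9]{3}\b"),
    "omim": re.compile(r"\b[0-9]{6}\b"),
}

# Hypothetical resource-name cues: the NER half of the problem.
RESOURCE_CUES = {
    "pdb": re.compile(r"\b(PDB|Protein Data Bank)\b", re.IGNORECASE),
    "omim": re.compile(r"\b(OMIM|MIM)\b"),
}


def mine_identifiers(sentence: str) -> list[tuple[str, str]]:
    """Return (resource, identifier) pairs whose resource is named in the sentence."""
    hits = []
    for resource, pattern in ID_PATTERNS.items():
        if not RESOURCE_CUES[resource].search(sentence):
            continue  # the pattern alone is too ambiguous ("4min", grant numbers)
        for match in pattern.finditer(sentence):
            hits.append((resource, match.group(0)))
    return hits


print(mine_identifiers("The structure (PDB: 3NSS) was solved."))
print(mine_identifiers("Samples were incubated for 4min at room temperature."))
```

Requiring a resource name in the same sentence drops "4min" and bare 6-digit funding ids, at the cost of missing identifiers that are cited without naming the resource.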
Identifiers in Literature
Mentioned in Europe PMC articles:
● Databases: ENA, PDB, ArrayExpress, UniProt, RefSNP, OMIM, PFam, RefSeq, Ensembl, InterPro, Bioproject, Biosample, EMDB, PXD, EGA, TreeFam
● Funding resources: European Research Council
● Ontologies: GO, UniProt, EFO, ChEBI, NCBI Taxonomy, UMLS
● Clinical Trials: NCT, EudraCT
● Digital Repositories (Dryad, figshare, etc.): Data DOI
Identifiers in Different Resources
Articles (978,605)          | Patents 2014 (266,192)      | Wiki pages (15,346,290)
db               # articles | db               # patents  | db               # pages
ena/genbank/ddbj     23,295 | ena/genbank/ddbj      4,074 | pdb                4,265
pdb                  15,544 | uniprot               1,387 | omim               2,226
nct                  13,006 | pdb                   1,093 | uniprot            1,712
refsnp               10,168 | refseq                1,002 | refseq             1,643
refseq                6,551 | refsnp                  322 | ensembl            1,402
omim                  5,093 | omim                    254 | go                 1,351
uniprot               2,865 | pfam                    115 | pfam                 582
go                    1,900 | ensembl                  97 | interpro             560
arrayexpress          1,832 | interpro                 46 | ena/genbank/ddbj     396
Publishing Identifiers on RDF
● Goals
○ More connectivity
○ More provenance for each linking
■ PMCID, sentence, section label, etc.
○ Links to share and comment (e.g., hypothes.is)
● Challenges:
○ How to model? Web Annotation Data Model.
○ Dealing with nearly a billion automatically generated
annotations at large scale
Web Annotation Data Model
● Built on top of RDF
● Annotations as resources
● To provide a standard description mechanism for
sharing annotations between systems
● For more general purpose use
○ Not only for text mining
○ For example, YouTube video comments (by people),
image annotation, etc.
○ W3C Working Draft
Core Annotation Framework
● Typically an Annotation has a single Body, which is
the comment or other descriptive resource, and a single
Target that the Body is somehow "about".
● The Body provides the information which is annotating
the Target.
● This "aboutness" may be further clarified or extended to
notions such as classifying or identifying.
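
The Annotation, Body and Target structure described above can be sketched as three RDF triples. This is a minimal illustration assuming the `oa:` vocabulary at http://www.w3.org/ns/oa#; the annotation and target URIs are hypothetical, and the model actually used by the Europe PMC service may differ.

```python
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(annotation_uri: str, body_uri: str, target_uri: str) -> list[str]:
    """Return N-Triples lines for one annotation with a single body and target."""
    return [
        f"<{annotation_uri}> <{RDF_TYPE}> <{OA}Annotation> .",
        f"<{annotation_uri}> <{OA}hasBody> <{body_uri}> .",
        f"<{annotation_uri}> <{OA}hasTarget> <{target_uri}> .",
    ]


# Hypothetical URIs: the body identifies a PDB entry, the target is an article.
lines = annotation_triples(
    "http://example.org/annotation/1",
    "http://identifiers.org/pdb/3NSS",
    "http://example.org/article/1",
)
print("\n".join(lines))
```

Here the body is the database entry being identified and the target is the article in which it is mentioned; provenance such as the sentence or section would need further triples that are not shown.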
Text-Mining RDF Service
● Running on EBI RDF Platform
● Stores 1,563,241,810 triples text-mined from 400,746
Open Access articles in Europe PubMed Central.
● Provides
○ for each article, all the annotations linking to
ontologies/databases
○ with contexts:
■ sentences
■ section information
Use Case for Database Curation
● Given a database identifier, provide sentence-level
information for database curation.
○ Show all the articles where a PDB accession number
3NSS is mentioned.
○ Show all the annotations, each with its label, in
PMC3382907.
○ Show all the articles where inflammatory bowel
disease (C0021390) is mentioned.
● http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
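
The first use case above (all articles mentioning a given accession number) could be phrased as a SPARQL query along these lines. This is a sketch under assumptions: the graph shape (`oa:hasBody` pointing at an identifiers.org URI, `oa:hasTarget` at the article) and the identifier URI form are guesses, not the documented schema of the endpoint.

```python
def mentions_query(body_uri: str, limit: int = 100) -> str:
    """Return a SPARQL query listing targets annotated with the given body URI."""
    return (
        "PREFIX oa: <http://www.w3.org/ns/oa#>\n"
        "SELECT DISTINCT ?target WHERE {\n"
        "  ?annotation a oa:Annotation ;\n"
        f"              oa:hasBody <{body_uri}> ;\n"
        "              oa:hasTarget ?target .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Assumed URI form for PDB accession number 3NSS.
query = mentions_query("http://identifiers.org/pdb/3NSS", limit=10)
print(query)
```

The resulting string can be submitted to a SPARQL endpoint with any HTTP client; the predicates would need adjusting to match the model the service actually publishes.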
Plans for BioHackathon 2015
● Integration with other SPARQL endpoints
● Interoperability with other formats used in the text-mining
community
○ e.g., BioC, UIMA
● Produce more links on RDF
References
Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform
for innovation. Nucleic Acids Res. 2015 Jan;43(Database issue):D1042-8. doi:10.1093/nar/gku1061.
PMID: 25378340; PMCID: PMC4383902.
Kafkas Ş, Kim JH, McEntyre JR. Database citation in full text biomedical articles. PLoS One. 2013;8(5):
e63184. doi:10.1371/journal.pone.0063184. PMID: 23734176; PMCID: PMC3667078.
Juty N, Le Novère N, Laibe C. Identifiers.org and MIRIAM Registry: community resources to provide
persistent identification. Nucleic Acids Res. 2012 Jan;40(Database issue):D580-6.
doi:10.1093/nar/gkr1097. PMID: 22140103; PMCID: PMC3245029.

Europe PubMed Central and Linked Data

  • 1. Biohackathon 2015 Europe PubMed Central and Linked Data Jee-Hyub Kim 0000-0002-0359-2887 Nagasaki 13 Sep 2015
  • 2. Contents ● Europe PubMed Central ● Linking Literature ● Mining Identifiers ● Publishing Mined Identifiers on RDF ● Web Annotation Data Model ● Use Case for Database Curation
  • 3. Europe PubMed Central ● Europe PMC is a literature database ○ Abstracts: 30 million PubMed, Agricola and patent records, updated daily ○ Full text articles: over 3 million full text articles, of which over 900,000 are free to read and reuse, updated daily
  • 4. Services in Europe PMC ● RESTful web service: ○ http://europepmc.org/RestfulWebService ○ Text-mined terms, metadata, full text ● ORCID article claiming tool ● Embassy Cloud for 3rd party contents providers ● BioJS literature module: http://biojs.io/d/biojs-vis- pmccitation ● RSS
  • 5. Linking Literature ● Europe PMC provides various types of linking methods ○ By external links: to any URL (e.g., database, Wikipedia, press release, etc.) ○ By text mining ■ Biological entities ■ Identifiers (e.g., accession numbers) ○ By ORCID (article claims) ● 24 external links providers, 1 ORCID, 9 cross-reference DBs, 20 DB identifiers, 6 named entity types
  • 6. Linking Examples (columns: To | By | Relation | REST API) ● Wikipedia | Provider | Mention | labsLinks ● Publons | Provider | Review | labsLinks ● UniProt | Curator | Citation | databaseLinks ● ORCID | Provider | Author | search ● EFO | Named entity tagger | Recognition | textMinedTerms ● PDB | Accession number tagger | Mention | textMinedTerms
  • 7. Mining Identifiers in Free Text ● Motivation ○ Started for cross-linking with EBI databases ○ Data citation, impact analysis ○ Now moving towards linked data ● We use patterns from identifiers.org and link back to it. ● An information extraction (IE) problem: ID matching + NER for resource names ● Some ambiguities ○ PDB: "4min" matches the accession pattern ○ OMIM and ERC funding IDs: both 6-digit numbers ○ Resource name variations: UniProt, Swiss-Prot, etc.
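
The "ID matching + NER for resource names" idea on slide 7 can be sketched as follows. This is a minimal illustration, not Europe PMC's actual tagger: the regular expressions only approximate identifiers.org-style patterns, and the cue-word lists and the function name `mine_identifiers` are assumptions made for this example. A candidate ID is accepted only when a resource name appears in the same sentence, which is one simple way to handle the "4min" and six-digit ambiguities.

```python
import re

# Illustrative patterns and cue words (assumptions for this sketch,
# not the rules used by Europe PMC).
RESOURCES = {
    "pdb": {
        "id": re.compile(r"\b[0-9][A-Za-z0-9]{3}\b"),
        "cues": re.compile(r"\b(PDB|Protein Data Bank)\b", re.I),
    },
    "omim": {
        "id": re.compile(r"\b\d{6}\b"),
        "cues": re.compile(r"\b(OMIM|MIM)\b", re.I),
    },
    "erc": {
        "id": re.compile(r"\b\d{6}\b"),
        "cues": re.compile(r"\b(ERC|European Research Council)\b", re.I),
    },
}


def mine_identifiers(sentence):
    """Return (resource, identifier) pairs found in one sentence.

    An ID pattern match counts only if a name of the same resource
    occurs somewhere in the sentence.
    """
    hits = []
    for name, res in RESOURCES.items():
        if not res["cues"].search(sentence):
            continue  # no resource name nearby: treat matches as noise
        for match in res["id"].finditer(sentence):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    for text in (
        "Samples were incubated for 4min at room temperature.",
        "The crystal structure (PDB: 3NSS) was refined.",
        "Cystic fibrosis (OMIM 219700) is caused by CFTR mutations.",
    ):
        print(text, "->", mine_identifiers(text))
```

Sentence-level cueing is deliberately crude: a year such as "2010" in a sentence that also names PDB would still be accepted, so a real system would need tighter context rules.
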
  • 8. Identifiers in Literature (mentioned in Europe PMC articles) ● Databases: ENA, PDB, ArrayExpress, UniProt, RefSNP, OMIM, Pfam, RefSeq, Ensembl, InterPro, BioProject, BioSample, EMDB, PXD, EGA, TreeFam ● Funding resources: European Research Council ● Ontologies: GO, UniProt, EFO, ChEBI, NCBI Taxonomy, UMLS ● Clinical trials: NCT, EudraCT ● Digital repositories (Dryad, figshare, etc.): data DOI
  • 9. Identifiers in Different Resources ● Articles (978,605), db and # articles: ena/genbank/ddbj 23,295; pdb 15,544; nct 13,006; refsnp 10,168; refseq 6,551; omim 5,093; uniprot 2,865; go 1,900; arrayexpress 1,832 ● Patents 2014 (266,192), db and # patents: ena/genbank/ddbj 4,074; uniprot 1,387; pdb 1,093; refseq 1,002; refsnp 322; omim 254; pfam 115; ensembl 97; interpro 46 ● Wiki pages (15,346,290), db and # pages: pdb 4,265; omim 2,226; uniprot 1,712; refseq 1,643; ensembl 1,402; go 1,351; pfam 582; interpro 560; ena/genbank/ddbj 396
  • 10. Publishing Identifiers on RDF ● Goals ○ More connectivity ○ More provenance for each link ■ PMCID, sentence, section label, etc. ○ Links to share and comment on (e.g., hypothes.is) ● Challenges ○ How to model? Web Annotation Data Model. ○ Dealing with nearly a billion automatically generated annotations at large scale
  • 11. Web Annotation Data Model ● Built on top of RDF ● Annotations as resources ● Provides a standard description mechanism for sharing annotations between systems ● For general-purpose use ○ Not only for text mining ○ For example, YouTube video comments (by people), image annotation, etc. ○ W3C Working Draft
  • 12. Core Annotation Framework ● Typically an Annotation has a single Body, which is the comment or other descriptive resource, and a single Target that the Body is somehow "about". ● The Body provides the information which is annotating the Target. ● This "aboutness" may be further clarified or extended to notions such as classifying or identifying.
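
The Body/Target structure described on slide 12 can be written down as three RDF triples. The sketch below uses the `oa:` vocabulary (`oa:Annotation`, `oa:hasBody`, `oa:hasTarget`) and plain Python with no RDF library. The annotation URI, the body URI form, and the pairing of this article with this accession are all invented for illustration and say nothing about how the Europe PMC service actually names its resources.

```python
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(annotation_uri, body_uri, target_uri):
    """Describe one annotation: a single Body that is 'about' a single Target."""
    return [
        (annotation_uri, RDF_TYPE, OA + "Annotation"),
        (annotation_uri, OA + "hasBody", body_uri),      # what is said
        (annotation_uri, OA + "hasTarget", target_uri),  # what it is about
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    triples = annotation_triples(
        "http://example.org/annotation/1",          # hypothetical URI
        "http://identifiers.org/pdb/3NSS",          # assumed body URI form
        "http://europepmc.org/articles/PMC3382907"  # article used only as a sample target
    )
    print(to_ntriples(triples))
```

Provenance such as the sentence or section label would hang off the target in the same way, as further triples.
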
  • 13.
  • 14. Text-Mining RDF Service ● Running on EBI RDF Platform ● Stores 1,563,241,810 triples text-mined from 400,746 Open Access articles in Europe PubMed Central. ● Provides ○ for each article, all the annotations linking to ontologies/databases ○ with contexts: ■ sentences ■ section information
  • 15. Use Case for Database Curation ● Given a database identifier, provides sentence-level information for database curation. ○ Show all the articles where the PDB accession number 3NSS is mentioned. ○ Show all the annotations in PMC3382907, each with its label. ○ Show all the articles where inflammatory bowel disease (C0021390) is mentioned. ● http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql
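
The first use case on slide 15 (all articles mentioning PDB 3NSS) might look like the query built below. This is a sketch under stated assumptions: the slides do not show the service's schema, so the use of `oa:hasBody` and `oa:hasTarget` and the body URI `http://identifiers.org/pdb/3NSS` are guesses. Only the endpoint URL comes from the slide. The code builds the query and a standard SPARQL-protocol GET URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint as given on the slide (a development server in 2015).
ENDPOINT = "http://wwwdev.ebi.ac.uk/rdf/services/textmining/sparql"


def mentions_query(body_uri):
    """SPARQL for 'which targets carry an annotation with this body?'.

    The oa:hasBody / oa:hasTarget shape is an assumption about the data.
    """
    return (
        "PREFIX oa: <http://www.w3.org/ns/oa#>\n"
        "SELECT DISTINCT ?target WHERE {\n"
        "  ?annotation oa:hasBody <%s> ;\n"
        "              oa:hasTarget ?target .\n"
        "}" % body_uri
    )


def request_url(query):
    """Encode the query as a SPARQL-protocol GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = mentions_query("http://identifiers.org/pdb/3NSS")  # assumed URI form
    print(query)
    print(request_url(query))
```

The other two use cases follow the same pattern, with the article or the UMLS concept as the fixed term instead of the PDB accession.
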
  • 16.
  • 17. Plans for BioHackathon 2015 ● Integration with other SPARQL endpoints ● Interoperability with other formats used in the text-mining community ○ e.g., BioC, UIMA ● Produce more links on RDF
  • 18. References ● Europe PMC Consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Res. 2015 Jan;43(Database issue):D1042-8. doi:10.1093/nar/gku1061. PMID: 25378340; PMCID: PMC4383902. ● Kafkas Ş, Kim JH, McEntyre JR. Database citation in full text biomedical articles. PLoS One. 2013;8(5):e63184. doi:10.1371/journal.pone.0063184. PMID: 23734176; PMCID: PMC3667078. ● Juty N, Le Novère N, Laibe C. Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Res. 2012 Jan;40(Database issue):D580-6. doi:10.1093/nar/gkr1097. PMID: 22140103; PMCID: PMC3245029.