50. Why use ontologies? They structure the knowledge from a domain. They specify terms that can be used by natural language processing algorithms to process text. They uniquely identify concepts (URIs). They specify relations between concepts that can be used for computing concept similarity. They define hierarchies allowing abstraction of type. They play the role of common denominator for various data from a domain. INRIA - EXMO seminar - March 24th, 2010
61. e.g., concepts in ontologies are removed
62. Why is it a hard problem? (2/2) How to leverage the knowledge contained in ontologies? Process the transitive closure for relations (not trivial for ontologies with 300k concepts). Execute semantic distance algorithms to determine similarity. Compute mappings between ontologies to connect ontologies to one another. Keep all of this up to date when ontologies evolve, e.g., a new GO version every day.
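A minimal sketch of the transitive-closure step, assuming the is_a hierarchy is available as an in-memory mapping from each concept to its direct parents. The function name `ancestors_closure` and the toy concept identifiers are illustrative assumptions, not taken from the talk. For ontologies with hundreds of thousands of concepts, the result would more plausibly be precomputed and stored in a database.

```python
from collections import deque

def ancestors_closure(parents):
    """Return {concept: {ancestor: shortest is_a distance}} for every concept.

    `parents` maps each concept to its direct parents (hypothetical toy input).
    """
    all_concepts = set(parents)
    for direct in parents.values():
        all_concepts.update(direct)

    closure = {}
    for concept in all_concepts:
        distances = {}
        # Breadth-first walk up the hierarchy, so the first visit is the shortest path.
        queue = deque((p, 1) for p in parents.get(concept, ()))
        while queue:
            node, depth = queue.popleft()
            if node in distances:  # already reached by a path at least as short
                continue
            distances[node] = depth
            queue.extend((p, depth + 1) for p in parents.get(node, ()))
        closure[concept] = distances
    return closure

# Toy hierarchy, assumed for illustration only.
toy_parents = {
    "melanoma": ["skin_neoplasm"],
    "skin_neoplasm": ["neoplasm", "skin_disease"],
    "neoplasm": ["disease"],
    "skin_disease": ["disease"],
}
closure = ancestors_closure(toy_parents)
```

Recording the distance alongside each ancestor keeps the information that a later scoring or semantic-distance step may need.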
63. Ontology-based annotation workflow. First, direct annotations are created by recognizing concepts in raw text. Second, annotations are semantically expanded using knowledge of the ontologies. Third, all annotations are scored according to the context in which they have been created.
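The second and third steps can be sketched as follows, taking the direct annotations of the first step as input. Everything specific here is an assumption made for illustration: the context weights, the decay of the score with is_a distance, the function name `expand_and_score`, and the element identifier. The talk does not give the actual scoring values.

```python
# Hypothetical context weights; the talk does not state the real ones.
DIRECT_WEIGHTS = {"title": 10, "description": 7}

def expand_and_score(direct_annotations, ancestors):
    """Expand direct annotations along is_a and score every annotation.

    direct_annotations: list of (element_id, concept, context) tuples.
    ancestors: {concept: {ancestor: is_a distance}}.
    Returns {(element_id, concept): score}, summed over all contributing annotations.
    """
    scores = {}
    for element_id, concept, context in direct_annotations:
        base = DIRECT_WEIGHTS[context]
        # A direct annotation keeps the full weight of its context.
        key = (element_id, concept)
        scores[key] = scores.get(key, 0) + base
        # Expanded annotations: each ancestor is annotated too, with a score
        # that decreases with its distance from the recognized concept.
        for ancestor, distance in ancestors.get(concept, {}).items():
            key = (element_id, ancestor)
            scores[key] = scores.get(key, 0) + base / (1 + distance)
    return scores

direct = [("element_1", "melanoma", "title")]
ancestors = {"melanoma": {"skin_neoplasm": 1, "neoplasm": 2}}
scores = expand_and_score(direct, ancestors)
```

With these assumed weights, a concept recognized in a title outranks its own ancestors, and closer ancestors outrank more distant ones.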
64.
65. 220 ontologies, ~4.2M concepts & ~7.9M terms. Uses NCIBI Mgrep, a syntactic concept recognizer. High degree of accuracy. Fast, scalable, domain independent.
81. We have used the annotation workflow to annotate some common resources (gene expression data, clinical trials, articles) and index them by concepts.
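Indexing by concepts amounts to an inverted index from each concept to the resource elements annotated with it. The sketch below is a toy in-memory version under that assumption; the function name `build_concept_index`, the tuple layout, and the element identifiers are hypothetical, and the production system described later in the talk relies on MySQL rather than Python dictionaries.

```python
from collections import defaultdict

def build_concept_index(annotations):
    """Map each concept to the set of (resource, element_id) pairs annotated with it.

    annotations: iterable of (resource, element_id, concept) tuples.
    """
    index = defaultdict(set)
    for resource, element_id, concept in annotations:
        index[concept].add((resource, element_id))
    return index

# Toy annotations, assumed for illustration only.
annotations = [
    ("GEO", "element_1", "melanoma"),
    ("GEO", "element_2", "melanoma"),
    ("ClinicalTrials", "element_7", "melanoma"),
    ("GEO", "element_2", "neoplasm"),
]
index = build_concept_index(annotations)
```

A search for a concept then becomes a single lookup such as `index["melanoma"]`, which returns elements from every annotated resource at once.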
100. OBR results available in NCBO BioPortal. Example of resource available (name and description). Number of annotations in the OBR index. Ontology concept/term browsed. Title and URL link to the original element. Context in which an element has been annotated. ID of an element.
101.
102. Good use of the semantics (2/2)
111. Higher number of terms for which we return results. Significant improvement in the case of AE or GEO. [BMC BioInformatics 09]
112. Technical details. Workflow and pre-computation of the data: Java, JDBC & MySQL (prototypes); Java, Spring/Hibernate & MySQL (production). Services deployed as REST web services: Tomcat & RestLet.
113. Users… Ontology-based services (OBS) NCBO Biomedical Resources index service NCBO Annotator web service BioPortal services UMLS services UCSF Laboratree CollabRx UCHSC PharmGKB, JAX HGMD BioPortal UI PDB/PLoS I2B2 NextBio IO informatics “Resources” tab Knewco IO informatics CaNanoLab
114.
115. Decide which clinical trials are relevant for a particular patient. Use the annotator service to map clinical-trial eligibility criteria to concepts from UMLS.
151. Dr. Fan Meng
152. Thank you
National Center for BioMedical Ontology: http://www.bioontology.org
BioPortal, biomedical ontology repository: http://bioportal.bioontology.org
Contact me: jonquet@stanford.edu
Editor's Notes
Let’s try to understand the context of this work and what we mean by semantic annotation.
Ontology-based annotation is not widespread, possibly because of: the lack of a one-stop shop for bio-ontologies; the lack of tools to annotate datasets (manual annotation will not scale, and can automatic annotation be ‘good enough’?); the lack of a sustainable mechanism to create ontology-based annotations.
They structure the knowledge from a domain. They specify terms that can be used by natural language processing algorithms to process text. They uniquely identify concepts (URIs). They specify relations between concepts that can be used for computing concept similarity. They define hierarchies allowing abstraction of type. They play the role of common denominator for various data from a domain.
Uses a dictionary (or lexicon): a list of strings that identifies ontology concepts. It is constructed by accessing ontologies and pooling all concept names or other string forms (synonyms, labels) that syntactically identify concepts. We use Mgrep, a syntactic concept recognizer developed by the University of Michigan (NCIBI). It has a very high degree of accuracy (over 95% in recognizing disease names) and is fast, scalable, and domain independent. Another AMIA STB 2009 presentation (tomorrow, 1:50pm) covers the Mgrep vs. MetaMap evaluation: higher precision and faster; not limited to UMLS terminologies.
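The dictionary construction and matching described above can be illustrated with a naive stand-in. This is not Mgrep and makes no claim about how Mgrep works; it is a simple whole-word, case-insensitive matcher written for illustration. The toy ontology, the concept identifiers, and the function names `build_dictionary` and `recognize` are assumptions.

```python
import re

def build_dictionary(ontology):
    """Pool all names and synonyms: {lowercased string: concept_id}.

    ontology: {concept_id: [name, synonym, ...]} (hypothetical toy input).
    """
    dictionary = {}
    for concept_id, labels in ontology.items():
        for label in labels:
            dictionary[label.lower()] = concept_id
    return dictionary

def recognize(text, dictionary):
    """Return sorted (start, end, concept_id) for whole-word, case-insensitive matches."""
    matches = []
    lowered = text.lower()
    for term, concept_id in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, lowered):
            matches.append((m.start(), m.end(), concept_id))
    return sorted(matches)

# Toy ontology, assumed for illustration only.
toy_ontology = {
    "C:melanoma": ["melanoma", "malignant melanoma"],
    "C:skin": ["skin"],
}
dictionary = build_dictionary(toy_ontology)
hits = recognize("Malignant melanoma of the skin", dictionary)
```

This sketch reports overlapping matches separately (here both "malignant melanoma" and "melanoma"), and it scans the text once per dictionary entry, which would not scale to millions of terms; a real recognizer needs a more efficient matching strategy and a policy for overlaps.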
Performing a search of GEO using OBR. A user searching for “melanoma” in BioPortal is able to view the set of online data resources that have been annotated with the ontology terms related to this query. The GEO element “melanoma progression” is returned as a pertinent element for this search. (Note: In the current version, we have dealt only with element titles and descriptions to validate the notion of context awareness. Later, we will process more of the metadata structure to enable a finer-grained level of detail.) The display within BioPortal allows the user to view the original data set with a single click.
Specific evaluation with external users in progress: Center for Clinical and Translational Informatics; Jackson Lab; Univ. of Indiana (research management system).