ContentMine:
extracting millions of facts from scientific literature
@jenny_molloy EMBL-EBI – 17 June 2015
What is content?
What is mining?
1982
“Automatically generating logical representations of
text passages... by means of an analysis of the
coherence structure of the passages.”
Jerry R. Hobbs, Donald E. Walker, and Robert A. Amsler. 1982. Natural language access to structured text. In Proceedings of the 9th
conference on Computational linguistics - Volume 1 (COLING '82), Ján Horecký (Ed.), Vol. 1. Academia Praha, Czechoslovakia, 127-132.
DOI=10.3115/991813.991833 http://dx.doi.org/10.3115/991813.991833
2008
“The use of automated methods for exploiting
the enormous amount of knowledge available in
the biomedical literature.”
Cohen, K. Bretonnel; Hunter, Lawrence (2008). "Getting Started in Text Mining". PLoS Computational
Biology 4 (1): e20. doi:10.1371/journal.pcbi.0040020. PMC 2217579. PMID 18225946.
Mining Examples
Building bacterial supertrees
Mining chemical reactions
Better genome annotation
Only ~4% of phylogenetic analyses
make their underlying data available.
Supertrees
Content Mining enables AUTOMATED
extraction from daily literature and
conversion to NeXML:
- Machine-readable
- Open
- Reusable
RAW data would be optimal!
PLUTo: Ross Mounce & Peter Murray-Rust
Chemistry
AMI reads and recognises chemical
structures.
It can even create reaction animations.
Natural language processing
can be used to analyse
chemical methods. These are
FACTS but the paper itself may
be copyrighted.
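As a minimal sketch of what pulling facts out of a methods passage can look like, the snippet below extracts quantity-and-unit pairs with a regular expression. This is not how AMI works internally; the function name `extract_quantities`, the short unit list, and the sample sentence are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately short unit list; real methods text needs many more.
UNITS = r"(?:mg|g|mL|L|mmol|mol|°C|h|min)"

# A number (optionally with a decimal part), optional whitespace, then a unit.
QUANTITY = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNITS})\b")


def extract_quantities(text):
    """Return (value, unit) pairs found in text, in order of appearance."""
    return [(float(value), unit) for value, unit in QUANTITY.findall(text)]


# Invented example sentence, not taken from any paper.
sentence = (
    "The mixture was stirred at 80 °C for 2 h, "
    "then 5.0 mg of catalyst in 2 mL of solvent was added."
)
for value, unit in extract_quantities(sentence):
    print(value, unit)
```

The extracted pairs are plain facts (a temperature, a duration, a mass, a volume) and carry none of the surrounding prose.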
Clinical Trials
Clinical trials offer clear use cases
for content mining.
Data extraction from graphs could be very
useful for meta-analyses where raw data is
unavailable.
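One small piece of graph data extraction can be sketched as follows: once two tick marks on a linear axis have been located in the image, any pixel position can be converted back to a data value. The function name `make_axis_mapper` and all pixel positions below are hypothetical, and the sketch assumes linear (not logarithmic) axes.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Build a function mapping a pixel coordinate to a data value on a linear axis.

    (pixel_a, value_a) and (pixel_b, value_b) are two known tick marks.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Invented calibration: x tick 0 at pixel 50 and x tick 10 at pixel 450;
# y tick 0 at pixel 400 and y tick 100 at pixel 0 (image rows grow downward).
to_x = make_axis_mapper(50, 0.0, 450, 10.0)
to_y = make_axis_mapper(400, 0.0, 0, 100.0)

# A plotted point detected at image position (250, 100).
print(to_x(250), to_y(100))
```

Locating the tick marks and the plotted points in the image is the hard part and is not shown here.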
Annotation
Many applications:
- Find primers
- Enhance positive controls
- Find novel sequence information
- More detailed and accurate annotation
Potential to improve
quality and efficiency
of genomic research.
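Finding primers or other sequence information in article text can be sketched, under simplifying assumptions, as a search for long runs of base letters. The name `find_dna_sequences`, the 18-base minimum length, and the example sentence (including its primer) are invented for illustration and are not taken from the cited tools.

```python
import re

# Assumption: any run of 18 or more unambiguous bases is a candidate sequence.
DNA_RUN = re.compile(r"\b[ACGT]{18,}\b")


def find_dna_sequences(text):
    """Return candidate DNA sequences found in text, in order of appearance."""
    return DNA_RUN.findall(text)


# Invented sentence with a made-up primer; the short word GATTACA is ignored.
sentence = (
    "Forward primer 5'-ATGGCTAGCTAGGATCCGATC-3' was used, "
    "with GATTACA as a control."
)
print(find_dna_sequences(sentence))
```

A real pipeline would also need to handle lowercase sequences, ambiguity codes, and sequences split across line breaks.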
Legal Considerations
Copyright
Database rights
Contract law
2011
2014
From 2014
UK Law
Workshops, hackdays, presentations, collaborations,
discussions with librarians and publishers.
Putting new rights into action.
In Europe
2013
Shortly after
2013
2015
Research commissioned through H2020... any EU Directive is >5 years away.
Ireland is already considering following the UK - plus other member states?
Thank you very much
for your attention!
Any questions?
Peter Murray-Rust
Ross Mounce
Richard Smith-Unna
Steph Unna
Jenny Molloy
Mark MacGillivray
Graham Steel
Stefan Kasberger
Christopher Kittel
With thanks to:
Charles Oppenheim
Michelle Brook
Follow
@TheContentMine
contentmine.org
Find the code on
github.com/ContentMine
Funded by:
All images are licensed under CC-BY unless otherwise stated
What is Content?
Phylogenetic tree from Figure 1 in Chen Z, Schiffman M, Herrero R, DeSalle R, Anastos K, et al. (2011) Evolution and Taxonomic
Classification of Human Papillomavirus 16 (HPV16)-Related Variant Genomes: HPV31, HPV33, HPV35, HPV52, HPV58 and HPV67. PLoS ONE 6(5):
e20183. doi:10.1371/journal.pone.0020183
Graph from He F, Fromion V, Westerhoff HV. (Im)Perfect robustness and adaptation of metabolic networks subject to metabolic and gene-expression
regulation: marrying control engineering with metabolic control analysis. BMC Syst Biol. 2013;7:131. doi:10.1186/1752-0509-7-131. PubMed PMID:
24261908; PubMed Central PMCID: PMC4222491.
Table 1 from Young GR, Mavrommatis B, Kassiotis G. Microarray analysis reveals global modulation of endogenous retroelement transcription by
microbes. Retrovirology. 2014;11:59. doi:10.1186/1742-4690-11-59. PubMed PMID: 25063042; PubMed Central PMCID: PMC4222864.
Text from Laidlaw CT, Condon JM, Belk MC. Viability Costs of Reproduction and Behavioral Compensation in Western Mosquitofish (Gambusia affinis).
PLoS One. 2014;9(11):e110524. doi:10.1371/journal.pone.0110524. PubMed PMID: 25365426; PubMed Central PMCID: PMC4217728.
Cell microscopy image from Pettinato G, Vanden Berg-Foels WS, Zhang N, Wen X. ROCK Inhibitor Is Not Required for Embryoid Body Formation from
Singularized Human Embryonic Stem Cells. PLoS One. 2014;9(11):e100742. doi:10.1371/journal.pone.0100742. PubMed PMID: 25365581; PubMed
Central PMCID: PMC4217711.
Supertrees:
Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One.
2013;8(4):e62510. doi:10.1371/journal.pone.0062510. PubMed PMID: 23638103; PubMed Central PMCID: PMC3636077.
McDowell A, Nagy I, Magyari M, Barnard E, Patrick S. The opportunistic pathogen Propionibacterium acnes: insights into typing, human disease, clonal
diversification and CAMP factor evolution. PLoS One. 2013;8(9):e70897. doi:10.1371/journal.pone.0070897. PubMed PMID: 24058439; PubMed Central
PMCID: PMC3772855.
Chemistry:
Diagram from Klejnstrup ML, Frandsen RJ, Holm DK, Nielsen MT, Mortensen UH, Larsen TO, Nielsen JB. Genetics of Polyketide Metabolism in
Aspergillus nidulans. Metabolites. 2012;2(1):100-133. doi:10.3390/metabo2010100. PubMed PMID: 24957370; PubMed Central PMCID: PMC3901194.
Methods text from Greshock, T. J., Grubbs, A. W., Jiao, P., Wicklow, D. T., Gloer, J. B., & Williams, R. M. (2008). Isolation, Structure Elucidation, and
Biomimetic Total Synthesis of Versicolamide B, and the Isolation of Antipodal (−)‐Stephacidin A and (+)‐Notoamide B from Aspergillus versicolor NRRL
35600. Angewandte Chemie International Edition, 47(19), 3573-3577.
Annotation:
Stubben, C. J., & Challacombe, J. F. (2014). Mining locus tags in PubMed Central to improve microbial gene annotation. BMC Bioinformatics, 15(1), 43.
Figure from Haeussler, M., Gerner, M., & Bergman, C. M. (2011). Annotating genes and genomes with DNA sequences extracted from biomedical
articles. Bioinformatics, 27(7), 980-986.

ContentMine (EMBL-EBI Industry Programme)
