SlideShare a Scribd company logo
1 of 43
Content Mining (TDM)
Peter Murray-Rust,
ContentMine.org and UniversityofCambridge
JISC Digifest, Birmingham, UK, 2016-03-02
Invited and Sponsored by JISC
F/OSS tools from contentmine.org
Images from Wikimedia CC-BY-SA
The Right to Read is the Right to Mine**PeterMurray-Rust, 2011
http://contentmine.org
Overview
• Open Semistructured Documents .are the most exciting
underutilised knowledge resource
– Scholarly literature
– Theses
– Clinical trials
– Government and NGO publications
– Product information …
• Content Mining can make huge contributions.
• EuropePubMedCentral(*) is the world’s best place to start.
• Socio-politico-legal aspects cannot be ignored.
• (*) Wellcome Trust, RCUK, FWF (Austria), Cancer Research UK, NHS UK ….
Mining strategy
• Discover. negotiate permissions . => bibliography
• Crawl / Scrape (download), documents AND
supplemental
• Normalize. PDF => XML
• Index: facets => Facts and snippets (“entities”)
• Interpret/analyze entities => relationships,
aggregations (“Transformative”)
• Publish
catalogue
getpapers
query
Daily
Crawl
EPMC, arXiv
CORE , HAL,
(UNIV repos)
ToC
services
PDF HTML
DOC ePUB
TeX XML
PNG
EPS CSV
XLSURLs
DOIs
crawl
quickscrape
norma
Normalizer
Structurer
Semantic
Tagger
Text
Data
Figures
ami
UNIV
Repos
search
Lookup
CONTENT
MINING
Chem
Phylo
Trials
Crystal
Plants
COMMUNITY
plugins
Visualization
and Analysis
PloSONE, BMC,
peerJ… Nature, IEEE,
Elsevier…
Publisher Sites
scrapers
queries
taggers
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
30, 000 pages/day
Semantic ScholarlyHTML
Facts
CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
Want to know about Zika?
Just Type:
ZIKA!
Semantic Fulltext
• EuropePMC coherent OpenAccess
• getpapers: query , download (through API).
• AMI filters, checks[1], transforms facts in papers.
• sequences, species, genera, genes,
dictionaries
[0] All operations shown run in total of <3 minutes.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
Zika endemic areas
Wikimedia CC-BY-SA
Download all Open Access “Zika” from
EuropePMC in 10 seconds
(click below for movie)
Aedes aegypti, Wikimedia CC-BY-SA
Note: movies of this and other slides can be seen at https://vimeo.com/154705161
Downloaded all Open Access “Zika” from
EuropePMC in 10 seconds
Final download screen
Eyeballing 20/120 Zika papers,
click below for movie
Yellow Fever Virus
Wikimedia CC-BY-SA
Note: movie of this and other slides can be seen at https://vimeo.com/154705161
3011 virus
1939 Ae./Aedes
1212 dengue
901 mosquito/es
894 species
791 ZIKV
721 using
716 DENV
567 detection
513 aegypti
484 infection
442 RNA
428 protein
401 albopictus
360 viral
Commonest words in 120 Zika papers
Mosquito spp.
Wikimedia CC-BY-SA
Filtering local files for sequence and viruses
AMI (part of ContentMine software)
(click below for movie)
Note: movies of this and other slides can be seen at https://vimeo.com/154705161
DNA Primers in running text
…the sodium channel voltage dependent gene (Nav). Primers
used to amplify this fragment were AaNaA
5’-ACAATGTGGATCGCTTCCC-3’
and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8).
The primers amplify a fragment of approximately 472…
Snippet (quotable under 2014 UK Statutory Instrument (“Hargreaves”):
~/PMC4654492/results/sequence/dnaprimer/results.xml”
W3C Annotation
[PREFIX]
[MATCH] (link to target)
[SUFFIX]
CMine structure
plugin
option
DNA double stranded fragment
Wikimedia CC-BY-SA
Commonest species in 120 Zika papers
423 Ae./Aedes aegypti
333 Ae./Aedes albopictus
63 Ae. bromeliae
58 Ae. lilii
46 Ae. hensilli
42 Glossina pallidipes
40 Plasmodium vivax
35 Ae. luteocephalus
28 Ae. vittatus
25 Ae. furcifer
22 Plasmodium falciparum
21 Drosophila melanogaster
pre=“fever (DHF), are caused by the world's most prevalent mosquito-borne virus.
37 DENV is carried by " exact="Aedes aegypti” post=" mosquito, which is strongly
affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
183 Wolbachia
70 Aedes
69 Flavivirus/Flaviviridae
30 Glossina
17 Culex
Commonest genera in Zika papers
pre=”…-negative endosymbiotic bacterium, is a promising tool against diseases
transmitted by mosquitoes. " exact="Wolbachia” post=" can be found worldwide in
numerous arthropod species. More than 65% of all insect species are natu…”
Wolbachia in insect cell
Wikimedia CC-BY-SA
38 ITS
20 MHC2TA
19 COI
14 CYPJ92
5 CYP6BB2
4 CYP9J28
3 MHC
Commonest genes in 120 Zika papers
• microcephaly 400/2400 papers; 2 mins;
commonest genes:
203 MCPH1
86 MECP2
54 SOX2
49 E2F1
47 SNAP29
40 IKBKG
40 NDE1
N-terminal domain of microcephalin
Wikimedia CC-BY-SA
Systematic Reviews
Researchers and their machines need to “read”
hundreds of papers a day or even more.
Polly has 20 seconds to read this paper…
…and 10,000 more
ContentMine software can do this in a few minutes
Polly: “there were 10,000 abstracts and due
to time pressures, we split this between 6
researchers. It took about 2-3 days of work
(working only on this) to get through
~1,600 papers each. So, at a minimum this
equates to 12 days of full-time work (and
would normally be done over several weeks
under normal time pressures).”
400,000 Clinical Trials
In 10 government registries
Mapping trials => papers
http://www.trialsjournal.com/content/16/1/80
2009 => 2015. What’s
happened in last 6 years??
Search the whole scientific literature
For “2009-0100068-41”
Extracting scientific information
Mining strategy
• Discover. negotiate permissions . => bibliography
• Crawl / Scrape (download), documents AND
supplemental
• Normalize. PDF => XML
• Index: facets => Facts and snippets (“entities”)
• Interpret/analyze entities => relationships,
aggregations (“Transformative”)
• Publish
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.01113
03&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
catalogue
getpapers
query
Daily
Crawl
EuPMC, arXiv
CORE , HAL,
(UNIV repos)
ToC
services
PDF HTML
DOC ePUB
TeX XML
PNG
EPS CSV
XLSURLs
DOIs
crawl
quickscrape
norma
Normalizer
Structurer
Semantic
Tagger
Text
Data
Figures
ami
UNIV
Repos
search
Lookup
CONTENT
MINING
Chem
Phylo
Trials
Crystal
Plants
COMMUNITY
plugins
Visualization
and Analysis
PloSONE, BMC,
peerJ… Nature, IEEE,
Elsevier…
Publisher Sites
scrapers
queries
taggers
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
30, 000 pages/day
Semantic ScholarlyHTML
Facts
CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
http://chemicaltagger.ch.cam.ac.uk/
• Typical
Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
Facts in context
daily IUCN endangered species news
en.wikipedia.org CC By-SA
ContentMine Fact of The Day
• Fact of the day
• Endangered species in recent science
• Facts
• Bubbles
https://en.wikipedia.org/wiki/Tree_of_life CC BY-SA
“Root”
4500 papers each
with 1 tree
OCR (Tesseract)
Norma (imageanalysis)
(((((Pyramidobacter_piscolens:195,Jonquetella_anthropi:135):86,Synergistes_jonesii:301):131,Thermotoga
_maritime:357):12,(Mycobacterium_tuberculosis:223,Bifidobacterium_longum:333):158):10,((Optiutus_te
rrae:441,(((Borrelia_burgdorferi:…202):91):22):32,(Proprinogenum_modestus:124,Fusobacterium_nucleat
um:167):217):11):9);
Semantic re-usable/computable output (ca 4 secs/image)
Supertree for 924 species
Tree
Supertree created from 4300 papers
Socio-politico-legal
• TDM is one of the most complex, uncertain,
confrontational, political, areas of human
endeavour.
Copyright and Mining
• PMR-premise: You cannot do reproducible
scientific mining and avoid violating copyright.
• UK (“Hargreaves”) 2014 legislation:
– “personal” “non-commercial*” “research” “data
analytics”
– legitimizes copying (?to disk), but not publishing
*teaching, textbooks, etc. may be “commercial”
STM Publishers prevent Mining
• FUD & disinformation about legality (Elsevier)
• Monopolies on infrastructure (“API”s, CCC
Rightfind)
• Technical obstruction (Wiley Captcha,
Macmillan Readcube)
• Restrictive contracts with libraries (ALL) [1]
• Wasting my/our time (ALL)
[1] [You may not] utilize the TDM Output to enhance … subject repositories
in a way that would [… ] have the potential to substitute and/or replicate
any other existing Elsevier products, services and/or solutions.
WILEY … “new security feature… to prevent systematic download of content
“[limit of] 100 papers per day”
“essential security feature … to protect both parties (sic)”
CAPTCHA
User has to type words
ContentMine working with Libraries
• Cambridge: Library, Plant Sciences,
Epidemiology, Chemistry
• Cochrane Collaboration on Systematic Reviews
of Clinical Trials
• FutureTDM (H2020, LIBER)
• Running workshops and training
CM Future
• Hypothes.is use ContentMine results for annotation
• (with Cambridge Univ Library) extracting daily
scientific facts from open and closed literature.
• with EBI, Cochrane Collaborations, JISC, OKF, LIBER,
TGAC/JohnInnes, DNADigest.
• Running workshops, hackdays.
• Planned outreach: MEPs, EC, Slashdot, Reddit,
Kickstarter, geekdom
• http://contentmine.org (OpenLock non-profit)
ContentMine working with Libraries
• Cambridge: Library, Plant Sciences, Epidemiology,
Chemistry
• Cochrane Collaboration on Systematic Reviews of
Clinical Trials
• FutureTDM (H2020, LIBER)
• Running workshops and training
• Offers services for information extraction and
indexing for born-digital documents.
Tractable Open Repositories
• CORE
• OpenAIRE
• arXiv
• HAL
The Right to Read is the Right to Mine**PeterMurray-Rust, 2011
http://contentmine.org

More Related Content

What's hot

Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literaturepetermurrayrust
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature TheContentMine
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in CambridgeTheContentMine
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureTheContentMine
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literaturepetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)TheContentMine
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistrypetermurrayrust
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literaturepetermurrayrust
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSS Open software and knowledge for MIOSS
Open software and knowledge for MIOSS TheContentMine
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSSpetermurrayrust
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? TheContentMine
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature TheContentMine
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature TheContentMine
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHpetermurrayrust
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgpetermurrayrust
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 

What's hot (20)

Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
Content Mining of Science in Cambridge
Content Mining of Science in CambridgeContent Mining of Science in Cambridge
Content Mining of Science in Cambridge
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literature
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSS Open software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
High throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIHHigh throughput mining of the scholarly literature; talk at NIH
High throughput mining of the scholarly literature; talk at NIH
 
Architecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.orgArchitecture of ContentMine Components contentmine.org
Architecture of ContentMine Components contentmine.org
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 

Viewers also liked

OpenNotebookScience NOW!
OpenNotebookScience NOW!OpenNotebookScience NOW!
OpenNotebookScience NOW!petermurrayrust
 
Social, Political and Legal Aspects of Text and Data Mining (TDM)
Social, Political and Legal Aspects of Text and Data Mining (TDM)Social, Political and Legal Aspects of Text and Data Mining (TDM)
Social, Political and Legal Aspects of Text and Data Mining (TDM)Richard Smith-Unna
 
Why Open Data Means Better Science – Jenny Molloy
Why Open Data Means Better Science – Jenny MolloyWhy Open Data Means Better Science – Jenny Molloy
Why Open Data Means Better Science – Jenny MolloyOpenAIRE
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Ross Mounce
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiDatapetermurrayrust
 
Open data and Open Science
Open data and Open ScienceOpen data and Open Science
Open data and Open Sciencepetermurrayrust
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Imagespetermurrayrust
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trustpetermurrayrust
 

Viewers also liked (12)

Open Notebook Science
Open Notebook ScienceOpen Notebook Science
Open Notebook Science
 
OpenNotebookScience NOW!
OpenNotebookScience NOW!OpenNotebookScience NOW!
OpenNotebookScience NOW!
 
Europe PMC Section Tagger
Europe PMC Section TaggerEurope PMC Section Tagger
Europe PMC Section Tagger
 
Social, Political and Legal Aspects of Text and Data Mining (TDM)
Social, Political and Legal Aspects of Text and Data Mining (TDM)Social, Political and Legal Aspects of Text and Data Mining (TDM)
Social, Political and Legal Aspects of Text and Data Mining (TDM)
 
Why Open Data Means Better Science – Jenny Molloy
Why Open Data Means Better Science – Jenny MolloyWhy Open Data Means Better Science – Jenny Molloy
Why Open Data Means Better Science – Jenny Molloy
 
Making Theses USEFUL
Making Theses USEFULMaking Theses USEFUL
Making Theses USEFUL
 
Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]Open scholarship [a FOSTER open science talk]
Open scholarship [a FOSTER open science talk]
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
Open data and Open Science
Open data and Open ScienceOpen data and Open Science
Open data and Open Science
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
Csvconf
CsvconfCsvconf
Csvconf
 

Similar to ContentMine (TDM) at JISC Digifest

ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic BiologyTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research dataHerbert Gruttemeier
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trialspetermurrayrust
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical TrialsTheContentMine
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Jamie Bisset
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search petermurrayrust
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementJamie Bisset
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationAndrew Treloar
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in NeuroscienceTheContentMine
 
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...Pedro Príncipe
 

Similar to ContentMine (TDM) at JISC Digifest (20)

ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research data
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | Future
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction)
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UK
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly Communication
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
Infraestrutura para a Ciência Aberta na Europa - OpenAIRE: O poder dos reposi...
 

More from petermurrayrust

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Agepetermurrayrust
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practicepetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?petermurrayrust
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestpetermurrayrust
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentationpetermurrayrust
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literaturepetermurrayrust
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migrationpetermurrayrust
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusespetermurrayrust
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?petermurrayrust
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Bravepetermurrayrust
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcarepetermurrayrust
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyonepetermurrayrust
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingpetermurrayrust
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archivepetermurrayrust
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everythingpetermurrayrust
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complexpetermurrayrust
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialismpetermurrayrust
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyonepetermurrayrust
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017petermurrayrust
 

More from petermurrayrust (20)

Omdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital AgeOmdi2021 Ontologies for (Materials) Science in the Digital Age
Omdi2021 Ontologies for (Materials) Science in the Digital Age
 
Open Science Principles and Practice
Open Science Principles and PracticeOpen Science Principles and Practice
Open Science Principles and Practice
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?
 
OpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFestOpenVirus at OpenPublishingFest
OpenVirus at OpenPublishingFest
 
Open Virus Indian Presentation
Open Virus Indian PresentationOpen Virus Indian Presentation
Open Virus Indian Presentation
 
Automatic mining of data from materials science literature
Automatic mining of data from materials science literatureAutomatic mining of data from materials science literature
Automatic mining of data from materials science literature
 
Climate Change and Human Migration
Climate Change and Human MigrationClimate Change and Human Migration
Climate Change and Human Migration
 
openVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on virusesopenVirus - tools for discovering literature on viruses
openVirus - tools for discovering literature on viruses
 
XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?XML for science; its huge potential; but are pubiishers preventing it?
XML for science; its huge potential; but are pubiishers preventing it?
 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
 
Early Career Reseachers and Open Healthcare
Early Career Reseachers and Open HealthcareEarly Career Reseachers and Open Healthcare
Early Career Reseachers and Open Healthcare
 
Scientific search for everyone
Scientific search for everyoneScientific search for everyone
Scientific search for everyone
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
 

Recently uploaded

97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAAjennyeacort
 
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...saminamagar
 
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.ANJALI
 
Informed Consent Empowering Healthcare Decision-Making.pptx
Informed Consent Empowering Healthcare Decision-Making.pptxInformed Consent Empowering Healthcare Decision-Making.pptx
Informed Consent Empowering Healthcare Decision-Making.pptxSasikiranMarri
 
History and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfHistory and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfSasikiranMarri
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptxTina Purnat
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingArunagarwal328757
 
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfPULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfDolisha Warbi
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Prerana Jadhav
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdfDolisha Warbi
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxDr.Nusrat Tariq
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxdrashraf369
 
Basic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfBasic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfDivya Kanojiya
 
Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?bkling
 
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...Badalona Serveis Assistencials
 
LUNG TUMORS AND ITS CLASSIFICATIONS.pdf
LUNG TUMORS AND ITS  CLASSIFICATIONS.pdfLUNG TUMORS AND ITS  CLASSIFICATIONS.pdf
LUNG TUMORS AND ITS CLASSIFICATIONS.pdfDolisha Warbi
 
Biomechanics- Shoulder Joint!!!!!!!!!!!!
Biomechanics- Shoulder Joint!!!!!!!!!!!!Biomechanics- Shoulder Joint!!!!!!!!!!!!
Biomechanics- Shoulder Joint!!!!!!!!!!!!ibtesaam huma
 

Recently uploaded (20)

97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA97111 47426 Call Girls In Delhi MUNIRKAA
97111 47426 Call Girls In Delhi MUNIRKAA
 
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...
call girls in Dwarka Sector 21 Metro DELHI 🔝 >༒9540349809 🔝 genuine Escort Se...
 
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in aerocity DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.Statistical modeling in pharmaceutical research and development.
Statistical modeling in pharmaceutical research and development.
 
Informed Consent Empowering Healthcare Decision-Making.pptx
Informed Consent Empowering Healthcare Decision-Making.pptxInformed Consent Empowering Healthcare Decision-Making.pptx
Informed Consent Empowering Healthcare Decision-Making.pptx
 
History and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdfHistory and Development of Pharmacovigilence.pdf
History and Development of Pharmacovigilence.pdf
 
The next social challenge to public health: the information environment.pptx
The next social challenge to public health:  the information environment.pptxThe next social challenge to public health:  the information environment.pptx
The next social challenge to public health: the information environment.pptx
 
Pharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, PricingPharmaceutical Marketting: Unit-5, Pricing
Pharmaceutical Marketting: Unit-5, Pricing
 
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in paharganj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdfPULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
PULMONARY EMBOLISM AND ITS MANAGEMENTS.pdf
 
Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.Presentation on General Anesthetics pdf.
Presentation on General Anesthetics pdf.
 
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
PNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdfPNEUMOTHORAX   AND  ITS  MANAGEMENTS.pdf
PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptx
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
 
Basic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdfBasic principles involved in the traditional systems of medicine PDF.pdf
Basic principles involved in the traditional systems of medicine PDF.pdf
 
Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?Let's Talk About It: To Disclose or Not to Disclose?
Let's Talk About It: To Disclose or Not to Disclose?
 
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...
Presentació "Real-Life VR Integration for Mild Cognitive Impairment Rehabilit...
 
LUNG TUMORS AND ITS CLASSIFICATIONS.pdf
LUNG TUMORS AND ITS  CLASSIFICATIONS.pdfLUNG TUMORS AND ITS  CLASSIFICATIONS.pdf
LUNG TUMORS AND ITS CLASSIFICATIONS.pdf
 
Biomechanics- Shoulder Joint!!!!!!!!!!!!
Biomechanics- Shoulder Joint!!!!!!!!!!!!Biomechanics- Shoulder Joint!!!!!!!!!!!!
Biomechanics- Shoulder Joint!!!!!!!!!!!!
 

ContentMine (TDM) at JISC Digifest

  • 1. Content Mining (TDM) Peter Murray-Rust, ContentMine.org and UniversityofCambridge JISC Digifest, Birmingham, UK, 2016-03-02 Invited and Sponsored by JISC F/OSS tools from contentmine.org Images from Wikimedia CC-BY-SA
  • 2. The Right to Read is the Right to Mine**PeterMurray-Rust, 2011 http://contentmine.org
  • 3. Overview • Open Semistructured Documents .are the most exciting underutilised knowledge resource – Scholarly literature – Theses – Clinical trials – Government and NGO publications – Product information … • Content Mining can make huge contributions. • EuropePubMedCentral(*) is the world’s best place to start. • Socio-politico-legal aspects cannot be ignored. • (*) Wellcome Trust, RCUK, FWF (Austria), Cancer Research UK, NHS UK ….
  • 4. Mining strategy • Discover. negotiate permissions . => bibliography • Crawl / Scrape (download), documents AND supplemental • Normalize. PDF => XML • Index: facets => Facts and snippets (“entities”) • Interpret/analyze entities => relationships, aggregations (“Transformative”) • Publish
  • 5. catalogue getpapers query Daily Crawl EPMC, arXiv CORE , HAL, (UNIV repos) ToC services PDF HTML DOC ePUB TeX XML PNG EPS CSV XLSURLs DOIs crawl quickscrape norma Normalizer Structurer Semantic Tagger Text Data Figures ami UNIV Repos search Lookup CONTENT MINING Chem Phylo Trials Crystal Plants COMMUNITY plugins Visualization and Analysis PloSONE, BMC, peerJ… Nature, IEEE, Elsevier… Publisher Sites scrapers queries taggers abstract methods references Captioned Figures Fig. 1 HTML tables 30, 000 pages/day Semantic ScholarlyHTML Facts CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
  • 6. Want to know about Zika? Just Type: ZIKA!
  • 7. Semantic Fulltext • EuropePMC coherent OpenAccess • getpapers: query , download (through API). • AMI filters, checks[1], transforms facts in papers. • sequences, species, genera, genes, dictionaries [0] All operations shown run in total of <3 minutes. [1] Dictionaries and lookup. [2] Usable from home by anyone Zika endemic areas Wikimedia CC-BY-SA
  • 8. Download all Open Access “Zika” from EuropePMC in 10 seconds (click below for movie) Aedes aegypti, Wikimedia CC-BY-SA Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  • 9. Downloaded all Open Access “Zika” from EuropePMC in 10 seconds Final download screen
  • 10. Eyeballing 20/120 Zika papers, click below for movie Yellow Fever Virus Wikimedia CC-BY-SA Note: movie of this and other slides can be seen at https://vimeo.com/154705161
  • 11. 3011 virus 1939 Ae./Aedes 1212 dengue 901 mosquito/es 894 species 791 ZIKV 721 using 716 DENV 567 detection 513 aegypti 484 infection 442 RNA 428 protein 401 albopictus 360 viral Commonest words in 120 Zika papers Mosquito spp. Wikimedia CC-BY-SA
  • 12. Filtering local files for sequence and viruses AMI (part of ContentMine software) (click below for movie) Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  • 13. DNA Primers in running text …the sodium channel voltage dependent gene (Nav). Primers used to amplify this fragment were AaNaA 5’-ACAATGTGGATCGCTTCCC-3’ and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8). The primers amplify a fragment of approximately 472… Snippet (quotable under 2014 UK Statutory Instrument (“Hargreaves”): ~/PMC4654492/results/sequence/dnaprimer/results.xml” W3C Annotation [PREFIX] [MATCH] (link to target) [SUFFIX] CMine structure plugin option DNA double stranded fragment Wikimedia CC-BY-SA
  • 14. Commonest species in 120 Zika papers 423 Ae./Aedes aegypti 333 Ae./Aedes albopictus 63 Ae. bromeliae 58 Ae. lilii 46 Ae. hensilli 42 Glossina pallidipes 40 Plasmodium vivax 35 Ae. luteocephalus 28 Ae. vittatus 25 Ae. furcifer 22 Plasmodium falciparum 21 Drosophila melanogaster pre=“fever (DHF), are caused by the world's most prevalent mosquito-borne virus. 37 DENV is carried by " exact="Aedes aegypti” post=" mosquito, which is strongly affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
  • 15. 183 Wolbachia 70 Aedes 69 Flavivirus/Flaviviridae 30 Glossina 17 Culex Commonest genera in Zika papers pre=”…-negative endosymbiotic bacterium, is a promising tool against diseases transmitted by mosquitoes. " exact="Wolbachia” post=" can be found worldwide in numerous arthropod species. More than 65% of all insect species are natu…” Wolbachia in insect cell Wikimedia CC-BY-SA
  • 16. 38 ITS 20 MHC2TA 19 COI 14 CYPJ92 5 CYP6BB2 4 CYP9J28 3 MHC Commonest genes in 120 Zika papers
  • 17. • microcephaly 400/2400 papers; 2 mins; commonest genes: 203 MCPH1 86 MECP2 54 SOX2 49 E2F1 47 SNAP29 40 IKBKG 40 NDE1 N-terminal domain of microcephalin Wikimedia CC-BY-SA
  • 18. Systematic Reviews Researchers and their machines need to “read” hundreds of papers a day or even more.
  • 19. Polly has 20 seconds to read this paper… …and 10,000 more
  • 20. ContentMine software can do this in a few minutes Polly: “there were 10,000 abstracts and due to time pressures, we split this between 6 researchers. It took about 2-3 days of work (working only on this) to get through ~1,600 papers each. So, at a minimum this equates to 12 days of full-time work (and would normally be done over several weeks under normal time pressures).”
  • 21. 400,000 Clinical Trials In 10 government registries Mapping trials => papers http://www.trialsjournal.com/content/16/1/80 2009 => 2015. What’s happened in last 6 years?? Search the whole scientific literature For “2009-0100068-41”
  • 23. Mining strategy • Discover. negotiate permissions . => bibliography • Crawl / Scrape (download), documents AND supplemental • Normalize. PDF => XML • Index: facets => Facts and snippets (“entities”) • Interpret/analyze entities => relationships, aggregations (“Transformative”) • Publish
  • 25. catalogue getpapers query Daily Crawl EuPMC, arXiv CORE , HAL, (UNIV repos) ToC services PDF HTML DOC ePUB TeX XML PNG EPS CSV XLSURLs DOIs crawl quickscrape norma Normalizer Structurer Semantic Tagger Text Data Figures ami UNIV Repos search Lookup CONTENT MINING Chem Phylo Trials Crystal Plants COMMUNITY plugins Visualization and Analysis PloSONE, BMC, peerJ… Nature, IEEE, Elsevier… Publisher Sites scrapers queries taggers abstract methods references Captioned Figures Fig. 1 HTML tables 30, 000 pages/day Semantic ScholarlyHTML Facts CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
  • 27. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 28. Facts in context daily IUCN endangered species news en.wikipedia.org CC By-SA
  • 29. ContentMine Fact of The Day • Fact of the day • Endangered species in recent science • Facts • Bubbles
  • 33. Supertree for 924 species Tree
  • 34. Supertree created from 4300 papers
  • 35. Socio-politico-legal • TDM is one of the most complex, uncertain, confrontational, political, areas of human endeavour.
  • 36. Copyright and Mining • PMR-premise: You cannot do reproducible scientific mining and avoid violating copyright. • UK (“Hargreaves”) 2014 legislation: – “personal” “non-commercial*” “research” “data analytics” – legitimizes copying (?to disk), but not publishing *teaching, textbooks, etc. may be “commercial”
  • 37. STM Publishers prevent Mining • FUD & disinformation about legality (Elsevier) • Monopolies on infrastructure (“API”s, CCC Rightfind) • Technical obstruction (Wiley Captcha, Macmillan Readcube) • Restrictive contracts with libraries (ALL) [1] • Wasting my/our time (ALL) [1] [You may not] utilize the TDM Output to enhance … subject repositories in a way that would [… ] have the potential to substitute and/or replicate any other existing Elsevier products, services and/or solutions.
  • 38. WILEY … “new security feature… to prevent systematic download of content “[limit of] 100 papers per day” “essential security feature … to protect both parties (sic)” CAPTCHA User has to type words
  • 39. ContentMine working with Libraries • Cambridge: Library, Plant Sciences, Epidemiology, Chemistry • Cochrane Collaboration on Systematic Reviews of Clinical Trials • FutureTDM (H2020, LIBER) • Running workshops and training
  • 40. CM Future • Hypothes.is use ContentMine results for annotation • (with Cambridge Univ Library) extracting daily scientific facts from open and closed literature. • with EBI, Cochrane Collaborations, JISC, OKF, LIBER, TGAC/JohnInnes, DNADigest. • Running workshops, hackdays. • Planned outreach: MEPs, EC, Slashdot, Reddit, Kickstarter, geekdom • http://contentmine.org (OpenLock non-profit)
  • 41. ContentMine working with Libraries • Cambridge: Library, Plant Sciences, Epidemiology, Chemistry • Cochrane Collaboration on Systematic Reviews of Clinical Trials • FutureTDM (H2020, LIBER) • Running workshops and training • Offers services for information extraction and indexing for born-digital documents.
  • 42. Tractable Open Repositories • CORE • OpenAIRE • arXiv • HAL
  • 43. The Right to Read is the Right to Mine**PeterMurray-Rust, 2011 http://contentmine.org

Editor's Notes

  1. Hi, I’m here to talk about AMI; a data extraction framework and tool. First, I just want highlight some of key contributors to the projects; Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to impress the importance of data in a specific format and its utility to automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG) before introducing the concept of visitors, which are pluggable context specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry specific data. Finally, I will demonstrate some uses of the ChemVisitor, within the realm of validation and metabolism.