Content Mining (TDM)
Peter Murray-Rust,
ContentMine.org and University of Cambridge
JISC Digifest, Birmingham, UK, 2016-03-02
Invited and Sponsored by JISC
F/OSS tools from contentmine.org
Images from Wikimedia CC-BY-SA
“The Right to Read is the Right to Mine” (Peter Murray-Rust, 2011)
http://contentmine.org
Overview
• Open semistructured documents are the most exciting
underutilised knowledge resource
– Scholarly literature
– Theses
– Clinical trials
– Government and NGO publications
– Product information …
• Content Mining can make huge contributions.
• Europe PubMed Central (*) is the world’s best place to start.
• Socio-politico-legal aspects cannot be ignored.
• (*) Wellcome Trust, RCUK, FWF (Austria), Cancer Research UK, NHS UK ….
Mining strategy
• Discover; negotiate permissions => bibliography
• Crawl / scrape (download) documents AND
supplemental material
• Normalize: PDF => XML
• Index: facets => Facts and snippets (“entities”)
• Interpret/analyze entities => relationships,
aggregations (“Transformative”)
• Publish
CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature
[Diagram: ContentMine pipeline. Labels recovered from the figure:]
• Sources: daily crawl of EPMC, arXiv, CORE, HAL, (university repositories); ToC services; publisher sites (PLoS ONE, BMC, PeerJ … Nature, IEEE, Elsevier …)
• Inputs: URLs, DOIs; PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS
• Tools: catalogue, getpapers (query), crawl, quickscrape, norma (Normalizer, Structurer, Semantic Tagger), ami (search, lookup)
• Extracted content: text, data, figures; abstract, methods, references, captioned figures (Fig. 1), HTML tables
• Community plugins (scrapers, queries, taggers): Chem, Phylo, Trials, Crystal, Plants; visualization and analysis
• Throughput and output: 30,000 pages/day; Semantic ScholarlyHTML; Facts
Want to know about Zika?
Just Type:
ZIKA!
Semantic Fulltext
• Europe PMC: a coherent Open Access corpus
• getpapers: query, download (through API).
• AMI filters, checks [1] and transforms facts in papers.
• sequences, species, genera, genes, dictionaries
[0] All operations shown run in a total of <3 minutes.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
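The query step can be sketched in a few lines. This is not getpapers' own code: it only builds a search URL for the public Europe PMC REST service, and the endpoint, the `OPEN_ACCESS:y` filter and the parameter names are assumptions to check against the Europe PMC documentation before use. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: public Europe PMC REST search endpoint (not taken from the slides).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_epmc_query_url(term, open_access_only=True, page_size=100):
    """Return a search URL for `term`, optionally restricted to Open Access."""
    query = term
    if open_access_only:
        # Assumption: OPEN_ACCESS:y is the search field that selects Open Access papers.
        query = f"{term} AND OPEN_ACCESS:y"
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EPMC_SEARCH}?{urlencode(params)}"

url = build_epmc_query_url("zika")
print(url)
```

Fetching the URL and paging through the results is left out; the sketch only shows how a one-word question such as "zika" becomes a machine query.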
Zika endemic areas
Wikimedia CC-BY-SA
Download all Open Access “Zika” from
EuropePMC in 10 seconds
(click below for movie)
Aedes aegypti, Wikimedia CC-BY-SA
Note: movies of this and other slides can be seen at https://vimeo.com/154705161
Downloaded all Open Access “Zika” from
EuropePMC in 10 seconds
Final download screen
Eyeballing 20/120 Zika papers,
click below for movie
Yellow Fever Virus
Wikimedia CC-BY-SA
3011 virus
1939 Ae./Aedes
1212 dengue
901 mosquito/es
894 species
791 ZIKV
721 using
716 DENV
567 detection
513 aegypti
484 infection
442 RNA
428 protein
401 albopictus
360 viral
Commonest words in 120 Zika papers
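A word-frequency table like the one above can be approximated with a simple counter. The sketch below is not the ContentMine implementation; the stop-word list and the two sample sentences are hypothetical stand-ins for the downloaded papers.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real run would need a fuller one.
STOP_WORDS = {"and", "are", "by", "in", "of", "the"}

def commonest_words(texts, top_n=15):
    """Count word occurrences across all texts, ignoring case and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Hypothetical sentences standing in for the downloaded papers.
papers = [
    "Zika virus and dengue virus are carried by Aedes mosquitoes.",
    "Detection of Zika virus RNA in Aedes aegypti.",
]
top = commonest_words(papers, top_n=3)
print(top)
```

Merging variants such as "Ae." and "Aedes", as the slide does, would need an extra normalization step that is not shown here.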
Mosquito spp.
Wikimedia CC-BY-SA
Filtering local files for sequence and viruses
AMI (part of ContentMine software)
(click below for movie)
DNA Primers in running text
…the sodium channel voltage dependent gene (Nav). Primers
used to amplify this fragment were AaNaA
5’-ACAATGTGGATCGCTTCCC-3’
and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8).
The primers amplify a fragment of approximately 472…
Snippet (quotable under the 2014 UK Statutory Instrument (“Hargreaves”)):
~/PMC4654492/results/sequence/dnaprimer/results.xml
W3C Annotation
[PREFIX]
[MATCH] (link to target)
[SUFFIX]
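A minimal sketch of how a primer match and its surrounding context might be captured as prefix / match / suffix. The regular expression and the function name are assumptions for illustration, not the AMI plugin's actual pattern; the sample text is the snippet quoted above.

```python
import re

# Assumption: primers are written as 5'-SEQUENCE-3', with straight or curly primes.
PRIMER = re.compile(r"5['’′]-([ACGT]+)-3['’′]")

def find_primers(text, context=30):
    """Return each primer match with up to `context` characters either side."""
    hits = []
    for m in PRIMER.finditer(text):
        hits.append({
            "pre": text[max(0, m.start() - context):m.start()],
            "exact": m.group(0),
            "post": text[m.end():m.end() + context],
            "sequence": m.group(1),
        })
    return hits

text = ("Primers used to amplify this fragment were AaNaA "
        "5’-ACAATGTGGATCGCTTCCC-3’ and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8).")
hits = find_primers(text)
for hit in hits:
    print(hit["sequence"])
```

Keeping a short prefix and suffix lets a reader or a machine relocate the match in the source paper, which is the idea behind the annotation structure shown on the slide.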
CMine structure
plugin
option
DNA double stranded fragment
Wikimedia CC-BY-SA
Commonest species in 120 Zika papers
423 Ae./Aedes aegypti
333 Ae./Aedes albopictus
63 Ae. bromeliae
58 Ae. lilii
46 Ae. hensilli
42 Glossina pallidipes
40 Plasmodium vivax
35 Ae. luteocephalus
28 Ae. vittatus
25 Ae. furcifer
22 Plasmodium falciparum
21 Drosophila melanogaster
pre="fever (DHF), are caused by the world's most prevalent mosquito-borne virus.
37 DENV is carried by " exact="Aedes aegypti" post=" mosquito, which is strongly
affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
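Counts such as "commonest species" can be derived by tallying the `exact` attribute over many such snippets. In the sketch below only the attribute names (`pre`, `exact`, `post`, `name`) come from the slide; the `results` / `result` element names and the second and third sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: snippets are <result .../> elements inside one root element.
# Only the first entry echoes the slide; the other two are invented examples.
SAMPLE = """<results>
  <result pre="DENV is carried by " exact="Aedes aegypti" post=" mosquito, which is strongly" name="binomial"/>
  <result pre="transmitted by " exact="Aedes albopictus" post=" in some areas" name="binomial"/>
  <result pre="larvae of " exact="Aedes aegypti" post=" were collected" name="binomial"/>
</results>"""

def count_exact(xml_text):
    """Tally the matched strings (the `exact` attribute) in a results document."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("exact") for el in root.iter("result"))

counts = count_exact(SAMPLE)
print(counts.most_common())
```

Summing these per-paper counters over a whole download would give a corpus-level ranking of the kind shown on the slide.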
183 Wolbachia
70 Aedes
69 Flavivirus/Flaviviridae
30 Glossina
17 Culex
Commonest genera in Zika papers
pre="…-negative endosymbiotic bacterium, is a promising tool against diseases
transmitted by mosquitoes. " exact="Wolbachia" post=" can be found worldwide in
numerous arthropod species. More than 65% of all insect species are natu…"
Wolbachia in insect cell
Wikimedia CC-BY-SA
38 ITS
20 MHC2TA
19 COI
14 CYPJ92
5 CYP6BB2
4 CYP9J28
3 MHC
Commonest genes in 120 Zika papers
• microcephaly 400/2400 papers; 2 mins;
commonest genes:
203 MCPH1
86 MECP2
54 SOX2
49 E2F1
47 SNAP29
40 IKBKG
40 NDE1
N-terminal domain of microcephalin
Wikimedia CC-BY-SA
Systematic Reviews
Researchers and their machines need to “read”
hundreds of papers a day or even more.
Polly has 20 seconds to read this paper…
…and 10,000 more
ContentMine software can do this in a few minutes
Polly: “there were 10,000 abstracts and due
to time pressures, we split this between 6
researchers. It took about 2-3 days of work
(working only on this) to get through
~1,600 papers each. So, at a minimum this
equates to 12 days of full-time work (and
would normally be done over several weeks
under normal time pressures).”
400,000 Clinical Trials
In 10 government registries
Mapping trials => papers
http://www.trialsjournal.com/content/16/1/80
2009 => 2015. What’s
happened in last 6 years??
Search the whole scientific literature
For “2009-0100068-41”
Extracting scientific information
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
http://chemicaltagger.ch.cam.ac.uk/
• Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are >
3,000,000 reactions/year. Added value > 1B Eur.
Facts in context
daily IUCN endangered species news
en.wikipedia.org CC BY-SA
ContentMine Fact of The Day
• Fact of the day
• Endangered species in recent science
• Facts
• Bubbles
https://en.wikipedia.org/wiki/Tree_of_life CC BY-SA
“Root”
4500 papers each
with 1 tree
OCR (Tesseract)
Norma (imageanalysis)
(((((Pyramidobacter_piscolens:195,Jonquetella_anthropi:135):86,Synergistes_jonesii:301):131,Thermotoga_maritime:357):12,(Mycobacterium_tuberculosis:223,Bifidobacterium_longum:333):158):10,((Optiutus_terrae:441,(((Borrelia_burgdorferi:…202):91):22):32,(Proprinogenum_modestus:124,Fusobacterium_nucleatum:167):217):11):9);
Semantic re-usable/computable output (ca 4 secs/image)
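The extracted tree is in Newick notation, so it is directly computable. The sketch below is a minimal recursive parser, not Norma's code, and assumes well-formed input ending in ";". Because the string on the slide contains an elision, the example uses only its first three taxa, closed off into a complete tree.

```python
def parse_newick(s):
    """Parse a Newick string into nested (name, length, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        start = pos
        while s[pos] not in ":,();":
            pos += 1
        name = s[start:pos]               # empty for unnamed internal nodes
        length = None
        if s[pos] == ":":
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return name, length, children

    return node()

def leaf_names(tree):
    """Collect the names of all leaves (taxa) in left-to-right order."""
    name, _length, children = tree
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]

# Fragment of the slide's tree (first three taxa only), closed into a valid tree.
fragment = "((Pyramidobacter_piscolens:195,Jonquetella_anthropi:135):86,Synergistes_jonesii:301);"
tree = parse_newick(fragment)
print(leaf_names(tree))
```

Once trees are in this form, taxa from many papers can be compared and merged, which is the step that leads to a supertree.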
Supertree for 924 species
Tree
Supertree created from 4300 papers
Socio-politico-legal
• TDM is one of the most complex, uncertain,
confrontational and political areas of human
endeavour.
Copyright and Mining
• PMR-premise: You cannot do reproducible
scientific mining and avoid violating copyright.
• UK (“Hargreaves”) 2014 legislation:
– “personal” “non-commercial*” “research” “data
analytics”
– legitimizes copying (?to disk), but not publishing
*teaching, textbooks, etc. may be “commercial”
STM Publishers prevent Mining
• FUD & disinformation about legality (Elsevier)
• Monopolies on infrastructure (“API”s, CCC
Rightfind)
• Technical obstruction (Wiley Captcha,
Macmillan Readcube)
• Restrictive contracts with libraries (ALL) [1]
• Wasting my/our time (ALL)
[1] [You may not] utilize the TDM Output to enhance … subject repositories
in a way that would [… ] have the potential to substitute and/or replicate
any other existing Elsevier products, services and/or solutions.
WILEY … “new security feature… to prevent systematic download of content”
“[limit of] 100 papers per day”
“essential security feature … to protect both parties (sic)”
CAPTCHA
User has to type words
CM Future
• Hypothes.is uses ContentMine results for annotation
• (with Cambridge Univ Library) extracting daily
scientific facts from open and closed literature.
• Working with EBI, Cochrane Collaboration, JISC, OKF, LIBER,
TGAC/John Innes, DNADigest.
• Running workshops, hackdays.
• Planned outreach: MEPs, EC, Slashdot, Reddit,
Kickstarter, geekdom
• http://contentmine.org (OpenLock non-profit)
ContentMine working with Libraries
• Cambridge: Library, Plant Sciences, Epidemiology,
Chemistry
• Cochrane Collaboration on Systematic Reviews of
Clinical Trials
• FutureTDM (H2020, LIBER)
• Running workshops and training
• Offers services for information extraction and
indexing for born-digital documents.
Tractable Open Repositories
• CORE
• OpenAIRE
• arXiv
• HAL
“The Right to Read is the Right to Mine” (Peter Murray-Rust, 2011)
http://contentmine.org

More Related Content

What's hot

Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Jisc
 
Introduction to open science
Introduction to open scienceIntroduction to open science
Introduction to open scienceReme Melero
 
Open Science at the European Commission
Open Science at the European CommissionOpen Science at the European Commission
Open Science at the European CommissionCarl-Christian Buhr
 
Think Big about Data: Archaeology and the Big Data Challenge
Think Big about Data: Archaeology and the Big Data ChallengeThink Big about Data: Archaeology and the Big Data Challenge
Think Big about Data: Archaeology and the Big Data Challengeariadnenetwork
 
LEARN Final Conference: Tutorial Group | How To Engage Early Career Researchers
LEARN Final Conference: Tutorial Group | How To Engage Early Career ResearchersLEARN Final Conference: Tutorial Group | How To Engage Early Career Researchers
LEARN Final Conference: Tutorial Group | How To Engage Early Career ResearchersLEARN Project
 
Multimedia-2016_Brochure
Multimedia-2016_BrochureMultimedia-2016_Brochure
Multimedia-2016_BrochureGracy Jones
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationJisc
 
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeScott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeGigaScience, BGI Hong Kong
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Open Book Publishers, Rupert Gatti
Open Book Publishers, Rupert GattiOpen Book Publishers, Rupert Gatti
Open Book Publishers, Rupert GattiOAbooks
 
Introduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisIntroduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisEDINA, University of Edinburgh
 
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3m
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3mResearch Data in an Open Science World - Prof. Dr. Eva Mendez, uc3m
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3mLEARN Project
 
Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for librariesLEARN Project
 
EPSRC research data expectations and research software management
EPSRC research data expectations and research software managementEPSRC research data expectations and research software management
EPSRC research data expectations and research software managementHistoric Environment Scotland
 
Data, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileData, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileLEARN Project
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...LEARN Project
 

What's hot (20)

Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014Why science needs open data – Jisc and CNI conference 10 July 2014
Why science needs open data – Jisc and CNI conference 10 July 2014
 
Introduction to open science
Introduction to open scienceIntroduction to open science
Introduction to open science
 
Open Science at the European Commission
Open Science at the European CommissionOpen Science at the European Commission
Open Science at the European Commission
 
Think Big about Data: Archaeology and the Big Data Challenge
Think Big about Data: Archaeology and the Big Data ChallengeThink Big about Data: Archaeology and the Big Data Challenge
Think Big about Data: Archaeology and the Big Data Challenge
 
Open Science
Open ScienceOpen Science
Open Science
 
LEARN Final Conference: Tutorial Group | How To Engage Early Career Researchers
LEARN Final Conference: Tutorial Group | How To Engage Early Career ResearchersLEARN Final Conference: Tutorial Group | How To Engage Early Career Researchers
LEARN Final Conference: Tutorial Group | How To Engage Early Career Researchers
 
Multimedia-2016_Brochure
Multimedia-2016_BrochureMultimedia-2016_Brochure
Multimedia-2016_Brochure
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communication
 
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challengeScott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
Scott Edmunds at OASP Asia: Open (and Big) Data – the next challenge
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?
 
Open Book Publishers, Rupert Gatti
Open Book Publishers, Rupert GattiOpen Book Publishers, Rupert Gatti
Open Book Publishers, Rupert Gatti
 
Introduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data AnalysisIntroduction to data and support services for Political Data Analysis
Introduction to data and support services for Political Data Analysis
 
Research Data MANTRA Project at Edinburgh
Research Data MANTRA Project at EdinburghResearch Data MANTRA Project at Edinburgh
Research Data MANTRA Project at Edinburgh
 
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3m
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3mResearch Data in an Open Science World - Prof. Dr. Eva Mendez, uc3m
Research Data in an Open Science World - Prof. Dr. Eva Mendez, uc3m
 
Data management: The new frontier for libraries
Data management: The new frontier for librariesData management: The new frontier for libraries
Data management: The new frontier for libraries
 
EPSRC research data expectations and research software management
EPSRC research data expectations and research software managementEPSRC research data expectations and research software management
EPSRC research data expectations and research software management
 
Data, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of ChileData, Science, Society - Claudio Gutierrez, University of Chile
Data, Science, Society - Claudio Gutierrez, University of Chile
 
Research Data Management: Policy Development
Research Data Management: Policy DevelopmentResearch Data Management: Policy Development
Research Data Management: Policy Development
 
Ppls mvm2
Ppls mvm2Ppls mvm2
Ppls mvm2
 
How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...How can we ensure research data is re-usable? The role of Publishers in Resea...
How can we ensure research data is re-usable? The role of Publishers in Resea...
 

Viewers also liked

New emerging assistive technologies - Jisc Digifest 2016
New emerging assistive technologies - Jisc Digifest 2016New emerging assistive technologies - Jisc Digifest 2016
New emerging assistive technologies - Jisc Digifest 2016Jisc
 
What is the impact of cloud computing? - Jisc Digifest 2016
What is the impact of cloud computing? - Jisc Digifest 2016What is the impact of cloud computing? - Jisc Digifest 2016
What is the impact of cloud computing? - Jisc Digifest 2016Jisc
 
Introduction to data and text mining - Jisc Digifest 2016
Introduction to data and text mining - Jisc Digifest 2016Introduction to data and text mining - Jisc Digifest 2016
Introduction to data and text mining - Jisc Digifest 2016Jisc
 
Box of Broadcasts - enhance learning with TV and radio content
Box of Broadcasts - enhance learning with TV and radio contentBox of Broadcasts - enhance learning with TV and radio content
Box of Broadcasts - enhance learning with TV and radio contentJisc
 
Transforming assessment and feedback with technology - Jisc Digifest 2016
Transforming assessment and feedback with technology - Jisc Digifest 2016Transforming assessment and feedback with technology - Jisc Digifest 2016
Transforming assessment and feedback with technology - Jisc Digifest 2016Jisc
 
Beacon technology in education (Pervasive Networks)
Beacon technology in education (Pervasive Networks)Beacon technology in education (Pervasive Networks)
Beacon technology in education (Pervasive Networks)Jisc
 
Figshare for institutions - Jisc Digifest 2016
Figshare for institutions - Jisc Digifest 2016Figshare for institutions - Jisc Digifest 2016
Figshare for institutions - Jisc Digifest 2016Jisc
 
Making sense of open scholarly communications data - Jisc Digifest 2016
Making sense of open scholarly communications data - Jisc Digifest 2016Making sense of open scholarly communications data - Jisc Digifest 2016
Making sense of open scholarly communications data - Jisc Digifest 2016Jisc
 
The future of cloud computing - Jisc Digifest 2016
The future of cloud computing - Jisc Digifest 2016The future of cloud computing - Jisc Digifest 2016
The future of cloud computing - Jisc Digifest 2016Jisc
 
Benefits and efficiencies with Vscene - Jisc Digifest 2016
Benefits and efficiencies with Vscene - Jisc Digifest 2016Benefits and efficiencies with Vscene - Jisc Digifest 2016
Benefits and efficiencies with Vscene - Jisc Digifest 2016Jisc
 
Introducing the open citation experiment - Jisc Digifest 2016
Introducing the open citation experiment - Jisc Digifest 2016Introducing the open citation experiment - Jisc Digifest 2016
Introducing the open citation experiment - Jisc Digifest 2016Jisc
 
The evolution of FELTAG - Jisc Digifest 2016
The evolution of FELTAG - Jisc Digifest 2016The evolution of FELTAG - Jisc Digifest 2016
The evolution of FELTAG - Jisc Digifest 2016Jisc
 
Universities as e-textbook publishers - Jisc Digifest 2016
Universities as e-textbook publishers - Jisc Digifest 2016Universities as e-textbook publishers - Jisc Digifest 2016
Universities as e-textbook publishers - Jisc Digifest 2016Jisc
 
Introducing the IRUSdataUK pilot - Jisc Digifest 2016
Introducing the IRUSdataUK pilot - Jisc Digifest 2016Introducing the IRUSdataUK pilot - Jisc Digifest 2016
Introducing the IRUSdataUK pilot - Jisc Digifest 2016Jisc
 
Responsible metrics for research - Jisc Digifest 2016
Responsible metrics for research - Jisc Digifest 2016Responsible metrics for research - Jisc Digifest 2016
Responsible metrics for research - Jisc Digifest 2016Jisc
 
Using OA policy schema
Using OA policy schema Using OA policy schema
Using OA policy schema Jisc
 
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...Jisc
 
Build your own university app in under an hour - Jisc Digifest 2016
Build your own university app in under an hour - Jisc Digifest 2016Build your own university app in under an hour - Jisc Digifest 2016
Build your own university app in under an hour - Jisc Digifest 2016Jisc
 
Link into your professional network - Jisc Digifest 2016
Link into your professional network - Jisc Digifest 2016Link into your professional network - Jisc Digifest 2016
Link into your professional network - Jisc Digifest 2016Jisc
 
Getting ready for learning analytics - Jisc Digifest 2016
Getting ready for learning analytics - Jisc Digifest 2016Getting ready for learning analytics - Jisc Digifest 2016
Getting ready for learning analytics - Jisc Digifest 2016Jisc
 

Viewers also liked (20)

New emerging assistive technologies - Jisc Digifest 2016
New emerging assistive technologies - Jisc Digifest 2016New emerging assistive technologies - Jisc Digifest 2016
New emerging assistive technologies - Jisc Digifest 2016
 
What is the impact of cloud computing? - Jisc Digifest 2016
What is the impact of cloud computing? - Jisc Digifest 2016What is the impact of cloud computing? - Jisc Digifest 2016
What is the impact of cloud computing? - Jisc Digifest 2016
 
Introduction to data and text mining - Jisc Digifest 2016
Introduction to data and text mining - Jisc Digifest 2016Introduction to data and text mining - Jisc Digifest 2016
Introduction to data and text mining - Jisc Digifest 2016
 
Box of Broadcasts - enhance learning with TV and radio content
Box of Broadcasts - enhance learning with TV and radio contentBox of Broadcasts - enhance learning with TV and radio content
Box of Broadcasts - enhance learning with TV and radio content
 
Transforming assessment and feedback with technology - Jisc Digifest 2016
Transforming assessment and feedback with technology - Jisc Digifest 2016Transforming assessment and feedback with technology - Jisc Digifest 2016
Transforming assessment and feedback with technology - Jisc Digifest 2016
 
Beacon technology in education (Pervasive Networks)
Beacon technology in education (Pervasive Networks)Beacon technology in education (Pervasive Networks)
Beacon technology in education (Pervasive Networks)
 
Figshare for institutions - Jisc Digifest 2016
Figshare for institutions - Jisc Digifest 2016Figshare for institutions - Jisc Digifest 2016
Figshare for institutions - Jisc Digifest 2016
 
Making sense of open scholarly communications data - Jisc Digifest 2016
Making sense of open scholarly communications data - Jisc Digifest 2016Making sense of open scholarly communications data - Jisc Digifest 2016
Making sense of open scholarly communications data - Jisc Digifest 2016
 
The future of cloud computing - Jisc Digifest 2016
The future of cloud computing - Jisc Digifest 2016The future of cloud computing - Jisc Digifest 2016
The future of cloud computing - Jisc Digifest 2016
 
Benefits and efficiencies with Vscene - Jisc Digifest 2016
Benefits and efficiencies with Vscene - Jisc Digifest 2016Benefits and efficiencies with Vscene - Jisc Digifest 2016
Benefits and efficiencies with Vscene - Jisc Digifest 2016
 
Introducing the open citation experiment - Jisc Digifest 2016
Introducing the open citation experiment - Jisc Digifest 2016Introducing the open citation experiment - Jisc Digifest 2016
Introducing the open citation experiment - Jisc Digifest 2016
 
The evolution of FELTAG - Jisc Digifest 2016
The evolution of FELTAG - Jisc Digifest 2016The evolution of FELTAG - Jisc Digifest 2016
The evolution of FELTAG - Jisc Digifest 2016
 
Universities as e-textbook publishers - Jisc Digifest 2016
Universities as e-textbook publishers - Jisc Digifest 2016Universities as e-textbook publishers - Jisc Digifest 2016
Universities as e-textbook publishers - Jisc Digifest 2016
 
Introducing the IRUSdataUK pilot - Jisc Digifest 2016
Introducing the IRUSdataUK pilot - Jisc Digifest 2016Introducing the IRUSdataUK pilot - Jisc Digifest 2016
Introducing the IRUSdataUK pilot - Jisc Digifest 2016
 
Responsible metrics for research - Jisc Digifest 2016
Responsible metrics for research - Jisc Digifest 2016Responsible metrics for research - Jisc Digifest 2016
Responsible metrics for research - Jisc Digifest 2016
 
Using OA policy schema
Using OA policy schema Using OA policy schema
Using OA policy schema
 
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...
Enhancing teaching and learning in FE with TV and radio content - Jisc Digife...
 
Build your own university app in under an hour - Jisc Digifest 2016
Build your own university app in under an hour - Jisc Digifest 2016Build your own university app in under an hour - Jisc Digifest 2016
Build your own university app in under an hour - Jisc Digifest 2016
 
Link into your professional network - Jisc Digifest 2016
Link into your professional network - Jisc Digifest 2016Link into your professional network - Jisc Digifest 2016
Link into your professional network - Jisc Digifest 2016
 
Getting ready for learning analytics - Jisc Digifest 2016
Getting ready for learning analytics - Jisc Digifest 2016Getting ready for learning analytics - Jisc Digifest 2016
Getting ready for learning analytics - Jisc Digifest 2016
 

Similar to Liberating facts from the scientific literature - Jisc Digifest 2016

Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDMpetermurrayrust
 
Content Mining of Science and Medicine
Content Mining of Science and MedicineContent Mining of Science and Medicine
Content Mining of Science and MedicineTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic BiologyTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literaturepetermurrayrust
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureAutomatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literatureTheContentMine
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research dataHerbert Gruttemeier
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trialspetermurrayrust
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical TrialsTheContentMine
 
Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Jamie Bisset
 
ContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKContentMining for France and Europe; Lessons from 2 years in UK
ContentMining for France and Europe; Lessons from 2 years in UKpetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literaturepetermurrayrust
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature TheContentMine
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search petermurrayrust
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementJamie Bisset
 

Liberating facts from the scientific literature - Jisc Digifest 2016

  • 1. Content Mining (TDM). Peter Murray-Rust, ContentMine.org and University of Cambridge. JISC Digifest, Birmingham, UK, 2016-03-02. Invited and sponsored by JISC. F/OSS tools from contentmine.org. Images from Wikimedia CC-BY-SA
  • 2. The Right to Read is the Right to Mine* (*Peter Murray-Rust, 2011) http://contentmine.org
  • 3. Overview • Open Semistructured Documents are the most exciting underutilised knowledge resource – Scholarly literature – Theses – Clinical trials – Government and NGO publications – Product information … • Content Mining can make huge contributions. • EuropePubMedCentral(*) is the world’s best place to start. • Socio-politico-legal aspects cannot be ignored. • (*) Wellcome Trust, RCUK, FWF (Austria), Cancer Research UK, NHS UK ….
  • 4. Mining strategy • Discover, negotiate permissions => bibliography • Crawl / Scrape (download) documents AND supplemental • Normalize: PDF => XML • Index: facets => Facts and snippets (“entities”) • Interpret/analyze entities => relationships, aggregations (“Transformative”) • Publish
  • 5. [Diagram] CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature. Labels: catalogue; getpapers (query); Daily Crawl (EPMC, arXiv, CORE, HAL, UNIV repos); ToC services; formats (PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS); URLs, DOIs; crawl; quickscrape; norma (Normalizer, Structurer, Semantic Tagger); Text, Data, Figures; ami; UNIV Repos; search; Lookup; CONTENT MINING; Chem, Phylo, Trials, Crystal, Plants; COMMUNITY plugins; Visualization and Analysis; Publisher Sites (PloSONE, BMC, peerJ…, Nature, IEEE, Elsevier…); scrapers, queries, taggers; abstract, methods, references; Captioned Figures (Fig. 1); HTML tables; 30,000 pages/day; Semantic ScholarlyHTML; Facts.
  • 6. Want to know about Zika? Just Type: ZIKA!
  • 7. Semantic Fulltext • EuropePMC: coherent Open Access • getpapers: query, download (through API). • AMI filters, checks[1], transforms facts in papers. • sequences, species, genera, genes, dictionaries. [0] All operations shown run in a total of <3 minutes. [1] Dictionaries and lookup. [2] Usable from home by anyone. Image: Zika endemic areas, Wikimedia CC-BY-SA
  • 8. Download all Open Access “Zika” from EuropePMC in 10 seconds (click below for movie) Aedes aegypti, Wikimedia CC-BY-SA Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  • 9. Downloaded all Open Access “Zika” from EuropePMC in 10 seconds Final download screen
  • 10. Eyeballing 20/120 Zika papers, click below for movie Yellow Fever Virus Wikimedia CC-BY-SA Note: movie of this and other slides can be seen at https://vimeo.com/154705161
  • 11. Commonest words in 120 Zika papers: 3011 virus; 1939 Ae./Aedes; 1212 dengue; 901 mosquito/es; 894 species; 791 ZIKV; 721 using; 716 DENV; 567 detection; 513 aegypti; 484 infection; 442 RNA; 428 protein; 401 albopictus; 360 viral. Image: Mosquito spp., Wikimedia CC-BY-SA
  • 12. Filtering local files for sequence and viruses AMI (part of ContentMine software) (click below for movie) Note: movies of this and other slides can be seen at https://vimeo.com/154705161
  • 13. DNA Primers in running text: “…the sodium channel voltage dependent gene (Nav). Primers used to amplify this fragment were AaNaA 5’-ACAATGTGGATCGCTTCCC-3’ and AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8). The primers amplify a fragment of approximately 472…” Snippet (quotable under 2014 UK Statutory Instrument (“Hargreaves”)): ~/PMC4654492/results/sequence/dnaprimer/results.xml. W3C Annotation: [PREFIX] [MATCH] (link to target) [SUFFIX]. CMine structure plugin option. Image: DNA double stranded fragment, Wikimedia CC-BY-SA
  • 14. Commonest species in 120 Zika papers: 423 Ae./Aedes aegypti; 333 Ae./Aedes albopictus; 63 Ae. bromeliae; 58 Ae. lilii; 46 Ae. hensilli; 42 Glossina pallidipes; 40 Plasmodium vivax; 35 Ae. luteocephalus; 28 Ae. vittatus; 25 Ae. furcifer; 22 Plasmodium falciparum; 21 Drosophila melanogaster. Example snippet: pre="fever (DHF), are caused by the world's most prevalent mosquito-borne virus. 37 DENV is carried by " exact="Aedes aegypti" post=" mosquito, which is strongly affected by ecological and human drivers, but also influenced by clima" name="binomial"/>
  • 15. Commonest genera in Zika papers: 183 Wolbachia; 70 Aedes; 69 Flavivirus/Flaviviridae; 30 Glossina; 17 Culex. Example snippet: pre="…-negative endosymbiotic bacterium, is a promising tool against diseases transmitted by mosquitoes. " exact="Wolbachia" post=" can be found worldwide in numerous arthropod species. More than 65% of all insect species are natu…" Image: Wolbachia in insect cell, Wikimedia CC-BY-SA
  • 16. Commonest genes in 120 Zika papers: 38 ITS; 20 MHC2TA; 19 COI; 14 CYPJ92; 5 CYP6BB2; 4 CYP9J28; 3 MHC
  • 17. Microcephaly: 400/2400 papers; 2 mins; commonest genes: 203 MCPH1; 86 MECP2; 54 SOX2; 49 E2F1; 47 SNAP29; 40 IKBKG; 40 NDE1. Image: N-terminal domain of microcephalin, Wikimedia CC-BY-SA
  • 18. Systematic Reviews Researchers and their machines need to “read” hundreds of papers a day or even more.
  • 19. Polly has 20 seconds to read this paper… …and 10,000 more
  • 20. ContentMine software can do this in a few minutes Polly: “there were 10,000 abstracts and due to time pressures, we split this between 6 researchers. It took about 2-3 days of work (working only on this) to get through ~1,600 papers each. So, at a minimum this equates to 12 days of full-time work (and would normally be done over several weeks under normal time pressures).”
  • 21. 400,000 Clinical Trials In 10 government registries Mapping trials => papers http://www.trialsjournal.com/content/16/1/80 2009 => 2015. What’s happened in last 6 years?? Search the whole scientific literature For “2009-0100068-41”
  • 23. Mining strategy • Discover, negotiate permissions => bibliography • Crawl / Scrape (download) documents AND supplemental • Normalize: PDF => XML • Index: facets => Facts and snippets (“entities”) • Interpret/analyze entities => relationships, aggregations (“Transformative”) • Publish
  • 25. [Diagram] CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature. Labels: catalogue; getpapers (query); Daily Crawl (EuPMC, arXiv, CORE, HAL, UNIV repos); ToC services; formats (PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS); URLs, DOIs; crawl; quickscrape; norma (Normalizer, Structurer, Semantic Tagger); Text, Data, Figures; ami; UNIV Repos; search; Lookup; CONTENT MINING; Chem, Phylo, Trials, Crystal, Plants; COMMUNITY plugins; Visualization and Analysis; Publisher Sites (PloSONE, BMC, peerJ…, Nature, IEEE, Elsevier…); scrapers, queries, taggers; abstract, methods, references; Captioned Figures (Fig. 1); HTML tables; 30,000 pages/day; Semantic ScholarlyHTML; Facts.
  • 27. Open Content Mining of FACTs Machines can interpret chemical reactions We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
  • 28. Facts in context daily IUCN endangered species news en.wikipedia.org CC By-SA
  • 29. ContentMine Fact of The Day • Fact of the day • Endangered species in recent science • Facts • Bubbles
  • 33. Supertree for 924 species Tree
  • 34. Supertree created from 4300 papers
  • 35. Socio-politico-legal • TDM is one of the most complex, uncertain, confrontational and political areas of human endeavour.
  • 36. Copyright and Mining • PMR-premise: You cannot do reproducible scientific mining and avoid violating copyright. • UK (“Hargreaves”) 2014 legislation: – “personal” “non-commercial*” “research” “data analytics” – legitimizes copying (?to disk), but not publishing *teaching, textbooks, etc. may be “commercial”
  • 37. STM Publishers prevent Mining • FUD & disinformation about legality (Elsevier) • Monopolies on infrastructure (“API”s, CCC Rightfind) • Technical obstruction (Wiley Captcha, Macmillan Readcube) • Restrictive contracts with libraries (ALL) [1] • Wasting my/our time (ALL) [1] [You may not] utilize the TDM Output to enhance … subject repositories in a way that would [… ] have the potential to substitute and/or replicate any other existing Elsevier products, services and/or solutions.
  • 38. WILEY … “new security feature… to prevent systematic download of content” “[limit of] 100 papers per day” “essential security feature … to protect both parties (sic)” CAPTCHA: user has to type words
  • 39. ContentMine working with Libraries • Cambridge: Library, Plant Sciences, Epidemiology, Chemistry • Cochrane Collaboration on Systematic Reviews of Clinical Trials • FutureTDM (H2020, LIBER) • Running workshops and training
  • 40. CM Future • Hypothes.is use ContentMine results for annotation • (with Cambridge Univ Library) extracting daily scientific facts from open and closed literature. • with EBI, Cochrane Collaborations, JISC, OKF, LIBER, TGAC/JohnInnes, DNADigest. • Running workshops, hackdays. • Planned outreach: MEPs, EC, Slashdot, Reddit, Kickstarter, geekdom • http://contentmine.org (OpenLock non-profit)
  • 41. ContentMine working with Libraries • Cambridge: Library, Plant Sciences, Epidemiology, Chemistry • Cochrane Collaboration on Systematic Reviews of Clinical Trials • FutureTDM (H2020, LIBER) • Running workshops and training • Offers services for information extraction and indexing for born-digital documents.
  • 42. Tractable Open Repositories • CORE • OpenAIRE • arXiv • HAL
  • 43. The Right to Read is the Right to Mine* (*Peter Murray-Rust, 2011) http://contentmine.org
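
Slides 13 and 14 show facts reported as a match with its surrounding context (pre / exact / post). The following is a minimal Python sketch of that idea for DNA primers in running text. It is not AMI's implementation: the regular expression, the 40-character context window, the function names and the `result` element name are all assumptions made for illustration; only the attribute names and the sample sentence come from the slides.

```python
import re
from xml.sax.saxutils import quoteattr

# Assumed pattern (not AMI's own): a primer written as 5'-BASES-3',
# accepting straight or typographic apostrophes.
PRIMER = re.compile(r"5['’]-[ACGT]{10,}-3['’]")

# Sentence quoted on slide 13.
SAMPLE = (
    "the sodium channel voltage dependent gene (Nav). Primers used to amplify "
    "this fragment were AaNaA 5’-ACAATGTGGATCGCTTCCC-3’ and "
    "AaNaB 5’-TGGACAAAAGCAAGGCTAAG-3’(8). The primers amplify a fragment"
)


def primer_snippets(text, context=40):
    """Return one pre/exact/post dict per primer-like match in text."""
    return [
        {
            "pre": text[max(0, m.start() - context):m.start()],
            "exact": m.group(0),
            "post": text[m.end():m.end() + context],
        }
        for m in PRIMER.finditer(text)
    ]


def to_xml(snippet, name="dnaprimer"):
    """Serialise a snippet using the attribute names shown on slide 14."""
    # "result" is a placeholder element name; the slides show only attributes.
    return "<result pre={} exact={} post={} name={}/>".format(
        quoteattr(snippet["pre"]),
        quoteattr(snippet["exact"]),
        quoteattr(snippet["post"]),
        quoteattr(name),
    )


if __name__ == "__main__":
    for s in primer_snippets(SAMPLE):
        print(to_xml(s))
```

Keeping the context alongside the match is what makes each fact checkable by a human reader, and it mirrors the [PREFIX] [MATCH] [SUFFIX] shape named on slide 13.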

Editor's Notes

  1. Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to stress the importance of data in a specific format and its utility for automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor in the areas of validation and metabolism.