Defrosting the Digital Library A survey of bibliographic tools for the next generation Web Duncan Hull Faculty of Life Sciences (1992-6) BSc.  Computer Science (2002-2007) MSc, PhD.  Chemistry (2008-date) Postdoc
It’s all Casey’s fault! Dr. Casey Bergman, Lecturer  Faculty of Life Sciences I  s  Citeulike.org! http://ukpmc.ac.uk/
Defrosting the Digital Library (in one slide)
Metawhat? getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Journal: PLoS Computational Biology Tell me more? What is it about? Where did it come from?
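The getMetadata / getData pairing on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Publication` and `Metadata` classes and their method names are hypothetical, and the full-text bytes are a placeholder. The field values are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Metadata:
    """Descriptive record about a publication (fields taken from the slide)."""
    title: str
    authors: Tuple[str, ...]
    published: int
    journal: str


class Publication:
    """Hypothetical container that keeps data and metadata coupled together."""

    def __init__(self, data: bytes, metadata: Metadata) -> None:
        self._data = data
        self._metadata = metadata

    def get_data(self) -> bytes:
        # The content itself, e.g. the full text of the article.
        return self._data

    def get_metadata(self) -> Metadata:
        # Answers "What is it about? Where did it come from?"
        return self._metadata


paper = Publication(
    data=b"placeholder for the full text",  # assumption: stand-in content
    metadata=Metadata(
        title="defrosting the digital library",
        authors=("Duncan Hull", "Steve Pettifer", "Douglas Kell"),
        published=2008,
        journal="PLoS Computational Biology",
    ),
)

print(paper.get_metadata().title)
print(", ".join(paper.get_metadata().authors))
```

Because the object answers both calls, a tool never has to guess the title or authors from the content alone.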
Metadata in: Chemistry (Science of Matter) Biology (Science of Life) Informatics (Science of  Information) Cheminformatics Biochemistry Bioinformatics Science! www.mib.ac.uk nactem.ac.uk/refine www.citeulike.org
Representing Evidence For Interacting Network Elements (REFINE) www.sbml.org from www.biomodels.net database at the EBI.ac.uk
Example from Glycolysis in Yeast reactant reactant product product modifier This is just one reaction, there are at least another 1700+ in Yeast
Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/
Name: Synonyms
D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
For more info. www.nactem.ac.uk/refine   One of the biggest challenges is getting hold of accurate metadata from libraries and databases
But first…
getMetadata getData 6 million+ “units” sold worldwide to date: America, Europe, Middle East, Africa, Australasia Lots of data, metadata and money! Owner’s handbook Tell me more? What is it about?
Final solution: Web XSLT Print
Summary: Lessons from Ford DATA METADATA
 
BBC Spooks? Keeping an eye on people around the world since 1939 Winston Churchill “Big British Castle” (BBC)
I  hate powerpoint Radio MS Word TV
How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
Word:  Not  the best way to manage data and metadata
Getting Rid of Word database XML schema Web &  Intranet Printed documents XSLT
A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about  Thabo Mbeki Thabo Mbeki
Summary: Lessons from the BBC
How have libraries managed metadata? Image via http://en.wikipedia.org/wiki/Library_of_Alexandria
From  ~1824  until ~1989 Photos via dpicker  http://www.flickr.com/photos/dpicker/3107856991/  and pit yacker  http://www.flickr.com/photos/78825653@N00/131611136   JRULM (Main Library) Joule  Library Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
Data Tightly bound (literally) Rarely separated First published 1687, over 300 years old
Data and metadata were like this for centuries!
+ Tim Berners-Lee 1989
Timeline: Unchanged for centuries but… 20 years  ÷   2309 years  = <1%
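The proportion on this slide can be checked with a line of arithmetic. The sketch below uses the two figures exactly as given on the slide (20 years and 2309 years); the variable names are illustrative only.

```python
# Figures taken directly from the slide.
years_since_web = 20
years_of_libraries = 2309

share = years_since_web / years_of_libraries

# Express the ratio as a percentage; it comes out just under 1%.
print(f"{share:.2%}")
```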
Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
Digital Utopia? Alexander Griekspoor www.mekentosj.com
Welcome to Digital Dystopia
Isolated publication silos Chemistry Informatics Biology impersonal, isolated, unsociable, Generally rubbish
Identity Crisis part 1: Which publication?
Identity crisis part 2: Who are you? Who, who … who, who? Neil Smalheiser and Vetle Torvik Typo Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science. http://tinyurl.com/authorid
Identity crisis part 3: Mistaken Identity Dr. Duncan Hull Humble Postdoc Article about Authored-by Authored-by Wrong! “DNA mania” title http://tinyurl.com/mistakenid
Can’t get metadata (decoupled from data): PDF getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Tell me more Don’t know, Try google Don’t know,  Title might be  “ defrosting…” Where did this  come from?
Can’t get metadata (decoupled from data): PDF Why can't I manage academic papers like MP3s? http://tinyurl.com/mp3vpdf James Howison, Carnegie Mellon University Data is tightly coupled to its metadata getMetadata getData Artist: The Who Title: Who Are You? Recorded: 1978 Album: Who Are You
Can’t get metadata (decoupled from data): PDF Peter Murray-Rust Hamburger (unstructured data) PDF is a hamburger,  and we're trying to turn it  back into a cow.   http://tinyurl.com/pdfhamburger   Cow (structured data) publishing text-mining
Can’t get metadata (decoupled from data): HTTP
Can’t get metadata (decoupled from data): HTTP Tim Bray, Sun Microsystems One of the Web's distinguishing features is that there's a big gaping hole where the metadata ought to be. http://tinyurl.com/nometadata
I’ll stop moaning now
www.citeulike.org   Richard Cameron Kevin Emamy Picture from  http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy  and  http://www.citeulike.org/faq/faq.adp   The reason I wrote the site [citeulike.org] was, after recently coming back to academia,  I was slightly shocked by the quality of some of the tools available to help academics  do their job. I found it preferable to start writing proper tools for my own use than to use existing software.
Why should you care about citeulike?
All references in one place
Click Post to Citeulike
Tag it (optional)
Citeulike: Recoupling data and metadata
Citegeist = Citeulike + Zeitgeist
allegedly 2,243,177 ~2,000 /day variable 674,076 2,880 /day 2 papers / min Linear growth ~500,000
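The slide quotes both "2 papers / min" and "2,880 /day". Assuming these describe the same posting rate, the conversion is a quick check. The names below are illustrative, not taken from the talk.

```python
# Assumption: "2 papers / min" and "2,880 /day" on the slide are the same rate.
MINUTES_PER_DAY = 24 * 60

papers_per_minute = 2
papers_per_day = papers_per_minute * MINUTES_PER_DAY

print(papers_per_day)  # 2880
```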
Where will citeulike break?
Why should you bother with citeulike?
Casey Bergman story I was importing papers on solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
Many  different  solutions e.g.  Papyro:  Steve  Pettifer http://utopia.cs.manchester.ac.uk/
And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org Re-couple metadata that has been de-coupled from data www.2collab.com www.refworks.com “iTunes for PDF files”
There is still lots  more metadata How many times  has  http://pubmed.gov/19060304  been cited? Who has cited  http://pubmed.gov/19060304   ?  Give me all the references that cite this one Give me all the references cited by  http://pubmed.gov/19060304   Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Remember: Machines ask these questions, not just humans Notify me whenever Steve Pettifer publishes a paper Notify me whenever someone cites http://pubmed.gov/19060304   Impact factor?
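Several of the questions above ("who has cited this?", "what does it cite?", "how many times?") are lookups over a citation graph. The sketch below is a toy illustration only: the `example:` identifiers and the links between them are hypothetical placeholders, not real citation data, and the function names are made up for this example. Only the identifier 19060304 comes from the slide.

```python
# Hypothetical toy graph: paper id -> ids of the papers it cites.
# The "example:" entries and all links are invented purely for illustration.
cites = {
    "pmid:19060304": ["example:A", "example:B"],
    "example:C": ["pmid:19060304"],
    "example:D": ["pmid:19060304", "example:A"],
}


def references_of(paper_id):
    """Give me all the references cited by this paper."""
    return list(cites.get(paper_id, []))


def cited_by(paper_id):
    """Give me all the papers that cite this one."""
    return sorted(p for p, refs in cites.items() if paper_id in refs)


def citation_count(paper_id):
    """How many times has this paper been cited?"""
    return len(cited_by(paper_id))


print(references_of("pmid:19060304"))
print(cited_by("pmid:19060304"))
print(citation_count("pmid:19060304"))
```

A machine can only answer these questions if each paper has a stable identifier and its reference list is available as structured metadata, which is the point the slide is making.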
Digital Identity would solve  some  of these problems Give yourself a URI,  you deserve it! Tim Berners-Lee  http://www.w3.org/People/Berners-Lee/card#i see  http://dig.csail.mit.edu/breadcrumbs/node/71
URIs for Douglas Kell www.myopenid.com www.openid.net (Also note ResearcherID from Thomson)
Phil Bourne
Science is public knowledge http://tinyurl.com/publicknowledge
Conclusions: What hasn’t changed
Conclusions: Publication metadata matters
Conclusions: Scientists are too blasé about metadata!
Conclusions: Do us a favour!
Acknowledgements

More Related Content

What's hot

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century ResearchRoss Mounce
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data, data, data
Data, data, dataData, data, data
Data, data, dataandrewxhill
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataJose Emilio Labra Gayo
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframeKai Li
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesespetermurrayrust
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsMark Matienzo
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Paul Bradshaw
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science dataARDC
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internetdrgath
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 

What's hot (20)

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century Research
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | Future
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data, data, data
Data, data, dataData, data, data
Data, data, data
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open Data
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframe
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 
Unknown Unknowns
Unknown UnknownsUnknown Unknowns
Unknown Unknowns
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science data
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internet
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 

Similar to Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Dorothea Salo
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...Boris Adryan
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? Dr. Haxel Consult
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ NettabDuncan Hull
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machinespetermurrayrust
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxesta2310819
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time MachineGiovanni Colavizza
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsJeremy Frey
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7Scott Edmunds
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration LectureSUNY Oneonta
 

Similar to Defrosting the Digital Library: A survey of bibliographic tools for the next generation web (20)

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptx
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart Labs
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7
 
Web3uploaded
Web3uploadedWeb3uploaded
Web3uploaded
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration Lecture
 

More from Duncan Hull

Why study plants?
Why study plants?Why study plants?
Why study plants?Duncan Hull
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumDuncan Hull
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyDuncan Hull
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusDuncan Hull
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBIDuncan Hull
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09Duncan Hull
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenIDDuncan Hull
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible ScientistDuncan Hull
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging DangerouslyDuncan Hull
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upDuncan Hull
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information managementDuncan Hull
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureDuncan Hull
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and Duncan Hull
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your DataDuncan Hull
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Duncan Hull
 

More from Duncan Hull (20)

Why study plants?
Why study plants?Why study plants?
Why study plants?
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculum
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
 
OWL and OBO
OWL and OBOOWL and OBO
OWL and OBO
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBI
 
How to Blog
How to BlogHow to Blog
How to Blog
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenID
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible Scientist
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging Dangerously
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-up
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information management
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literature
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your Data
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?
 


Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

  • 1. Defrosting the Digital Library: A survey of bibliographic tools for the next generation Web. Duncan Hull. Faculty of Life Sciences (1992-6): BSc. Computer Science (2002-2007): MSc, PhD. Chemistry (2008-date): Postdoc.
  • 2. It’s all Casey’s fault! Dr. Casey Bergman, Lecturer Faculty of Life Sciences I s Citeulike.org! http://ukpmc.ac.uk/
  • 6. Metadata in: Chemistry (Science of Matter), Biology (Science of Life), Informatics (Science of Information); overlaps: Cheminformatics, Biochemistry, Bioinformatics. Science! www.mib.ac.uk, nactem.ac.uk/refine, www.citeulike.org
  • 7. REFINE: Representing Evidence For Interacting Network Elements. www.sbml.org from the www.biomodels.net database at EBI.ac.uk
  • 8. Example from glycolysis in yeast (diagram labels: reactant, reactant, product, product, modifier). This is just one reaction; there are at least another 1700 in yeast.
  • 9. Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/ (Name: Synonyms)
      Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
      ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
      Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
      ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
      D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
  • 10. Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
  • 11. For more info. www.nactem.ac.uk/refine One of the biggest challenges is getting hold of accurate metadata from libraries and databases
  • 14. Final solution: Web XSLT Print
  • 18. I hate powerpoint Radio MS Word TV
  • 19. How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
  • 20. Word: Not the best way to manage data and metadata
  • 21. Getting Rid of Word database XML schema Web & Intranet Printed documents XSLT
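
The pipeline on slide 21 (one structured source, several outputs) can be sketched roughly as follows. This is only an illustration: it uses Python's standard library in place of XSLT, and the element names in the record are assumptions rather than the schema used in the original project. The sample values come from the paper metadata quoted elsewhere in the slides.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record; the element names are assumptions, not the real schema.
RECORD_XML = """
<record>
  <title>Defrosting the Digital Library</title>
  <author>Duncan Hull</author>
  <author>Steve Pettifer</author>
  <author>Douglas Kell</author>
  <year>2008</year>
</record>
"""

def parse_record(xml_text):
    """Read the structured record once; every output is derived from it."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "authors": [a.text for a in root.findall("author")],
        "year": root.findtext("year"),
    }

def to_web(record):
    """Render the record as an HTML fragment for the Web or an intranet."""
    authors = ", ".join(escape(a) for a in record["authors"])
    return (f"<h1>{escape(record['title'])}</h1>"
            f"<p>{authors} ({escape(record['year'])})</p>")

def to_print(record):
    """Render the same record as plain text for a printed document."""
    return f"{record['title']}\n{', '.join(record['authors'])}, {record['year']}"

record = parse_record(RECORD_XML)
print(to_web(record))
print(to_print(record))
```

The point of the design is that the Web and print versions cannot drift apart, because neither is edited by hand; both are regenerated from the same record.
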
  • 22. A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about Thabo Mbeki Thabo Mbeki
  • 25. From ~1824 until ~1989 Photos via dpicker http://www.flickr.com/photos/dpicker/3107856991/ and pit yacker http://www.flickr.com/photos/78825653@N00/131611136 JRULM (Main Library) Joule Library Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
  • 29. Timeline: Unchanged for centuries but… 20 years ÷ 2309 years = <1%
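
The fraction on slide 29 can be checked directly. The two figures are taken from the slide; the variable names are only illustrative.

```python
years_digital = 20     # years since libraries started going digital (from the slide)
years_total = 2309     # total span of the timeline (from the slide)

share = years_digital / years_total
print(f"{share:.2%}")  # 0.87%, i.e. under 1%
```
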
  • 30. Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
  • 33. Isolated publication silos: Chemistry, Informatics, Biology. Impersonal, isolated, unsociable, generally rubbish.
  • 37. Can’t get metadata (decoupled from data): PDF. getMetadata, getData. Title: defrosting the digital library. Authors: Duncan Hull, Steve Pettifer and Douglas Kell. Published: 2008. Tell me more. Don’t know, try Google. Don’t know, title might be “defrosting…”. Where did this come from?
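
A minimal sketch of the getMetadata / getData distinction on this slide, assuming a hypothetical `Document` class (the class, its fields and the placeholder PDF bytes are illustrative, not part of any real library). A bare PDF carries the data but cannot answer questions about itself; a document that keeps its metadata attached can.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Document:
    """Hypothetical container: content plus whatever metadata travels with it."""
    data: bytes                                    # the content itself, e.g. PDF bytes
    metadata: dict = field(default_factory=dict)   # empty when decoupled from the data

    def get_data(self) -> bytes:
        return self.data

    def get_metadata(self, key: str) -> Optional[str]:
        # Returns None when the answer is "don't know".
        return self.metadata.get(key)

# A PDF downloaded on its own: data only, metadata lost.
bare_pdf = Document(data=b"%PDF-1.4 placeholder")

# The same content with its metadata still coupled to it.
coupled = Document(
    data=b"%PDF-1.4 placeholder",
    metadata={
        "title": "Defrosting the digital library",
        "authors": "Duncan Hull, Steve Pettifer and Douglas Kell",
        "published": "2008",
    },
)

print(bare_pdf.get_metadata("title"))   # None
print(coupled.get_metadata("title"))    # Defrosting the digital library
```
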
  • 39. Can’t get metadata (decoupled from data): PDF Peter Murray-Rust Hamburger (unstructured data) PDF is a hamburger, and we're trying to turn it back into a cow. http://tinyurl.com/pdfhamburger Cow (structured data) publishing text-mining
  • 43. www.citeulike.org Richard Cameron Kevin Emamy Picture from http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy and http://www.citeulike.org/faq/faq.adp The reason I wrote the site [citeulike.org] was, after recently coming back to academia, I was slightly shocked by the quality of some of the tools available to help academics do their job. I found it preferable to start writing proper tools for my own use than to use existing software.
  • 45. All references in one place
  • 46. Click Post to Citeulike
  • 49. Citegeist = Citeulike + Zeitgeist
  • 50. allegedly 2,243,177 ~2,000 /day variable 674,076 2,880 /day 2 papers / min Linear growth ~500,000
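
The conversion between two of the rates on slide 50 (2,880 per day and 2 papers per minute) is a one-line check. Which service each figure on the slide belongs to is not recoverable from the transcript, so the sketch only verifies the unit conversion.

```python
MINUTES_PER_DAY = 24 * 60            # 1440

papers_per_day = 2880                # figure quoted on the slide
papers_per_minute = papers_per_day / MINUTES_PER_DAY
print(papers_per_minute)             # 2.0
```
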
  • 53. Casey Bergman story: I was importing papers on solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a phd student's library in Brazil http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
  • 54. Many different solutions e.g. Papyro: Steve Pettifer http://utopia.cs.manchester.ac.uk/
  • 55. And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org www.2collab.com www.refworks.com Re-couple metadata that has been de-coupled from data: “iTunes for PDF files”
  • 56. There is still lots more metadata How many times has http://pubmed.gov/19060304 been cited? Who has cited http://pubmed.gov/19060304 ? Give me all the references that cite this one Give me all the references cited by http://pubmed.gov/19060304 Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Remember: Machines ask these questions, not just humans Notify me whenever Steve Pettifer publishes a paper Notify me whenever someone cites http://pubmed.gov/19060304 Impact factor?
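
Several of the questions on slide 56 become simple lookups once citation metadata is available in machine-readable form. The sketch below assumes a tiny in-memory list of (citing, cited) pairs; the paper identifiers and the data are made up for illustration and do not describe real citations. The h-index function follows the usual definition: the largest h such that h papers each have at least h citations.

```python
from collections import defaultdict

# Hypothetical (citing, cited) pairs; identifiers and data are invented.
CITATIONS = [
    ("paper:B", "paper:A"),
    ("paper:C", "paper:A"),
    ("paper:C", "paper:B"),
    ("paper:D", "paper:A"),
]

cited_by = defaultdict(set)     # paper -> papers that cite it
references = defaultdict(set)   # paper -> papers it cites
for citing, cited in CITATIONS:
    cited_by[cited].add(citing)
    references[citing].add(cited)

def citation_count(paper):
    """How many times has this paper been cited?"""
    return len(cited_by.get(paper, ()))

def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

print(citation_count("paper:A"))        # 3
print(sorted(cited_by["paper:A"]))      # who has cited paper:A
print(sorted(references["paper:C"]))    # what paper:C cites
print(h_index([10, 8, 5, 4, 3]))        # 4
```

The hard part in practice is not the lookup but obtaining accurate citation pairs and unambiguous author identities in the first place.
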
  • 57. Digital Identity would solve some of these problems Give yourself a URI, you deserve it! Tim Berners-Lee http://www.w3.org/People/Berners-Lee/card#i see http://dig.csail.mit.edu/breadcrumbs/node/71
  • 64. Conclusions: Do us a favour!