Comparing Published Scientific
Journal Articles
to Their Pre-print Versions
Martin Klein Peter Broadwell
@mart1nkle1n @peterbroadwell
with Sharon E. Farb and Todd Grappone
@farbthink, @liber8er
{martinklein,broadwell,farb,grappone}@library.ucla.edu
University of California Los Angeles
Comparing Published Scientific Journal Articles
to Their Pre-print Versions
@mart1nkle1n #jcdl2016, Newark, NJ, 06/21/2016
Scientific Output in Numbers
Global STM publishing market > $25 billion
• 55% of this from USA
• 28% from Europe, Middle East
• Journals core part of scholarly communication process
• English language journal revenue: ~ $10 billion
• ~ 70% of that out of libraries’ budget
• > 28k scholarly peer-reviewed journals (+3.5% p.a.)
• ~ 2.5 million articles per year (+3% p.a.)
• 21% of research papers from USA
“STM Report: An Overview of Scientific and Scholarly Publishing”, Mark Ware and Michael Mabe, March 2015
University of California Publication Impact
“Research Performance of the UC System,” Elsevier, March 2015
Open Access by Disciplines
“Open Access to the Scientific Journal Literature: Situation 2009”, Björk B-C et al. 2010
http://dx.doi.org/10.1371/journal.pone.0011273
Open Access Rate Overall
2010
“Open Access to the Scientific Journal Literature: Situation 2009”, Björk B-C et al.
(http://dx.doi.org/10.1371/journal.pone.0011273)
→ 20.4% OA rate
2015
“Open Access and Sources of Full-Text Articles in Google Scholar in Different
Subject Fields”, Jamali H. R. and Nabavi M.
(http://dx.doi.org/10.1007/s11192-015-1642-2)
→ 61.1% OA rate
Pre-print v. Final Published
arXiv.org
• Average annual operating cost for 2013 - 2017: $826,000
Final Published
• English language STM journals: $10 billion in 2013
http://arxiv.org/help/support/faq#3D
“STM Report: An Overview of Scientific and Scholarly Publishing”, Mark Ware and Michael Mabe, March 2015
Role of Publisher
• Entrepreneur
• Copyediting
• Tagging
• Marketer
• Distributor
• E-Host
Value of Publisher
“Once you’ve gone through the peer review process, if you look
at the article that is actually published in a journal, it looks
radically different [to the one submitted due to] that process of
transformation, the copy-editing, the database linking, the data
visualisation tools, making sure that the metadata for the article
is all right, so when people come to [Elsevier database]
ScienceDirect or type a search into Google, they can actually
find what they are looking for on their platforms.”
Gemma Hersh
http://www.thebookseller.com/news/elsevier-defends-its-value-after-open-access-disputes-328037
Working Assumptions
1. If the publishers’ argument is valid, the text of a
pre-print paper should vary significantly from its
corresponding post-print version.
2. By applying standard similarity measures, we
should be able to detect and quantify such
differences.
Assembling a pre-print corpus
Source: arXiv.org
• 1.1 million publication records
• Metadata (typical DC, including DOI) obtained
via OAI-PMH interface
• PDF versions of articles available via Amazon’s
S3 service (using “requester pays” option)
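The metadata harvesting step can be sketched roughly as follows. This is an illustration only, not the authors' code: the endpoint URL is an assumption about arXiv's OAI-PMH service, the sample record is invented, and the DOI pattern is a common heuristic rather than a full validator. The sketch builds a standard `ListRecords` request URL and pulls a DOI out of a Dublin Core record.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; check arXiv's current OAI-PMH documentation before use.
OAI_ENDPOINT = "http://export.arxiv.org/oai2"

DC_NS = "{http://purl.org/dc/elements/1.1/}"
# Heuristic DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def list_records_url(metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL from standard protocol arguments."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return OAI_ENDPOINT + "?" + urlencode(params)


def extract_doi(record_xml):
    """Return the first DOI found in a dc:identifier or dc:relation element, or None."""
    root = ET.fromstring(record_xml)
    for tag in ("identifier", "relation"):
        for element in root.iter(DC_NS + tag):
            match = DOI_PATTERN.search(element.text or "")
            if match:
                # DOIs are case-insensitive, so lower-casing eases later matching.
                return match.group(0).rstrip(".,;").lower()
    return None


# Invented record for illustration only; real arXiv records carry more fields.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>A hypothetical pre-print</dc:title>
  <dc:identifier>http://arxiv.org/abs/0000.00000</dc:identifier>
  <dc:identifier>doi:10.1234/Example.5678</dc:identifier>
</oai_dc:dc>"""

print(list_records_url(from_date="2016-01-01"))
print(extract_doi(SAMPLE_RECORD))
```

Records without any DOI-like identifier return `None` and would drop out of a DOI-based matching step.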
Finding a matching post-print corpus
1. Extract DOIs from arXiv metadata
• 44.5% of articles have a DOI
2. CrossRef’s Metadata Search API
• Match by DOI
• Download article & metadata in XML/PDF
→ Results in:
• 11,017 full text articles
• Majority published by Elsevier between 2003 and
2015
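The DOI-based matching above can be illustrated with a small in-memory join. This is a sketch under stated assumptions: the record dictionaries, their `id` and `doi` keys, and the sample values are all hypothetical, and no CrossRef call is made here.

```python
def normalize_doi(raw):
    """Lower-case a DOI and strip common prefixes so equal DOIs compare equal."""
    doi = raw.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def match_by_doi(preprints, published):
    """Pair pre-print and published records that share a DOI.

    Both arguments are lists of dicts with a "doi" key. Pre-prints without a
    DOI (None or empty) are skipped, since they cannot be matched this way.
    """
    published_by_doi = {normalize_doi(p["doi"]): p for p in published if p.get("doi")}
    pairs = []
    for pre in preprints:
        if not pre.get("doi"):
            continue
        post = published_by_doi.get(normalize_doi(pre["doi"]))
        if post is not None:
            pairs.append((pre, post))
    return pairs


# Invented sample records, for illustration only.
preprints = [
    {"id": "pre-1", "doi": "doi:10.1234/ABC.1"},
    {"id": "pre-2", "doi": None},
    {"id": "pre-3", "doi": "10.1234/xyz.9"},
]
published = [{"id": "pub-1", "doi": "http://dx.doi.org/10.1234/abc.1"}]

pairs = match_by_doi(preprints, published)
print([(pre["id"], post["id"]) for pre, post in pairs])
```

Normalizing before comparison matters because the same DOI can appear with different prefixes and letter case in the two metadata sources.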
Text Comparison Methods
1. Length ratio
2. Levenshtein ratio
3. Cosine similarity
4. Jaccard coefficient
5. Sorensen similarity
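The five measures listed above could be implemented along these lines. The slides do not give formulas, so each definition below is one common textbook variant and an assumption on our part: character-based length and Levenshtein ratios, and word-token-based cosine, Jaccard, and Sorensen scores, all scaled so that 1 means identical. The tokenizer and the sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def length_ratio(a, b):
    """Shorter length divided by longer length, in characters."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance, kept to two rows of memory."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Edit distance normalised by the longer string, flipped so 1 = identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def cosine_similarity(a, b):
    """Cosine of the angle between term-frequency vectors."""
    freq_a, freq_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm = (math.sqrt(sum(v * v for v in freq_a.values()))
            * math.sqrt(sum(v * v for v in freq_b.values())))
    return dot / norm if norm else 0.0


def jaccard_coefficient(a, b):
    """Shared distinct tokens divided by all distinct tokens."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def sorensen_similarity(a, b):
    """Twice the shared distinct tokens divided by the sum of both set sizes."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    total = len(set_a) + len(set_b)
    return 2 * len(set_a & set_b) / total if total else 1.0


# Hypothetical title pair, for illustration only.
pre = "Observation of a new particle"
post = "Observation of a new boson"
for measure in (length_ratio, levenshtein_ratio, cosine_similarity,
                jaccard_coefficient, sorensen_similarity):
    print(f"{measure.__name__}: {measure(pre, post):.3f}")
```

The pure-Python Levenshtein routine is quadratic in text length, so for full article bodies a compiled implementation would likely be needed.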
Comparison of Sections
“Analyzing News Events in Non-Traditional Digital Library Collections”, M. Klein, P. Broadwell, 2015
http://dx.doi.org/10.1145/2756406.2756948
Title Comparison
Explore our findings at http://sologlo.library.ucla.edu/prepost
[Figure: distribution of title similarity scores. x-axis: similarity (1 = most similar) in bins of 0.1, from 1 ... 0.9 down to 0.1 ... 0; y-axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
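The binning behind a chart like this one can be sketched as below. The bin edges follow the axis labels on the slide (ten bins of width 0.1, most similar first); how boundary values are assigned and the sample scores are assumptions made for illustration.

```python
def bin_labels(n_bins=10):
    """Labels ordered from most similar to least, e.g. '1 ... 0.9' first."""
    return [f"{(n_bins - i) / n_bins:g} ... {(n_bins - i - 1) / n_bins:g}"
            for i in range(n_bins)]


def bin_scores(scores, n_bins=10):
    """Count scores per bin, ordered from most similar to least similar."""
    counts = [0] * n_bins
    for score in scores:
        # A score of exactly 1.0 belongs to the top bin; min() keeps the index in range.
        index = min(int(score * n_bins), n_bins - 1)
        counts[n_bins - 1 - index] += 1
    return counts


def as_percentages(counts):
    """Convert per-bin counts to percentages of all papers."""
    total = sum(counts)
    return [100.0 * count / total if total else 0.0 for count in counts]


# Invented similarity scores for six hypothetical papers.
title_scores = [1.0, 0.95, 0.95, 0.85, 0.42, 0.05]
counts = bin_scores(title_scores)
for label, count, share in zip(bin_labels(), counts, as_percentages(counts)):
    print(f"{label:>11}  {count:>3}  {share:5.1f}%")
```

Running the same binning once per measure would yield one series per measure, matching the five series named in the figure.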
Abstract Comparison
[Figure: distribution of abstract similarity scores. x-axis: similarity (1 = most similar) in bins of 0.1, from 1 ... 0.9 down to 0.1 ... 0; y-axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
10.1016/j.physletb.2006.10.068
Physics Letters B
Body Comparison
[Figure: distribution of body similarity scores. x-axis: similarity (1 = most similar) in bins of 0.1, from 1 ... 0.9 down to 0.1 ... 0; y-axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
Publication Dates
[Figure: number of papers by number of days between the two publication dates, in bins 1-90, 91-180, 181-270, 271-360, 361-450, 451-540, 541-630, 631-720, >720; series: Pre-print first, Final published first]
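A chart like this could be produced by classifying each matched pair by which version appeared first and by how many days. The sketch below is an assumption about the procedure, not the authors' code: the 90-day bins mirror the axis labels, same-day pairs are skipped because the slide does not say how they were handled, and the dates are invented.

```python
from collections import Counter
from datetime import date

BIN_WIDTH = 90
LAST_BIN_END = 720


def day_bin(days):
    """Label the 90-day bin for a positive day count: 1-90, 91-180, ..., >720."""
    if days > LAST_BIN_END:
        return ">720"
    lower = ((days - 1) // BIN_WIDTH) * BIN_WIDTH + 1
    return f"{lower}-{lower + BIN_WIDTH - 1}"


def classify(preprint_date, published_date):
    """Return (which version came first, bin label), or None for same-day pairs."""
    delta = (published_date - preprint_date).days
    if delta == 0:
        return None
    first = "Pre-print first" if delta > 0 else "Final published first"
    return first, day_bin(abs(delta))


# Invented (pre-print date, published date) pairs, for illustration only.
pairs = [
    (date(2010, 1, 1), date(2010, 3, 1)),
    (date(2012, 6, 1), date(2012, 1, 1)),
    (date(2008, 1, 1), date(2011, 1, 1)),
]
tally = Counter(result for result in (classify(a, b) for a, b in pairs) if result)
for (first, label), count in sorted(tally.items()):
    print(first, label, count)
```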
Assembling a pre-print corpus
Source: arXiv.org
• 1.1 million publication records
• Metadata (typical DC, including DOI) obtained
via OAI-PMH interface
• PDF versions of articles available via Amazon’s
S3 service (using “requester pays” option)
• *Latest version used if multiple available*
• 35% of all arXiv papers have > 1 version
• 58% of our matched papers have > 1 version
• Repeat experiment with *earliest version*
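Choosing the latest or the earliest version per paper could look like the sketch below. It assumes identifiers that carry arXiv's `vN` version suffix; the sample identifiers and the helper names are hypothetical.

```python
import re

# Base identifier followed by a "v<number>" version suffix.
VERSIONED_ID = re.compile(r"^(?P<base>.+)v(?P<version>\d+)$")


def split_version(identifier):
    """Split a versioned identifier into (base identifier, version number)."""
    match = VERSIONED_ID.match(identifier)
    if match is None:
        raise ValueError(f"no version suffix in {identifier!r}")
    return match.group("base"), int(match.group("version"))


def pick_versions(identifiers):
    """Map each base identifier to its (earliest, latest) versioned identifier."""
    versions = {}
    for identifier in identifiers:
        base, number = split_version(identifier)
        versions.setdefault(base, []).append((number, identifier))
    # Versions are compared numerically, so v10 sorts after v2.
    return {base: (min(found)[1], max(found)[1]) for base, found in versions.items()}


# Invented identifiers, for illustration only.
ids = ["0000.00001v1", "0000.00001v2", "0000.00001v10", "0000.00002v1"]
chosen = pick_versions(ids)
multi = sum(1 for earliest, latest in chosen.values() if earliest != latest)
print(chosen)
print(f"{multi} of {len(chosen)} papers have > 1 version")
```

Running the comparison once with the latest entries and once with the earliest entries gives the two result sets needed for the repeat experiment.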
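Publication Dates of Earliest Versions
[Figure: number of papers by number of days between the two publication dates, using the earliest pre-print version, in bins 1-90, 91-180, 181-270, 271-360, 361-450, 451-540, 541-630, 631-720, >720; series: Pre-print first, Final published first]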
Title Deltas
[Figure: title deltas per similarity bin, from 1 ... 0.9 down to 0.1 ... 0; axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
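One way to compute such deltas is a per-bin difference between two histograms. The slides do not spell out the exact definition or sign convention, so the sketch below assumes "earliest-version count minus latest-version count" for each similarity bin; the counts are invented.

```python
def histogram_deltas(latest_counts, earliest_counts):
    """Per-bin difference: earliest-version count minus latest-version count."""
    if len(latest_counts) != len(earliest_counts):
        raise ValueError("both histograms need the same number of bins")
    return [early - late for late, early in zip(latest_counts, earliest_counts)]


# Invented per-bin paper counts (most similar bin first), for illustration only.
latest = [900, 50, 30, 20]
earliest = [700, 120, 100, 80]
deltas = histogram_deltas(latest, earliest)
print(deltas)
```

Under this convention a negative value in the top bin means fewer papers score in the most similar range when the earliest version is compared, and the deltas sum to zero when both runs cover the same set of papers.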
Abstract Deltas
[Figure: abstract deltas per similarity bin, from 1 ... 0.9 down to 0.1 ... 0; axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
Body Deltas
[Figure: body deltas per similarity bin, from 1 ... 0.9 down to 0.1 ... 0; axes: number of papers and % of all papers; series: Length, Levenshtein, Cosine, Sorensen, Jaccard]
Discussion & Future Work
• Single corpus experiment
• Pre-print/final published matches based on:
• DOIs
• CrossRef API results
• UCLA serial subscriptions (majority Elsevier
publications)
• Expand to other disciplines/publishers
• Overlay with ISI Impact factor and usage statistics
• Refine extraction/comparison of authors and
references
• Operate at scale
More Related Content

What's hot

A replication crisis in the making: how we reward unreliable science
A replication crisis in the making: how we reward unreliable scienceA replication crisis in the making: how we reward unreliable science
A replication crisis in the making: how we reward unreliable scienceBjörn Brembs
 
Bibliosight Project - JournalTOCs Workshop
Bibliosight Project - JournalTOCs WorkshopBibliosight Project - JournalTOCs Workshop
Bibliosight Project - JournalTOCs Workshopazami
 
Why canceling subscriptions may just yet save scholarship
Why canceling subscriptions may just yet save scholarshipWhy canceling subscriptions may just yet save scholarship
Why canceling subscriptions may just yet save scholarshipBjörn Brembs
 
Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebGillian Byrne
 
RPI Research in Linked Open Government Systems
RPI Research in Linked Open Government SystemsRPI Research in Linked Open Government Systems
RPI Research in Linked Open Government SystemsJames Hendler
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 
BIBFRAME : the future of cataloguing?
BIBFRAME : the future of cataloguing?BIBFRAME : the future of cataloguing?
BIBFRAME : the future of cataloguing?Thomas Meehan
 
Open Access NBIC Workshop April 19, 2011
Open Access NBIC Workshop April 19, 2011Open Access NBIC Workshop April 19, 2011
Open Access NBIC Workshop April 19, 2011Philip Bourne
 
How to build your own citation index
How to build your own citation indexHow to build your own citation index
How to build your own citation indexGESIS
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
Federated Search Falls Short
Federated Search Falls ShortFederated Search Falls Short
Federated Search Falls Shortslknight
 
Giving researchers credit for data
Giving researchers credit for dataGiving researchers credit for data
Giving researchers credit for dataJisc
 
Crossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref
 

What's hot (20)

A replication crisis in the making: how we reward unreliable science
A replication crisis in the making: how we reward unreliable scienceA replication crisis in the making: how we reward unreliable science
A replication crisis in the making: how we reward unreliable science
 
Bibliosight Project - JournalTOCs Workshop
Bibliosight Project - JournalTOCs WorkshopBibliosight Project - JournalTOCs Workshop
Bibliosight Project - JournalTOCs Workshop
 
Why canceling subscriptions may just yet save scholarship
Why canceling subscriptions may just yet save scholarshipWhy canceling subscriptions may just yet save scholarship
Why canceling subscriptions may just yet save scholarship
 
ER&L KBART Update
ER&L KBART UpdateER&L KBART Update
ER&L KBART Update
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
 
Forging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic WebForging New Links: Libraries in the Semantic Web
Forging New Links: Libraries in the Semantic Web
 
RPI Research in Linked Open Government Systems
RPI Research in Linked Open Government SystemsRPI Research in Linked Open Government Systems
RPI Research in Linked Open Government Systems
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
Semantic Web Applications in Libraries: The Road to BIBFRAME
Semantic Web Applications in Libraries: The Road to BIBFRAMESemantic Web Applications in Libraries: The Road to BIBFRAME
Semantic Web Applications in Libraries: The Road to BIBFRAME
 
BIBFRAME : the future of cataloguing?
BIBFRAME : the future of cataloguing?BIBFRAME : the future of cataloguing?
BIBFRAME : the future of cataloguing?
 
MLA CE Course: Third-Party PubMed Tools
MLA CE Course: Third-Party PubMed ToolsMLA CE Course: Third-Party PubMed Tools
MLA CE Course: Third-Party PubMed Tools
 
Third-Party PubMed Tools
Third-Party PubMed ToolsThird-Party PubMed Tools
Third-Party PubMed Tools
 
Presentation1
Presentation1Presentation1
Presentation1
 
Open Access NBIC Workshop April 19, 2011
Open Access NBIC Workshop April 19, 2011Open Access NBIC Workshop April 19, 2011
Open Access NBIC Workshop April 19, 2011
 
How to build your own citation index
How to build your own citation indexHow to build your own citation index
How to build your own citation index
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
Federated Search Falls Short
Federated Search Falls ShortFederated Search Falls Short
Federated Search Falls Short
 
Giving researchers credit for data
Giving researchers credit for dataGiving researchers credit for data
Giving researchers credit for data
 
Bracke may4-1
Bracke may4-1Bracke may4-1
Bracke may4-1
 
Crossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latestCrossref webinar - Maintaining your metadata - latest
Crossref webinar - Maintaining your metadata - latest
 

Viewers also liked

Jason chinchilla
Jason chinchillaJason chinchilla
Jason chinchillaJason Paz
 
Companies that produce & distribute rn b genre
Companies that produce & distribute rn b genreCompanies that produce & distribute rn b genre
Companies that produce & distribute rn b genrefahrinsultana
 
Ood启思录01
Ood启思录01Ood启思录01
Ood启思录01yiditushe
 
Carol vernallis theory
Carol vernallis theoryCarol vernallis theory
Carol vernallis theoryfahrinsultana
 
Interrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingInterrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingJessica Ogden
 

Viewers also liked (7)

Jason chinchilla
Jason chinchillaJason chinchilla
Jason chinchilla
 
Companies that produce & distribute rn b genre
Companies that produce & distribute rn b genreCompanies that produce & distribute rn b genre
Companies that produce & distribute rn b genre
 
Ood启思录01
Ood启思录01Ood启思录01
Ood启思录01
 
Carol vernallis theory
Carol vernallis theoryCarol vernallis theory
Carol vernallis theory
 
About Webtechnologies
About WebtechnologiesAbout Webtechnologies
About Webtechnologies
 
Interrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingInterrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web Archiving
 
pi950.pdf
pi950.pdfpi950.pdf
pi950.pdf
 

Similar to Comparing Published Scientific Journal Articles to Their Pre-print Versions

Preprints: a journey though time
Preprints: a journey though timePreprints: a journey though time
Preprints: a journey though timeGraham Steel
 
Publishing and impact Wageningen University IL for PhD 20141202
Publishing and impact  Wageningen University IL for PhD 20141202Publishing and impact  Wageningen University IL for PhD 20141202
Publishing and impact Wageningen University IL for PhD 20141202Hugo Besemer
 
British Library
British LibraryBritish Library
British Libraryclarivate
 
A Science Mapping Analysis Of Blood Donation Behaviour
A Science Mapping Analysis Of Blood Donation BehaviourA Science Mapping Analysis Of Blood Donation Behaviour
A Science Mapping Analysis Of Blood Donation BehaviourBria Davis
 
Author workshop TU Delft 20111122
Author workshop TU Delft 20111122Author workshop TU Delft 20111122
Author workshop TU Delft 20111122Anke Versteeg
 
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVES
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVESSTRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVES
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVESNicolaie Constantinescu
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Publish be cited, or perish
Publish be cited, or perishPublish be cited, or perish
Publish be cited, or perishWouter Gerritsma
 
The future of scholarly publishing: where do we go from here?
The future of scholarly publishing: where do we go from here? The future of scholarly publishing: where do we go from here?
The future of scholarly publishing: where do we go from here? Research Information Network
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
Where to publish_130709
Where to publish_130709Where to publish_130709
Where to publish_130709opl10
 
The Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusThe Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusUniversity of Bologna
 
Publishing and impact 20141028
Publishing and impact 20141028Publishing and impact 20141028
Publishing and impact 20141028Hugo Besemer
 
Science in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureScience in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureBenjamin Laken
 
Holy Cross Lunch and Learn
Holy Cross Lunch and LearnHoly Cross Lunch and Learn
Holy Cross Lunch and Learnrachelmccullough
 

Similar to Comparing Published Scientific Journal Articles to Their Pre-print Versions (20)

Preprints: a journey though time
Preprints: a journey though timePreprints: a journey though time
Preprints: a journey though time
 
Publishing and impact Wageningen University IL for PhD 20141202
Publishing and impact  Wageningen University IL for PhD 20141202Publishing and impact  Wageningen University IL for PhD 20141202
Publishing and impact Wageningen University IL for PhD 20141202
 
British Library
British LibraryBritish Library
British Library
 
A Science Mapping Analysis Of Blood Donation Behaviour
A Science Mapping Analysis Of Blood Donation BehaviourA Science Mapping Analysis Of Blood Donation Behaviour
A Science Mapping Analysis Of Blood Donation Behaviour
 
Author workshop TU Delft 20111122
Author workshop TU Delft 20111122Author workshop TU Delft 20111122
Author workshop TU Delft 20111122
 
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVES
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVESSTRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVES
STRETCHING THE BOUNDARIES OF PUBLISHING: ALTERNATIVES
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Publish be cited, or perish
Publish be cited, or perishPublish be cited, or perish
Publish be cited, or perish
 
SciVerse @ TJU
SciVerse @ TJUSciVerse @ TJU
SciVerse @ TJU
 
Peer Review and Science2.0
Peer Review and Science2.0Peer Review and Science2.0
Peer Review and Science2.0
 
The future of scholarly publishing: where do we go from here?
The future of scholarly publishing: where do we go from here? The future of scholarly publishing: where do we go from here?
The future of scholarly publishing: where do we go from here?
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
Stevan Harnad - Scholarly/Scientific Impact Metrics in the Open Access Era
Stevan Harnad - Scholarly/Scientific Impact Metrics in the Open Access EraStevan Harnad - Scholarly/Scientific Impact Metrics in the Open Access Era
Stevan Harnad - Scholarly/Scientific Impact Metrics in the Open Access Era
 
Open Access Publishing: More Readers, More Impact
Open Access Publishing: More Readers, More ImpactOpen Access Publishing: More Readers, More Impact
Open Access Publishing: More Readers, More Impact
 
Where to publish_130709
Where to publish_130709Where to publish_130709
Where to publish_130709
 
The Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations CorpusThe Initiative for Open Citations and the OpenCitations Corpus
The Initiative for Open Citations and the OpenCitations Corpus
 
Publishing and impact 20141028
Publishing and impact 20141028Publishing and impact 20141028
Publishing and impact 20141028
 
Eps
EpsEps
Eps
 
Science in the context of journals, Open, and the future
Science in the context of journals, Open, and the futureScience in the context of journals, Open, and the future
Science in the context of journals, Open, and the future
 
Holy Cross Lunch and Learn
Holy Cross Lunch and LearnHoly Cross Lunch and Learn
Holy Cross Lunch and Learn
 

More from Martin Klein

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly WebMartin Klein
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...Martin Klein
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...Martin Klein
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncMartin Klein
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsMartin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly ArtifactsMartin Klein
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesMartin Klein
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsMartin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsMartin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
 

Comparing Published Scientific Journal Articles to Their Pre-print Versions

  • 1. Comparing Published Scientific Journal Articles to Their Pre-print Versions
    Martin Klein, Peter Broadwell
    @mart1nkle1n, @peterbroadwell
    with Sharon E. Farb and Todd Grappone
    @farbthink, @liber8er
    {martinklein,broadwell,farb,grappone}@library.ucla.edu
    University of California Los Angeles
  • 2. Scientific Output in Numbers
    Global STM publishing market > $25 billion
    • 55% of this from USA
    • 28% from Europe, Middle East
    • Journals core part of scholarly communication process
    • English language journal revenue: ~ $10 billion
    • ~ 70% of that out of libraries’ budget
    • > 28k scholarly peer-reviewed journals (+3.5% p.a.)
    • ~ 2.5 million articles per year (+3% p.a.)
    • 21% of research papers from USA
    “STM Report: An Overview of Scientific and Scholarly Publishing”, Mark Ware and Michael Mabe, March 2015
    Comparing Published Scientific Journal Articles to Their Pre-print Versions, @mart1nkle1n, #jcdl2016, Newark, NJ, 06/21/2016
  • 3. University of California Publication Impact
    “Research Performance of the UC System,” Elsevier, March 2015
  • 4. Open Access by Disciplines
    “Open Access to the Scientific Journal Literature: Situation 2009”, Björk B-C et al. 2010
    http://dx.doi.org/10.1371/journal.pone.0011273
  • 8. Open Access Rate Overall
    2010: “Open Access to the Scientific Journal Literature: Situation 2009”, Björk B-C et al. (http://dx.doi.org/10.1371/journal.pone.0011273)
    → 20.4% OA rate
    2015: “Open Access and Sources of Full-Text Articles in Google Scholar in Different Subject Fields”, Jamali and Nabavi (http://dx.doi.org/10.1007/s11192-015-1642-2)
    → 61.1% OA rate
  • 9. Pre-print v. Final Published
    arXiv.org
    • Average annual operating cost for 2013 - 2017: $826,000
    Final Published
    • English language STM journals: $10 billion in 2013
    http://arxiv.org/help/support/faq#3D
    “STM Report: An Overview of Scientific and Scholarly Publishing”, Mark Ware and Michael Mabe, March 2015
  • 10. Role of Publisher
    • Entrepreneur
    • Copyediting
    • Tagging
    • Marketer
    • Distributor
    • E-Host
  • 11. Value of Publisher
    “Once you’ve gone through the peer review process, if you look at the article that is actually published in a journal, it looks radically different [to the one submitted due to] that process of transformation, the copy-editing, the database linking, the data visualisation tools, making sure that the metadata for the article is all right, so when people come to [Elsevier database] ScienceDirect or type a search into Google, they can actually find what they are looking for on their platforms.”
    Gemma Hersh
    http://www.thebookseller.com/news/elsevier-defends-its-value-after-open-access-disputes-328037
  • 12. Working Assumptions
    1. If the publishers’ argument is valid, the text of a pre-print paper should vary significantly from its corresponding post-print version.
    2. By applying standard similarity measures, we should be able to detect and quantify such differences.
  • 13. Assembling a pre-print corpus
    Source: arXiv.org
    • 1.1 million publication records
    • Metadata (typical DC, including DOI) obtained via OAI-PMH interface
    • PDF versions of articles available via Amazon’s S3 service (using “requester pays” option)
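The metadata harvesting step on slide 13 might look roughly like the following. This is a minimal sketch, not the authors' harvester: the endpoint URL is an assumption that should be checked against arXiv's current documentation, the sample record is hand-written for illustration, and no network request is made.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: base URL of arXiv's OAI-PMH endpoint; verify before use.
OAI_BASE_URL = "http://export.arxiv.org/oai2"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def list_records_url(resumption_token=None):
    """Build a ListRecords request; follow-up pages carry only the resumption token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return OAI_BASE_URL + "?" + urlencode(params)


def extract_doi(dc_xml):
    """Return the first DOI found in any Dublin Core element, or None."""
    root = ET.fromstring(dc_xml.strip())
    for element in root.iter():
        if element.tag.startswith(DC_NS) and element.text:
            match = DOI_PATTERN.search(element.text)
            if match:
                return match.group(0)
    return None


# Hypothetical record, hand-written for this example (not real arXiv output).
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example pre-print</dc:title>
  <dc:identifier>http://arxiv.org/abs/0000.00000</dc:identifier>
  <dc:identifier>doi:10.1234/example.2016.001</dc:identifier>
</oai_dc:dc>
"""

print(list_records_url())
print(extract_doi(SAMPLE_RECORD))
```

Scanning every Dublin Core element for a DOI pattern avoids assuming which field a given repository uses for it.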
  • 14. Finding a matching post-print corpus
    1. Extract DOIs from arXiv metadata
       • 44.5% of articles have a DOI
    2. CrossRef’s Metadata Search API
       • Match by DOI
       • Download article & metadata in XML/PDF
    → Results in:
       • 11,017 full text articles
       • Majority published by Elsevier between 2003 and 2015
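The DOI matching step on slide 14 can be sketched as below. As an assumption, the sketch builds lookup URLs for Crossref's public REST API rather than the "Metadata Search API" named on the slide, and the helper names are hypothetical. It only normalizes DOIs and constructs URLs; it does not download anything.

```python
from urllib.parse import quote

# Assumption: Crossref's public REST API, used here as a stand-in for the
# "Metadata Search API" named on the slide.
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Strip common prefixes and lowercase, since DOIs are case-insensitive."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def crossref_url(doi):
    """URL of the metadata record for one DOI."""
    return CROSSREF_WORKS_URL + quote(normalize_doi(doi), safe="/")


def doi_coverage(records):
    """Share of metadata records that carry a DOI, as a percentage."""
    with_doi = [record for record in records if record.get("doi")]
    return 100.0 * len(with_doi) / len(records) if records else 0.0


print(crossref_url("http://dx.doi.org/10.1016/j.physletb.2006.10.068"))
```

Normalizing before matching matters because the same DOI can appear as a bare string, with a `doi:` prefix, or as a resolver URL.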
  • 15. Text Comparison Methods
    1. Length ratio
    2. Levenshtein ratio
    3. Cosine similarity
    4. Jaccard coefficient
    5. Sorensen similarity
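The five measures on slide 15 could be computed along these lines. The slides do not give the exact definitions, so the normalizations here are assumptions: length ratio as shorter over longer character count, Levenshtein ratio as one minus edit distance over the longer length, cosine on word-count vectors, and Jaccard and Sorensen on word sets. Each returns a value in [0, 1], with 1 meaning most similar.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a simple tokenization chosen for illustration."""
    return re.findall(r"\w+", text.lower())


def length_ratio(a, b):
    """Shorter length divided by longer length, in characters."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else min(len(a), len(b)) / longest


def levenshtein_distance(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Edit distance normalized to [0, 1]; 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def cosine_similarity(a, b):
    """Cosine of the angle between the two word-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Shared words divided by all distinct words."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def sorensen(a, b):
    """Twice the shared words divided by the sum of both vocabulary sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0
```

The character-level measures are sensitive to small edits and reordering, while the word-level measures mostly ignore them, which is one reason to report several side by side.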
  • 16. Comparison of Sections
    “Analyzing News Events in Non-Traditional Digital Library Collections”, M. Klein, P. Broadwell, 2015
    http://dx.doi.org/10.1145/2756406.2756948
  • 17. Comparison of Sections
  • 18. Title Comparison
    [Figure: number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0 (1 = most similar), for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
    Explore our findings at http://sologlo.library.ucla.edu/prepost
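The binning behind a chart like the one on slide 18 can be sketched as follows. The bin labels come from the slide's axis; the treatment of scores that fall exactly on a bin boundary is an assumption, and the function names are hypothetical.

```python
from collections import Counter

BIN_LABELS = ["1 ... 0.9", "0.9 ... 0.8", "0.8 ... 0.7", "0.7 ... 0.6", "0.6 ... 0.5",
              "0.5 ... 0.4", "0.4 ... 0.3", "0.3 ... 0.2", "0.2 ... 0.1", "0.1 ... 0"]


def bin_index(score):
    """Map a similarity in [0, 1] to one of ten bins, most similar first.

    Assumption: a score exactly on a boundary (e.g. 0.9) goes to the lower
    bin; rounding guards against floating-point noise in (1 - score) * 10.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return min(int(round((1.0 - score) * 10, 9)), 9)


def histogram(scores):
    """Return (label, paper count, percentage of all papers) for each bin."""
    counts = Counter(bin_index(score) for score in scores)
    total = len(scores)
    return [(label, counts.get(i, 0),
             100.0 * counts.get(i, 0) / total if total else 0.0)
            for i, label in enumerate(BIN_LABELS)]


# Hypothetical similarity scores, for illustration only.
for row in histogram([1.0, 0.97, 0.95, 0.85, 0.42, 0.0]):
    print(row)
```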
  • 19. Comparison of Sections
  • 20. Abstract Comparison
    [Figure: number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0 (1 = most similar), for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
    Explore our findings at http://sologlo.library.ucla.edu/prepost
  • 21. 10.1016/j.physletb.2006.10.068, Physics Letters B
  • 22. Comparison of Sections
  • 23. Body Comparison
    [Figure: number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0 (1 = most similar), for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
    Explore our findings at http://sologlo.library.ucla.edu/prepost
  • 24. Publication Dates
    [Figure: number of papers by number of days between the two versions, in bins 1-90, 91-180, 181-270, 271-360, 361-450, 451-540, 541-630, 631-720, >720; series: Pre-print first, Final published first]
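The grouping shown on slide 24 might be computed as in this sketch. The 90-day bins and the two series names are taken from the chart; which dates are compared and how same-day pairs are handled are assumptions.

```python
from datetime import date

BIN_WIDTH_DAYS = 90
LAST_BIN_END = 720


def lag_bin(preprint_date, published_date):
    """Return (which version appeared first, day-range label).

    Assumption: same-day pairs return None, since the chart's bins start at 1 day.
    """
    days = (published_date - preprint_date).days
    if days == 0:
        return None
    first = "Pre-print first" if days > 0 else "Final published first"
    gap = abs(days)
    if gap > LAST_BIN_END:
        return first, ">720"
    lower = ((gap - 1) // BIN_WIDTH_DAYS) * BIN_WIDTH_DAYS + 1
    return first, f"{lower}-{lower + BIN_WIDTH_DAYS - 1}"


# Hypothetical dates, for illustration only.
print(lag_bin(date(2006, 10, 1), date(2006, 12, 1)))
```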
  • 25. Assembling a pre-print corpus
    Source: arXiv.org
    • 1.1 million publication records
    • Metadata (typical DC, including DOI) obtained via OAI-PMH interface
    • PDF versions of articles available via Amazon’s S3 service (using “requester pays” option)
    • *Latest version used if multiple available*
    • 35% of all arXiv papers have > 1 version
    • 58% of our matched papers have > 1 version
    • Repeat experiment with *earliest version*
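Choosing between the latest and the earliest version of a pre-print, as slide 25 describes, can be sketched as follows. It relies on arXiv's convention of appending `v1`, `v2`, and so on to an identifier; the identifiers in the example and the function names are hypothetical.

```python
import re

VERSION_SUFFIX = re.compile(r"^(?P<base>.+?)v(?P<num>\d+)$")


def split_version(identifier):
    """Split 'hep-th/0610123v2' into ('hep-th/0610123', 2)."""
    match = VERSION_SUFFIX.match(identifier)
    if not match:
        raise ValueError(f"no version suffix in {identifier!r}")
    return match.group("base"), int(match.group("num"))


def earliest_and_latest(identifiers):
    """Group versioned identifiers by paper; keep the lowest and highest version."""
    versions = {}
    for identifier in identifiers:
        base, num = split_version(identifier)
        versions.setdefault(base, []).append(num)
    return {base: (f"{base}v{min(nums)}", f"{base}v{max(nums)}")
            for base, nums in versions.items()}


# Hypothetical identifiers, for illustration only.
print(earliest_and_latest(["1234.5678v1", "1234.5678v3", "1234.5678v2"]))
```

Comparing version numbers as integers rather than strings keeps `v10` ordered after `v9`.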
  • 26. Publication Dates of Earliest Versions
    [Figure: number of papers by number of days between the two versions, in bins 1-90, 91-180, 181-270, 271-360, 361-450, 451-540, 541-630, 631-720, >720; series: Pre-print first, Final published first]
  • 27. Title Deltas
    [Figure: change in number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0, for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
  • 30. Abstract Deltas
    [Figure: change in number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0, for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
  • 31. Body Deltas
    [Figure: change in number of papers and % of all papers per similarity bin, from 1 ... 0.9 down to 0.1 ... 0, for the Length, Levenshtein, Cosine, Sorensen, and Jaccard measures]
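The slides do not say how the deltas are computed. One plausible reading, given the negative paper counts on the axis, is a per-bin difference between the two runs: the count obtained with the earliest pre-print versions minus the count obtained with the latest versions. The sketch below follows that assumption, and all numbers in it are hypothetical.

```python
def bin_deltas(latest_counts, earliest_counts):
    """Per-bin change in paper counts: earliest-version run minus latest-version run.

    Assumption: both lists use the same bin order, most similar bin first.
    """
    if len(latest_counts) != len(earliest_counts):
        raise ValueError("both histograms need the same number of bins")
    return [early - late for late, early in zip(latest_counts, earliest_counts)]


# Hypothetical counts for three bins, for illustration only.
print(bin_deltas([9000, 500, 100], [8200, 1100, 300]))
```

Under this reading, a negative value in the most-similar bin means that fewer earliest versions than latest versions land there, and the deltas sum to zero when both runs cover the same papers.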
  • 32. Discussion & Future Work
    • Single corpus experiment
    • Pre-print/final published matches based on:
      • DOIs
      • CrossRef API results
      • UCLA serial subscriptions (majority Elsevier publications)
    • Expand to other disciplines/publishers
    • Overlay with ISI Impact factor and usage statistics
    • Refine extraction/comparison of authors and references
    • Operate at scale