This document summarizes a workshop agenda for validating indicators for an Open Science Monitor. The workshop objectives are to validate the methodology for determining indicators on open access, open research data, and open collaboration. The methodology will be refined based on feedback to provide an evidence-based view of open science trends. While the community provides feedback, the consortium leading the project is ultimately responsible for the indicators. Unpaywall is proposed as an additional data source to help identify open access publications beyond what is currently found in Scopus and Web of Science.
2. Open Science Monitor – 19 September 2018
Agenda
1. Introduction to Open Science Monitor and objectives of the workshop
2. Discussion about indicators on Open Access
3. Discussion about indicators on Open Research Data
4. Discussion about indicators on Open Collaboration
5. Wrap-up
Objectives of Open Science Monitor
To determine the scope, nature and impacts of Open Science in Europe and globally across the research cycle, in order to provide an evidence-based view of the evolution of Open Science and facilitate policy making.
In particular:
• To deliver a monitoring system (for Europe) and a (global) observatory for trends in Open Science;
• To analyze drivers, barriers and impacts through 30 case studies;
• To provide a structured analysis of policy-relevant trends in Open Science.
OSM is a monitoring tool providing aggregate indicators, not a research
assessment tool.
Progress so far
- Delivered a draft methodology
- Published the first batch of indicators, updating the previous Open Science Monitor
- Opened a consultation on the indicators; 300 comments received, plus one paper:
  - insightful comments with proposals for new indicators;
  - in-depth, high-quality criticisms of indicators;
  - criticisms because of the involvement of Elsevier;
  - vague proposals for new indicators without specifying new data sources.
Objectives of the workshop
• To validate the methodology, revised according to the feedback received online, with experts and the scientific community
• To refine indicators and sources
Based on the results of the workshop, a revised
methodology will be published, together with
responses to comments, by end of September.
This updated methodology is not “definitive”.
Collaboration with the community will continue
throughout the study.
The role of the community
• This project is based on collaboration and constructive criticism from the community. Feedback is welcome, and proactively sought.
• However, the consortium (Lisbon Council, CWTS, ESADE) is ultimately in charge of defining and producing data and indicators.
• The feedback received, both online and offline, is assessed on the criterion of effectiveness in delivering high-quality data, now.
• Open data are a fundamental resource but not always
achievable now. We strive to use open data and make
data openly available.
Outline of the presentation
• The openness of the OSM in receiving comments on its methods
• On data in the OSM
• Comparing proprietary data sets in the OSM
• Need for more external, outside sources on open
access
• Inclusion of Unpaywall in the OSM
Clustered issues raised in comments on the OSM
• About data:
• Proprietary data used versus open data
• Suggestions to use other data sets
• About indicators:
• The issue of the production of citation impact indicators
• Suggestions to generate other indicators (often in conjunction with
the suggested other, alternative data sets)
• About social media metrics (formerly known as altmetrics):
• The issue of comparison of various data sets containing SMM
About data
• Proprietary data used versus open data:
  • Open data containing the metadata necessary for the current study are simply unavailable.
  • 'Open' is not equal to 'for free'; proprietary datasets are currently unavoidable.
• Suggestions to use other datasets:
  • The focus of the study should be on generic/macro-level monitoring (country & field).
  • Many alternative sets have a fragmented character.
  • However, via case-study-like approaches we will explore what is useful/meaningful.
About indicators
• Citation impact indicators have not been generated
  • The OSM is not a research evaluation tool; it is a monitoring tool, describing generic trends and developments at a high level of aggregation.
  • No Journal Impact Factors will be used, nor any other citation impact indicators!
• Suggestions to use other indicators:
  • We will assess the usefulness of the suggested indicators, as some depend on the data sets suggested to the OSM (see above).
  • Keep in mind that indicators might depend on data sets.
  • However, via case-study-like approaches we will explore what is useful/meaningful.
About social media metrics
• Comparing PlumX with Altmetric.com data
  • First of all, this has to be negotiated with Altmetric.com (proprietary data).
  • There are already some studies (see Zahedi & Costas, 2018): https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0197326
  • Keep in mind that various providers have different methods of dealing with data, selection steps, etc.
• The usage of Mendeley data:
  • Mendeley data are fully handled by CWTS (as Mendeley is made completely open by ES).
  • An important source on the usage and visibility of publications.
  • Keep in mind what the pros and cons of Mendeley actually are.
How are data structured?

Meta data

Key bibliographic information:
- First author, incl. initials
- Journal name
- Title of publication
- Publication year
- Volume
- Page numbers
- DOI
- PMID
- EID (Scopus)
- UT (WoS)

Additional bibliographic information:
- Journal name
- Subject classification
- Higher-level field labels (field labels created by CWTS)
- Address information: country, institution (address cleaning by CWTS)

- These data are organized in relational databases
- Meta data are either WoS or Scopus (due to the multidisciplinary nature of the sets)
- Proprietary data, as maintenance is quite laborious, and thus not for free
- Currently, no open meta data exist at such a level, containing the required information to conduct the study
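The relational organization described above can be sketched with a minimal SQLite layout; the table and column names are illustrative assumptions, not the actual CWTS schema.

```python
import sqlite3

# Minimal sketch of a relational layout for the meta data listed above.
# Table and column names are illustrative, not the actual CWTS schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publication (
    pub_id       INTEGER PRIMARY KEY,
    first_author TEXT,     -- incl. initials
    journal_name TEXT,
    title        TEXT,
    pub_year     INTEGER,
    volume       TEXT,
    pages        TEXT,
    doi          TEXT,     -- may be missing
    pmid         TEXT,     -- may be missing
    eid          TEXT,     -- Scopus identifier
    ut           TEXT      -- WoS identifier
);
CREATE TABLE field (
    pub_id INTEGER REFERENCES publication(pub_id),
    label  TEXT            -- higher-level field label (created by CWTS)
);
CREATE TABLE address (
    pub_id      INTEGER REFERENCES publication(pub_id),
    country     TEXT,      -- cleaned by CWTS
    institution TEXT       -- cleaned by CWTS
);
""")
conn.execute("INSERT INTO publication (pub_id, first_author, journal_name, "
             "title, pub_year, doi) VALUES (1, 'Doe, J.', 'J. Open Sci.', "
             "'An example', 2016, '10.1000/xyz')")
conn.execute("INSERT INTO address VALUES (1, 'Netherlands', 'Leiden University')")

# Country-level breakdowns become simple joins once addresses are cleaned:
row = conn.execute("SELECT p.pub_year, a.country FROM publication p "
                   "JOIN address a ON a.pub_id = p.pub_id").fetchone()
print(row)  # (2016, 'Netherlands')
```

The point of the cleaned address and field tables is exactly what the slide stresses: without them, country- and field-level aggregation is impossible.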
On top of that foundation…

Data adding information to the meta data:

Gold OA:
- DOAJ
- ROAD

Green OA (matched via PMID/DOI; for OpenAIRE via DOI and fuzzy matching):
- CrossRef
- PubMedCentral
- OpenAIRE

Additional source: Unpaywall
• Green (~80M full texts, DOI + fuzzy matching):
  - 679 OpenAIRE repositories,
  - ~2,000 additional repositories,
  - PubMed Central (PMIDs, DOIs)
• Gold from DOAJ, ROAD, publisher harvesting
• Bronze and Hybrid from manual/automated publisher harvesting and the Crossref API

[Diagram: these sources are linked on top of the meta data foundation (key and additional bibliographic information) shown on the previous slide]
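The DOI + fuzzy-matching linkage mentioned above can be sketched as follows; the matching rules, field names, and 0.9 similarity threshold are illustrative assumptions, not the production CWTS/OSM procedure.

```python
from difflib import SequenceMatcher

def normalize(title):
    # Lowercase and drop punctuation so formatting differences don't matter.
    return " ".join("".join(ch for ch in title.lower()
                            if ch.isalnum() or ch.isspace()).split())

def link(record, repository_entries, threshold=0.9):
    # Prefer an exact DOI match; fall back to fuzzy title matching.
    # The 0.9 similarity threshold is an illustrative choice.
    if record.get("doi"):
        for entry in repository_entries:
            if entry.get("doi") == record["doi"]:
                return entry
    best, best_score = None, threshold
    rec_title = normalize(record["title"])
    for entry in repository_entries:
        score = SequenceMatcher(None, rec_title, normalize(entry["title"])).ratio()
        if score >= best_score:
            best, best_score = entry, score
    return best

entries = [
    {"doi": "10.1000/abc", "title": "Open Access in the Netherlands"},
    {"doi": None, "title": "Measuring open science: a monitor"},
]
print(link({"doi": "10.1000/abc", "title": "Some title"}, entries)["title"])
print(link({"doi": None, "title": "Measuring Open Science - A Monitor"}, entries)["title"])
```

The first lookup succeeds on the DOI alone; the second, with no DOI, falls back to fuzzy title matching and still finds the right entry despite punctuation differences.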
Without that foundation…

[Diagram: the same external OA sources and meta data as on the previous slide, repeated]
What are we left with…

[Diagram: the external OA sources (DOAJ, ROAD, CrossRef, PubMedCentral, OpenAIRE, Unpaywall) without the key bibliographic meta data]

- Without meta data, we have no entrance to important entries in the data (countries and fields)
- DOIs and PMIDs alone do not allow for such analyses
- Information in OpenAIRE alone does not allow for such analyses either
- Information in the DOAJ and ROAD lists is also insufficient
Conclusions
• We need meta data, for now coming from proprietary data sets such as Scopus and WoS
  • Affiliations and fields
• For OA analyses, we need additional, external data sources on top of the existing meta data.
• Additional, external sources alone do not provide us with the information to conduct large-scale analyses describing global trends (at country and field level).
• Unfortunately, no open meta data exist yet!
Define data sources for OA labels
Data sources should comply with two criteria:
• Sources have to be sustainable
  – Data are in the public domain, without an immediate and direct risk of disappearing behind a paywall.
• Sources need to be legal
  – Inclusion in the data source should not be based on 'illegal acts' by individual researchers.
Sources that comply with both criteria

The DOAJ list — Gold OA
The ROAD list — Gold OA
CrossRef — Green OA
PubMedCentral — Green OA
OpenAIRE — Green OA
Unpaywall — Green OA & Gold OA
Unpaywall — Hybrid OA (in the next run of the data!)

Data sources that do not comply with the second requirement (legality) are:
– ResearchGate
– SciHub
(Some of our critics suggested to use these sources!)
Updating of the database and future plans
• The first update of the database created some challenges:
  – Change in the coverage of the DOAJ list of Gold Open Access journals
  – Change in the way publications are indexed and disclosed in OpenAIRE
  Reproducibility of the outcomes year-by-year is a challenge!
• Currently, we have agreed with Unpaywall on its inclusion in the process (due to the scale of implementation, we needed to negotiate usage)
• Thereby, due to the inclusion of Unpaywall in the process, we can also distinguish Hybrid OA in the database, as well as increase the coverage of Green and Gold OA.
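The distinction that Unpaywall adds can be sketched as a simple labelling rule: gold from the journal lists, green from repositories, and publisher-side evidence separating hybrid (open licence) from bronze (free to read only). The field names and precedence here are a simplified assumption, not the actual OSM pipeline.

```python
# Simplified OA labelling in the spirit of the slide above.
# Field names are illustrative assumptions.
def oa_label(record):
    if record.get("in_doaj_or_road"):        # fully OA journal (DOAJ/ROAD)
        return "gold"
    if record.get("free_at_publisher"):      # subscription journal, free copy
        return "hybrid" if record.get("open_license") else "bronze"
    if record.get("in_repository"):          # OpenAIRE, PMC, Unpaywall repos
        return "green"
    return "closed"

print(oa_label({"in_doaj_or_road": True}))                          # gold
print(oa_label({"free_at_publisher": True, "open_license": True}))  # hybrid
print(oa_label({"free_at_publisher": True}))                        # bronze
print(oa_label({"in_repository": True}))                            # green
```

Without the publisher-side signal (i.e. without Unpaywall), the hybrid/bronze branch is unreachable, which is why hybrid OA could not be distinguished before.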
Comparing WoS and Scopus on OA disclosure
Comparing OA labelling methodology in two projects using Scopus and WoS

Show outcomes for the Netherlands in two projects:
• KTD Indicators: service project for the EU
  – Bibliometric scores for EU countries and global regions, on a number of topics, among which OA publishing.
• Open Science Monitor for the EU
  – Variety of indicators on OA/OS development across the full knowledge production cycle.

Indicator: Number of publications (P)
KTD: the number of normal articles, reviews, and letters as processed for journals covered in the Web of Science database
OSM: the number of normal articles and reviews as processed for journals covered in the Scopus database
1 – OS Monitor/Scopus: Netherlands total

[Chart: absolute numbers of publications per year, 2009–2016 (0–60,000): All, All OA, Gold, Green]
1 – OS Monitor/Scopus: Netherlands %

[Chart: OA shares per year, 2009–2016 (0–35%): All OA, Gold, Green]
2 – KTD Indicators/WoS: Netherlands total

[Chart: absolute numbers of publications per year, 2009–2016 (0–60,000): All, All OA, Gold, Green]
2 – KTD Indicators/WoS: Netherlands %

[Chart: OA shares per year, 2009–2016 (0–35%): All OA, Gold, Green]
Conclusions from the comparison
Some observations:
• Scopus gives higher absolute numbers for the Netherlands
• WoS OA fluctuates, while Scopus is more stable
• Gold OA increases, Green OA decreases
• The e-ISSN in Scopus is helpful in linking OA tags for ROAD and DOAJ, compared to WoS
• Relatively, the patterns are comparable!
On coverage and document types in Scopus

Introduction
• Scopus covers the global scientific literature more broadly than WoS
• The expanded coverage is partially journal-based, partially other sources
• The extra sources cover conference papers, books and book chapters, among others
• What does that contribute to the Open Science Monitor?
Document types in Scopus (2009-2016)

Document type | All | % in Scopus | OA | % OA
Article | 13,783,382 | 68% | 3,295,148 | 24%
Conference Paper | 3,689,594 | 18% | 248,771 | 7%
Chapter | 744,842 | 4% | 7,487 | 1%
Note | 484,117 | 2% | 58,807 | 12%
Editorial | 471,813 | 2% | 52,222 | 11%
Letter | 392,059 | 2% | 57,094 | 15%
Article in Press | 221,365 | 1% | 28,129 | 13%
Report | 183,743 | 1% | 14,280 | 8%
Erratum | 125,014 | 1% | 25,416 | 20%
Book | 111,006 | 1% | 1,372 | 1%
Conference Review | 42,353 | 0% | 1,146 | 3%
Business Article | 10,730 | 0% | 1 | 0%
Abstract Report | 1,195 | 0% | 15 | 1%
Review | 10 | 0% | 5 | 50%
Short Survey | 5 | 0% | 2 | 40%
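The % OA column follows directly from the OA and total counts; a quick check for the three largest document types (counts copied from the table above):

```python
# % OA = OA count / total count, values copied from the table above.
doc_types = {
    "Article":          (13_783_382, 3_295_148),
    "Conference Paper": (3_689_594, 248_771),
    "Chapter":          (744_842, 7_487),
}
for name, (total, oa) in doc_types.items():
    print(f"{name}: {oa / total:.0%}")  # Article: 24%, Conference Paper: 7%, Chapter: 1%
```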
Trends in OA disclosure for Scopus document types

[Chart: OA shares per document type per year, 2009–2016 (0–35%): Article, Conference Paper, Chapter, Note, Editorial, Letter, Article in Press, Erratum, Book]

Some observations
• Document types with fluctuating OA shares are Article in Press and Erratum
• Well covered are Articles, Letters, and Notes
• Conference papers have low OA disclosure
• SSH are again ill served, as Books and Chapters have low OA disclosure
Conclusions
• Although coverage is broader, the additional publications are hardly dealt with by external sources.
• Therefore, only a small portion of the additional material can be included in analyses of developments in OA publishing.
• There is an urgent need for external sources that deal with conference papers and book-like outputs.
Unpaywall as an additional data source in the OSM

Unpaywall as an additional source of OA labelling in the OSM
Introduction
• Everything that follows is based upon a much older version of the Unpaywall data (delivered to CWTS in 2017)
• Currently, the Unpaywall set is much richer, and therefore a re-do of the following analyses based upon the new Unpaywall data would change the situation drastically.
Comparing OA tagging methods, Scopus (2009-2016)

- Throughout the whole period of analysis, the CWTS methodology shows an increasing unveiling of OA published output
- The combined effort shows the most OA published output, which means that the CWTS methodology also misses OA output covered by Unpaywall

[Chart: % OA per year, 2009–2016 (0–40%): % OA CWTS, % OA CWTS & UPW]
Comparing OA tagging methods, only DOI-based, Scopus (2009-2016)

- When focusing only on DOI-based matching, Unpaywall still adds to the CWTS methodology.
- In comparison with the previous slide, notice the dependency on DOIs, as the percentages now go up.

[Chart: % OA per year, 2009–2016 (0–45%): % OA CWTS, % OA CWTS & UPW]
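The year-level series plotted in these comparisons boils down to a per-year aggregation of OA flags per publication; a sketch with made-up records, where the record fields are illustrative:

```python
from collections import defaultdict

# Made-up records: one per publication, with OA flags from each method.
records = [
    {"year": 2015, "oa_cwts": True,  "oa_upw": True},
    {"year": 2015, "oa_cwts": False, "oa_upw": True},
    {"year": 2015, "oa_cwts": False, "oa_upw": False},
    {"year": 2016, "oa_cwts": True,  "oa_upw": False},
    {"year": 2016, "oa_cwts": False, "oa_upw": False},
]

totals = defaultdict(lambda: [0, 0, 0])  # year -> [n, n_cwts, n_combined]
for r in records:
    t = totals[r["year"]]
    t[0] += 1
    t[1] += r["oa_cwts"]                 # CWTS methodology alone
    t[2] += r["oa_cwts"] or r["oa_upw"]  # CWTS & UPW combined

for year in sorted(totals):
    n, cwts, combined = totals[year]
    print(f"{year}: % OA CWTS {cwts / n:.0%}, % OA CWTS & UPW {combined / n:.0%}")
```

Because the combined flag is a logical OR, the combined series can never lie below the CWTS-only series, which is exactly the pattern the charts show.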
Conclusions
• Please mind that we worked with an Unpaywall set from 2017!
• Unpaywall will be an important expansion of the external data sources used in disclosing OA publishing
• We will include Unpaywall in the OSM methodology
• Perhaps the CWTS methodology has something to offer to Unpaywall as well (although Unpaywall looks to be overarching our current external data sets).
• Cooperation in research will lead to the best approach.
Overall conclusions
• We will deal with the comments and critique expressed on the OSM in an academic way.
• The data infrastructure stresses the necessity of having meta data, unfortunately of a proprietary nature
• We have shown that using Scopus or WoS does not alter the overall trends of OA publishing and its disclosure
• Unpaywall looks to be overarching our currently used external data sets, so switching seems logical
Thank you for your attention!
Any questions?
Ask me now, or mail me
Leeuwen@cwts.nl
Additional indicators

Indicator | Source
Number of funders with open access policies (with the caveat that it is skewed towards western countries) | Sherpa Juliet
Number of journals with open access policies (with the caveat that it is skewed towards western countries) | Sherpa Romeo
Number of publishers/journals that have adopted the TOP Guidelines (including the level of adoption / actual implementation where possible) | Cos.io
Open Research Data
• Several comments were useful to identify new data sources to measure open data publication.
• Several criticisms of using Elsevier to gather data through the survey, but no valid alternatives of comparable quality and cost/efficiency were proposed.
• Several comments pointed to the need for measuring new, additional aspects, such as "number of papers based on openly available raw data". However, no concrete proposals were made about sources. We will follow up with those commenting to obtain further detail.
Existing indicators

Indicator | Source
Number of funders with policies on data sharing (with the caveat that it is skewed towards western countries) | Sherpa Juliet
Number of journals with policies on data sharing | Vasilevsky et al., 2017
Number of open data repositories | Re3data
% of papers published with data | Bibliometrics: DataCite
Citations of data journals | Bibliometrics: DataCite
Attitude of researchers on data sharing | Survey by Elsevier, follow-up of the 2017 report; other existing surveys will also be included in the monitor
Additional indicators

Indicator | Source
Number and/or total size of CC-0 datasets | Base-search.net
Number of OAI-compliant repositories | Base-search.net
Number of repositories with an open data (https://opendefinition.org/) policy for metadata | OpenDOAR, "commercial" in metadata reuse policy
Open Collaboration

Open Code: additional indicators

Indicator | Source
Software citations in DataCite | Datacite
Number of code projects in Zenodo | Zenodo
Add: number of software deposits under an OSI-approved license | Base
Number of software papers in software journals | e.g. JORS (https://openresearchsoftware.metajnl.com/) and others
N. of users in reproducibility platforms such as CodeOcean | CodeOcean
Next steps
• Delivery of the new methodology – end of September
• Data gathering by end of December 2018
• Ongoing collaboration with the community for the identification of new indicators and sources
• Updating data in 2019
• Delivery of 30 case studies and a cross-analysis on drivers, impacts and policy recommendations in 2019
• Project end: December 2019
Thanks
Contact:
For the overall project: opensciencemonitor@lisboncouncil.net
CWTS perspective on the OSM consortium
• Lisbon Council, Esade, & CWTS call the shots!
• Be as inclusive as possible!
• ES has to offer the 'Survey on research data', as well as PlumX
• 'Open' is not equal to 'for free'; proprietary datasets are currently unavoidable, as open meta data are unavailable
• Choose dialogue and cooperation around the OSM
• Common questions will be answered separately/individually
• Communication is organized centrally
• OSM developments are constantly aligned with CWTS policies towards external partners
Comparing OA tagging methods, WoS, 2009-2016

[Chart: % OA per year, 2009–2016 (0–40%): % OA CWTS, % OA UPW, % OA CWTS & UPW]
Comparing OA tagging methods, only DOI-based, WoS, 2009-2016

[Chart: % OA per year, 2009–2016 (0–50%): % OA CWTS, % OA UPW, % OA CWTS & UPW]