1. Science of science, scientometrics, and research policy: The need for quantitative modeling
Ludo Waltman (with contributions by Vincent Traag and Adrian Lai)
Centre for Science and Technology Studies (CWTS), Leiden University
Northwestern Institute on Complex Systems (NICO), Northwestern University
Evanston, IL, USA
July 9, 2019
2. Outline
• Evaluation of research based on journal impact factors
• Evaluation of applicants for personal grants
6. Skewness of citation distributions invalidates use of
JIF for assessing individual articles
Seglen (1997): “the most cited half of the articles (in a journal) are cited, on
average, 10 times as often as the least cited half. Assigning the same score
(the JIF) to all articles masks this tremendous difference—which is the exact
opposite of what an evaluation is meant to achieve”
Garfield (2006): “Typically, when the author’s work is examined, the impact
factors of the journals involved are substituted for the actual citation count.
Thus, the JIF is used to estimate the expected count of individual papers, which
is rather dubious considering the known skewness observed for most journals”
7. San Francisco Declaration on Research Assessment
(DORA)
• “Do not use journal-based metrics, such
as JIF, as a surrogate measure of the
quality of individual research articles”
• “Make available a range of article-level
metrics to encourage a shift toward
assessment based on the scientific
content of an article rather than
publication metrics of the journal in
which it was published”
• Signed by almost 1500 organizations
and 15000 individuals
8. Scenario 1: Citations are more accurate than JIFs
• How to select as many high-value
articles as possible?
• Selection based on citations yields
72 + 18 = 90 high-value articles
• Selection based on JIFs yields 80
high-value articles (i.e., the high-
value articles in journal A)
• Citations are more accurate than
JIFs
9. Scenario 2: JIFs are more accurate than citations
• How to select as many high-value
articles as possible?
• Selection based on citations yields
56 + 14 = 70 high-value articles
• Selection based on JIFs yields 80
high-value articles (i.e., the high-
value articles in journal A)
• JIFs are more accurate than
citations
13. Van den Besselaar and Sandström (2015)
Is grant peer review able to
recognize the most talented
researchers?
14. Key findings of Van den Besselaar and Sandström
(2015)
• 262 applicants for early career grants in economics, education, or
psychology, of which 49 were funded
• Performance of applicants was measured bibliometrically by their number of
highly cited publications
• The 49 funded applicants outperformed the non-funded applicants
• However, the 49 funded applicants were outperformed by the 49 best
performing non-funded applicants
• “the grant decisions have no predictive validity”
• “the common belief that peers in selection panels are good in recognizing
outstanding talents is incorrect”
15. A simple model
• Ln(μ, σ): Lognormal distribution with parameters μ and σ
• σq: Quality differences between applicants
• σr: Inaccuracy of peer review
• σb: Inaccuracy of bibliometrics
• Quality of applicant i: qi ~ Ln(0, σq)
• Peer review score of applicant i: ri ~ qi × Ln(0, σr)
• Bibliometric score of applicant i: bi ~ qi × Ln(0, σb)
• Van den Besselaar and Sandström implicitly assume that σb << σr
16. A simple model
• Let’s assume there are 100 applicants
• The 20 applicants with the highest ri receive funding
• We want to know whether the applicants with the highest qi receive funding,
but in practice qi cannot be observed
• Applicants are therefore compared based on bi rather than qi
• The 20 funded applicants are compared with the 20 non-funded applicants
with the highest bi
17. Results: Geometric mean of bibliometric scores
• If peer review and bibliometrics work well (row 1), the funded applicants have
a higher bibliometric performance than the best non-funded applicants
• If peer review does not work well while bibliometrics does (row 2), the funded
applicants have a lower bibliometric performance than the best non-funded
applicants
• However, the same result is obtained if peer review works well and
bibliometrics does not (row 3)
                            Funded        Non-funded    Best non-funded
                            applicants    applicants    applicants
σq = 1, σr = 0, σb = 0      4.06          0.70           1.70
σq = 1, σr = 2, σb = 0      1.87          0.86           2.88
σq = 1, σr = 0, σb = 2      4.05          0.70          10.62
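The pattern in these results can be reproduced with a short simulation of the model. The sketch below is a minimal Python/NumPy implementation under the model as stated on the previous slides; the function and variable names are mine, and a larger applicant pool than the 100 in the slides can be used to reduce sampling noise.

```python
import numpy as np

def simulate(sigma_q, sigma_r, sigma_b, n=100, n_funded=20, rng=None):
    """Simulate one round of the grant selection model.

    Quality qi ~ Ln(0, sigma_q); peer review score ri = qi * Ln(0, sigma_r);
    bibliometric score bi = qi * Ln(0, sigma_b). The n_funded applicants with
    the highest ri are funded; the funded applicants are then compared with
    the n_funded non-funded applicants with the highest bi.
    """
    rng = np.random.default_rng() if rng is None else rng
    q = rng.lognormal(0.0, sigma_q, n)
    r = q * rng.lognormal(0.0, sigma_r, n)
    b = q * rng.lognormal(0.0, sigma_b, n)

    order = np.argsort(r)
    funded, non_funded = order[-n_funded:], order[:-n_funded]
    best_non_funded = non_funded[np.argsort(b[non_funded])[-n_funded:]]

    def gmean(x):
        # Geometric mean, as used in the results table
        return float(np.exp(np.mean(np.log(x))))

    return gmean(b[funded]), gmean(b[non_funded]), gmean(b[best_non_funded])
```

With a large applicant pool, the simulated geometric means settle close to the values in the table; with 100 applicants, as in the slides, individual runs are noisier but show the same pattern: when only peer review is noisy (row 2) the best non-funded applicants outperform the funded ones, and when only bibliometrics is noisy (row 3) the same comparison comes out the same way, even though peer review worked perfectly.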
18. Wang, Jones, and Wang (2019)
What doesn’t kill me
makes me stronger?
19. Near misses outperform near winners
In the years after their grant application, applicants just below the funding
threshold (near misses, in orange) produce about the same number of
papers as applicants just above the funding threshold (near winners, in blue),
but their papers are more likely to be highly cited hits (Wang et al., 2019,
Figure 2)
20. Near misses can be expected to be more likely to leave
the system than near winners
Funding awarded   Citations   Outcome
Yes               High        Stay
Yes               Medium      Stay
Yes               Low         Leave
No                High        Stay
No                Medium      Leave
No                Low         Leave

Funded applicants (near winners) leave the system only if their citation performance is low; non-funded applicants (near misses) stay only if their citation performance is high.
21. Conservative removal
In the conservative removal procedure, differences in the number of near
winners and near misses staying in the system are corrected for by removing
the lowest-performing winners; the near misses still outperform the near
winners (Wang et al., 2019, Figure 3)
22. Low-risk and high-risk strategies
Citation performance distribution under each strategy:

Risk strategy   High citations   Medium citations   Low citations
Low             20%              60%                20%
High            40%              20%                40%
23. Near misses outperform near winners …
Funding awarded      Strategy    Citations      Outcome
Yes (near winners)   Low risk    High (20%)     Stay
                                 Medium (60%)   Stay
                                 Low (20%)      Leave
No (near misses)     High risk   High (40%)     Stay
                                 Medium (20%)   Leave
                                 Low (40%)      Leave

Near winners: 80% stays, with 1/4 having high citation performance.
Near misses: 40% stays, all having high citation performance.
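The annotations in this diagram follow from a few lines of arithmetic. A minimal Python sketch (names are mine) that computes the staying fraction and the hit share for each group under the stay/leave rules shown above:

```python
# Citation distributions from the diagram (percentages of each group)
funded = {"high": 20, "medium": 60, "low": 20}      # near winners, low-risk strategy
non_funded = {"high": 40, "medium": 20, "low": 40}  # near misses, high-risk strategy

def stay_and_hit_share(dist, stay_levels):
    """Percentage of the group staying in the system, and the share of
    stayers with high citation performance ("hits")."""
    stay = sum(dist[level] for level in stay_levels)
    return stay, dist["high"] / stay

# Near winners leave only with low citations; near misses stay only with high citations
print(stay_and_hit_share(funded, ["high", "medium"]))  # (80, 0.25): 80% stays, 1/4 hits
print(stay_and_hit_share(non_funded, ["high"]))        # (40, 1.0): 40% stays, all hits
```

So the near misses who remain in the system look stronger than the near winners simply because weaker near misses have left, not because rejection made anyone stronger.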
24. … even after conservative removal
Funding awarded      Strategy    Citations      Outcome
Yes (near winners)   Low risk    High (20%)     Stay
                                 Medium (20%)   Stay
                                 Medium (40%)   Leave (removed)
                                 Low (20%)      Leave
No (near misses)     High risk   High (40%)     Stay
                                 Medium (20%)   Leave
                                 Low (40%)      Leave

Near winners: 40% stays, with 1/2 having high citation performance.
Near misses: 40% stays, all having high citation performance.
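The conservative removal step can be checked with the same arithmetic. A minimal Python sketch (an assumption consistent with the diagram: removal drops the lowest-performing winners first, until the winners' staying fraction matches the near misses' 40%):

```python
def hit_share_after_removal(stayers, target_stay):
    """Apply conservative removal: drop the lowest-performing stayers (worst
    citation level first) until only target_stay percent of the group remains,
    then return the share of the remaining stayers with high citations."""
    remaining = dict(stayers)
    excess = sum(remaining.values()) - target_stay
    for level in ["low", "medium", "high"]:  # remove worst performers first
        removed = min(remaining.get(level, 0), max(excess, 0))
        if removed:
            remaining[level] -= removed
            excess -= removed
    return remaining.get("high", 0) / sum(remaining.values())

# Near winners: 20% high + 60% medium stay; trimming to the near misses'
# 40% removes 40 points of medium, leaving 20% high + 20% medium
print(hit_share_after_removal({"high": 20, "medium": 60}, target_stay=40))  # 0.5
# Near misses: only the 40% with high citations stay; nothing is removed
print(hit_share_after_removal({"high": 40}, target_stay=40))                # 1.0
```

Even after removing the weakest winners, only half of the remaining near winners are hits, versus all of the remaining near misses, so the "near misses outperform near winners" result survives conservative removal without any treatment effect of rejection.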
26. Conclusions
• Naive interpretations of scientometric data easily lead to incorrect
conclusions, which may have a harmful effect on research policy
• Such mistakes can be avoided through formal quantitative modeling of the
mechanisms underlying the data
• This should be a core element of the science of science research agenda