Towards Knowledge Maintenance in Scientific Digital Libraries with the Keystone Framework (JCDL 2020, 2020-08-02)


JCDL 2020 full paper.

Abstract:
Scientific digital libraries speed dissemination of scientific publications, but also the propagation of invalid or unreliable knowledge. Although many papers with known validity problems are highly cited, no auditing process is currently available to determine whether a citing paper’s findings fundamentally depend on invalid or unreliable knowledge. To address this, we introduce a new framework, the keystone framework, designed to identify when and how citing unreliable findings impacts a paper, using argumentation theory and citation context analysis. Through two pilot case studies, we demonstrate how the keystone framework can be applied to knowledge maintenance tasks for digital libraries, including addressing citations of a non-reproducible paper and identifying statements most needing validation in a high-impact paper. We identify roles for librarians, database maintainers, knowledge base curators, and research software engineers in applying the framework to scientific digital libraries.

doi:10.1145/3383583.3398514
Preprint: http://jodischneider.com/pubs/jcdl2020.pdf

  1. 1. Towards Knowledge Maintenance in Scientific Digital Libraries with the Keystone Framework Yuanxi Fu & Jodi Schneider School of Information Sciences University of Illinois at Urbana-Champaign Presentation for JCDL 2020, Virtual, 2020-08-02
  2. 2. Motivation: Need for KNOWLEDGE MAINTENANCE • When you’re not an expert, how do you judge papers? – Recency – Citation count – Other heuristics • Literature becomes obsolete but usually that’s not explicit. • These errors can be passed on & can lead to misinterpretations. – Most research needs information from cognate fields.
  3. 3. How big is the problem of retraction & citation of retracted papers? • Over 600,000 articles directly cite a retracted paper. • The Retraction Watch Database lists over 19,000 retracted publications. • In biomedicine, 94% of retracted papers have received at least one citation, with an average citation count of 35 (Dinh, …, Schneider 2019).
  4. 4. Motivating Questions • Are papers citing a retracted paper necessarily wrong? • Does it matter when citing authors make use of a paper whose findings are no longer considered valid? • When DOES the citation matter? • Could we selectively alert authors who cite a retracted or abandoned paper?
  5. 5. Introducing the Keystone Framework
  6. 6. Under our framework: 1) A scientific research paper puts forward at least one main finding, along with a logical argument, giving reasons and evidence to support the main finding. 2) The main finding is accepted (or not) on the basis of the logical argument. 3) Evidence from earlier literature may be incorporated into the argument by citing a paper and presenting it as support, using a citation context.
  7. 7. Applying the Framework: Step 1, model the argument. Diagram: Data, Methods, Citations, … support the Arguments, which support the Main Finding.
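As a rough illustration of Step 1, the argument model can be pictured as a small data structure; the sketch below is illustrative only (the class and field names are not part of the framework):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CitationContext:
    """A statement in the citing paper together with the item(s) it cites."""
    text: str
    cited_items: List[str]  # e.g. ["Bar-Ilan & Halevi 2018"]

@dataclass
class Argument:
    """Reasons and evidence offered in support of a main finding."""
    data: List[str] = field(default_factory=list)      # datasets, measurements
    methods: List[str] = field(default_factory=list)   # protocols, analyses
    citations: List[CitationContext] = field(default_factory=list)

@dataclass
class MainFinding:
    """Each paper puts forward at least one main finding, backed by an argument."""
    statement: str
    argument: Argument
```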
  8. 8. Applying the Framework: Step 2, find the citation contexts that contribute to the argument. Example from our paper (the citing article): “Many papers with known validity problems are highly cited [3].” Cited article: [3] = Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles.
  9. 9. Applying the Framework: Step 3, analyze the citation context: How many items are cited? Singleton: “Many papers with known validity problems are highly cited [3].” Cluster: “Digital library applications of argumentation theory include argument-based retrieval [21, 30].”
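A minimal sketch of how the singleton/cluster distinction could be detected automatically, assuming bracketed numeric citation markers like those above (the function and regular expression are illustrative, not part of the paper):

```python
import re

def classify_citation_context(text: str) -> str:
    """Label a citation context as 'singleton' or 'cluster' by counting the
    distinct reference numbers inside bracketed markers such as [3] or [21, 30]."""
    cited = set()
    for marker in re.findall(r"\[([\d\s,]+)\]", text):
        cited.update(n.strip() for n in marker.split(",") if n.strip())
    if not cited:
        return "no citation"
    return "singleton" if len(cited) == 1 else "cluster"

print(classify_citation_context(
    "Many papers with known validity problems are highly cited [3]."))  # singleton
print(classify_citation_context(
    "Digital library applications of argumentation theory include "
    "argument-based retrieval [21, 30]."))                              # cluster
```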
  10. 10. Applying the Framework: Step 4, analyze the cited article: What kind of support does it give? Citation context: “Many papers with known validity problems are highly cited [3].” Cited article: [3] = Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles. Here the cited article’s main findings provide the support.
  11. 11. Applying the Keystone Framework to Knowledge Maintenance Tasks
  12. 12. Case Study 1: Citing Non-reproducible Code Assess the real impact of citing an unreliable paper
  13. 13. Case Study 2: Citations Supporting One Paper’s Argument Curate a high-impact paper (citation count > 1000) to find its keystone citations (de Calignon et al., 2012)
  14. 14. Workflow for Assessing the Citations of an Unreliable Paper. Step 1 (experts): the domain expert develops a generalized argument model. Step 2 (experts): the domain expert develops a list of screening questions. Step 3 (experts, non-experts, or text mining tools): screen target articles using the checklist. Step 4 (experts & non-experts): flag those articles that are potentially impacted.
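A hypothetical sketch of Steps 2–4 in code, assuming the checklist is reduced to yes/no questions answered by crude text heuristics; the questions and heuristics below are invented for illustration, since the paper describes the workflow rather than an implementation:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ScreeningQuestion:
    """One yes/no question from the expert-built checklist (Step 2)."""
    text: str
    answer: Callable[[str], bool]  # crude heuristic over the citing paper's full text

# Invented checklist for a cited paper whose protocol contained a code glitch.
CHECKLIST: List[ScreeningQuestion] = [
    ScreeningQuestion("Does the citing paper actually run the cited protocol?",
                      lambda text: "protocol" in text.lower()),
    ScreeningQuestion("Do the resulting calculations support a main claim?",
                      lambda text: "conclusion" in text.lower()),
]

def flag_if_potentially_impacted(citing_paper_text: str) -> bool:
    """Steps 3 and 4: screen a target article with the checklist and flag it
    when every question is answered 'yes'."""
    return all(q.answer(citing_paper_text) for q in CHECKLIST)
```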
  15. 15. Workflow for Targeted Curation of Important Papers (carried out by experts). Step 1: identify the main findings of the article from the abstract, conclusion section, and section titles. Step 2: construct an argument diagram for each main finding. Step 3: align citations and citation contexts to components in the argument diagrams. Step 4: identify and categorize keystone citation contexts and keystone citations.
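Step 3 of this workflow can likewise be pictured as a simple alignment table; the reference numbers below are placeholders, not the actual citations of de Calignon et al. (2012):

```python
from typing import Dict, List

# Hypothetical alignment of citation contexts to argument components
# for one main finding of a curated paper (Step 3).
alignment: Dict[str, List[str]] = {
    "choice of experimental materials": ["[12]", "[13]", "[14]"],
    "choice of experimental method":    ["[7, 8, 9]"],  # a cluster citation context
    "interpretation of data":           ["[22]"],
}

def keystone_candidates(alignment: Dict[str, List[str]]) -> List[str]:
    """Step 4, first pass: every context aligned to an argument component is a
    candidate keystone citation context, since removing it would weaken the
    argument for the main finding."""
    return [context for contexts in alignment.values() for context in contexts]
```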
  16. 16. Results from Case Study 1. Unaffected by the code glitch: 6 papers. Why? They didn’t directly use the protocol; they cited it either to support decisions they made in their calculations or as background information. What to do? Nothing! Affected by the code glitch: 4 papers. Why? They followed the protocol, and the calculations supported claims that went into abstracts or the conclusion section. What to do? Authors of the 4 potentially affected papers should double-check their results and either amend their claims or document how the claims are sustained despite the code glitch.
  17. 17. Results from Case Study 2 • 51 citations in the whole paper • 5 singleton keystone citations, including • 3 main-finding singleton keystone citations that support the choice of experimental materials • 1 pass-through keystone citation that supports the choice of experimental materials • 1 singleton main-finding keystone citation that supports the interpretation of data • One keystone citation cluster with 3 reference items supports the choice of experimental method.
  18. 18. An ad-hoc literature review from pass-through keystone citation 1 was used to support the choice of PSD-95 as a synaptic marker.
  19. 19. Research Agenda • Scale up: Large-scale identification of keystone citations – Test argument-based curation approaches that have already been automated (e.g. rhetoric-based approaches, Teufel & Kan, 2009) – Develop text mining tools to aid manual curation and screening • Understand citation behaviors, esp. pass-through citations • Develop a taxonomy of validity for indicating the confidence a reader can have in relying on or reusing the methods and findings of a paper
  20. 20. References
Bar-Ilan, J. and Halevi, G. 2018. Temporal characteristics of retracted articles. Scientometrics. 116, 3 (Jun. 2018), 1771–1783. https://doi.org/10.1007/s11192-018-2802-y
Clark, T., Ciccarese, P., & Goble, C.A. (2014). Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. Journal of Biomedical Semantics, 5, 28. https://doi.org/10.1186/2041-1480-5-28
Dinh, L., Sarol, J., Cheng, Y., Hsiao, T., Parulian, N.N., & Schneider, J. (2019). Systematic Examination of Pre- and Post-Retraction Citations. Proceedings of ASIST, 56(1), 390–394. https://doi.org/10.1002/pra2.35
Green, N.L. (2017). Argumentation Mining in Scientific Discourse. CMNA@ICAIL. http://ceur-ws.org/Vol-2048/paper02.pdf
Greenberg, S.A. (2009). How citation distortions create unfounded authority: analysis of a citation network. The BMJ, 339, b2680. https://doi.org/10.1136/bmj.b2680
Teufel, S., & Kan, M. (2009). Robust Argumentative Zoning for Sensemaking in Scholarly Documents. NLP4DL/AT4DL, 154–170. https://doi.org/10.1007/978-3-642-23160-5_10
de Calignon, A., Polydoro, M., Suárez-Calvet, M., William, C., Adamowicz, D. H., Kopeikina, K. J., Pitstick, R., Sahara, N., Ashe, K. H., Carlson, G. A., Spires-Jones, T. L., & Hyman, B. T. (2012). Propagation of tau pathology in a model of early Alzheimer's disease. Neuron, 73(4), 685–697. https://doi.org/10.1016/j.neuron.2011.11.033
  21. 21. Thank You!
  22. 22. Appendix A Keystone Citations Found in Case Study 2
  23. 23. Three Types of Keystone Citation Contexts Observed
Properties of the keystone citation context | Removing the citation context would weaken the argument supporting a main finding | Only one paper is cited | The main findings of the cited paper(s) provide evidence to support the argument | Corresponding cited article
Singleton, main-findings support | + | + | + | Main-finding keystone citation
Cluster, main-findings support | + | - | + | Main-finding keystone citation cluster
Singleton, pass-through support | + | + | - | Pass-through keystone citation
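The same typology can be restated as a small classification function; it assumes the citation context has already been judged keystone, i.e. removing it would weaken the argument for a main finding (the function name is illustrative):

```python
def keystone_citation_type(single_item_cited: bool, main_findings_support: bool) -> str:
    """Map the two distinguishing properties in the table above to the
    corresponding keystone citation type."""
    if main_findings_support:
        return ("main-finding keystone citation" if single_item_cited
                else "main-finding keystone citation cluster")
    if single_item_cited:
        return "pass-through keystone citation"
    return "unclassified"  # cluster with pass-through support was not observed
```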
  24. 24. Three main-finding keystone citations supported the choice of experimental materials. Main-finding keystone citation to support choice of material
  25. 25. The fourth main-finding keystone citation supports the interpretation of experimental data. Main-finding keystone citation to support data interpretation
  26. 26. An ad-hoc literature review from pass-through keystone citation 1 was used to support the choice of PSD-95 as a synaptic marker. Pass-through keystone citation to support choice of material
  27. 27. A main-finding keystone citation cluster supports the choice of experimental method. Main-finding keystone citation cluster to support method
  28. 28. Appendix B The Keystone Framework
  29. 29. Concepts in the Framework • Keystone statement: any statement whose unreliability threatens the argument for a main finding of a paper. • Keystone citation context: citation contexts supporting keystone statements. – singleton vs. cluster citation context. • A singleton citation context cites one item, e.g. ‘[2]’ • A cluster citation context cites multiple items, e.g., ‘[2, 16]’ or ‘(DeKosky and Scheff, 1990; Scheff and Price, 2006; Terry et al., 1991)’.
  30. 30. Concepts in the Framework • Keystone statement: any statement whose unreliability threatens the argument for a main finding of a paper. • Keystone citation context: citation contexts supporting keystone statements. – Number of supporting items • A singleton citation context cites one item, e.g. ‘[2]’ • A cluster citation context cites multiple items, e.g., ‘[2, 16]’ or ‘(DeKosky and Scheff, 1990; Scheff and Price, 2006; Terry et al., 1991)’.
  31. 31. Concepts in the Framework • Keystone statement: any statement whose unreliability threatens the argument for a main finding of a paper. • Keystone citation context: citation contexts supporting keystone statements. – Strength of support • Main-findings support, if the citation context closely relates to a main finding of the cited item. • Pass-through support, if support can be found within the cited item but only in an unsupported statement or a statement referencing one or more other works. • No clear support, if the citation context does not clearly relate to the cited item, either to its main findings or to other statements it makes.
  32. 32. Concepts in the Framework • Keystone Statement (KS): any statement whose unreliability threatens the argument for a main finding of a paper. Example diagram: the keystone statement “XXX is a good synaptic biomarker for detecting neurodegeneration” supports the synaptic marker used in the Method, and the Method and Data in turn support the main finding “Tau pathology results in neurodegeneration.”
  33. 33. Concepts in the Framework • Keystone Citation Context (KCC): a citation context that supports a keystone statement. Example diagram: a keystone citation context supports the keystone statement “XXX is a good synaptic biomarker for detecting neurodegeneration.”
  34. 34. Keystone Citation Contexts, Continued. Two questions: (1) How many items are cited? Singleton or cluster. (2) Do the cited item’s main findings support the citation context? Main-finding support, pass-through support, or no support.
  35. 35. Distinction between KS and KCC • KSs are summations of KCCs. Domain experts can distill the same KS from several KCCs found in different papers. • One study (Greenberg, 2009) found that several citation contexts can be distilled to a statement about β amyloid accumulation, but the cited papers mentioned the statement only as a “hypothesis” or not at all. Example statement: “The accumulation of β amyloid occurs early and precedes other abnormalities.” Citing article (stated as fact): “The appearance of Aβ-positive, noncongophilic deposits precedes vacuolization in IBM muscle fibers.8” Cited article (stated as hypothesis): “Some muscle fibers had Aβ-positive accumulations,… Those muscle fibers,…, may represent early changes of IBM.”
  36. 36. Appendix C Argument-based Curation
  37. 37. Rhetoric-based Approaches (Teufel & Kan, 2011) • Extract information based on rhetorical features • Limited need for domain knowledge • Provide a coarser argumentative structure • Well-automated
  38. 38. Argument-scheme based Approaches Premises: • A group of individuals G have atypical phenotype P • All of the individuals in G have atypical genotype M • Another group of individuals (controls) do not have P • None of controls have M Conclusion: M may be the cause of P (in G) (Green, 2017) • Require deep domain analysis • Show the logic of how a discipline justifies its findings • Identify potential weakness through critical questions • Currently depend on manual curation, but potentially automatable through text mining
  39. 39. Provenance-based Method (Clark et al., 2014) • Model scientists’ work process • Most suitable for modeling empirical research articles • Require manual curation
