This document discusses integrating research data into scientific articles through the Article of the Future platform. It aims to improve online presentation, allow sharing of additional content like datasets and code, and provide valuable context by linking articles to external data repositories. The platform presents articles in an interactive three-pane format and supports additional content like 3D models, phylogenetic trees, and executable papers. Elsevier collaborates with over 10 data repositories to enable article-level and entity-level linking of related data. A new Research Data Services division explores archiving, sharing, and assessing research data to help validate and reproduce findings.
Integrating research data in the Article of the Future
1. Integrating research data in the Article of the Future
Dr. Elena Zudilova-Seinstra, Content Innovation Manager, Journal & Content Technology, STM Journals
2. Outline
• Data and Scientific Article
• The Article of the Future platform for data integration
• Ongoing data-linking initiatives at Elsevier STM Journals
• Elsevier “Research Data Services”
• Summary
3. Research communication is changing…
• Different kinds of modern research output:
• articles, data, multimedia, code, ...
• More research output:
• Need for efficient selection of relevant information and data
• Be able to explore, build deep insight efficiently
From “print science” to “digital science”
4. … and the scientific article needs to adapt
“The Article of the Future” is a project to improve the scientific
article so that it allows researchers to optimally communicate
scientific research in all (digital) dimensions
Article format (PDF) is still very much print-based:
• “Ink on paper”
• No support for data, multimedia, computer code
• Limits validation and reproducibility
• Geared towards one style of reading: top-left to bottom-right
• Stand-alone - disconnected from relevant, related scientific information
5. Article of the Future: Approach & timeline
◦ Deeply involve researchers through interviews, workshops, forums,
surveys, etc. Over 800 people provided feedback.
◦ Focus on domain-specific enhancements – one size does not fit all
◦ Value-adding content and tools that are integrated with the article –
no “bells and whistles”
◦ Main focus is on HTML
◦ Novel article display format
◦ Continuous enhancements:
The Article of the Future is a
framework rather than an end-point
solution.
Timeline & status
◦ 2009: started with Cell Press
◦ 2011: 13 prototype articles on
articleofthefuture.com
◦ 2012: roll-out on ScienceDirect
◦ Ongoing: further enhancements
6. Article of the Future | Presentation
The three-pane format
Center pane: “Traditional” full-
text view, designed for optimal
online reading experience
Right pane: Additional content
& tools. Shown here: reference
browser
Left pane:
efficient navigation
& browsing
7. Article of the Future: Presentation, Content, Context
Three components of the Article of the Future concept:
◦ Presentation: Offering an optimal online browsing and reading experience
◦ Content: Support authors to share a wider range of research output –
discipline-specific interactive content, executable computer code,
multimedia files, and data sets.
◦ Context: Connecting the online article to trustworthy scientific resources
to present valuable additional information
in the context of the article
8. Article of the Future: Content
Interactive Viewer for 3D Molecular Models
http://www.elsevier.com/3DMolecularModels
• Author-provided models
(PDB, PSE, MOL/MOL2 format)
• Fully 3D – enlarge in canvas
• Real-time user interaction
• Supports all major browsers and
mobile devices (without additional plug-ins)
• Huge files: 100s of MBs
• Display modes: “ribbon” and “balls & sticks”
• 9 participating journals
• Molecular biology, food research,
biochemistry
http://dx.doi.org/10.1016/j.jmb.2012.11.040
http://dx.doi.org/10.1016/j.str.2012.10.007
9. Article of the Future: Content
Interactive phylogenetic tree viewer
http://www.elsevier.com/phylogenetictrees
• Explore phylogenetic trees:
zoom, search,
collapse/expand, change
layout, etc.
• Integrated into the article
• Tree data provided by the
authors
• Newick and NeXML file
formats are supported
• 12 participating journals
• Phylogenetics, genomics,
theoretical biology, etc.
• Validation tool
http://dx.doi.org/10.1016/j.ympev.2012.08.015
http://dx.doi.org/10.1016/j.ygcen.2012.07.023
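To make the Newick format mentioned on this slide concrete, here is a minimal Python sketch of how an author-supplied tree string could be read into a nested structure before display. It is an illustration only, not the viewer's actual implementation: the function names `parse_newick` and `leaf_names` are hypothetical, and the parser assumes plain labels and branch lengths (no quoted labels or comments).

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (illustrative sketch).

    Supports node names and branch lengths; quoted labels and
    bracketed comments are deliberately not handled.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node name, read up to the next structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional branch length after ":".
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected content at position {pos}")
    return root


def leaf_names(tree):
    """Return the names of all leaves, left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in leaf_names(child)]


tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);")
print(leaf_names(tree))
```

A nested structure like this is what operations such as collapse/expand or search would walk; a validation step could simply report the `ValueError` back to the author.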
10. Article of the Future: Content
Collage Executable Paper
• Collage authoring tool lets authors
capture their “numerical experiment”:
data, code, and their relationship
• Readers can re-compute results from
the paper
• Explore and study methodology by
changing parameters
• All code and data elements are
available for download
• Pilot with CS journals
http://www.elsevier.com/executablepaper
http://www.sciencedirect.com/science/article/pii/S0097849313000484
http://www.sciencedirect.com/science/article/pii/S0097849313000472
11. Connecting with Data Repositories
• Supplementary material is not always a good solution
• Many poor solutions in use: data on PCs, university websites,
personal homepages, ...
• Data repositories:
◦ Some scientists prefer independent data repositories
◦ Domain-specific coordination
◦ Centralized information “hubs”
• “Raw data should be freely accessible to researchers”
• Collaboration between Publishers and Data Repositories:
◦ Ensure long-term availability of useful content and context
◦ Coordinate submission process / deposit mechanism
12. DB linking partners of the Elsevier STM Journals
ModelDB
RunMyCode
http://www.elsevier.com/databaselinking
EMAGE
NIF
Dryad
13. Online Linking Schemes
ScienceDirect can support different linking arrangements:
◦ Article-level: associate an article with a data set
• Example: a data set that underlies the analysis in an article
• Could be author-deposited or curated
◦ Entity-level: link entities in articles to relevant data
• Examples: taxa, chemicals, proteins, ...
• Could be manual tagging: accurate, non-ambiguous, but
additional work for the author
• Could be text-mining: retrospective, automatic, but less accurate
(ambiguities)
• Could be embedded applications: enable researchers to
interactively explore data while reading the article
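The trade-off between manual tagging and text-mining can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how ScienceDirect implements linking: the functions `mine_pdb_ids` and `tagged_pdb_ids`, the tag dictionary layout, and the sample sentence are all hypothetical. The pattern relies on the classic Protein Data Bank identifier shape (a digit 1-9 followed by three alphanumeric characters).

```python
import re

# Classic PDB identifiers: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def mine_pdb_ids(text):
    """Text-mining: return every token that merely looks like a PDB ID."""
    return PDB_ID.findall(text)


def tagged_pdb_ids(tags):
    """Manual tagging: trust only identifiers the author declared as PDB."""
    return [tag["id"] for tag in tags if tag["db"] == "PDB"]


# Hypothetical article sentence and author-supplied tags.
sentence = "In 2012 we solved the structure (PDB 4HHB) at high resolution."
author_tags = [{"db": "PDB", "id": "4HHB"}]

print(mine_pdb_ids(sentence))       # the year is picked up as a false positive
print(tagged_pdb_ids(author_tags))  # only the identifier the author tagged
```

The mined result contains the publication year alongside the real identifier, which is the kind of ambiguity the slide refers to; the tagged result is exact but depends on the author doing the extra work.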
14. Article of the Future: Context
Data-linking via a DB banner
• Link to relevant datasets available in the
external (curated) data repository
• 10+ active banner linking schemes
• In close collaboration with data
repositories
• Links can be added retrospectively
15. Article of the Future: Context
Data-linking based on tagged entities
• For entities (concepts) mentioned
in an article – proteins, genes,
standards, planets, cities, etc.
• Available for 20+ data repositories
• Unique identifiers provided by the
authors
http://www.elsevier.com/databaselinking
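As a rough sketch of entity-based linking, an author-supplied (database, identifier) pair can be resolved to a repository link with a simple lookup. Everything here is an assumption for illustration: the `URL_TEMPLATES` table, the `entity_link` function, and the two URL patterns (which follow the public RCSB PDB and NCBI Nucleotide page layouts as commonly used, not anything confirmed in the slides).

```python
from urllib.parse import quote

# Assumed URL patterns for two repositories; a real linking scheme would be
# agreed with each repository rather than hard-coded like this.
URL_TEMPLATES = {
    "PDB": "https://www.rcsb.org/structure/{id}",
    "GenBank": "https://www.ncbi.nlm.nih.gov/nuccore/{id}",
}


def entity_link(database, identifier):
    """Return a link for a tagged entity, or None if the database is unknown."""
    template = URL_TEMPLATES.get(database)
    if template is None:
        return None
    # Percent-encode the identifier so it is safe inside a URL path.
    return template.format(id=quote(identifier, safe=""))


print(entity_link("PDB", "4HHB"))
print(entity_link("UnknownDB", "123"))
```

Returning `None` for an unknown database lets the article fall back to plain text when no linking scheme exists for that repository.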
16. Article of the Future: Context
Application-based linking (e.g., with Protein Data Bank)
• Explore protein structures relevant to the article – zoom, rotate, change display settings, etc.
• 3D structure data integrated from Protein Data Bank
• Unique protein codes provided by the authors
• 60+ journals
• Biology,
biochemistry,
neuroscience, food
research, etc.
• In collaboration with
Protein Data Bank
http://dx.doi.org/10.1016/j.jmb.2010.05.030
17. Article of the Future: Context
Application-based linking with NCBI GenBank
• View and analyze sequence data of genes and genomes mentioned in articles
• Flip the strands, zoom in/out, zoom to a sequence, go to a specific position to define a
track of interest within the sequence, "drag" to another location in the sequence
• Unique NCBI accession codes provided by the author
• 50+ journals
• Genetics, toxicology,
neuroscience, etc.
• In collaboration with
NCBI
http://dx.doi.org/10.1016/j.gene.2011.03.004
18. Data articles: Genomics Data Journal
Genomics Data is an open access journal that publishes high
quality and standardized reports on all aspects of genome-scale
analysis
• Limited only to nucleic acids analysis
• Microarray and Next-Generation
Sequencing data
• All organisms
Journal info:
http://www.elsevier.com/locate/gdata
Submission:
http://ees.elsevier.com/gdata
https://basespace.illumina.com/apps/144144/Genomics-Data
(from BaseSpace)
19. Research Data Services: New Division within Elsevier
• Goals: explore role of Elsevier in helping:
◦ Archive and share research data
◦ Increase the value and use of data (with metadata)
◦ Credit and impact assessment of research data
◦ Sustainability of data repositories
• Principles:
◦ Open data – and open software
◦ Collaborative –
work with existing repositories
◦ Transparent and flexible business model
http://researchdata.elsevier.com/
20. 2013: Running pilots to explore data preservation
Data preservation pilot with Carnegie Mellon:
• Tablet app replacing paper lab notebook
• Record all aspects during experiment
Data preservation pilot with Columbia/NASA:
• Enriching and storing NASA’s data on lunar samples (moon rocks)
• Develop process to train data curators: what skills are needed?
Data integration pilot with Duke:
• Scale up image repository including disclosure/annotation services
• Build integrated solution to visualize and share medical imaging data
The 2013 International
Data Rescue Award in the Geosciences
Organised by IEDA and
Elsevier Research Data Services
21. Summary
• Integration of Data and Articles brings value to researchers
• The Article of the Future provides a new platform for improved online
presentation, rich content, and valuable context from data repositories
(and other resources)
• Applications provide tools to directly integrate data and articles
• Elsevier is working together with a great number of data repositories to
establish article/entity-based linking and build applications together
• “Research Data Services” is a new research group exploring how
Elsevier can help researchers share and annotate data
Thank you!
e.zudilova-seinstra@elsevier.com
Editor's Notes
So the key message is that it’s all about adaptation. The way that research is performed has changed considerably over the last 20 or 100 years, moving from a print-based endeavour to an electronic endeavour. That means that researchers who want to disseminate their work have different needs, since it is no longer just about text and images – but also raw data, computer code, multimedia files, etc. At the same time, from a reader’s perspective, there are more and more articles, and it can be a real challenge to keep up with the literature. So it is ever more crucial to find relevant articles and develop deep insights.
At the same time, the scientific article has remained the same over the past centuries. It has moved from print to PDF which has a lot of advantages in terms of delivery and discoverability but the format is still very much the same as centuries ago – geared for one style of reading, inadequate support for electronic material and disconnected from the rest of the world. A format that does not capture the full richness of modern-day research output and does not take full advantage of modern technologies to offer an optimal user experience.
Goals
• Increase archiving and sharing of research data
• Increase the value and use of shared data (with metadata)
• Foster and assist with the credit and impact assessment of research data for the researcher, the institution, and the funding bodies
• Increase the sustainability of data repositories
Principles
• Open data – all data remain open and available
• Collaborative – with institutions, the research community, funding bodies
• Transparent business model – if we make money, some goes back to fund the repositories