1. Meeting the Research Data Challenge
12th October 2011, 12.00 – 13.00
#jiscmrd #jiscwebinar
10/11/2011 JISC Webinar slide 1
2. Meeting the Research Data Challenge
Sarah Porter
3. Presentation outline
The role of JISC in research support
The data challenge
Drivers for good management of research data
10/11/2011 Wellcome Collection Conference Centre slide 3
4. Why JISC? JISC’s role in research support
National infrastructure services for research such as the JANET network, data centres to host published resources, repository infrastructure
– The provision complements that of other stakeholders
– Research funders - Research Councils, Funding Councils
– Data hosts e.g. National Data Centres
JISC supports universities and colleges to make effective and efficient use of technology – in research and in the management of research
Key themes:
Increasing the impact and visibility of research
Increasing research competitiveness
Management of research information
Collaboration with business and the community
Improved management of research data.
5. The challenge of data
– Volume
– Diversity (what is data, anyway?)
– ‘Long tail’
– Drivers for data management not well understood – complex picture due
to range of funders’ policies, other policies at multiple levels (European,
UK, each research council, each institution)
– Good management practice not yet well understood
• so not embedded into research practice
– Institutional roles and responsibilities may be unclear
– Responsibility for meeting costs not yet established.
6. Drivers to improve research data management
Considerations for research integrity
Research Funder Policies
Freedom of Information / Environmental Information Regulations
Benefits of data reuse and improved research data management (including Research Excellence Framework)
7. Drivers: Research Integrity
UK Research Integrity Office Code of Practice for Research: Promoting
good practice and preventing misconduct, September 2009
Data management planning is an essential part of research design
[3.4.1.c; also 3.12.6]
Section 3.12 covers collection AND RETENTION of research data.
Organisations and researchers should ensure that research data relating
to publications is available for discussion with other researchers, subject to
any existing agreements on confidentiality. [3.12.1]
Organisations should have in place procedures, resources (including
physical space) and administrative support to assist researchers in the
accurate and efficient collection of data and its storage in a secure and
accessible form. [3.12.5]
Due regard to privacy, confidentiality and ethical issues.
Research integrity requires addressing these issues in order to make
data as ‘shareable’ as possible.
8. Drivers: Funders’ Policies
Research funders’ policies form an important part of the research data ecology.
In common with international developments, requirements are becoming
increasingly exacting.
Many policy statements reference the OECD Principles and Guidelines for Access
to Research Data from Public Funding:
http://www.oecd.org/dataoecd/9/61/38500813.pdf
NSF recently added the requirement of a data management plan to grant
proposals: http://www.arl.org/rtl/eresearch/escien/nsf/index.shtml
Health Research Funders’ ‘Joint Statement of Purpose: Sharing research data to improve public health’:
http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm
– making research data sets available to investigators beyond the original research
team in a timely and responsible manner, subject to appropriate safeguards, will
generate three key benefits:
• faster progress in improving health;
• better value for money;
• higher quality science.
9. Joint RCUK Policy
RCUK Common Principles on Data Policy:
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
1. Publicly funded research data are a public good, produced in the public
interest and therefore should be made as openly available as possible;
2. Institutional and project specific data management policies and plans should
be in accordance with relevant standards and community best practice;
3. Sufficient metadata should be recorded and made openly available to enable
other researchers to understand the research and re-use potential of the
data;
4. Legal, ethical and commercial constraints on release of research data must
be recognised;
5. Recognition and ‘reward’ for managing and sharing research data are
essential, and so limited embargo periods on the release of data are
acceptable;
6. All users of research data should acknowledge the sources of their data and
abide by the terms and conditions under which they are accessed;
7. It is appropriate to use public funds to support the management and sharing
of publicly-funded research data, but this should be done in as cost-effective
and efficient way as possible.
Infrastructure implications to be inferred rather than directly stated?
10. Drivers: Funders’ Policies
New MRC policies on research data management and sharing are being prepared, tested and refined; guidance is being produced as part of a JISC-funded project.
BBSRC Statement, April 2007, updated June 2010:
http://www.bbsrc.ac.uk/web/FILES/Policies/data-sharing-policy.pdf
– Requires statement on data sharing.
New ESRC policy now in force: http://www.esrc.ac.uk/about-esrc/information/data-policy.aspx
– Introduces the requirement of a data management and sharing statement (J-eS)
and a data management and sharing plan as part of the grant submission
11. Drivers: Funders’ Policies (EPSRC)
Responsibility:
EPSRC has a Policy Framework stating expectations concerning the
Management of and Access to EPSRC-funded Research Data. Places
responsibility with institutions, departments and centres in receipt of
EPSRC funding to show they can manage and preserve data to
adequate standards.
Appropriate division of costs:
EPSRC believes that where research has been publicly-funded it is reasonable and
appropriate to use public funds to also fund the associated data management
costs. EPSRC therefore expects research organisations to make appropriate
provision from within public research funding received, making use of both direct
and indirect funding streams as appropriate.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/responsibility.aspx
12. Drivers: Freedom of Information and
Environmental Information requests
Research data can be subject to Freedom of Information /
Environmental Information requests: UEA and Queen’s
University Belfast cases.
Guidance available at JISC Q&A on ‘Freedom of Information and Research Data’:
http://www.jisc.ac.uk/publications/programmerelated/2010/foiresearchdata.aspx
Indicative research on numbers of FoI requests for research
data: sample of 21 Universities, received total of 40 FoI
requests for research data from 2007-10.
– Wide variance in distribution: 12 universities received none; one received 8 and another 9.
– All but six of the requests were from 2009 and 2010;
• Indicates a growing trend.
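The arithmetic behind these indicative figures can be checked with a short sketch. It uses only the numbers quoted on the slide and assumes that 'all but six' refers to requests rather than universities; the variable names are illustrative.

```python
# Figures quoted on the slide (sample of 21 universities, 2007-10).
universities = 21
total_requests = 40
universities_with_none = 12
top_two_requests = 8 + 9          # the two universities receiving 8 and 9
requests_outside_2009_2010 = 6    # assumption: "all but six" counts requests

# Universities and requests not accounted for by the zero and top-two groups.
remaining_universities = universities - universities_with_none - 2
remaining_requests = total_requests - top_two_requests

# Concentration in the two most-requested universities, and in 2009-10.
share_top_two = top_two_requests / total_requests
requests_2009_2010 = total_requests - requests_outside_2009_2010
share_2009_2010 = requests_2009_2010 / total_requests

print(f"{remaining_universities} universities shared {remaining_requests} requests")
print(f"Top two universities: {share_top_two:.1%} of all requests")
print(f"Requests made in 2009-10: {requests_2009_2010} ({share_2009_2010:.0%})")
```

On this reading, two universities account for well over a third of the sample's requests, which is worth bearing in mind before generalising from the total.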
13. Driver: preparation for Research Excellence
Framework submissions in 2013
Good data management practice improves data collection for institutions and reduces its burden
– Need to embed practices into key roles – researchers, research
managers, administrators.
Demonstrate the contribution that research makes to the economy and
society (impact)
Opening up data provides one level of increased opportunity for ‘citizen
science’, etc.
Can be aided by research information management systems
JISC has funded universities to demonstrate the benefits of using the
Common European Research Information Format (CERIF) to manage
research information
– The cost of use is more than offset by efficiency savings.
Research management ‘Shared Service’ being developed for April 2012.
14. Wednesday 12 October 2011
JISC Webinar
Meeting the Research Data Challenge
Simon Hodson
Programme Manager, Managing Research Data, Digital Infrastructure Team
15. Responding to the drivers?
How can universities respond to these drivers?
What is JISC doing to help?
16. Supporting the Research Data Lifecycle
[Diagram: research data lifecycle. Stage labels: Plan, Create, Use, Appraise, Select, Discard, Hand Over?, Store, Annotate, Identify, Describe, Publish, Discover, Access, Reuse]
17. Supporting the Research Data Lifecycle
[Diagram: the same research data lifecycle, overlaid with five areas of support: Guidance and Policy Development; Training and Information; Support for Data Management Planning; RDM Systems and Infrastructure; Publication and Citation Mechanisms]
18. Wednesday 12 October 2011
JISC Webinar
Meeting the Research Data Challenge
Advice and guidance.
Training materials.
Data management planning.
Research data management systems and infrastructure.
Making the case: recognition, rewards, benefits.
19. DCC’s Data Management Roadshows
Regional Data Management Roadshows: http://www.dcc.ac.uk/events/data-management-roadshows
Next: Cambridge, 9-11 November: http://www.dcc.ac.uk/events/data-management-roadshows/dcc-roadshow-cambridge
Then: Cardiff, 14-16 December
Blog on Oxford Roadshow: http://www.dcc.ac.uk/news/review-dcc-roadshow-oxford
21. Institutional Research Data Management Policies
University of Edinburgh Research Data Management Policy:
http://www.ed.ac.uk/schools-departments/information-services/about/policies-and-regulations/research-data-policy
University of Oxford Commitment to Research Data Management:
http://www.ict.ox.ac.uk/odit/projects/datamanagement/
University of Hertfordshire: http://research-data-toolkit.herts.ac.uk/?p=11
See DCC on institutional data management policies:
http://www.dcc.ac.uk/resources/policy-and-legal/institutional-data-policies
22. Guidance Materials (JISCMRD Programme)
Sudamih Project: http://sudamih.oucs.ox.ac.uk/
Oxford Research Data Management Pages (EIDCSR Project): http://www.admin.ox.ac.uk/rdm/
Training Materials for Humanities Scholars – delivered as part of central Humanities Division IT training courses: http://sudamih.oucs.ox.ac.uk/documents.xml
23. Guidance Materials (JISCMRD Programme)
Incremental Project, collaboration between Glasgow and Cambridge, concentrated on providing guidance and training materials at an institutional level; focus on arts and humanities, social sciences, archaeology, social anthropology:
http://www.lib.cam.ac.uk/preservation/incremental/index.html
Cambridge Website: www.lib.cam.ac.uk/dataman/
Glasgow Website: www.gla.ac.uk/datamanagement/
Workshops and Seminars: http://www.lib.cam.ac.uk/preservation/incremental/seminars.html
– Series at CRASSH covering: ethics, FoI, IPR, new technologies.
– Series at Glasgow covering: performing arts and archaeology.
Interviews from Seminars:
– http://www.lib.cam.ac.uk/dataman/training.html#Interviews
– http://www.gla.ac.uk/services/datamanagement/training/videos/
Incremental Project Blog: http://incrementalproject.wordpress.com/
24. DCC How-To Guides
DCC How-To Guides: http://www.dcc.ac.uk/resources/how-guides
– Appraise and select research data for curation
– How to license research data
– How to develop a data management and sharing plan
Further Guides in preparation.
25. JISCMRD Training Projects
Need for subject focussed research data management / curation training, integrated with PG studies
Five projects to design and pilot (reusable) discipline-focussed training units for postgraduate courses: http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmtrain.aspx
Health studies: http://www.northumbria.ac.uk/sd/academic/ceis/re/isrc/themes/rmarea/datum/
Creative arts: http://www.projectcairo.org/
Archaeology, social anthropology: http://www.lib.cam.ac.uk/preservation/datatrain/
Psychological sciences: http://www.dmtpsych.york.ac.uk/
Social sciences, geographical sciences, clinical psychology: http://bit.ly/RDMantra
DaMSSI Support Project: http://www.rin.ac.uk/our-work/researcher-development-and-skills/data-management-and-information-literacy
27. ERIM Project
Data Management Planning for engineering and manufacturing research, IdMRC and UKOLN, Bath: http://www.ukoln.ac.uk/projects/erim/
Data very heterogeneous: data type, conditions of use etc.
Review of the State of the Art of the Digital Curation of Research Data.
Report on Understanding and Characterizing Engineering Research Data for its Better Management: included detailed Research Activity Information Development modeling.
Draft IdMRC Projects Data Management Plan; Requirements for a RAID associative tool.
Principle: interventions should result in ‘a zero net resource requirement increase’; i.e. data management needs to be supported by appropriate tools, or balanced by immediate benefits. Role of data manager in research centres needs to be examined closely.
28. DMP-ESRC Project
Led by UK Data Archive: http://www.data-archive.ac.uk/create-manage/projects/jisc-dmp
Study of data management practices in ESRC funded Centres and Programmes.
Data Management Recommendations for Research Centres and Programmes:
http://www.data-archive.ac.uk/media/257765/ukdadatamanagementrecommendations_centresprogrammes.pdf
– Clear roles and responsibilities; RDM coordinator; Data Inventory; Data Management Resources Library.
– Recommendations and guidelines on anonymisation, security and backup etc.
Data Management Costing Tool: http://www.data-archive.ac.uk/media/257647/ukda_jiscdmcosting.pdf
29. RDM Platforms and Infrastructure
FISHnet Project, freshwater biology: http://www.fishnetonline.org/
MaDAM Project, biomedical research in an institutional context: http://www.merc.ac.uk/?q=MaDAM
30. JISC UMF Shared Services and Cloud Programme
Strand A: Shared IT Infrastructure:
http://www.jisc.ac.uk/whatwedo/programmes/umf.aspx
JANET(UK) brokerage to create trusted cloud(s) for HE.
Pilot Cloud provided by Eduserv.
Augment the role of DCC (in part to deploy tools in the cloud).
‘Killer RDM Apps’ developed to be deployed as Software as a
Service.
31. RDM SaaS Applications
VIDaaS (Virtual Infrastructure for Database as a Service),
University of Oxford: http://vidaas.oucs.ox.ac.uk/
DataFlow, University of Oxford: http://www.dataflow.ox.ac.uk/
Smart Research Framework, University of Southampton:
http://www.mylabnotebook.ac.uk/
Biomedical Research Infrastructure (BRISSkit), University of
Leicester
32. Financial Savings
OXREP case study:
Estimated research savings during 2010 = 21%
Estimated data hosting savings during 2010 = 37% (just central VI, not cloud hosted)
Comparison of DaaS hosting costs:
Single physical server running 30 2GB database instances = £125
Oxford VM running on local VI with 100 2GB instances = £79
Oxford VM running on local VI with 100 8GB instances = £109
Eduserv VM running on VI with 500 8GB instances = £76-98
Amazon VM with 8GB instances = £660-744
33. Making the Case:
recognition, rewards, benefits
Data Citation
– DCC How-To Guide on data citation (in preparation)
– DCC Briefing Paper on Data Citation and Linking: http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
– BL is a founding member of DataCite
– Currently have DataCite user group; will be extending this and working with JISCMRD Projects
34. Dryad: a repository for supporting research data
Joint declarations, Feb 2010, in American Naturalist, Evolution, the Journal of
Evolutionary Biology, Molecular Ecology, Heredity, and other key journals in
evolution and ecology:
http://www.journals.uchicago.edu/doi/full/10.1086/650340
This journal requires, as a condition for publication, that data supporting
the results in the paper should be archived in an appropriate public
archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network
for Biocomplexity.
Allows embargos of up to one year; allows exceptions for, e.g., sensitive
information such as human subject data or the location of endangered species.
‘Data that have an established standard repository, such as DNA sequences,
should continue to be archived in the appropriate repository, such as GenBank.
For more idiosyncratic data, the data can be placed in a more flexible digital
data library such as the National Science Foundation-sponsored Dryad archive
at http://datadryad.org.’
35. Dryad-UK: a repository for supporting research data
Dryad-UK
Expand the number of journals: BMJ Open, titles from PLoS and BioMed Central.
Prepare a business model for long term funding of the data repository: supported by payments from journals, in turn recouped from subscription or author-pays OA fees.
Benefits?
Benefits for researchers: indications that publishing data increases citation rates
– Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308 (cancer microarray clinical trial publications).
– Piwowar ongoing work e.g. http://researchremix.wordpress.com/2011/02/18/early_results/ (citation, reuse of data from Gene Expression Omnibus).
36. Incentives and Benefits
Research Data Management Forum, 2-3 November, University of Warwick:
http://www.dcc.ac.uk/events/research-data-management-forum/rdmf7-incentivising-data-management-sharing
Making the Case for RDM, DCC Briefing Paper:
http://www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
Report on the Benefits from the Infrastructure Projects in the JISC Managing Research Data Programme:
http://www.jisc.ac.uk/whatwedo/programmes/mrd/outputs/benefitsreport.aspx
37. JISC Managing Research Data Programme
JISC Managing Research Data Programme, Outputs:
http://www.jisc.ac.uk/whatwedo/programmes/mrd/outputs.aspx
Second JISC Managing Research Data Programme, Google Map of funded projects:
http://maps.google.co.uk/maps/ms?msid=210493456856136057364.0004ab687f5a25636a285&msa=0
Call for Proposals on research data publications/citation and on training planned for the New Year.