Research Data Management for the Humanities and Social Sciences
1. Research Data Management for the
Humanities and Social Sciences
Martin Donnelly, Digital Curation Centre, University of Edinburgh
University of Liverpool, 1 October 2014
Jonathan Safran Foer, Tree of Codes
(2010) – based on Bruno Schulz’s The
Street of Crocodiles (1934)
2. Overview
1. Introductions and definitions
The Digital Curation Centre
Research data management
Drivers and benefits
What do we mean by ‘data’, exactly?
2. Data in/and the Humanities and Social Sciences
How the Humanities and Social Sciences differ
Strengths and weaknesses
Possible questions for discussion
3. Resources
3. The Digital Curation Centre
The DCC (est. 2004) is…
A UK centre of expertise in digital
preservation, with a particular focus on
research data management (RDM)
Based across three sites: Universities of
Edinburgh, Glasgow and Bath
Working with a number of UK universities
to identify gaps in RDM provision and
raise capabilities across the sector
Also involved in a variety of international
collaborations
5. What is RDM? A definition…
“the active management
and appraisal of data
over the lifecycle of
scholarly and scientific
interest”
6. What sort of activities?
- Planning and describing data-related
work before it takes place
- Documenting your data so that
others can find and understand it
- Storing it safely during the project
- Depositing it in a trusted archive
at the end of the project
- Linking publications to the
datasets that underpin them
Data management is a part of
good research practice.
- RCUK Policy and Code of Conduct on the
Governance of Good Research Conduct
7. Drivers and benefits of RDM
TRANSPARENCY: The evidence that underpins research
can be made open for anyone to scrutinise, and others
can attempt to replicate the findings.
EFFICIENCY: Data collection can be funded once, and
used many times for a variety of purposes.
RISK MANAGEMENT: A pro-active approach to data
management reduces the risk of inappropriate
disclosure of sensitive data, whether commercial or
personal.
PRESERVATION: Lots of data is unique, and can only be
captured once. If lost, it can’t be replaced.
8. Without intervention, data + time = no data
Vines et al. “examined the availability of data from 516 studies between 2 and 22 years old”
- The odds of a data set being reported as extant fell by 17% per year
- Broken e-mails and obsolete storage devices were the main obstacles to data sharing
- Policies mandating data archiving at publication are clearly needed
“The current system of leaving data with authors means that almost all of it is lost over
time, unavailable for validation of the original results or to use for entirely new purposes”
according to Timothy Vines, one of the researchers. This underscores the need for intentional
management of data from all disciplines and opened our conversation on potential roles for
librarians in this arena. (“80 Percent of Scientific Data Gone in 20 Years,” HNGN, Dec. 20,
2013, http://www.hngn.com/articles/20083/20131220/80-percent-of-scientific-data-gone-in-20-years.htm.)
Vines et al., The Availability of Research Data Declines Rapidly with Article Age,
Current Biology (2014), http://dx.doi.org/10.1016/j.cub.2013.11.014
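A rough worked illustration of the "17% per year" figure above, as a minimal sketch rather than a reproduction of the Vines et al. analysis. The decline applies to the odds, not to the probability, so the sketch converts between the two. The function name `probability_after` and the starting probabilities used below are hypothetical values chosen for illustration; only the 17% annual decline in odds comes from the slide.

```python
def probability_after(initial_probability: float, years: float,
                      annual_odds_decline: float = 0.17) -> float:
    """Probability that a data set is still extant after `years`,
    assuming the odds shrink by a fixed fraction every year.

    `initial_probability` is an assumed starting value, not a figure
    taken from the study.
    """
    if not 0.0 < initial_probability < 1.0:
        raise ValueError("initial_probability must be strictly between 0 and 1")
    # Convert the probability to odds: p / (1 - p).
    initial_odds = initial_probability / (1.0 - initial_probability)
    # Apply the yearly decline multiplicatively (odds * 0.83 each year).
    odds = initial_odds * (1.0 - annual_odds_decline) ** years
    # Convert the odds back to a probability: odds / (1 + odds).
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical starting point: a 90% chance that the data is extant.
    for elapsed in (0, 5, 10, 20):
        print(elapsed, round(probability_after(0.9, elapsed), 3))
```

Because the decline compounds on the odds, the fall in probability is slow while the starting probability is high and accelerates as the years accumulate, which is consistent with the slide's point that data left unmanaged tends to disappear over time.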
9. So what is ‘data’ exactly?
Definitions vary from discipline to discipline, and from funder to funder…
A science-centric definition:
“The recorded factual material commonly accepted in the scientific community as
necessary to validate research findings.” (US Office of Management and Budget,
Circular 110)
[Addendum: This policy applies to scientific collections, known in some disciplines
as institutional collections, permanent collections, archival collections, museum
collections, or voucher collections, which are assets with long-term scientific value.
(US Office of Science and Technology Policy, Memorandum, 20 March 2014)]
And another from the visual arts:
“Evidence which is used or created to generate new knowledge and
interpretations. ‘Evidence’ may be intersubjective or subjective; physical or
emotional; persistent or ephemeral; personal or public; explicit or tacit; and is
consciously or unconsciously referenced by the researcher at some point during
the course of their research.” (JISC “KAPTUR” project: see
http://kaptur.wordpress.com/2013/01/23/what-is-visual-arts-research-data-revisited/)
11. What do funders have to say?
AHRC requires that significant
electronic resources or datasets
are made available in an accessible repository for at least
three years after the end of the grant
Applicants submit a statement on data sharing
in the relevant section of the Je-S form, and
provide a two-page data management and
sharing plan addressing 9 distinct themes
Datasets must be offered to the UK Data
Archive on conclusion of the project
14. Field notes etc
That’s the thing with fieldnotes, you never show them to anyone . . . as long as it works for you . . .
it’s not like you have to show it to someone else and have them make sense of it, which is kind of a
shame because it would be nice to have data in formats where people could, you know, sort of,
archive that information. (2-16-120211)
In his study of fieldnote practices, Jean Jackson observes, “Many respondents point out that the highly personal
nature of fieldnotes influences the extent of one’s willingness to share them: ‘Fieldnotes can reveal how
worthless your work was, the lacunae, your linguistic incompetence, your not being made a blood brother, your
childish temper.’” Anthropologist Simon Ottenberg describes his fieldnotes similarly, “[W]hen I was younger, I
would have felt uncomfortable at the thought of someone else using my notes, whether I was alive or dead—
they are so much a private thing, so much an aspect of personal field experience, so much a private language, so
much part of my ego, my childhood [as an anthropologist], and my personal maturity.” This attachment and
ambivalence may make researchers reluctant to turn over control of their notes to an archives—especially one
that has the purpose of making materials available publically on the internet. This professional practice not only
makes it very difficult to substantiate or verify ethnographers’ data, suggesting a need for ethnographers to
develop more proactive and sensitive data-sharing procedures, but also creates a situation where ethnographic
materials are saved, but with inadequate plans for preservation. Zeitlyn notes, “Paradoxically most
anthropologists want neither to destroy their field material nor archive it.” So, too, with our study: almost no
one had made plans for their data beyond the short term, let alone the final dispensation of their materials at the
end of their career or after their death.
Andrew Asher and Lori M. Jahnke (2013) “Curating the Ethnographic Moment” Archive Journal, Issue 3,
http://www.archivejournal.net/issue/3/archives-remixed/curating-the-ethnographic-moment/
15. Strengths and weaknesses re. data in the
Humanities and Social Sciences
Qualitative and quantitative data is well-understood in the Social
Sciences, less so in the Arts and Humanities
But there’s nothing new about data re-use in the Humanities; it’s an
integral part of the culture, and always has been…
Think Kristeva’s intertextuality, Barthes’ ‘galaxy of signifiers’,
Shakespeare’s plots, Lanark’s assorted ‘plagiarisms’, Edwin Morgan’s
‘found’ newspaper poems, Marcel Duchamp, variations on a theme,
collage and intermedia art, T.S. Eliot, sampling/hip-hop, etc etc
(http://www.slideshare.net/martindonnelly/data-reuse-in-the-arts)
However, it’s often more fraught than scientific data re-use in other
For starters, people don’t always think of their sources or influences
as ‘data’, and the value and referencing systems are quite different
Furthermore, practice/praxis based research is pretty much the sole
preserve of the Humanities, and research/production methods are
not always rigorously methodical or linear…
16. Issues re. data in the Humanities and Social
Sciences
Some characteristics of HSS data are likely to require a different kind of handling from
that afforded to other disciplines
Some of the ‘data’ will be quite personal, and may not be factual in nature. Furthermore,
it may be quite valuable or precious to its creator. What matters most may not be the
content itself, but rather the presentation, the arrangement, the quality of expression…
This tends to be why Open Access embargoes are often longer in the Arts and
Humanities than other areas
There may be a higher incidence of non-digital materials, and a blurring of boundaries
between funded and unfunded work
Digital ‘data’ emerging from the Arts and Humanities is as likely to be an outcome of the
creative research process as an input to a workflow. This is at odds with the scientific
method, and with the language in which most existing RDM resources are described
The relevance of the word ‘data’ is not always apparent. (The term ‘research object’ is
gaining in currency, incorporating data (numeric, written, audiovisual…), software code,
workflows and methodologies, slides, logs, lab books, sketchbooks, notebooks, etc –
basically anything that underpins or enriches the (written) outputs of research)
17. Possible discussion points
How does the university wish to pitch its RDM provision?
Need – what do we need to archive/manage? Is it evidence without
which the research outcomes are in doubt?
Want – do we want to archive/preserve materials for other reasons?
Does preserving preparatory/developmental work provide a richer
experience/understanding of the work and processes of production?
How do we make a business case for this?
Some institutions take a compliance-driven approach to RDM, i.e.
doing only what is required of them by funders / the law
Others see scholarly benefit in going beyond these minima, treating
data as ‘special collections’
Ultimately it’s a question of appraisal…
18. i. DCC resources
Publications
The DCC publishes a series of themed Briefing Papers, How-To Guides
and Case Studies, pitched at different audiences / levels of detail
http://www.dcc.ac.uk/resources/briefing-papers
http://www.dcc.ac.uk/resources/how-guides
http://www.dcc.ac.uk/resources/developing-rdm-services
Training
e.g. DC101 courses and the Curation Reference Manual
Advice
e.g. Disciplinary metadata,
www.dcc.ac.uk/resources/metadata-standards
Tools
DMPonline, CARDIO, Data Asset Framework, DRAMBORA
Events
International Digital Curation Conference (next event is in
London, Feb 2015)
Research Data Management Forum (themed events – next one
on Research Data and Repositories, Leicester, 18-19 Nov 2014)
Tailored support
Via our Institutional Engagement programme
19. ii. Other resources
Jisc services and resources
RDM resources, www.jisc.ac.uk/guides/research-data-management
EDINA and Mimas (national data centres)
JISCMRD projects – Phase 1 (2009-2011) and Phase 2 (2011-2013)
1) Research Data Management Infrastructure (RDMI)
2) Research Data Management Planning (RDMP)
3) Support and Tools
4) Citing, Linking, Integrating and Publishing Research Data (CLIP)
5) Research Data Management Training Materials
6) Enhancing DMPonline
7) Events
UK Data Archive at the University of Essex
Universities
Good materials are available from Edinburgh, Cambridge, Oxford,
Glasgow, Bristol, and many others
20. Thank you
Martin Donnelly
Digital Curation Centre
University of Edinburgh
martin.donnelly@ed.ac.uk
@mkdDCC
For more about DCC services see www.dcc.ac.uk
or follow us on twitter @digitalcuration and #ukdcc
Image credits
Slide 2 (forest) – http://assets.worldwildlife.org/photos/934/images/hero_small/forest-overview-HI_115486.jpg?1345533675
This work is licensed under
the Creative Commons
Attribution 2.5 UK:
Scotland License.
Editor's Notes
First cohort of institutional engagements, 2011-2013
Leverhulme don’t say much about data
Appraisal categories:
Relevance to Mission – including any legal/funder requirement to retain the data beyond its immediate use.
Scientific or Historical Value – significance and relationship to publications etc.
Uniqueness – can it be found elsewhere / if we don’t preserve it, who will?
Potential for Redistribution – quality / IP / ethical concerns are addressed.
Non-Replicability – either impossible to replicate (e.g. atmospheric or social science data) or not financially viable.
Economic Case – costs of managing and preserving the resource stack up well against potential future benefits.
Full Documentation – surrounding / contextual information necessary to facilitate future discovery, access, and reuse is adequate.