3. Outline
• Introduction
• Why do we need good data management?
• Good data management
• Databases and tools
• Sharing your data
4. Who are we?
• Nicole Vasilevsky, PhD
– Assistant Professor, Helfgott Research Institute, NCNM
– Project Manager, Ontology Development Group, OHSU
• Jackie Wirz, PhD
– Assistant Professor, Bioinformation Specialist, OHSU Library
• Melissa Haendel, PhD
– Assistant Professor, Department Head, Ontology Development
Group, OHSU
11. Do you get frustrated with any of the following
in your personal or professional life?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers/clinicians
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata (i.e., describing the data to be more useful at a later
time or by others)
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given to data
sets generated
13. Which of the following do you do?
a. Save copies of data on a disk, USB drive, tape, or computer hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web-based or cloud server
e. Store data in a repository or archives
f. Automatically back up files
g. Manually generate backup
h. Restrict access to files
14. Credit where credit is due
The scholarly communication cycle:
• Data collection & Analysis
• Authoring
• Storage, Archiving, & Preservation
• Publication & Dissemination
15. Reproducibility of science
• Lack of information makes it difficult to reproduce experiments
• Retraction rates are on the rise
• Difficulty identifying resources in the published literature
Cokol et al. EMBO reports (2008) 9, 2
[Figure: bar chart with a 0% to 100% axis and one bar per resource type: antibodies, cell lines, constructs, knockdown reagents, organisms]
16. Sharing can be advantageous
http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/
17. Why share your data?
• Data sharing mandates
– NIH public access policy
– NIH/NSF data sharing plan for new applications
• Further science and medicine
• Build collaborations
• Enable new discoveries with your data
• Can be required at time of publication
23. Directory Structure
Sticking with a directory structure can be hard
Files:
• SPARC presentation
• CTSAconnect presentation
• Monarch presentation
Folders:
• Presentations
– SPARC
– CTSAconnect
– Monarch
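The move from loose files to project folders can be sketched in Python. This is a minimal sketch, assuming the project name is the first word of each file name. The function name `sort_into_projects` and the `.pptx` extension in the usage note are hypothetical and not part of the slide.

```python
from pathlib import Path
import shutil


def sort_into_projects(folder: Path, parent: str = "Presentations") -> list[Path]:
    """Move each file in `folder` into <folder>/<parent>/<project>/.

    Assumption: the project name is the first word of the file name.
    """
    moved = []
    # sorted() builds the full list first, so folders created below are not revisited.
    for item in sorted(folder.iterdir()):
        if not item.is_file():
            continue
        project = item.stem.split()[0]
        target_dir = folder / parent / project
        target_dir.mkdir(parents=True, exist_ok=True)
        moved.append(Path(shutil.move(str(item), str(target_dir / item.name))))
    return moved
```

Called on a folder holding "SPARC presentation.pptx", the function would place it at Presentations/SPARC/SPARC presentation.pptx. A fixed rule like this is one way to make a directory structure easier to stick with.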
27. Remember to back up your data!
• Recommended: keep three copies!
– 1 on your local workstation
– 1 local/removable, such as an external hard drive
– 1 remote, such as on a cloud server*
*Depending on the type of data, as cloud servers are not always secure
http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
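A minimal Python sketch of the three-copy idea, assuming the external drive and the remote location are both reachable as ordinary folders. The function names `sha256` and `back_up` are hypothetical. The checksum comparison is an added safeguard and not something the slide prescribes.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder and verify every copy."""
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        # A copy that does not match the original is not a usable backup.
        if sha256(copy) != expected:
            raise IOError(f"Backup at {copy} does not match the original")
        copies.append(copy)
    return copies
```

With the original on the workstation and two destination folders (one on an external drive, one on a remote mount), this yields the three copies. Whether a cloud destination is appropriate still depends on the type of data.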
28. Organizing your IRB application
Created by Heather Schiffke
See:
http://libguides.ohsu.edu/data
34. File name
• File type
• Who created the data
• Title
• Date created
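One reading of this slide is that a file name can carry the creator, title, creation date, and file type. The Python sketch below builds such a name. The `FileRecord` class and the `YYYYMMDD_creator_title.ext` pattern are assumptions chosen for illustration, not a convention given in the slides.

```python
from dataclasses import dataclass
from datetime import date
import re


@dataclass
class FileRecord:
    creator: str    # who created the data
    title: str
    created: date   # date created
    file_type: str  # extension without the dot, e.g. "csv"

    def file_name(self) -> str:
        """Build a name of the form YYYYMMDD_creator_title.ext."""
        def slug(text: str) -> str:
            # Lowercase, then replace runs of non-alphanumerics with a hyphen.
            return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
        return f"{self.created:%Y%m%d}_{slug(self.creator)}_{slug(self.title)}.{self.file_type}"


# Hypothetical values, for illustration only.
record = FileRecord("jdoe", "Blood pressure readings", date(2013, 1, 15), "csv")
print(record.file_name())  # 20130115_jdoe_blood-pressure-readings.csv
```

Putting the date first in year-month-day order means an alphabetical sort is also a chronological sort.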
37. Using structured phenotype data to identify genetic
basis of disease
Human disease: HADZISELIMOVIC SYNDROME
• Ventricular hypertrophy (HP:0001714)
• High-arched palate (HP:0000156)
• Failure to thrive (HP:0001508)
• Pulmonary artery atresia (HP:0004935)
• Renal hypoplasia (HP:0000089)
Most similar mouse model: b2b1035Clo (aka Blue Meanie)
• tricuspid valve atresia (MP:0006123)
• prenatal growth retardation (MP:0010865)
• persistent truncus arteriosis (MP:0002633)
• cleft palate (MP:0000111)
• duplex kidney (MP:0004017)
Phenotypes in common (UBEROpheno):
• abnormal kidney morphology
• abnormal palate morphology
• growth deficiency
• malformation of the heart and great vessels
• abnormal heart and great artery attachment
42. What is an Ontology?
1. Hierarchical terms are defined textually and logically
2. Relationships between the terms are defined
3. Expressed in a language that can be reasoned across by computers
4. Data can be reused and can be easily linked together
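The four points above can be illustrated with a toy example in Python. This is a minimal sketch and not a real ontology: the term names are borrowed from the phenotype slide, but the is_a links and the root term "abnormal morphology" are assumptions made for illustration and are not copied from MP, HP, or any ontology release. Real ontologies are usually expressed in languages such as OWL or OBO and processed with dedicated reasoners.

```python
# Hypothetical is_a links (child -> parents), for illustration only.
IS_A = {
    "cleft palate": ["abnormal palate morphology"],
    "high-arched palate": ["abnormal palate morphology"],
    "duplex kidney": ["abnormal kidney morphology"],
    "renal hypoplasia": ["abnormal kidney morphology"],
    "abnormal palate morphology": ["abnormal morphology"],
    "abnormal kidney morphology": ["abnormal morphology"],
}


def ancestors(term: str) -> set[str]:
    """Return every term reachable from `term` by following is_a links."""
    found: set[str] = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def shared_ancestors(a: str, b: str) -> set[str]:
    """Terms that both `a` and `b` are kinds of; a simple form of reasoning."""
    return ancestors(a) & ancestors(b)
```

Under these assumed links, `shared_ancestors("cleft palate", "high-arched palate")` includes "abnormal palate morphology". That is the kind of inference that lets a mouse phenotype and a human phenotype be linked even though the two were recorded with different terms.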
47. In Summary:
Structured Metadata = good
How can I create structured metadata?
http://www.flickr.com/photos/san_drino/1454922072/
48. Database(s) and Tools…
(to make your life easier)
http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg
49. Data Management tools and
repositories
• Purpose: Software where you can
organize, store and/or share data
• Often contain metadata to assist with data
entry and create structured data
52. Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbio.1001339
• Uniform Resource Identifier (URI)
• A URI will resolve to a single location on the web
• URIs for people
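The link between a DOI and a web location can be sketched in Python. This is a minimal sketch: the function name `doi_to_url` is hypothetical, and the regular expression is a loose assumption about what a DOI looks like, not the official syntax. It builds a URL for the public doi.org resolver without making any network request.

```python
import re

# Assumption: a loose pattern ("10." + registrant code + "/" + suffix).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as 'doi: 10.1371/journal.pbio.1001339' into a resolver URL."""
    # Drop an optional leading "doi:" label and surrounding whitespace.
    cleaned = re.sub(r"^doi:\s*", "", doi.strip(), flags=re.IGNORECASE)
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognizable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```

The identifier stays the same even if the publisher moves the article; only the resolver's redirect target changes.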
54. • Example:
• John L Campbell, Research Ecologist, Oregon State University, Corvallis
OR
• John L Campbell, Research Ecologist, Center for Research on
Ecosystem Change, Durham, NC
56. Tools for personal data
management
• Google Drive
• Dropbox
• Evernote
• TaskPaper
• Diigo: bookmarking websites
• Mendeley, EndNote, Zotero: citation managers
• SoundGecko
http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/
NICOLE, MELISSA, JACKIE: When I was a graduate student, data looked like this. Jackie, Melissa, and Nicole each show an example. What does data mean to you?
NICOLE: Ask them to brainstorm some examples of each of these:
• Clinical data: data that is captured in the clinic, e.g., vitals, chief complaints, diagnoses
• Experimental data: output from assays, such as numbers in a spreadsheet, images, recordings in a lab notebook, FACS plots from a flow cytometer
• School-related data: syllabus, coursework/assignments, tracking student information, etc.
• Personal data: personal files on your computer, your Word files, your Google Docs, your music stored on your computer, your Facebook profile
• Social data: Facebook, LinkedIn, Instagram
NICOLE: Small scale to big scale
• Personal
• Efficiency: big data and how airlines have figured out how to make departures more efficient
– Airplane departures and arrivals
– Healthcare can be more efficient
– Traffic patterns
– Etc.
• Can leverage data that we have to be more efficient and effective
NICOLE:
• Find password
• Find file on your computer
MELISSA: Impact story: scholarly communications come in many forms, not just publications
MELISSA: Improved airline ETAs
• Pilots used to provide the ETA at the airport
• A company started collecting data about arrival times, and can now better calculate the time of arrival, up to 10 mins closer to the actual time
• Uses a combination of data including weather patterns, flight schedules, previous flight history and arrivals under certain conditions, etc.
JACKIE: Show examples of versions
• Can go back when you make mistakes when changes are made
• Share work with other people
• Both work on things at the same time and merge back together
• Akin to a game of telephone: version control can let you see exactly when a change was made
NICOLE: NOTE: Need to post this on our lib guide
NICOLE: Software that can rename your files, if you already have them named
NICOLE: Have them look at this data and try to come up with more metadata
• Additional metadata on the patient
• Data on the file
• Data on the column
• Data on the row
• Data in each cell
• Patient 1 has an ID? Where is the ID and where is it stored?
NICOLE: Helfgott would like to pull data from Epic to do secondary analysis on patients
• Can track outcomes, such as whether patients have decreased pain over time after visiting the NCNM clinic when treated with certain interventions
• Can come up with hypotheses and do analysis on patient data
NICOLE:
• Epic is commonly used in the clinic and contains structured fields for collecting data about patients
• Issue is with data entry
– Garbage in/garbage out
– Students (and faculty) are not consistently trained on how to enter data into Epic
– Data entry and collection is not done consistently within the clinic
– For example, some practitioners enter BP into the BP field; others add it in progress notes
• Using structured metadata allows more consistent data collection and reporting
• Enables researchers to do secondary analysis on the data
– Can pull all the BP data from the BP field if it's there
– If it's in the notes or comments, it's difficult to grab and analyze this data
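The difference between a BP field and free-text notes can be sketched in Python. This is a minimal sketch with made-up visit records. The field names are illustrative assumptions and not Epic's actual schema, and the regular expression is only a best-effort guess at how a reading might appear in a note.

```python
import re
from typing import Optional

# Hypothetical visit records, for illustration only.
visits = [
    {"patient_id": "P1", "bp": "120/80", "notes": ""},
    {"patient_id": "P2", "bp": None, "notes": "Pt reports less pain. BP 135/85 today."},
    {"patient_id": "P3", "bp": None, "notes": "blood pressure was one-thirty over ninety"},
]

BP_IN_TEXT = re.compile(r"\bBP\s*:?\s*(\d{2,3}/\d{2,3})", re.IGNORECASE)


def blood_pressure(visit: dict) -> Optional[str]:
    """Prefer the structured field; fall back to a best-effort search of the notes."""
    if visit["bp"]:
        return visit["bp"]
    match = BP_IN_TEXT.search(visit["notes"])
    return match.group(1) if match else None


for visit in visits:
    print(visit["patient_id"], blood_pressure(visit))
```

The structured value is read directly. The second record is recovered only because the note happens to match the pattern, and the third is lost entirely, which is the garbage in/garbage out problem in miniature.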
NICOLE: BioSharing and ISA-Tab tools
NICOLE: FigShare, Dryad, Data.gov
MELISSA: Goal is to solve the author/contributor name ambiguity problem in scholarly communications
• Creating a central registry of unique identifiers for individual researchers
• Identifiers, and the relationships among them, can be linked to the researcher