We are always looking for data
Finding & Accessing
Human Genomic
Datasets
CRUK, 7th November 2016
Tweets welcome
#CamFindData
@repositiveio
Outline of the day
- Data sources and data access
- Case study: University of Cambridge
- Coffee break
- Introduction to Repositive
- Hands-on session: searching for data
- Round up and closure
On-line tools used during the workshop
To ask and answer questions during the presentation:
go to slido.com
enter event code: 7315
• 2001: First Human Genome Sequence
• 2005: Personal Genome Project
• 2008: UK10K
• 2013: UK 100K Project
• 2015: 1M Precision Medicine US
• 2016: AstraZeneca – HLI 2M
• Many other national and international projects
Genome Technology Evolution
• Consensus among researchers, clinicians,
politicians & the public that genomics will
transform biomedical research, healthcare
and lifestyle choices (Stephan Beck, UCL)
OPPORTUNITY
Data should be made available
• Required by funders
• Cannot publish unless accession
number given
• Specialised
• ENA
• EGA
• dbGaP
• dbSNP…
• Generalist
• Dryad
• figshare
Public Repositories
• Open Access
• E.g. PGP, CC0
• Bermuda Accord
• Managed (Restricted or Controlled Access)
• Data Access Committee
• No effective agreement (policy vacuum)
• Global Alliance for Genomics & Health
• enable compatible, readily accessible, and scalable approaches for
sharing
GOVERNANCE Models
Open vs Managed Access
Open Access
75,000,000 per month
Managed Access
150 per month
500,000 fold difference (Stephan Beck, UCL)
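The fold difference quoted above can be checked with a line of arithmetic. This is only a sketch: the function and parameter names are illustrative, and the slide does not say which unit is being counted per month.

```python
def fold_difference(open_per_month: int, managed_per_month: int) -> float:
    """Return how many times larger the open-access figure is."""
    return open_per_month / managed_per_month


# Figures taken from the slide: 75,000,000 vs 150 per month.
print(fold_difference(75_000_000, 150))  # prints 500000.0
```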
Large amounts of data, but not accessible
[Chart; x-axis: years 2006 to 2015]
• 80+ PB sequenced genome data available in public repos
• ≈ 0.5 PB Open Access
• Exponential growth rate
• Under-utilised data has huge potential for medical research
Access to Managed Data
Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full
responsibility for governance
Disadvantages:
• No control of data once access
is given
• High barrier for access – too
high?
Often a long process
Bottlenecks:
• Finding relevant and usable
data
• Getting authorisation to
access data
• Formatting data
• Storing and moving data
We studied the problem with
qualitative interviews followed
by a survey of researchers in
human genetics
T. A. van Schaik et al
The need to redefine genomic data sharing: a focus on
data accessibility, Applied & Translational Genomics, 2014
http://tinyurl.com/schaik-dnadigest
NIH / eRA Commons login
No
Yes
Organisation registered with eRA
Organisation has DUNS number
No
No
Write research proposal
Yes
+ 2-3 days
+ 1-2 weeks
+ 1 week
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 1-4 weeks
Science…
+ 1-2 days
Pro tip: if you use human
genomic data, apply for the
GRU (General Research Use)
datasets in dbGaP: one
application gives access to all
the GRU datasets.
dbGaP application process
Blog Post:
http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/
Sanger eDAM Account
No
Write research proposal
+ 1 hour
Yes
Submit proposal
+ 1-2 days
Access granted
Find/Download/Decrypt data
+ 2-7 days
Science…
+ 1-2 days
EGA application process
Blog Post:
http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
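To get a rough feel for the two application processes, the delays listed on the slides can be added up. The sketch below rests on assumptions that are not in the slides: every listed delay applies (no account or registration already exists), the steps run one after another, and a week counts as 7 days. The names `DBGAP_DELAYS`, `EGA_DELAYS` and `total_range` are made up for illustration.

```python
from datetime import timedelta

# (shortest, longest) for each delay, in the order the slides list them.
DBGAP_DELAYS = [
    (timedelta(days=2), timedelta(days=3)),
    (timedelta(weeks=1), timedelta(weeks=2)),
    (timedelta(weeks=1), timedelta(weeks=1)),
    (timedelta(days=1), timedelta(days=2)),
    (timedelta(weeks=1), timedelta(weeks=4)),
    (timedelta(days=1), timedelta(days=2)),
]
EGA_DELAYS = [
    (timedelta(hours=1), timedelta(hours=1)),
    (timedelta(days=1), timedelta(days=2)),
    (timedelta(days=2), timedelta(days=7)),
    (timedelta(days=1), timedelta(days=2)),
]


def total_range(delays):
    """Sum the shortest and the longest estimates separately."""
    shortest = sum((low for low, _ in delays), timedelta())
    longest = sum((high for _, high in delays), timedelta())
    return shortest, longest


for name, delays in (("dbGaP", DBGAP_DELAYS), ("EGA", EGA_DELAYS)):
    shortest, longest = total_range(delays)
    print(f"{name}: between {shortest} and {longest}")
```

Under these assumptions dbGaP comes out at roughly 25 to 56 days and EGA at roughly 4 to 11 days. The real figure depends on which prerequisites you already have in place.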
• Finding specific relevant genomic data for research can
take up to six months for an untrained researcher
without dedicated tools
• Application & response time for data access
committees can vary widely depending on
• the type of dataset
• consent regulations of the study
• => there is no consensus for the ‘contracts’ between each dataset
FACTS
Researchers often choose to not access data at all
WHY should we bother?
• Validate existing studies
• Avoid unnecessary duplication
• Compare to new studies
• Enhance new datasets
Why datasets are useful
Case studies
Raquel, PhD Student, London, UK.
Researching genes associated with rare eye disorders.
Problems:
- Doesn’t know where to look for data.
- Doesn't know if data even exists.
“I gave up on finding the data - it was very time consuming and not
proving fruitful – so I started focusing more on generating my own
data.”
Case studies
Mahantesh, Academic Researcher, Taipei, Taiwan.
Studying pharmacogenomics in cardiovascular epidemiology.
Problems:
- Needs lots of data.
- Knows it exists but struggles with getting access to it.
“Often it’s very hard to get the required number of cases and controls
to carry out research in public health and epidemiology.”
Case studies
Jana, Company Biocurator, Zurich, Switzerland.
Biocurating microarray and RNA-Seq data.
Problems:
- Needs lots of data.
- Lots of data out there but hard to filter down to ‘useful / relevant’
data.
“Many repositories don’t list the metadata details I need to know if a
dataset is useful to me, I can waste a lot of time searching.”
How many data sources?
How many sources of human
genomics data do you know
about?
[Figure: data source counts shown on a map (11, 155, 2, 2, 4, 4, 7, 780)]
[Bar chart by country code: GB FI NL FR DE CH EE BE DK ES SI IE SE]
[Bar chart by US state code: CA MD MA WA NY TX AZ DC NJ NC PA UT TN CO IN FL LA VA IL ME OH MO MI SC OR]
Data sources across the globe
Geolocation of the 278
data sources analysed,
found by tracking the IP address
of each source.
These include:
• Public Repositories
• Universities
• Companies
• BioBanks
• Research consortiums
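Locating a data source by its IP address, as described above, could look roughly like the sketch below. The slides do not say which tools were used, so everything here is an assumption: `locate_source` is a hypothetical helper, the DNS lookup uses Python's standard `socket.gethostbyname`, and the IP-to-location step is left as a function you supply, because no geolocation database is named in the source.

```python
import socket
from typing import Callable, Optional
from urllib.parse import urlparse


def locate_source(
    url: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    geolocate: Optional[Callable[[str], str]] = None,
) -> dict:
    """Resolve a data source URL to an IP address and, optionally, a location.

    `geolocate` is a placeholder for whatever IP-to-location lookup you use.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"No hostname found in {url!r}")
    ip = resolve(host)
    location = geolocate(ip) if geolocate is not None else None
    return {"host": host, "ip": ip, "location": location}


# Example with stand-in lookups, so no network access is needed.
result = locate_source(
    "http://discover.repositive.io",
    resolve=lambda host: "192.0.2.1",   # documentation-only IP address
    geolocate=lambda ip: "GB",          # stand-in for a real lookup
)
print(result)
```

An IP address points at wherever the server is hosted, which may differ from where the institution that owns the data is based.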
Data source content
Assay Types
Dedicated to…
DATA is fragmented
Hundreds of data sources
…but they aren’t easy to find!
First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive
[Chart: values 10, 25, 33, 35, 102, 174, 239 plotted for Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16, Jun-16; y-axis 0 to 300]
Cambridge specific Case Study
• Postdoctoral researcher at University of Cambridge
Medical School
• Working on genetic inheritance and cancer
• Using NGS data and bioinformatics
• After searching for data online she decided to apply for:
• 2 dbGaP datasets
• 3 EGA datasets
Cambridge specific Case Study
Blog Post:
Pending… will be on http://blog.repositive.io/
The Research Operations Office - will help you with the
contracts (Data Transfer Agreements - DTAs) and signatures.
• Has a designated individual who processes all dbGaP
applications as they all abide by NIH legal restrictions and
regulations about how to handle the data once granted
access
• For EGA applications, each DTA must be processed
separately because there is no consensus for the ‘contracts’
between each dataset.
Cambridge specific Case Study
The nominated IT director - will be specific to your
department.
• They will need to confirm you can support the requirements of
the DTA.
• If the head of your departmental IT is not happy to sign – the
head of IT for the University will be able to sign it off.
Cambridge specific Case Study
Top Tips:
• Think about your storage space!
• Think about what sort of analysis and processing
you are going to do with the data once you do have
it. After such a long process, approval can still
arrive before you are ready for it.
• Understand what you need before you start the
application process!
• You may have access for a limited period
Cambridge specific Case Study
COFFEE BREAK
Back in 10 minutes
@repositiveio
1-click access to human genomic data
to make finding data as easy as finding a book
on Amazon or booking a hotel on Expedia!
Simpler workflow
for data access
Our expertise is data search platforms
Discover and
access
Search, see
related results
Find colleagues &
their data interests
Co-annotate data &
community feedback
We are enabling best practices
MAKE DATA
DISCOVERABLE
SIMPLIFY
WORKFLOWS
CONTRIBUTE TO
COMMUNITY
DNAdigest and Repositive – Connecting the world of genomic data
http://www.tinyurl.com/plos-biology-repositive
Connecting the world of genomic data
1. Form groups of 2-3 people
2. Select a leader & a spokesperson
3. Choose 1 data theme you are interested in
1. E.g. colon cancer, prostate cancer, breast cancer
4. Sign up at https://discover.repositive.io/
5. Search Repositive for your selected theme
Hands on
Team presentation: 2 minutes
1. Introduction
• What data did you try to find and why?
• Have you tried to search for this data before?
2. Methods
• The 5 main steps you took on Repositive to try and find this data.
3. Results
• Did you find the data on Repositive?
• What challenges did you encounter?
4. Conclusion
• Sum up your experience in 1 sentence.
Feedback on the workshop
Bugs and feedback to: Charlotte at Repositive.io
Please leave your feedback on the workshop:
http://tinyurl.com/feedback280916
http://discover.repositive.io
@repositive

Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 

Finding & Accessing Human Genomic Datasets

  • 1. We are always looking for data Finding & Accessing Human Genomic Datasets CRUK, 7th November 2016 Tweets welcome #CamFindData @repositiveio
  • 2. Outline of the day - Data sources and data access - Case study: University of Cambridge - Coffee break - Introduction to Repositive - Hands-on session: searching for data - Round up and closure
  • 3. On-line tools used during the workshop To ask questions during the presentation and answer questions: go to slido.com enter event code: 7315
  • 5. • 2001: First Human Genome Sequence • 2005: Personal Genome Project • 2008: UK10K • 2013: UK 100K Project • 2015: 1M Precision Medicine US • 2016: AstraZeneca – HLI 2M • Many other national and international projects Genome Technology Evolution
  • 6. •Consensus among researchers, clinicians, politicians & the public that genomics will transform biomedical research, healthcare and lifestyle choices (Stephan Beck, UCL) OPPORTUNITY
  • 7. Data should be made available
  • 8. • Required by funders • Cannot publish unless accession number given • Specialised • ENA • EGA • dbGaP • dbSNP… • Generalist • Dryad • figshare Public Repositories
  • 9. • Open Access • Eg. PGP, CC0 • Bermuda Accord • Managed (Restricted or Controlled Access) • Data Access Committee • No effective agreement (policy vacuum) • Global Alliance for Genomics & Health • enable compatible, readily accessible, and scalable approaches for sharing GOVERNANCE Models
  • 10. Open vs Managed Access Open Access 75,000,000 per month Managed Access 150 per month 500,000 fold difference (Stephan Beck, UCL)
  • 11. Large amounts of data, but not accessible [Figure: sequenced genome data per year, 2006 to 2015, exponential growth rate] 80+ PB sequenced; ≈ 0.5 PB open access genome data available in public repos. Under-utilised data has huge potential for medical research
  • 12. Access to Managed Data Benefits: • Strict governance • Individuals are protected • Review of consent • Applicant signs for full responsibility for governance Disadvantages: • No control of data once access is given • High barrier for access – too high?
  • 13. Often a long process Bottlenecks: • Finding relevant and usable data • Getting authorisation to access data • Formatting data • Storing and moving data We studied the problem with qualitative interviews followed by a survey of researchers in human genetics. T. A. van Schaik et al., “The need to redefine genomic data sharing: a focus on data accessibility”, Applied & Translational Genomics, 2014, http://tinyurl.com/schaik-dnadigest
  • 14. dbGaP application process [Flowchart] Decision points (Yes/No): NIH / eRA Commons login; Organisation registered with eRA; Organisation has DUNS number. Steps: Write research proposal; Submit proposal; Access granted; Find/Download/Decrypt data; Science… Durations shown, in slide order: + 2-3 days; + 1-2 weeks; + 1 week; + 1-2 days; + 1-4 weeks; + 1-2 days. PRO Tip: If you use human genomic data, apply for the GRU (General Research Use) datasets in dbGaP: one application gives access to all the GRU datasets. Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/
  • 15. EGA application process [Flowchart] Decision point (Yes/No): Sanger eDAM Account. Steps: Write research proposal; Submit proposal; Access granted; Find/Download/Decrypt data; Science… Durations shown, in slide order: + 1 hour; + 1-2 days; + 2-7 days; + 1-2 days. Blog Post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/
  • 16. • Finding specific relevant genomic data for research can take up to six months for an untrained researcher without dedicated tools • Application & response time for data access committees can vary widely depending on • the type of dataset • consent regulations of the study • => there is no consensus for the ‘contracts’ between each dataset FACTS
  • 17. Researchers often choose to not access data at all
  • 18. WHY should we bother?
  • 19. • Validate existing studies • Avoid unnecessary duplication • Compare to new studies • Enhance new datasets Why datasets are useful
  • 20. Case studies Raquel, PhD Student, London, UK. Researching genes associated with rare eye disorders. Problems: - Doesn’t know where to look for data. - Doesn't know if data even exists. “I gave up on finding the data - it was very time consuming and not proving fruitful – so I started focusing more on generating my own data.”
  • 21. Case studies Mahantesh, Academic Researcher, Taipei, Taiwan. Studying pharmacogenomics in cardiovascular epidemiology. Problems: - Needs lots of data. - Knows it exists but struggles with getting access to it. “Often it’s very hard to get the required number of cases and controls to carry out research in public health and epidemiology.”
  • 22. Case studies Jana, Company Biocurator, Zurich, Switzerland. Biocurating microarray and RNA-Seq data. Problems: - Needs lots of data. - Lots of data out there but hard to filter down to ‘useful / relevant’ data. “Many repositories don’t list the metadata details I need to know if a dataset is useful to me, I can waste a lot of time searching.”
  • 23. How many data sources? How many sources of human genomics data do you know about?
  • 24. Data sources across the globe Geolocation of 278 data sources analysed, found by tracking the IP address of each source. These include: • Public repositories • Universities • Companies • Biobanks • Research consortia [Figure: data source counts by country code (GB, FI, NL, FR, DE, CH, EE, BE, DK, ES, SI, IE, SE) and by US state code (CA, MD, MA, WA, NY, TX, AZ, DC, NJ, NC, PA, UT, TN, CO, IN, FL, LA, VA, IL, ME, OH, MO, MI, SC, OR)]
  • 25. Data source content Assay Types Dedicated to…
  • 27. Hundreds of data sources …but they aren’t easy to find! First 30 data sources listed here: http://tinyurl.com/plos-biology-repositive [Figure: number of data sources over time; data labels 10, 25, 33, 35, 102, 174, 239 across Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16, Jun-16]
  • 29. • Postdoctoral researcher at University of Cambridge Medical School • Working on genetic inheritance and cancer • Using NGS data and bioinformatics • After searching for data online she decided to apply for: • 2 dbGaP datasets • 3 EGA datasets Cambridge specific Case Study Blog Post: Pending… will be on http://blog.repositive.io/
  • 30. The Research Operations Office - will help you with the contracts (Data Transfer Agreements - DTAs) and signatures. • Has a designated individual who processes all dbGaP applications as they all abide by NIH legal restrictions and regulations about how to handle the data once granted access • For EGA applications, each DTA must be processed separately because there is no consensus for the ‘contracts’ between each dataset. Cambridge specific Case Study Blog Post: Pending… will be on http://blog.repositive.io/
  • 31. The nominated IT director - will be specific to your department. • They will need to confirm you can support the requirements of the DTA. • If the head of your departmental IT is not happy to sign – the head of IT for the University will be able to sign it off. Cambridge specific Case Study Blog Post: Pending… will be on http://blog.repositive.io/
  • 32. Top Tips: • Think about your storage space! • Think about what sort of analysis and processing you are going to do with the data once you do have it. After such a long process, the approval could be too quick. • Understand what you need before you start the application process! • You may have access for a limited period Cambridge specific Case Study
  • 35. 1-click access to human genomic data: to make finding data as easy as finding a book on Amazon or booking a hotel on Expedia!
  • 36. Simpler workflow for data access Our expertise is data search platforms Discover and access Search, see related results Find colleagues & their data interests Co-annotate data & community feedback
  • 37. We are enabling best practices MAKE DATA DISCOVERABLE SIMPLIFY WORKFLOWS CONTRIBUTE TO COMMUNITY DNAdigest and Repositive – Connecting the world of genomic data http://www.tinyurl.com/plos-biology-repositive
  • 38. Connecting the world of genomic data
  • 40. 1. Form groups of 2-3 people 2. Select a leader & a spokesperson 3. Choose 1 data theme you are interested in (e.g. colon cancer, prostate cancer, breast cancer) 4. Sign up at https://discover.repositive.io/ 5. Search Repositive with the selected theme Hands on
  • 41. Team presentation: 2 minutes 1. Introduction • What data did you try to find and why? • Have you tried to search for this data before? 2. Methods • The 5 main steps you took on Repositive to try and find this data. 3. Results • Did you find the data on Repositive? • What challenges did you encounter? 4. Conclusion • Sum up your experience in 1 sentence.
  • 42. Feedback on the workshop Bugs and feedback to: Charlotte at Repositive.io Please leave your feedback on the workshop: http://tinyurl.com/feedback280916
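
A rough way to compare the two application processes on slides 14 and 15 is to add up the durations shown on each flowchart. The sketch below does that in Python. It assumes, purely for illustration, that every delay on the slide applies one after another (the worst case where no account or registration exists yet); the slides do not state this, and the names `DBGAP_DELAYS`, `EGA_DELAYS` and `total_range_days` are hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY

# (minimum, maximum) delay in hours, in the order the durations appear on slide 14.
# Assumption: all delays are incurred sequentially.
DBGAP_DELAYS = [
    (2 * HOURS_PER_DAY, 3 * HOURS_PER_DAY),    # + 2-3 days
    (1 * HOURS_PER_WEEK, 2 * HOURS_PER_WEEK),  # + 1-2 weeks
    (1 * HOURS_PER_WEEK, 1 * HOURS_PER_WEEK),  # + 1 week
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
    (1 * HOURS_PER_WEEK, 4 * HOURS_PER_WEEK),  # + 1-4 weeks
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
]

# Same idea for the durations shown on slide 15.
EGA_DELAYS = [
    (1, 1),                                    # + 1 hour
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
    (2 * HOURS_PER_DAY, 7 * HOURS_PER_DAY),    # + 2-7 days
    (1 * HOURS_PER_DAY, 2 * HOURS_PER_DAY),    # + 1-2 days
]


def total_range_days(delays):
    """Sum the minimum and maximum delays and return both totals in days."""
    shortest = sum(low for low, _ in delays)
    longest = sum(high for _, high in delays)
    return shortest / HOURS_PER_DAY, longest / HOURS_PER_DAY


if __name__ == "__main__":
    for name, delays in (("dbGaP", DBGAP_DELAYS), ("EGA", EGA_DELAYS)):
        shortest, longest = total_range_days(delays)
        print(f"{name}: {shortest:.1f} to {longest:.1f} days")
```

Under that sequential assumption the totals are only an upper-level estimate; an applicant who already has the relevant accounts would skip some of the early delays.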

Editor's Notes

  1. Because interpretation requires LOTS of data. And although data exists around the world, it is siloed, and even if available, it is not accessible. This is Jenn, a genetic researcher (our target customer), seeking to interpret data from genetic diseases and cancer. She needs data from other patients to compare and interpret Mabel's DNA. She also has data available in her own lab, but she cannot share it because of concerns about how to deal with secure access to sensitive data and vetting of users.
  2. Because interpretation requires LOTS of data. And although data exists around the world, it is siloed, and even if available, it is not accessible. This is Jenn, a genetic researcher (our target customer), seeking to interpret data from genetic diseases and cancer. She needs data from other patients to compare and interpret Mabel's DNA. She also has data available in her own lab, but she cannot share it because of concerns about how to deal with secure access to sensitive data and data governance, e.g. vetting of users.
  3. Population-scale genome sequencing projects have been launched all over the world. More than 80 PB of human genomic data is being sequenced every year, BUT to date only around 0.5 PB of data is available in public repositories.
  5. Examples of researchers looking for genomics data. All have problems, even though in different parts of the world, in different industries and with different research questions.
  8. Further confounded by the data being highly fragmented. Siloed in repositories and institutions around the world.
  9. Our mission is to speed up research and diagnostics for genetic diseases by enabling efficient and ethical access to genomic research data
  10. Our vision is to make genomic data access as easy as finding a book on Amazon or booking a hotel on Expedia
  11. KEY POINTS: Repositive builds tools for genomics data search & access. We’re really good at it. We have the expertise in-house. It’s what we do. Aside from building a highly functional tool, we’ve taken the time to prioritise user experience, streamlining of user workflows & presentation. Within a month of our formal platform launch we have over 600 registered users. The Repositive platform is an online community and marketplace connecting data consumers with data providers. On Repositive, Jenn has: easy, interactive search; a faster data access workflow; easy access to new data collaborators; and feedback on data from the community and colleagues to assess data quality and utility. The Repositive platform and technology will remove barriers to data sharing and will incentivise users to explore, contribute and collaborate in alignment with best practices.
  12. FAIR data: https://www.force11.org/group/fairgroup/fairprinciples
  13. DNA.land, OpenSNP, Personal Genome Project; direct-to-consumer genetic tests & microbiome