Open science meets open data: a primer

Scott Edmunds
@SCEdmunds
@GigaScience
Can this be considered open data?

http://biology.clc.uc.edu/fankhauser/labs/genetics/dna_isolation/thymus_dna.htm
Does this qualify as open source?

http://2011.igem.org/Team:UC_Davis
What is Open (Science) Data?

• Something very very very geeky
• Free & open access to data about the world around us
o Searchable, findable
o Machine-readable, app-makeable, Excel-usable
o Without restrictions/limitations

• This (examples)
About me:

• Scott Edmunds
• Molecular biology, sci editing & comms
• Scientific journal & (big) data publishing
• Reproducibility & open science

Journal, data-platform and database for
large-scale biological data
www.gigasciencejournal.com
About my employer:
• Formerly Beijing Genomics Institute
• Founded in 1999 (1% of HGP)
• China’s 1st citizen managed not-for-profit research
institute funded by commercial sequencing-as-a-service
(BGI Tech)
• Now largest genomic organization in the world
• HQ in Shenzhen, most data production in BGI HK (Tai Po)
Standing on the shoulders of giants
Open Data 1665?

Scholarly articles are merely advertisement of scholarship. The
actual scholarly artefacts, i.e. the data and computational
methods, which support the scholarship, remain largely
inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab
and reproducible research, 1995
OKFN: 8 types of open data

http://science.okfn.org/
Panton Principles

http://pantonprinciples.org/
Science Data Volumes
Astrophysics: exabytes (Square Kilometre Array)
HE Physics: 100s of petabytes (Large Hadron Collider)
Biology: petabytes (sequencing, mass spec, imaging)
The long tail of scientific data…
Esoteric formats, poorly structured,
Tabular, often spreadsheet based
Issues the open data community is well used to
(data cleaning, scraping, etc.)
Open Data in Physics
1961 CERN pre-prints shelf

1991-date arXiv

http://cerncourier.com/cws/article/cern/28654
http://arxiv.org/
Open Data in Biology
1934: newsletter era
1980: database era
1987: online era
2010s: “bioinformatics bingo” era
BGI HK Chamber O’Illumina’s
The LHC of Biology?
20PB of storage
Open Data in Chemistry
Closed Data in Chemistry
Genomics: open-data success story?

Sharing/reproducibility helped by stability of:
1. Platforms (1st Gen v 2nd Gen)
2. Repositories
3. Standards
Genomics Data Sharing Policies…
Bermuda Accords 1996/1997/1998:
1. Automatic release of sequence assemblies within 24 hours.
2. Immediate publication of finished annotated sequences.
3. Aim to make the entire sequence freely available in the public domain for
both research and development in order to maximise benefits to society.

Fort Lauderdale Agreement, 2003:
1. Sequence traces from whole genome shotgun projects are to be
deposited in a trace archive within one week of production.
2. Whole genome assemblies are to be deposited in a public nucleotide
sequence database as soon as possible after the assembled sequence
has met a set of quality evaluation criteria.

Toronto International data release workshop, 2009:
The goal was to reaffirm and refine, where needed, the policies related to
the early release of genomic data, and to extend, if possible, similar data
release policies to other types of large biological datasets – whether from
proteomics, biobanking or metabolite research.
Sharing aids fields…
Rice v Wheat: consequences of publicly available
genome data.
[Figure: chart comparing rice and wheat; y-axis from 0 to 700]
Digitizing the world

Can we make everything open data?
NO
The (non-) human centipede: first sequence

NO
[Diagram labels: Publisher narrative; Curation/integration; Source; Data; User (social) media; External databases (Morphbank, ArrayExpress); Data production: genomics, barcoding, imaging, microCT, video]
NO
What is open science? 5 flavours:

Benedikt Fecher and Sascha Friesike: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2272036
Democratic:
Biggest Challenge: Closed Access

WWW.RIGHTTORESEARCH.ORG
Biggest Challenge: Closed Access
Handful of closed access STM publishers control market
Force libraries to buy “bundles”

Revenue >$9B
Average cost /article >$5000 USD
Publishers retain copyright
Prevent data mining of content
Withhold information from the 99.9% who need it!
Biggest Challenge: Closed Access
Publishing: better than a gold mine

See: http://alexholcombe.wordpress.com/2013/01/09/scholarly-publishers-and-their-high-profits/
Increasing strain on library budgets
MIT library purchases v inflation 1986-2006
[Figure: line chart of percentage change (y-axis, -50% to 400%) by year (x-axis, ticks from 1986 to 2004). Series: Consumer Price Index, Serial Expenditures, Book Expenditures, # Serials Purchased, # Books Purchased. Curve annotations: "Journal expenditure" and "Inflation".]
Too expensive for Harvard…
The good news: the fightback has started…

http://thecostofknowledge.com/
The Solution: Open Access
Budapest Open Access Initiative:
“By “open access” to [peer-reviewed research literature], we mean its
free availability on the public internet, permitting any users to
read, download, copy, distribute, print, search, or link to the full texts
of these articles, crawl them for indexing, pass them as data to
software, or use them for any other lawful purpose, without
financial, legal, or technical barriers other than those inseparable
from gaining access to the internet itself. The only constraint on
reproduction and distribution, and the only role for copyright in this
domain, should be to give authors control over the integrity of their
work and the right to be properly acknowledged and cited.”

• Maximizes reuse and access
• Gives authors control over the integrity of their work and the right
to be properly acknowledged and cited.
• “Real” OA asks for no restrictions/limitations = CC-BY
Hong Kong: off the map
Push the button!

https://www.openaccessbutton.org/
Hong Kong: good with theses…

http://hub.hku.hk/
Hong Kong: still some work to go with OA

…Singapore beats us
Pragmatic:
Infrastructure:
Pragmatic/Infrastructure:
Crowdsourcing, wisdom of the masses

Wiki science:
GeneWiki
• 10,000 distinct gene pages.
• 1.42 million words and 78MB data.
• 50 million views & 15,000 edits per year.
http://en.wikipedia.org/wiki/Portal:Gene_Wiki

GitHub science:

A hypothetical Git workflow for a scientific collaboration involving 3 authors.
Karthik Ram: http://www.scfbm.org/content/8/1/7
Open Lab Notebooks
Our crowdsourcing example:

To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J;
Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y;
Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X;
Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY2482 isolate genome sequencing consortium (2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen.
doi:10.5524/100001
http://dx.doi.org/10.5524/100001
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
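
The citation above identifies the dataset by DOI in two forms (a `doi:` string and a resolver URL). As a minimal sketch of why that makes the data findable and machine-readable, the snippet below normalises either form into a single resolver link. The function name `doi_to_url` and the prefix list are illustrative assumptions, not something from the talk; the only real-world facts relied on are that DOIs start with `10.` and that `https://doi.org/` is the public resolver.

```python
# Hypothetical helper (not from the slides): normalise a DOI written in any
# of the common citation styles into one resolver URL.
KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(identifier: str) -> str:
    """Return an https://doi.org/ link for a DOI given in any common form."""
    doi = identifier.strip()
    lowered = doi.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            # Drop the prefix but keep the DOI itself untouched.
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # Both forms used in the dataset citation give the same link.
    print(doi_to_url("doi:10.5524/100001"))
    print(doi_to_url("http://dx.doi.org/10.5524/100001"))
```

Both spellings from the citation resolve to the same link, which is what lets citation counts and downstream reuse be tracked against one identifier.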
Downstream consequences:
1. Citations (~180)
2. Therapeutics (primers, antimicrobials)
3. Platform Comparisons
4. Example for faster & more open science

“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the
US, affecting about 4000 people and resulting in over 50
deaths. All tested positive for an unusual and little-known
Shiga-toxin–producing E. coli bacterium. The strain was initially
analysed by scientists at BGI-Shenzhen in China, working
together with those in Hamburg, and three days later a draft
genome was released under an open data licence. This
generated interest from bioinformaticians on four continents. 24
hours after the release of the genome it had been assembled.
Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These
analyses provided crucial information about the strain’s
virulence and resistance genes – how it spreads and which
antibiotics are effective against it. They produced results in
time to help contain the outbreak. By July 2011, scientists
published papers based on this work. By opening up their early
sequencing results to international collaboration, researchers in
Hamburg produced results that were quickly tested by a wide
range of experts, used to produce new knowledge and
ultimately to control a public health emergency.
Pragmatic/Infrastructure:
Open Innovation Challenges

http://www.scientificamerican.com/openinnovation/

http://www.gov.hk/en/theme/psi/contest/contest_events.htm
Public:
Indie Science

Biohacker spaces
CoResearch labs
Crowdfunding
DIYbio
Open hardware
http://www.perlsteinlab.com/
Biggest crowdfunding successes
Utilizing students: iGEM

iGEM:

http://2011.igem.org/Team:UC_Davis
The “People’s Parrot”
Puerto Rican Parrot Genome Project (Amazona vittata)
Rarest parrot, national bird of Puerto Rico

Community funded from artworks, fashion shows, beer brands, crowdfunding…
Genome annotated by students in community college as part of bioinformatics education
Paper and Data published in GigaScience and GigaDB

Taras K Oleksyk, et al., (2012) A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young
Researcher Education. GigaScience 2012, 1:14
Steven J. O’Brien. (2012): Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13
Oleksyk et al., (2012): Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience.
http://dx.doi.org/10.5524/100039
Public: Citizen Science
Galaxy Zoo:

Zooniverse:

887,355 “Zooites” and counting
https://www.zooniverse.org/
Public: Citizen Science
1987-1997

http://sabap2.adu.org.za/
Easy to get started…

http://crowdcrafting.org/
Public: Games with a Purpose

http://fold.it/
http://www.sciencegamecenter.org/
https://apps.facebook.com/fraxinusgame/
OpenSciDev

http://openscidev.com/
OpenSciDev
Questions asked:
1. What value framework is a prerequisite for open science?
2. How can open science support visibility and communication of
science outside formal academic structures?
3. How can open science create education?
4. How can the economic and social value of open science be
measured?

Currently working on:
• Writing working paper on these questions
• Building networks across Africa, Asia, Latin America and the
Caribbean.
• Setting up call for funding for OpenSciDev projects ($2-3M)
http://openscidev.com/
To summarize:
• Open data is more than just government data
(although research data mostly is government funded too)
• Need for OA advocates & policies in Hong Kong (role for ODHK?)
• Much the science community can still learn about open licensing
• Much the wider open data community can learn about community
engagement from Citizen Science, GWAP, etc.

• Asia (inc HK) behind US/EU on many of these activities, but can
we learn lessons from success of iGEM and “Jamboree” model?

The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Open Data HK: open science meets open data. A primer from Scott Edmunds

  • 1. Open science primer meets Scott Edmunds @SCEdmunds @GigaScience
  • 2. Can this be considered open data? http://biology.clc.uc.edu/fankhauser/labs/genetics/dna_isolation/thymus_dna.htm
  • 3. Does this qualify as open source? http://2011.igem.org/Team:UC_Davis
  • 4. What is Open (Science) Data? • Something very very very geeky • Free & open access to data about the world around us o Searchable, findable o Machine-readable, app-makeable, Excel-usable o Without restrictions/limitations • This (examples)
  • 5. About me: • Scott Edmunds • Molecular biology, sci editing & comms • Scientific journal & (big) data publishing • Reproducibility & open science Journal, data-platform and database for large-scale biological data www.gigasciencejournal.com
  • 7. About my employer: • Formerly Beijing Genomics Institute • Founded in 1999 (1% of HGP) • China’s 1st citizen managed not-for-profit research institute funded by commercial sequencing-as-a-service (BGI Tech) • Now largest genomic organization in the world • HQ in Shenzhen, most data production in BGI HK (Tai Po)
  • 8. Standing on the shoulders of giants
  • 9. Open Data 1665? Scholarly articles are merely advertisement of scholarship. The actual scholarly artefacts, i.e. the data and computational methods, which support the scholarship, remain largely inaccessible --- Jon B. Buckheit and David L. Donoho, WaveLab and reproducible research, 1995
  • 10. OKFN: 8 types of open data http://science.okfn.org/
  • 12. Science Data Volumes Astrophysics Exabytes HE Physics 100’s of Petabytes Biology Petabytes Sequencing Square Kilometer Array Large Hadron Collider Mass Spec Imaging
  • 13. The long tail of scientific data… Esoteric formats, poorly structured, tabular, often spreadsheet based. Issues the open data community is well used to (data cleaning, scraping, etc.)
  • 14. Open Data in Physics 1961 CERN pre-prints shelf 1991-date arXiv http://cerncourier.com/cws/article/cern/28654 http://arxiv.org/
  • 15. Open Data in Biology 1934: newsletter era 1980: database era 1987: online era 2010’s: “bioinformatics bingo” era
  • 16. BGI HK Chamber O’Illumina’s The LHC of Biology? 20PB of storage
  • 17. Open Data in Chemistry
  • 18. Closed Data in Chemistry
  • 20. Sharing/reproducibility helped by stability of: 1st Gen 2nd Gen 1. Platforms 1. Repositories 2. Standards :
  • 21. Genomics Data Sharing Policies… Bermuda Accords 1996/1997/1998: 1. Automatic release of sequence assemblies within 24 hours. 2. Immediate publication of finished annotated sequences. 3. Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society. Fort Lauderdale Agreement, 2003: 1. Sequence traces from whole genome shotgun projects are to be deposited in a trace archive within one week of production. 2. Whole genome assemblies are to be deposited in a public nucleotide sequence database as soon as possible after the assembled sequence has met a set of quality evaluation criteria. Toronto International data release workshop, 2009: The goal was to reaffirm and refine, where needed, the policies related to the early release of genomic data, and to extend, if possible, similar data release policies to other types of large biological datasets – whether from proteomics, biobanking or metabolite research.
  • 22. Sharing aids fields… Rice v Wheat: consequences of publicly available genome data. [Chart: rice v wheat, vertical axis 0 to 700]
  • 23. Digitizing the world Can we make everything open data?
  • 24. NO
  • 25. The (non-) human centipede: first sequence NO
  • 27. NO
  • 28. What is open science? 5 flavours: Benedikt Fecher and Sascha Friesike: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2272036
  • 30. Biggest Challenge: Closed Access WWW.RIGHTTORESEARCH.ORG
  • 31. Biggest Challenge: Closed Access Handful of closed access STM publishers control market Force libraries to buy “bundles” Revenue >$9B Average cost/article >$5000 USD Publishers retain copyright Prevent data mining of content Withhold information from 99.9% who need it!
  • 33. Publishing: better than a gold mine See: http://alexholcombe.wordpress.com/2013/01/09/scholarly-publishers-and-their-high-profits/
  • 34. Increasing strain on library budgets MIT library purchases v inflation 1986-2006 [Chart: percentage change by year for Consumer Price Index (inflation), serial (journal) expenditures, number of serials purchased, book expenditures and number of books purchased]
  • 35. Too expensive for Harvard…
  • 36. The good news: the fightback has started… http://thecostofknowledge.com/
  • 37. The Solution: Open Access Budapest Open Access Initiative: “By “open access” to [peer-reviewed research literature], we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.” • Maximizes reuse and access • Gives authors control over the integrity of their work and the right to be properly acknowledged and cited. • “Real” OA asks for no restrictions/limitations = CC-BY
  • 38. Hong Kong: off the map Push the button! https://www.openaccessbutton.org/
  • 39. Hong Kong: good with theses… http://hub.hku.hk/
  • 40. Hong Kong: still some work to go with OA …Singapore beats us
  • 42. Pragmatic/Infrastructure: Crowdsourcing, wisdom of the masses Wiki science: GeneWiki • 10,000 distinct gene pages. • 1.42 million words and 78MB data. • 50 million views & 15,000 edits per year. http://en.wikipedia.org/wiki/Portal:Gene_Wiki GitHub science: A hypothetical Git workflow for a scientific collaboration involving 3 authors. Karthik Ram: http://www.scfbm.org/content/8/1/7
  • 44. Our crowdsourcing example: To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001 To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
  • 47. Downstream consequences: 1. Citations (~180) 2. Therapeutics (primers, antimicrobials) 3. Platform Comparisons 4. Example for faster & more open science “Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”
  • 48. 1.3 The power of intelligently open data The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastrointestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.
  • 53. Indie Science Biohacker spaces CoResearch labs Crowdfunding DIYbio Open hardware http://www.perlsteinlab.com/
  • 56. The “People’s Parrot” Puerto Rican Parrot Genome Project (Amazona vittata) Rarest parrot, national bird of Puerto Rico Community funded from artworks, fashion shows, beer brands, crowdfunding… Genome annotated by students in community college as part of bioinformatics education Paper and Data published in GigaScience and GigaDB Taras K Oleksyk et al. (2012): A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young Researcher Education. GigaScience 2012, 1:14 Steven J. O’Brien (2012): Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13 Oleksyk et al. (2012): Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience. http://dx.doi.org/10.5524/100039
  • 57. Public: Citizen Science Galaxy Zoo: Zooniverse: 887,355 “Zooites” and counting https://www.zooniverse.org/
  • 59. Easy to get started… http://crowdcrafting.org/
  • 60. Public: Games with a Purpose http://fold.it/ http://www.sciencegamecenter.org/
  • 63. OpenSciDev Questions asked: 1. What value framework is a prerequisite for open science? 2. How can open science support visibility and communication of science outside formal academic structures? 3. How can open science create education? 4. How can the economic and social value of open science be measured? Currently working on: • Writing working paper on these questions • Building networks across Africa, Asia, Latin America and the Caribbean. • Setting up call for funding for OpenSciDev projects ($2-3M) http://openscidev.com/
  • 64. To summarize: • Open data is more than just government data (although research data mostly is government funded too) • Need for OA advocates & policies in Hong Kong (role for ODHK?) • The science community can still learn much about open licensing • The wider open data community can learn much about community engagement from Citizen Science, GWAP, etc. • Asia (inc HK) behind US/EU on many of these activities, but can we learn lessons from success of iGEM and “Jamboree” model? *…King+
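
Slides 44 and 56 cite their datasets by DOI in two written forms (`doi:10.5524/100001` and `http://dx.doi.org/10.5524/100001`). As a minimal sketch, assuming Python and the `https://doi.org/` resolver (neither is mentioned in the slides, and the function name `doi_to_url` is hypothetical), both forms can be normalised to a single citable link without any network access:

```python
from urllib.parse import quote

# Assumption: https://doi.org/ is used as the resolver; the slides use http://dx.doi.org/.
DOI_RESOLVER = "https://doi.org/"

# Written forms that may precede the bare DOI.
KNOWN_PREFIXES = ("doi:", "http://dx.doi.org/", "https://doi.org/")


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI such as '10.5524/100001'."""
    doi = doi.strip()
    # Strip a leading 'doi:' label or resolver address, if present.
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the '10.' directory code and has a '/' before the suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix separator.
    return DOI_RESOLVER + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("doi:10.5524/100001"))
    print(doi_to_url("http://dx.doi.org/10.5524/100039"))
```

The sketch only rewrites strings; whether a given DOI actually resolves still has to be checked against the resolver itself.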

Editor's Notes

  1. BGI (formerly known as Beijing Genomics Institute) was founded in 1999 and has since become the largest genomic organization in the world, with a focus on research and applications in healthcare, agriculture, conservation, and bio-energy fields. Our goal is to make leading-edge genomics highly accessible to the global research community by leveraging industry’s best technology, economies of scale and expert bioinformatics resources. BGI Americas was established as an interface with customers and collaborators in North and South America.
  2. Image: MIT Library