3. Buckheit & Donoho: Scholarly articles are
merely advertisements of scholarship. The
actual scholarly artifacts, i.e. the data and
computational methods that support
the scholarship, remain largely
inaccessible.
5. Scientists: what we are doing instead
Focusing on unscientific, unreproducible metrics
Incentivising short-term citations
6. JIFBAIT Network
JIFBAIT NEWS
Arsenic Life forms, will
they take over the planet?
By Melba Ketchum, PhD
Which Overhyped, Unreproducible
Experiment Are You?
Want rapid citations for 2 years only? Carry out this quiz.
You got: STAP Cells
Of course dipping cells in
coffee will make them
pluripotent. Even if the
research gets discredited, it’ll
still get 100’s of citations in
two years.
7. Publish or impoverish: An investigation of the monetary
reward system of science in China (1999-2016)
https://arxiv.org/abs/1707.01162
http://www.szhrss.gov.cn/xxgk/zcfgjjd/gcjzyrcgl/201708/t20170831_8317284.htm
Shenzhen "Peacock" "National leading talent scheme":
Science/Nature = ¥3M RMB, JCI Q1 = ¥1.6M RMB (1st & corresponding authors only)
8. Attempts to “game the peer-review system on an industrial
scale”
http://www.scientificamerican.com/article/for-sale-your-name-here-in-a-prestigious-science-journal/
Companies offering authorship of papers made to order by “paper
mills”.
Guaranteed publication in a JIF journal, often using fake referees, ID
theft, etc. JIF 1-2 papers = ~$10,000 USD
9. Do you want to be author of an IF 4.499 paper (J Cellular & Mol Med)?
Title: “…up-regulation of MiR*** can promote the development of liver
cancer by reducing the expression of ****”
http://www.518sci.com/index.php?catid=17&ydzt=0-9999&zdprice=0-9999
17. The Solution: Open Access
“By “open access” to [peer-reviewed research literature], we mean its
free availability on the public internet, permitting any users to read,
download, copy, distribute, print, search, or link to the full texts of
these articles, crawl them for indexing, pass them as data to software,
or use them for any other lawful purpose, without financial, legal, or
technical barriers other than those inseparable from gaining access to
the internet itself. The only constraint on reproduction and
distribution, and the only role for copyright in this domain, should be
to give authors control over the integrity of their work and the right to
be properly acknowledged and cited.”
Budapest Open Access Initiative:
• Maximizes reuse and access
• Gives authors control over the integrity of their work and the right
to be properly acknowledged and cited.
• “Real” OA asks for no restrictions/limitations = CC-BY
19. • Review
• Data
• Software
• Models
• Pipelines
• Re-use…
= Credit
Credit where credit is overdue:
“One option would be to provide researchers who release data to public repositories with
a means of accreditation.”
“An ability to search the literature for all online papers that used a particular data set
would enable appropriate attribution for those who share.”
Nature Biotechnology 27, 579 (2009)
New incentives/credit
24. Rewarding open data & code
http://gigasciencejournal.com/
Ethos/Policies: “Impact” is subjective. Data is quantitative.
Since July 2012. Publishes “Data Notes” for CC0 data, “Tech Notes” for OSI-licensed software.
25. Integrated GigaDB repository. DataCite DOIs. No size limits, APC covers storage.
http://gigadb.org/
26. Rewarding & enabling interaction
Building tools (inc. JBrowse for genomes, Sketchfab for 3D images) on top of datasets…
Code Ocean widgets for code, “compute capsule” (data+code+environment) run on AWS
31. Workflows/Virtual Machines/containers
• Downloadable as virtual hard disk/available as Amazon Machine Image
• More and more container (Docker, Singularity…) submissions
• Code Ocean widgets for code, “compute capsule” run on AWS
32. First journal with deep integration with
Launched 2nd June 2016
Reward better handling of “wet” protocols…
• Create, share, modify forkable protocols in repo.
• Download & run on smartphone app.
• Widgets embedded in GigaDB
• Get discoverability, credit, DOIs for sharing methods.
• Create your own, or let us set up & you claim.
https://www.protocols.io/groups/gigascience-journal
33. What is in it for you?
Open Science saves lives
Transparency to the rescue
35. To maximize its utility to the research community and aid those fighting
the current epidemic, genomic data is released here into the public domain
under a CC0 license. Until the publication of research papers on the
assembly and whole-genome analysis of this isolate we would ask you to
cite this dataset as:
Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang,
J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J;
Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X;
Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the
Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium
(2011)
Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI
Shenzhen. doi:10.5524/100001
http://dx.doi.org/10.5524/100001
Our first DOI:
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to
Genomic Data from the 2011 E. coli outbreak. This work is published from: China.
Open Data to the rescue…
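As a rough illustration of how a dataset DOI such as doi:10.5524/100001 can be handled in software, the Python sketch below normalises a DOI string to a resolver URL and assembles a short dataset citation. The function names (normalise_doi, doi_to_url, format_dataset_citation) and the citation layout are assumptions made for this example; they are not GigaDB or DataCite tooling.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes that people commonly put in front of a bare DOI.
_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def normalise_doi(raw: str) -> str:
    """Return the bare DOI (e.g. '10.5524/100001') from a DOI string or URL."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Very loose sanity check: DOIs start with '10.' and contain a '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolver URL for a DOI given in any of the accepted forms."""
    return DOI_RESOLVER + quote(normalise_doi(raw), safe="/")


def format_dataset_citation(authors: str, year: int, title: str,
                            publisher: str, doi: str) -> str:
    """Assemble a one-line citation (hypothetical layout, for illustration)."""
    return f"{authors} ({year}) {title}. {publisher}. doi:{normalise_doi(doi)}"


if __name__ == "__main__":
    print(doi_to_url("doi:10.5524/100001"))
    print(format_dataset_citation(
        "Li, D et al.", 2011,
        "Genomic data from Escherichia coli O104:H4 isolate TY-2482",
        "BGI Shenzhen", "http://dx.doi.org/10.5524/100001"))
```

Accepting both the "doi:" form and the resolver-URL form means a citation string and a hyperlink can be generated from whichever variant a paper or repository happens to print.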
39. Downstream consequences:
“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli
strain that infected roughly 4,000 people in Germany between May and July. But he knew it might take days
for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could
use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that
allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and
publish their work without wasting time on legal wrangling.”
• Many citations
• Therapeutics (primers, antimicrobials)
• Platform comparisons
• Example for faster & more open science
40. 1.3 The power of intelligently open data
The benefits of intelligently open data were powerfully
illustrated by events following an outbreak of a severe gastro-
intestinal infection in Hamburg in Germany in May 2011. This
spread through several European countries and the US,
affecting about 4000 people and resulting in over 50 deaths. All
tested positive for an unusual and little-known Shiga-toxin–
producing E. coli bacterium. The strain was initially analysed by
scientists at BGI-Shenzhen in China, working together with
those in Hamburg, and three days later a draft genome was
released under an open data licence. This generated interest
from bioinformaticians on four continents. 24 hours after the
release of the genome it had been assembled. Within a week
two dozen reports had been filed on an open-source site
dedicated to the analysis of the strain. These analyses
provided crucial information about the strain’s virulence and
resistance genes – how it spreads and which antibiotics are
effective against it. They produced results in time to help
contain the outbreak. By July 2011, scientists published papers
based on this work. By opening up their early sequencing
results to international collaboration, researchers in Hamburg
produced results that were quickly tested by a wide range of
experts, used to produce new knowledge and ultimately to
control a public health emergency.
42. A mnemonic to remember: FAIR
http://www.nature.com/articles/sdata201618
http://www.datafairport.org/
Requires stewardship on top of access
44. Beyond a mnemonic: FAIR ecosystems
FAIR metrics
https://www.go-fair.org/go-fair-initiative/
45. Beyond a mnemonic: FAIR Evaluation
Evaluating FAIR-Compliance Through an Objective, Automated, Community-Governed
Framework https://www.biorxiv.org/content/early/2018/09/16/418376
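The framework cited above defines its own community-governed metrics. Purely to illustrate the idea of automated FAIR checking, here is a toy Python sketch that runs four simple tests over a metadata record. The field names (identifier, url, format, license) and the checks themselves are assumptions made for this example, not the actual FAIR metrics.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# One illustrative check per FAIR letter (hypothetical, not the real metrics).
CHECKS: Dict[str, Callable[[Record], bool]] = {
    "findable: has a DOI identifier":
        lambda r: r.get("identifier", "").startswith("doi:"),
    "accessible: has a landing URL":
        lambda r: r.get("url", "").startswith(("http://", "https://")),
    "interoperable: declares a data format":
        lambda r: bool(r.get("format")),
    "reusable: declares a licence":
        lambda r: bool(r.get("license")),
}


def evaluate(record: Record) -> Dict[str, bool]:
    """Run every check and report pass/fail per check."""
    return {name: check(record) for name, check in CHECKS.items()}


def score(record: Record) -> float:
    """Fraction of checks passed, between 0.0 and 1.0."""
    results = evaluate(record)
    return sum(results.values()) / len(results)


# Example record built from the dataset cited earlier in the deck;
# the "format" field is deliberately left out to show a failing check.
example: Record = {
    "identifier": "doi:10.5524/100001",
    "url": "http://dx.doi.org/10.5524/100001",
    "license": "CC0",
}

if __name__ == "__main__":
    for name, passed in evaluate(example).items():
        print(("PASS" if passed else "FAIL"), name)
    print("score:", score(example))
```

Keeping each check as a small named function makes it easy for a community to add, remove or argue about individual tests, which is the spirit of a community-governed framework.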
46. DTL/ELIXIR-NL
“Bring Your Own Data Party”
GigaScience/BGI HK
Metabolomics ISA-TAB-athon
More FAIR mnemonics: “BYODs”
48. Open Science, the final frontier:
democratising research for citizens
The Hong Kong example…
49. Inspiration: The “People’s Parrot”
Puerto Rican Parrot Genome Project (Amazona vittata)
Rarest parrot, national bird of Puerto Rico
Community funded from artworks, fashion shows, beer brands, crowdfunding…
Genome annotated by students in community college as part of bioinformatics education
Paper and Data published in GigaScience and GigaDB
Taras K Oleksyk, et al., (2012) A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young
Researcher Education. GigaScience 2012, 1:14
Steven J. O’Brien. (2012): Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13
Oleksyk et al., (2012): Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience.
http://dx.doi.org/10.5524/100039
50. The Bauhinia Mystery?
HK Botanical & Afforestation Dept., 1903:
"The mysterious origin of the tree & its magnificent flowers at once arrest the interest.
So far, all efforts to identify them with any foreign species have failed"
60. Student power (MSc @ CUHK)
Education: teaching people with the data
Transcriptomes assembled & annotated by MSc students
Looked at GO/KEGG & TCM compounds
Looked at parental links (diversity, maternal/paternal)
B. purpurea = mother, B. variegata = father
61. How far can we open science?
Teaching genomics to 7 year olds
62. Open Science = Science
• Science needed more than ever to tackle grave
environmental & public health challenges
• Need to escape from our ivory towers to regain
trust
• Take science back to standing on the shoulders of
giants, rather than unFAIR practices
• Choose evidence not branding
• Once we have Open Data, we then need FAIR data
stewardship
• New EU funder rules on open science/OA coming –
preempt FAIR assessment
63. www.gigasciencejournal.com
Give us your data, papers & pipelines
Help GigaPanda make it happen!
Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com
64. Thanks to:
Laurie Goodman, Editor in Chief
Nicole Nogoy, Editor
Hans Zauner, Assistant Editor
Hongling Zhao, Assistant Editor
Peter Li, Lead Data Manager
Chris Hunter, Lead BioCurator
Chris Armit, Data Scientist
Mary Ann Tulli, Data Editor
Xiao (Jesse) Si Zhe, Database Developer
Chen Qi, Shenzhen Office.
Follow us:
@GigaScience
facebook.com/GigaScience
http://gigasciencejournal.com/blog/
+ Weibo & WeChat
www.gigasciencejournal.com
www.gigadb.org