
Better Data for a Better World

How open data contribute to improving the world. The life science use case. The technical, social, ethical issues.

This talk was given as part of the iGEM 2020 programme of the Imperial College London student team (https://2020.igem.org/Team:Imperial_College), in a webinar organised by the SOAPLab group on the topic of Ethics of Automation. The excellent Dr Brandon Sepulvado was the other speaker of the day.

Published in: Science

  1. 1. Marco Brandizi <marco.brandizi@rothamsted.ac.uk> Oct 16th - iGEM 2020 webinar Better Data for a Better World Find this presentation on SlideShare Background source: https://pxhere.com/en/photo/857152
  2. 2. Hello! • Geek since 1980s and C=64 times • Started working with Life Science Data 2003 • at Univ. of Milano-Bicocca, EMBL-EBI • and now Rothamsted Research • Meanwhile, (h)activism in open source, open data
  3. 3. A Long History: Mankind and Data • Gather knowledge • Know how things work, make predictions • Improve our lives • (in addition to being good in itself) Egypt, 2500BC (https://brewminate.com/census-taking-in-the-ancient-world/)
  4. 4. In the past 20yrs or so Economist, 2010 (https://www.economist.com/node/21521548)
  5. 5. Why and How? In the past 20yrs or so
  6. 6. We advanced in • Gathering (eg, smartphones, IoT, 5G) • Storing (eg, clouds) • Processing (eg, AI, Machine Learning) • Sharing (eg, web, standards, data portals) • Searching (eg, NoSQL, Indexing) • Visualising (eg, literature on HCI, data charts)... ...Data, Information, Knowledge and establish virtuous circles (Image: Precision Farming, source: http://ieeeagra.com/ieeeagra/Downloads/20141204-Fernandez-Presentation.pdf)
  7. 7. (background: https://www.flickr.com/photos/kevinmgill/14676390490/in/photostream/) A World of Openness
  8. 8. The Cause for Open Data/Knowledge • Data portals, policies, standards • https://www.data.gov/, https://data.gov.uk/ • https://www.europeandataportal.eu/en • https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information • https://joinup.ec.europa.eu/ • In science • https://fairsharing.org/ • https://www.nature.com/sdata/ • Data and activism • DBPedia, aka Wikipedia as data (https://wiki.dbpedia.org/about) • Wikidata (https://www.wikidata.org/) • Open Street Map (https://www.openstreetmap.org/about)
  9. 9. Open Data Cause: The Life Science Use Case https://evaprofecmc.jimdofree.com/unit-4-the-genetic-revolution/2-2-chromosomes-and-genes/
  10. 10. So, sequencing was (is) pretty much important... Source: https://boydfuturist.wordpress.com/tag/human-genome-project/ (also an interesting reading)
  11. 11. ...indeed • The race to sequence the human genome https://www.youtube.com/watch?v=AhsIF-cmoQQ • The Human Genome Project Race https://genomics-old.soe.ucsc.edu/research/hgp_race • How to sequence human genome https://www.youtube.com/watch?v=MvuYATh7Y74 Recommended:
  12. 12. Fast-forward to nowadays
  13. 13. Which integrates with a wealth of (open) data
  14. 14. And allows for Reuse and further Advancements
  15. 15. The Cause for Open Data • Allows for reuse • no need to regenerate • less expensive • Allows for integration between heterogeneous data • different entities (genes, proteins, chemistry, species, literature...) • different scales (cells, organs, individuals, populations) • New discoveries, novel uses • Reproducible science • and quality improvement Practical Reasons
  16. 16. The Cause for Open Data • Public-funded data are ours • Savings opportunities add up • (but giving them out for free has a cost) • Data are ours anyway (eg, genetic data) • Transparency (and again, reproducibility) • Public benefits outweigh private interests Ethical Reasons
  17. 17. But, how? Based on publications, which genes are related to yellow rust? In which biological processes are their encoded proteins involved?
  18. 18. Good Data Principles: Interoperability through Standards https://tinyurl.com/y5e6kfa2 https://doi.org/10.1186/s41074-019-0055-1 https://tinyurl.com/y3h9c65k https://tinyurl.com/y2wzlwbk
  19. 19. Data Standards: schema.org example https://www.bbcgoodfood.com/recipes/classic-potato-salad Source & recommended read: https://www.slideshare.net/NiallBeard/bioschemas-workshop
  20. 20. schema.org used for Knetminer and Agrifood Data github.com/Rothamsted/agri-schemas https://tinyurl.com/y44a5lj9
  21. 21. References • Brandizi et al, 2018, https://europepmc.org/article/med/30085931 • IB2018 presentation https://tinyurl.com/yaq8nt5e • AgriSchemas and data standards, IB 2019 • Reusing Knetminer data with Python/Jupyter • https://tinyurl.com/yyhnkuyk • https://tinyurl.com/y446y979
  22. 22. Good Data Principles: FAIR • Findable • eg, give your dataset a DOI, which resolves to a schema.org descriptor, and register it on datasetsearch.research.google.com • Accessible • eg, a resolvable DOI makes it accessible. Wrap with access control as needed • Interoperable • eg, data described with schema.org, GO and other OBO ontologies • Query protocols/standards (eg, SPARQL, GraphQL APIs, JSON Schema APIs, JSON-LD APIs) • Reusable • Clear licence • Ideally, machine-readable licence (eg, CCREL) Source and recommended read: https://tinyurl.com/yxocd3b9
  23. 23. Issues: Easier to Say than to Do https://tinyurl.com/yxsftwvy https://xkcd.com/927/
  24. 24. Issues: Common Good vs Private Interests • ...Parts of the standard that are not priorities for Google are not well documented anywhere. If they are priorities for Google, however, Google itself provides excellent documentation about how information should be specified in schema.org so that Google can use it. Because schema.org’s documentation is poor, the focus of attention stays on Google. Time to end Google’s domination of schema.org, https://tinyurl.com/y6j7ke8u • Not everyone wants data published, eg, failed clinical trials • Balance needed between research needs and private lives, eg, • The Immortal Life of Henrietta Lacks, Rebecca Skloot • k-anonymity, mediation approaches (Brandizi et al, 2017, https://doi.org/10.1186/s12911-017-0424-6)
  25. 25. Issues: Data are Power http://www.tylervigen.com/spurious-correlations
  26. 26. Issues: Data are Power • My son was a typically developing toddler. ... He received his first MMR at 19 months of age. The change in him was almost immediate. He did not regress in development, but his social skills became extremely compromised. Noises became unbearable... MMR vaccine caused my son's autism, https://tinyurl.com/y2udlfcb It's sad, but it's a spurious correlation: vaccines do not cause autism
  27. 27. Issues: are We in Control? https://www.nature.com/articles/d41586-020-01874-9 https://tinyurl.com/yxay8w2j https://www.bbc.com/news/business-42959755 https://tinyurl.com/ydykjugt https://tinyurl.com/hu3lh32
  28. 28. And Which Control? https://tinyurl.com/y2yjrkpa https://tinyurl.com/y82zf8qu https://www.youtube.com/watch?v=ciBLsJkQ1WY
  29. 29. So... • Future is even more digital • And even more data-intensive • Everyone should at least have an idea • Especially if you want to become a scientist • About producing data (eg, FAIR, formats, standards) • And consuming data (eg, data resources, Graph DB query languages) • And more (eg, Python, Pandas, Graph DBs, APIs) https://tinyurl.com/y5rdq7qx
  30. 30. So... • Probably we need better management and (a bit of, international) regulation • of technical aspects (eg, PA standards, research data publishing) • of ethical aspects (eg, open access, algorithms, censorships) • But also more grassroots participation • we are all responsible, especially as scientists • Data science is cool! https://tinyurl.com/y5rdq7qx
  31. 31. Acknowledgements Ajit Singh
 Software Engineer • Joseph Hearnshaw, software engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, master Student, Data Curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers Keywan Hassani-Pak
 Knetminer Team Leader Chris Rawlings
 Head of Computational & Analytical Sciences Jeremy Parsons
 Bioinformatics Scientist
  32. 32. Acknowledgements Ajit Singh
 Software Engineer • Joseph Hearnshaw, software engineer • Samiul Haque, Ed Eyles, IT admins • Alice Minotto, Earlham Inst, hosting providers • William Brown, Ricardo Gregorio, IT admins • Monika Mistry, master Student, Data Curator • Sandeep Amberkar, bioinformatician, data curator • Madhu Donepudi, Richard Holland, ext contractors, developers Keywan Hassani-Pak
 KnetMiner Team Leader Chris Rawlings
 Head of Computational & Analytical Sciences Jeremy Parsons
 Bioinformatics Scientist And You!
  33. 33. Extras
  34. 34. The Cause for Open Data/Knowledge • Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (https://en.wikipedia.org/wiki/Open_data) • Popularised by Obama in 2009 [1], Hans Rosling [3], Tim Berners Lee [2] (recommended readings/watches) • [1] https://www.govtech.com/data/What-Obama-Did-for-Tech-Transparency-and-Open-Data.html • [2] https://www.ted.com/talks/tim_berners_lee_the_next_web?language=en • [3] https://www.ted.com/talks/hans_rosling_the_best_stats_you_ve_ever_seen
  35. 35. IBM Watson • Not the first time that an AI beat top human players (eg, Deep Blue at chess: first game win against Kasparov in 1996, match win in 1997) • But big milestone (in 2011) about knowledge management • Specialisations possible, eg, IBM Watson Health Mini documentary at https://www.youtube.com/watch?v=P18EdAKuC1U
  36. 36. Surprising Data Insights • Couples who argue often are more likely to last long (90% accuracy) • If you want such a life... • Many other examples of surprising data: 9 Bizarre and Surprising Insights from Data Science (https://tinyurl.com/yywgr2rv) https://www.businessinsider.com/mathematical-secret-to-lasting-relationships-2015-6
  37. 37. Issues: Data are Power Source and recommended read: https://theconversation.com/five-maps-that-will-change-how-you-see-the-world-74967
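
The FAIR slide (22) suggests giving a dataset a DOI that resolves to a schema.org descriptor. A minimal sketch of what such a descriptor could look like is below, built with Python's standard library only. The dataset name, description, DOI and keywords are hypothetical placeholders, and the function name `dataset_descriptor` is an assumption made for illustration; it is not taken from the talk or from any Knetminer/AgriSchemas code.

```python
import json


def dataset_descriptor(name, description, doi, license_url, keywords):
    """Build a schema.org Dataset descriptor as a JSON-LD-ready dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A resolvable DOI covers the Findable and Accessible points
        "identifier": f"https://doi.org/{doi}",
        # A licence URL is a simple form of machine-readable licence
        "license": license_url,
        "keywords": list(keywords),
    }


# All values below are hypothetical placeholders
descriptor = dataset_descriptor(
    name="Example wheat gene annotations",
    description="Placeholder dataset used to illustrate a FAIR descriptor.",
    doi="10.1234/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["wheat", "yellow rust", "gene annotation"],
)

# Serialise to JSON-LD text, eg, to embed in the dataset's landing page
jsonld = json.dumps(descriptor, indent=2)
print(jsonld)
```

Embedding this kind of JSON-LD in a dataset landing page is one common way to make it discoverable by dataset search engines; the exact set of properties worth filling in depends on the target service.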

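
Slide 24 mentions k-anonymity as one way to balance research needs and private lives. As a rough illustration of the idea (not the method used in the cited paper), the sketch below computes the k of a small table: the size of the smallest group of records that share the same quasi-identifier values. The records, the column names and the function `k_anonymity` are all hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records sharing the same
    values for the given quasi-identifier columns (0 for an empty table)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalised records (age bands, postcode areas)
records = [
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_band": "30-39", "postcode_area": "AL5", "diagnosis": "B"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "A"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "C"},
    {"age_band": "40-49", "postcode_area": "AL5", "diagnosis": "A"},
]

k = k_anonymity(records, ["age_band", "postcode_area"])
print(k)  # 2: every person is indistinguishable from at least one other
```

A table is usually considered k-anonymous for a chosen threshold when this value is at least that threshold; reaching it typically requires generalising or suppressing quasi-identifier values before release.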