The Seven Deadly Sins of Bioinformatics Professor Carole Goble The University of Manchester, UK The myGrid project OMII-UK
Roadmap
Intractable Problems in Bioinformatics. Have we sinned? Are these part of the intractable problem?
The traditional sins…. http://en.wikipedia.org/wiki/Seven_deadly_sins [Stevens and Lord]
Methodology
I am grateful to…
They came up with more than seven. But I beat them into submission. Many are highly inter-related. Hopefully they are all too familiar.
Sins
Sin 1
Reinvention
Comparative Genomics? Tisk! It's Comparative Bioinformatics. Bioinformatics is about mapping one schema to another, one format to another, one id scheme to another. What a waste of time. What a handy distraction from doing some Real Science™.
Names and Identity Crisis Q92983 O00275 O00276 O00277 O00278 O00279 O00280 O14865 O14866 P78507 Q93038 = Tumor necrosis factor receptor superfamily member 25 precursor P78515 Q93036 Q93037 Q99722 Q99830 Q99831 Q9BY86 Q9UME0 Q9UME1 Q9UME5 Annotation history: http://www.expasy.org/uniprot/Q93038
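The identity problem on this slide can be illustrated with a small lookup. The sketch below assumes, purely for illustration, that the accessions listed around Q93038 are alternative identifiers for the same entry; the names `PRIMARY`, `ALIASES`, `canonical_accession` and `same_entry` are hypothetical and are not part of any UniProt tooling.

```python
# Illustrative sketch only. It ASSUMES the accessions shown on the slide
# are aliases of Q93038; that relationship is not verified here.
PRIMARY = "Q93038"

ALIASES = {
    "Q92983", "O00275", "O00276", "O00277", "O00278", "O00279", "O00280",
    "O14865", "O14866", "P78507", "P78515", "Q93036", "Q93037", "Q99722",
    "Q99830", "Q99831", "Q9BY86", "Q9UME0", "Q9UME1", "Q9UME5",
}


def canonical_accession(accession: str) -> str:
    """Return the primary accession for a known alias, else the cleaned input."""
    acc = accession.strip().upper()
    if acc == PRIMARY or acc in ALIASES:
        return PRIMARY
    return acc


def same_entry(a: str, b: str) -> bool:
    """True when both accessions resolve to the same canonical accession."""
    return canonical_accession(a) == canonical_accession(b)


if __name__ == "__main__":
    print(canonical_accession("o00275"))
    print(same_entry("P78515", "Q9UME5"))
```

Every tool that joins data on accessions needs some table like `ALIASES`, which is one way of reading the slide's complaint: the mapping work is repeated in each project rather than shared.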
Andy Law's Third Law http://bioinformatics.roslin.ac.uk/lawslaws.html
The Selfish Scientist
Some causes of the Identity Crisis [Pocock]
Id Reinvention urn:lsid:uniprot.org:{db}:{id} http://purl.uniprot.org/{db}/{id}
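The two identifier templates on this slide carry the same `{db}` and `{id}` parts. As a minimal sketch, assuming the two forms are meant to be interchangeable, a converter between them could look like the following. The function names and the example `db` value `uniprot` are assumptions made for illustration.

```python
import re

# Patterns follow the two templates shown on the slide:
#   urn:lsid:uniprot.org:{db}:{id}
#   http://purl.uniprot.org/{db}/{id}
LSID_PATTERN = re.compile(r"^urn:lsid:uniprot\.org:([^:]+):([^:]+)$")
PURL_PATTERN = re.compile(r"^http://purl\.uniprot\.org/([^/]+)/([^/]+)$")


def lsid_to_purl(lsid: str) -> str:
    """Rewrite a uniprot.org LSID into the PURL form (hypothetical helper)."""
    match = LSID_PATTERN.match(lsid)
    if match is None:
        raise ValueError(f"not a uniprot.org LSID: {lsid!r}")
    db, identifier = match.groups()
    return f"http://purl.uniprot.org/{db}/{identifier}"


def purl_to_lsid(url: str) -> str:
    """Rewrite a purl.uniprot.org URL into the LSID form (hypothetical helper)."""
    match = PURL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a purl.uniprot.org URL: {url!r}")
    db, identifier = match.groups()
    return f"urn:lsid:uniprot.org:{db}:{identifier}"


if __name__ == "__main__":
    # "uniprot" as the db value is an illustrative assumption.
    print(lsid_to_purl("urn:lsid:uniprot.org:uniprot:Q93038"))
```

The conversion is purely syntactic, which suggests how little separates the two schemes beyond the choice of resolver.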
Andy Law’s First (Format) Law http://bioinformatics.roslin.ac.uk/lawslaws.html
Reinvention of Ontology tools The Montagues and The Capulets.. Let me get my bullet-proof vest …
The “Oh No” OBO Pragmatists Aesthetics Philosophers Life  Scientists Capulets Knowledge Representation Montagues A means to an end Content providers Theoreticians The end Mechanism providers Spiritual guides The Montagues and The Capulets …SOFG 2004, KCap 2005, Comparative and Functional Genomics  2004 Endurants, Perdurants, Being, Substance, Event
Yet another database … FlyBase, WormBase, SGD, BeeBase and many other large and small community databases
BioBabel
Integration
Any more?
Reuse Rocks. Collaboration through workflow and web services
Recycling, Reuse, Repurposing
Warning! Reuse is Hard
Bullying and the Borg
Reinvention or Invention? Pre-dating
A few months in the laboratory (or the computer) can save a few hours in the library (or on Google). Westheimer's Law (with additions).
No tool is an island…
I know what it means... “AI limericks” by Henry Kautz http://www.cs.washington.edu/homes/kautz/misc/limericks.html
Not just bioinformatics  Computer Science is Guilty!
Why don’t biologists modularise OWL ontologies properly? Er, well, like how should we do it “properly” and where are the tools to help us? We don’t know and we haven’t got any. But here are some vague guidelines.  W3C Semantic Web for Life Sciences mailing list, 2005
“I don't blame them [MGED/PSI community] because to truly comprehend RDF/OWL is not an easy task, it takes not just the understand of technology itself but more so the vision on how things should and can work in SW.” “One thing we have to remember is that biologists are building ontologies to do a job of work. They are not produced as some end of CS or SW research” “Principles are all well and good, but we should know from decades of software engineering that saying "do it properly" isn't a solution. We need tooling and methodologies that do not in themselves hinder a domain specialist. In many cases it is easier to re-develop than re-use or even cut-and-paste from an existing ontology than it is to muck around “doing it properly”” “There is actually a gap between the view of ontology for CS people and for biological people. The ontology in biologist's eyes are more of a treaty than logical representation, that in CS view is on the reverse of that view. It needs dialog to bring the view to a middle ground and mechanisms to stretch to both directions.”
Standards are boring (but important)
Self promotion Not all software and databases are equal.
Research – Production Confusion
Trust I don’t trust your code I don’t trust your data I don’t trust you will still be around in 1 year
Sin 2
Biologist exceptionalism I’m different. We are all individuals.
Biological exceptionalism
We are so much more complex…
Other Sciences….
Biology Exceptionalism
Sin 3
Autonomy is death!
Lincoln Stein said a while ago… … and he could say it again today.
Law's Second Law
Workflow commodities
The myGrid Semantic Sweatshop
The myGrid Semantic Sweatshop: notice how tired they look. Franck Tanoh, Katy Wolstencroft
Churn, Churn, Churn
Churn, Churn, Churn
Sin 4
I know it all. And what would you suggest, Mr. Smartie Pants?
Think like me! Misunderstanding and disrespecting users
A good User Experience outweighs smart features. Can I use it?  Is the user interface familiar? Does it fit with my needs?
Gain-Pain pay-off Gain Pain Very BAD Good, but Unlikely Just right
Sin 5
More, more, more! [Cameron]
The trouble with warehouses
More More More
Mash-Up Data Marshalling Mash Up Application User interface Protocol objects Protocol Protocol
Distributed Annotation System Mash-Up  http://www.biodas.org Reference Server AC003027 AC005122 M10154 Annotation Server Annotation Server AC003027 M10154 WI1029 AFM820 AFM1126 WI443 AC005122 Annotation Server
Sin 6
Ennui
It's black and white
Quality Delusions
Quality Delusions
Black Box Science
“No experiment is reproducible.” Wyszowski's Law. “An experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn
Sin 7 www.CartoonStock.com
Hackery
“I am sure one could reuse large parts of re-annotation for building transcriptome maps, if they only used workflows and ontologies”. Marco Roos, A Biologist and Bioinformatician, VL-e Project, Amsterdam
“Bioinformaticians have reached the standards of the 1980s, while computer scientists are working on the standards of the 2020s, leaving roughly 40 years to bridge.” Marco Roos, A Biologist and Bioinformatician, VL-e Project, Amsterdam
Blind faith in XML
Blind Faith in Foo.
Pioneering development methods
Open Source Blinkers
Sin Summary Maybe only one “original sin” in bioinformatics. Parochialism and Insularity Exceptionalism Autonomy or death! Vanity: Pride and Narcissism Monolith Megalomania Scientific method Sloth Instant Gratification Reinvention Churn
Can we become less sinful? Why do these sins exist? Are bioinformaticians particularly naughty? No naughtier than Computer Scientists. And it's all very hard. Though they are naughty…
Why?
Luddism? Surely not! [Stevens]
Research – Production Confusion
Practical Steps?
FaceBook & Bazaar for  Workflow e-Scientists myexperiment.org Trials start  August 2007!
Delivery Bulge
Practical Steps for IT Platforms?
Practical Steps?
Web 2.0 Design Patterns 26/2/2007 | myExperiment
Practical Steps?
The Final Word Sin writes histories, goodness is silent.     Thomas Fuller

Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliRewAs ALI
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Modelssonalikaur4
 
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Miss joya
 

Recently uploaded (20)

Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
 
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
Russian Call Girls Chickpet - 7001305949 Booking and charges genuine rate for...
 
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowKolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Kolkata Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024
 
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
Housewife Call Girls Bangalore - Call 7001305949 Rs-3500 with A/C Room Cash o...
 
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hosur Just Call 7001305949 Top Class Call Girl Service Available
 
Glomerular Filtration and determinants of glomerular filtration .pptx
Glomerular Filtration and  determinants of glomerular filtration .pptxGlomerular Filtration and  determinants of glomerular filtration .pptx
Glomerular Filtration and determinants of glomerular filtration .pptx
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCREscort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
Escort Service Call Girls In Sarita Vihar,, 99530°56974 Delhi NCR
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
 
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment BookingHousewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
Housewife Call Girls Hoskote | 7001305949 At Low Cost Cash Payment Booking
 
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service MumbaiLow Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
Low Rate Call Girls Mumbai Suman 9910780858 Independent Escort Service Mumbai
 
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Hebbal Just Call 7001305949 Top Class Call Girl Service Available
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
Call Girls Service in Bommanahalli - 7001305949 with real photos and phone nu...
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment BookingCall Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
Call Girls Service Nandiambakkam | 7001305949 At Low Cost Cash Payment Booking
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas Ali
 
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking ModelsMumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
Mumbai Call Girls Service 9910780858 Real Russian Girls Looking Models
 
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
Russian Call Girls in Pune Riya 9907093804 Short 1500 Night 6000 Best call gi...
 

The Seven Deadly Sins of Bioinformatics

  • 1. The Seven Deadly Sins of Bioinformatics Professor Carole Goble [email_address] The University of Manchester, UK The myGrid project OMII-UK
  • 2.
  • 3. Intractable Problems in Bioinformatics. Have we sinned? Are these part of the intractable problem?
  • 4.
  • 5.
  • 6.
  • 7. They came up with more than seven. But I beat them into submission. Many are highly inter-related. Hopefully they are all too familiar.
  • 8.
  • 9.
  • 10.
  • 11. Comparative Genomics? Tsk! It's Comparative Bioinformatics. Bioinformatics is about mapping one schema to another, one format to another, one id scheme to another. What a waste of time. What a handy distraction from doing some Real Science™.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. The “Oh No” OBO Pragmatists Aesthetics Philosophers Life Scientists Capulets Knowledge Representation Montagues A means to an end Content providers Theoreticians The end Mechanism providers Spiritual guides The Montagues and The Capulets …SOFG 2004, KCap 2005, Comparative and Functional Genomics 2004 Endurants, Perdurants, Being, Substance, Event
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. A few months in the laboratory (or the computer) can save a few hours in the library (or on Google). Westheimer's Law (with additions).
  • 32.
  • 33.
  • 34. Not just bioinformatics Computer Science is Guilty!
  • 35. Why don’t biologists modularise OWL ontologies properly? Er, well, like how should we do it “properly” and where are the tools to help us? We don’t know and we haven’t got any. But here are some vague guidelines. W3C Semantic Web for Life Sciences mailing list, 2005
  • 36. “I don't blame them [MGED/PSI community] because to truly comprehend RDF/OWL is not an easy task, it takes not just the understand of technology itself but more so the vision on how things should and can work in SW.” “One thing we have to remember is that biologists are building ontologies to do a job of work. They are not produced as some end of CS or SW research.” “Principles are all well and good, but we should know from decades of software engineering that saying "do it properly" isn't a solution. We need tooling and methodologies that do not in themselves hinder a domain specialist. In many cases it is easier to re-develop than re-use or even cut-and-paste from an existing ontology than it is to muck around "doing it properly".” “There is actually a gap between the view of ontology for CS people and for biological people. The ontology in biologist's eyes are more of a treaty than logical representation, that in CS view is on the reverse of that view. It needs dialog to bring the view to a middle ground and mechanisms to stretch to both directions.”
  • 37.
  • 38.
  • 39.
  • 40. Trust I don’t trust your code I don’t trust your data I don’t trust you will still be around in 1 year
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53. The myGrid Semantic Sweatshop notice how tired they look Franck Tanoh Katy Wolstencroft
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59. A good User Experience outweighs smart features. Can I use it? Is the user interface familiar? Does it fit with my needs?
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66. Distributed Annotation System Mash-Up http://www.biodas.org Reference Server AC003027 AC005122 M10154 Annotation Server Annotation Server AC003027 M10154 WI1029 AFM820 AFM1126 WI443 AC005122 Annotation Server
  • 67.
  • 68.
  • 69.
  • 70.
  • 71.
  • 72.
  • 73. “No experiment is reproducible.” Wyszowski's Law. “An experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn
  • 74.
  • 75.
  • 76. “I am sure one could reuse large parts of re-annotation for building transcriptome maps, if they only used workflows and ontologies.” Marco Roos, a Biologist and Bioinformatician, VL-e Project, Amsterdam
  • 77. “Bioinformaticians have reached the standards of the 1980s, while computer scientists are working on the standards of the 2020s, leaving roughly 40 years to bridge.” Marco Roos, a Biologist and Bioinformatician, VL-e Project, Amsterdam
  • 78.
  • 79.
  • 80.
  • 81.
  • 82. Sin Summary Maybe only one “original sin” in bioinformatics. Parochialism and Insularity Exceptionalism Autonomy or death! Vanity: Pride and Narcissism Monolith Megalomania Scientific method Sloth Instant Gratification Reinvention Churn
  • 83. Can we become less sinful? Why do these sins exist? Are bioinformaticians particularly naughty? No naughtier than Computer Scientists. And it's all very hard. Though they are naughty…
  • 84.
  • 85.
  • 86.
  • 87.
  • 88. FaceBook & Bazaar for Workflow e-Scientists myexperiment.org Trials start August 2007!
  • 90.
  • 91.
  • 92.
  • 93.
  • 94. The Final Word Sin writes histories, goodness is silent.   Thomas Fuller

Editor's Notes

  1. Ide
  2. Identity Stability Social Technical
  3. Not sure these all apply So we asked some people
  4. An impression from all our panelists from all the papers and application notes they have rejected … Pride! and Sloth? Envy? Insularity. Even though it means more work in the end. 1. creating yet another identity scheme (identity crisis) 2. creating yet another representation mechanism for data (profusion of file formats) 30 different syntaxes for representing DNA / RNA and protein sequences
  5. How can the semantic web help? numerous identity schemes for identifying proteins, metabolites, genes etc, do we really need any more?
  6. Competitive advantage. VO forming; sharing e-Science ideals. May refusing to move data off her disk and copyrighting her workflows. Collaborate when it is necessary in order to gain … competitive advantage. Sharing on HER terms: May's workflows. Scientists share because: they are compelled to (funding agencies, economies of scale, projects, the nature of the problem, it is the nature of the community); it is in their best interest; there are rewards.
  7. W3C Semantic Web Health Care and Life Sciences Interest Group identity wars: Life Science Identifier vs URLs vs PURLs, Web Services vs REST services.
  8. You could argue that OBO-edit is reinventing Protege badly. But make sure you are wearing your bullet proof vest. Some people have argued that LSID reinvents HTTP and DNS badly. "Data Warehouse? More like Data Mortuary." Anon. You can quote Usama Fayyad from Yahoo! Research! Laboratories! on what they call "Data Tombs": "Our ability to capture and store data far outpaces our ability to process and exploit it. This growing challenge has produced a phenomenon we call the data tombs, or data stores that are effectively write-only; data is deposited to merely rest in peace, since in all likelihood it will never be accessed again. Data tombs also represent missed opportunities." See Communications of the ACM: http://portal.acm.org/citation.cfm?doid=545151.545174 Still with sin 1: EMBOSS lists more than 20 DIFFERENT SEQUENCE FORMATS !!! at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
  9. GMOD is a collection of software tools for creating and managing genome-scale biological databases. You can use it to create a small laboratory database of genome annotations, or a large web-accessible community database. GMOD tools are in use at FlyBase, WormBase, SGD, BeeBase and many other large and small community databases.
  10. Or multiple seq
  11. Picture of workflow
  12. Come to think of it, I am quite sure many people reinvent wheels in creating 'Transcriptional Units' ('genes' derived from ESTs and mRNA), within species, but certainly between species. I think this holds for many genome assembly related stuff: I also doubt whether genome data compilers for E. coli, Drosophila, Plant species, etcetera reuse each other's code. In most cases something new is added, but large parts could have been reused. I should look at some bioinformatics publications for more examples, but also have to prepare our own ISMB demonstration. Why can't time be reinvented? And better this time! To give a recent counter example of our own: text miners generally require synonyms and probably reinvent the wheel to get them in many cases. We recently reached 'instant collaboration' with Martijn Schuemie from Rotterdam through a web service that discloses their protein synonym data. He made that especially after seeing our poster that showed a workflow with our web services: 'collaboration through workflow'. Within VL-e we are now even exchanging services and (sub)workflows with food scientists. Web services make that very easy, although I see that creating web services is still a bottleneck. For quick solutions it is still seen as too much extra trouble. We intend to make Martijn's service part of our ISMB demonstration (on Tuesday 24, after you left  :'( ). Tomorrow I may come up with more when I have a look at your presentation (and find the time for it). Troubles with broken networks at home and at my provider (what are the odds?  :'(   ) prevent me from doing that now (I hope this e-mail goes anywhere).
  14. Confirmed by the biologists. Worm Lady's name is Joanne Pennock and as far as I know she works for Prof. Richard K. Grencis. Description: Trichuris muris, the mouse whipworm, is a useful parasite model of the human parasite Trichuris trichiura. Whipworms derive their name from their characteristic morphology. Adults occupy the large intestine with their anterior ends embedded in the cells lining the intestine. Transmission occurs by ingestion of contaminated material. Jo didn't know about the tools; she didn't know how to do it properly. REUSE. Identified sex-dependent biological pathways involved in the mouse model. The correlation of sex dependence and the ability of mice to expel the parasite had previously been hypothesised; however, it had not been verified using conventional manual analysis techniques.
  15. A kind of exceptionalism and reinvention?
  16. Quicker to build it than find it? Quicker to build it than adapt or reuse something else? – designing reusable stuff is HARD.
  17. Interfaces to things
  18. Yeah? Semantics and formalisms matter 11,800
  19. Modularisation is important. The recent exchange on the SWLS email list was great. "Why don't biologists do it properly?" "They don't do it properly because SW people don't know how to do it properly either. Also you don't give us much in the way of tools...." This was all about modularising OWL ontologies: we don't know the semantics; there are no tools; and all that was on offer were some vague guidelines and the injunction to do it properly. "There are no proper ontologies in biology"; that is, you don't make any that use all the features of OWL we've invented.... It is all summed up by observing that the agendas of SW technologists and biologists are not the same. SW, at most, is only a means to an end for biologists, but an end in itself for SW techies.
  20. One-off, roll your owns. Nature contacted 89 databases listed in the Molecular Biology Database Collection (Nucl. Acids Res. 28, 1-7; 2000) to see how many still have funding five years on. Of these, 51 reported that they are struggling financially. Seven of these have closed; the rest are being updated sporadically in their owners' spare time. (Zeeya Merali and Jim Giles, Nature 435, 1010-1011 (23 June 2005) doi: 10.1038/4351010a) Publication and career driven: easier to get a paper or a promotion by building your own thing. We are to blame too!
  21. Oh, the only other thing is that I think some of the sins are caused when research outputs are confused with production products. You require standards in the latter. You require bushiness in the former. However, neither the funding nor the social structures of bioinformatics allow us to treat these two differently in any principled manner. After all, how do you get funding for production sw other than claiming to be researching stuff? How do you get a publication out of a bit of research sw without claiming a potential user-base?
  22. Added after the talk.
  23. A cause of: don't be deflected by the edge cases to over-complicate the world. Computer systems are too complicated; fight it. Information resources are worse. He who pays the piper establishes a committee to call the tune. Nucleic acid sequences provide the fundamental starting point for describing and understanding the structure, function, and development of genetically diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their inception used tables of sites and features to describe the roles and locations of higher order sequence domains and elements within the genome of an organism. In February 1986, GenBank and EMBL began a collaborative effort (joined by DDBJ in 1987) to devise a common feature table format and common standards for annotation practice. Overview of the Feature Table format: The overall goal of the feature table design is to provide an extensive vocabulary for describing features in a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three databases to exchange data on a daily basis. The range of features to be represented is diverse, including regions which: perform a biological function; affect or are the result of the expression of a biological function; interact with other molecules; affect replication of a sequence; affect or are the result of recombination of different sequences; are a recognizable repeated unit; have secondary or tertiary structure; exhibit variation, or have been revised or corrected.
  24. It would be better if I wrote the script I need, so I know what it does, how it does it and how to modify it later, because I haven't specified what it was supposed to do in the first place.
  26. This is linked to pride
  27. When Ensembl was getting going, they had the CERN people over to talk about managing schema change over time. CERN showed some really nice UML meta-modeling stuff that allows them to migrate models over time without losing data. Ewan sent them back to Europe because genes can have more than one transcript which can in turn re-use exons (in the Ensembl data model). The CERN people couldn't see how that was relevant to managing changing data models, but Ewan kept saying "Our data models are complicated - I don't think specifying them will help. We need to understand them instead." Of course, this was a few years ago and my memory is a little hazy.
  29. Autonomy and death: Biojava suffered from this over the first 2 releases. We hadn't worked out how to provide stable interfaces to unstable implementations back then, so each minor release tended to break end-user code. And they
  30. Do you understand crimap’s error messages?
  31. Scientist perspective for finding. Machinery perspective for validation. Readable and processable in OWL and RDF.
  32. The Ensembl relational schema alters regularly. Often, it's because they are 'fixing' column naming that wasn't done according to their standards in the first place. Sometimes it is to add/remove fields. Since the Perl API sits directly on this, usually the APIs change to track. It may be different now, but they didn't use to provide any backwards compatibility glue. http://www.purl.org/ As an example for your 'Churn' slide: when I look for web services with Google I find mostly pages /about/ web services and how things should be approached, rather than actual web services (things are different when you include filetype:wsdl). Another example may be related to the recent URI discussion on HCLS (that I didn't read yet): I think what Andy and I have been doing with upper ontologies is quite relevant, but I feel we are still in the middle of gaining experience with what is available. W3C Semantic Web Health Care and Life Sciences Interest Group identity wars: Life Science Identifier vs URLs vs PURLs, Web Services vs REST services. Impact on everyone else who uses the previous mechanism. A few voices, very loud, vested interest, for their application, win. You know what? Why don't we stick with something for a while and rally behind it? Or at least figure out the cost of change. Join the debate.
  34. Picture.
  35. Thinking you are the user. Suits me.
  36. Added after the talk.
  37. Added after the talk in response to discussions.
  38. Find the natural lines of cleavage which minimise the number of “connections” Standardise the connections Under More, More, More, you may want to also mention end-user apps/libraries that try to be the 'emax' of bioinformatics. Not so much of a thing now, but there was a phaze of providing bioinformatics workbenches that had loads of crap bundled in, none of it kept up to date, none of it propperly integrated.
  39. Nobody uses my warehouse. http://research.microsoft.com/towards2020science/ You can quote Usama Fayyad from Yahoo! Research! Laboratories! on what they call "Data Tombs". See Communications of the ACM: http://portal.acm.org/citation.cfm?doid=545151.545174
  40. No clue of testing during software development. Differentially expressed genes in microarray analyses. Protein identifications using Mascot scores. There's another one like this - if a group is working in a field, you get shouted at for trying out something different - esp. happens around anything that covers the same space as the OBO crowd. Often, you are actually doing something different, but because you use some words in common... Comes out as "Why do this? It's already been solved by Foo - the massively unwieldy, slow-moving, monolithic, meeting-paralysed international effort for Things Mentioning Foo"
  41. (translated EMBL) Let's fix the quality.
  43. UniGene is a good example of irreproducibility I think; at least it was a short two years ago when I looked into it. I asked the creators for a model or flow-chart to learn exactly what is happening during UniGene clustering, but they couldn't give me one. It doesn't seem to exist. 'Human' descriptions of what is done are available (via NCBI), but these are not exact. I was involved in a project that basically reclustered UniGene (leading to the Human Transcriptome Map), and I know many microarray analysts put a lot of effort into re-annotating their clones using genome databases. (Btw I am sure one could reuse large parts of re-annotation for building transcriptome maps, if they only used workflows and ontologies.) Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location.
  44. --
  45. All kinds of hackery. Instant gratification.
  46. Blind faith in ...: I've seen this with nearly every technology going. There's a new thing to use, we don't understand it yet, so it sucks up all the stuff we already know we don't understand leaving us with a system either side of it free from problems. Lack of appreciation about exactly what the new tech addresses *in itself* before trying to make it work *for us* .
  47. Conflicts with reinventing.
  48. There is hacking and HACKING
  49. Immaturity Build then think. Understanding the problem. But you never will.
  50. A sin set
  51. Why it's very, very good: Lots of features for project management, file sharing, charting progress, recording “actions”. Web based tool, designed for people split between many locations. Why there was little uptake: Because we are naughty. Because it took time to learn how to use it, so we all thought “OK, OK, I’ll do that later”. Because it had jargon / language which we would have to learn and understand how each concept relates to our project. Because it is a pre-designed recipe which might not fit the way we already work. Because the system was particularly slow from Nairobi (possibly the slowness was the “authentication” step – we didn’t solve it, but maybe could have). None of this reflects on Basecamp – it is a widely used tool which fits the needs of multi-site projects – perhaps we underestimated the “activation energy” needed to get this working. It is a solution which might have worked.
  52. Experimental object – related to the caData – in the wild. myExperiment makes it really easy for the next generation of scientists to contribute to a pool of scientific workflows, build communities and form relationships. myExperiment enables scientists to share, re-use and repurpose workflows and reduce time-to-experiment, share expertise and avoid reinvention. "Their kids may have got there first but scientists will soon have their very own version of MySpace, where they will be able to share preliminary results, ideas and research tools." (New Scientist Tech, October 2006.) myExperiment introduces the concept of a workflow bazaar; a collaborative environment where scientists can safely publish their creations, share them with a wider group and find the workflows of others. Workflows can now be swapped, sorted and searched like photos and videos on the web. myExperiment is a Virtual Research Environment which makes it easy for people to share experiments and discuss them. We are currently working with our users to determine exactly how they want this site to work. We had a user meeting at the end of September 2006 to brainstorm myExperiment, and you can read some of the results from this meeting at our portal party wiki. Currently, a lightweight repository of workflows and the Taverna BioService Finder are available. Scientists should be able to swap workflows and publications as easily as citizens can share documents, photos and videos on the Web. myExperiment owes far more to social networking websites such as MySpace and YouTube than to the traditional portals of Grid computing, and is immediately familiar to the new generation of scientists. myExperiment provides a personalised environment which enables users to share, re-use and repurpose experiments, reducing time-to-experiment. We expect to start with focused pilot myExperiment portals based upon case studies for the specific areas of Astronomy, Bioinformatics, Chemistry and Social Science.
  53. Add Bernardo. Do not disdain the mundane! The delivery bulge. Cost of really making this work. The cost had better be worth it. And not just the cost of money but people and commitment. So we had better be tackling the right bit of the problem. Papers do not equal usable systems. The devil is in the detail. Practicalities override niceties. Who are your users? This is just for semantic web service provision. Put in Pinar, software engineers, Chris Wroe, Phil Lord, Mark Wilkinson as a service provider. Each despises the other.
  54. Back to Basics. But building for other people. Sandy Carter: agility of solutions. Making the service to the business process.
  55. The end of the black box
  56. Workflows
  57. The only difference between the saint and the sinner is that every saint has a past, and every sinner has a future. Author:  Oscar Wilde Source:  None