You thought taxonomies were hard?
How to create a taxonomy for “management buy-in”
Mary Chitty MSLS
mchitty@healthtech.com
https://www.cambridgeinnovationinstitute.com/
http://www.healthtech.com/
London, Oct 15, 2019
Cambridge Healthtech, Needham MA USA
Where I’m coming from
Timeline: 1992, 2000, 2006-, 2014, 2016, 2018, 2019
2019: Taxonomy now 1,600+ terms and growing. Company is now mid-size.
2018: Signed contract with OntoForce to use Disqover search software. https://www.ontoforce.com/
Acquired Artificial Intelligence and Internet of Things companies. Hired several data scientists.
Began informal collaboration with OntoForce, a Belgian semantic search engine company.
Taxonomies & Ontologies: Many useful applications
• E-commerce
• Data visualization
• Search, semantic search and Search Engine Optimization (SEO)
• Statistical analysis
• Text mining
Ontologies are complex. Simpler is sometimes better.
• “Ontologies offer advantages over other knowledge
systems—they enable both computational use and
human understanding, they can …include rich
vocabularies of labels, synonyms, and textual
definitions. If these are desirable selection criteria, then
an ontology should be considered.”
• “Ontologies do also come with computational
overheads, however, and can be complex to understand.
Other resources such as a vocabulary do not offer the
sorts of classification and rich computational
descriptions of an ontology but are often much simpler
to understand. Let your requirements guide you;
ontologies are not a panacea—sometimes
one isn’t needed at all.”
• Malone 2016
Value adds of taxonomies/ontologies:
Interoperability
Reproducibility
Data harmonization, especially of named entities
Identification of data errors or inconsistencies
Search/data navigation improvements
Collaborative filtering/recommendation engines
Validation of correlations to examine possible causalities
Enabling previously unaskable questions/use cases
Ongoing Challenges:
Information overload
Complexity
Data integration
Legacy data
Ambiguous and inconsistent data
Missing or unfindable data
Scaling up data processes
Sustainable maintenance – perhaps the biggest challenge of all!
Primary challenges are as much cultural as technological.
Life sciences challenges include:
Relatively sparse data compared to other domains such as finance
High-dimensional data with many variables (complex to chaotic)
Inherently noisy biological data (increasingly studied at the single-cell or gene-expression level)
Data on longitudinal health outcomes, which is crucial for validating evidence-based medicine but limited by HIPAA and other privacy regulations
In our era of big data, the irony is that we don’t have enough readily usable life sciences data.
Reproducibility challenges
• More than 70% of researchers have tried but failed to reproduce another scientist’s experiments. More than half failed to reproduce their own experiments. Baker, Nature 2016
• “replication alone will get us only so far (and) might actually make matters worse… an essential protection against flawed ideas … is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artefacts.” Wikipedia, “replication crisis”
Cost benefit: Return On Investment (ROI)
Educating decision makers is an ongoing process, even with CXOs who value taxonomies/ontologies.
But stakeholders are often skeptical about investing in taxonomies or ontologies.
Cost of not having FAIR research data: a minimum of 10.2 billion euros per year.
Semantic web people have talked about terminology challenges for decades
2001
“The first layer of the semantic Web consists of ontologies and taxonomies ... A huge amount of this is being done very desperately in the realm of biotech, for the human genome and new drug development.”
Tim Berners-Lee, August 30, 2001 keynote at Software Development East in Boston.
2005
“Semantics is fundamentally not an information technologies issue …it originates out of the need for groups of individuals to work together towards common goals … must agree upon a set of meanings around terminologies, concepts, relations and actions … a lot of confusion arises before people realise whether they are talking about the same or different things.”
Eric Neumann, Applying the semantic web to drug discovery and development. Drug Discovery World, Fall 2005
Making the business case for taxonomies or ontologies is not a new problem
“a month in the lab can often save an hour in the library” Attributed to chemist Frank Westheimer
1979 interview
“Institutions either underestimate the resources needed to do this work, or they are daunted by the entire prospect ... Honestly, very little data will ever be reused.” Personal communication, Juliane Schneider, eagle-i, Harvard Catalyst
Best practical advice I’ve come across
“One of my mantras is always start small. Show some win in some small domain. Don’t under any circumstance start with saying ‘I’ll just build you this enormous ontology for the next two years … then your world will be better.’ … Just say ‘I’m going to build this tiny little ontology and enable this small application over here’ … Always making sure my small ontology is enabling the small win.”
“Question: How do you encourage semantic modeling? Answer: First I compliment them, and say what you’ve done is a great starting point – because they have actually started … I try to find a couple of structured retrieval applications that they really want to do but their current markup is not allowing … find two compelling examples … make sure that we’ve got a deliverable in a month or a short period of time where they can do the one trial thing that adds value. Kind of get them on the slippery slope so that they’re the owner and they want to do it themselves.” Deborah McGuinness, keynote speech 2004
Taxonomy Use Cases
Amazon: I spoke at Taxonomy Boot Camp 2017 in Washington DC and learned that Amazon has taxonomists on 24-hour emergency call for when people can’t find their products online.
Netflix: “[Netflix] paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies ...”
“[Netflix] even offered a $1 million prize to the team that could design an algorithm that would improve the company's ability to predict how many stars users would give movies. It took years to improve the algorithm by a mere 10 percent. The prize was awarded in 2009, but Netflix never actually incorporated the new models.” Madrigal, Atlantic 2014
Best business case for taxonomy ever?
My Taxonomy Case Studies
• In-house database: map industry verticals to company-specific verticals. Question: Can we use existing taxonomies such as NAICS codes and CrunchBase to automate this? How do we integrate the existing in-house database with newly acquired company databases? Work in progress.
• Job title functions and seniorities for people in the database: assignment automated by a data scientist; at least 80% assigned now. Customizable for use by various departments? Phase 1 just completed; still reviewing and fine-tuning.
• Job departments for people in the database: similar to job titles. Phase 1 just started.
• OntoForce internal data ingestion: a project to enable improved access to the existing in-house database. Uncovered inconsistencies with labels and tables, and identified data quality issues to address. Working on training users and documenting workflows. Need to make the existing taxonomy keyword structure more flexible. Starting to see how trend analysis may be possible. Work in progress.
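The job-title case study reports automated assignment with at least 80% coverage but does not say how it was done. As a minimal sketch, assuming a simple keyword-rule approach (the rule lists and the names `classify_title` and `coverage` are hypothetical, not the method used in the project), titles can be mapped to a function and a seniority level, and the share of assigned titles tracked so the remainder can go to human review:

```python
import re
from typing import Optional

# Hypothetical keyword rules; a production taxonomy would be larger and curated.
FUNCTION_RULES = {
    "Research & Development": ["scientist", "research", "r&d", "chemist", "biologist"],
    "Information Technology": ["it", "software", "developer", "data engineer"],
    "Marketing": ["marketing", "brand", "communications"],
}

# Checked in order, so more senior levels come first.
SENIORITY_RULES = [
    ("C-level", ["chief", "ceo", "cto", "cio"]),
    ("VP", ["vice president", "vp"]),
    ("Director", ["director", "head of"]),
    ("Manager", ["manager"]),
]


def matches(text: str, keywords: list[str]) -> bool:
    """True if any keyword appears in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords)


def classify_title(title: str) -> tuple[Optional[str], Optional[str]]:
    """Return (function, seniority) for a job title; None where no rule fires."""
    text = title.lower()
    function = next((f for f, kws in FUNCTION_RULES.items() if matches(text, kws)), None)
    seniority = next((s for s, kws in SENIORITY_RULES if matches(text, kws)), None)
    return function, seniority


def coverage(titles: list[str]) -> float:
    """Share of titles assigned a function; the rest go to human review."""
    if not titles:
        return 0.0
    assigned = sum(1 for t in titles if classify_title(t)[0] is not None)
    return assigned / len(titles)


titles = ["Senior Research Scientist", "VP, Marketing", "Director of IT", "Office Coordinator"]
print(coverage(titles))  # 0.75
```

Whole-word matching matters here: a plain substring test would find "cto" inside "director". Rule lists like these are easy for subject experts to review, which suits a small pilot; a statistical classifier could replace them once enough reviewed titles exist.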
My recommendations for starting a project
• Consider a pilot/proof of concept.
• Start small, because a small project is easier to evaluate and validate. Don’t try to “boil the ocean”.
• Choose data of varying complexity. Think about degrees of granularity when drafting categories.
• Which categories might you want to aggregate?
• Which related concepts might you want to segment further? [Phase 2]
• Are there assumptions or implicit biases you might be making without realizing?
• Solicit feedback from diverse stakeholders as an ongoing process.
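The aggregation and segmentation questions above can be made concrete with a small parent-child hierarchy. This sketch assumes a hypothetical three-level taxonomy (the terms and the `PARENT`, `ancestors` and `roll_up` names are illustrative only) and rolls counts of tagged items up to every broader term, showing how the chosen tagging granularity affects what can later be aggregated:

```python
from collections import Counter

# Hypothetical taxonomy: each term maps to its broader (parent) term; roots map to None.
PARENT = {
    "Life Sciences": None,
    "Genomics": "Life Sciences",
    "Proteomics": "Life Sciences",
    "Single-cell genomics": "Genomics",
    "Technology": None,
    "Artificial Intelligence": "Technology",
}


def ancestors(term: str) -> list[str]:
    """Return the term followed by each broader term up to its root (KeyError if unknown)."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def roll_up(tag_counts: dict[str, int]) -> Counter:
    """Add each term's item count to the term itself and to every broader term."""
    totals = Counter()
    for term, n in tag_counts.items():
        for t in ancestors(term):
            totals[t] += n
    return totals


totals = roll_up({"Single-cell genomics": 3, "Proteomics": 2, "Artificial Intelligence": 5})
print(totals["Life Sciences"], totals["Genomics"], totals["Technology"])  # 5 3 5
```

Tagging at the most specific level keeps both options open: counts can always be rolled up, but items tagged only at a broad level cannot later be segmented without re-tagging.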
Don’t be surprised while building your pilot project: terminology consensus is challenging at best.
Taxonomies, ontologies, naming, tagging, tables, models: there are many ways to express these concepts.
“Biologists would rather share a toothbrush than gene names” Michael Ashburner, GeneSeer: A sage for gene names and genomic resources
“Biologists would rather share a toothbrush than data” Carol Goble, “purposely misquoting Michael Ashburner,” Keynote EGEE 2006
Trying for consensus often gets very emotional, challenging – and confusing.
https://www.go-fair.org/fair-principles/
• Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. … an essential component of the FAIRification process.
• Accessible: Once the user finds the required data, she/he needs to know how they can be accessed.
• Interoperable: Data usually need to be integrated with other data … need to interoperate with applications or workflows.
• Reusable: The ultimate goal of FAIR is to optimise the reuse of data … metadata and data should be well-described so that they can be replicated and/or combined in different settings.
Keep in mind for management presentation: FAIR data can help.
The European Commission and the US National Institutes of Health have allocated considerable resources to making data FAIRer.
Management may ask “Why can’t I just Google it?”
• Search works best if you know that what you’re looking for exists AND what to call it.
• Taxonomies are more useful if you’re not sure what you want exists, OR you don’t know what to call it, AND/OR there are multiple ways to express concept variations.
• Harness the power of serendipity with taxonomies. They give people a sense of whether the “scent of information” is promising.
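The difference between plain search and taxonomy-assisted search can be shown with query expansion. This is a minimal sketch, assuming a hypothetical mini-taxonomy of preferred terms, synonyms and narrower terms (`TAXONOMY`, `expand_query` and `search` are illustrative names, not any product's API): a query is resolved to its preferred term and expanded to synonyms and narrower concepts before matching.

```python
import re

# Hypothetical mini-taxonomy: preferred term -> synonyms and narrower terms.
TAXONOMY = {
    "machine learning": {
        "synonyms": ["ML", "statistical learning"],
        "narrower": ["deep learning", "random forest"],
    },
    "deep learning": {"synonyms": ["neural networks"], "narrower": []},
    "random forest": {"synonyms": [], "narrower": []},
}


def preferred_term(term: str):
    """Map a term or one of its synonyms to the preferred term, or None."""
    key = term.lower()
    for preferred, entry in TAXONOMY.items():
        if key == preferred or key in (s.lower() for s in entry["synonyms"]):
            return preferred
    return None


def expand_query(term: str) -> set[str]:
    """Expand a term to its synonyms and all narrower terms, recursively."""
    start = preferred_term(term)
    if start is None:
        return {term.lower()}  # unknown term: fall back to plain keyword search
    expanded, stack, visited = set(), [start], set()
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        expanded.add(current)
        expanded.update(s.lower() for s in TAXONOMY[current]["synonyms"])
        stack.extend(TAXONOMY[current]["narrower"])
    return expanded


def search(docs: list[str], term: str) -> list[str]:
    """Return docs containing any expanded term as a whole word or phrase."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b") for t in expand_query(term)]
    return [d for d in docs if any(p.search(d.lower()) for p in patterns)]


docs = ["A review of neural networks for imaging",
        "Random forest models for toxicity",
        "Lab safety checklist"]
print(search(docs, "ML"))
# ['A review of neural networks for imaging', 'Random forest models for toxicity']
```

A plain keyword search for "ML" matches none of these documents, while the expanded query finds the two relevant ones: the case where the user does not know what the concept is called in each source.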
Success Factors: Business case for management
Look for and document:
• Cost reductions
• Productivity increases
• Employee time savings
• Added competitive advantages
• Sustainability
• Risk mitigations
Assemble:
• Sponsors, champions and/or influencers
• A clear executive summary with KPIs (Key Performance Indicators), milestones, values and costs, in 1–2 pages; use URL links if needed
Remember:
• Align early. Align often.
• Ask for feedback – it’s a way of getting buy-in.
• Leave room for suggestions.
• Be aware of other company initiatives.
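Employee time savings and costs can be turned into a simple ROI figure for an executive summary. This is a back-of-the-envelope sketch using the common formula ROI = (benefit − cost) / cost; every input below (user count, minutes saved, hourly cost, project cost, and the assumed 46 working weeks per year) is hypothetical and should be replaced with your own measured KPIs:

```python
def time_savings_value(users: int, minutes_saved_per_week: float,
                       hourly_cost: float, working_weeks: int = 46) -> float:
    """Estimated yearly value of time saved finding information.

    working_weeks=46 is an assumption (a year minus holidays and leave).
    """
    return users * (minutes_saved_per_week / 60) * hourly_cost * working_weeks


def simple_roi(annual_benefit: float, annual_cost: float) -> float:
    """ROI as a fraction: (benefit - cost) / cost."""
    return (annual_benefit - annual_cost) / annual_cost


# Hypothetical inputs: 50 users each saving 30 minutes a week,
# a loaded cost of 60 per hour, and 40,000 per year spent on taxonomy work.
benefit = time_savings_value(users=50, minutes_saved_per_week=30, hourly_cost=60)
print(benefit)                                 # 69000.0
print(round(simple_roi(benefit, 40_000), 3))   # 0.725
```

Keeping the inputs explicit lets sponsors challenge each assumption separately, which tends to build more trust than a single headline number.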
But what else needs to change?
• Data-readiness
“[T]here is a lot of work that needs doing to prepare the data sets for these technologies … a disproportionate amount being invested in the technologies as opposed to investing in ‘data-readiness’… It's just not a slam dunk to mash up a lot of data and think it will work. … The AI solution may help accelerate some tasks, but human expertise may be required for the broad scope of what is needed.” Nicolas 2019
• Open Science
“Is any lifetime long enough these days to learn everything needed to get a drug to market and keep it there?”
• There is more need than ever to collaborate and share knowledge, especially pre-competitively.
Key takeaways
1. Aim first for quick wins with low-hanging fruit.
2. Bundle what stakeholders value and want with items you can expect they will eventually need.
3. Seek out allies to get shared buy-in for sustainable justification.
4. Pareto Principle: roughly 80% of effects come from 20% of effort.
5. Expectation management and change management are crucial skills to cultivate.
6. Collect metrics (quantitative/qualitative) to measure progress, so you know when you’ve made some.
7. Recognize that some challenges haven’t been resolved by anyone yet.
Resources to use today
Chitty, Mary, Ontologies & Taxonomies glossary & taxonomy, 2019, with 40+ ontology definitions and 15 taxonomy definitions http://www.genomicglossaries.com/content/ontologies.asp
Heath, Chip and Dan, Switch: How to Change Things When Change is Hard, 2010 https://heathbrothers.com/books/switch/
Noy, Natalya F. and McGuinness, Deborah L., Ontology Development 101: A Guide to Creating Your First Ontology http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html
Research Data Alliance https://www.rd-alliance.org/, a research community organization started in 2013 by the European Commission, the US National Science Foundation, the US National Institute of Standards and Technology, and the Australian Department of Innovation.
Citation References
How Netflix Reverse-Engineered Hollywood, Atlantic, 2014 https://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/
Life Science specific
BioPortal https://bioportal.bioontology.org/, a repository of biomedical ontologies with almost 800 ontologies; mapping from ontologies to i2b2 http://i2b2.bioontology.org/
Malone, James et al. Ten Simple Rules for Selecting a Bio-Ontology. PLoS Comput Biol 12(2), 2016: e1004743. https://doi.org/10.1371/journal.pcbi.1004743
National Center for Biomedical Ontology (NCBO), BioPortal Ontology to i2b2 File Mappings http://i2b2.bioontology.org/
Pistoia Alliance, Ontologies Guidelines for Best Practices to support practical application and mapping, 2016 https://pistoiaalliance.atlassian.net/wiki/spaces/PUB/pages/43089942/Ontologies+Guidelines+for+Best+Practice
Berners-Lee 2001 http://www.sdgnews.com/sd2001es_006/sd2001es_006.htm (no longer on the web)
Neumann 2005 https://www.ddw-online.com/informatics/p148329-applying-the-semantic-web-to-drug-discovery-and-development.html
Michael Ashburner, GeneSeer: A sage for gene names and genomic resources. BMC Genomics. 2005; 6: 134. 2005 Sep 21. doi: 10.1186/1471-2164-6-134 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266031/
Carol Goble, “purposely misquoting Michael Ashburner,” Keynote EGEE 2006, Interoperability With Moby 1.0 - It's Better Than Sharing Your ...
“Semantic alignment and data standardization are vital to solve if we are going to harness modern technologies such as machine learning.” Ian Harrow, Rama Balakrishnan, Ernesto Jimenez-Ruiz, Simon Jupp, Jane Lomax, Jane Reed, Martin Romacker, Christian Senger, Andrea Splendiani, Jabe Wilson, Peter Woollard, Drug Discovery Today, May 2019 https://www.sciencedirect.com/science/article/pii/S1359644618304215
Nicolas 2019: Francois Nicolas, “AI In Life Sciences: Seeing Past the Hype,” Life Science Leader, March 1, 2019, and comment by Christy Wilson https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001
Acknowledgments
Many people have participated in this ongoing project. I’m grateful for their work, insights and encouragement.
Cambridge Innovation Institute (CII) & Cambridge Healthtech Institute (CHI)
• Phillips Kuhl, President
• Tonya Urquizo, Knowledge Information Services Analyst & IT Liaison
• Sanaye Bartlett, Data Analyst & Project Manager
• Kaushik Chaudhuri, Director of Product Marketing
CII Disqover Team
• Kaitlyn Barago, Associate Conference Producer
• Nancy Clarke, Data Scientist
• Mike Croft, Software Architect
• Ben Lakin, Director New Initiatives
• Jaime Parlee, Director Marketing Analytics
• Craig Wohlers, Manager Knowledge Foundation
OntoForce
• Hans Constandt, CEO, President
• Filip Pattyn, Scientific Lead
• Niels Vanneste, Customer Data Scientist
• Berenice Wulbrecht, Data Science Director, Systems Biology
Fruitful conversations, emails
• Ingrid Akerblom, IFA Diversified Consulting
• John Aubrey, Vertex
• Mark Burfoot, Novartis NIBR
• Jane Lomax, SciBite
• Eric Neumann, Akidata LLC
• Terrell Russell, iRODS Consortium
• Juliane Schneider, eagle-i, Harvard Catalyst
• Ted Slater, PaaS, Elsevier
• John Wilbanks, Sage Bionetworks

Industrialised data - the key to AI success.pdfLars Albertsson
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Recently uploaded (20)

Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

How to create a taxonomy for management buy-in

  • 1. You thought taxonomies were hard? How to create a taxonomy for “management buy-in” Mary Chitty MSLS mchitty@healthtech.com https://www.cambridgeinnovationinstitute.com/ http://www.healthtech.com/ London, Oct 15, 2019 Cambridge Healthtech Needham MA USA
  • 2. Where I’m coming from: 1992 2000 2006- 2014 2016 2018 2019 Taxonomy now 1,600+ terms and growing. Company is now mid-size. 2018 Signed contract with OntoForce to use Disqover search software. https://www.ontoforce.com/ Acquired Artificial Intelligence and Internet of Things companies. Hired several data scientists. Began informal collaboration with OntoForce, a Belgian semantic search engine company.
  • 3. Taxonomies & Ontologies: Many useful applications • E-commerce • Data visualization • Search, semantic search and Search Engine Optimization (SEO) • Statistical analysis • Text mining
  • 4. Ontologies are complex. Simpler is sometimes better. • “Ontologies offer advantages over other knowledge systems—they enable both computational use and human understanding, they can …include rich vocabularies of labels, synonyms, and textual definitions. If these are desirable selection criteria, then an ontology should be considered.” • “Ontologies do also come with computational overheads, however, and can be complex to understand. Other resources such as a vocabulary do not offer the sorts of classification and rich computational descriptions of an ontology but are often much simpler to understand. Let your requirements guide you; ontologies are not a panacea—sometimes one isn’t needed at all.” • Malone 2016
  • 5. Value adds of taxonomies/ontologies: • Interoperability • Reproducibility • Data harmonization, especially of named entities • Identification of data errors or inconsistencies • Search/data navigation improvements • Collaborative filtering/recommendation engines • Validation of correlations to examine possible causalities • Enabling previously unaskable questions/use cases
  • 6. Ongoing challenges: • Information overload • Complexity • Data integration • Legacy data • Ambiguous and inconsistent data • Missing or unfindable data • Scaling up data processes • Sustainable maintenance, perhaps the biggest challenge of all!
  • 7. Primary challenges are as much cultural as technological. Life sciences challenges include: • Relatively sparse data compared to other domains such as finance • Highly dimensional data with many variables (complex to chaotic) • Inherently noisy biological data (increasingly studied at the single-cell or gene-expression level) • Data on longitudinal health outcomes, limited by HIPAA and other privacy regulations but crucial for validating evidence-based medicine. In our era of big data, the irony is that we don’t have enough readily usable life sciences data.
  • 8. Reproducibility challenges • More than 70% of researchers have tried and failed to reproduce other scientists’ experiments; more than half have failed to reproduce their own. Baker, Nature 2016 • “replication alone will get us only so far (and) might actually make matters worse… an essential protection against flawed ideas … is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artefacts.” Wikipedia, “Replication crisis”
  • 9. Cost-benefit and Return On Investment (ROI): Educating decision makers is an ongoing process, even with CXOs who value taxonomies/ontologies. Stakeholders are often skeptical about investing in taxonomies or ontologies. Cost of not having FAIR research data: a minimum of 10.2 billion euros per year.
  • 10. Semantic web people have talked about terminology challenges for decades. 2001: “The first layer of the semantic Web consists of ontologies and taxonomies ... A huge amount of this is being done very desperately in the realm of biotech, for the human genome and new drug development.” Tim Berners-Lee, August 30, 2001 keynote at Software Development East in Boston. 2005: “Semantics is fundamentally not an information technologies issue …it originates out of the need for groups of individuals to work together towards common goals … must agree upon a set of meanings around terminologies, concepts, relations and actions … a lot of confusion arises before people realise whether they are talking about the same or different things.” Eric Neumann, Applying the semantic web to drug discovery and development. Drug Discovery World, Fall 2005
  • 11. Business cases for taxonomies or ontologies are not a new problem. “A month in the lab can often save an hour in the library.” Attributed to chemist Frank Westheimer, 1979 interview. “Institutions either underestimate the resources needed to do this work, or they are daunted by the entire prospect ... Honestly, very little data will ever be reused.” Personal communication, Juliane Schneider, eagle-i, Harvard Catalyst
  • 12. Best Practical Advice I’ve come across: “One of my mantras is always start small. Show some win in some small domain. Don’t under any circumstance start with saying ‘I’ll just build you this enormous ontology for the next two years … then your world will be better.’ … Just say ‘I’m going to build this tiny little ontology and enable this small application over here’ … Always making sure my small ontology is enabling the small win.” “Question: how do you encourage semantic modeling? Answer: First I compliment them, and say what you’ve done is a great starting point – because they have actually started … I try to find a couple of structured retrieval applications that they really want to do but their current markup is not allowing … find two compelling examples … make sure that we’ve got a deliverable in a month or a short period of time where they can do the one trial thing that adds value. Kind of get them on the slippery slope so that they’re the owner and they want to do it themselves.” Deborah McGuinness, keynote speech 2004
  • 13. Taxonomy Use Cases • Amazon: I spoke at Taxonomy Boot Camp 2017 in Washington DC and learned that Amazon has taxonomists on 24-hour emergency call for when people can’t find their products online. • Netflix: “[Netflix] paid people to watch films and tag them with all kinds of metadata. This process is so sophisticated and precise that taggers receive a 36-page training document that teaches them how to rate movies ... “[Netflix] even offered a $1 million prize to the team that could design an algorithm that would improve the company's ability to predict how many stars users would give movies. It took years to improve the algorithm by a mere 10 percent. The prize was awarded in 2009, but Netflix never actually incorporated the new models.” Madrigal, Atlantic 2014. Best business case for taxonomy ever?
  • 14. My Taxonomy Case Studies • In-house database: Map industry verticals to company-specific verticals. Question: Can we use existing taxonomies, such as NAICS codes and Crunchbase, to automate this? How do we integrate the existing in-house database with newly acquired companies’ databases? Work in progress. • Job title functions and seniorities for people in the database: Automated by a data scientist; at least 80% assigned now. Customizable for use by various departments? Phase 1 just completed; still reviewing and fine-tuning. • Job departments for people in the database: Similar to job titles. Phase 1 just started. • OntoForce internal data ingestion: Project to enable improved access to the existing in-house database. Uncovered inconsistencies with labels and tables. Identified data quality issues to address. Working on training users and documenting workflows. Need to make the existing taxonomy keyword structure easier to change. Starting to see how trend analysis may be possible. Work in progress.
  • 15. My recommendations for starting a project • Consider a pilot/proof of concept. • Start small, because that will be easier to evaluate and validate. Don’t try to “boil the ocean”. • Choose data of varying complexity. Think about degrees of granularity when drafting categories. • Which categories might you want to aggregate? • Which related concepts might you want to segment further? [Phase 2] • Are there assumptions or implicit biases you might be making without realizing it? • Solicit feedback from diverse stakeholders as an ongoing process.
  • 16. Don’t be surprised while building your pilot project: Terminology consensus is challenging at best. Taxonomies, ontologies, naming, tagging, tables, models: there are many ways to express these concepts. “Biologists would rather share a toothbrush than gene names.” Michael Ashburner, GeneSeer: A sage for gene names and genomic resources. “Biologists would rather share a tooth brush than data.” Carol Goble, “purposely misquoting Michael Ashburner”, Keynote EGEE 2006. Trying for consensus often gets very emotional, challenging, and confusing.
  • 17. FAIR principles (https://www.go-fair.org/fair-principles/) • Findable: First step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. … an essential component of the FAIRification process. • Accessible: Once the user finds the required data, she/he needs to know how can they be accessed • Interoperable: Data usually need to be integrated with other data … need to interoperate with applications or workflows. • Reusable: Ultimate goal of FAIR is to optimise the reuse of data… metadata and data should be well-described so that they can be replicated and/or combined in different settings. Keep in mind for management presentations: FAIR data can help. The European Commission and the US National Institutes of Health have allocated considerable resources to making data FAIRer.
  • 18. Management may ask “Why can’t I just Google it?” • Search works best if you know what you’re looking for exists AND what to call it. • Taxonomies are more useful if you’re not sure what you want exists, OR you don’t know what to call it, AND/OR there are multiple ways to express concept variations. • Harness the power of serendipity with taxonomies. They give people a sense of whether the “scent of information” is promising.
  • 19. Success Factors • Business case for management: Look for and document cost reductions, productivity increases, employee time savings, added competitive advantages, sustainability, and risk mitigations. • Assemble sponsors, champions and/or influencers. • Write a clear executive summary (1-2 pages) with Key Performance Indicators (KPIs), milestones, values, and costs. Use URL links if needed. • Remember: Align early. Align often. Ask for feedback; it’s a way of getting buy-in. Leave room for suggestions. Be aware of other company initiatives.
  • 20. But what else needs to change? • Data-readiness: “[T]here is a lot of work that needs doing to prepare the data sets for these technologies … a disproportionate amount being invested in the technologies as opposed to investing in "data-readiness… It's just not a slam dunk to mash up a lot of data and think it will work. … The AI solution may help accelerate some tasks, but human expertise may be required for the broad scope of what is needed.” Nicolas 2019 • Open Science: "Is any lifetime long enough these days to learn everything needed to get a drug to market and keep it there?" • There is more need than ever to collaborate to share knowledge, especially pre-competitively.
  • 21. Key takeaways 1. Aim first for quick wins with low-hanging fruit. 2. Bundle stakeholders’ valued wants with items you expect they will eventually need. 3. Seek out allies to get shared buy-in for sustainable justification. 4. Pareto Principle: 80% of effects come from 20% of effort. 5. Expectation management and change management are crucial skills to cultivate. 6. Collect metrics (quantitative/qualitative) to measure progress, so you know when you’ve made some. 7. Recognize that some challenges haven’t been resolved by anyone yet.
  • 22. Resources to use today: • Chitty, Mary, Ontologies & Taxonomies glossary & taxonomy, 2019, with 40-plus ontology definitions and 15 taxonomy definitions http://www.genomicglossaries.com/content/ontologies.asp • Heath, Chip and Dan, Switch: How to Change Things When Change is Hard, 2010 https://heathbrothers.com/books/switch/ • McGuinness, Deborah, Ontology Development 101: A Guide to Creating Your First Ontology http://www.ksl.stanford.edu/people/dlm/papers/ontology-tutorial-noy-mcguinness-abstract.html • Research Data Alliance https://www.rd-alliance.org/, a research community organization started in 2013 by the European Commission, US National Science Foundation, US National Institute of Standards and Technology, and Australian Department of Innovation. Citation references: • How Netflix Reverse-Engineered Hollywood, Atlantic, 2014 https://www.theatlantic.com/technology/archive/2014/01/how-netflix-reverse-engineered-hollywood/282679/ Life science specific: • BioPortal https://bioportal.bioontology.org/, a repository of biomedical ontologies with almost 800 ontologies; mapping from ontologies to i2b2 http://i2b2.bioontology.org/ • Malone, James et al. Ten Simple Rules for Selecting a Bio-ontology, PLoS Comput Biol 12(2), 2016: e1004743. https://doi.org/10.1371/journal.pcbi.1004743 • National Center for Biomedical Ontology (NCBO), BioPortal Ontology to i2b2 File Mappings http://i2b2.bioontology.org/ • Pistoia Alliance, Ontologies Guidelines for Best Practices to support practical application and mapping, 2016 https://pistoiaalliance.atlassian.net/wiki/spaces/PUB/pages/43089942/Ontologies+Guidelines+for+Best+Practice • Berners-Lee 2001 http://www.sdgnews.com/sd2001es_006/sd2001es_006.htm (no longer on the web) • Neumann 2005 https://www.ddw-online.com/informatics/p148329-applying-the-semantic-web-to-drug-discovery-and-development.html • Michael Ashburner, GeneSeer: A sage for gene names and genomic resources. BMC Genomics. 2005; 6: 134. 2005 Sep 21. doi: 10.1186/1471-2164-6-134 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1266031/ • Carol Goble, “purposely misquoting Michael Ashburner”, Keynote EGEE 2006, Interoperability With Moby 1.0 - It's Better Than Sharing Your ... • “Semantic alignment and data standardization are vital to solve if we are going to harness modern technologies such as machine learning.” Ian Harrow, Rama Balakrishnan, Ernesto Jimenez-Ruiz, Simon Jupp, Jane Lomax, Jane Reed, Martin Romacker, Christian Senger, Andrea Splendiani, Jabe Wilson, Peter Woollard, Drug Discovery Today, May 2019 https://www.sciencedirect.com/science/article/pii/S1359644618304215 • Nicolas 2019: Francois Nicolas, “AI In Life Sciences: Seeing past the Hype”, Life Science Leader, March 1, 2019, and comment by Christy Wilson https://www.lifescienceleader.com/doc/ai-in-life-sciences-seeing-past-the-hype-0001
  • 23. Acknowledgments: Many people have participated in this ongoing project. I’m grateful for their work, insights and encouragement. Cambridge Innovation Institute (CII) & Cambridge Healthtech Institute (CHI): • Phillips Kuhl, President • Tonya Urquizo, Knowledge Information Services Analyst & IT Liaison • Sanaye Bartlett, Data Analyst & Project Manager • Kaushik Chaudhuri, Director of Product Marketing. CII Disqover Team: • Kaitlyn Barago, Associate Conference Producer • Nancy Clarke, Data Scientist • Mike Croft, Software Architect • Ben Lakin, Director New Initiatives • Jaime Parlee, Director Marketing Analytics • Craig Wohlers, Manager Knowledge Foundation. OntoForce: • Hans Constandt, CEO President • Filip Pattyn, Scientific Lead • Niels Vanneste, Customer Data Scientist • Berenice Wulbrecht, Data Science Director, Systems Biology. Fruitful conversations and emails: • Ingrid Akerblom, IFA Diversified Consulting • John Aubrey, Vertex • Mark Burfoot, Novartis NIBR • Jane Lomax, SciBite • Eric Neumann, Akidata LLC • Terrell Russell, iRODS Consortium • Juliane Schneider, eagle-i, Harvard Catalyst • Ted Slater, PaaS, Elsevier • John Wilbanks, Sage Bionetworks
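Slide 18 argues that a taxonomy helps most when a concept can be expressed several ways, or when you don’t know what to call it. A minimal Python sketch of that idea, assuming a toy in-memory taxonomy loosely in the spirit of preferred labels, synonyms and narrower terms: a plain keyword search finds only the exact phrase, while a taxonomy-expanded search also finds documents that use a synonym or a narrower concept. The `Concept` class, both search functions, the sample taxonomy and the sample documents are hypothetical illustrations, not part of any tool mentioned in the talk.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hypothetical taxonomy node: preferred label, synonyms, and narrower concepts."""
    label: str
    synonyms: list[str] = field(default_factory=list)
    narrower: list["Concept"] = field(default_factory=list)

    def all_terms(self) -> set[str]:
        """Lowercased label and synonyms of this concept and of every narrower concept."""
        terms = {self.label.lower(), *(s.lower() for s in self.synonyms)}
        for child in self.narrower:
            terms |= child.all_terms()
        return terms


def keyword_search(docs: dict[str, str], query: str) -> set[str]:
    """Plain search: IDs of documents containing the query as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if pattern.search(text)}


def taxonomy_search(docs: dict[str, str], concept: Concept) -> set[str]:
    """Expanded search: union of keyword hits for every term the concept covers."""
    hits: set[str] = set()
    for term in concept.all_terms():
        hits |= keyword_search(docs, term)
    return hits


# Hypothetical mini-taxonomy and documents, for illustration only.
ai = Concept(
    "artificial intelligence",
    synonyms=["AI"],
    narrower=[
        Concept("machine learning", synonyms=["ML"]),
        Concept("natural language processing", synonyms=["NLP", "text mining"]),
    ],
)
docs = {
    "d1": "Applying artificial intelligence to drug discovery",
    "d2": "Machine learning for adverse event detection",
    "d3": "Text mining of clinical trial endpoints",
    "d4": "Conference logistics and venue booking",
}

print(sorted(keyword_search(docs, "artificial intelligence")))  # ['d1']
print(sorted(taxonomy_search(docs, ai)))  # ['d1', 'd2', 'd3']
```

Whole-word matching is used so that short synonyms such as "AI" or "ML" do not match inside unrelated words; a production system would also need stemming, multilingual labels and curation of the synonym lists themselves.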

Editor's Notes

  1. Making a business case for taxonomies
  2. I’ve been working with taxonomies for decades, with support from the company President, though it took a while to call what we had a taxonomy. I’m still working on making data optimal for data scientists. I recently realized we have multiple taxonomies, with varying degrees of documentation.
  3. Life-science-specific applications include drug discovery, drug repurposing, pharmacovigilance/adverse effects monitoring, Real-World Evidence, and mapping complex, disparate relationships (such as model organisms' data). I’m particularly interested in predictive analytics and trend analysis, with my projects still in very early stages.
  4. I’m focusing on similarities between taxonomies and ontologies today, not the differences. I’m not going to talk about whether you should be using taxonomies or ontologies, or linked data or knowledge graphs. These are important discussions and other talks will consider them. I’m also looking forward to hearing talks on machine learning, automatic tagging and governance.
  5. Questions I most want to ask? Ones where the answer will surprise me.
  6. The more we learn the more we realize we still have to understand. New insights don’t always invalidate old knowledge – but understanding becomes more and more granular. Taxonomies alone won’t solve all of these challenges, but they are an important part of the process.
  7. Some of my examples are life science specific, but a fair amount of my taxonomy work is more general as well.
  8. A study of 500-plus biology papers published over a 20-year span suggested that up to 80% of raw data collected for studies in the early 1990s is lost, “mostly because no one knows where to find it.” (Current Biology 2014). Digital data are ephemeral ... “‘homeless’ data quickly become no data at all.” (Berman, Science 2013). Monya Baker, Nature 2016 survey of researchers: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970; Current Biology, Jan 6, 2014, described in Wiener-Bronner 2013 Atlantic article: https://www.theatlantic.com/national/archive/2013/12/scientific-data-lost-forever/356422/; Who Will Pay for Public Access to Research Data? Francine Berman, Vint Cerf, Science, 09 Aug 2013: Vol. 341, Issue 6146, pp. 616-617, DOI: 10.1126/science.1241625 https://science.sciencemag.org/content/341/6146/616.summary. Keep in mind the dangers of “overfitting” data.
  9. “we are confident that the true cost is much higher than the estimated” The PwC study looked at time spent, cost of storage, license costs, research retraction, double funding, interdisciplinarity and potential economic growth. In Return on Investment of Natural Language Processing, a Linguamatics paper cites time savings of 10x to 1000x, or 1 FTE a year for every 10-12 drugs monitored; cost savings of $40,000 by not investing in a project that would have had a negative outcome, or an improvement of $50-100K in risk-adjusted revenue per disease area; productivity gains; discovery of 33 novel drug targets; applications for clinical trials and pharmacovigilance; automation of manual curation of publications and clinical trial endpoints; and re-use of existing data from clinical trials to speed up drug development. https://www.linguamatics.com/products/return-investment
  10. And it’s even harder than I realized when I wrote the proposal for this conference. I learned at BioIT in May 2019 just how difficult it is to engage the C-suite.
  11. Am using some of these sentences as scripts for in-house conversations now. 
  12. Some people want to rely only on algorithms and automation. I’m still advocating a hybrid approach, with varying success.
  13. Consider collecting data on extent of existing problems with finding or reusing data.
  14. People use an amazing variety of terms to describe terminology functions.
  15. See the Go Fair website for more information on this set of principles with guidelines on how to make data FAIRer.
  16. Many of the concepts I’m working with are cutting- to bleeding-edge, and terminology is evolving organically. Lots of uncertainty. Think of the image of shooting at a moving target, or as hockey great Wayne Gretzky said: “I just try to skate to where the puck is going to be.”
  17. Do your homework. Know your audience. Think strategically. Are there detractors or skeptics? How can you address them?
  18. Right now people seem willing to throw millions into machine learning or artificial intelligence – but are reluctant to invest in data readiness and data quality efforts. We’ve got to collaborate! We’ve got to share!
  19. “Expect things to take even longer than you anticipate. We know less today than we will tomorrow (which means we know the least when we start).” [Thanks to Terrell Russell of the iRODS Consortium.] Understand the workflows of the people whose problems you are trying to solve. Drastic workflow changes often mean change will never happen. Users are much better at telling you what they don’t like than at describing what they really want, which they may not be able to envision. Thank people for this feedback. Focus on 80/20 (actually 20/80). Definitely don’t aim for 100%.
  20. Nature 2016 survey of researchers: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970; Current Biology, Jan 6, 2014, described in Atlantic article: https://www.theatlantic.com/national/archive/2013/12/scientific-data-lost-forever/356422/; Who Will Pay for Public Access to Research Data? Francine Berman, Vint Cerf, Science, 09 Aug 2013: Vol. 341, Issue 6146, pp. 616-617, DOI: 10.1126/science.1241625 https://science.sciencemag.org/content/341/6146/616.summary; Life Science Leader, March 1, 2019, “AI In Life Sciences: Seeing past the Hype”, Francois Nicolas and comment by Christy Wilson; Pistoia Alliance Ontologies Mapping: https://www.pistoiaalliance.org/projects/current-projects/ontologies-mapping/; Ontology mapping for semantically enabled applications. Summary of recent progress from thesauri, taxonomies to ontologies: “when biomedical research is under a deluge of an increasing amount and variety of data… Semantic alignment and data standardization are vital to solve if we are going to harness modern technologies such as machine learning” Drug Discovery Today, May 2019
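Note 19’s advice to focus on 80/20, together with note 13’s suggestion to collect data on existing problems with finding or reusing data, can be turned into a simple prioritization for a pilot. A minimal Python sketch, assuming you have already tallied some measure of pain per topic (here, hypothetical counts of failed internal searches): it returns the fewest topics, largest first, that together account for at least 80% of the total. The function name, topic names and numbers are invented for illustration only.

```python
def pareto_cut(counts: dict[str, int], target: float = 0.8) -> list[str]:
    """Return the fewest categories, largest first, whose counts reach `target` share of the total."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(counts.values())
    if total == 0:
        return []
    selected: list[str] = []
    running = 0
    # Greedily take the biggest categories until the cumulative share reaches the target.
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += count
        if running / total >= target:
            break
    return selected


# Hypothetical tally of failed internal searches by topic (illustrative numbers only).
failed_searches = {
    "job titles": 420,
    "industry verticals": 310,
    "company names": 120,
    "conference tracks": 60,
    "drug targets": 40,
    "instruments": 30,
    "other": 20,
}

print(pareto_cut(failed_searches))  # ['job titles', 'industry verticals', 'company names']
```

In this made-up data, three of seven topics cover 85% of failed searches; real distributions will rarely split exactly 80/20, so the cut is a starting point for a pilot rather than a precise rule.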