Samiul Hasan discusses using question-driven problem solving and data analytics to improve drug discovery at GSK. He outlines two aspirations for scientific knowledge management. The first is persistence: organizing hypotheses efficiently so the next generation of scientists can revisit them. The second is vigilance: giving scientists access to the right data and prior knowledge at the right time. Because inconsistent use of language at source causes serious downstream problems, self-learning questionnaires could auto-suggest metadata tags and search the literature for supporting evidence, improving both consistency and findability. Candidate algorithms include named entity recognition to resolve ambiguous terms from context, document classification to present similar content, reinforcement learning to recommend preferred synonyms, and trigger event detection to alert authors. A pilot uncovered evidence and hypotheses that project teams had missed. Hasan concludes that it is all about the questions, that technology can help overcome cultural challenges, and that persistence and patience are key.
Powering Question-Driven Problem Solving to Improve the Chances of Finding New Medicines
Samiul Hasan,
Data Analytics and Visualization Director,
GSK Data & Computational Sciences
Connected Data London
4th October 2019
3. Aspirations of scientific knowledge management
1) Persistence
– Efficient organization.
– The hypotheses that we validate/invalidate today need to be
revisited by the next generation of scientists.
2) Vigilance
– Effective organization.
– Without access to the right data and prior knowledge at the right
time, we risk making very costly, avoidable business decisions.
Andrew Witty,
GSK CEO 2008-2017
4. Inconsistent use of language at source = serious downstream problems
SDS
– Serine dehydratase
– Sodium dodecyl sulfate
– Shwachman–Diamond syndrome
– Safety data sheet
GSK
– GlaxoSmithKline
– Glycogen synthase kinase
5. Data & knowledge capture forms: regulatory in purpose, but what about reward in design?
7. Self-learning questionnaires: Concept
“Auto-suggest” metadata tagging [AUTHOR ACTION] & auto literature evidence searching [AUTHOR REWARD] to improve language consistency¹ at source and findability² of reported evidence [OUTCOME]
¹ Great for improving search engines
² Great for making scientists effective
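The “auto-suggest” tagging half of this concept can be sketched in a few lines. This is a minimal illustration, not GSK's implementation: the vocabulary, the function names (`suggest_tags`, `split_by_ambiguity`, `record_choice`, `ranked_candidates`) and the frequency-count "learning" are all hypothetical assumptions. Unambiguous terms are tagged automatically, ambiguous abbreviations such as “GSK” are handed back to the author, and the author's confirmed choices re-rank future suggestions.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary mapping each surface form (full name,
# synonym or abbreviation) to its candidate canonical concepts. A real
# system would load this from a curated thesaurus or ontology.
VOCABULARY = {
    "gsk": ["GlaxoSmithKline", "Glycogen synthase kinase"],
    "glaxosmithkline": ["GlaxoSmithKline"],
    "glycogen synthase kinase": ["Glycogen synthase kinase"],
    "sds": ["Serine dehydratase", "Sodium dodecyl sulfate",
            "Shwachman-Diamond syndrome", "Safety data sheet"],
    "sodium dodecyl sulfate": ["Sodium dodecyl sulfate"],
}

# How often authors picked each concept for a term: a simple frequency
# count standing in (as an assumption) for the "self-learning" part.
choice_counts = defaultdict(lambda: defaultdict(int))


def suggest_tags(text: str) -> dict[str, list[str]]:
    """Return every vocabulary term found in `text` with its candidate concepts."""
    lowered = text.lower()
    return {term: concepts for term, concepts in VOCABULARY.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)}


def split_by_ambiguity(suggestions):
    """Separate terms that can be auto-tagged from those the author must confirm."""
    auto = {t: c[0] for t, c in suggestions.items() if len(c) == 1}
    ask = {t: c for t, c in suggestions.items() if len(c) > 1}
    return auto, ask


def record_choice(term: str, concept: str) -> None:
    """Remember which concept the author confirmed for an ambiguous term."""
    choice_counts[term][concept] += 1


def ranked_candidates(term: str) -> list[str]:
    """Order a term's candidate concepts by how often authors chose them
    (ties keep vocabulary order because sorting is stable)."""
    counts = choice_counts[term]
    return sorted(VOCABULARY[term], key=lambda c: counts[c], reverse=True)


text = "Sodium dodecyl sulfate was added before the GSK assay."
auto, ask = split_by_ambiguity(suggest_tags(text))
print(auto)  # {'sodium dodecyl sulfate': 'Sodium dodecyl sulfate'}
print(ask)   # {'gsk': ['GlaxoSmithKline', 'Glycogen synthase kinase']}
```

The word-boundary regex keeps “sds” from matching inside longer tokens such as “SDSS”. Keeping the author in the loop for ambiguous terms is what turns each confirmation into new labelled data.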
9. Examples
1. Determine from the sentence context whether the author meant “GlaxoSmithKline” or “Glycogen Synthase Kinase” when writing “GSK”
2. Classify and present the sentences (with links to their source documents) whose metadata content is most similar to the expected answers
3. Recommend the predominant synonyms used by individual departments, e.g. is a particular department really working on “Glycogen Synthase Kinase” and using the synonym “GSK”?
4. If a significant efficacy or safety event is reported in a “Clinical” questionnaire, automatically alert the author to whether the outcome/risk was predicted earlier in a “Pre-clinical” questionnaire.
(1) Named entity recognition, (2) Document classification, (3) Reinforcement learning, (4) Trigger event detection
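Example 1 pairs the “GSK” problem with named entity recognition. The sketch below swaps in a much simpler technique: a dictionary-overlap heuristic in the spirit of the Lesk word-sense disambiguation algorithm. It picks the sense whose hypothetical “signature” words overlap most with the sentence. The signature word lists and function names are illustrative assumptions. A trained model would learn these associations from labelled text instead.

```python
import re
from typing import Optional

# Hypothetical signature words for each sense of the abbreviation "GSK".
# A trained NER / entity-linking model would learn such cues from data.
SENSE_SIGNATURES = {
    "GlaxoSmithKline": {"company", "pharmaceutical", "shares", "ceo",
                        "headquarters", "acquisition", "employees"},
    "Glycogen synthase kinase": {"kinase", "inhibitor", "phosphorylation",
                                 "enzyme", "glycogen", "substrate", "assay"},
}


def tokenize(sentence: str) -> set[str]:
    """Lower-case the sentence and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def disambiguate(sentence: str,
                 signatures: dict[str, set[str]] = SENSE_SIGNATURES) -> Optional[str]:
    """Return the sense whose signature overlaps most with the sentence.

    Returns None when nothing overlaps or the best overlap is tied, so the
    questionnaire can fall back to asking the author.
    """
    words = tokenize(sentence)
    scores = {sense: len(words & sig) for sense, sig in signatures.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


for s in ["The GSK inhibitor reduced tau phosphorylation in the assay.",
          "GSK shares rose after the company announced an acquisition.",
          "GSK was mentioned."]:
    print(disambiguate(s))
# Glycogen synthase kinase
# GlaxoSmithKline
# None
```

Returning None on a tie or a zero score is a deliberate choice. When context is insufficient, asking the author is safer than guessing.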
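Example 4 (trigger event detection) can be approximated with a keyword sketch, under stated assumptions. The `TRIGGER_TERMS` list, the `Entry` record, and the project identifiers are all hypothetical, and plain substring matching stands in for a trained event detector. When a clinical entry mentions a trigger term, the sketch looks for earlier pre-clinical entries of the same project that mention it too.

```python
from dataclasses import dataclass

# Hypothetical terms treated as "significant" efficacy or safety events.
# A production trigger event detector would more likely be a trained model.
TRIGGER_TERMS = {"hepatotoxicity", "qt prolongation", "seizure", "tumour regression"}


@dataclass
class Entry:
    questionnaire: str  # "Clinical" or "Pre-clinical"
    project: str        # hypothetical project identifier
    text: str


def detect_triggers(text: str) -> list[str]:
    """Return the trigger terms mentioned in `text`, in alphabetical order."""
    lowered = text.lower()
    return sorted(term for term in TRIGGER_TERMS if term in lowered)


def alerts_for(clinical: Entry, history: list[Entry]) -> dict[str, list[str]]:
    """Map each trigger term in a clinical entry to the earlier pre-clinical
    entries of the same project that mention it (empty list: not predicted)."""
    alerts = {}
    for term in detect_triggers(clinical.text):
        alerts[term] = [e.text for e in history
                        if e.questionnaire == "Pre-clinical"
                        and e.project == clinical.project
                        and term in e.text.lower()]
    return alerts


history = [
    Entry("Pre-clinical", "P1", "Mild hepatotoxicity was seen in rat studies."),
    Entry("Pre-clinical", "P2", "Seizure risk flagged in dog studies."),
]
clinical = Entry("Clinical", "P1", "Phase I: two cases of hepatotoxicity and one seizure.")
for term, matches in alerts_for(clinical, history).items():
    status = "predicted earlier" if matches else "NOT predicted earlier"
    print(f"ALERT: {term} ({status})")
# ALERT: hepatotoxicity (predicted earlier)
# ALERT: seizure (NOT predicted earlier)
```

Restricting matches to the same project keeps the alert relevant. The P2 seizure note is not counted as a prediction for P1.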
10. Scoring “everything” – does it make sense to do it now or once we actually have enough labelled training sets?
e.g. email spam filtering
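The spam-filtering analogy can be made concrete with multinomial naive Bayes and add-one (Laplace) smoothing, one classic technique behind simple email spam filters. This is a sketch under assumptions: the “relevant”/“irrelevant” labels, the training sentences, and the class name are hypothetical, and nothing here is taken from the talk. It illustrates the slide's question: the scorer has nothing to say until labelled examples exist, and its scores are only as good as those labels.

```python
import math
import re
from collections import Counter
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesScorer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_scores(self, text: str) -> dict[str, float]:
        """Log prior plus smoothed log likelihood of the text for each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> Optional[str]:
        scores = self.log_scores(text)
        if not scores:
            return None  # no labelled data yet, so nothing to score against
        return max(scores, key=scores.get)


scorer = NaiveBayesScorer()
print(scorer.predict("inhibitor efficacy in mice"))  # None (untrained)
scorer.train("kinase inhibitor reduced tumour growth in mice", "relevant")
scorer.train("compound showed dose dependent efficacy", "relevant")
scorer.train("meeting moved to thursday afternoon", "irrelevant")
scorer.train("please book the conference room", "irrelevant")
print(scorer.predict("inhibitor efficacy in mice"))  # relevant
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.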
22. Impact
1. Found evidence from rare disease clinical trial missed by project team
2. Found mechanistic hypothesis that a program team had not considered
3. Identified plausible mechanism for lab observation
23. Summary
• It’s all about the questions
• Technology can help overcome cultural challenges
• Persistence and patience key