Vishnu Vettrivel, Wisecube AI
Drug Discovery and
Development using AI
#UnifiedDataAnalytics #SparkAISummit
About me
• Vishnu Vettrivel - vishnu@wisecube.ai
• Data Science/AI platform Architect
• NOT a Molecular Biologist or a Medicinal
Chemist!
• Will be talking about things learnt mostly on
the job
• Have been working with a Molecular biologist
in a Biotech research firm to help accelerate
drug discovery using Machine learning
Agenda
• History
– Nature as Source
– Recent efforts
• Rational drug
discovery
– Drug targeting
– Screening
– Drug Discovery
Cycle
• Economics
• Computer-aided Drug
Design
• Molecular Representation
• Drug safety assessment
• Demo
• Tools and DBs
• Resources
• Summary
History of
drug
discovery
Ancient methods: Nature as a source
• Search for Drugs not new:
– Traditional Chinese medicine
and Ayurveda both several
thousand years old
• Many compounds now being
studied
– Aspirin’s chemical forefather
known to Hippocrates
– Even inoculation at least
2000 years old
– But also resulted in many
ineffective drugs
source: https://amhistory.si.edu/polio/virusvaccine/history.htm
More recent efforts
• In 1796, Jenner finds first
vaccine: cowpox prevents
smallpox
• 1 century later, Pasteur makes
vaccines against anthrax and
rabies
• Sulfonamides developed for
antibacterial purposes in 1930s
• Penicillin: the “miracle drug”
• 2nd half of 20th century: use of
modern chemical techniques to
create explosion of medicines
Rational drug discovery
PROCESS OF FINDING NEW
MEDICATIONS BASED ON THE
KNOWLEDGE OF A BIOLOGICAL
TARGET.
MOST COMMONLY AN ORGANIC
SMALL MOLECULE THAT
ACTIVATES OR INHIBITS THE
FUNCTION OF A PROTEIN
INVOLVES THE DESIGN OF
MOLECULES THAT ARE
COMPLEMENTARY IN SHAPE AND
CHARGE TO THE BIOMOLECULAR
TARGET
Drug target identification
• Different approaches to
look for drug targets
– Phenotypic screening
– gene association studies
– chemo proteomics
– Transgenetic organisms
– Imaging
– Biomarkers
Source: https://www.roche.com/research_and_development/drawn_to_science/target_identification.htm
Target to drug
cycle
source:
https://www.researchgate.net/publication/294679594_DRUG_DISCOVERY_HIT_TO_LEAD
Screening
• High Throughput Screening
– Implemented in 1990s, still going strong
– Allows scientists to test thousands of potential
targets
– Library size is around 1 million compounds
– Single screen program cost ~$75,000
– Estimated that only 4 small molecules with
roots in combinatorial chemistry made it to
clinical development by 2001
– Can make library even bigger if you spend
more, but can’t get comprehensive coverage
• Similarity paradox
– Slight change can mean difference between
active and inactive
Hit to lead
optimization
source: http://www.sbw.fi/lead-optimization/
Drug discovery cycle
Involves the identification of
screening hits using medicinal
chemistry and optimization of
those hits to increase:
– Affinity
– Selectivity (to
reduce the potential of
side effects),
– Efficacy/potency
– Druglikeness
Photo by Boghog / CC BY-SA 4.0
Economics
source: https://www.nature.com/articles/nrd3681
Eroom’s Law: Opposite of
Moore’s Law – Signals
worrying trends in number
and cost of Drugs to
Market for the Pharma
industry
Drug
discovery
timeline
source: https://www.innoplexus.com/blog/five-reasons-to-embrace-data-driven-drug-development/
Computer-
Aided drug
design
source:
http://poster123.info/?u=Pharmacological+Strategies+To+Contend+Against+Myocardial
Molecular
representation
1-D Descriptors
• Molecular properties often used for
rough classifications
– molecular weight, solubility, charge,
number of rotatable bonds, atom
types, topological polar surface area
etc.
• Molecular properties like the partition
coefficient, or logP, which is the logarithm of
the ratio of a compound's concentrations in two
immiscible solvents (typically octanol and water)
• The Lipinski rule of 5 is a simple rule of
thumb that is often used to pre-filter
drug candidates
Source: chemical Reactivity, Drug-Likeness and Structure Activity/Property Relationship Studies of 2,1,3-Benzoxadiazole Derivatives as Anti-Cancer Activity
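The rule-of-5 pre-filter mentioned above can be sketched as a simple threshold check. This is a minimal illustration, not a toolkit API: the `Descriptors` container and both function names are hypothetical, the descriptor values are assumed to be computed elsewhere, and tolerating one violation is a common convention assumed here rather than something the slide states.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed 1-D descriptors for one molecule (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient (log10)
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Pre-filter: keep molecules with at most `max_violations` violations."""
    return rule_of_five_violations(d) <= max_violations


# Approximate values for aspirin, for illustration only.
aspirin = Descriptors(mol_weight=180.16, log_p=1.2, h_donors=1, h_acceptors=4)
# A made-up large, greasy molecule that breaks two thresholds.
bulky = Descriptors(mol_weight=812.0, log_p=6.3, h_donors=2, h_acceptors=9)

print(passes_rule_of_five(aspirin), passes_rule_of_five(bulky))
```

Because the check only needs four numbers per molecule, it is cheap enough to run over an entire screening library before any more expensive modelling.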
2-D Descriptors
• A common way of mapping variably
structured molecules into a fixed-size
descriptor vector is “fingerprinting”
• Circular fingerprints are in more widespread
use today
• A typical size of the bit vector is 1024
• The similarity between two molecules can be
estimated using the Tanimoto coefficient
• One standard implementation is extended-connectivity
fingerprints (termed ECFPx, with a
number x designating the maximum
diameter; e.g., ECFP4 for a radius of 2
bonds)
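The Tanimoto coefficient between two bit-vector fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch, assuming each fingerprint is given as the set of its on-bit indices (the example indices are arbitrary, not real ECFP output):

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| over sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Convention assumed here: two empty fingerprints have similarity 0.
        return 0.0
    return len(bits_a & bits_b) / union


# Arbitrary on-bit positions inside a 1024-bit vector.
fp_a = {3, 17, 256, 900}
fp_b = {3, 17, 512, 900, 1001}

print(tanimoto(fp_a, fp_b))  # 3 shared bits / 6 distinct bits = 0.5
```

Storing only the on-bit indices keeps the example short; a packed bit array gives the same result with bitwise AND and OR plus a population count.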
QSAR
• Predictive statistical models correlating one or
more pieces of response data about chemicals
with their structure and properties
• Statistical tools, including regression and
classification-based strategies, are used to
analyze the response and chemical data and
their relationship
• Have been part of scientific study for many
years. As early as 1863, Cros found that the
toxicity of alcohols increased with decreasing
aqueous solubility
• Machine learning tools are also very effective in
developing predictive models, particularly when
handling high-dimensional and complex chemical
data showing a nonlinear relationship with the
responses of the chemicals
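In its simplest regression form, a QSAR model is a least-squares line relating one descriptor to one response. The sketch below fits such a line in plain Python; the data points are invented purely for illustration (they are not measurements), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented toy data: log aqueous solubility (descriptor) vs. a toxicity score.
log_solubility = [0.0, -1.0, -2.0, -3.0]
toxicity = [1.0, 2.1, 2.9, 4.0]

slope, intercept = fit_line(log_solubility, toxicity)
print(slope, intercept)
```

With this toy data the fitted slope is negative, which mirrors the qualitative trend attributed to Cros above: toxicity rises as solubility falls. Real QSAR work uses many descriptors and held-out validation rather than a single-variable fit.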
SMILES string
• SMILES (“Simplified molecular-input line-entry system”)
• Represents molecules in the form of ASCII character
strings
• Several equivalent ways to write the same compound
– Workaround is to use the canonical version of SMILES
• SMILES are reasonably human-readable
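To illustrate that one compound can have several SMILES spellings, the sketch below tokenizes a SMILES string and counts its atoms. It is a deliberately crude check: equal atom counts are necessary but not sufficient for two strings to denote the same molecule, and real canonicalization needs a cheminformatics toolkit. The regular expression covers only the common organic subset and simple bracket atoms, and `atom_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then one-letter organic-subset
# atoms (uppercase = aliphatic, lowercase = aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside a bracket atom: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z])")


def atom_counts(smiles: str) -> Counter:
    """Count atoms per element in a SMILES string (implicit hydrogens ignored)."""
    counts = Counter()
    for token in ATOM_PATTERN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            if match is None:
                continue  # unsupported bracket content, e.g. wildcards
            symbol = match.group(1)
        else:
            symbol = token
        counts[symbol.capitalize()] += 1
    return counts


# Three spellings of ethanol.
for spelling in ("CCO", "OCC", "C(O)C"):
    print(spelling, dict(atom_counts(spelling)))
```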
Neural fingerprints
• Hash function can be replaced by a
neural network
– Final fingerprint vector is the sum over
a number of atom-wise softmax
operations
– Similar to the pooling operation in
standard neural networks
– Can be smoother than predefined
circular fingerprints
• Auto-encoders are also used to find
compact latent representations
– converts discrete representations of
molecules to and from a
multidimensional continuous
representation
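The "sum over atom-wise softmax operations" can be sketched as follows. This shows only the readout step under simplifying assumptions: the per-atom feature vectors and the weight matrix are small made-up numbers (in a trained model the weights are learned and the atom features are updated from their neighbours first), and `neural_fingerprint` is a hypothetical function name.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def neural_fingerprint(atom_features, weights):
    """Sum of per-atom softmax(W x) vectors.

    atom_features: one feature vector per atom.
    weights: matrix with one row per fingerprint position.
    """
    fingerprint = [0.0] * len(weights)
    for features in atom_features:
        logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
        for i, p in enumerate(softmax(logits)):
            fingerprint[i] += p
    return fingerprint


# Made-up example: 3 atoms with 2 features each, fingerprint of length 4.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.0, 0.0]]

fingerprint = neural_fingerprint(atoms, weights)
print(fingerprint)
```

Each atom contributes a soft, differentiable "vote" spread over all fingerprint positions instead of setting a single hashed bit, which is why the result can vary smoothly with the inputs.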
Drug safety assessment
• According to Tufts Center for the Study of Drug
Development (CSDD) the three main causes of failures
in Phase III trials:
– Efficacy (or rather lack thereof) — i.e., failure to
meet the primary efficacy endpoint
– Safety (or lack thereof) — i.e., unexpected
adverse or serious adverse events
– Commercial / financial — i.e., failure to
demonstrate value compared to existing therapy
• According to another study by Yale School of Medicine
– 71 of the 222 drugs approved in the first decade
of the millennium were affected by a postmarket
safety event (a withdrawal, a boxed warning, or a
safety communication)
– Took a median of 4.2 years after the drugs were
approved for these safety concerns to come to
light
– Drugs ushered through the FDA's accelerated
approval process were among those that had
higher rates of safety interventions
Tox21 challenge
• Challenge was designed to help scientists
understand the potential of the chemicals
and compounds being tested
• The goal was to "crowdsource" data analysis
by independent researchers to reveal how
well they can predict compounds'
interference in biochemical pathways using
only chemical structure data.
• The computational models produced from
the challenge would become decision-
making tools for government agencies
• NCATS provided assay activity data and
chemical structures on the Tox21 collection
of ~10,000 compounds (Tox21 10K).
DeepTox
• Normalizes the chemical representations of the
compounds
• Computes a large number of chemical descriptors that
are used as input to machine learning methods
• Trains models, evaluates them, and combines the best
of them into ensembles
• Predicts the toxicity of new compounds
• Had the highest performance of all computational
methods in the Tox21 challenge
• Outperformed naive Bayes, SVM, and random forests
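One simple way to "combine the best models" is to rank candidates by a validation score and average the predictions of the top few. The sketch below assumes that approach for illustration; the slides do not say how DeepTox builds its ensembles, and the scores and predictions are made-up numbers.

```python
def build_ensemble(models, k):
    """Average the predictions of the k models with the best validation score.

    models: list of (validation_score, predictions) pairs, where predictions
    is a list of predicted probabilities, one per compound.
    """
    best = sorted(models, key=lambda m: m[0], reverse=True)[:k]
    n_compounds = len(best[0][1])
    return [
        sum(preds[i] for _, preds in best) / len(best)
        for i in range(n_compounds)
    ]


# Made-up candidates: (validation score, predicted toxicity for 2 compounds).
candidates = [
    (0.80, [0.9, 0.2]),
    (0.70, [0.5, 0.5]),
    (0.85, [0.7, 0.4]),
]

ensemble_prediction = build_ensemble(candidates, k=2)
print(ensemble_prediction)
```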
Multi-task
learning
• They were able to apply multi-
task learning in the Tox21
challenge because most of the
compounds were labeled for
several tasks
• Multi-task learning has been
shown to enhance the
performance of DNNs when
predicting biological activities
at the protein level
• Since the twelve different
tasks of the Tox21 challenge
data were highly correlated,
they implemented multi-task
learning in the DeepTox
pipeline.
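Because compounds are labeled for several tasks but not necessarily all of them, a multi-task loss has to skip the missing labels. A minimal sketch, assuming missing labels are encoded as `None` and that the loss is binary cross-entropy averaged over the labeled (compound, task) pairs; both choices are assumptions for illustration, not details taken from the slides.

```python
import math


def masked_multitask_loss(predictions, labels):
    """Mean binary cross-entropy over labeled (compound, task) pairs only.

    predictions: per-compound lists of predicted probabilities, one per task.
    labels: same shape, with 1, 0, or None where the task was not measured.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for p, y in zip(pred_row, label_row):
            if y is None:
                continue  # unlabeled task: contributes nothing to the loss
            p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count if count else 0.0


# Two compounds, three tasks; the second task of compound 1 is unlabeled.
preds = [[0.5, 0.9, 0.5], [0.5, 0.5, 0.5]]
labels = [[1, None, 0], [0, 1, 1]]

print(masked_multitask_loss(preds, labels))
```

All tasks share one set of predictions per compound, so every labeled task contributes a training signal even when other tasks for that compound are missing.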
Associations to toxicophores
• The histogram (A) shows the
fraction of neurons in a layer
that yield significant
correlations to a toxicophore.
With an increasing level of the
layer, the number of neurons
with significant correlation
decreases.
• The histogram (B) shows the
number of neurons in a layer
that exceed a correlation
threshold of 0.6 to their best
correlated toxicophore.
Contrary to (A) the number of
neurons increases with the
network layer. Note that each
layer consisted of the same
number of neurons.
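Counting the neurons whose best toxicophore correlation exceeds 0.6 can be sketched with a plain Pearson correlation. The activations and toxicophore flags below are made-up numbers, and the function names are hypothetical; treating a constant neuron as having correlation 0 is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def neurons_above_threshold(layer_activations, toxicophore_flags, threshold=0.6):
    """Count neurons whose best-correlated toxicophore exceeds the threshold.

    layer_activations: one list per neuron, holding its activation per compound.
    toxicophore_flags: one 0/1 list per toxicophore, one entry per compound.
    """
    count = 0
    for activations in layer_activations:
        best = max(pearson(activations, flags) for flags in toxicophore_flags)
        if best > threshold:
            count += 1
    return count


# Made-up layer with two neurons, evaluated on four compounds.
layer = [
    [0.9, 0.8, 0.1, 0.2],  # fires mostly on the first two compounds
    [0.4, 0.6, 0.5, 0.5],  # unrelated to either toxicophore
]
toxicophores = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

print(neurons_above_threshold(layer, toxicophores))
```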
Feature
Construction by
Deep Learning.
• Neurons that have learned to
detect the presence of
toxicophores.
• Each row shows a particular
hidden unit in a learned network
that correlates highly with a
particular known toxicophore
feature.
• Each row shows the three
chemical compounds that had the
highest activation for that neuron.
• Indicated in red is the toxicophore
structure from the literature that
the neuron correlates with. The
first row and the second row are
from the first hidden layer, the
third row is from a higher-level
layer.
Demo
Tools and
databases
• RDKit is a collection of cheminformatics and
machine-learning software written in C++ and Python.
• DeepChem is an integrated Python library for
chemistry and drug discovery; it comes with a
collection of implementations for many deep-learning-based
algorithms.
• ChEMBL is a public database containing millions of
bioactive molecules and assay results. The data has
been manually transcribed and curated from
publications. ChEMBL is an invaluable source, but has
its share of errors: for example, affinities are sometimes off
by exactly 3 or 6 orders of magnitude due to wrongly
transcribed units (micromolar instead of nanomolar).
• PDBbind is another frequently used database, which
contains protein-ligand co-crystal structures together
with binding affinity values. Again, while certainly very
valuable, PDBbind has some well-known data
problems.
• https://www.click2drug.org/ is a website containing a
comprehensive list of computer-aided drug design
(CADD) software, databases and web services.
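The unit problem described for ChEMBL above is easier to spot once every affinity is converted to a single unit. A minimal sketch, assuming affinities arrive as a value plus a unit label; the unit labels and function names are hypothetical and do not reflect any database's actual schema.

```python
import math

# Conversion factors from each concentration unit to nanomolar.
TO_NANOMOLAR = {"M": 1e9, "mM": 1e6, "uM": 1e3, "nM": 1.0, "pM": 1e-3}


def to_nanomolar(value: float, unit: str) -> float:
    """Convert an affinity value to nanomolar."""
    return value * TO_NANOMOLAR[unit]


def p_affinity(value: float, unit: str) -> float:
    """Negative log10 of the affinity expressed in molar (e.g. a pIC50)."""
    return 9.0 - math.log10(to_nanomolar(value, unit))


# The same number read with the wrong unit shifts the result by 3 log units.
print(p_affinity(25.0, "nM"), p_affinity(25.0, "uM"))
```

On the log scale a micromolar/nanomolar mix-up shows up as an offset of exactly 3 (or 6 for millimolar/nanomolar), so duplicate measurements that differ by those amounts are worth flagging for manual review.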
Resources
• Lima, Angélica Nakagawa, Eric Allison Philot, Gustavo Henrique Goulart
Trossini, Luis Paulo Barbour Scott, Vinícius Gonçalves Maltarollo, and
Kathia Maria Honorio. "Use of Machine Learning Approaches for Novel
Drug Discovery." Expert Opinion on Drug Discovery. 2016. Accessed April
23, 2019. https://www.ncbi.nlm.nih.gov/pubmed/26814169.
• Khamis, Mohamed A., Walid Gomaa, and Walaa F. Ahmed. "Machine
Learning in Computational Docking." Artificial Intelligence in Medicine.
March 2015. Accessed April 23, 2019.
https://www.ncbi.nlm.nih.gov/pubmed/25724101.
• Mayr, Andreas, Günter Klambauer, Thomas Unterthiner, and Sepp Hochreiter.
"DeepTox: Toxicity Prediction Using Deep Learning." Frontiers in Environmental Science. December
04, 2015. Accessed April 21, 2019.
https://www.frontiersin.org/articles/10.3389/fenvs.2015.00080/full
Summary
• Increasing pressure is forcing the Pharma
industry to turn to AI-based techniques to
reduce time and costs and to increase success rates
of new drugs to market
• Drug Safety is one of the top reasons for
failures in FDA approvals of new drugs and
recalls
• AI and Deep learning techniques have shown
a lot of promise compared to traditional
techniques in drug discovery and safety
• The race for using AI is on and over 100 new
startups are now pursuing this line of inquiry
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Very brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryVery brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryDr. Gerry Higgins
 
Artificial intelligence in Pharmaceutical Industry
Artificial intelligence in Pharmaceutical Industry Artificial intelligence in Pharmaceutical Industry
Artificial intelligence in Pharmaceutical Industry Mounika Mouni
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Pistoia Alliance
 
Artificial intelligence and its applications in healthcare and pharmacy
Artificial intelligence and its applications in healthcare and pharmacyArtificial intelligence and its applications in healthcare and pharmacy
Artificial intelligence and its applications in healthcare and pharmacyAtul Adhikari
 
How Artificial Intelligence in Transforming Pharma
How Artificial Intelligence in Transforming PharmaHow Artificial Intelligence in Transforming Pharma
How Artificial Intelligence in Transforming PharmaTyrone Systems
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AIDatabricks
 
Artificial Intelligence in Pharmaceutical Science
Artificial Intelligence in Pharmaceutical ScienceArtificial Intelligence in Pharmaceutical Science
Artificial Intelligence in Pharmaceutical ScienceAhmed Obaidullah
 
Computational Drug Design
Computational Drug DesignComputational Drug Design
Computational Drug Designbaoilleach
 
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Chanin Nantasenamat
 
Computer aided drug design
Computer aided drug designComputer aided drug design
Computer aided drug designN K
 
Rational drug design
Rational drug designRational drug design
Rational drug designNaresh Juttu
 
Challenges and drawbacks of drug discovery and development
Challenges and drawbacks of drug discovery and developmentChallenges and drawbacks of drug discovery and development
Challenges and drawbacks of drug discovery and developmentGaurav Aggarwal
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDatabricks
 
Molecular docking
Molecular dockingMolecular docking
Molecular dockingRahul B S
 
Role of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessRole of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessPallavi Duggal
 
Structure based drug designing
Structure based drug designingStructure based drug designing
Structure based drug designingSeenam Iftikhar
 
threading and homology modelling methods
threading and homology modelling methodsthreading and homology modelling methods
threading and homology modelling methodsmohammed muzammil
 

What's hot (20)

Very brief overview of AI in drug discovery
Very brief overview of AI in drug discoveryVery brief overview of AI in drug discovery
Very brief overview of AI in drug discovery
 
Artificial intelligence in Pharmaceutical Industry
Artificial intelligence in Pharmaceutical Industry Artificial intelligence in Pharmaceutical Industry
Artificial intelligence in Pharmaceutical Industry
 
Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019Ai in drug design webinar 26 feb 2019
Ai in drug design webinar 26 feb 2019
 
Genomics & Proteomics Based Drug Discovery
Genomics & Proteomics Based Drug DiscoveryGenomics & Proteomics Based Drug Discovery
Genomics & Proteomics Based Drug Discovery
 
Artificial intelligence and its applications in healthcare and pharmacy
Artificial intelligence and its applications in healthcare and pharmacyArtificial intelligence and its applications in healthcare and pharmacy
Artificial intelligence and its applications in healthcare and pharmacy
 
How Artificial Intelligence in Transforming Pharma
How Artificial Intelligence in Transforming PharmaHow Artificial Intelligence in Transforming Pharma
How Artificial Intelligence in Transforming Pharma
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
 
Artificial Intelligence in Pharmaceutical Science
Artificial Intelligence in Pharmaceutical ScienceArtificial Intelligence in Pharmaceutical Science
Artificial Intelligence in Pharmaceutical Science
 
Computational Drug Design
Computational Drug DesignComputational Drug Design
Computational Drug Design
 
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
Computational Drug Discovery: Machine Learning for Making Sense of Big Data i...
 
Computer aided drug design
Computer aided drug designComputer aided drug design
Computer aided drug design
 
Rational drug design
Rational drug designRational drug design
Rational drug design
 
Challenges and drawbacks of drug discovery and development
Challenges and drawbacks of drug discovery and developmentChallenges and drawbacks of drug discovery and development
Challenges and drawbacks of drug discovery and development
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge Graphs
 
Bioinformatics and Drug Discovery
Bioinformatics and Drug DiscoveryBioinformatics and Drug Discovery
Bioinformatics and Drug Discovery
 
Molecular docking
Molecular dockingMolecular docking
Molecular docking
 
Role of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery ProcessRole of Target Identification and Target Validation in Drug Discovery Process
Role of Target Identification and Target Validation in Drug Discovery Process
 
Structure based drug designing
Structure based drug designingStructure based drug designing
Structure based drug designing
 
threading and homology modelling methods
threading and homology modelling methodsthreading and homology modelling methods
threading and homology modelling methods
 
Drug discovery and development
Drug discovery and developmentDrug discovery and development
Drug discovery and development
 

Similar to Drug Discovery and Development Using AI

druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppt
druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfpptdruggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppt
druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppttaoufikakabli1
 
High Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfHigh Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfsayedjannatfatema72
 
Drug Discovery subject (clinical research)
Drug Discovery subject (clinical research)Drug Discovery subject (clinical research)
Drug Discovery subject (clinical research)Jannat985397
 
High Throughput Screening Technology
High Throughput Screening TechnologyHigh Throughput Screening Technology
High Throughput Screening TechnologyUniversity Of Swabi
 
HIGH THROUGHPUT SCREENING Technology
HIGH THROUGHPUT SCREENING  TechnologyHIGH THROUGHPUT SCREENING  Technology
HIGH THROUGHPUT SCREENING TechnologyUniversity Of Swabi
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Prof. Wim Van Criekinge
 
Chemoinformatics—an introduction for computer scientists
Chemoinformatics—an introduction for computer scientistsChemoinformatics—an introduction for computer scientists
Chemoinformatics—an introduction for computer scientistsunyil96
 

Similar to Drug Discovery and Development Using AI (20)

Sparsh bioinfo.ppt
Sparsh bioinfo.pptSparsh bioinfo.ppt
Sparsh bioinfo.ppt
 
druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppt
druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfpptdruggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppt
druggggggggggggggjjjhjgjgygygjhfggfdgfdgdfppt
 
Drug design
Drug design Drug design
Drug design
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
High Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdfHigh Throughput Screening drugg discovery.pdf
High Throughput Screening drugg discovery.pdf
 
Computer aided drug design
Computer aided drug designComputer aided drug design
Computer aided drug design
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
Drug Discovery subject (clinical research)
Drug Discovery subject (clinical research)Drug Discovery subject (clinical research)
Drug Discovery subject (clinical research)
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
eScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiativeseScience at the Royal Society of Chemistry and our current initiatives
eScience at the Royal Society of Chemistry and our current initiatives
 
High Throughput Screening Technology
High Throughput Screening TechnologyHigh Throughput Screening Technology
High Throughput Screening Technology
 
HIGH THROUGHPUT SCREENING Technology
HIGH THROUGHPUT SCREENING  TechnologyHIGH THROUGHPUT SCREENING  Technology
HIGH THROUGHPUT SCREENING Technology
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
 
Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014Bioinformatics t9-t10-biocheminformatics v2014
Bioinformatics t9-t10-biocheminformatics v2014
 
Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...Serving the medicinal chemistry community with Royal Society of Chemistry che...
Serving the medicinal chemistry community with Royal Society of Chemistry che...
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Chemoinformatics—an introduction for computer scientists
Chemoinformatics—an introduction for computer scientistsChemoinformatics—an introduction for computer scientists
Chemoinformatics—an introduction for computer scientists
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 

Drug Discovery and Development Using AI

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Vishnu Vettrivel, Wisecube AI Drug Discovery and Development using AI #UnifiedDataAnalytics #SparkAISummit
  • 3. About me • Vishnu Vettrivel - vishnu@wisecube.ai • Data Science/AI platform Architect • NOT a Molecular Biologist or a Medicinal Chemist ! • Will be talking about things learnt mostly on the job • Have been working with a Molecular biologist in a Biotech research firm to help accelerate drug discovery using Machine learning
  • 4. Agenda • History – Nature as Source – Recent efforts • Rational drug discovery – Drug targeting – Screening – Drug Discovery Cycle • Economics • Computer-aided Drug Design • Molecular Representation • Drug safety assessment • Demo • Tools and DBs • Resources • Summary
  • 6. Ancient methods: Nature as a source • Search for Drugs not new: – Traditional Chinese medicine and Ayurveda both several thousand years old • Many compounds now being studied – Aspirin’s chemical forefather known to Hippocrates – Even inoculation at least 2000 years old – But also resulted in many ineffective drugs source: https://amhistory.si.edu/polio/virusvaccine/history.htm
  • 7. More recent efforts • In 1796, Jenner finds first vaccine: cowpox prevents smallpox • 1 century later, Pasteur makes vaccines against anthrax and rabies • Sulfonamides developed for antibacterial purposes in 1930s • Penicillin: the “miracle drug” • 2nd half of 20th century: use of modern chemical techniques to create explosion of medicines
  • 8. Rational drug discovery • Process of finding new medications based on the knowledge of a biological target. • The drug is most commonly an organic small molecule that activates or inhibits the function of a protein. • Involves the design of molecules that are complementary in shape and charge to the biomolecular target.
  • 9. Drug target identification • Different approaches to look for drug targets – Phenotypic screening – Gene association studies – Chemoproteomics – Transgenic organisms – Imaging – Biomarkers Source: https://www.roche.com/research_and_development/drawn_to_science/target_identification.htm
  • 11. Screening • High Throughput Screening – Implemented in the 1990s, still going strong – Allows scientists to test thousands of potential targets – Library size is around 1 million compounds – A single screening program costs ~$75,000 – Estimated that only 4 small molecules with roots in combinatorial chemistry made it to clinical development by 2001 – Can make the library even bigger if you spend more, but can’t get comprehensive coverage • Similarity paradox – A slight change in structure can mean the difference between an active and an inactive compound
  • 12. Hit to lead optimization source: http://www.sbw.fi/lead-optimization/
  • 13. Drug discovery cycle Involves the identification of screening hits using medicinal chemistry and optimization of those hits to increase: – Affinity – Selectivity (to reduce the potential for side effects) – Efficacy/potency – Drug-likeness Photo by Boghog / CC BY-SA 4.0
  • 14. Economics source: https://www.nature.com/articles/nrd3681 Eroom’s Law: the opposite of Moore’s Law – Signals worrying trends in the number and cost of drugs brought to market by the pharma industry
  • 18. 1-D Descriptors • Molecular properties often used for rough classifications – molecular weight, solubility, charge, number of rotatable bonds, atom types, topological polar surface area, etc. • Another such property is the partition coefficient, or logP, which measures the ratio of a compound’s solubilities in two different solvents (typically octanol and water). • The Lipinski rule of 5 is a simple rule of thumb that is often used to pre-filter drug candidates Source: Chemical Reactivity, Drug-Likeness and Structure Activity/Property Relationship Studies of 2,1,3-Benzoxadiazole Derivatives as Anti-Cancer Activity
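The rule of 5 mentioned on this slide can be sketched as a simple pre-filter. The thresholds below are the commonly cited ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); the `MolecularProperties` container, the function names, the tolerance of one violation, and the example values are assumptions made for illustration, not something taken from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MolecularProperties:
    """Precomputed 1-D descriptors for one compound (hypothetical container)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(props: MolecularProperties) -> List[str]:
    """Return the Lipinski criteria that the compound violates."""
    checks = {
        "molecular_weight > 500": props.molecular_weight > 500,
        "log_p > 5": props.log_p > 5,
        "h_bond_donors > 5": props.h_bond_donors > 5,
        "h_bond_acceptors > 10": props.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(props: MolecularProperties, max_violations: int = 1) -> bool:
    """Treat a compound as drug-like if it breaks at most `max_violations` criteria.

    Allowing one violation is a common convention, assumed here.
    """
    return len(lipinski_violations(props)) <= max_violations


# Approximate, illustrative values for a small aspirin-like molecule.
small_molecule = MolecularProperties(180.2, 1.2, 1, 4)
# A made-up large, lipophilic compound that breaks three criteria.
bulky_molecule = MolecularProperties(720.0, 6.3, 2, 12)

print(passes_rule_of_five(small_molecule))   # no criteria violated
print(lipinski_violations(bulky_molecule))   # weight, logP and acceptors
```

In practice the descriptor values would come from a cheminformatics toolkit rather than being typed in by hand.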
  • 19. 2-D Descriptors • A common way of mapping variably structured molecules into a fixed-size descriptor vector is “fingerprinting” • Circular fingerprints are in more widespread use today. • A typical size of the bit vector is 1024 • The similarity between two molecules can be estimated using the Tanimoto coefficient • One standard implementation is extended-connectivity circular fingerprints (termed ECFPx, with a number x designating the maximum diameter; e.g., ECFP4 for a radius of 2 bonds)
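As a minimal sketch of the Tanimoto coefficient named on this slide: for two bit-vector fingerprints it is the number of bits set in both divided by the number of bits set in either. The fingerprints below are made-up sets of "on" bit positions in a 1024-bit vector, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when neither fingerprint has bits set
    return len(a & b) / len(union)


# Hypothetical on-bit positions for two molecules (not real fingerprints).
fp_a = {3, 17, 42, 511, 900}
fp_b = {3, 17, 42, 640}

similarity = tanimoto(fp_a, fp_b)
print(similarity)  # 3 shared bits out of 6 distinct bits -> 0.5
```

A value of 1.0 means the two fingerprints are identical; it does not guarantee that the molecules are, since different structures can collide on the same bits.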
  • 20. QSAR • Quantitative structure-activity relationship (QSAR) models are predictive statistical models that correlate one or more pieces of response data about chemicals with their chemical data • Statistical tools, including regression and classification-based strategies, are used to analyze the response and chemical data and their relationship • Have been part of scientific study for many years. As early as 1863, Cros found that the toxicity of alcohols increased with decreasing aqueous solubility • Machine learning tools are also very effective in developing predictive models, particularly when handling high-dimensional and complex chemical data showing a nonlinear relationship with the responses of the chemicals
  • 21. SMILES strings • SMILES (“Simplified molecular-input line-entry system”) • Represents molecules in the form of ASCII character strings • There are several equivalent ways to write the same compound – The workaround is to use the canonical SMILES form • SMILES strings are reasonably human-readable
  • 22. Neural fingerprints • The hash function can be replaced by a neural network – The final fingerprint vector is the sum over a number of atom-wise softmax operations – Similar to the pooling operation in standard neural networks – Can be smoother than predefined circular fingerprints • Auto-encoders are also used to find compact latent representations – They convert discrete representations of molecules to and from a multidimensional continuous representation
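The "sum over atom-wise softmax operations" can be illustrated with a heavily simplified sketch. It shows only the pooling step: each atom's feature vector is projected to fingerprint length, passed through a softmax, and added to the fingerprint. Real neural fingerprints also update atom features from their neighbors over several layers, which is omitted here; the feature vectors and weights are arbitrary placeholder numbers, not trained values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neural_fingerprint(atom_features, weights):
    """Sum atom-wise softmax outputs into one fixed-length fingerprint.

    atom_features: one feature vector per atom.
    weights: one row per fingerprint position, one column per atom feature.
    """
    fingerprint = [0.0] * len(weights)
    for features in atom_features:
        scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
        for position, contribution in enumerate(softmax(scores)):
            fingerprint[position] += contribution
    return fingerprint


# Placeholder inputs: 3 atoms with 2 features each, fingerprint of length 4.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8], [0.0, 0.0]]

fingerprint = neural_fingerprint(atoms, weights)
print(fingerprint)
```

Because every softmax output sums to 1, the fingerprint entries sum to the number of atoms, and each entry changes smoothly with the weights, unlike the hard 0/1 bits of a hashed fingerprint.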
  • 23. Drug safety assessment • According to the Tufts Center for the Study of Drug Development (CSDD), the three main causes of failures in Phase III trials are: – Efficacy (or rather lack thereof), i.e., failure to meet the primary efficacy endpoint – Safety (or lack thereof), i.e., unexpected adverse or serious adverse events – Commercial / financial, i.e., failure to demonstrate value compared to existing therapy • According to another study by the Yale School of Medicine – 71 of the 222 drugs approved in the first decade of the millennium were withdrawn, required a boxed warning, or warranted a safety announcement – It took a median of 4.2 years after the drugs were approved for these safety concerns to come to light – Drugs ushered through the FDA's accelerated approval process were among those that had higher rates of safety interventions
  • 24. Tox21 challenge • The challenge was designed to help scientists understand the potential of the chemicals and compounds being tested to disrupt biological pathways in ways that may result in toxic effects • The goal was to "crowdsource" data analysis by independent researchers to reveal how well they can predict compounds' interference in biochemical pathways using only chemical structure data. • The computational models produced from the challenge would become decision-making tools for government agencies • NCATS provided assay activity data and chemical structures on the Tox21 collection of ~10,000 compounds (Tox21 10K).
  • 25. DeepTox • Normalizes the chemical representations of the compounds • Computes a large number of chemical descriptors that are used as input to machine learning methods • Trains models, evaluates them, and combines the best of them into ensembles • Predicts the toxicity of new compounds • Had the highest performance of all computational methods in the Tox21 challenge • Outperformed naive Bayes, SVM, and random forests
  • 26. Multi-task learning • The DeepTox authors were able to apply multi-task learning in the Tox21 challenge because most of the compounds were labeled for several tasks • Multi-task learning has been shown to enhance the performance of DNNs when predicting biological activities at the protein level • Since the twelve different tasks of the Tox21 challenge data were highly correlated, they implemented multi-task learning in the DeepTox pipeline.
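Since most, but not all, compounds carry labels for several tasks, a multi-task model needs a loss that copes with missing labels. One common approach, sketched below, is to average the binary cross-entropy over only the (compound, task) pairs that have a label. This is an illustrative assumption rather than a description of the DeepTox code; the function name and the example numbers are made up, and only three tasks are shown instead of twelve.

```python
import math


def masked_multitask_loss(predictions, labels, eps=1e-7):
    """Mean binary cross-entropy over labeled (compound, task) pairs only.

    predictions: per compound, one predicted probability per task.
    labels: same shape, with 1 (active), 0 (inactive) or None (not measured).
    """
    total, count = 0.0, 0
    for predicted_row, label_row in zip(predictions, labels):
        for p, y in zip(predicted_row, label_row):
            if y is None:
                continue  # unlabeled task: contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Two hypothetical compounds, three tasks each; None marks a missing assay result.
predictions = [[0.9, 0.2, 0.5], [0.1, 0.5, 0.8]]
labels = [[1, 0, None], [0, None, 1]]

loss = masked_multitask_loss(predictions, labels)
print(round(loss, 4))
```

With a shared network body and one output per task, every labeled assay result still contributes a training signal, even for compounds that were not measured on all tasks.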
  • 27. Associations to toxicophores • Histogram (A) shows the fraction of neurons in a layer that yield significant correlations to a toxicophore. As the layer level increases, the number of neurons with significant correlation decreases. • Histogram (B) shows the number of neurons in a layer that exceed a correlation threshold of 0.6 to their best correlated toxicophore. Contrary to (A), the number of neurons increases with the network layer. Note that each layer consisted of the same number of neurons.
  • 28. Feature Construction by Deep Learning • Neurons that have learned to detect the presence of toxicophores. • Each row shows a particular hidden unit in a learned network that correlates highly with a particular known toxicophore feature. • Each row shows the three chemical compounds that had the highest activation for that neuron. • Indicated in red is the toxicophore structure from the literature that the neuron correlates with. The first row and the second row are from the first hidden layer; the third row is from a higher-level layer.
  • 29. Demo
  • 30. Tools and databases • RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python. • DeepChem is an integrated Python library for chemistry and drug discovery; it comes with a collection of implementations for many deep-learning-based algorithms. • ChEMBL is a public database containing millions of bioactive molecules and assay results. The data has been manually transcribed and curated from publications. ChEMBL is an invaluable source, but has its share of errors; e.g., sometimes affinities are off by exactly 3 or 6 orders of magnitude due to wrongly transcribed units (micromolar instead of nanomolar). • PDBbind is another frequently used database, which contains protein-ligand co-crystal structures together with binding affinity values. Again, while certainly very valuable, PDBbind has some well-known data problems. • https://www.click2drug.org/ is a website containing a comprehensive list of computer-aided drug design (CADD) software, databases and web services.
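The unit problem described for ChEMBL is easy to see on the negative log scale that affinities are usually modeled on. The sketch below converts a concentration to a pIC50-style value (the negative base-10 logarithm of the molar concentration); the helper name, the unit table and the 50 nM example are assumptions made for illustration, not ChEMBL fields.

```python
import math

# Conversion factors from common concentration units to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def p_affinity(value, unit):
    """Negative log10 of a concentration expressed in molar (e.g. pIC50)."""
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# A hypothetical measurement of 50 nM, and the same number mis-transcribed as 50 uM.
correct = p_affinity(50, "nM")
mistranscribed = p_affinity(50, "uM")

print(round(correct, 2))                   # about 7.3
print(round(mistranscribed, 2))            # about 4.3
print(round(correct - mistranscribed, 2))  # a shift of 3 log units
```

A shift of exactly 3 or 6 log units between duplicate measurements of the same compound and target is therefore a useful warning sign when cleaning such data.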
  • 31. Resources • Lima, Angélica Nakagawa, Eric Allison Philot, Gustavo Henrique Goulart Trossini, Luis Paulo Barbour Scott, Vinícius Gonçalves Maltarollo, and Kathia Maria Honorio. "Use of Machine Learning Approaches for Novel Drug Discovery." Expert Opinion on Drug Discovery. 2016. Accessed April 23, 2019. https://www.ncbi.nlm.nih.gov/pubmed/26814169. • Khamis, Mohamed A., Walid Gomaa, and Walaa F. Ahmed. "Machine Learning in Computational Docking." Artificial Intelligence in Medicine. March 2015. Accessed April 23, 2019. https://www.ncbi.nlm.nih.gov/pubmed/25724101. • Mayr, Andreas, Günter Klambauer, Thomas Unterthiner, and Sepp Hochreiter. "DeepTox: Toxicity Prediction Using Deep Learning." Frontiers. December 04, 2015. Accessed April 21, 2019. https://www.frontiersin.org/articles/10.3389/fenvs.2015.00080/full
  • 32. Summary • Increasing pressure is forcing the pharma industry to turn to AI-based techniques to reduce the time and cost, and increase the success rate, of bringing new drugs to market • Drug safety is one of the top reasons for failures in FDA approvals of new drugs and for recalls • AI and deep learning techniques have shown a lot of promise compared to traditional techniques in drug discovery and safety • The race to use AI is on, and over 100 new startups are now pursuing this line of inquiry
  • 33. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT