Correct drug structures for pharmacology
1. How can pharmacologists know which drug structures are correct?
Christopher Southan, Elena Faccenda, Simon J. Harding,
Joanna L. Sharman, Adam J. Pawson, and Jamie A Davies
IUPHAR/BPS Guide to Pharmacology (GtoPdb)
University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK.
Presentation for BPS | Pharmacology 2016, London
Scheduled for Wed, Dec 14, 2:15 PM
http://www.slideshare.net/cdsouthan/correct-drug-structures-for-pharmacology
2. Abstract
(will not be shown, should be online at BPS)
Introduction: Human medicines represent the crown jewels of pharmacology. Paradoxically, however, there is neither
any “Gold Standard” set of approved chemical structures nor agreement on totals. A 2009 comparison of three sets of
approved drugs recorded only 807 exact structures-in-common from the expected ~1200 [1]. The IUPHAR/BPS Guide to
Pharmacology (GtoPdb) team has grappled with this discordance when curating approved drugs and all ~6000
small-molecule ligands we deposit into PubChem [2]. Users face the same challenge of deciding which structures are
correct when procuring compounds for experiments or navigating links between journals and databases. This work
examines the problems and partial solutions.
Methods: We used PubChem to explore relationships for selected drugs already curated into GtoPdb. Tools included the
“same connectivity” operator, which records distinct compound record (CID) representations of the same carbon
backbone. We divided the causes of structural multiplexing between stereo differences, mixtures and isotopic
derivatives. We then performed Venn-type comparisons between DrugBank, ChEMBL, and the Therapeutic Target Database.
Additional metrics were generated to dissect the factors contributing to discordance between these three and other
sources.
Results: Atorvastatin has 51 different single representations in PubChem and 248 mixtures; paclitaxel (taxol) has
142 and 330, respectively. Comparing the three manually curated drug sets mentioned above inside PubChem showed that
the consensus was only 25% of the sum. Results comparing other drug sources also showed discordance. Causes of CID
multiplexing discordance will be presented. Using PubChem tools, we assessed a curation strategy of selecting CIDs with
structures supported by the majority of submitting sources. While not infallible, comparison with INN documentation
indicated its effectiveness. We will also show how tagging our own approved drug records facilitates easy retrieval of
just these entries from PubChem, but that vendor drug names sometimes mapped to different structures.
Conclusion: As PubChem pushes towards 100 million, we have examined the problems of choosing correct structures for
pharmacologically active compounds. The constitutive challenges of chemical representation and the high levels of
discordance we recorded indicate that definitive drug lists (even our own) will remain elusive until pharmaceutical
companies submit their own records directly to open databases. In the meantime, we have optimised our GtoPdb
curation for the submission of our own 1088 approved CID entries as both a partial solution and a trusted reference
set for the pharmacology community.
References: [1] Southan et al. (2009) J Cheminform. 1:1-10. [2] Southan et al. (2016). Nucl. Acids Res. 44 (Database
Issue): D1054-68.
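The Methods split of multiplexing causes into stereo differences, mixtures and isotopic derivatives can be illustrated with a rough sketch. This is not the PubChem tooling used in the study: it is a string heuristic on SMILES, assuming that a "." marks a multi-component record, a bracketed mass number such as [2H] or [13C] marks an isotopic derivative, and "@", "/" or "\" marks specified stereochemistry. The function name `classify` and the example SMILES are illustrative only.

```python
import re

# Heuristic SMILES patterns (assumptions for illustration, not PubChem's own logic).
ISOTOPE = re.compile(r"\[\d+[A-Za-z]")  # bracket atom with a leading mass number, e.g. [2H], [13C]
STEREO = re.compile(r"@|/|\\")          # tetrahedral (@) or double-bond (/ and \) stereo marks


def classify(smiles):
    """Return the set of multiplexing causes suggested by one SMILES string."""
    causes = set()
    if "." in smiles:             # dot-separated components: salt or mixture
        causes.add("mixture")
    if ISOTOPE.search(smiles):    # explicit isotope label
        causes.add("isotopic")
    if STEREO.search(smiles):     # explicit stereo specification
        causes.add("stereo")
    return causes


if __name__ == "__main__":
    examples = [
        "CC(N)C(=O)O",          # alanine, no stereo specified
        "C[C@H](N)C(=O)O",      # alanine with one stereocentre specified
        "[2H]C([2H])([2H])O",   # deuterated methanol
        "CC(=O)[O-].[Na+]",     # sodium acetate, two components
    ]
    for smi in examples:
        print(smi, sorted(classify(smi)))
```

A cheminformatics toolkit would parse the structures properly; the string test only shows how the three categories differ at the representation level.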
3. Outline
• Introduction to GtoPdb
• Context of the study
• Database chemistry and approved drug counts
• Intersecting curated drugs in PubChem
• Fuzzy drug structure relationships
• GtoPdb approved drugs
• GtoPdb structures in PubChem
• Conclusions
• References
4. Introduction to IUPHAR/BPS Guide to Pharmacology (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British
Pharmacological Society
• Formerly known as IUPHAR-DB for receptors and channels since 2009
• Since 2012 funded by the Wellcome Trust to cover all targets in the human genome
• Curated molecular mechanism of action (mmoa) as quantitative activity
mapping to primary targets, including IUPHAR nomenclature
• 1429 human proteins, 14701 interactions, 8674 ligands
• Described in four Nucleic Acids Research Annual Database issues, PMIDs
26464438 (2016), 24234439 (2014), 23087376 (2013) and 21087994 (2011)
• Distilled into bi-annual British Journal of Pharmacology “Concise Guide to
PHARMACOLOGY” as a nine-paper series
• Presents users with the best compounds for pharmacology research in silico, in
vitro, in cellulo, in vivo, or in clinico
http://www.guidetopharmacology.org/
5. Context of presentation
• In the last few years the GtoPdb team has been finding structure space around
lead compounds, probes and drugs increasingly “fuzzy”
• Curatorial choices are consequently becoming more difficult
• We needed a molecular perspective on the causes of this “fuzz”
• We have increased our exploration of PubChem chemical structural
neighbourhoods to gain this perspective
• This presentation distils key points
7. Approved drug structure counts: take your pick
Source                             Year   Total   Reference       Notes
GVKBIO Drug Database               2013   4750    Slideshare      Global approved
NCATS Pharmaceutical Collection    2011   2356    PMID 21525397   FDA, from global 3936
Therapeutic Target Database        2015   2071    PMID 26578601   Small-molecule FDA
DrugCentral                        2016   2021    PMID 27789690   FDA, from 4456 APIs
DrugBank 5.0                       2016   2004    PMID 24203711   App. small-molecule, from 2225
ChEMBL 22                          2016   1855    PMID 24214965   SMILES from 2260 Phase 4
Drug3D db                          2015   1790    PMID 22539672   Small-molecule FDA
Cfam Chemical Families db          2015   1691    PMID 25414339   Approved
Map of molecular drug targets      2016   1578    PMID 27910877   FDA approved
FDA approved NME overview          2013   1543    PMID 24680947   Small-molecule FDA, no strucs.
Network analysis of FDA drugs      2007   1471    PMID 17516560   26th Orange Book, no strucs.
SWEETLEAD db                       2013   1427    PMID 24223973   FDA, from global 2836
FDA recommended dose db            2004   1309    PMID 15546675   Small-molecule FDA
Guide to PHARMACOLOGY 2016.4       2016   1291    PMID 26464438   Approved, selective curation
8. Discordance of curated drug sets within PubChem
http://www.slideshare.net/cdsouthan/will-the-correct-drugs-please-stand-up-68239021
• Good news: 1361 structures with at least 3-way agreement
• Bad news: no “Gold Standard” set (but the 459 with 4-way agreement would do)
• Details below
NPC = National Centre for Advancing Translational Sciences (NCATS) Pharmaceutical Collection
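The 3-way and 4-way agreement figures on this slide (1361 and 459) come from intersecting the curated sets inside PubChem. A minimal sketch of that kind of count follows, assuming each source has already been reduced to a set of structure identifiers. The source keys and the identifiers s1 to s6 are hypothetical placeholders, not the real data, and the consensus fraction assumes the "sum" means the union of all sets.

```python
from collections import Counter


def agreement_counts(sources):
    """Map k -> number of structures present in exactly k of the source sets."""
    membership = Counter()
    for ids in sources.values():
        membership.update(set(ids))  # each source votes at most once per structure
    return Counter(membership.values())


def at_least(counts, k):
    """Number of structures found in k or more sources."""
    return sum(n for depth, n in counts.items() if depth >= k)


# Hypothetical toy sets standing in for four curated drug collections.
toy_sources = {
    "source_a": {"s1", "s2", "s3", "s4"},
    "source_b": {"s1", "s2", "s3", "s5"},
    "source_c": {"s1", "s2", "s6"},
    "source_d": {"s1", "s3"},
}

counts = agreement_counts(toy_sources)
union_size = sum(counts.values())
four_way = counts[4]
three_way_or_more = at_least(counts, 3)
consensus_fraction = four_way / union_size  # full consensus as a share of the union
```

Counting membership depth per structure avoids writing out every pairwise and triple intersection by hand, which becomes unwieldy beyond three sets.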
9. Exploring “fuzz” via PubChem: Which of 51 atorvastatins is correct?
• Powerful structural relationship navigation
• Needs cheminformatics expertise
10. Which of 145 taxols is correct?
145 distinct structures in PubChem
12 have BioAssay results
34 have vendors
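One way to approximate the "same connectivity" grouping behind these counts outside PubChem is to use the InChIKey. Its first 14-character block is a hash of the molecular skeleton, so records that differ only in stereo or isotopic layers share it. The sketch below assumes a list of (CID, InChIKey) pairs is already in hand; the CIDs and keys shown are made-up placeholders, and this is not claimed to be how PubChem implements its operator. Mixtures that contain the drug as one component would not be captured this way.

```python
from collections import defaultdict


def group_by_connectivity(records):
    """Group CIDs by the first (skeleton) block of their InChIKey."""
    groups = defaultdict(list)
    for cid, inchikey in records:
        skeleton = inchikey.split("-")[0]  # first 14-character block
        groups[skeleton].append(cid)
    return dict(groups)


# Placeholder records: (hypothetical CID, made-up InChIKey-shaped string).
toy_records = [
    (1, "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),  # flat representation
    (2, "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),  # same skeleton, different second block
    (3, "AAAAAAAAAAAAAA-DDDDDDDDSA-N"),  # same skeleton, another variant
    (4, "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),  # different skeleton
]

groups = group_by_connectivity(toy_records)
# A skeleton with more than one CID is a candidate "fuzz" cluster for manual review.
fuzzy = {key: cids for key, cids in groups.items() if len(cids) > 1}
```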
11. GtoPdb approved drug curation
• Our approach is stringent and parsimonious (i.e. not a pharmacopeia)
• Usually select the best-supported PubChem CID
• We “fuzz” check for chirality, strip salts and cross-check INN PDFs
• Focus on human diseases
• No inorganics (except Li), nutraceuticals or metabolites
• Mainly FDA and EMA
• Withdrawn or discontinued are flagged
• Cross-pointers to approved salt forms, active metabolites, drug > prodrug
• Every entry has a curator’s note
• Grateful for feedback and corrections
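The "best-supported PubChem CID" rule in the list above can be sketched as a simple vote across submitting sources. The sketch assumes a mapping from source name to the CID that source's record resolves to; the source names and CIDs are hypothetical. It returns the CID with the most support and treats a tie as unresolved, which is where checks such as the INN documents would come in. It is an illustration of the idea, not the GtoPdb curation procedure itself.

```python
from collections import Counter


def best_supported_cid(submissions):
    """Return (cid, votes, total_sources) for the most-supported CID.

    Returns None when there are no submissions or the top two CIDs tie,
    signalling that a curator needs to decide.
    """
    votes = Counter(submissions.values())
    ranked = votes.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two leading structures
    cid, n = ranked[0]
    return cid, n, len(submissions)


# Hypothetical submissions for one drug name: source -> CID.
toy_submissions = {
    "source_a": 101,
    "source_b": 101,
    "source_c": 202,
}
choice = best_supported_cid(toy_submissions)
```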
12. GtoPdb drugs
• The PubChem query (approved[comment] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName])
retrieves just our 1291 substances (SIDs)
• These convert to 1174 distinct compound entries (CIDs)
• 96% vendor matches in PubChem
• The 117 SID difference is mainly antibodies
Approved set now a clean PubChem select
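For readers who want to script the retrieval, the query on this slide can be wrapped in an NCBI E-utilities search URL. The sketch assumes the query is run against the Entrez PubChem Substance database (`db=pcsubstance`) and that the field tags behave there as they do in the web interface; it only builds the URL and makes no network call. The last lines reproduce the SID-to-CID arithmetic quoted above.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query text as given on the slide.
TERM = 'approved[comment] AND "IUPHAR/BPS Guide to PHARMACOLOGY"[SourceName]'


def esearch_url(term, db="pcsubstance", retmax=0):
    """Build an esearch URL; retmax=0 asks for the hit count only."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


url = esearch_url(TERM)

# Counts quoted on the slide: substances (SIDs) versus distinct compounds (CIDs).
sids, cids = 1291, 1174
sids_without_distinct_cid = sids - cids  # the "117 SID difference"
```

Counts returned today will differ from the 2016 figures as the database is updated.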
14. Conclusions
• Chemistry database coverage and annotation depth have expanded
• But so has the “fuzz”
• Ligand choices for pharmacology experiments can be challenging
• Controlling these factors is crucial for experimental reproducibility
• GtoPdb is a good “first-stop-shop” choice
• “Gold Standard” is illusory but we do our best to select the correct structures
• Feedback welcome on coverage gaps or structural equivocality
• We can assist with complex choices
• Explore PubChem as “second-stop-shop”
• Get acquainted with medicinal chemists and/or cheminformaticians
15. Thank you; questions welcome
Find out more at the BPS stand
PMID: 26464438, PMCID: PMC4702778