Open Source Bioinformatics for Data Scientists
Amanda Schierz
  1. Open Source Bioinformatics for Data Scientists
     Amanda Schierz
  2. Recent Projects
     - Druggability prediction
       - 3D structure
       - Protein sequence
       - Predict a protein's druggability based on its position in the protein-protein interaction network
     - Drug resistance
     - Therapeutic opportunities
       - Identification of new gene targets for cancer
       - Are they druggable?
     - Candidate compounds
       - Compounds more likely to be a hit for a bioassay
  3. Drug Discovery Process
     Stages: Early-stage Discovery, Optimisation, ADMET, Clinical Trials, Paperwork
     Activities, in slide order:
     - Target Evaluation
     - Compound Screening
     - Computational Chemistry
     - Structure-based Drug Design
     - Absorption, Distribution, Metabolism, Excretion, Toxicity
     - Patient Stratification
     - Protocol
     - Drug Approval
  4. Biology 101
     - There is a many-to-many relationship between genes and proteins
     - A protein is a large molecule; a drug is a small molecule
     - Gene expression data
       - The amount of gene product made from a gene. Epigenetics.
       - Highly / lowly / over / under expressed, reported as fold change
       - Warning: platforms and preprocessing differ
     - Gene copy number
       - Loss / gain of a gene
       - On one strand or two?
     - There are only approx. 400 genetic targets of approved pharmaceuticals
       - Only from a handful of protein families
       - Desperate need for diversity
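The fold-change bullet on the Biology 101 slide can be made concrete with a small sketch. It assumes expression values are already normalised and comparable between a tumour sample and a normal sample, which the slide's warning about platforms and preprocessing suggests is not automatic. The function names, the pseudocount, the threshold and the gene values are all illustrative assumptions, not taken from the talk.

```python
import math


def log2_fold_change(tumour, normal, pseudocount=1.0):
    """log2 ratio of tumour to normal expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no measured expression.
    """
    return math.log2((tumour + pseudocount) / (normal + pseudocount))


def label_expression(log2fc, threshold=1.0):
    """Label a gene as over, under or unchanged.

    A threshold of 1.0 means a two-fold change; this cut-off is a
    hypothetical choice, not a value from the slides.
    """
    if log2fc >= threshold:
        return "over"
    if log2fc <= -threshold:
        return "under"
    return "unchanged"


# Toy (tumour, normal) expression values, invented for illustration.
toy_expression = {
    "GENE_A": (300.0, 75.0),
    "GENE_B": (20.0, 90.0),
    "GENE_C": (55.0, 50.0),
}

for gene, (tumour, normal) in toy_expression.items():
    lfc = log2_fold_change(tumour, normal)
    print(f"{gene}: log2FC={lfc:+.2f} ({label_expression(lfc)})")
```

Working on the log2 scale keeps over- and under-expression symmetric around zero, which is one common reason fold change is reported that way.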
  5. TCGGTCAGGCTAGCCGTTACAGGG
  6. Target Identification
     - Prediction of disease-associated genes
       - patient level
       - gene / protein level
       - network
     - Prediction of mechanisms of disease
     - Epigenetic targets: meta-targets
     - Prediction of protein function from sequence / structure / network
       - multi-class; multi-label
     - Prediction of 3D structure
     - Prediction of protein binding
     - New immune targets
  7. Druggability Prediction
     - Drugs: FDA approved, ~350. Very strict: known therapeutic benefit
     - DrugBank: loose, binds but no therapeutic benefit
     - Tractable or druggable
     - Rule of 5 compliant
     - Precedence-based
       - Druggable families / homology
       - Ligand-based scoring
       - UniProt, bioassays (EBI and PubChem BioAssay)
       - Statistical analysis
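The "Rule of 5 compliant" bullet refers to Lipinski's rule of five for small molecules. A minimal sketch of the check follows, assuming the four descriptors have already been computed by a cheminformatics toolkit. The class and function names are hypothetical, the descriptor values for the example compound are approximate, and allowing one violation is a common convention rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed molecular descriptors (computing them is out of scope here)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Return the list of Lipinski criteria that the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def is_rule_of_five_compliant(d, max_violations=1):
    """Treat a compound as compliant if it breaks at most `max_violations` criteria."""
    return len(rule_of_five_violations(d)) <= max_violations


# Approximate descriptor values for an aspirin-like compound (illustrative).
aspirin_like = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
# A made-up large, greasy compound that breaks several criteria.
hypothetical = Descriptors(mol_weight=820.0, logp=6.3, h_bond_donors=7, h_bond_acceptors=12)

print(is_rule_of_five_compliant(aspirin_like), rule_of_five_violations(aspirin_like))
print(is_rule_of_five_compliant(hypothetical), rule_of_five_violations(hypothetical))
```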
  8. Druggability Prediction
     - Sequence analysis
       - Amino acid motifs and composition
       - Physicochemical descriptors
       - Infinite amount: very wide data set
       - Supervised classification
     - FASTA: can download all human sequences from UniProt
         >seq0
         FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
     - R protr; R Bioconductor
     - Example descriptor columns:
         species,mhc,peptide_length,A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V,scl1.lag1,scl2.lag1,scl1.lag2,scl2.lag2,scl1.2.lag1,scl2.1.lag1,scl1.2.lag2,scl2.1.lag2,AA,RA,NA,DA,CA,EA,QA,GA,HA,IA,LA,KA,MA,FA,PA,SA,TA,WA,YA,VA,AR,RR,NR,DR,CR ..... ,Schneider.Xr.K,Schneider.Xr.M,Schneider.Xr.F,Grantham.Xr.A,Grantham.Xr.R,
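To show how a FASTA record becomes a row of the wide descriptor table on the sequence analysis slide, here is a minimal sketch of two of the simplest descriptor families: amino acid composition (the A, R, N, ... columns) and dipeptide composition (the AA, RA, NA, ... columns). It is plain Python, not the protr package, and it does not claim to reproduce protr's exact output; the function names are illustrative. The column order follows the header shown on the slide.

```python
from collections import Counter

# The 20 standard amino acids, in the column order used on the slide.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping record id to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 features)."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered amino acid pair (400 features).

    Keys are generated as AA, RA, NA, ... to mirror the slide's header.
    """
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return {
        first + second: pairs[first + second] / total
        for second in AMINO_ACIDS
        for first in AMINO_ACIDS
    }


fasta_text = """>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
"""

for name, seq in parse_fasta(fasta_text).items():
    features = {**amino_acid_composition(seq), **dipeptide_composition(seq)}
    print(name, len(seq), "residues,", len(features), "features")
```

Even these two families already give 420 columns per protein, which is why the slide describes the resulting data set as very wide.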
  9. Druggability Prediction
     - 3D structure
       - Pockets, surface area
       - Ligand interaction fingerprints
       - Supervised classification
  10. 3D Structure
      - PDB, ProtDCal, PockDrug
  11. Druggability Prediction
      - Interaction network
      - Many use cases
      - Data from EBI and Y2H (yeast two-hybrid)
      - List of binary interactions
      - Be careful 1: data is inherently biased
      - Be careful 2: complex interactions
      - R igraph; Gephi for visualisation
      - Topological properties
      - Community analysis
      - Subgraph analysis
      - Statistical analysis, network analysis and supervised classification
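The "list of binary interactions" and "topological properties" bullets can be illustrated with a small sketch that builds an undirected graph from protein pairs and computes two simple per-protein features: degree and local clustering coefficient. It uses plain Python rather than igraph, the protein names and interactions are invented, and the choice of these two features is an assumption made for illustration.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency map from a list of binary interactions."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree(adjacency, node):
    """Number of distinct interaction partners of a protein."""
    return len(adjacency[node])


def clustering_coefficient(adjacency, node):
    """Fraction of a protein's partner pairs that also interact with each other."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1 for u in neighbours for v in neighbours
        if u < v and v in adjacency[u]
    )
    return 2.0 * links / (k * (k - 1))


# Toy interaction list, invented for illustration.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
graph = build_graph(interactions)

for protein in sorted(graph):
    print(protein, degree(graph, protein), round(clustering_coefficient(graph, protein), 3))
```

Features like these can be placed alongside sequence and structure descriptors as inputs to a supervised classifier, bearing in mind the slide's warning that well-studied proteins tend to have more recorded interactions.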
  12. Drug Resistance
  13. Drug Resistance
  14. Compound Bioactivity
      - Brute-force mass screening
      - 1000s of compounds screened in batches
      - Primary assays; secondary / confirmatory assays
      - Can be binary classification or regression
      - The IC50 is the concentration of a compound needed to inhibit a target's activity by 50%; a lower IC50 means a more potent compound
      - Active / inactive: IC50 threshold
      - Goal is also to identify diverse compound structures
      - Scaffold hopping
      - Same kind of method as protein sequence conversion
      - Pharmacophore fingerprints
      - https://www.chemaxon.com/free-software/
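The "binary classification or regression" and "IC50 threshold" bullets can be sketched as follows: for regression, IC50 values are often converted to pIC50 (the negative log10 of the molar IC50) so that larger means more potent; for classification, an IC50 cut-off splits compounds into active and inactive. The 1000 nM threshold, the compound names and the measurements below are hypothetical, chosen only for illustration.

```python
import math


def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def label_activity(ic50_nm, threshold_nm=1000.0):
    """Binary label: active if IC50 is at or below the threshold.

    The default threshold of 1000 nM (1 micromolar) is an assumption of
    this sketch, not a value given in the slides.
    """
    return "active" if ic50_nm <= threshold_nm else "inactive"


# Toy IC50 measurements in nM, invented for illustration.
toy_assay = {"CMPD_1": 50.0, "CMPD_2": 1000.0, "CMPD_3": 25000.0}

for compound, ic50 in toy_assay.items():
    print(f"{compound}: pIC50={pic50_from_nm(ic50):.2f} label={label_activity(ic50)}")
```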
  15. Compound ADMET
      - Many use cases
      - ADMET of hits
        - Absorption
        - Distribution
        - Metabolism
        - Excretion
        - Toxicity
      - Mutagenicity
      - Protein binding
  16. General Resources
      - EBI (European Bioinformatics Institute) / PubChem
        - API
        - Integrates several downloadable data sources (expression, copy number, bioassays, network, disease-specific)
        - Baseline data (normal, not diseased)
      - Protein Data Bank: 3D structures
      - DrugBank
      - Cancer: The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC)
      - Coding tools: R Bioconductor, BioPerl, Biopython
      - https://docs.chemaxon.com/display/docs/Documentation
  17. General Resources
      - canSAR database
      - Integration of biological, pharmacological, chemical, structural biology and protein network data
  18. Beware 101
      - Non-standard gene names
      - Some experiments report genes, some report proteins
      - We need new drug targets, different from established ones
        - Keep this in mind when analysing results
      - Cancer is difficult
        - Drug resistance
        - Data has not kept up with the science
        - Tumour heterogeneity
      - Wide data = random patterns
      - Different expression / sequencing platforms
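The "Wide data = random patterns" warning can be demonstrated with a small simulation: when there are far more features than samples, some purely random feature will correlate strongly with the labels by chance. Everything in this sketch is synthetic; the sample size, feature counts and seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_random_correlation(n_samples, n_features, seed=0):
    """Largest absolute correlation between pure-noise features and the labels."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # alternating 0/1 class labels
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best


narrow = strongest_random_correlation(n_samples=20, n_features=5)
wide = strongest_random_correlation(n_samples=20, n_features=1000)
print(f"best |r| with 5 noise features:    {narrow:.2f}")
print(f"best |r| with 1000 noise features: {wide:.2f}")
```

None of these features carries any signal, so a strong-looking correlation in the wide case is exactly the kind of random pattern the slide warns about; held-out validation or multiple-testing correction is the usual guard.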
  19. Therapeutic Opportunities
      - Only approximately 350 - 400 protein targets
      - DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell
      - Currently targeted by chemotherapy and radiation. Goal is small-molecule targeting
      - TCGA patient analysis: expression, copy number variation and mutation data
      - 15 cancer disease types
      - Telegraph, March 2015: New drugs to tackle cancer cell weak spots could end 'scattergun' chemotherapy
      Laurence H. Pearl, Amanda C. Schierz, Simon E. Ward, Bissan Al-Lazikani, Frances M. G. Pearl. Therapeutic opportunities within the DNA Damage Response. Nature Reviews Cancer
  20. Therapeutic Opportunities
      - Statistical analysis of DDR deregulation in patients compared to a random set of genes
      - Druggability prediction of deregulated DDR genes
      - Synthetic lethality analysis of yeast DDR orthologues
      - Two genes are synthetic lethal if mutation of either alone is fine but mutation of both leads to cell death. Targeting a gene that is synthetic lethal to a cancer-relevant mutation will, in theory, kill only cancer cells.
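The definition of synthetic lethality on this slide translates directly into a predicate over three viability observations per gene pair. The sketch below assumes viability has already been reduced to a yes/no call for each single mutant and for the double mutant; the gene names and observations are hypothetical, not data from the study.

```python
def is_synthetic_lethal(single_a_viable, single_b_viable, double_viable):
    """True if each single mutant is viable but the double mutant is not."""
    return single_a_viable and single_b_viable and not double_viable


# Hypothetical screen results: (A mutant viable, B mutant viable, double mutant viable).
toy_screen = {
    ("GENE_A", "GENE_B"): (True, True, False),   # synthetic lethal
    ("GENE_A", "GENE_C"): (True, True, True),    # no interaction
    ("GENE_D", "GENE_E"): (False, True, False),  # GENE_D is essential on its own
}

synthetic_lethal_pairs = [
    pair for pair, observed in toy_screen.items() if is_synthetic_lethal(*observed)
]
print(synthetic_lethal_pairs)
```

The third pair shows why both single mutants must be checked: a gene that is lethal on its own would also kill normal cells, so it does not offer the cancer-specific targeting the slide describes.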
  21. Therapeutic Opportunities
  22. DDR Pathway Signatures