CpG Island Identification with Hidden Markov Models
- Kshitij Tayal

CpG Island
• A region of the genome with a higher frequency of CpG sites than the rest of the genome.
• Formal definition: a CpG island is a region at least 200 bp long with a GC percentage greater than 50% (the widely used Gardiner-Garden and Frommer criteria additionally require an observed-to-expected CpG ratio above 0.6).
• CpG is shorthand for “C-phosphate-G”, that is, cytosine and guanine separated by only one phosphate.

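The length and GC-percentage parts of the formal definition can be sketched in a few lines. This is a minimal illustration, not a full island finder: the function names and the default thresholds (`min_len=200`, `min_gc=50.0`) are taken from the definition above or chosen here for illustration, and a real tool would scan a sliding window along the genome.

```python
def gc_percent(seq):
    """Percentage of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides, i.e. a C immediately followed by a G."""
    return seq.upper().count("CG")


def meets_basic_criteria(seq, min_len=200, min_gc=50.0):
    """Check only the two criteria stated on the slide: length and GC percentage."""
    return len(seq) >= min_len and gc_percent(seq) > min_gc


candidate = "CG" * 100  # hypothetical 200 bp candidate region
print(gc_percent(candidate), cpg_count(candidate), meets_basic_criteria(candidate))
```

Note that a region can pass both checks without containing a single CpG (for example a run of G followed by a run of C), which is one reason the rest of the deck turns to probabilistic models.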
The human genome is ~3 billion characters long. How do we find a gene in it?

Importance of CpG Islands
• A CpG island acts as a proxy for identifying a gene.
• CpG islands often occur at the start of a gene.
• Cytosines in CpG dinucleotides can be methylated (have a methyl group attached) to form 5-methylcytosine.

Importance of Methylation
• Our bodies consist of trillions of cells. Every cell contains the same copy of DNA with the same genetic blueprint, so how do cells decide which function each one has to perform?
• How does a heart cell know it is a heart cell?
• How does a skin cell know it is a skin cell?
• They need outside instructions from small carbon-hydrogen compounds called methyl groups.
• The same mechanism raises the question of how characteristics can change across generations without changes to the DNA sequence itself.

Epigenetics & CpG Islands
• The literal meaning of ‘epigenetic’ is ‘above genetics’. Epigenetic regulation determines the methylation of CpG islands.
• CpG islands regulate the expression of nearby genes.
• Proteins involved in gene expression can be repelled or attracted by the methyl group.

Background: Epigenetics
• Environmental factors such as what we do, what we eat, what we smoke, and how stressed we are influence where methyl groups bind.
• A bad diet can lead methyl groups to bind in the wrong place; with these bad instructions, cells can become abnormal and diseased.
• Epigenetics is also controlled by histones. Histones are proteins that act as spools around which DNA winds. Histones can change how tightly or loosely the DNA is wound around them.
• If the DNA is loosely wound, the gene is expressed more.
• If the DNA is tightly wound, the gene is expressed less.

Background: Epigenetics
• So the methyl group is more like a ‘switch’ and histones are more like a ‘knob’.
• Every cell in your body has a distinct methylation and histone pattern that gives the cell its marching orders.
• DNA can be thought of as the body's ‘hardware’, and the epigenome as the ‘software’ that tells the hardware what work it has to do, which fits its literal meaning of ‘above genetics’.

Now Some Computer Science
• Task: design a method that, given a candidate string (k-mer), scores it according to how confident we are that it came from a CpG island.
• Approach: apply a sequence model, a probabilistic model that associates probabilities with sequences.

Sequence Models
• Sequence models learn from examples.
• Say we have sampled 100K 5-mers from inside CpG islands and 100K 5-mers from outside.
• Can we guess whether CGCGC came from a CpG island?
• Counts of CGCGC in each sample:
  # CGCGC inside: 315
  # CGCGC outside: 12
• P(inside) = 315/(315 + 12) ≈ 0.96

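The counting estimate above can be written as a small function. This is a minimal sketch that assumes, as on the slide, equally sized inside and outside samples; the names `p_inside`, `inside_counts`, and `outside_counts` are illustrative, and the counts are the ones quoted on the slide.

```python
from collections import Counter


def p_inside(kmer, inside_counts, outside_counts):
    """Estimate P(inside | kmer) from counts in two equally sized samples.

    Returns None when the k-mer was never observed in either sample.
    """
    n_in = inside_counts[kmer]
    n_out = outside_counts[kmer]
    if n_in + n_out == 0:
        return None
    return n_in / (n_in + n_out)


# Counts for CGCGC as given on the slide.
inside_counts = Counter({"CGCGC": 315})
outside_counts = Counter({"CGCGC": 12})

print(p_inside("CGCGC", inside_counts, outside_counts))  # 315 / 327
print(p_inside("AAAAA", inside_counts, outside_counts))  # None: never observed
```

The `None` case is the weakness of pure counting: a k-mer that never appears in the training data gets no estimate at all.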
Sequence Models
• To estimate P(x), we count the number of times x appears in the training set labelled INSIDE and divide by the total number of times x appears in the training set.
• But for sufficiently large k, we might see few or no occurrences of x. To overcome this limitation, we model the joint probability distribution instead.
• P(X) = P(X_k, X_{k-1}, ..., X_1), where P(X) is the probability of the sequence X.

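A sketch of how this joint probability is usually factored. The chain rule in the second line is exact; the last line is a first-order Markov assumption (each symbol depends only on its predecessor), which is a modelling choice, and the index i is introduced here for illustration.

```latex
\begin{align}
P(X) &= P(X_k, X_{k-1}, \dots, X_1) \\
     &= P(X_k \mid X_{k-1}, \dots, X_1)\, P(X_{k-1} \mid X_{k-2}, \dots, X_1) \cdots P(X_2 \mid X_1)\, P(X_1) \\
     &\approx P(X_1) \prod_{i=2}^{k} P(X_i \mid X_{i-1})
\end{align}
```

Under this assumption only the 16 conditional probabilities between pairs of nucleotides need to be estimated, rather than one probability per possible k-mer.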
• P(x) now equals the product of all the Markov chain edge weights along our string-driven walk through the chain.
• Node labels are symbols, and transition labels are conditional probabilities.

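The walk through the chain can be sketched as follows: estimate the edge weights (transition probabilities) from example sequences, multiply them along the string, and compare an "inside" chain with an "outside" chain as a log-odds score. Everything here is illustrative: the training strings are made up for the example rather than real genomic data, and the pseudocount and the uniform initial probability of 0.25 are assumptions.

```python
import math

ALPHABET = "ACGT"


def train_transitions(seqs, pseudocount=1.0):
    """Estimate P(next | prev) from adjacent pairs, with add-one style smoothing."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs


def log_prob(seq, trans, initial=0.25):
    """Log of P(x): initial symbol probability times every edge weight on the walk."""
    lp = math.log(initial)
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(trans[prev][cur])
    return lp


def log_odds(seq, inside, outside):
    """Positive when the inside chain explains seq better than the outside chain."""
    return log_prob(seq, inside) - log_prob(seq, outside)


# Toy training data, invented for this sketch.
inside_model = train_transitions(["CGCGCGGCGC", "GCGCGCCGCG"])
outside_model = train_transitions(["ATTATAATTA", "TTAACATGTA"])

print(log_odds("CGCGC", inside_model, outside_model))  # positive: looks like an island
print(log_odds("ATATA", inside_model, outside_model))  # negative: looks like background
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence.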
Hidden Markov Model
• In simpler Markov models (like a Markov chain), the
state is directly visible to the observer, and
therefore the state transition probabilities are the
only parameters.
• In a hidden Markov model, the state is not directly
visible, but the output, dependent on the state, is
visible. Each state has a probability distribution
over the possible output tokens. The adjective
'hidden' refers to the state sequence through which
the model passes.
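To make the definition concrete, here is a minimal sketch of an HMM for a dealer who switches between a fair and a loaded coin, the setting used for the Viterbi slide. All numbers (start, transition, and emission probabilities) are assumptions chosen for illustration. The function computes the joint probability of one particular hidden path together with the observed flips.

```python
import math

# Hidden states: "F" = fair coin, "L" = loaded coin. Observations: "H" or "T".
# All probabilities below are illustrative assumptions.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
EMIT = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.8, "T": 0.2}}


def joint_log_prob(flips, path):
    """Log of P(flips, path): start, then alternating transition and emission terms."""
    if not flips or len(flips) != len(path):
        raise ValueError("flips and path must be non-empty and of equal length")
    lp = math.log(START[path[0]]) + math.log(EMIT[path[0]][flips[0]])
    for i in range(1, len(flips)):
        lp += math.log(TRANS[path[i - 1]][path[i]])
        lp += math.log(EMIT[path[i]][flips[i]])
    return lp


# A run of heads is better explained by the loaded coin than by the fair one.
print(joint_log_prob("HHHH", "LLLL") > joint_log_prob("HHHH", "FFFF"))
```

An observer sees only the flips; the path is hidden, so in practice the path has to be inferred rather than supplied.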
Hidden Markov Model - Viterbi Algorithm
• Given a sequence of coin flips, can we say when the dealer was using the loaded coin?
• We want to find p*, the most likely path given the emissions.
• The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events.

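A minimal sketch of the Viterbi recurrence for the fair/loaded coin question, computed in log space. The model parameters are illustrative assumptions, not values from the slides, and the function signature is chosen here for the example.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Return the most likely hidden state path for obs as a list of states."""
    if not obs:
        return []
    # best[t][s]: log probability of the best path that ends in state s at step t.
    best = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans[prev][s])
                          + math.log(emit[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Illustrative parameters: "F" = fair coin, "L" = loaded coin biased towards heads.
states = ("F", "L")
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.8, "T": 0.2}}

flips = "TTTTTTHHHHHHHH"
print("".join(viterbi(flips, states, start, trans, emit)))  # FFFFFFLLLLLLLL
```

The same function applies to CpG island detection by swapping in states such as "island" and "background" and letting the emissions be the nucleotides A, C, G, and T.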
THANK YOU