Genome Assembly 
Frank Austin Nothaft 
CS176, 10/16/2014
Processing Reads 
• As we’ve covered before, if we already have a 
reference assembly, we can process reads by 
aligning to the reference genome
The Sequencing Abstraction 
It was the best of times, it was the worst of times… 
worst of times 
was the worst 
the worst of 
• Sequencing performs a Poisson-distributed 
sampling of substrings from a larger string 
• Reads are exact substrings (i.e., error free) 
Metaphor borrowed from Michael Schatz 
It was the 
the best of 
times, it was 
best of times
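The sampling abstraction above can be sketched in a few lines. This is a minimal illustration, not a model of a real sequencer: the function names `sample_reads` and `coverage`, the read length, and the fixed seed are all assumptions made for the example. Reads are drawn at uniformly random start positions, which is the usual way to obtain approximately Poisson-distributed per-position coverage.

```python
import random

def sample_reads(genome, read_len, n_reads, seed=0):
    """Draw error-free substrings at uniformly random start positions (illustrative helper)."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]

def coverage(genome_len, reads):
    """Count how many sampled reads cover each position of the source string."""
    depth = [0] * genome_len
    for start, read in reads:
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth

text = "It was the best of times, it was the worst of times"
reads = sample_reads(text, read_len=12, n_reads=20)
depth = coverage(len(text), reads)
```

Every read is an exact substring of `text`, and `depth` shows how unevenly random sampling covers the string.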
The Alignment Abstraction 
It was the best of times, it was the worst of times… 
It was the 
the best of 
times, it was 
worst of times 
the worst of 
best of times was the worst 
was the worst 
It was the 
worst of times 
the best of 
the worst of 
times, it was 
best of times
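Under the error-free assumption, aligning a read to a reference reduces to exact substring search. The sketch below shows that idea only; `align_exact` is a hypothetical helper, and real aligners use indexes rather than repeated scans.

```python
def align_exact(reference, reads):
    """Map each read to every position where it occurs exactly in the reference."""
    placements = {}
    for read in reads:
        positions = []
        pos = reference.find(read)
        while pos != -1:
            positions.append(pos)
            pos = reference.find(read, pos + 1)
        placements[read] = positions
    return placements

reference = "It was the best of times, it was the worst of times"
reads = ["It was the", "the best of", "best of times", "times, it was",
         "was the worst", "the worst of", "worst of times"]
placements = align_exact(reference, reads)
```

Each read in this example maps to exactly one position, so the reference can be tiled by the reads.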
But! 
• What do we do if we don’t have a reference 
genome to map against? 
• Can we use information in the reads to assemble 
the reads together into a string?
Sequence Assembly 
was the worst 
best of times 
It was the 
worst of times 
the best of 
the worst of 
times, it was 
It was the 
the best of 
best of times 
times, it was 
was the worst 
the worst of 
worst of times 
It was the best of times, it was the worst of times…
The Assembly Problem 
• Given a set of reads, we want to assemble the 
“best” contigs possible 
• Contig = contiguous sequence 
• Two general formulations for assembly: 
• Overlap-layout-consensus (OLC) 
• de Bruijn graph (DBG)
Assembly was the 
Human Genome Project! 
(in a nutshell)
Assembly is Graph Traversal 
• In OLC, we create an overlap graph, and find a 
Hamiltonian path 
• In DBG, we create a de Bruijn graph, and find an 
Eulerian path
Overlap Graphs 
• Given a set of reads, an overlap graph represents 
how these reads overlap 
Nodes are reads, edges are overlaps.
Example Overlap Graph 
It was the 
the best of 
times, it was 
the worst of 
worst of times 
best of times 
was the worst 
the best of 
It was the 
times, it was 
worst of times 
the worst of 
best of times 
was the worst
Hamiltonian Path 
• A Hamiltonian Path is a path which visits each node 
in the graph exactly once
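A toy version of OLC can make the overlap graph and the Hamiltonian path concrete. The sketch assumes exact suffix-prefix overlaps of at least three bases and uses a brute-force search over read orderings. The reads, the `min_len` threshold, and the helper names are all invented for illustration, and the search is only feasible for a handful of reads.

```python
from itertools import permutations

def overlap(a, b, min_len=3):
    """Length of the longest proper suffix of a that is a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def overlap_graph(reads, min_len=3):
    """Nodes are reads; a directed edge (a, b) stores the overlap length."""
    return {(a, b): overlap(a, b, min_len)
            for a in reads for b in reads
            if a != b and overlap(a, b, min_len) > 0}

def hamiltonian_path(reads, graph):
    """Brute force: try every ordering and keep the first that follows edges only."""
    for order in permutations(reads):
        if all((a, b) in graph for a, b in zip(order, order[1:])):
            return list(order)
    return None

def merge(path, graph):
    """Spell the contig by appending the non-overlapping tail of each read."""
    contig = path[0]
    for a, b in zip(path, path[1:]):
        contig += b[graph[(a, b)]:]
    return contig

reads = ["CGTGC", "ATGGC", "GTGCA", "GGCGT"]
graph = overlap_graph(reads)
path = hamiltonian_path(reads, graph)
contig = merge(path, graph)
```

Trying all orderings takes factorial time, which gives a feel for why the Hamiltonian path formulation becomes impractical as the number of reads grows.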
Computing Overlaps 
• To compute overlaps between two reads, we 
compute the pairwise alignment of these two reads 
• This can be done using dynamic programming 
(Smith-Waterman) or a profile HMM 
• We can accelerate this with indexing-based 
methods, similar to those in SNAP
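The dynamic programming step can be sketched as a score-only Smith-Waterman recurrence. The scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for the example, not parameters taken from the lecture.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b, keeping two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

score = smith_waterman_score("ACACTG", "CTGCA")
```

The two nested loops visit every cell of a table whose sides are the two read lengths, so the cost grows with the product of those lengths.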
Two Problems 
1. Overlapping is expensive: 
• Must compute O(n^2) overlaps, n = # reads 
• Computing an overlap is O(l^2), l = read length 
2. Hamiltonian Path is NP-hard: 
• Approximate solvers exist, but don’t scale up 
to genomics datasets
de Bruijn Graphs 
• In a de Bruijn graph, nodes are k-mers, and edges 
represent observed transitions between k-mers 
• k-mers are k-length substrings from reads 
Example read: ACACTGCACT 
3-mer nodes: ACA, CAC, ACT, CTG, TGC, GCA 
de Bruijn Graphs 
• In a de Bruijn graph, we may have multiple paths 
between two nodes 
Example read: ACACTGCACT 
3-mer nodes: ACA, CAC, ACT, CTG, TGC, GCA 
In this read, the transition CAC → ACT is observed twice. 
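Building the graph for the example read takes only a few lines. This sketch follows the slide's definition (nodes are k-mers, edges are observed transitions) with k = 3. The helper names are illustrative, and edge multiplicities are kept in a `Counter`.

```python
from collections import Counter

def kmers(read, k):
    """All k-length substrings of a read, in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn(read, k=3):
    """Count each observed transition between consecutive k-mers."""
    nodes = kmers(read, k)
    return Counter(zip(nodes, nodes[1:]))

graph = de_bruijn("ACACTGCACT")
nodes = {n for edge in graph for n in edge}
```

Here `graph[("CAC", "ACT")]` is 2, because that transition is observed twice in the read.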
Eulerian Path 
• In an Eulerian path, we use every edge exactly once 
• Preconditions for finding an Eulerian path assembly on a DBG: 
1. One node must have one more edge leaving than 
entering 
2. One node must have one more edge entering than 
leaving 
3. All other nodes must have equal numbers of edges 
entering and leaving
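The three degree conditions can be checked directly from an edge list. The sketch below tests only those conditions (it does not check connectivity), and the function name `path_endpoints` is an assumption.

```python
from collections import Counter

def path_endpoints(edges):
    """Return (start, end) if the degree preconditions hold, otherwise None."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    starts, ends = [], []
    for node in set(out_deg) | set(in_deg):
        diff = out_deg[node] - in_deg[node]
        if diff == 1:
            starts.append(node)      # one more edge leaving than entering
        elif diff == -1:
            ends.append(node)        # one more edge entering than leaving
        elif diff != 0:
            return None              # imbalance larger than one
    if len(starts) == 1 and len(ends) == 1:
        return starts[0], ends[0]
    return None

read = "ACACTGCACT"
nodes = [read[i:i + 3] for i in range(len(read) - 2)]
edges = list(zip(nodes, nodes[1:]))
endpoints = path_endpoints(edges)
```

For this read the start is ACA and the end is ACT, the first and last 3-mers of the sequence.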
Finding an Eulerian Path 
• Connect the two nodes with unbalanced edges 
• This provides us an Eulerian cycle 
• From an arbitrary node n, walk the graph until we return to 
n, and save the path we’ve walked 
• Until all edges have been used: 
• Pick a point n’ from our path, where n’ has unused 
edges 
• Walk from n’ until we return to n’, and track visited edges
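The walk-and-splice procedure above is commonly implemented as Hierholzer's algorithm. The sketch below uses the stack-based variant and starts from the unbalanced start node instead of first adding a closing edge; this is a swapped-in but equivalent formulation, not necessarily the one used in the lecture. The helper names are illustrative.

```python
from collections import Counter, defaultdict

def kmer_edges(seq, k=3):
    """Edges between consecutive k-mers of a sequence."""
    nodes = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return list(zip(nodes, nodes[1:]))

def eulerian_path(edges):
    """Stack-based Hierholzer: follow unused edges, backtrack when stuck."""
    graph = defaultdict(list)
    out_deg, in_deg = Counter(), Counter()
    for u, v in edges:
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n in out_deg if out_deg[n] - in_deg[n] == 1), edges[0][0])
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())   # consume an unused edge
        else:
            path.append(stack.pop())          # dead end: emit node
    path.reverse()
    if len(path) != len(edges) + 1:
        raise ValueError("graph has no Eulerian path")
    return path

def spell(path):
    """Concatenate the first k-mer with the last base of each following k-mer."""
    return path[0] + "".join(node[-1] for node in path[1:])

edges = kmer_edges("ACACTGCACAATGC")
assembled = spell(eulerian_path(edges))
```

The assembled string uses exactly the same k-mer transitions as the input, although it need not be the same string.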
Problems with Eulerian Path 
• For a given graph, we may have multiple valid paths! 
[Figure: one de Bruijn graph with nodes ACA, CAC, ACT, CTG, TGC, GCA, CAA, AAT, ATG, drawn twice with its 11 edges numbered in two different traversal orders] 
Traversal 1 spells: ACACTGCACAATGC 
Traversal 2 spells: ACAATGCACACTGC
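The ambiguity can be checked by enumerating every Eulerian path of the small example graph. The sketch uses plain backtracking over edges, which is practical only for toy graphs. The helper names are assumptions, and the graph is rebuilt from the 3-mers of the first spelled sequence.

```python
def kmer_edges(seq, k=3):
    """Edges between consecutive k-mers of a sequence."""
    nodes = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return list(zip(nodes, nodes[1:]))

def all_eulerian_spellings(edges, start):
    """Enumerate every path that uses each edge exactly once, as spelled strings."""
    used = [False] * len(edges)
    spellings = set()

    def walk(node, spelled, n_used):
        if n_used == len(edges):
            spellings.add(spelled)
            return
        for i, (u, v) in enumerate(edges):
            if not used[i] and u == node:
                used[i] = True
                walk(v, spelled + v[-1], n_used + 1)
                used[i] = False

    walk(start, start, 0)
    return spellings

edges = kmer_edges("ACACTGCACAATGC")
spellings = all_eulerian_spellings(edges, start="ACA")
```

Both ACACTGCACAATGC and ACAATGCACACTGC appear in `spellings`, so the graph alone cannot tell us which sequence was sampled.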
How Many Paths? 
Kingsford et al, BMC Bioinformatics 2010
How Do We Assemble 
Multiple Reads? 
• In practice, de Bruijn graphs are additive 
• This allows us to merge graphs from multiple reads 
• When do we keep/remove edges?
Read 1: ACACTGC → nodes ACA, CAC, ACT, CTG, TGC 
Read 2: CTGCACT → nodes CTG, TGC, GCA, CAC, ACT 
Merged graph: nodes ACA, CAC, ACT, CTG, TGC, GCA
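Additivity can be sketched by storing each read's graph as a multiset of edges and summing the multisets. The `Counter` representation is an assumption made for the example; the reads are the two from the slide.

```python
from collections import Counter

def de_bruijn(read, k=3):
    """Multiset of transitions between consecutive k-mers of one read."""
    nodes = [read[i:i + k] for i in range(len(read) - k + 1)]
    return Counter(zip(nodes, nodes[1:]))

g1 = de_bruijn("ACACTGC")
g2 = de_bruijn("CTGCACT")
merged = g1 + g2   # Counter addition sums edge multiplicities
```

Edges seen in both reads, such as CTG → TGC, end up with a count of 2. Those counts are one possible input for deciding which edges to keep, though the lecture leaves that question open.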
Errors! 
• One of the key assumptions that we make in the 
sequencing process is that reads are correct 
• But, in reality, reads have a 2% error rate 
• How does this impact us?
What Are The Errors Like? 
ACATATAGAA 
AGATATAGAN 
• Currently, the most common sequencing technology 
is called Illumina 
• Errors tend to be a misread of a single base 
• Errors tend to be clustered at the ends of reads
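To see how misread bases affect a k-mer graph, the two example reads can be compared directly. The sketch treats the first read as the true sequence and the second as the observed one, which is an assumption about the slide's intent, and uses k = 3 for brevity.

```python
def kmer_set(read, k):
    """Distinct k-mers contained in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}

true_read = "ACATATAGAA"
observed = "AGATATAGAN"
k = 3

# Positions where the observed read differs from the true sequence.
mismatches = [i for i, (x, y) in enumerate(zip(true_read, observed)) if x != y]

# k-mers that exist only because of the misread bases.
novel = kmer_set(observed, k) - kmer_set(true_read, k)
```

Each misread base can alter up to k of the read's k-mers, and every k-mer in `novel` would become a spurious node in the graph.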
Errors In Action 
[Figure: de Bruijn graph whose nodes are 19-mers taken from reads; the graph branches where k-mers differ by a single base, and some nodes contain N (uncalled) bases]
This graph comes from… 
[Figure: a larger de Bruijn graph of 19-mers; node labels omitted]
GGGCTATACCACTGACAGG 
AAATTTCCTTCCCCTAAAC 
GTAAAATGTGTAATTTCAT 
AGATTTCACACTTGTGTCA 
TCAAGCTCCAAGAGACAAG 
CAAGCTCCAAGAGACAAGT 
AAGCTCCAAGAGACAAGTT 
GCCAGTTATACTTTGAAAA 
CCAGTTATACTTTGAAAAA 
CAGTTATACTTTGAAAAAT 
CCAAGAGACAAGTTTTGGA 
CAAGAGACAAGTTTTGGAA 
AAGAGACAAGTTTTGGAAA 
AAGCTCCAAGAAACAAACT 
AGCTCCAAGAAACAAACTC 
GCTCCAAGAAACAAACTCT 
TAGAATTTCTAGAAAGTCC 
AGAATTTCTAGAAAGTCCC 
GAATTTCTAGAAAGTCCCT 
GTATACCACTGACAGGCCG 
TATACCACTGACAGGCCGC 
ATACCACTGACAGGCCGCC 
ACCACGGACAGGCCGCCAG 
CCACGGACAGGCCGCCAGT 
CACGGACAGGCCGCCAGTC 
AAAATTCAAGCTCCAAGAG 
AAATTCAAGCTCCAAGAGA 
AATTCAAGCTCCAAGAGAC 
GTGTAATTTCATGAGTGGG 
TGTAATTTCATGAGTGGGG 
GTAATTTCATGAGTGGGGT 
AACTGCCCTCCCCCTAAAG 
ACTGCCCTCCCCCTAAAGC 
CTGCCCTCCCCCTAAAGCT 
CCCTTAAAGCTTCCACACT 
CCTTAAAGCTTCCACACTT 
CTTAAAGCTTCCACACTTG 
AGCCTCGGAGAAAGCAACA 
GCCTCGGAGAAAGCAACAT 
AGAAAGTACCTTCCCCTAA 
ACTTCCGGACCCCCAGTCA 
CTTCCGGACCCCCAGTCAT 
TTCCGGACCCCCAGTCATA 
ATAGCAACATGATTTTTCA 
GAGACAAACCCTTGAAAAA 
TTTGTGCCTACCCCACTCC 
TTGTGCCTACCCCACTCCC 
TGTGCCTACCCCACTCCCG 
AGTTCATTCCCCTAAAGCC 
GTTCATTCCCCTAAAGCCT 
TTCATTCCCCTAAAGCCTT 
AAATCATCAGAAAACTAAG 
AATCATCAGAAAACTAAGA 
ATCATCAGAAAACTAAGAA 
TGGCTATACCACTTACGGG 
AATTTGTGCCTACCCCACT 
ATTTGTGCCTACCCCACTC 
CCGCCAGGCATTAAATTCA 
CGCCAGGCATTAAATTCAA 
GCCAGGCATTAAATTCAAG 
AGCTCCAAGAGACAAGTTT 
GCTCCAAGAGACAAGTTTT 
CTCCAAGAGACAAGTTTTG 
TATTCCTTTGACAGGCCGC 
ATTCCTTTGACAGGCCGCC 
CAGTGTATATTGGTGGCTA 
AGTGTATATTGGTGGCTAT 
TACCACTGACAGGCCGCCA 
ACCACTGACAGGCCGCCAG 
ACGGGCCGCCAGTCATTAA 
CGGGCCGCCAGTCATTAAA 
GAAAAAAAGGCAGCCTAGG 
AAAAAAAGGCAGCCTAGGC AAAAAAAGGCAGCCTAGGA 
AAAAAAGGCAGCCTAGGCG AAAAAAGGCAGCCTAGGAG 
CAGCCTCGGAGAAAGCAAC 
TCAAACTCCAAGAGACAAA 
CAAACTCCAAGAGACAAAC 
AAACTCCAAGAGACAAACT 
CTATCCCCCTGACGGCCCG 
TATCCCCCTGACGGCCCGC 
ATCCCCCTGACGGCCCGCC 
GTGATTTGCCCAGGAGGGG 
TGATTTGCCCAGGAGGGGG 
GATTTGCCCAGGAGGGGGC 
ACCTTTCACACTTGCCTCA 
CCTTTCACACTTGCCTCAG 
TCAGTGTATATATGAGGCT 
CCAAGAGACAAACTCTTGA 
AGTTATATTTTGAAAAATC 
GTTATATTTTGAAAAATCA 
TTATATTTTGAAAAATCAT 
AAGCTTCAACCCTGGCCTC 
GGACGGCCCGCCAGTCATT 
GACGGCCCGCCAGTCATTA 
AAGGCAGCCTCGGAGAAAG 
AGGCAGCCTCGGAGAAAGC 
GGCAGCCTCGGAGAAAGCA 
CCACAGTGAAAATTTGTGC 
CACAGTGAAAATTTGTGCC 
ACAGTGAAAATTTGTGCCT 
GAAAGTACCTTCCCCTAAA 
AAAGTACCTTCCCCTAAAG 
GTTCCTTCCCCTAAAGCTT 
TTCCTTCCCCTAAAGCTTT 
ACTTGCCTAGGTGAATATA 
TCCTTCCCCTAAAGCTTTC 
GAAGATAGACTCAAGGGAC 
AAGATAGACTCAAGGGACA 
AGATAGACTCAAGGGACAA 
AATNNTCAGAAAACTGAGA 
ATNNTCAGAAAACTGAGAA 
TTCACTCTTGCCTCAGTGT 
TCACTCTTGCCTCAGTGTA 
CACTCTTGCCTCAGTGTAT 
AGGTGGCTATTCCTTTGAC 
GGTGGCTATTCCTTTGACA 
GTGGCTATTCCTTTGACAG 
AAGGATAGACTTTCTAGAA 
GGTGGGTATCCCGCTGACA 
GTGGGTATCCCGCTGACAG 
AAGTTTTGGAAAAAAAGGC 
AGTTTTGGAAAAAAAGGCA 
GTTTTGGAAAAAAAGGCAG 
TTGCCTAGGTGAATATAGG 
TGCCTAGGTGAATATAGGT 
NNNNNNANATTNTNANAAA 
NNNNNANATTNTNANAAAT 
NNNNANATTNTNANAAATN 
TCAGTGTATATATGGGGCT 
CAGTGTATATATGGGGCTA 
AGTGTATATATGGGGCTAT 
AGGATAGACTTTCTAGAAA 
GGATAGACTTTCTAGAAAG 
TCCCCCTAAAGCTTTCACA 
CCCCCTAAAGCTTTCACAC 
CCCCTAAAGCTTTCACACT 
GCGCGAACCCACGGACAGG 
CGCGAACCCACGGACAGGC 
GCGAACCCACGGACAGGCC 
TGGGTATACCACTGACAGG 
GGGTATACCACTGACAGGC 
TGCCTATACCACGGACGGC 
GCCTATACCACGGACGGCC 
CAGAAAACTGAGAATCAAG 
GTGCCTACCCCACTCCCGG 
ATTCAAACTCCAAGAGACA 
TTCAAACTCCAAGAGACAA 
TGGAAAAAAAGGCAGCCTA 
GGAAAAAAAGGCAGCCTAG 
TCATTAAATTCAAGCTCCA 
TCCGGACCCCCAGTCATAA 
GCTTTCACACTTGCCTCGG 
CTTTCACACTTGCCTCGGT 
TTTCACACTTGCCTCGGTG 
CTTCCCTCAGTGTATATAT 
TTCCCTCAGTGTATATATG 
TCCCTCAGTGTATATATGT 
GTGAAAATTTGTGCCTACC 
TGAAAATTTGTGCCTACCC 
GGTATACCACTGACAGGCC 
AATGCAAGCTCCAAGAGAC 
AACCCTGGCCTCAGTGTAT 
ACCCTGGCCTCAGTGTATA 
CCCTGGCCTCAGTGTATAT 
AGTACCTTCCCCTAAAGCT 
GTACCTTCCCCTAAAGCTT 
TACCTTCCCCTAAAGCTTT 
ATACCACGGACGGCCCGCC 
TACCACGGACGGCCCGCCA 
ACCACGGACGGCCCGCCAG 
CCCACTGACGGGCCGCCAG 
CCACTGACGGGCCGCCAGT 
GCCGCCAGTCATTAAATTC 
CCGCCAGTCATTAAATTCA 
CGCCAGTCATTAAATTCAA 
ATTTTGAAAAAACATCAGA 
TTTTGAAAAAACATCAGAA 
TTTGAAAAAACATCAGAAA 
GAAAAAACATCAGAAAACT 
AAAAAACATCAGAAAACTG 
AAAAACATCAGAAAACTGA 
GTCATTAAATTCAACCACC 
TCATTAAATTCAACCACCA 
GGACAAAGCAGTAAAATGT 
CTATACCACTTACGGGCCG 
TATACCACTTACGGGCCGC 
ATACCACTTACGGGCCGCC 
TCATTAAATTCAAACTCCA 
AATATAGGTGGGTATCCCG 
ATATAGGTGGGTATCCCGC 
CCTGGCCTCAGTGTATATA 
GAGAAAGCAACCGGATTTT 
AGAAAGCAACCGGATTTTT 
GAAAGCAACCGGATTTTTC 
ATAGATTTTCTAGAAAGTT 
TAGATTTTCTAGAAAGTTC 
AGATTTTCTAGAAAGTTCC 
GATAGACTTTCTAGAAAGT 
GTATATATGTGGGTATACC 
CCTAAAGCTTCAACCCTGG 
CCCCTAAAGCTTTCACAGT 
CCCTAAAGCTTTCACAGTT 
CCTAAAGCTTTCACAGTTG 
TCCAAGAGACAAACTCTTG 
TCCCCTAAAGCTTTCACTC 
CCCCTAAAGCTTTCACTCT 
CCCTAAAGCTTTCACTCTT 
AGAATTTCTAGAAAGTTCA 
GAATTTCTAGAAAGTTCAT 
AATTTCTAGAAAGTTCATT ATTTTCTAGAAAGTTCCTT 
TTTTCTAGAAAGTTCCTTC 
AAGAATCAAGGATAGAAGT 
CTCAGTGTATATTGGTGGC 
TCAGTGTATATTGGTGGCT 
GCTATCCCCCTGACGGCCC 
TAAAGCTTCCACACTTGCC 
CTTGAAAAAAAGCCAGCCT 
CAGTAAAATGTGTAATTTC 
AGTAAAATGTGTAATTTCA 
NNNANATTNTNANAAATNN 
NNANATTNTNANAAATNNT 
NANATTNTNANAAATNNTC 
CCTAGGAGAAAGAAACATG 
TCTTCCCACTTCCGGACCC 
CTTCCCACTTCCGGACCCC 
TNANAAATNNTCAGAAAAC 
NANAAATNNTCAGAAAACT 
ANAAATNNTCAGAAAACTG 
GACGGGCCGCCAGTCCTTA 
ACGGGCCGCCAGTCCTTAA 
CGGGCCGCCAGTCCTTAAA 
TGTAAATATGTGGCTATAC 
GTAAATATGTGGCTATACC 
TAAATATGTGGCTATACCA 
CCCTGACGGCCCGCCAGTC 
CCTGACGGCCCGCCAGTCA 
CTGACGGCCCGCCAGTCAT 
CTACACCACCGACGGCCCG 
TACACCACCGACGGCCCGC 
ACACCACCGACGGCCCGCC 
TCATTCCCCTAAAGCCTTC 
CATTCCCCTAAAGCCTTCA 
ATTCCCCTAAAGCCTTCAC 
TGTTATATTTTGAAAACTC 
GTTATATTTTGAAAACTCA 
ATTAAATTCAAGCCCCAAG 
TTAAATTCAAGCCCCAAGA 
TAAATTCAAGCCCCAAGAG 
CCCCTAAAGCTTCCACACT 
GGATAGATTTTCTAGAAAG 
GATAGATTTTCTAGAAAGT 
GCTATACCACTGACAGGCC 
ACTCTTGAAAAAAAGGCAG 
CTCTTGAAAAAAAGGCAGC 
CACACTTGCCTCAGTGTAT 
ACACTTGCCTCAGTGTATA 
CACTTGCCTCAGTGTATAT 
CAGTCTTTACTGGTGCTCT 
AGTCTTTACTGGTGCTCTT 
ATGTAACAAATGTGATTTG 
TGTAACAAATGTGATTTGC 
GTAACAAATGTGATTTGCC 
AAAAGGCAGCCTCGGAGAA 
AAAGGCAGCCTCGGAGAAA 
GATAGAATTTCTAGAAAGT 
ATAGAATTTCTAGAAAGTT 
TAGAATTTCTAGAAAGTTC 
AAGTACCTTCCCCTAAAGC 
CAAGAGACAAACTCTTGAA 
GCTATCCCCCTGAGGGGCC 
CTATCCCCCTGAGGGGCCG 
TCCAGTCATTAAATTCAAG 
CCAGTCATTAAATTCAAGC 
CAGTCATTAAATTCAAGCC CAGTCATTAAATTCAAGCT 
CCCACGGACAGGCCGCCAG 
AACTCTTGAAAAAAAGGCA 
TTCCACACTTGCCTAGGTG 
TCCACACTTGCCTAGGTGA 
CCCCTAAGGCTTTCACACT 
CCCTAAGGCTTTCACACTT 
CTTGCCTCAGTGTAAATAT 
TTGCCTCAGTGTAAATATG 
TGCCTCAGTGTAAATATGT 
CTCTTGAAAAAAAGCCAGC 
TCTTGAAAAAAAGCCAGCC 
AACCACCAAGAGACAAACT 
ACCACCAAGAGACAAACTC 
TGGCTATCCCCCTGACAGG 
GGCTATCCCCCTGACAGGC 
GCTATCCCCCTGACAGGCC 
AGCTTTCACACTTGCCTCA 
TATACTTTGAAAAATCATC 
CTTGCCTAGGTGAATATAG 
TTGAAAAAACATCAGAAAA 
TGAAAAAACATCAGAAAAC 
TACCACTTACGGGCCGCCA 
GCCCTAAAGATTTCACACT 
GCCGCCAGTCATTAAATGC 
CCGCCAGTCATTAAATGCA 
CGCCAGTCATTAAATGCAA 
GAAATTTCCTTCCCCTAAA 
TTGCCCAGGAGGGGGCGTC 
TGCCCAGGAGGGGGCGTCC 
GCCCAGGAGGGGGCGTCCA 
AACAAACTCTTGAAAAAAA 
ACAAACTCTTGAAAAAAAG 
AAATCATCAGAAAACTGAG 
TCCCCTAAAGCTTTCACAC TCCCCTAAAGCTTTCACAG 
CTGGTGCTCTTCCCACTTC 
TGGTGCTCTTCCCACTTCC 
TCCAAGAAACAAACTCTTG 
AAGCTTCCACACTTGCCTA 
AAAGCAACATGATTTTTCT AAAGCAACATGATTTTTCA 
GACGGGCCGCCAGTCATTA 
GGGGGCGTCCAGTCATTAA 
GGGGCGTCCAGTCATTAAA 
GGAGATAGCAACATGATTT 
GAGATAGCAACATGATTTT 
AGATAGCAACATGATTTTT 
TAGGAGAAAGCAGCATGAT 
TGCCCTCCCCCTAAAGCTT 
GCCCTCCCCCTAAAGCTTC 
CCCTCCCCCTAAAGCTTCA 
TGGGGTCTCCAGTCATTAA 
GGGGTCTCCAGTCATTAAA 
GGGTATCCCGCTGACAGGC 
GGTATCCCGCTGACAGGCC 
GTATCCCGCTGACAGGCCC 
AAAGTTCTTTGCCCTAAAG 
AAGTTCTTTGCCCTAAAGA 
AGTTCTTTGCCCTAAAGAT 
GCCCGCCAGGCATTAAATT 
GCCTCAGTGTATATATGGG 
CCTCAGTGTATATATGGGG 
CTCAGTGTATATATGGGGC 
ACCTTCCCCTAAAGCTTTC 
AGCCAGCCTAGGAGAAAGC 
GGCCGCCAGTCATTAAATG 
CAGCCTAGGAGATAGCAAC 
AGCCTAGGAGATAGCAACA 
GAGGGGGCGTCCAGTCATT 
AGGGGGCGTCCAGTCATTA 
ATTTTGAAAACTCATCAGA 
TTTTGAAAACTCATCAGAA 
TTTGAAAACTCATCAGAAA 
ATTTGCCCAGGAGGGGGCG 
TTTGCCCAGGAGGGGGCGT 
AAATTCAAGCTCCAAGAAA 
AATTCAAGCTCCAAGAAAC 
GGACAGGCCGCCAGTCATT 
AACTCCAAGAGACAAACTC 
ACTCCAAGAGACAAACTCT 
GCTATCCCACTGACGGGCC 
CTATCCCACTGACGGGCCG 
TATCCCACTGACGGGCCGC 
CAGGCCCCCAGTCATTAAA 
CGTAAAGCTTTCACACTTG 
TAATTTCATGAGTGGGGTC 
AATTTCATGAGTGGGGTCT 
GGGCCGCCAGTCCTTAAAT 
ATGTGGCTATCCCACTGAC 
TGTGGCTATCCCACTGACA TGTGGCTATCCCACTGACG 
GTGGCTATCCCACTGACAG 
GCTATACCACTTACGGGCC 
ACAAAGCAGTAAAATGTGT 
TATATGTGGCTATACCACT 
ATATGTGGCTATACCACTG ATATGTGGCTATACCACTT 
ATCCCGCTGACAGGCCCCC 
GTTCTTTGCCCTAAAGATT 
CAGCCTAGGAGAAAGAAAC 
AGAATCAAGGATAGAAGTT 
TCCAAGAGACAAGTTTTGG 
GTATATATGTGGCTATCCC 
TATATATGTGGCTATCCCC 
ATATATGTGGCTATCCCCC 
TGACGGGCCGCCAGTCATT 
CTTGCCACAGTGAAAATTT 
TTGCCACAGTGAAAATTTG 
TGCCACAGTGAAAATTTGT 
CCCTCAGTGTATATATGTG 
CCTCAGTGTATATATGTGG 
AAAACTAAGAATCAAGGAT 
AAACTAAGAATCAAGGATA 
AACTAAGAATCAAGGATAG 
GATAGCAACATGATTTTTC 
GCCTCAGTGTAAATATGTG 
CCTCAGTGTAAATATGTGG 
CACTGACAGGCCGCCAGTC 
ACTGACAGGCCGCCAGTCA 
CTGACAGGCCGCCAGTCAT 
CCCTGAGGGGCCGCCAGTC 
CCTATACCACGGACGGCCC 
CTATACCACGGACGGCCCG 
ATGTGACTACACCACCGAC 
TGTGACTACACCACCGACG 
GTGACTACACCACCGACGG 
TTCCTTTGACAGGCCGCCA 
TCCTTTGACAGGCCGCCAG 
AGAGACAAGTTTTGGAAAA 
ANATTNTNANAAATNNTCA 
AATCAAGGATAGAATTTCT 
CCCACTTCCCTCAGTGTAT 
TATCCCGCTGACAGGCCCC 
TTAACACTTGCCACAGTGA 
TAACACTTGCCACAGTGAA 
AACACTTGCCACAGTGAAA 
CTCCAGTCATTAAATTCAA 
TATACCACGGACGGCCCGC 
NNNNNNTTTCTGAATGTTT 
NNNNNTTTCTGAATGTTTC 
NNNNTTTCTGAATGTTTCT 
GTCATTAAATTCAAGCTCA 
TCATTAAATTCAAGCTCAA 
CATAAAATTCAAGCTCCAA CATTAAATTCAAGCTCAAA 
ATAAAATTCAAGCTCCAAG 
TAAAATTCAAGCTCCAAGA 
CTATCCCCCTGACAGGCCG 
TATCCCCCTGACAGGCCGC 
CACTTACGGGCCGCCAGTC 
ACTTACGGGCCGCCAGTCA 
CTTACGGGCCGCCAGTCAT 
TGTTATATTTTGAAAAATC 
GAATCAAGGATAGACTTTC 
AATCAAGGATAGACTTTCT 
AGGAGAAAGCAACATGATT 
GGAGAAAGCAACATGATTT 
NAAATNNTCAGAAAACTGA 
AAATNNTCAGAAAACTGAG 
GGCTATACCACTTACGGGC 
CATCAGAAAACTAAGAATC 
ATCAGAAAACTAAGAATCA 
TCAGAAAACTAAGAATCAA 
TTCCTTCCCCTAAACCTTT 
TCCTTCCCCTAAACCTTTC 
CCTTCCCCTAAACCTTTCA 
TGGCTATCCCACTGACAGG 
AATTTCTAGAAATTTCCTT 
ATATATGAGGCTATACCAC 
CCCGGGCCGCCAGTCATTA 
CCGGGCCGCCAGTCATTAA 
GAATATAGGTGGGTATCCC 
GCTTTAACACTTGCCACAG 
CTTTAACACTTGCCACAGT 
TTTAACACTTGCCACAGTG 
GAAAACTGAGAATCAAGGA 
AAAACTGAGAATCAAGGAT 
CAGTCATTAAATTCAACCA 
AGTCATTAAATTCAACCAC 
CTCCAAGAAACAAACTCTT 
AACCTTTCACACTTGCCTC 
CGCAGCCTAGGAGATAGCA 
GCAGCCTAGGAGATAGCAA 
AAAACATCAGAAAACTGAG 
AAACATCAGAAAACTGAGA 
AACATCAGAAAACTGAGAA 
CCAGTCATAAAATTCAAGC 
TATATTGGTGGCTATCCCC 
ATATTGGTGGCTATCCCCC 
TCGGTGTATATATGTGGCT 
CGGTGTATATATGTGGCTA 
GGTGTATATATGTGGCTAT 
TGTTATATTTTGAAAAAAC 
AGAAAGTTCCTCCCCCTAA 
GCTCAAAGAGACAAACTCT 
AGCCTAGGAGAAAGCAACA 
GACTACACCACCGACGGCC 
ACTACACCACCGACGGCCC 
TCCCACTGACGGGCCGCCA 
AGTCATAAAATTCAAGCTC 
GTCATAAAATTCAAGCTCC 
TCATAAAATTCAAGCTCCA 
TCCACACTTGCCTCAGTGT 
ATTTTGAAAAATCATCAGA 
TTTTGAAAAATCATCAGAA 
TTTGAAAAATCATCAGAAA 
TAGAAGTTCTAGAAAGTTC 
AGAAGTTCTAGAAAGTTCT 
GAAGTTCTAGAAAGTTCTT 
ATGTGGCTATACCACTGAC 
TGTGGCTATACCACTGACA TGTGGCTATACCACTGACG 
GTGGCTATACCACTGACGG 
AGTCATTAAATTCAAGCTC 
GTCATTAAATTCAAGCTCC 
CTAAAGCTTTCCCACTTCC 
ATCCCACTGACGGGCCGCC 
CCTCCCCCTAAGGCTTTCA 
CGAACCCACGGACAGGCCG 
GATTTTCTAGAAAGTTCCT 
AGTGTATATATGTGGGTAT 
GTGTATATATGTGGGTATA 
TGTATATATGTGGGTATAC 
AAATATGTGGCTATACCAC 
NNNTTTCTGAATGTTTCTT 
NNTTTCTGAATGTTTCTTA 
TATGTGGCTATACCACTGA TATGTGGCTATACCACTTA 
CACCACCGACGGCCCGCCA 
NATTNTNANAAATNNTCAG 
GAGAATCAAGGATAGAATT 
GGCTATCCCACTGACAGGC 
GCTATCCCACTGACAGGCC 
CTATCCCACTGACAGGCCG 
TAGAAAGTTCCTCCCCCTA 
CCTCCCCCTAAAGCTTCAA 
CTCCCCCTAAAGCTTCAAC 
TCCCCCTAAAGCTTCAACC 
GGTGGCTATCCCCCTGAGG 
CCACGGACGGCCCGCCAGT 
ATTCAAGCCCCAAGAGACA 
TTCAAGCCCCAAGAGACAA 
TCAAGCCCCAAGAGACAAA 
TATATGAGGCTATACCACT 
CTAAGAATCAAGGATAGAC 
TAAGAATCAAGGATAGACT 
TGGTGGCTATCCCCCTGAG 
AGTCATTAAATTCAAGCCC 
GTCATTAAATTCAAGCCCC 
CCAGGCATTAAATTCAAGC 
AAGTTCATTCCCCTAAAGC 
CCCTAAAGCTTTCACACTT 
CCTAAAGCTTTCACACTTG 
CTAAAGCTTTCACACTTGC 
GTAAGAACTGCCCTCCCCC 
AAACTCTTGAAAAAAAGGC 
CCCTAAACCTTTCACACTT 
CCTAAACCTTTCACACTTG 
CTAAACCTTTCACACTTGC 
TATCCCCCTGAGGGGCCGC 
ATCCCCCTGAGGGGCCGCC 
AAAAGGCAGCCTAGGCGAA 
AAAGGCAGCCTAGGCGAAA 
AAGGCAGCCTAGGCGAAAG 
GCTCCAAGAGACAAACCCT 
TTTCACTCTTGCCTCAGTG 
TAAACCTTTCACACTTGCC 
AAACCTTTCACACTTGCCT 
TATATGTGGCTATCCCCCT 
ATATGTGGCTATCCCCCTG 
TATGTGGCTATCCCCCTGA 
TTCCCCTAAACCTTTCACA 
TCCCCTAAACCTTTCACAC 
CCCCTAAACCTTTCACACT 
CCACACTTGCCTAGGTGAA 
AGACTCAAGGGACAAAGCA 
GACTCAAGGGACAAAGCAG 
GAGGAGAAAGCAACCGGAT 
AGGAGAAAGCAACCGGATT 
TATATTTTGAAAAATCATC 
ATATTTTGAAAAATCATCA 
TATTTTGAAAAATCATCAG 
CCAGTCATTAAATTCAAAC 
TTCTAGAAAGTTCCTTCCC 
ACCACTGACGGGCCGCCAG 
CCTCCCCGTAAAGCTTTCA 
GAAAGAAACATGATTTTTC 
ATTNTNANAAATNNTCAGA 
CCCCCTAAAGCTTTCCCAC 
CCCCTAAAGCTTTCCCACT 
CCCTAAAGCTTTCCCACTT 
AAGAATCAAGGATAGACTT 
AGAATCAAGGATAGACTTT 
GTCATTAAATGCAAGCTCC 
TCATTAAATGCAAGCTCCA 
TCAGTGTAAATATGTGGCT 
CAGTGTAAATATGTGGCTA 
AGTGTAAATATGTGGCTAT 
CTATACCACTGACGGGCCG 
TATACCACTGACGGGCCGC 
GAGACAAGTTTTGGAAAAA 
GGTGAATATAGGTGGGTAT 
GTGAATATAGGTGGGTATC 
TGAATATAGGTGGGTATCC 
AACTCTTGAAAAAAAGCCA 
ACTCTTGAAAAAAAGCCAG 
TGCCTCGGTGTATATATGT 
GCCTCGGTGTATATATGTG 
CCTCGGTGTATATATGTGG 
AATATGTGGCTATACCACT 
AAAAAGGCAGCCTAGGAGA 
GAATTTCTAGAAAGTTCCT 
ACCCACGGACAGGCCGCCA 
AATTTCTAGAAAGTCCCTC 
ATTTCTAGAAAGTCCCTCC 
CAAACTCTTGAAAAAAAGG CAAACTCTTGAAAAAAAGC 
AGGCAGCCTAGGAGAAAGC 
ATATGTGACTACACCACCG 
TATGTGACTACACCACCGA 
GTTATATTTTGAAAAAACA 
ATTTCTAGAAAGTTCATTC 
TTTCTAGAAAGTTCATTCC 
TTCACACTTGCCTCGGTGT 
AGCCTAGGAGAAAGAAACA 
GCCTAGGAGAAAGAAACAT 
GTGTAAATATGTGGCTATA 
AGGCAGCCTAGGCGAAAGC 
GGCAGCCTAGGCGAAAGCA 
TCATCAGAAAACTAAGAAT 
TAGGAGATAGCAACATGAT 
AGGAGATAGCAACATGATT 
ATATGGGGCTATACCACTG 
TATGGGGCTATACCACTGA 
ATGGGGCTATACCACTGAC 
GACAAAGCAGTAAAATGTG 
CTAGAAAGTTCATTCCCCT 
CTAGAAAGTTCCTCCCCCT 
AGAGACAAACCCTTGAAAA 
NNNNNNNTTTCTGAATGTT 
GACGGCCCGCCAGGCATTA 
GTCCAGTCATTAAATTCAA 
TTATATTTTGAAAAAACAT 
GGAGAAAGCAACCGGATTT 
TGCCTACCCCACTCCCGGG 
GCCTACCCCACTCCCGGGC 
TGAAAACTCATCAGAAAAC 
GAAAACTCATCAGAAAACT 
AAAACTCATCAGAAAACTG 
AGTCAATATATGTGACTAC 
GTCAATATATGTGACTACA 
TCAATATATGTGACTACAC 
AAGTTCTAGAAAGTTCTTT 
AGTTCTAGAAAGTTCTTTG 
TTGCCTCGGTGTATATATG 
AAACTCATCAGAAAACTGA 
TGAAAAAAAGGCAGCCTAG 
TAACAAATGTGATTTGCCC 
ATCCCCCTGACAGGCCGCC 
AAAGGCAGCCTAGGAGAAA 
AAGGCAGCCTAGGAGAAAG 
AAATTCAAGCTCAAAGAGA 
AATTCAAGCTCAAAGAGAC 
ATTCAAGCTCAAAGAGACA 
TCCCACCAGTCATTAAATT 
CCCACCAGTCATTAAATTC 
CCACCAGTCATTAAATTCA 
GGAGAAAGAAACATGATTT 
GAGAAAGAAACATGATTTT 
TACCACGGACAGGCCGCCA 
CAACCACCAAGAGACAAAC 
AAAAAAGCCAGCCTAGGAG 
AAATTCAAGCCCCAAGAGA 
AATTCAAGCCCCAAGAGAC 
CCTCAGTCAATATATGTGA 
CTCAGTCAATATATGTGAC 
TCAGTCAATATATGTGACT 
AGAAAACTGAGAATCAAGG 
ATTTCTAGAAATTTCCTTC 
TTTCTAGAAATTTCCTTCC 
TAAATGCAAGCTCCAAGAG 
GAAAGCAGCATGATTATTC 
TTTCTAGAAAGTCCCTCCC 
CCCTCCCCCTAAGGCTTTC 
GCTTTCCCACTTCCCTCAG 
TGGGTATCCCGCTGACAGG 
CACCAGTCATTAAATTCAA 
ACCAGTCATTAAATTCAAC 
TGACGGGCCGCCAGTCCTT 
GTCCCTCCCCCTAAGGCTT 
TCCCTCCCCCTAAGGCTTT 
TCCCCTAAAGCTTCCACAC 
CCTAAAGCTTTCCCACTTC 
ATGTGGCTATCCCCCTGAC 
TGTGGCTATCCCCCTGACA TGTGGCTATCCCCCTGACG 
GTGGCTATCCCCCTGACAG 
AATTCAACCACCAAGAGAC 
AAAGCAGCATGATTATTCA 
TAAGAACTGCCCTCCCCCT 
AAGAACTGCCCTCCCCCTA 
GTGTATATATGGGGCTATA 
GCTTTCACAGTTGACTCAG 
CTTTCACAGTTGACTCAGT 
TTTCACAGTTGACTCAGTG 
AGCTTTCCCACTTCCCTCA 
CCTAAAGCTTTCACTCTTG 
CCCAGTCATTAAATTCAAG 
CTAAAGCTTTCACTCTTGC 
TAAGGCTTTCACACTTGCC 
AAGGCTTTCACACTTGCCT 
AGGCTTTCACACTTGCCTC 
GGCCCCCAGTCATTAAATT 
GCCCCCAGTCATTAAATTC 
CCCCCAGTCATTAAATTCA 
ACCACTTACGGGCCGCCAG 
ATCAAGGATAGAATTTCTA 
TTCTAGAAATTTCCTTCCC 
TCTAGAAATTTCCTTCCCC 
GAACTGCCCTCCCCCTAAA 
CAGGAGGGGGCGTCCAGTC 
CTCAGTGTAAATATGTGGC 
GGCTTTCACACTTGCCTCA 
GCCACAGTGAAAATTTGTG 
AAATATGTGCCTATACCAC 
AATATGTGCCTATACCACG 
TTCAAGCTCCAAGAGACAA 
CTAAGGCTTTCACACTTGC 
TCCCCCTGACAGGCCGCCA 
CCCCCTGACAGGCCGCCAG 
TACCACTGACGGGCCGCCA 
AATCATCAGAAAACTGAGA 
GAGACAAACTCTTGAAAAA 
CGACGGCCCGCCAGGCATT 
ATATGAGGCTATACCACTG 
CAAGCCCCAAGAGACAAAC 
CCCCACTCCCGGGCCGCCA 
ATATGTGGCTATCCCACTG 
TATGTGGCTATCCCACTGA 
AAATGCAAGCTCCAAGAGA 
AGGATAGAAGTTCTAGAAA 
AAGAGACAAACCCTTGAAA 
TTGAAAACTCATCAGAAAA 
CAAGCTCCAAGAAACAAAC 
TAGGAGAAAGCAACATGAT 
TCCCCCTGAGGGGCCGCCA 
CGGACAGGCCGCCAGTCAT 
AGGCCCCCAGTCATTAAAT 
CCTACCCCACTCCCGGGCC 
CCAGTCATTAAATTCAACC 
ATAGACTCAAGGGACAAAG 
TAGACTCAAGGGACAAAGC 
CTCGGTGTATATATGTGGC 
TCAGTGTATATATGTGGGT 
CAGTGTATATATGTGGGTA 
TAGGTGGCTATTCCTTTGA 
CCCCAGTCATTAAATTCAA 
GTTTATATATGTGGCTATC 
TTTATATATGTGGCTATCC 
AGACAAACTCTTGAAAAAA 
GACAAACTCTTGAAAAAAA 
CACACTGGCCTCAGTGTAT 
ACACTGGCCTCAGTGTATA 
GCCAGTCATTAAATTCAAG GCCAGTCATTAAATTCAAA 
TTAAAGCTTCCACACTTGC 
ATTAAATTCAAGCTCAAAG 
AAACTCTTGAAAAAAAGCC 
TAGGTGAATATAGGTGGGT 
CCGACGGCCCGCCAGGCAT 
TGGGGCTATACCACTGACA 
GCCAGTCATTAAATGCAAG 
CCAGTCATTAAATGCAAGC 
CAGTCATTAAATGCAAGCT 
TTAAATTCAAGCTCAAAGA 
TAAATTCAAGCTCAAAGAG 
TTGAAAAATCATCAGAAAA 
CGGAGAAAGCAACATGATT 
CCCCTAAAGCCTTCACACT 
CCCTAAAGCCTTCACACTT 
CCTAAAGCCTTCACACTTG 
AAAAGGCAGCCTAGGAGAA 
AAAGCTTTCACAGTTGACT 
AAGCTTTCACAGTTGACTC 
AGCTTTCACAGTTGACTCA 
TATATGTGGCTATCCCACT 
TATATGTGACTACACCACC TATATGGGGCTATACCACT 
AGTCATTAAATGCAAGCTC 
TCACACTTGCCTCGGTGTA 
TCAAGCTCAAAGAGACAAA 
CCTCAGTGTATATTGGTGG 
AAAAAGCCAGCCTAGGAGA 
CCACTTACGGGCCGCCAGT 
CAGTCAATATATGTGACTA 
CCCCCCAGTCATTAAATTC 
AACCCACGGACAGGCCGCC 
CTTCCCCTAAACCTTTCAC 
CCAGTCTTTACTGGTGCTC 
TTCACACTTGCCTCAGTGT 
TCACACTTGCCTCAGTGTA 
GTCCCACCAGTCATTAAAT 
ATACCACTGACGGGCCGCC 
AAGCCCCAAGAGACAAACC 
CCCCCTGAGGGGCCGCCAG 
CCCCTGAGGGGCCGCCAGT 
GCAGCCTCGGAGAAAGCAA 
TGACGGCCCGCCAGTCATT 
AGCAGTAAAATGTGTAATT 
TCTAGAAAGTTCCTCCCCC 
AACCTTTCACACTGGCCTC 
ACCTTTCACACTGGCCTCA 
CCTTTCACACTGGCCTCAG 
CTCAAAGAGACAAACTCTT 
GCAGTAAAATGTGTAATTT 
TTCCCCTAAAGCCTTCACA 
TCCCCTAAAGCCTTCACAC 
CCCAGGAGGGGGCGTCCAG 
CCAGGAGGGGGCGTCCAGT 
GTGCTCTTCCCACTTCCGG 
CGTCCCACCAGTCATTAAA 
GACCCCCAGTCATAAAATT 
AGGTGAATATAGGTGGGTA 
TGGCTATACCACTGACGGG 
CTATACCACTGACAGGCCG 
AGAAAGAAACATGATTTTT 
GAATCAAGGATAGAAGTTC 
AATCAAGGATAGAAGTTCT 
ATTCAAGCTCCAAGAGACA 
TTCTAGAAAGTTCATTCCC 
TCTAGAAAGTTCATTCCCC 
ACTTGCCTCAGTGTATATA 
GGGTCTCCAGTCATTAAAT 
GGTCTCCAGTCATTAAATT 
CGTCCAGTCATTAAATTCA 
TTATATTTTGAAAACTCAT 
TATATTTTGAAAACTCATC 
GATTTCACACTTGTGTCAT 
ATTTCACACTTGTGTCATT 
GGCGTCCCACCAGTCATTA 
GCGTCCCACCAGTCATTAA 
AAAGCTTCCACACTTGCCT 
AAGCTTCCACACTTGCCTC 
AGAACTGCCCTCCCCCTAA 
ACCGACGGCCCGCCAGGCA 
ATAGAATTTCTAGAAATTT 
ATCATCAGAAAACTGAGAA 
GAACCCACGGACAGGCCGC 
TCCCTTAAAGCTTCCACAC 
TATTTTGAAAACTCATCAG 
TCAGTGTATATATGTGGCT 
CAGTGTATATATGTGGCTA 
TTACGGGCCGCCAGTCATT 
GGATAGAATTTCTAGAAAT 
GATAGAATTTCTAGAAATT 
ATCAAGGATAGAAGTTCTA 
TCAAGGATAGAAGTTCTAG 
CAAGGATAGAAGTTCTAGA 
AAAAAGGCAGCCTAGGCGA 
TCTAGAAAGTTCCTTCCCC 
ATATTTTGAAAACTCATCA 
TGAGTGGGGTCTCCAGTCA 
ACGGACAGGCCGCCAGTCA 
CCCGCCAGGCATTAAATTC 
TGTATATATGGGGCTATAC 
ACCCCCAGTCATAAAATTC 
TTCAAGCTCAAAGAGACAA 
CCCCTTGCCTCAGTGTATA 
GGATAGAATTTCTAGAAAG 
CATGTAACAAATGTGATTT 
GGTGCTCTTCCCACTTCCG 
AAGGATAGAAGTTCTAGAA 
CAGTCATAAAATTCAAGCT 
NNNNNNNANATTNTNANAA 
TGACTACACCACCGACGGC 
TACGGGCCGCCAGTCATTA 
CCCAGTCTTTACTGGTGCT 
CCTAAGGCTTTCACACTTG 
AGTAATTGTAAGAACTGCC 
CCCTCCCCGTAAAGCTTTC
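The node labels in this graph are 19 bases long, and each label overlaps its neighbor by 18 bases. A minimal sketch of how such labels and their links could be derived from reads is given here. It assumes error-free reads and k = 19; the function names `kmers` and `build_graph` are hypothetical and not from the lecture. The input strings are stitched together from node labels in the figure for illustration only and are not actual reads.

```python
from collections import defaultdict

def kmers(read, k):
    """Return every length-k substring of read, in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)  # a and b overlap by k - 1 bases
    return graph

K = 19

# Illustrative 21-base string stitched from three consecutive node labels.
linear = "CAATATATGTGACTACACCAC"
linear_kmers = kmers(linear, K)

# Two illustrative strings that share a 19-base prefix and then diverge,
# which produces a branch (one node with two successors).
prefix = "CTTGAAAAAAAGGCAGCCT"
graph = build_graph([prefix + "A", prefix + "C"], K)

for node in linear_kmers:
    print(node)
print(sorted(graph[prefix]))
```

In this sketch a branch shows up as a node with more than one successor, which matches the places in the figure where several labels sit side by side.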
200bp of a human genome! 
[Figure: assembly graph for the 200 bp region; each node is labeled with a 19-base k-mer, with side-by-side labels where the graph branches. Node labels omitted.]
CTATGAGGAGTGTATTAGA 
GGAAATAGAAGTGAGCCAA 
GTTTCTGAATGTTTCTTAG 
CATCTGAGATGAAGAGAAG 
TAAATTACTTTTCGAGATA 
AAATTACTTTTCGAGATAT 
GGAAAAGGGCACCTGTGTT 
AAGTGAGCCAATCTGAGGA 
TTCTTAGCTTTCAATGGGC 
TCTTAGCTTTCAATGGGCA 
CTTAGCTTTCAATGGGCAA 
AGCTTTCAGTGGGCAATAA 
GCTTTCAGTGGGCAATAAA 
CTTTCAGTGGGCAATAAAT 
TGTATTAGAATAGAATCGC 
AAATAACTTTTACGGAAAT 
AGATGAAGCGTAGGCTATG 
GATGAAGCGTAGGCTATGC 
GAAATAGAAGTGAGCCAAT 
AAATAGAAGTGAGCCAATC 
TAAATTGGTTTCTGAATGA 
GAGTAGGTATTTGAGATGA 
AGTAGGTATTTGAGATGAA 
TTTCTTAGCTTCCAATGGG 
TGAAGAGAAGGCTGTGCTG 
TCTGAATGTTTCTTAGCTT 
TTAGCTTTCAGTGGGCAAT 
TAGCTTTCAGTGGGCAATA 
TGTGAGCCAATCTGAGGAA 
TGAGGAAGTTTTTGAGATG 
GAGGAAGTTTTTGAGATGA 
AATGGTTTTTGTATGTTTC 
TTTAGGAAAATAGATGTGA 
TTAGGAAAATAGATGTGAG 
TAGGAAAATAGATGTGAGC 
GTTTTTTCTCATAAAATGG 
TAGCTTCAATGGGCAATAA 
AGCTTCAATGGGCAATAAA 
GCTTCAATGGGCAATAAAA 
AGGAAGTCTTTGAGATGGA 
GGAAGTCTTTGAGATGGAG 
GTGTATTAGAATAGAATCG 
GAAGTTTTTGAGATGAAGC 
GTCTATGAGGAGTGTATTA 
AGCATTTGAGATGAAGCGT 
TAATCTGAGTAGGTATTTG 
AATCTGAGTAGGTATTTGA 
ATCTGAGTAGGTATTTGAG 
AGATGTGAGCCAATCTGAG 
GATGTGAGCCAATCTGAGG 
ATGTGAGCCAATCTGAGGA 
TGTGCGCCAATCTGTGGAA 
ATCTGAGATGAAGAGAAGG 
TCTGAGATGAAGAGAAGGC 
GAAGAGAAGGGTTTGCTGT 
ATGGGCAATAAATAACTTT 
GGAAGTATTTGAGATGAAG 
GAAGTATTTGAGATGAAGA 
AAGTATTTGAGATGAAGAG 
GAGCCATTCTGAGGAAGTT 
GCCAATCTGAGGAAGTATT GCTAATCTGAGTAGGTATT 
TTTCAGTGGGCAATAAATA 
AGGAAGTATTTGAGATGAA 
TAGGTATTTGAGATGAAGA 
GGGCAATAAATAACTTTTA 
GGCAATAAATAACTTTTAG 
GCAATAAATAACTTTTAGT GCAATAAATAACTTTTAGG 
CTTTCAATGGGCAGTAAAT 
GGAAATAGATGTGAGCCAA 
TTTTAGGAAAATAGATGTG 
GTTGATTGCCTTTATGAGG 
AAGAGAAGGGTTTGCTGTC 
AGAGAAGGGTTTGCTGTCT 
CGAGATATTGTTGTGCGCC 
GAGATATTGTTGTGCGCCA 
AGATATTGTTGTGCGCCAA 
GAGGAAGCATTTGAGATGA 
AGGAAGCATTTGAGATGAA 
TTGCTGTCTATGAGGAGTG 
TGCTGTCTATGAGGAGTGC TGCTGTCTATGAGGAGTGT 
TCCAGGAAAAGGGCACCTG 
TTTCTTAGCTTTCAGTGGG 
ATCTGTGGAAGCATTTGAG 
GGAAATAGATGTGAGCTAA 
GAAATAGATGTGAGCTAAT 
ATAAAATGGTTTCTGAATG 
GAAGCGAAGGCTTTGCTGT 
TGGGCAATAAATAACTTTT 
AAATGGTTTCTGAATGTTT 
AATGGTTTCTGAATGTTTC 
AGGAAGTTTTTGAGATGAA 
GGAAGTTTTTGAGATGAAG 
TTCTCATAAAATGGTCTCT 
TTTGAGATGAAGCGTAGGC 
AGTGCATTAGAATAGAATC 
TTTAGGGAAATAGAAGTGA 
TTAGGGAAATAGAAGTGAG 
AGTGTATTAGAATAGAATC 
GGCAATAAAAAACTTTTAG 
GCAATAAAAAACTTTTAGG 
CAATAAAAAACTTTTAGGG 
ATAGATGTGAGCTAATCTG 
CTGAATGATTCTTAGGTTT 
GTTTCTTAGCTTTCAGTGG 
CATAAAATGGTCTCTGAAT 
ATAAAAAACTTTTAGGGAA 
TACTTTTCGAGATATTGTT 
TGTTGATTGCCTTTATGAG 
CATAAAATGGTTTTTGTAT 
AAGAGAAGGCTGTGCTGTC 
GGGCAATAAAAAACTTTTA 
AATAGATGTGAGCTAATCT 
GTAGGTATTTGAGATGAAG 
AGGAAAATAGATGTGAGCC 
CTGAGGAAGCATTTGAGAT 
TGAGGAAGCATTTGAGATG 
AGCGAAGGCTTTGCTGTCT 
CTAATCTGAGTAGGTATTT 
CTTTTAGGAAAATAGATGT 
TTAGAATAGAATTGCTCCA 
GTGCATTAGAATAGAATCG 
ATATTGTTGTGCGCCAATC 
TATTGTTGTGCGCCAATCT 
TCTGAGGAAGTCTTTGAGA 
GAAGAGAAGGCTGTGCTGT 
AGCCAATCTGAGGAAGTAT 
GCCAATCTGAGGAAGTATC 
GATATTGTTGTGCGCCAAT 
TAAAATGGTTTCTGAATGT 
AAAATGGTTTCTGAATGTT 
CAATAAATAACTTTTAGTG 
GGTTTCTAAATGTTTCTTA 
TAGAATAGAATCGCTCCAG 
ATGGTTTTTGTATGTTTCT 
GAGGAGTGCATTAGAATAG 
AATAAAAAACTTTTAGGGA 
TTTTCGAGATATTGTTGTG 
TGTTTCTTAGCTTTCAGTG 
GAAGAGAAGGCTTTGCTGT 
TTTCAATGGGCAGTAAATA 
GGTCACCTGTGTTGATTGC 
ATAGATGTGAGCCAATCTG 
TAGATGTGAGCCAATCTGA 
CTGCCTTTGATGTGTGCTT 
GAGCCAATCTGAGGAAGTA 
AAGCGAAGGCTTTGCTGTC 
AAATAGATGTGAGCTAATC 
GCACCTGTGTTGATTGCCT 
GTTTCAATGGGCATTAAAT 
AGAATAGAATCGCTCCAGG 
CTTTTCGAGATATTGTTGT 
TGAAGAGAAGGCTTTGCTG TGAAGAGAAGGCTTTGCTT 
TGCCTTTGATGTGTGCTTT 
There are over 3000 20-mers, 
and over 30 valid paths!
Help‽ What Can We Do? 
• For some errors, we can inspect the de Bruijn 
graph directly, and eliminate edges from the graph 
• More generally, we can look at the distribution of 
k-mers, and try to make corrections to the reads
Trimming Spurs 
• Since errors tend to fall at the ends of reads, we see spurious branches 
off of the graph 
• Use heuristics to determine whether we can remove these nodes 
• E.g., if these nodes are only present in 1 read, probably OK to remove
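The spur heuristic above can be sketched in a few lines. This is a minimal illustration, not the method of any particular assembler: the names `build_graph` and `trim_spurs` and the `max_reads` parameter are made up for this example, and only forward dead ends are trimmed.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers.

    Returns (edges, support): edges maps a node to its successor nodes,
    support maps a node to the set of read indices that contain it.
    """
    edges = defaultdict(set)
    support = defaultdict(set)
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            left, right = read[i:i + k - 1], read[i + 1:i + k]
            edges[left].add(right)
            support[left].add(idx)
            support[right].add(idx)
    return edges, support


def trim_spurs(edges, support, max_reads=1):
    """Repeatedly remove dead-end nodes seen in at most max_reads reads."""
    graph = {node: set(succs) for node, succs in edges.items()}
    while True:
        nodes = set(graph) | {s for succs in graph.values() for s in succs}
        # A spur node has no successors and is backed by very few reads.
        spurs = {n for n in nodes
                 if not graph.get(n) and len(support[n]) <= max_reads}
        if not spurs:
            return graph
        for node in spurs:
            graph.pop(node, None)
        for succs in graph.values():
            succs -= spurs


if __name__ == "__main__":
    # Two clean reads plus one read with an error in its last base.
    reads = ["ATGGCGTGCA", "ATGGCGTGCA", "ATGGCGA"]
    edges, support = build_graph(reads, k=4)
    print(sorted(trim_spurs(edges, support)["GCG"]))  # ['CGT']
```

The true end of the sequence is also a dead end, but it survives here because more than one read supports it.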
The k-mer Spectrum 
• If we look at the frequencies of k-mers, we see 
something interesting…
What Is This Spike?
Those Are Our Errors! 
• Errors create low-frequency substrings 
• We can identify errors with a mixture model: 
• Mixture of Poissons 
• Distribution with lowest mean → errors 
• From here, we can remove those “erroneous” 
strings, and pick likely replacements
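As a rough sketch of the k-mer spectrum idea: count every k-mer, tabulate how many distinct k-mers occur at each frequency, and flag the low-frequency ones. The fixed `cutoff` below is a simplifying stand-in for the mixture model on the slide, and all function names are illustrative.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map each frequency to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def suspect_kmers(counts, cutoff):
    """Return k-mers at or below the cutoff frequency (likely errors)."""
    return {kmer for kmer, count in counts.items() if count <= cutoff}


if __name__ == "__main__":
    # Five clean reads and one read with a single substitution.
    reads = ["ACGTAC"] * 5 + ["ACGTTC"]
    counts = count_kmers(reads, k=4)
    print(sorted(kmer_spectrum(counts).items()))  # [(1, 2), (5, 2), (6, 1)]
    print(sorted(suspect_kmers(counts, cutoff=1)))  # ['CGTT', 'GTTC']
```

The two k-mers seen once form the low-frequency spike; the k-mers seen five or six times are the real signal.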
How Do We Define Likely? 
• Can use edit distance of replacement as a heuristic 
• Can define a probabilistic measure for the quality of 
a replacement:
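A minimal sketch of the edit-distance heuristic, assuming substitution errors only, so that Hamming distance can stand in for full edit distance. The probabilistic measure from the slide is not reproduced here; `best_replacement`, `trusted_counts` and `max_distance` are hypothetical names for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_replacement(kmer, trusted_counts, max_distance=1):
    """Pick the closest trusted k-mer, breaking ties by higher count.

    trusted_counts maps each trusted k-mer to its observed frequency.
    Returns None when no trusted k-mer is within max_distance.
    """
    candidates = [
        (hamming(kmer, trusted), -count, trusted)
        for trusted, count in trusted_counts.items()
        if len(trusted) == len(kmer)
    ]
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        return None
    return min(candidates)[2]


if __name__ == "__main__":
    trusted = {"ACGT": 6, "CGTA": 5, "GTAC": 5}
    print(best_replacement("CGTT", trusted))  # CGTA
    print(best_replacement("TTTT", trusted))  # None
```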
Dealing With Repeats 
• A cycle in a de Bruijn graph is caused by repeated 
sequence 
• In real genomes, there is a lot of repetition: 
• Structural variation → duplicated sequences 
• Transposons/Mobile Elements 
• Centromeres and Telomeres
Increased k-mer Length 
Sequence: ACACTGCACT 
k = 3: ACA CAC ACT CTG TGC GCA 
k = 5: ACACT CACTG ACTGC CTGCA TGCAC GCACT 
• If we have a repeated sequence which is less than b bases 
long, we can resolve the repeat by using k-mers 
with k > b
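The example sequence from this slide can be checked directly. The sketch below lists which k-mers occur more than once for a given k; `repeated_kmers` is an illustrative helper, not part of any library.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, in order, including duplicates."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repeated_kmers(seq, k):
    """The set of k-mers that occur more than once in seq."""
    counts = Counter(kmers(seq, k))
    return {kmer for kmer, count in counts.items() if count > 1}


if __name__ == "__main__":
    seq = "ACACTGCACT"  # contains the 4-base repeat CACT twice
    print(sorted(repeated_kmers(seq, 3)))  # ['ACT', 'CAC']
    print(sorted(repeated_kmers(seq, 4)))  # ['CACT']
    print(sorted(repeated_kmers(seq, 5)))  # []
```

The repeat here is 4 bases long, so k = 5 is the first value at which every k-mer is unique and the ambiguity in the graph goes away.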
Scaffolding 
It was the best of times, it was the worst of times… 
the best of 
best of times was the worst 
It was the 
worst of times 
times, it was 
• Current sequencing technology gives us paired reads, 
with approximately known distance between reads
Scaffolding 
• We can use this to estimate repeat sizes: 
• Or, to estimate the size of gaps: 
smaller! 
bigger!
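One way to turn the paired-read idea into a number, as a sketch under stated assumptions: each pair spans a fragment of roughly `insert_size` bases, one read starts at a known offset in contig A, and its mate ends at a known offset in contig B. The formula and all names below are assumptions made for illustration; the slides do not give one.

```python
from statistics import mean


def estimate_gap(pairs, contig_a_length, insert_size):
    """Estimate the gap between contig A and contig B from read pairs.

    pairs is a list of (start_in_a, end_in_b) offsets: where the fragment
    starts in contig A and where it ends in contig B. Whatever part of the
    expected fragment length is not covered by either contig is attributed
    to the gap. A negative result suggests that the contigs overlap.
    """
    estimates = [
        insert_size - (contig_a_length - start_a) - end_b
        for start_a, end_b in pairs
    ]
    return mean(estimates)


if __name__ == "__main__":
    pairs = [(800, 200), (900, 300)]
    print(estimate_gap(pairs, contig_a_length=1000, insert_size=500))  # 100
```

Averaging over many pairs matters in practice because the distance between paired reads is only approximately known.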
How About Large Repeats? 
Twitter, @infoecho, 9/12/2014
Long Reads To The 
Rescue!
Opportunities 
• New read technologies are available 
• Provide much longer reads (>10kbp, vs. 250bp) 
• Different error model… (15% INDEL errors, vs. 2% SNP errors) 
• Generally, lower sequence specific bias 
• But, need to improve OLC assembler performance! 
Left: PacBio homepage, Right: Wired, http://www.wired.com/2012/03/oxford-nanopore-sequencing-usb/
Can we turn an expensive, 
serial problem into a 
cheap, parallel problem?
Fast Overlapping with 
MinHashing 
• Wonderful realization by Berlin et al. [1]: overlapping is 
similar to the document similarity problem 
• Use MinHashing to approximate similarity: 
[1] Berlin et al., bioRxiv 2014 
Per document/read, compute signature: 
1. Cut into shingles 
2. Apply random hashes to shingles 
3. For each random hash, take the min over all shingles 
Hash into buckets: 
Signatures of length l can be hashed into b buckets, so we expect 
to compare all elements with similarity ≥ (1/b)^(b/l) 
Compare: 
For two documents with signatures of length l, Jaccard similarity is 
estimated by (# equal hashes) / l 
Can reduce complexity from O(n^2) to O(nb)!
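A minimal MinHash sketch for the signature and comparison steps. It is not the implementation from Berlin et al.: salted BLAKE2b digests stand in for the "random hashes", and the function names are made up for this example.

```python
import hashlib


def shingles(seq, k):
    """The set of all length-k substrings (shingles) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def salted_hash(shingle, seed):
    """A 64-bit hash of the shingle; each seed acts as a different hash."""
    digest = hashlib.blake2b(
        shingle.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(seq, k, num_hashes):
    """For each hash function, keep the minimum value over all shingles.

    Assumes len(seq) >= k, so that there is at least one shingle.
    """
    parts = shingles(seq, k)
    return [min(salted_hash(s, seed) for s in parts)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions at which two signatures agree."""
    equal = sum(a == b for a, b in zip(sig_a, sig_b))
    return equal / len(sig_a)


if __name__ == "__main__":
    a = minhash_signature("ACGTACGTTGCAAGGT", k=4, num_hashes=64)
    b = minhash_signature("CGTACGTTGCAAGGTC", k=4, num_hashes=64)
    print(estimate_jaccard(a, a))  # 1.0
    print(estimate_jaccard(a, b))
```

Longer signatures give a tighter estimate of the true Jaccard similarity, at the cost of more hashing per read.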
MapReduce 
• Intuition: if we have a data parallel algorithm, we can 
run the algorithm across many computers 
• Many popular systems: 
• MapReduce at Google 
• Hadoop 
• (from Berkeley!) 
• Provide special programming models for graphs…
MinHash On MR 
Per document/read, compute signature (map): 
1. Cut into shingles 
2. Apply random hashes to shingles 
3. For each random hash, take the min over all shingles 
Hash into buckets (groupBy): 
Signatures of length l can be hashed into b buckets, so we expect 
to compare all elements with similarity ≥ (1/b)^(b/l) 
Compare (map + filter): 
For two documents with signatures of length l, Jaccard similarity is 
estimated by (# equal hashes) / l
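The three stages can be mimicked on a single machine with plain Python collections. This is only a sketch of the dataflow, not code for Hadoop or any other framework, and it assumes the "b buckets" are b bands of the signature, as in standard locality-sensitive hashing. Signatures are taken as precomputed input.

```python
from collections import defaultdict
from itertools import combinations


def band_keys(read_id, signature, bands):
    """map: emit one ((band index, band contents), read id) pair per band.

    Trailing entries are ignored if bands does not divide len(signature).
    """
    rows = len(signature) // bands
    return [((band, tuple(signature[band * rows:(band + 1) * rows])), read_id)
            for band in range(bands)]


def group_by_key(pairs):
    """groupBy: collect the read ids that share a bucket key."""
    buckets = defaultdict(set)
    for key, read_id in pairs:
        buckets[key].add(read_id)
    return buckets


def candidate_overlaps(signatures, bands, min_similarity):
    """map + filter: score pairs that share a bucket, keep the similar ones."""
    pairs = [kv for read_id, sig in signatures.items()
             for kv in band_keys(read_id, sig, bands)]
    buckets = group_by_key(pairs)
    candidates = {pair for members in buckets.values()
                  for pair in combinations(sorted(members), 2)}
    results = {}
    for a, b in candidates:
        sig_a, sig_b = signatures[a], signatures[b]
        similarity = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
        if similarity >= min_similarity:
            results[(a, b)] = similarity
    return results


if __name__ == "__main__":
    signatures = {"r1": [1, 2, 3, 4], "r2": [1, 2, 9, 9], "r3": [7, 7, 7, 7]}
    print(candidate_overlaps(signatures, bands=2, min_similarity=0.5))
    # {('r1', 'r2'): 0.5}
```

Only reads that collide in at least one band are ever compared, which is where the saving over all-pairs comparison comes from.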
Transitive Reduction 
• We can find a consensus between clique members 
• Or, we can reduce down: 
• Can be implemented efficiently using graph-optimized 
MapReduce libraries!
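A small single-machine sketch of reducing an overlap graph: an edge a → c is dropped whenever some b gives a → b and b → c. It assumes the overlap graph is acyclic and is not written against any graph-processing library; `transitive_reduction` is an illustrative name.

```python
def transitive_reduction(edges):
    """Remove edges implied by a two-step path in an acyclic overlap graph.

    edges maps each node (read) to the set of reads it overlaps into.
    The input is left untouched and a reduced copy is returned.
    """
    reduced = {node: set(succs) for node, succs in edges.items()}
    for a, succs in edges.items():
        for b in succs:
            for c in edges.get(b, ()):
                # a -> b -> c makes the direct edge a -> c redundant.
                reduced[a].discard(c)
    return reduced


if __name__ == "__main__":
    overlaps = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
    print(transitive_reduction(overlaps))
    # {'A': {'B'}, 'B': {'C'}, 'C': set()}
```

Each node only needs its own successors and its successors' successors, which is why the operation maps well onto graph-parallel systems.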

Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 

CS176: Genome Assembly

  • 1. Genome Assembly Frank Austin Nothaft CS176, 10/16/2014
  • 2. Processing Reads • As we’ve covered before, if we already have a reference assembly, we can process reads by aligning to the reference genome
  • 3. The Sequencing Abstraction • Sequencing performs a Poisson-distributed sampling of substrings from a larger string • Reads are exact substrings (i.e., error free) • Example string: It was the best of times, it was the worst of times… • Example reads: "It was the", "the best of", "best of times", "times, it was", "was the worst", "the worst of", "worst of times" • Metaphor borrowed from Michael Schatz
  • 4. The Alignment Abstraction • Reference: It was the best of times, it was the worst of times… • [Figure: the reads "It was the", "the best of", "best of times", "times, it was", "was the worst", "the worst of", "worst of times" placed at their matching positions along the reference]
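The sampling abstraction on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's code: `sample_reads`, `read_len`, `n_reads`, and `seed` are hypothetical names, and read start positions are drawn uniformly at random, which makes the number of reads starting at any one position only approximately Poisson distributed.

```python
import random

def sample_reads(text, read_len, n_reads, seed=0):
    """Draw n_reads error-free substrings of length read_len at uniform random offsets."""
    rng = random.Random(seed)              # seeded so the example is repeatable
    last_start = len(text) - read_len      # last offset where a full read still fits
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [text[s:s + read_len] for s in starts]

text = "It was the best of times, it was the worst of times"
reads = sample_reads(text, read_len=12, n_reads=20)
```

Every element of `reads` is an exact substring of `text`, matching the error-free assumption stated on the slide.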
  • 5. But! • What do we do if we don’t have a reference genome to map against? • Can we use information in the reads to assemble the reads together into a string?
  • 6. Sequence Assembly • [Figure: the unordered reads "was the worst", "best of times", "It was the", "worst of times", "the best of", "the worst of", "times, it was" are ordered by their overlaps and merged] • Result: It was the best of times, it was the worst of times…
  • 7. The Assembly Problem • Given a set of reads, we want to assemble the “best” contigs possible • Contig = contiguous sequence • Two general formulations for assembly: • Overlap-layout-consensus (OLC) • de Bruijn graph (DBG)
  • 8. Assembly was the Human Genome Project! (in a nutshell)
  • 9. Assembly is Graph Traversal • In OLC, we create an overlap graph, and find a Hamiltonian path • In DBG, we create a de Bruijn graph, and find an Eulerian path
  • 10. Overlap Graphs • Given a set of reads, an overlap graph represents how these reads overlap • Nodes are reads, edges are overlaps
  • 11. Example Overlap Graph • [Figure: overlap graph whose nodes are the reads "It was the", "the best of", "best of times", "times, it was", "was the worst", "the worst of", "worst of times"]
  • 12. Hamiltonian Path • A Hamiltonian Path is a path which visits each node in the graph exactly once
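To make the definition concrete, here is a brute-force sketch that tries every ordering of the nodes and keeps those in which each consecutive pair is joined by a directed edge. The four-node graph and the name `hamiltonian_paths` are made up for illustration. The search examines all n! orderings, so it is only usable on toy inputs.

```python
from itertools import permutations

def hamiltonian_paths(nodes, edges):
    """Yield every ordering of nodes in which each consecutive pair is a directed edge."""
    for order in permutations(nodes):
        if all((a, b) in edges for a, b in zip(order, order[1:])):
            yield order

# Hypothetical toy graph, not taken from the slides.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")}
paths = list(hamiltonian_paths(nodes, edges))
```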
  • 13. Computing Overlaps • To compute overlaps between two reads, we compute the pairwise alignment of these two reads • This can be done using dynamic programming (Smith-Waterman) or a profile HMM • We can accelerate this with indexing-based methods, similar to those in SNAP
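As a simplified stand-in for the pairwise alignment described here, the sketch below finds the longest exact suffix-prefix match between two reads. This is not Smith-Waterman and tolerates no mismatches; it relies on the earlier assumption that reads are error free. The names `overlap` and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

reads = ["It was the", "the best of", "best of times", "times, it was",
         "was the worst", "the worst of", "worst of times"]

# Overlap graph: nodes are reads, a directed edge (a, b) is weighted by the overlap length.
graph = {}
for a in reads:
    for b in reads:
        if a != b:
            n = overlap(a, b)
            if n > 0:
                graph[(a, b)] = n
```

The double loop over `reads` is the all-pairs comparison that makes overlap computation expensive as the number of reads grows.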
  • 14. Two Problems 1. Overlapping is expensive: • Must compute O(n^2) overlaps, n = # reads • Computing an overlap is O(l^2), l = read length 2. Hamiltonian Path is NP-hard: • Approximate solvers exist, but don’t scale up to genomics datasets
  • 15. de Bruijn Graphs • In a de Bruijn graph, nodes are k-mers, and edges represent observed transitions between k-mers • k-mers are k-length substrings from reads • [Figure: the read ACACTGCACT broken into 3-mers]
  • 16. de Bruijn Graphs • In a de Bruijn graph, we may have multiple paths between two nodes • [Figure: de Bruijn graph of the read ACACTGCACT with nodes ACA, CAC, ACT, GCA, TGC, CTG]
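Combining the two costs on this slide gives the total work for all-pairs overlap by dynamic programming. The values of n and l below are hypothetical round numbers chosen only to show the scale; they are not taken from the lecture.

```latex
T_{\text{overlap}} = O(n^2) \cdot O(l^2) = O(n^2 l^2),
\qquad
n = 10^{9},\; l = 10^{2}
\;\Longrightarrow\;
n^2 l^2 = 10^{18} \cdot 10^{4} = 10^{22}
```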
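A minimal sketch of the construction described here, using the read ACACTGCACT from the slide and assuming k = 3 (the k-mer length visible in the slide's node labels). The function names are hypothetical.

```python
def kmers(read, k):
    """All k-length substrings of read, in order of appearance."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_edges(read, k):
    """One directed edge per observed transition between consecutive k-mers."""
    ks = kmers(read, k)
    return list(zip(ks, ks[1:]))

nodes = kmers("ACACTGCACT", 3)
edges = de_bruijn_edges("ACACTGCACT", 3)
```

Repeated k-mers such as CAC and ACT collapse into a single node, while each observed transition stays as its own edge, so the graph can hold parallel edges.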
  • 17. Eulerian Path • In an Eulerian path, we use every edge exactly once • Preconditions for finding an Eulerian path assembly on a DBG: 1. One node must have one more edge leaving than entering 2. One node must have one more edge entering than leaving 3. All other nodes must have equal numbers of edges entering and leaving
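The three degree conditions can be checked directly from an edge list. The sketch below is one way to do it; `eulerian_path_endpoints` is a hypothetical name. It checks only the degree conditions listed on the slide and does not verify that the graph is connected, which a full test would also need.

```python
from collections import Counter

def eulerian_path_endpoints(edges):
    """Return (start, end) if the degree conditions hold, otherwise None."""
    out_deg, in_deg = Counter(), Counter()
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]   # one extra edge leaving
    ends = [n for n in nodes if in_deg[n] - out_deg[n] == 1]     # one extra edge entering
    balanced = [n for n in nodes if in_deg[n] == out_deg[n]]
    if len(starts) == 1 and len(ends) == 1 and len(balanced) == len(nodes) - 2:
        return starts[0], ends[0]
    return None

# Edges of the 3-mer graph for the read ACACTGCACT.
read = "ACACTGCACT"
ks = [read[i:i + 3] for i in range(len(read) - 2)]
endpoints = eulerian_path_endpoints(list(zip(ks, ks[1:])))
```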
  • 18. Finding an Eulerian Path • Connect the two nodes with unbalanced edges • This provides us an Eulerian cycle • From an arbitrary node n, walk the graph until we return to n, and save the path we’ve walked • Until all edges have been used: • Pick a point n’ from our path, where n’ has unused edges • Walk from n’ until we return to n’, track visited edges, and splice this cycle into the path at n’
  • 19. Problems with Eulerian Path • For a given graph, we may have multiple valid paths! • [Figure: the same de Bruijn graph over the nodes ACA, CAC, ACT, CTG, TGC, GCA, CAA, AAT, ATG, drawn twice with its edges numbered 1 to 11 in two different traversal orders] • First traversal spells ACACTGCACAATGC • Second traversal spells ACAATGCACACTGC
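The walk-and-splice procedure on this slide is commonly implemented with a stack (Hierholzer's algorithm). The sketch below is that stack-based variant rather than a literal transcription of the slide: instead of adding a closing edge to form a cycle, it starts at the node with the extra outgoing edge, which yields the same path. Function names are hypothetical.

```python
from collections import defaultdict

def eulerian_path(edges, start):
    """Return a node sequence from start that uses every edge exactly once."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj[node]:
            stack.append(adj[node].pop())   # keep walking along an unused edge
        else:
            path.append(stack.pop())        # dead end: node is final, backtrack
    return path[::-1]

def spell(path):
    """Concatenate overlapping k-mers along a path into one string."""
    return path[0] + "".join(node[-1] for node in path[1:])

read = "ACACTGCACT"
ks = [read[i:i + 3] for i in range(len(read) - 2)]
edges = list(zip(ks, ks[1:]))
assembled = spell(eulerian_path(edges, "ACA"))
```

The backtracking step in the loop plays the role of the splice: a node is only committed to the path once all of its outgoing edges have been used.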
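The ambiguity can be reproduced by enumerating every Eulerian path of the 3-mer graph built from one of the slide's two sequences. This is a small backtracking sketch with hypothetical function names; it is exponential in general and only meant for a graph this small.

```python
def kmer_edges(seq, k):
    """Edges between consecutive k-mers of seq."""
    ks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return list(zip(ks, ks[1:]))

def all_eulerian_paths(edges, start):
    """Return every node sequence from start that uses each edge exactly once."""
    results = []

    def extend(path, unused):
        if not unused:
            results.append(path)
            return
        tried = set()
        for i, (u, v) in enumerate(unused):
            if u == path[-1] and v not in tried:
                tried.add(v)  # parallel edges to the same node give identical paths
                extend(path + [v], unused[:i] + unused[i + 1:])

    extend([start], list(edges))
    return results

def spell(path):
    """Concatenate overlapping k-mers along a path into one string."""
    return path[0] + "".join(node[-1] for node in path[1:])

edges = kmer_edges("ACACTGCACAATGC", 3)
assemblies = sorted({spell(p) for p in all_eulerian_paths(edges, "ACA")})
```

Both sequences shown on the slide, ACACTGCACAATGC and ACAATGCACACTGC, appear in `assemblies`, even though the graph was built from only one of them.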
  • 20. How Many Paths? Kingsford et al, BMC Bioinformatics 2010
  • 21. How Do We Assemble Multiple Reads? • In practice, de Bruijn graphs are additive • This allows us to merge graphs from multiple reads • When do we keep/remove edges?
  • 22. [Figure: the reads ACACTGC and CTGCACT each broken into 3-mers and drawn as separate de Bruijn graphs, then merged into a single graph with nodes ACA, CAC, ACT, GCA, TGC, CTG]
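One way to picture the additive property is to store each read's graph as a multiset of edges and add the multisets together. The sketch below uses `collections.Counter` for this, with two overlapping reads, ACACTGC and CTGCACT, and an assumed k = 3. Keeping per-edge counts is an illustrative choice, not something the slides prescribe; it gives the keep/remove question something to act on.

```python
from collections import Counter

def edge_counts(read, k):
    """Multiset of k-mer transitions observed in one read."""
    ks = [read[i:i + k] for i in range(len(read) - k + 1)]
    return Counter(zip(ks, ks[1:]))

reads = ["ACACTGC", "CTGCACT"]
merged = sum((edge_counts(r, 3) for r in reads), Counter())
```

Edges observed in both reads end up with a count of 2, so the counts record how much evidence supports each transition.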
  • 23. Errors! • One of the key assumptions that we make in the sequencing process is that reads are correct • But, in reality, reads have a 2% error rate • How does this impact us?
  • 24. What Are The Errors Like? • Example pair from the slide: ACATATAGAA vs. AGATATAGAN • Currently, the most common sequencing technology is called Illumina • Errors tend to be a misread of a single base • Errors tend to be clustered at the ends of reads
  • 25. Errors In Action • [Figure: de Bruijn graph built from reads that contain errors; the node labels are 19-mers, many differing from a neighbor by a single base, and some consisting largely of N (no-call) bases]
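A single misread base changes every k-mer that covers it, so one error adds several spurious nodes to the graph. The sketch below counts k-mers across reads and keeps only those seen at least `min_count` times. The threshold filter, the value k = 4, and the made-up coverage of three correct copies are assumptions for illustration; the slides do not specify an error-handling method.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count how many times each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(reads, k, min_count=2):
    """k-mers observed at least min_count times (hypothetical filtering rule)."""
    return {km for km, c in kmer_counts(reads, k).items() if c >= min_count}

# Three copies of one sequence from the slide plus the slide's differing version.
reads = ["ACATATAGAA", "ACATATAGAA", "ACATATAGAA", "AGATATAGAN"]
counts = kmer_counts(reads, 4)
solid = solid_kmers(reads, 4)
```

The k-mers that occur only in the differing read (AGAT, GATA, AGAN) have a count of 1 and are dropped, while the k-mers shared by all four reads are kept.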
  • 26. This graph comes from… • [Figure: de Bruijn graph with 19-mer node labels, some containing N (no-call) bases]
TGTAATTTCATGAGTGGGG GTAATTTCATGAGTGGGGT AACTGCCCTCCCCCTAAAG ACTGCCCTCCCCCTAAAGC CTGCCCTCCCCCTAAAGCT CCCTTAAAGCTTCCACACT CCTTAAAGCTTCCACACTT CTTAAAGCTTCCACACTTG AGCCTCGGAGAAAGCAACA GCCTCGGAGAAAGCAACAT AGAAAGTACCTTCCCCTAA ACTTCCGGACCCCCAGTCA CTTCCGGACCCCCAGTCAT TTCCGGACCCCCAGTCATA ATAGCAACATGATTTTTCA GAGACAAACCCTTGAAAAA TTTGTGCCTACCCCACTCC TTGTGCCTACCCCACTCCC TGTGCCTACCCCACTCCCG AGTTCATTCCCCTAAAGCC GTTCATTCCCCTAAAGCCT TTCATTCCCCTAAAGCCTT AAATCATCAGAAAACTAAG AATCATCAGAAAACTAAGA ATCATCAGAAAACTAAGAA TGGCTATACCACTTACGGG AATTTGTGCCTACCCCACT ATTTGTGCCTACCCCACTC CCGCCAGGCATTAAATTCA CGCCAGGCATTAAATTCAA GCCAGGCATTAAATTCAAG AGCTCCAAGAGACAAGTTT GCTCCAAGAGACAAGTTTT CTCCAAGAGACAAGTTTTG TATTCCTTTGACAGGCCGC ATTCCTTTGACAGGCCGCC CAGTGTATATTGGTGGCTA AGTGTATATTGGTGGCTAT TACCACTGACAGGCCGCCA ACCACTGACAGGCCGCCAG ACGGGCCGCCAGTCATTAA CGGGCCGCCAGTCATTAAA GAAAAAAAGGCAGCCTAGG AAAAAAAGGCAGCCTAGGC AAAAAAAGGCAGCCTAGGA AAAAAAGGCAGCCTAGGCG AAAAAAGGCAGCCTAGGAG CAGCCTCGGAGAAAGCAAC TCAAACTCCAAGAGACAAA CAAACTCCAAGAGACAAAC AAACTCCAAGAGACAAACT CTATCCCCCTGACGGCCCG TATCCCCCTGACGGCCCGC ATCCCCCTGACGGCCCGCC GTGATTTGCCCAGGAGGGG TGATTTGCCCAGGAGGGGG GATTTGCCCAGGAGGGGGC ACCTTTCACACTTGCCTCA CCTTTCACACTTGCCTCAG TCAGTGTATATATGAGGCT CCAAGAGACAAACTCTTGA AGTTATATTTTGAAAAATC GTTATATTTTGAAAAATCA TTATATTTTGAAAAATCAT AAGCTTCAACCCTGGCCTC GGACGGCCCGCCAGTCATT GACGGCCCGCCAGTCATTA AAGGCAGCCTCGGAGAAAG AGGCAGCCTCGGAGAAAGC GGCAGCCTCGGAGAAAGCA CCACAGTGAAAATTTGTGC CACAGTGAAAATTTGTGCC ACAGTGAAAATTTGTGCCT GAAAGTACCTTCCCCTAAA AAAGTACCTTCCCCTAAAG GTTCCTTCCCCTAAAGCTT TTCCTTCCCCTAAAGCTTT ACTTGCCTAGGTGAATATA TCCTTCCCCTAAAGCTTTC GAAGATAGACTCAAGGGAC AAGATAGACTCAAGGGACA AGATAGACTCAAGGGACAA AATNNTCAGAAAACTGAGA ATNNTCAGAAAACTGAGAA TTCACTCTTGCCTCAGTGT TCACTCTTGCCTCAGTGTA CACTCTTGCCTCAGTGTAT AGGTGGCTATTCCTTTGAC GGTGGCTATTCCTTTGACA GTGGCTATTCCTTTGACAG AAGGATAGACTTTCTAGAA GGTGGGTATCCCGCTGACA GTGGGTATCCCGCTGACAG AAGTTTTGGAAAAAAAGGC AGTTTTGGAAAAAAAGGCA GTTTTGGAAAAAAAGGCAG TTGCCTAGGTGAATATAGG TGCCTAGGTGAATATAGGT NNNNNNANATTNTNANAAA NNNNNANATTNTNANAAAT 
NNNNANATTNTNANAAATN TCAGTGTATATATGGGGCT CAGTGTATATATGGGGCTA AGTGTATATATGGGGCTAT AGGATAGACTTTCTAGAAA GGATAGACTTTCTAGAAAG TCCCCCTAAAGCTTTCACA CCCCCTAAAGCTTTCACAC CCCCTAAAGCTTTCACACT GCGCGAACCCACGGACAGG CGCGAACCCACGGACAGGC GCGAACCCACGGACAGGCC TGGGTATACCACTGACAGG GGGTATACCACTGACAGGC TGCCTATACCACGGACGGC GCCTATACCACGGACGGCC CAGAAAACTGAGAATCAAG GTGCCTACCCCACTCCCGG ATTCAAACTCCAAGAGACA TTCAAACTCCAAGAGACAA TGGAAAAAAAGGCAGCCTA GGAAAAAAAGGCAGCCTAG TCATTAAATTCAAGCTCCA TCCGGACCCCCAGTCATAA GCTTTCACACTTGCCTCGG CTTTCACACTTGCCTCGGT TTTCACACTTGCCTCGGTG CTTCCCTCAGTGTATATAT TTCCCTCAGTGTATATATG TCCCTCAGTGTATATATGT GTGAAAATTTGTGCCTACC TGAAAATTTGTGCCTACCC GGTATACCACTGACAGGCC AATGCAAGCTCCAAGAGAC AACCCTGGCCTCAGTGTAT ACCCTGGCCTCAGTGTATA CCCTGGCCTCAGTGTATAT AGTACCTTCCCCTAAAGCT GTACCTTCCCCTAAAGCTT TACCTTCCCCTAAAGCTTT ATACCACGGACGGCCCGCC TACCACGGACGGCCCGCCA ACCACGGACGGCCCGCCAG CCCACTGACGGGCCGCCAG CCACTGACGGGCCGCCAGT GCCGCCAGTCATTAAATTC CCGCCAGTCATTAAATTCA CGCCAGTCATTAAATTCAA ATTTTGAAAAAACATCAGA TTTTGAAAAAACATCAGAA TTTGAAAAAACATCAGAAA GAAAAAACATCAGAAAACT AAAAAACATCAGAAAACTG AAAAACATCAGAAAACTGA GTCATTAAATTCAACCACC TCATTAAATTCAACCACCA GGACAAAGCAGTAAAATGT CTATACCACTTACGGGCCG TATACCACTTACGGGCCGC ATACCACTTACGGGCCGCC TCATTAAATTCAAACTCCA AATATAGGTGGGTATCCCG ATATAGGTGGGTATCCCGC CCTGGCCTCAGTGTATATA GAGAAAGCAACCGGATTTT AGAAAGCAACCGGATTTTT GAAAGCAACCGGATTTTTC ATAGATTTTCTAGAAAGTT TAGATTTTCTAGAAAGTTC AGATTTTCTAGAAAGTTCC GATAGACTTTCTAGAAAGT GTATATATGTGGGTATACC CCTAAAGCTTCAACCCTGG CCCCTAAAGCTTTCACAGT CCCTAAAGCTTTCACAGTT CCTAAAGCTTTCACAGTTG TCCAAGAGACAAACTCTTG TCCCCTAAAGCTTTCACTC CCCCTAAAGCTTTCACTCT CCCTAAAGCTTTCACTCTT AGAATTTCTAGAAAGTTCA GAATTTCTAGAAAGTTCAT AATTTCTAGAAAGTTCATT ATTTTCTAGAAAGTTCCTT TTTTCTAGAAAGTTCCTTC AAGAATCAAGGATAGAAGT CTCAGTGTATATTGGTGGC TCAGTGTATATTGGTGGCT GCTATCCCCCTGACGGCCC TAAAGCTTCCACACTTGCC CTTGAAAAAAAGCCAGCCT CAGTAAAATGTGTAATTTC AGTAAAATGTGTAATTTCA NNNANATTNTNANAAATNN NNANATTNTNANAAATNNT NANATTNTNANAAATNNTC CCTAGGAGAAAGAAACATG TCTTCCCACTTCCGGACCC CTTCCCACTTCCGGACCCC TNANAAATNNTCAGAAAAC 
NANAAATNNTCAGAAAACT ANAAATNNTCAGAAAACTG GACGGGCCGCCAGTCCTTA ACGGGCCGCCAGTCCTTAA CGGGCCGCCAGTCCTTAAA TGTAAATATGTGGCTATAC GTAAATATGTGGCTATACC TAAATATGTGGCTATACCA CCCTGACGGCCCGCCAGTC CCTGACGGCCCGCCAGTCA CTGACGGCCCGCCAGTCAT CTACACCACCGACGGCCCG TACACCACCGACGGCCCGC ACACCACCGACGGCCCGCC TCATTCCCCTAAAGCCTTC CATTCCCCTAAAGCCTTCA ATTCCCCTAAAGCCTTCAC TGTTATATTTTGAAAACTC GTTATATTTTGAAAACTCA ATTAAATTCAAGCCCCAAG TTAAATTCAAGCCCCAAGA TAAATTCAAGCCCCAAGAG CCCCTAAAGCTTCCACACT GGATAGATTTTCTAGAAAG GATAGATTTTCTAGAAAGT GCTATACCACTGACAGGCC ACTCTTGAAAAAAAGGCAG CTCTTGAAAAAAAGGCAGC CACACTTGCCTCAGTGTAT ACACTTGCCTCAGTGTATA CACTTGCCTCAGTGTATAT CAGTCTTTACTGGTGCTCT AGTCTTTACTGGTGCTCTT ATGTAACAAATGTGATTTG TGTAACAAATGTGATTTGC GTAACAAATGTGATTTGCC AAAAGGCAGCCTCGGAGAA AAAGGCAGCCTCGGAGAAA GATAGAATTTCTAGAAAGT ATAGAATTTCTAGAAAGTT TAGAATTTCTAGAAAGTTC AAGTACCTTCCCCTAAAGC CAAGAGACAAACTCTTGAA GCTATCCCCCTGAGGGGCC CTATCCCCCTGAGGGGCCG TCCAGTCATTAAATTCAAG CCAGTCATTAAATTCAAGC CAGTCATTAAATTCAAGCC CAGTCATTAAATTCAAGCT CCCACGGACAGGCCGCCAG AACTCTTGAAAAAAAGGCA TTCCACACTTGCCTAGGTG TCCACACTTGCCTAGGTGA CCCCTAAGGCTTTCACACT CCCTAAGGCTTTCACACTT CTTGCCTCAGTGTAAATAT TTGCCTCAGTGTAAATATG TGCCTCAGTGTAAATATGT CTCTTGAAAAAAAGCCAGC TCTTGAAAAAAAGCCAGCC AACCACCAAGAGACAAACT ACCACCAAGAGACAAACTC TGGCTATCCCCCTGACAGG GGCTATCCCCCTGACAGGC GCTATCCCCCTGACAGGCC AGCTTTCACACTTGCCTCA TATACTTTGAAAAATCATC CTTGCCTAGGTGAATATAG TTGAAAAAACATCAGAAAA TGAAAAAACATCAGAAAAC TACCACTTACGGGCCGCCA GCCCTAAAGATTTCACACT GCCGCCAGTCATTAAATGC CCGCCAGTCATTAAATGCA CGCCAGTCATTAAATGCAA GAAATTTCCTTCCCCTAAA TTGCCCAGGAGGGGGCGTC TGCCCAGGAGGGGGCGTCC GCCCAGGAGGGGGCGTCCA AACAAACTCTTGAAAAAAA ACAAACTCTTGAAAAAAAG AAATCATCAGAAAACTGAG TCCCCTAAAGCTTTCACAC TCCCCTAAAGCTTTCACAG CTGGTGCTCTTCCCACTTC TGGTGCTCTTCCCACTTCC TCCAAGAAACAAACTCTTG AAGCTTCCACACTTGCCTA AAAGCAACATGATTTTTCT AAAGCAACATGATTTTTCA GACGGGCCGCCAGTCATTA GGGGGCGTCCAGTCATTAA GGGGCGTCCAGTCATTAAA GGAGATAGCAACATGATTT GAGATAGCAACATGATTTT AGATAGCAACATGATTTTT TAGGAGAAAGCAGCATGAT TGCCCTCCCCCTAAAGCTT GCCCTCCCCCTAAAGCTTC CCCTCCCCCTAAAGCTTCA 
TGGGGTCTCCAGTCATTAA GGGGTCTCCAGTCATTAAA GGGTATCCCGCTGACAGGC GGTATCCCGCTGACAGGCC GTATCCCGCTGACAGGCCC AAAGTTCTTTGCCCTAAAG AAGTTCTTTGCCCTAAAGA AGTTCTTTGCCCTAAAGAT GCCCGCCAGGCATTAAATT GCCTCAGTGTATATATGGG CCTCAGTGTATATATGGGG CTCAGTGTATATATGGGGC ACCTTCCCCTAAAGCTTTC AGCCAGCCTAGGAGAAAGC GGCCGCCAGTCATTAAATG CAGCCTAGGAGATAGCAAC AGCCTAGGAGATAGCAACA GAGGGGGCGTCCAGTCATT AGGGGGCGTCCAGTCATTA ATTTTGAAAACTCATCAGA TTTTGAAAACTCATCAGAA TTTGAAAACTCATCAGAAA ATTTGCCCAGGAGGGGGCG TTTGCCCAGGAGGGGGCGT AAATTCAAGCTCCAAGAAA AATTCAAGCTCCAAGAAAC GGACAGGCCGCCAGTCATT AACTCCAAGAGACAAACTC ACTCCAAGAGACAAACTCT GCTATCCCACTGACGGGCC CTATCCCACTGACGGGCCG TATCCCACTGACGGGCCGC CAGGCCCCCAGTCATTAAA CGTAAAGCTTTCACACTTG TAATTTCATGAGTGGGGTC AATTTCATGAGTGGGGTCT GGGCCGCCAGTCCTTAAAT ATGTGGCTATCCCACTGAC TGTGGCTATCCCACTGACA TGTGGCTATCCCACTGACG GTGGCTATCCCACTGACAG GCTATACCACTTACGGGCC ACAAAGCAGTAAAATGTGT TATATGTGGCTATACCACT ATATGTGGCTATACCACTG ATATGTGGCTATACCACTT ATCCCGCTGACAGGCCCCC GTTCTTTGCCCTAAAGATT CAGCCTAGGAGAAAGAAAC AGAATCAAGGATAGAAGTT TCCAAGAGACAAGTTTTGG GTATATATGTGGCTATCCC TATATATGTGGCTATCCCC ATATATGTGGCTATCCCCC TGACGGGCCGCCAGTCATT CTTGCCACAGTGAAAATTT TTGCCACAGTGAAAATTTG TGCCACAGTGAAAATTTGT CCCTCAGTGTATATATGTG CCTCAGTGTATATATGTGG AAAACTAAGAATCAAGGAT AAACTAAGAATCAAGGATA AACTAAGAATCAAGGATAG GATAGCAACATGATTTTTC GCCTCAGTGTAAATATGTG CCTCAGTGTAAATATGTGG CACTGACAGGCCGCCAGTC ACTGACAGGCCGCCAGTCA CTGACAGGCCGCCAGTCAT CCCTGAGGGGCCGCCAGTC CCTATACCACGGACGGCCC CTATACCACGGACGGCCCG ATGTGACTACACCACCGAC TGTGACTACACCACCGACG GTGACTACACCACCGACGG TTCCTTTGACAGGCCGCCA TCCTTTGACAGGCCGCCAG AGAGACAAGTTTTGGAAAA ANATTNTNANAAATNNTCA AATCAAGGATAGAATTTCT CCCACTTCCCTCAGTGTAT TATCCCGCTGACAGGCCCC TTAACACTTGCCACAGTGA TAACACTTGCCACAGTGAA AACACTTGCCACAGTGAAA CTCCAGTCATTAAATTCAA TATACCACGGACGGCCCGC NNNNNNTTTCTGAATGTTT NNNNNTTTCTGAATGTTTC NNNNTTTCTGAATGTTTCT GTCATTAAATTCAAGCTCA TCATTAAATTCAAGCTCAA CATAAAATTCAAGCTCCAA CATTAAATTCAAGCTCAAA ATAAAATTCAAGCTCCAAG TAAAATTCAAGCTCCAAGA CTATCCCCCTGACAGGCCG TATCCCCCTGACAGGCCGC CACTTACGGGCCGCCAGTC ACTTACGGGCCGCCAGTCA 
CTTACGGGCCGCCAGTCAT TGTTATATTTTGAAAAATC GAATCAAGGATAGACTTTC AATCAAGGATAGACTTTCT AGGAGAAAGCAACATGATT GGAGAAAGCAACATGATTT NAAATNNTCAGAAAACTGA AAATNNTCAGAAAACTGAG GGCTATACCACTTACGGGC CATCAGAAAACTAAGAATC ATCAGAAAACTAAGAATCA TCAGAAAACTAAGAATCAA TTCCTTCCCCTAAACCTTT TCCTTCCCCTAAACCTTTC CCTTCCCCTAAACCTTTCA TGGCTATCCCACTGACAGG AATTTCTAGAAATTTCCTT ATATATGAGGCTATACCAC CCCGGGCCGCCAGTCATTA CCGGGCCGCCAGTCATTAA GAATATAGGTGGGTATCCC GCTTTAACACTTGCCACAG CTTTAACACTTGCCACAGT TTTAACACTTGCCACAGTG GAAAACTGAGAATCAAGGA AAAACTGAGAATCAAGGAT CAGTCATTAAATTCAACCA AGTCATTAAATTCAACCAC CTCCAAGAAACAAACTCTT AACCTTTCACACTTGCCTC CGCAGCCTAGGAGATAGCA GCAGCCTAGGAGATAGCAA AAAACATCAGAAAACTGAG AAACATCAGAAAACTGAGA AACATCAGAAAACTGAGAA CCAGTCATAAAATTCAAGC TATATTGGTGGCTATCCCC ATATTGGTGGCTATCCCCC TCGGTGTATATATGTGGCT CGGTGTATATATGTGGCTA GGTGTATATATGTGGCTAT TGTTATATTTTGAAAAAAC AGAAAGTTCCTCCCCCTAA GCTCAAAGAGACAAACTCT AGCCTAGGAGAAAGCAACA GACTACACCACCGACGGCC ACTACACCACCGACGGCCC TCCCACTGACGGGCCGCCA AGTCATAAAATTCAAGCTC GTCATAAAATTCAAGCTCC TCATAAAATTCAAGCTCCA TCCACACTTGCCTCAGTGT ATTTTGAAAAATCATCAGA TTTTGAAAAATCATCAGAA TTTGAAAAATCATCAGAAA TAGAAGTTCTAGAAAGTTC AGAAGTTCTAGAAAGTTCT GAAGTTCTAGAAAGTTCTT ATGTGGCTATACCACTGAC TGTGGCTATACCACTGACA TGTGGCTATACCACTGACG GTGGCTATACCACTGACGG AGTCATTAAATTCAAGCTC GTCATTAAATTCAAGCTCC CTAAAGCTTTCCCACTTCC ATCCCACTGACGGGCCGCC CCTCCCCCTAAGGCTTTCA CGAACCCACGGACAGGCCG GATTTTCTAGAAAGTTCCT AGTGTATATATGTGGGTAT GTGTATATATGTGGGTATA TGTATATATGTGGGTATAC AAATATGTGGCTATACCAC NNNTTTCTGAATGTTTCTT NNTTTCTGAATGTTTCTTA TATGTGGCTATACCACTGA TATGTGGCTATACCACTTA CACCACCGACGGCCCGCCA NATTNTNANAAATNNTCAG GAGAATCAAGGATAGAATT GGCTATCCCACTGACAGGC GCTATCCCACTGACAGGCC CTATCCCACTGACAGGCCG TAGAAAGTTCCTCCCCCTA CCTCCCCCTAAAGCTTCAA CTCCCCCTAAAGCTTCAAC TCCCCCTAAAGCTTCAACC GGTGGCTATCCCCCTGAGG CCACGGACGGCCCGCCAGT ATTCAAGCCCCAAGAGACA TTCAAGCCCCAAGAGACAA TCAAGCCCCAAGAGACAAA TATATGAGGCTATACCACT CTAAGAATCAAGGATAGAC TAAGAATCAAGGATAGACT TGGTGGCTATCCCCCTGAG AGTCATTAAATTCAAGCCC GTCATTAAATTCAAGCCCC CCAGGCATTAAATTCAAGC AAGTTCATTCCCCTAAAGC 
CCCTAAAGCTTTCACACTT CCTAAAGCTTTCACACTTG CTAAAGCTTTCACACTTGC GTAAGAACTGCCCTCCCCC AAACTCTTGAAAAAAAGGC CCCTAAACCTTTCACACTT CCTAAACCTTTCACACTTG CTAAACCTTTCACACTTGC TATCCCCCTGAGGGGCCGC ATCCCCCTGAGGGGCCGCC AAAAGGCAGCCTAGGCGAA AAAGGCAGCCTAGGCGAAA AAGGCAGCCTAGGCGAAAG GCTCCAAGAGACAAACCCT TTTCACTCTTGCCTCAGTG TAAACCTTTCACACTTGCC AAACCTTTCACACTTGCCT TATATGTGGCTATCCCCCT ATATGTGGCTATCCCCCTG TATGTGGCTATCCCCCTGA TTCCCCTAAACCTTTCACA TCCCCTAAACCTTTCACAC CCCCTAAACCTTTCACACT CCACACTTGCCTAGGTGAA AGACTCAAGGGACAAAGCA GACTCAAGGGACAAAGCAG GAGGAGAAAGCAACCGGAT AGGAGAAAGCAACCGGATT TATATTTTGAAAAATCATC ATATTTTGAAAAATCATCA TATTTTGAAAAATCATCAG CCAGTCATTAAATTCAAAC TTCTAGAAAGTTCCTTCCC ACCACTGACGGGCCGCCAG CCTCCCCGTAAAGCTTTCA GAAAGAAACATGATTTTTC ATTNTNANAAATNNTCAGA CCCCCTAAAGCTTTCCCAC CCCCTAAAGCTTTCCCACT CCCTAAAGCTTTCCCACTT AAGAATCAAGGATAGACTT AGAATCAAGGATAGACTTT GTCATTAAATGCAAGCTCC TCATTAAATGCAAGCTCCA TCAGTGTAAATATGTGGCT CAGTGTAAATATGTGGCTA AGTGTAAATATGTGGCTAT CTATACCACTGACGGGCCG TATACCACTGACGGGCCGC GAGACAAGTTTTGGAAAAA GGTGAATATAGGTGGGTAT GTGAATATAGGTGGGTATC TGAATATAGGTGGGTATCC AACTCTTGAAAAAAAGCCA ACTCTTGAAAAAAAGCCAG TGCCTCGGTGTATATATGT GCCTCGGTGTATATATGTG CCTCGGTGTATATATGTGG AATATGTGGCTATACCACT AAAAAGGCAGCCTAGGAGA GAATTTCTAGAAAGTTCCT ACCCACGGACAGGCCGCCA AATTTCTAGAAAGTCCCTC ATTTCTAGAAAGTCCCTCC CAAACTCTTGAAAAAAAGG CAAACTCTTGAAAAAAAGC AGGCAGCCTAGGAGAAAGC ATATGTGACTACACCACCG TATGTGACTACACCACCGA GTTATATTTTGAAAAAACA ATTTCTAGAAAGTTCATTC TTTCTAGAAAGTTCATTCC TTCACACTTGCCTCGGTGT AGCCTAGGAGAAAGAAACA GCCTAGGAGAAAGAAACAT GTGTAAATATGTGGCTATA AGGCAGCCTAGGCGAAAGC GGCAGCCTAGGCGAAAGCA TCATCAGAAAACTAAGAAT TAGGAGATAGCAACATGAT AGGAGATAGCAACATGATT ATATGGGGCTATACCACTG TATGGGGCTATACCACTGA ATGGGGCTATACCACTGAC GACAAAGCAGTAAAATGTG CTAGAAAGTTCATTCCCCT CTAGAAAGTTCCTCCCCCT AGAGACAAACCCTTGAAAA NNNNNNNTTTCTGAATGTT GACGGCCCGCCAGGCATTA GTCCAGTCATTAAATTCAA TTATATTTTGAAAAAACAT GGAGAAAGCAACCGGATTT TGCCTACCCCACTCCCGGG GCCTACCCCACTCCCGGGC TGAAAACTCATCAGAAAAC GAAAACTCATCAGAAAACT AAAACTCATCAGAAAACTG AGTCAATATATGTGACTAC GTCAATATATGTGACTACA 
TCAATATATGTGACTACAC AAGTTCTAGAAAGTTCTTT AGTTCTAGAAAGTTCTTTG TTGCCTCGGTGTATATATG AAACTCATCAGAAAACTGA TGAAAAAAAGGCAGCCTAG TAACAAATGTGATTTGCCC ATCCCCCTGACAGGCCGCC AAAGGCAGCCTAGGAGAAA AAGGCAGCCTAGGAGAAAG AAATTCAAGCTCAAAGAGA AATTCAAGCTCAAAGAGAC ATTCAAGCTCAAAGAGACA TCCCACCAGTCATTAAATT CCCACCAGTCATTAAATTC CCACCAGTCATTAAATTCA GGAGAAAGAAACATGATTT GAGAAAGAAACATGATTTT TACCACGGACAGGCCGCCA CAACCACCAAGAGACAAAC AAAAAAGCCAGCCTAGGAG AAATTCAAGCCCCAAGAGA AATTCAAGCCCCAAGAGAC CCTCAGTCAATATATGTGA CTCAGTCAATATATGTGAC TCAGTCAATATATGTGACT AGAAAACTGAGAATCAAGG ATTTCTAGAAATTTCCTTC TTTCTAGAAATTTCCTTCC TAAATGCAAGCTCCAAGAG GAAAGCAGCATGATTATTC TTTCTAGAAAGTCCCTCCC CCCTCCCCCTAAGGCTTTC GCTTTCCCACTTCCCTCAG TGGGTATCCCGCTGACAGG CACCAGTCATTAAATTCAA ACCAGTCATTAAATTCAAC TGACGGGCCGCCAGTCCTT GTCCCTCCCCCTAAGGCTT TCCCTCCCCCTAAGGCTTT TCCCCTAAAGCTTCCACAC CCTAAAGCTTTCCCACTTC ATGTGGCTATCCCCCTGAC TGTGGCTATCCCCCTGACA TGTGGCTATCCCCCTGACG GTGGCTATCCCCCTGACAG AATTCAACCACCAAGAGAC AAAGCAGCATGATTATTCA TAAGAACTGCCCTCCCCCT AAGAACTGCCCTCCCCCTA GTGTATATATGGGGCTATA GCTTTCACAGTTGACTCAG CTTTCACAGTTGACTCAGT TTTCACAGTTGACTCAGTG AGCTTTCCCACTTCCCTCA CCTAAAGCTTTCACTCTTG CCCAGTCATTAAATTCAAG CTAAAGCTTTCACTCTTGC TAAGGCTTTCACACTTGCC AAGGCTTTCACACTTGCCT AGGCTTTCACACTTGCCTC GGCCCCCAGTCATTAAATT GCCCCCAGTCATTAAATTC CCCCCAGTCATTAAATTCA ACCACTTACGGGCCGCCAG ATCAAGGATAGAATTTCTA TTCTAGAAATTTCCTTCCC TCTAGAAATTTCCTTCCCC GAACTGCCCTCCCCCTAAA CAGGAGGGGGCGTCCAGTC CTCAGTGTAAATATGTGGC GGCTTTCACACTTGCCTCA GCCACAGTGAAAATTTGTG AAATATGTGCCTATACCAC AATATGTGCCTATACCACG TTCAAGCTCCAAGAGACAA CTAAGGCTTTCACACTTGC TCCCCCTGACAGGCCGCCA CCCCCTGACAGGCCGCCAG TACCACTGACGGGCCGCCA AATCATCAGAAAACTGAGA GAGACAAACTCTTGAAAAA CGACGGCCCGCCAGGCATT ATATGAGGCTATACCACTG CAAGCCCCAAGAGACAAAC CCCCACTCCCGGGCCGCCA ATATGTGGCTATCCCACTG TATGTGGCTATCCCACTGA AAATGCAAGCTCCAAGAGA AGGATAGAAGTTCTAGAAA AAGAGACAAACCCTTGAAA TTGAAAACTCATCAGAAAA CAAGCTCCAAGAAACAAAC TAGGAGAAAGCAACATGAT TCCCCCTGAGGGGCCGCCA CGGACAGGCCGCCAGTCAT AGGCCCCCAGTCATTAAAT CCTACCCCACTCCCGGGCC CCAGTCATTAAATTCAACC ATAGACTCAAGGGACAAAG 
TAGACTCAAGGGACAAAGC CTCGGTGTATATATGTGGC TCAGTGTATATATGTGGGT CAGTGTATATATGTGGGTA TAGGTGGCTATTCCTTTGA CCCCAGTCATTAAATTCAA GTTTATATATGTGGCTATC TTTATATATGTGGCTATCC AGACAAACTCTTGAAAAAA GACAAACTCTTGAAAAAAA CACACTGGCCTCAGTGTAT ACACTGGCCTCAGTGTATA GCCAGTCATTAAATTCAAG GCCAGTCATTAAATTCAAA TTAAAGCTTCCACACTTGC ATTAAATTCAAGCTCAAAG AAACTCTTGAAAAAAAGCC TAGGTGAATATAGGTGGGT CCGACGGCCCGCCAGGCAT TGGGGCTATACCACTGACA GCCAGTCATTAAATGCAAG CCAGTCATTAAATGCAAGC CAGTCATTAAATGCAAGCT TTAAATTCAAGCTCAAAGA TAAATTCAAGCTCAAAGAG TTGAAAAATCATCAGAAAA CGGAGAAAGCAACATGATT CCCCTAAAGCCTTCACACT CCCTAAAGCCTTCACACTT CCTAAAGCCTTCACACTTG AAAAGGCAGCCTAGGAGAA AAAGCTTTCACAGTTGACT AAGCTTTCACAGTTGACTC AGCTTTCACAGTTGACTCA TATATGTGGCTATCCCACT TATATGTGACTACACCACC TATATGGGGCTATACCACT AGTCATTAAATGCAAGCTC TCACACTTGCCTCGGTGTA TCAAGCTCAAAGAGACAAA CCTCAGTGTATATTGGTGG AAAAAGCCAGCCTAGGAGA CCACTTACGGGCCGCCAGT CAGTCAATATATGTGACTA CCCCCCAGTCATTAAATTC AACCCACGGACAGGCCGCC CTTCCCCTAAACCTTTCAC CCAGTCTTTACTGGTGCTC TTCACACTTGCCTCAGTGT TCACACTTGCCTCAGTGTA GTCCCACCAGTCATTAAAT ATACCACTGACGGGCCGCC AAGCCCCAAGAGACAAACC CCCCCTGAGGGGCCGCCAG CCCCTGAGGGGCCGCCAGT GCAGCCTCGGAGAAAGCAA TGACGGCCCGCCAGTCATT AGCAGTAAAATGTGTAATT TCTAGAAAGTTCCTCCCCC AACCTTTCACACTGGCCTC ACCTTTCACACTGGCCTCA CCTTTCACACTGGCCTCAG CTCAAAGAGACAAACTCTT GCAGTAAAATGTGTAATTT TTCCCCTAAAGCCTTCACA TCCCCTAAAGCCTTCACAC CCCAGGAGGGGGCGTCCAG CCAGGAGGGGGCGTCCAGT GTGCTCTTCCCACTTCCGG CGTCCCACCAGTCATTAAA GACCCCCAGTCATAAAATT AGGTGAATATAGGTGGGTA TGGCTATACCACTGACGGG CTATACCACTGACAGGCCG AGAAAGAAACATGATTTTT GAATCAAGGATAGAAGTTC AATCAAGGATAGAAGTTCT ATTCAAGCTCCAAGAGACA TTCTAGAAAGTTCATTCCC TCTAGAAAGTTCATTCCCC ACTTGCCTCAGTGTATATA GGGTCTCCAGTCATTAAAT GGTCTCCAGTCATTAAATT CGTCCAGTCATTAAATTCA TTATATTTTGAAAACTCAT TATATTTTGAAAACTCATC GATTTCACACTTGTGTCAT ATTTCACACTTGTGTCATT GGCGTCCCACCAGTCATTA GCGTCCCACCAGTCATTAA AAAGCTTCCACACTTGCCT AAGCTTCCACACTTGCCTC AGAACTGCCCTCCCCCTAA ACCGACGGCCCGCCAGGCA ATAGAATTTCTAGAAATTT ATCATCAGAAAACTGAGAA GAACCCACGGACAGGCCGC TCCCTTAAAGCTTCCACAC TATTTTGAAAACTCATCAG TCAGTGTATATATGTGGCT 
CAGTGTATATATGTGGCTA TTACGGGCCGCCAGTCATT GGATAGAATTTCTAGAAAT GATAGAATTTCTAGAAATT ATCAAGGATAGAAGTTCTA TCAAGGATAGAAGTTCTAG CAAGGATAGAAGTTCTAGA AAAAAGGCAGCCTAGGCGA TCTAGAAAGTTCCTTCCCC ATATTTTGAAAACTCATCA TGAGTGGGGTCTCCAGTCA ACGGACAGGCCGCCAGTCA CCCGCCAGGCATTAAATTC TGTATATATGGGGCTATAC ACCCCCAGTCATAAAATTC TTCAAGCTCAAAGAGACAA CCCCTTGCCTCAGTGTATA GGATAGAATTTCTAGAAAG CATGTAACAAATGTGATTT GGTGCTCTTCCCACTTCCG AAGGATAGAAGTTCTAGAA CAGTCATAAAATTCAAGCT NNNNNNNANATTNTNANAA TGACTACACCACCGACGGC TACGGGCCGCCAGTCATTA CCCAGTCTTTACTGGTGCT CCTAAGGCTTTCACACTTG AGTAATTGTAAGAACTGCC CCCTCCCCGTAAAGCTTTC
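The strings listed above are all 19 bases long, and neighbouring entries often differ by a one-base shift, which is what a k-mer decomposition of reads produces. As a minimal sketch (not the method used to generate the slide), the following shows how a read could be split into 19-mers and how those k-mers map onto de Bruijn graph edges between their 18-base prefixes and suffixes. The function names `kmers` and `de_bruijn_edges`, the choice to keep k-mers containing `N`, and the 21-base example read are assumptions made for illustration; the four k-mers in `listed` are copied from the list above.

```python
from collections import defaultdict


def kmers(read, k=19):
    """Yield every length-k substring of `read`, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_edges(kmer_iter):
    """Map each (k-1)-base prefix to the (k-1)-base suffixes that follow it.

    Each k-mer contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(list)
    for kmer in kmer_iter:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


# Hypothetical 21-base read; it decomposes into three overlapping 19-mers.
read = "AGGCAGCCTAGGAGAAAGAAA"
parts = list(kmers(read))

# Four k-mers taken from the list above. The last three share the same
# 18-base prefix, so that prefix becomes a branching node in the graph.
listed = [
    "TTGCCTCAGTGTATATATG",
    "TGCCTCAGTGTATATATGG",
    "TGCCTCAGTGTATATATGT",
    "TGCCTCAGTGTATATATGA",
]
graph = de_bruijn_edges(listed)

# Nodes with more than one distinct successor are where a traversal must choose.
branching = {node: succs for node, succs in graph.items() if len(set(succs)) > 1}

for node, succs in branching.items():
    print(node, "->", succs)
```

Branching nodes like this one are where an Eulerian-path traversal has more than one way to continue, so even a short region can produce an ambiguous graph.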
  • 27. 200bp of a human genome! GGTTTTTCTCATAAAATGA TTTTTCTCATAAAATGATT TTCTCATAAAATGGTTTCT TCTCATAAAATGGTTTCTG TCTCATAAAATGGTTTCTA TTTTCTCATAAAATGGTCT TTTCTCATAAAATGGTCTC TTTGTATGTTTCTTAGCTT TTGTATGTTTCTTAGCTTT GTTTCTAAATGTTTCTTAG ATGTTTCTTAGCTTTCAGT TTAGCTTCCAATGGGCAAT TTAGCTTTCAATGGGGAAT TCCAATGGGCAATAAATAA TTCAATGGGCAGTAAATAA TAAATAACTTTTAGTGAAA AAATAACTTTTAGTGAAAT AATAACTTTTAGTGAAATA CAATCTGAGGAAGTCTTTG AATCTGAGGAAGTCTTTGA ATCTGAGGAAGTCTTTGAG GAAGTCTTTGAGATGGAGG AAGTCTTTGAGATGGAGGG TGAGATGGAGGGAAAGCTT GAGATGGAGGGAAAGCTTT CTATGAGGAGTGCATTAGA GAATAGAATCGCTCCAGGA AATAGAATCGCTCCAGGAA TTATGAGGTGACATTTAAA ATGATTCTTAGGTTTCAAT TGATTCTTAGGTTTCAATG GATTCTTAGGTTTCAATGG TTTTCTCATAAAATGATTT TTTCTCATAAAATGATTTC TAGCTTCCAATGGGCAATA AGCTTCCAATGGGCAATAA GCTTCCAATGGGCAATAAA TTTTTTCTCATAAAATGGT TTTCTAAATGTTTCTTAGC TTCTAAATGTTTCTTAGCT TCTAAATGTTTCTTAGCTT TTTTTCTCATAAAATGGTT TTTTCTCATAAAATGGTTT TTTCAATGGGCAATAAATT ACTTTTCGAGATATTGTTG ATGAAGCGTAGGCTATGCT TGAAGCGTAGGCTATGCTG GAAGCGTAGGCTATGCTGC TTTTTGTATGTTTCTTAGC TTTTGTATGTTTCTTAGCT CAATAAATAACTTTTAGGG AATAAATAACTTTTAGGGA ATAAATAACTTTTAGGGAA AATAACTTTTAGGAAAATA ATAACTTTTAGGAAAATAG TAACTTTTAGGAAAATAGA CTGAGATGAAGAGAAGGCT TGAGATGAAGAGAAGGCTT GAGATGAAGAGAAGGCTTT AGCCATTCTGAGGAAGTTT GCCATTCTGAGGAAGTTTT CCATTCTGAGGAAGTTTTT CATTCTGAGGAAGTTTTTG ATAAAATGGTCTCTGAATG TAAAATGGTCTCTGAATGT AAAATGGTCTCTGAATGTT GCTTTGCTTTCTATGAGGA CTTTGCTTTCTATGAGGAG TTTGCTTTCTATGAGGAGT GTTTCTTAGCTTCAATGGG TCAATGGGCAATAAAAAAC CAATGGGCAATAAAAAACT AATGGGCAATAAAAAACTT AATGGGCAGTAAATAACTT ATGGGCAGTAAATAACTTT AACTTTTAGGGAAATAGAT ACTTTTAGGGAAATAGATG CTTTTAGGGAAATAGATGT GGAAGCATCTGAGATGAAG AGTATTTGAGATGAAGAGA AGCTTTCAATGGGGAATAA GCTTTCAATGGGGAATAAA CTTTCAATGGGGAATAAAT GTATTTGAGATGAAGAGAA TATTTGAGATGAAGAGAAG CCAATCTGAGGAAGCATCT CAATCTGAGGAAGCATCTG AATCTGAGGAAGCATCTGA ATTTGAGATGAAGAGAAGG TAGAAGTGAGCCAATCTGA AGAAGTGAGCCAATCTGAG GAAGTGAGCCAATCTGAGG CTATGCTGCCTTTGATGTG TATGCTGCCTTTGATGTGT ATGCTGCCTTTGATGTGTG AACTTTTAGGGAAATAGAA ACTTTTAGGGAAATAGAAG CTTTTAGGGAAATAGAAGT 
TTTTTGAGATGAAGCGAAG TTTTGAGATGAAGCGAAGG TTTGAGATGAAGCGAAGGC TGTTTTTCTCATAAAATGG GTTTTTCTCATAAAATGGT TTTTTCTCATAAAATGGTC AGTTTTTCTCATAAAATGG TCATAAAATGATTTCTGAA CATAAAATGATTTCTGAAT ATAAAATGATTTCTGAATG AGTCTTTGAGATGGAGGGA GTCTTTGAGATGGAGGGAA TCTTTGAGATGGAGGGAAA TGAAGCGAAGGCTTTGCTG GTCTATGAGGAGAGCATTA TCTATGAGGAGAGCATTAG CTATGAGGAGAGCATTAGA CCAATCTGTGGAAGCATTT CAATCTGTGGAAGCATTTG AATCTGTGGAAGCATTTGA ATAAAATGGTTTTTGTATG TAAAATGGTTTTTGTATGT AAAATGGTTTTTGTATGTT TTCTCATAAATTGGTTTCT TCTCATAAATTGGTTTCTG CTCATAAATTGGTTTCTGA GTGGGCAATAAATAAATTA TGGGCAATAAATAAATTAT AAGCGTAGGCTATGCTGCC TGATTGCCTTTATGAGGTG GATTGCCTTTATGAGGTGA ATTGCCTTTATGAGGTGAC TGGTTTTTGTATGTTTCTT GGTTTTTGTATGTTTCTTA GTTTTTGTATGTTTCTTAG CTAAATGTTTCTTAGCTTT GGGAAAGCTTTGCTGTCTA GGAAAGCTTTGCTGTCTAT GAGAAGGCTGTGCTGTCTA AGAAGGCTGTGCTGTCTAT GAAGGCTGTGCTGTCTATG TGTATGTTTCTTAGCTTTC GTATGTTTCTTAGCTTTCA TATGTTTCTTAGCTTTCAA TTCTTAGCTTCCAATGGGC TCTTAGCTTCCAATGGGCA CTTAGCTTCCAATGGGCAA AGATGAAGCGAAGGCTTTG GATGAAGCGAAGGCTTTGC ATGAAGCGAAGGCTTTGCT GGAAGCATTTGAGATGAAG GAAGCATTTGAGATGAAGA GAAGCATTTGAGATGAAGC AAGCATTTGAGATGAAGAG AAGCATTTGAGATGAAGCG AATAACTTTTAGGGAAATA ATAACTTTTAGGGAAATAG TAACTTTTAGGGAAATAGA TTTGAGATGAAGAGAAGGC TTTGAGATGAAGAGAAGGG CTTTGAGATGGAGGGAAAG TTTGAGATGGAGGGAAAGC TTGAGATGGAGGGAAAGCT TTTCAATGGGGAATAAATA TTCAATGGGGAATAAATAA TCAATGGGGAATAAATAAC CCAATCTGAGGAAGTATCT CTGAGGAAGTATCTGAGAT TGAGGAAGTATCTGAGATG GAGGAAGTATCTGAGATGA AGGAAGTATCTGAGATGAA GGAAGTATCTGAGATGAAG TGCATTAGAATAGAATCGC GCATTAGAATAGAATCGCT CATTAGAATAGAATCGCTC TTCAATGGGCAATAAATAA TCAATGGGCAATAAATAAC CAATGGGCAATAAATAACT GTGAGCTAATCTGAGTAGG TGAGCTAATCTGAGTAGGT GAGCTAATCTGAGTAGGTA AGATGGAGGGAAAGCTTTG GATGGAGGGAAAGCTTTGC ATGGAGGGAAAGCTTTGCT AGATGAAGAGAAGGCTGTG GATGAAGAGAAGGCTGTGC ATGAAGAGAAGGCTGTGCT AGGGAAAGCTTTGCTGTCT GCTTTGCTGTCTATGAGGA CTTTGCTGTCTATGAGGAG TTTGCTGTCTATGAGGAGA TTTGCTGTCTATGAGGAGT TTGAGATGAAGAGAAGGCT TGAGATGAAGAGAAGGCTG GAGATGAAGAGAAGGCTGT AAGAGAAGGCTTTGCTTTC AGAGAAGGCTTTGCTTTCT GAGAAGGCTTTGCTTTCTA GAAAAGGGCACCTGTGTTG AAAAGGGCACCTGTGTTGA 
AAAGGGCACCTGTGTTGAT AGCGTAGGCTATGCTGCCT TTGCTTTCTATGAGGAGTG TGCTTTCTATGAGGAGTGC GCTTTCTATGAGGAGTGCA TGAATGATTCTTAGGTTTC GAATGATTCTTAGGTTTCA AATGATTCTTAGGTTTCAA AAGAGAAGGCTTTGCTGTC AGAGAAGGCTTTGCTGTCT GAGAAGGCTTTGCTGTCTA TTTCTGAATGTTTCTTAGC TTCTGAATGTTTCTTAGCT CGCCAATCTGTGGAAGCAT GCCAATCTGTGGAAGCATT TATGAGGAGAGCATTAGAA ATGAGGAGAGCATTAGAAT TGAGGAGAGCATTAGAATA GCTGTCTATGAGGAGTGTA CTGTCTATGAGGAGTGTAT TGTCTATGAGGAGTGTATT AGGAGAGCATTAGAATAGA GGAGAGCATTAGAATAGAA GAGAGCATTAGAATAGAAT TGGAGGGAAAGCTTTGCTG GGAGGGAAAGCTTTGCTGT GAGGGAAAGCTTTGCTGTC CAATGGGCAATAAATTACT AATGGGCAATAAATTACTT ATGGGCAATAAATTACTTT AGAGCATTAGAATAGAATC GAGCATTAGAATAGAATCG AGCATTAGAATAGAATCGC TTAGCTTTCAATGGGCAAT TAGCTTTCAATGGGCAATA AGCTTTCAATGGGCAATAA GTGCGCCAATCTGTGGAAG TGCGCCAATCTGTGGAAGC GCGCCAATCTGTGGAAGCA GAGGAGAGCATTAGAATAG TTTTAGGGAAATAGAAGTG GCAATAAATTACTTTTCGA CAATAAATTACTTTTCGAG AATAAATTACTTTTCGAGA GAGCCAATCTGAGGAAGTC AGCCAATCTGAGGAAGTCT GCCAATCTGAGGAAGTCTT AGATGAAGAGAAGGCTTTG GATGAAGAGAAGGCTTTGC ATAGAATCGCTCCAGGAAA TAGAATCGCTCCAGGAAAA AGAATCGCTCCAGGAAAAG GGGCAGTAAATAACTTTTA GGCAGTAAATAACTTTTAG GCAGTAAATAACTTTTAGG AATCTGAGGAAGCATTTGA ATCTGAGGAAGCATTTGAG TCTGAGGAAGCATTTGAGA GGTTTTTCTCATAAAATGG ATGGGCAATAAATAGCTTT TGGGCAATAAATAGCTTTT AAGCATCTGAGATGAAGAG AGCATCTGAGATGAAGAGA GCATCTGAGATGAAGAGAA TTCTTAGCTTTCAATGGGG TCTTAGCTTTCAATGGGGA CTTAGCTTTCAATGGGGAA AGTGCATTAGAATAGAATT GTGCATTAGAATAGAATTG TGCATTAGAATAGAATTGC AAAGGTCACCTGTGTTGAT AAGGTCACCTGTGTTGATT AGGTCACCTGTGTTGATTG ATCGCTCCAGGAAAAGGGC TCGCTCCAGGAAAAGGGCA CGCTCCAGGAAAAGGGCAC TAGATGTGAGCTAATCTGA AGATGTGAGCTAATCTGAG GATGTGAGCTAATCTGAGT CCAGGAAAAGGGCACCTGT CAGGAAAAGGGCACCTGTG AGGAAAAGGGCACCTGTGT TAAATAACTTTTAGGAAAA AAATAACTTTTAGGAAAAT GGAAAAGGTCACCTGTGTT GAAAAGGTCACCTGTGTTG AAAAGGTCACCTGTGTTGA TCATAAATTGGTTTCTGAA CATAAATTGGTTTCTGAAT ATAAATTGGTTTCTGAATG GTATTAGAATAGAATCGCT TATTAGAATAGAATCGCTC ATTAGAATAGAATCGCTCC GAGATGAAGAGAAGGGTTT AGATGAAGAGAAGGGTTTG GATGAAGAGAAGGGTTTGC ATCTGAGGAAGTATTTGAG TCTGAGGAAGTATTTGAGA CTGAGGAAGTATTTGAGAT GCTGTGCTGTCTATGAGGA 
CTGTGCTGTCTATGAGGAG TGTGCTGTCTATGAGGAGT AGAATTGCTCCAGGAAAAG GAATTGCTCCAGGAAAAGG AATTGCTCCAGGAAAAGGT AAGTTTTTGAGATGAAGCG AGTTTTTGAGATGAAGCGA GTTTTTGAGATGAAGCGAA AATAGAAGTGAGCCAATCT ATAGAAGTGAGCCAATCTG CTCATAAAATGGTTTCTGA TCATAAAATGGTTTCTGAA CATAAAATGGTTTCTGAAT AATAGAATTGCTCCAGGAA ATAGAATTGCTCCAGGAAA TAGAATTGCTCCAGGAAAA CCAATGGGCAATAAATAAC AATGGGCAATAAATAACTT TAGCTTTCAATGGGGAATA TTGCTGTCTATGAGGAGAG TGCTGTCTATGAGGAGAGC TTTCTCATAAAATGGTTTC TTTCTCATAAAATGGTTTT ATTGCTCCAGGAAAAGGTC TTGCTCCAGGAAAAGGTCA TGCTCCAGGAAAAGGTCAC CTGAATGTTTCTTAGCTTT TGAATGTTTCTTAGCTTTC GAATGTTTCTTAGCTTTCA GCTTTCAATGGGCAATAAA CTTTCAATGGGCAATAAAT TCTCATAAAATGGTCTCTG CTCATAAAATGGTCTCTGA TCATAAAATGGTCTCTGAA GTTTCTGAATGATTCTTAG TTTCTGAATGATTCTTAGG TTCTGAATGATTCTTAGGT CAGGAAAAGGTAACGTGAG AGGAAAAGGTAACGTGAGG GGAAAAGGTAACGTGAGGT CTTCAATGGGCAATAAAAA TTCAATGGGCAATAAAAAA TGTTTCTTAGCTTTCAATG GTTTCTTAGCTTTCAATGG TTTCTTAGCTTTCAATGGG GGGCAATAAATTACTTTTC GGCAATAAATTACTTTTCG CTTGCAATGGGCAATAAAT TTGCAATGGGCAATAAATA TGCAATGGGCAATAAATAA CAATGGGGAATAAATAACT AATGGGGAATAAATAACTT GCGAAGGCTTTGCTGTCTA CGAAGGCTTTGCTGTCTAT GAAGGCTTTGCTGTCTATG TTTAGGGAAATAGATGTGA TTAGGGAAATAGATGTGAG TAGGGAAATAGATGTGAGC GAATGTTTCTTAGCTTCCA AATGTTTCTTAGCTTCCAA ATGTTTCTTAGCTTCCAAT TCTGAATGATTCTTAGGTT TTCAGTGGGCAATAAATAA TCAGTGGGCAATAAATAAA CAGTGGGCAATAAATAAAT GAAGAGAAGGCTTTGCTTT GCTGTCTATGAGGAGTGCA CTGTCTATGAGGAGTGCAT TGTCTATGAGGAGTGCATT AACTTTTAGGAAAATAGAT ACTTTTAGGAAAATAGATG AAGGCTTTGCTGTCTATGA TAGCTTTCAATGGGCAGTA AGCTTTCAATGGGCAGTAA GCTTTCAATGGGCAGTAAA AGAGAAGGCTGTGCTGTCT CCTGTGTTGATTGCCTTTA CTGTGTTGATTGCCTTTAT TGTGTTGATTGCCTTTATG TGAGGAAGTATTTGAGATG GAGGAAGTATTTGAGATGA GGAATAAATAACTTTTACG GAATAAATAACTTTTACGG AATAAATAACTTTTACGGA AAACTTTTAGGGAAATAGA TAGAATAGAATTGCTCCAG AGAATAGAATTGCTCCAGG GAATAGAATTGCTCCAGGA ATGGTTTCTGAATGTTTCT TGGTTTCTGAATGTTTCTT GGTTTCTGAATGTTTCTTA TTTTCTCATAAATTGGTTT TTTCTCATAAATTGGTTTC TATGAGGAGTGCATTAGAA ATGAGGAGTGCATTAGAAT TGAGGAGTGCATTAGAATA TTCTCATAAAATGATTTCT CCAATCTGAGGAAGTCTTT AGCTAATCTGAGTAGGTAT CAATCTGAGGAAGTATCTG 
AATCTGAGGAAGTATCTGA ATCTGAGGAAGTATCTGAG TGTGAGCCATTCTGAGGAA GTGAGCCATTCTGAGGAAG TGAGCCATTCTGAGGAAGT AAAACTTTTAGGGAAATAG ATGAGGAGTGTATTAGAAT TGAGGAGTGTATTAGAATA GAGGAGTGTATTAGAATAG GGTTTTTCTCATAAATTGG GTTTTTCTCATAAATTGGT AAATGTTTCTTAGCTTTCA AATGTTTCTTAGCTTTCAA ATGTTTCTTAGCTTTCAAT TTGAGATGAAGCGTAGGCT TGAGATGAAGCGTAGGCTA GAGATGAAGCGTAGGCTAT TATGAGGAGTGTATTAGAA TGGGGAATAAATAACTTTT GGGGAATAAATAACTTTTA GGGAATAAATAACTTTTAC GGTTTCTGAATGATTCTTA CGCTCCAGGAAAAGGTCAC GCTCCAGGAAAAGGTCACC CTCCAGGAAAAGGTCACCT GTCACCTGTGTTGATTGCC TCACCTGTGTTGATTGCCT CACCTGTGTTGATTGCCTT GTGTTGATTGCCTTTATGA AGAAGGCTTTGCTTTCTAT GAAGGCTTTGCTTTCTATG AAGGCTTTGCTTTCTATGA GAGATGAAGCGAAGGCTTT TCTGAGGAAGTATCTGAGA TGAGGAAGCATCTGAGATG GAGGAAGCATCTGAGATGA AGGAAGCATCTGAGATGAA GCTCCAGGAAAAGGGCACC TCGCTCCAGGAAAAGGTCA TCAATGGGCAGTAAATAAC CAATGGGCAGTAAATAACT ATTCTTAGGTTTCAATGGG TTCTTAGGTTTCAATGGGC TCTTAGGTTTCAATGGGCA CTGTCTATGAGGAGAGCAT TGTCTATGAGGAGAGCATT AGCCAATCTGAGGAAGCAT GCCAATCTGAGGAAGCATC GCCAATCTGAGGAAGCATT CCAATCTGAGGAAGCATTT TTTCTATGAGGAGTGCATT TTCTATGAGGAGTGCATTA TCTATGAGGAGTGCATTAG AAGGCTGTGCTGTCTATGA GAAGCATCTGAGATGAAGA ATGAAGAGAAGGGTTTGCT TGAAGAGAAGGGTTTGCTG GAAATAGATGTGAGCCAAT AAATAGATGTGAGCCAATC AATAGATGTGAGCCAATCT TTAGAATAGAATCGCTCCA TTCTCATAAAATGGTTTTT TCTCATAAAATGGTTTTTG CTCATAAAATGGTTTTTGT CTCCAGGAAAAGGTAACGT TCCAGGAAAAGGTAACGTG CCAGGAAAAGGTAACGTGA TTGATTGCCTTTATGAGGT GAAGTATCTGAGATGAAGA AGGAAAAGGTCACCTGTGT ACCTGTGTTGATTGCCTTT AGGAGTGTATTAGAATAGA GGAGTGTATTAGAATAGAA GAGTGTATTAGAATAGAAT TAAATGTTTCTTAGCTTTC TCTTAGCTTCAATGGGCAA CTTAGCTTCAATGGGCAAT TTAGCTTCAATGGGCAATA AAGGGCACCTGTGTTGATT TCTGTGGAAGCATTTGAGA CTGTGGAAGCATTTGAGAT TGTGGAAGCATTTGAGATG TTGGTTTCTGAATGATTCT TGGTTTCTGAATGATTCTT AATGGTTTCTAAATGTTTC ATGGTTTCTAAATGTTTCT TGGTTTCTAAATGTTTCTT TTGAGATGAAGCGAAGGCT TGAGATGAAGCGAAGGCTT ATCTGAGGAAGCATCTGAG TCTGAGGAAGCATCTGAGA CTCCAGGAAAAGGGCACCT TAACTTTTACGGAAATAGA AACTTTTACGGAAATAGAT ACTTTTACGGAAATAGATG AGCATTTGAGATGAAGAGA GCATTTGAGATGAAGAGAA TAGGTTTCAATGGGCATTA AGGTTTCAATGGGCATTAA GGTTTCAATGGGCATTAAA 
ATAAATTACTTTTCGAGAT GGAAAATAGATGTGAGCCA GAAAATAGATGTGAGCCAA AAAATAGATGTGAGCCAAT AGGCTATGCTGCCTTTGAT GGCTATGCTGCCTTTGATG GCTATGCTGCCTTTGATGT AGGCTTTGCTGTCTATGAG GGCTTTGCTGTCTATGAGG TTTCGAGATATTGTTGTGC TTCGAGATATTGTTGTGCG TCGAGATATTGTTGTGCGC AGAAGGCTTTGCTGTCTAT CTGAATGTTTCTTAGCTTC TGAATGTTTCTTAGCTTCC GTAGGCTATGCTGCCTTTG TAGGCTATGCTGCCTTTGA GCCTTTATGAGGTGACATT CCTTTATGAGGTGACATTT CTTTATGAGGTGACATTTA TTTCAATGGGCAATAAATA TTTTTCTCATAAATTGGTT TTAGCTTTCAATGGGCAGT TCAATGGGCAATAAATAGC CAATGGGCAATAAATAGCT AATGGGCAATAAATAGCTT AAATGGTTTTTGTATGTTT AATGTTTCTTAGCTTTCAG AATCGCTCCAGGAAAAGGT ATCGCTCCAGGAAAAGGTA ATCGCTCCAGGAAAAGGTC TCGCTCCAGGAAAAGGTAA TCGCTCCAGGAAAAGGTCC TAAAAAACTTTTAGGGAAA AAAAAACTTTTAGGGAAAT AAAAACTTTTAGGGAAATA GAATCGCTCCAGGAAAAGG ATTGTTGTGCGCCAATCTG TTGTTGTGCGCCAATCTGT TGTTGTGCGCCAATCTGTG CAGGAAAAGGTCACCTGTG AGGGCACCTGTGTTGATTG GGGCACCTGTGTTGATTGC GGCACCTGTGTTGATTGCC TTTCTTAGCTTCAATGGGC TTCTTAGCTTCAATGGGCA CTCATAAAATGGTTTCTAA TCATAAAATGGTTTCTAAA CATAAAATGGTTTCTAAAT ATGTGAGCTAATCTGAGTA GTCTATGAGGAGTGCATTA TCTCATAAAATGATTTCTG CTCATAAAATGATTTCTGA CATTTGAGATGAAGAGAAG GTTGTGCGCCAATCTGTGG TTGTGCGCCAATCTGTGGA ATGAAGAGAAGGCTTTGCT TAAAATGGTTTCTAAATGT AAAATGGTTTCTAAATGTT AAATGGTTTCTAAATGTTT TTCTTAGCTTTCAGTGGGC TCTTAGCTTTCAGTGGGCA CTTAGCTTTCAGTGGGCAA ATAAATAACTTTTACGGAA TAAATAACTTTTACGGAAA CTGAGGAAGTCTTTGAGAT TGAGGAAGTCTTTGAGATG GAGGAAGTCTTTGAGATGG CTTTCTATGAGGAGTGCAT AATCGCTCCAGGAAAAGGG GTGGAAGCATTTGAGATGA TGGAAGCATTTGAGATGAA TTGAGATGAAGAGAAGGGT TGAGATGAAGAGAAGGGTT GCATTTGAGATGAAGCGTA CATTTGAGATGAAGCGTAG ATTTGAGATGAAGCGTAGG AGGCTTTGCTTTCTATGAG GGCTTTGCTTTCTATGAGG TAAATAACTTTTAGGGAAA AAATAACTTTTAGGGAAAT TCAATGGGCAATAAATTAC TAGGGAAATAGAAGTGAGC AGGGAAATAGAAGTGAGCC GGGAAATAGAAGTGAGCCA CTGAGGAAGCATCTGAGAT ATAAAATGGTTTCTAAATG AATAACTTTTACGGAAATA ATAACTTTTACGGAAATAG TCTGAGTAGGTATTTGAGA CTGAGTAGGTATTTGAGAT TGAGTAGGTATTTGAGATG TCCAGGAAAAGGTCACCTG TCATAAAATGGTTTTTGTA TGCTGCCTTTGATGTGTGC GCTGCCTTTGATGTGTGCT TGTTTCTTAGCTTCCAATG GTTTCTTAGCTTCCAATGG GCAATGGGCAATAAATAAC AATTACTTTTCGAGATATT 
[Slide graphic: remaining k-mer node labels from the de Bruijn graph omitted] There are over 3000 20-mers, and over 30 valid paths!
  • 28. Help‽ What Can We Do? • For some errors, we can inspect the de Bruijn graph directly and eliminate the erroneous edges from the graph • More generally, we can look at the distribution of k-mers, and try to make corrections to the reads
  • 29. Trimming Spurs • Since sequencing errors tend to occur near the ends of reads, they show up as spurious branches (spurs) off of the graph • Use heuristics to determine whether we can remove these nodes • E.g., if these nodes are supported by only 1 read, removing them is probably OK
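The spur heuristic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular assembler: the function names `kmer_counts` and `trim_spurs` and the `max_support` parameter are made up here, k-mers are treated as edges between (k-1)-mer nodes, and only spurs that are a single edge long are handled.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count every k-mer (a graph edge) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def trim_spurs(counts, max_support=1):
    """Drop weakly supported edges that lead to a dead end while a sibling
    edge leaves the same node. Hypothetical helper; handles only spurs
    that are one edge long."""
    edges = dict(counts)
    changed = True
    while changed:
        changed = False
        # Rebuild the adjacency (prefix node -> outgoing edges) each pass.
        out_edges = defaultdict(list)
        for kmer in edges:
            out_edges[kmer[:-1]].append(kmer)
        for kmer in list(edges):
            src, dst = kmer[:-1], kmer[1:]
            is_dead_end = dst not in out_edges
            has_sibling = len(out_edges[src]) > 1
            if is_dead_end and has_sibling and edges[kmer] <= max_support:
                del edges[kmer]
                changed = True
                break  # adjacency is stale now, so rebuild before continuing
    return edges
```

For example, three error-free copies of `ATGGCGTGCA` plus one read `ATGGCT` (an error in its last base) give a 4-mer `GGCT` that is seen once and leads nowhere, so it is trimmed. The genuine final edge `TGCA` is kept because it has no competing sibling.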
  • 30. The k-mer Spectrum • If we look at the frequencies of k-mers, we see something interesting…
  • 31. What Is This Spike?
  • 32. Those Are Our Errors! • Errors create low-frequency substrings • We can identify errors with a mixture model: • Mixture of Poisson distributions • Component with the lowest mean → errors • From here, we can remove those “erroneous” strings, and pick likely replacements
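As a rough sketch of the mixture-model idea, the Python below fits a two-component Poisson mixture to k-mer counts with EM and flags k-mers that are more likely under the low-mean component. Several things here are assumptions made for illustration: the function names, the crude initial guess for the means, the choice of exactly two components, and the use of a plain Poisson even though observed k-mer counts are never zero.

```python
import math
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer across all reads (the k-mer spectrum's raw data)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def poisson_pmf(x, lam):
    """Poisson probability of x, computed in log space for stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def fit_two_poissons(values, iters=100):
    """EM for a two-component Poisson mixture; returns (weights, means)."""
    means = [max(min(values), 0.5), max(max(values), 1.0)]  # crude start (assumption)
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each count.
        resp = []
        for x in values:
            p = [w * poisson_pmf(x, m) for w, m in zip(weights, means)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate the weight and mean of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            if nj > 0:
                weights[j] = nj / len(values)
                means[j] = max(sum(r[j] * x for r, x in zip(resp, values)) / nj, 1e-6)
    return weights, means

def likely_errors(counts):
    """Return k-mers that are more probable under the low-mean component."""
    weights, means = fit_two_poissons(list(counts.values()))
    err = 0 if means[0] <= means[1] else 1
    flagged = set()
    for kmer, x in counts.items():
        p = [w * poisson_pmf(x, m) for w, m in zip(weights, means)]
        if p[err] > p[1 - err]:
            flagged.add(kmer)
    return flagged
```

On real data one would typically look at the histogram of `count_kmers(reads, k).values()` first, since the separation between the error spike and the true coverage peak depends on sequencing depth.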
  • 33. How Do We Define Likely? • Can use edit distance of replacement as a heuristic • Can define a probabilistic measure for the quality of a replacement:
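The edit-distance heuristic can be illustrated with a short Python sketch. The helper names, the `max_dist` cutoff, and the tie-break by higher k-mer count are assumptions made for this example. It does not implement the probabilistic measure mentioned on the slide.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]

def best_replacement(bad_kmer, trusted_counts, max_dist=1):
    """Pick the trusted k-mer closest to bad_kmer, preferring higher counts
    on ties. Returns None if nothing is within max_dist edits."""
    best = None
    for kmer, count in trusted_counts.items():
        d = edit_distance(bad_kmer, kmer)
        if d > max_dist:
            continue
        key = (d, -count)
        if best is None or key < best[0]:
            best = (key, kmer)
    return best[1] if best else None
```

Scanning every trusted k-mer is quadratic overall; a practical corrector would more likely enumerate the neighbours of the bad k-mer and look them up in a hash table.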
  • 34. Dealing With Repeats • A cycle in a de Bruijn graph is caused by a repeated sequence • In real genomes, there is a lot of repetition: • Structural variation → duplicated sequences • Transposons/Mobile Elements • Centromeres and Telomeres
  • 35. Increased k-mer Length • Sequence: ACACTGCACT • 3-mers: ACA CAC ACT GCA TGC CTG • 5-mers: ACACT CACTG ACTGC GCACT TGCAC CTGCA • If a repeated sequence is less than b bases long, we can resolve the repeat by using k-mers with k > b
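The slide's example can be checked directly. The small Python sketch below (helper names are made up for illustration) lists which k-mers occur more than once in ACACTGCACT. The repeated unit CACT is 4 bases long, so k = 3 and k = 4 still see repeats, while k = 5 does not.

```python
from collections import Counter

def kmers(seq, k):
    """All k-mers of seq, in order, including duplicates."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def repeated_kmers(seq, k):
    """The set of k-mers that occur more than once in seq."""
    counts = Counter(kmers(seq, k))
    return {kmer for kmer, n in counts.items() if n > 1}

seq = "ACACTGCACT"
for k in (3, 4, 5):
    print(k, sorted(repeated_kmers(seq, k)))
```

When no k-mer repeats, every k-mer appears exactly once in the sequence, so the walk through the graph is unambiguous for this example.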
  • 36. Scaffolding It was the best of times, it was the worst of times… the best of best of times was the worst It was the worst of times times, it was • Current sequencing technology gives us paired reads, with approximately known distance between reads
  • 37. Scaffolding • We can use this to estimate repeat sizes: • Or, to estimate the size of gaps: (figure labels: smaller, bigger)
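A minimal Python sketch of gap estimation from paired reads follows. The coordinate conventions are assumptions made for this example: read 1 starts at 0-based position `start_a` on contig A, read 2 ends at (exclusive) position `end_b` on contig B, both contigs are in the same orientation, and `mean_insert` is the expected distance from the start of read 1 to the end of read 2. The function name is hypothetical.

```python
from statistics import median

def estimate_gap(pairs, contig_a_len, mean_insert):
    """Estimate the gap between contig A and contig B from spanning pairs.

    Each pair is (start_a, end_b). The part of the insert that lies on
    contig A is contig_a_len - start_a, and the part on contig B is end_b,
    so whatever remains of the expected insert size is attributed to the gap.
    """
    estimates = [mean_insert - (contig_a_len - start_a) - end_b
                 for start_a, end_b in pairs]
    # The median is less sensitive than the mean to a few mis-mapped pairs.
    return median(estimates)
```

A negative estimate would suggest that the two contigs overlap rather than being separated by a gap. Since the insert size is only approximately known, individual pairs give noisy estimates and many pairs are needed.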
  • 38. How About Large Repeats? Twitter, @infoecho, 9/12/2014
  • 39. Long Reads To The Rescue!
  • 40. Opportunities • New read technologies are available • Provide much longer reads (>10 kbp, vs. 250 bp for short reads) • Different error model (15% INDEL errors, vs. 2% SNP errors for short reads) • Generally, lower sequence-specific bias • But, need to improve OLC assembler performance! • Image credits: left, PacBio homepage; right, Wired, http://www.wired.com/2012/03/oxford-nanopore-sequencing-usb/
  • 41. Can we turn an expensive, serial problem into a cheap, parallel problem?
  • 42. Fast Overlapping with MinHashing • Wonderful realization by Berlin et al. [1]: overlapping is similar to the document similarity problem • Use MinHashing to approximate similarity: • Per document/read, compute signature: 1. Cut into shingles 2. Apply random hashes to shingles 3. For each hash function, take the min over all shingles • Hash into buckets: signatures of length l can be hashed into b buckets, so we expect to compare all elements with similarity ≥ (1/b)^(b/l) • Compare: for two documents with signatures of length l, Jaccard similarity is estimated by (# equal hashes) / l • Can reduce complexity from O(n^2) to O(nb)! • [1] Berlin et al., bioRxiv 2014
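The signature and comparison steps can be sketched in Python as follows. This is an illustration only, not the implementation from Berlin et al.: the function names are made up, seeded BLAKE2b digests stand in for "random hashes", and the shingle size and signature length are arbitrary defaults. Sequences are assumed to be at least k long.

```python
import hashlib

def shingles(seq, k):
    """The set of overlapping k-length substrings (shingles) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def seeded_hash(shingle, seed):
    """One member of a family of hash functions, selected by seed."""
    data = f"{seed}:{shingle}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def signature(seq, k=4, length=64):
    """MinHash signature: for each hash function, the min over all shingles."""
    shingle_set = shingles(seq, k)
    return [min(seeded_hash(s, seed) for s in shingle_set)
            for seed in range(length)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    equal = sum(a == b for a, b in zip(sig_a, sig_b))
    return equal / len(sig_a)

def exact_jaccard(a, b, k=4):
    """Exact Jaccard similarity of the shingle sets, for comparison."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)
```

The estimate from `estimate_jaccard` approaches `exact_jaccard` as the signature length grows, while each comparison costs only `length` integer comparisons regardless of read length.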
  • 43. MapReduce • Intuition: if we have a data parallel algorithm, we can run the algorithm across many computers • Many popular systems: • MapReduce at Google • Hadoop • (from Berkeley!) • Provide special programming models for graphs…
  • 44. MinHash On MR • map: per document/read, compute signature: 1. Cut into shingles 2. Apply random hashes to shingles 3. For each hash function, take the min over all shingles • groupBy: hash into buckets; signatures of length l can be hashed into b buckets, so we expect to compare all elements with similarity ≥ (1/b)^(b/l) • map + filter: compare; for two documents with signatures of length l, Jaccard similarity is estimated by (# equal hashes) / l
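To make the map / groupBy / map + filter structure concrete, here is a single-machine Python sketch of the three stages. It is not code for any real MapReduce framework, and several details are assumptions: the function names, the banding scheme (the signature is split into `bands` equal slices and each slice is used as a bucket key), and the default parameters.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def signature(seq, k, length):
    """MinHash signature: one minimum per seeded hash function."""
    shingle_set = {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def h(s, seed):
        digest = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    return tuple(min(h(s, seed) for s in shingle_set) for seed in range(length))

def find_overlaps(reads, k=4, length=32, bands=8, min_similarity=0.5):
    """reads maps read name -> sequence; returns {(name_a, name_b): similarity}."""
    rows = length // bands
    # map: compute one signature per read, independently of all other reads.
    sigs = {name: signature(seq, k, length) for name, seq in reads.items()}
    # groupBy: reads whose signatures agree on a whole band share a bucket.
    buckets = defaultdict(set)
    for name, sig in sigs.items():
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(name)
    # map + filter: score candidate pairs within buckets, keep the similar ones.
    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))
    results = {}
    for a, b in candidates:
        sim = sum(x == y for x, y in zip(sigs[a], sigs[b])) / length
        if sim >= min_similarity:
            results[(a, b)] = sim
    return results
```

Only reads that land in a common bucket are ever compared, which is where the saving over all-pairs comparison comes from. With `length=32` and `bands=8`, the slide's threshold (1/b)^(b/l) works out to roughly 0.59.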
  • 45. Transitive Reduction • We can find a consensus between clique members • Or, we can reduce down: • Can be implemented efficiently using graph-optimized MapReduce libraries!
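The "reduce down" step can be sketched as removing any overlap edge that is implied by two shorter ones. The Python below is a simplified, single-machine illustration with a hypothetical function name. It assumes the overlap graph is acyclic and only looks for two-step paths, so it is not a full transitive reduction for arbitrary graphs.

```python
def transitive_reduction(edges):
    """Remove edge (a, c) whenever edges (a, b) and (b, c) both exist.

    edges is an iterable of (source, target) pairs. Only two-step paths
    are checked, and the graph is assumed to be acyclic (assumption).
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    redundant = set()
    for a, direct in successors.items():
        for b in direct:
            for c in successors.get(b, ()):
                if c in direct and c != a and c != b:
                    redundant.add((a, c))
    return set(edges) - redundant
```

For reads A, B, C, D that overlap in a chain, with the extra overlaps A→C and B→D, the reduction leaves only A→B, B→C, C→D, which is the layout a consensus step would then follow.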