2015 pag-chicken

C. Titus Brown
Associate Professor
School of Veterinary Medicine
UC Davis
Jan 2015
Adventures in improving the chicken genome & transcriptome

Current state of chicken genome
● galGal2 (2004)
o Sanger sequencing (6.6X)
o Physical and genetic linkage maps
● galGal3 (2006)
o 198K additional reads
▪ Contig ends
▪ Regions of poor quality
o SNP mapping
o chrZ and chrW
● galGal4 (2011)
o 454 (12X)
o - 10Mb artifactual duplications
o +15Mb mapped to chromosomes
o increases in N50 contig size
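N50 is the contig length at which contigs of that length or longer hold at least half of the assembled bases. A minimal sketch of the calculation, using made-up contig lengths (not galGal numbers):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths, purely illustrative.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

A larger N50 between builds means the same bases sit in fewer, longer contigs.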

2. Microchromosomes...
● 10 macrochromosomes
● 28 microchromosomes
o GC rich
o high recombination rate
o high gene density
o low intron size
● not sequencing friendly!
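"GC rich" can be checked per window along a sequence. A minimal sketch, assuming plain sequence strings; the function names and the toy sequence are illustrative, not from the talk:

```python
def gc_fraction(seq):
    """Fraction of called bases (A/C/G/T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # window made only of N or other uncalled symbols
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window):
    """GC fraction for consecutive non-overlapping windows."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence: an AT-only window followed by a GC-only window.
print(windowed_gc("ATATGCGC", 4))
```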

Moleculo vs PacBio
Moleculo
● Cheaper
o High throughput
● Low error rate
o ~0%
● Same problems as Illumina…
PacBio
● No 3' bias
● No PCR
● High error rate
o ~15%
● Lower throughput
● "$$-plated genome"

Moleculo library preparation
Kuleshov et al (2014), Nature Biotechnology 32, 261–266

Exploring Moleculo
● 1,578,022 reads
● Covers 88% of galGal4
● 326 reads unmapped to galGal4 (0.02%)
o Searched 5 random unmapped reads against ENA (exonerate)
o 3 matched Sediminibacterium sp...
Luiz Irber
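The unmapped percentage follows directly from the two read counts on this slide:

```python
# Read counts as given on the slide.
total_reads = 1_578_022
unmapped_reads = 326

mapped_reads = total_reads - unmapped_reads
unmapped_pct = 100 * unmapped_reads / total_reads

print(f"{mapped_reads} reads mapped")
print(f"{unmapped_pct:.2f}% unmapped")
```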

Long reads, indeed!
Luiz Irber

Moleculo: fraction of reference covered
Luiz Irber
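One way to get a "fraction of reference covered" number is to merge the alignment intervals on each reference sequence and divide the covered bases by the sequence length. A minimal sketch, assuming half-open (start, end) intervals on a single sequence; this is not necessarily how the figure in the talk was produced:

```python
def covered_bases(intervals):
    """Bases covered by half-open (start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def fraction_covered(intervals, reference_length):
    return covered_bases(intervals) / reference_length


# Toy alignments on a 200 bp reference.
print(fraction_covered([(0, 50), (40, 100), (150, 200)], 200))
```

For a whole genome, the covered bases would be summed over all chromosomes before dividing by the total length.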

But Moleculo does not contain missing genes… ;(
Search for de novo-assembled UniProt orthologs from chicken in (a) galGal4 genome, and (b) Moleculo data.
Luiz Irber

Missing genes might be in PacBio.
So, now working with PacBio.
● Dealing with PacBio data
o Most tools break horribly
▪ (It's getting better)
● Assembling PacBio data
o High error rate (~15%)
o Most assemblers target short reads
o PacBio recommended assemblers interact poorly with MSU HPCC
Would like to produce a step-by-step protocol to do genome improvement or assembly with PacBio…
Luiz Irber

2) Evaluating effects of gene models on pathway prediction
Likit Preeyanon
Vertically integrated comparison.

GIMME: Software for Merging Gene Models
[Diagram: assembly-based (local assembly) and reference-guided gene models feed into GIMME (in-house software), which outputs merged models. Also labeled: ENSEMBL; "Cufflinks can incorporate ENSEMBL".]

Exon Graph approach (“Gimme”)
[Diagram: exon graph. Transcripts with exon1, exon2, and exon3 separated by intron1 and intron2; the third exon appears in two forms, Exon3.a and Exon3.b.]
Likit Preeyanon
https://github.com/ged-lab/gimme.git
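An exon graph can be pictured as exons for nodes and introns for edges, with merged isoforms read off as paths. The sketch below illustrates that general idea only; it is not Gimme's actual code, and the function names and coordinates are hypothetical.

```python
from collections import defaultdict


def build_exon_graph(models):
    """models: transcripts, each an ordered list of (start, end) exons."""
    nodes = set()
    edges = defaultdict(set)
    for exons in models:
        nodes.update(exons)
        # Each pair of consecutive exons is joined by an intron (an edge).
        for upstream, downstream in zip(exons, exons[1:]):
            edges[upstream].add(downstream)
    return nodes, edges


def enumerate_isoforms(nodes, edges):
    """List every path from an exon with no incoming edge to a terminal exon."""
    has_incoming = {d for targets in edges.values() for d in targets}
    paths = []

    def walk(node, path):
        path = path + [node]
        following = sorted(edges.get(node, ()))
        if not following:
            paths.append(path)
            return
        for nxt in following:
            walk(nxt, path)

    for start in sorted(nodes - has_incoming):
        walk(start, [])
    return paths


# Hypothetical coordinates: one full-length model and one fragment that
# uses a longer form of the third exon.
exon1, exon2 = (100, 200), (300, 400)
exon3a, exon3b = (500, 600), (500, 650)
model_a = [exon1, exon2, exon3a]
model_b = [exon2, exon3b]

nodes, edges = build_exon_graph([model_a, model_b])
for isoform in enumerate_isoforms(nodes, edges):
    print(isoform)
```

In this toy case the fragment is extended upstream through exon1, so the merged graph yields two full-length isoforms that differ only in the third exon.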

Ensembl Enriched KEGG Pathway
Term                                          Count  Benjamini
Cytokine-cytokine receptor interaction        36     6.2E-02
Lysosome                                      25     1.2E-01
Apoptosis                                     19     3.5E-01
Arginine and proline metabolism               12     3.1E-01
Starch and sucrose metabolism                 9      3.4E-01
Toll-like receptor signaling pathway          19     3.7E-01
Natural killer cell mediated cytotoxicity     17     3.4E-01
Cytosolic DNA-sensing pathway                 9      4.2E-01
Valine, leucine and isoleucine degradation    11     4.1E-01
Glutathione metabolism                        10     4.3E-01
NOD-like receptor signaling pathway           11     4.6E-01
Intestinal immune network for IgA production  9      5.6E-01
VEGF signaling pathway                        14     5.6E-01
PPAR signaling pathway                        13     6E-01

Gimme Enriched KEGG Pathway
Term                                          Count  Benjamini
Cytokine-cytokine receptor interaction        34     3.7E-02
Toll-like receptor signaling pathway          22     2.7E-02
Jak-STAT signaling pathway                    28     3.4E-02
Arginine and proline metabolism               13     4.5E-02
Lysosome                                      22     1.3E-01
Natural killer cell mediated cytotoxicity     17     1.6E-01
Alanine, aspartate and glutamate metabolism   9      1.8E-01
Amino sugar and nucleotide sugar metabolism   10     3.6E-01
Cysteine and methionine metabolism            9      4E-01
ECM-receptor interaction                      16     3.7E-01
Apoptosis                                     16     3.7E-01
Glycolysis / Gluconeogenesis                  11     4E-01
DNA replication                               8      3.8E-01
Cell adhesion molecules (CAMs)                19     4.6E-01
PPAR signaling pathway                        12     6E-01
Intestinal immune network for IgA production  8      6.1E-01

Compared Enriched KEGG Pathway
Term
Cytokine-cytokine receptor interaction
Toll-like receptor signaling pathway
Lysosome
Apoptosis
Arginine and proline metabolism
Natural killer cell mediated cytotoxicity
Intestinal immune network for IgA production
PPAR signaling pathway
Starch and sucrose metabolism
Valine, leucine and isoleucine degradation
Glutathione metabolism
NOD-like receptor signaling pathway
VEGF signaling pathway
Jak-STAT signaling pathway
Alanine, aspartate and glutamate metabolism
Amino sugar and nucleotide sugar metabolism
ECM-receptor interaction
Cell adhesion molecules (CAMs)
DNA replication
Legend (row highlighting in the original slide): Common, Ensembl, Gimme
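The Common / Ensembl / Gimme split can be recomputed from the two enrichment tables with set operations. A minimal sketch, with the term names typed in from those tables (spellings normalized to "NOD-like" and "Glycolysis"):

```python
ensembl = {
    "Cytokine-cytokine receptor interaction", "Lysosome", "Apoptosis",
    "Arginine and proline metabolism", "Starch and sucrose metabolism",
    "Toll-like receptor signaling pathway",
    "Natural killer cell mediated cytotoxicity",
    "Cytosolic DNA-sensing pathway",
    "Valine, leucine and isoleucine degradation", "Glutathione metabolism",
    "NOD-like receptor signaling pathway",
    "Intestinal immune network for IgA production",
    "VEGF signaling pathway", "PPAR signaling pathway",
}
gimme = {
    "Cytokine-cytokine receptor interaction",
    "Toll-like receptor signaling pathway", "Jak-STAT signaling pathway",
    "Arginine and proline metabolism", "Lysosome",
    "Natural killer cell mediated cytotoxicity",
    "Alanine, aspartate and glutamate metabolism",
    "Amino sugar and nucleotide sugar metabolism",
    "Cysteine and methionine metabolism", "ECM-receptor interaction",
    "Apoptosis", "Glycolysis / Gluconeogenesis", "DNA replication",
    "Cell adhesion molecules (CAMs)", "PPAR signaling pathway",
    "Intestinal immune network for IgA production",
}

common = ensembl & gimme        # reported with both sets of gene models
ensembl_only = ensembl - gimme  # reported only with Ensembl models
gimme_only = gimme - ensembl    # reported only with Gimme models

for label, terms in [("Common", common), ("Ensembl", ensembl_only),
                     ("Gimme", gimme_only)]:
    print(label, len(terms))
    for term in sorted(terms):
        print("  ", term)
```

The union of the two tables has 22 terms, while the comparison list above shows 19, so three table entries do not appear in it.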

INFB – we annotate UTR not present in other gene models.

INFB – 3’ bias + missing UTR => insensitive

Predicted Enriched Pathways
GOseq FDR 0.05
20 pathways
17 pathways

GOseq FDR 0.05
Chicken + Human
KEGG Pathway
40 pathways
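"FDR 0.05" is a false discovery rate cutoff applied to the enrichment p-values. A minimal sketch of Benjamini-Hochberg adjustment, one common FDR procedure; that this exact procedure was used here is an assumption, and the p-values below are invented for illustration:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values for four hypothetical pathways.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted)
print(significant)
```

Pathways whose adjusted value falls below 0.05 would be the ones counted as enriched.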

RNAseq: your models matter
Our methods for generating hypotheses from mRNAseq data are sensitive to references & technical details of the approaches.
(This is expected but Bad.)
More RNAseq data coming every day.
…but we are not regularly updating gene models…
…and the genome that we have is Not Great.
=> Follow on Smith & Burt (2014) to continually regenerate gene models for differential expression use.
=> A general model for vet/ag animals?

Thanks!
Please contact me at ctbrown@ucdavis.edu!
