This document provides an introduction to metagenomics. It defines metagenomics as the study of microbial communities directly in their natural environments using modern genomics techniques. The document outlines the historical context and basic purpose of metagenomics. It describes some of the applications of metagenomics, such as understanding the human microbiome, bioremediation, bioenergy production, and smart farming. Finally, it introduces some basic concepts in metagenomics analysis including binning, OTUs, alpha and beta diversity measurements, and challenges around estimating diversity from samples.
Introduction to Metagenomics. Applications, Approaches and Tools (Bioinformatics for Biological Researchers Course - CSIC, Blanes)
Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Bioinformatics for
Biological Researchers
http://eib.stat.ub.edu/2014BBR
Ferran Briansó
ferran.brianso@vhir.org
28/05/2014
INTRODUCTION TO METAGENOMICS
Introduction | Metagenomics definition
First use of the term metagenome, referencing the idea that a collection of
genes sequenced from the environment could be analyzed in a way analogous
to the study of a single genome.
Handelsman, J.; Rondon, M. R.; Brady, S. F.; Clardy, J.; Goodman, R. M. (1998).
"Molecular biological access to the chemistry of unknown soil microbes: A new
frontier for natural products".
Chemistry & Biology 5 (10): R245–R249. doi:10.1016/S1074-5521(98)90108-9.
PMID 9818143
Chen, K.; Pachter, L. (2005).
"Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities".
PLoS Computational Biology 1 (2): e24. doi:10.1371/journal.pcbi.0010024
Current definition:
“The application of modern genomics techniques to the
study of communities of microbial organisms directly in
their natural environments, bypassing the need for
isolation and lab cultivation of individual species.”
Applications | What metagenomics can do
● Global Impacts. The role of microbes is critical in maintaining atmospheric balances, as they are
  ● the main photosynthetic agents
  ● responsible for the generation and consumption of greenhouse gases
  ● involved at all levels in ecosystems and trophic chains
● Bioremediation. Cleaning up environmental contamination, such as
  ● the waste from water treatment facilities
  ● gasoline leaks on land or oil spills in the oceans
  ● toxic chemicals
● Bioenergy. We are harnessing microbial power in order to produce
  ● ethanol (from cellulose), hydrogen, methane, butanol...
● Smart Farming. Microbes help our crops through
  ● the “suppressive soil” phenomenon (buffer effect against disease-causing organisms)
  ● soil enrichment and regeneration
● The World Within. Studying the human microbiome may lead to valuable new tools and guidelines in
  ● human and animal nutrition
  ● better understanding of complex diseases (obesity, cancer, asthma...)
  ● drug discovery
  ● preventative medicine
Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome,
Annu. Rev. Genomics Human Genet. 13, 151-170
Concepts | Trimming
● Trimming: the pre-processing step of cleaning sequence data from automated DNA sequencers (removing primers, multiplexing barcodes...) prior to sequence assembly and other downstream uses.
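The trimming step can be illustrated with a minimal sketch. It assumes that every read starts with an exact copy of the barcode followed by the primer; the function name `trim_read` and the example sequences are hypothetical, and real trimming tools usually also tolerate mismatches and filter on base quality.

```python
from typing import Optional


def trim_read(read: str, barcode: str, primer: str) -> Optional[str]:
    """Strip a leading barcode and primer from a read.

    Returns the remaining sequence, or None when the read does not start
    with the expected barcode followed by the expected primer
    (exact matching only, which is an assumption of this sketch).
    """
    if not read.startswith(barcode):
        return None
    rest = read[len(barcode):]
    if not rest.startswith(primer):
        return None
    return rest[len(primer):]


# Hypothetical barcode, primer and insert, for illustration only.
BARCODE = "ACGT"
PRIMER = "GTGCCAGC"
raw_read = BARCODE + PRIMER + "TTAGGCAT"

trimmed = trim_read(raw_read, BARCODE, PRIMER)
```

Reads that return `None` would typically be discarded or assigned to another sample, depending on the multiplexing design.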
Concepts | Binning, OTUs
● Binning: the process of grouping reads or contigs and assigning them to operational taxonomic units (OTUs).
● OTU (Operational Taxonomic Unit): the taxonomic level of sampling selected by the user for a study. OTUs are typically defined with a percent sequence similarity threshold that decides whether two microbes are classified within the same OTU or in different OTUs.
Sources: http://shuixia100.weebly.com/1/post/2011/12/mothur-tutorial-1.html / Wikipedia: Biological classification
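How a similarity threshold turns sequences into OTUs can be sketched with a greedy clustering loop. This is an illustrative assumption, not the method of any particular tool: sequences are taken to be pre-aligned and of equal length, identity is the fraction of matching positions, and the names `identity` and `greedy_otus` are hypothetical. The default of 0.97 reflects a commonly used convention for 16S rRNA data; the toy example uses 0.90.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU centroid it matches.

    A sequence that matches no existing centroid at or above the
    threshold becomes the centroid of a new OTU.
    Returns (centroids, assignments), where assignments[i] is the
    OTU index of seqs[i].
    """
    centroids = []
    assignments = []
    for s in seqs:
        for idx, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                assignments.append(idx)
                break
        else:  # no centroid was close enough
            centroids.append(s)
            assignments.append(len(centroids) - 1)
    return centroids, assignments


# Toy reads: two groups that differ at a single position within each group.
reads = ["A" * 20, "A" * 19 + "T", "C" * 20, "C" * 19 + "G"]
centroids, assignments = greedy_otus(reads, threshold=0.90)
```

Note that the result of a greedy scheme depends on the input order, which is one reason why OTU counts can differ between pipelines.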
Concepts | Chimeras
● Chimeras: artificial sequences formed during PCR amplification. The majority are believed to arise from incomplete extension: during subsequent PCR cycles, a partially extended strand can bind to a template derived from a different but similar sequence, where it acts as a primer and is extended to form a chimeric sequence (Smith et al., 2010; Thompson et al., 2002; Meyerhans et al., 1990; Judo et al., 1998; Odelberg, 1995). A chimeric template created during one round is then amplified in subsequent rounds, producing chimeric amplicons that are difficult to distinguish from amplicons derived from a single biological sequence.
Haas B.J. et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons, Genome Res. 21: 494-504.
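The idea behind chimera detection can be sketched as follows. This is a deliberately naive illustration, assuming two known, equal-length, aligned parent sequences; it is not the algorithm of any published detector, and the function names and sequences are hypothetical. A query is flagged when a two-parent split explains it with fewer mismatches than either parent alone.

```python
def mismatches(x: str, y: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(p != q for p, q in zip(x, y))


def make_chimera(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Join the start of parent_a to the end of parent_b at breakpoint."""
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def looks_chimeric(query: str, parent_a: str, parent_b: str) -> bool:
    """True if some left/right split of the two parents fits the query
    strictly better than the best single parent."""
    single = min(mismatches(query, parent_a), mismatches(query, parent_b))
    best_split = min(
        mismatches(query[:i], left[:i]) + mismatches(query[i:], right[i:])
        for left, right in ((parent_a, parent_b), (parent_b, parent_a))
        for i in range(1, len(query))
    )
    return best_split < single


# Two similar parents that differ at five positions (toy data).
parent_a = "ACGTACGTACGTACGTACGT"
parent_b = "ACGAACGAACGAACGAACGA"
chimera = make_chimera(parent_a, parent_b, 10)

flag_chimera = looks_chimeric(chimera, parent_a, parent_b)
flag_parent = looks_chimeric(parent_a, parent_a, parent_b)
```

In real data the parents are unknown and must be searched for among the more abundant sequences or in a reference database, which is what makes the problem hard.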
Concepts | Diversities
● Alpha diversity: the diversity within a particular area or ecosystem, expressed as the number of species (i.e., species richness) in that ecosystem or by one or more diversity indices.
● Beta diversity: a comparison of diversity between ecosystems, usually measured as the amount of species change between them.
● Gamma diversity: a measure of the overall diversity within a large region, i.e. geographic-scale species diversity according to Hunter (2002:448).
Zinger L. et al. (2012) Two decades of describing the unseen majority of aquatic microbial diversity, Molecular Ecology 21, 1878–1896.
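The three diversity levels can be made concrete with a small sketch. The choice of metrics is an assumption made for illustration: richness and the Shannon index for alpha diversity, Jaccard dissimilarity on presence/absence for beta diversity, and pooled richness for gamma diversity. The OTU tables below are invented toy data.

```python
import math


def richness(counts: dict) -> int:
    """Alpha diversity as the number of OTUs observed in one sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts: dict) -> float:
    """Alpha diversity as the Shannon index (natural logarithm)."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )


def present(counts: dict) -> set:
    """Set of OTUs with a non-zero count."""
    return {otu for otu, c in counts.items() if c > 0}


def jaccard_dissimilarity(a: dict, b: dict) -> float:
    """Beta diversity as 1 - shared OTUs / total OTUs across both samples."""
    union = present(a) | present(b)
    if not union:
        return 0.0
    return 1 - len(present(a) & present(b)) / len(union)


def gamma_richness(samples) -> int:
    """Gamma diversity as the richness of all samples pooled together."""
    pooled = set()
    for s in samples:
        pooled |= present(s)
    return len(pooled)


# Toy OTU count tables for two sites.
site1 = {"otu1": 10, "otu2": 10, "otu3": 0}
site2 = {"otu2": 5, "otu3": 15}

alpha1 = richness(site1)
h1 = shannon(site1)
beta12 = jaccard_dissimilarity(site1, site2)
gamma = gamma_richness([site1, site2])
```

Abundance-based beta diversity measures would weight the shared OTU differently; presence/absence is used here only because it is the simplest case.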
Concepts | Diversity measurement issues
Zhou J. et al. (2013) Random Sampling Process Leads to Overestimation of β-Diversity of Microbial Communities, mBio 4(3):e00324-13. doi:10.1128/mBio.00324-13.
Diversity can virtually never be measured directly, rather it must be estimated or inferred from available data. Our estimates are anchored in the sample itself.
Magurran (Ed.), Biological Diversity, Oxford U.P. 2010. Ch. 16 Microbial Diversity and Ecology
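One way to see that richness is estimated rather than measured is the Chao1 estimator, which is not named in the slides and is introduced here purely as an illustrative assumption. It adds to the observed richness a correction based on the number of OTUs seen exactly once (singletons, F1) and exactly twice (doubletons, F2). The sketch uses the bias-corrected form S_obs + F1(F1 - 1) / (2(F2 + 1)).

```python
def chao1(counts) -> float:
    """Bias-corrected Chao1 estimate of total richness.

    counts: iterable of per-OTU read counts for a single sample.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)  # singletons
    f2 = sum(1 for c in observed if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy sample: 7 observed OTUs, 3 singletons, 2 doubletons.
sample_counts = [10, 5, 1, 1, 1, 2, 2]
estimate = chao1(sample_counts)
```

Many singletons suggest that further OTUs remain unseen, so the estimate rises above the observed count; with no singletons it equals the observed richness.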
Concepts | Rarefaction
● Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of so-called rarefaction curves. This curve is a plot of the number of species as a function of the number of samples.
[Figure: rarefaction curves, with the annotations "most or all species have been sampled", "this habitat has not been exhaustively sampled" and "species rich habitat, only a small fraction has been sampled"]
Wooley J.C. et al. (2010) A Primer on Metagenomics, PLoS Computational Biology 6 (2) e1000667
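A rarefaction curve can be computed analytically rather than by repeated random subsampling. The sketch below assumes that the x-axis is the number of individuals (reads) drawn without replacement from a single sample; for each subsample size n it sums, over OTUs, the probability that the OTU appears at least once, 1 - C(N - N_i, n) / C(N, n). The function names are hypothetical.

```python
from math import comb


def rarefied_richness(counts, n: int) -> float:
    """Expected number of OTUs in a random subsample of n individuals.

    counts: per-OTU abundances in the full sample (N = sum of counts).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when n > total - c, i.e. the OTU is certain
    # to be drawn at least once.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step: int = 1):
    """List of (subsample size, expected richness) points."""
    total = sum(counts)
    return [(n, rarefied_richness(counts, n)) for n in range(0, total + 1, step)]


# Toy community: three OTUs with 5, 3 and 2 individuals.
community = [5, 3, 2]
curve = rarefaction_curve(community)
```

A curve that flattens well before the full sample size suggests that most OTUs have been sampled, whereas a curve that is still rising at the right-hand end suggests that more remain to be found.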
Concepts | Summary
● Metadata, reads, fasta/fastq files, counts, OTU tables/networks, .biom files, PCoA, p-values, diversity metrics, robustness, scores, jackknifed, clustering, UPGMA, trees, bootstrap, Bi-Plots, ...
Grice E.A. & Segre J.A. (2012) The Human Microbiome: Our Second Genome,
Annu. Rev. Genomics Human Genet. 13, 151-170
Workflows | Overview
Sample collection
DNA extraction and preparation
Sequencing
Analysis
Experimental design
Sample Quality Controls
Sequence Quality Controls
Biological interpretation
More resources, courses...
Resources & Projects:
MEGAN DB http://www.megan-db.org/megan-db/ (MEtaGenomics ANalysis)
CAMERA http://camera.calit2.net/ (community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis)
MG-RAST Search http://metagenomics.anl.gov/metagenomics.cgi?page=MetagenomeSearch
IMG http://img.jgi.doe.gov/ (Integrated Microbial Genomes and metagenomes)
MetaBioME http://metasystems.riken.jp/metabiome/ (Comprehensive Metagenomic BioMining Engine)
BOLD http://www.boldsystems.org/ (Barcode of Life Data Systems)
GOS Expedition http://www.jcvi.org/cms/research/projects/gos/overview (Global Ocean Sampling)
...
Courses:
EBI http://www.ebi.ac.uk/training/course/metagenomics2014
EMBO http://cymeandcystidium.com/?tag=metagenomics
Coursera https://www.coursera.org/course/genomescience
... and a lot of seminars and workshops everywhere
Thanks for your attention
and also thanks to
Josep Gregori (VHIR, ROCHE)
for providing some materials