Article
Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions
Registered in:
DUTILH, Bas E. et al. Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions. BMC Genomics, San Diego, v. 15, n. 654, p. 1-11, 2014.
1471-2164
10.1186/1471-2164-15-654
Authors
Dutilh, Bas E.
Thompson, Cristiane C.
Vicente, Ana C. P.
Marin, Michel A.
Lee, Clarence
Schmieder, Robert
Andrade, Bruno G. N.
Chimetto, Luciane
Cuevas, Daniel
Garza, Daniel R.
Okeke, Iruka N.
Aboderin, Aaron Oladipo
Spangler, Jessica
Ross, Tristen
Dinsdale, Elizabeth A.
Thompson, Fabiano L.
Harkins, Timothy T.
Edwards, Robert A.
Abstract
Background: Vibrio cholerae is a globally dispersed pathogen that has evolved with humans for centuries, but also
includes non-pathogenic environmental strains. Here, we identify the genomic variability underlying this remarkable
persistence across the three major niche dimensions: space, time, and habitat.
Results: Taking an innovative approach of genome-wide association applicable to microbial genomes (GWAS-M),
we classify 274 complete V. cholerae genomes by niche, including 39 newly sequenced for this study with the Ion
Torrent DNA-sequencing platform. Niche metadata were collected for each strain and analyzed together with
comprehensive annotations of genetic and genomic attributes, including point mutations (single-nucleotide
polymorphisms, SNPs), protein families, functions and prophages.
Conclusions: Our analysis revealed that genomic variations, in particular mobile functions including phages,
prophages, transposable elements, and plasmids, underlie the metadata structuring in each of the three niche
dimensions. This underscores the role of phages and mobile elements as the most rapidly evolving elements in
bacterial genomes, creating local endemicity (space), leading to temporal divergence (time), and allowing the
invasion of new habitats. Together, we take a data-driven approach for comparative functional genomics that
exploits high-volume genome sequencing and annotation, in conjunction with novel statistical and machine
learning analyses to identify connections between genotype and phenotype on a genome-wide scale.
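
The abstract does not specify which statistical tests were used, so the following is only a minimal sketch of the general idea behind associating genomic features with niche metadata. It assumes a presence/absence table of features per genome and one niche label per genome, and scores each feature with a two-sided Fisher's exact test. The test choice, the function names, and the toy feature and genome identifiers are all hypothetical and are not taken from the article.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: genomes inside / outside the niche.
    Columns: feature present / absent.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that x in-niche genomes carry the feature
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def associate(presence, niche, target):
    """Return a p-value per feature for association with the niche `target`.

    presence: dict mapping feature name -> set of genome ids carrying it
    niche:    dict mapping genome id -> niche label
    """
    in_niche = {g for g, label in niche.items() if label == target}
    out_niche = set(niche) - in_niche
    results = {}
    for feature, genomes in presence.items():
        a = len(genomes & in_niche)    # in niche, feature present
        b = len(in_niche) - a          # in niche, feature absent
        c = len(genomes & out_niche)   # outside niche, feature present
        d = len(out_niche) - c         # outside niche, feature absent
        results[feature] = fisher_exact_two_sided(a, b, c, d)
    return results

# Toy data (hypothetical): eight genomes, two habitat labels, two features
niche = {f"g{i}": ("clinical" if i <= 4 else "environmental") for i in range(1, 9)}
presence = {
    "prophage_X": {"g1", "g2", "g3", "g4"},   # found only in clinical genomes
    "core_gene": set(niche),                  # found in every genome
}
pvalues = associate(presence, niche, "clinical")
for feature, p in sorted(pvalues.items(), key=lambda kv: kv[1]):
    print(f"{feature}\t{p:.4f}")
```

In a real analysis with hundreds of genomes and many thousands of features, the resulting p-values would need correction for multiple testing and for population structure, since closely related strains are not independent observations.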