Article
Assembly of a pan-genome from deep sequencing of 910 humans of African descent
Citation:
SHERMAN, R. M. et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nature Genetics, 2018.
ISSN: 1061-4036
DOI: 10.1038/s41588-018-0273-y
Authors
Sherman, Rachel M
Forman, Juliet
Antonescu, Valentin
Puiu, Daniela
Daya, Michelle
Rafaels, Nicholas
Boorgula, Meher Preethi
Chavan, Sameer
Vergara, Candelaria
Ortega, Victor E
Levin, Albert M
Eng, Celeste
Yazdanbakhsh, Maria
Wilson, James G
Marrugo, Javier
Lange, Leslie A
Williams, L Keoki
Watson, Harold
Ware, Lorraine B
Olopade, Christopher O
Olopade, Olufunmilayo
Oliveira, Ricardo Riccio
Ober, Carole
Nicolae, Dan L
Meyers, Deborah A
Mayorga, Alvaro
Knight-Madden, Jennifer
Hartert, Tina
Hansel, Nadia N
Foreman, Marilyn G
Ford, Jean G
Faruque, Mezbah U
Dunston, Georgia M
Caraballo, Luis
Burchard, Esteban G
Bleecker, Eugene R
Araujo, Maria I
Herrera-Paz, Edwin F
Campbell, Monica
Foster, Cassandra
Taub, Margaret A
Beaty, Terri H
Ruczinski, Ingo
Mathias, Rasika A
Barnes, Kathleen C
Salzberg, Steven L
Affiliations
Oliveira, Ricardo Riccio. Fundação Oswaldo Cruz. Instituto Gonçalo Moniz. Salvador, BA, Brazil.
1. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
2. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
3. Departments of Computer Science, Biology, and Mathematics, Harvey Mudd College, Claremont, CA, USA.
4. Department of Medicine, University of Colorado Denver, Aurora, CO, USA.
5. Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
6. Department of Internal Medicine, Section on Pulmonary, Critical Care, Allergy and Immunologic Diseases, Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
7. Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA.
8. Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
9. Department of Parasitology, Leiden University Medical Center, Leiden, The Netherlands.
10. Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA.
11. Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia.
12. Department of Internal Medicine, Henry Ford Health System, Detroit, MI, USA.
13. Faculty of Medical Sciences Cave Hill Campus, The University of the West Indies, Bridgetown, Barbados.
14. Department of Medicine, Vanderbilt University, Nashville, TN, USA.
15. Department of Medicine and Center for Global Health, University of Chicago, Chicago, IL, USA.
16. Department of Medicine, University of Chicago, Chicago, IL, USA.
17. Laboratório de Patologia Experimental, Centro de Pesquisas Gonçalo Moniz, Salvador, Brazil.
18. Department of Human Genetics, University of Chicago, Chicago, IL, USA.
19. Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA.
20. Centro de Neumologia y Alergias, San Pedro Sula, Honduras.
21. Caribbean Institute for Health Research, The University of the West Indies, Kingston, Jamaica.
22. Pulmonary and Critical Care Medicine, Morehouse School of Medicine, Atlanta, GA, USA.
23. Department of Medicine, Einstein Medical Center, Philadelphia, PA, USA.
24. National Human Genome Center, Howard University College of Medicine, Washington, DC, USA.
25. Department of Microbiology, Howard University College of Medicine, Washington, DC, USA.
26. Departments of Bioengineering & Therapeutic Sciences and Medicine, University of California, San Francisco, San Francisco, CA, USA.
27. Immunology Service, Universidade Federal da Bahia, Salvador, Brazil.
28. Facultad de Ciencias Médicas, Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras.
29. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
30. Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
31. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
Abstract
We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that are present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome that are missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
Data availability: GenBank accession code PDBU00000000.
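The workflow summarized in the abstract (align reads to GRCh38, keep the reads that fail to align, assemble them, then compare contigs against one another to keep a unique set) can be illustrated with a toy Python sketch. It is not the authors' pipeline and rests on loudly stated assumptions: reads are supplied as SAM text lines, the assembly step is skipped by passing contigs in directly, and the all-against-all contig comparison is reduced to exact containment on either strand, a simplifying stand-in for the alignment-based comparison a real study would need. The helper names (`unaligned_reads`, `revcomp`, `unique_contigs`, `percent_extra`) are hypothetical, and the ~3.1 Gb reference length used in the percentage check is an approximation, not a figure from the abstract.

```python
# Toy sketch of the workflow in the abstract; NOT the authors' pipeline.
# Assumptions: reads come as SAM text lines, contigs are supplied directly
# (no assembler), and "compare all contigs" is reduced to exact containment.

UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def unaligned_reads(sam_lines):
    """Yield (read name, sequence) for SAM records flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq = fields[0], int(fields[1]), fields[9]  # SEQ is column 10
        if flag & UNMAPPED:
            yield qname, seq


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def unique_contigs(contigs):
    """Keep contigs not contained, on either strand, in a longer kept contig.

    Hypothetical simplification: exact substring tests, quadratic in the
    number of contigs. Ties in length are broken alphabetically so the
    result is deterministic.
    """
    kept = []
    for c in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = revcomp(c)
        if not any(c in k or rc in k for k in kept):
            kept.append(c)
    return kept


def percent_extra(novel_bp, reference_bp):
    """Novel sequence length as a percentage of the reference length."""
    return 100.0 * novel_bp / reference_bp


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGCCCAA\tIIIIIIII",
    ]
    print(list(unaligned_reads(sam)))  # [('r2', 'GGGCCCAA')]

    # "CCCGGG" is a plain substring; "CCGGGTTT" matches via its reverse complement.
    contigs = unique_contigs(["AAACCCGGG", "CCCGGG", "CCGGGTTT", "TTTTGGGG"])
    print(contigs, sum(map(len, contigs)))  # ['AAACCCGGG', 'TTTTGGGG'] 17

    # Abstract's figure against an ASSUMED ~3.1 Gb reference length.
    print(f"{percent_extra(296_485_284, 3_100_000_000):.1f}%")  # 9.6%
```

The last line checks the abstract's arithmetic: 296,485,284 bp against an assumed ~3.1 Gb reference gives about 9.6%, consistent with "~10% more DNA". The exact percentage depends on which GRCh38 length (with or without gap bases) is taken as the denominator. The quadratic containment test is adequate for a toy input but would not scale to 125,715 real contigs.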