TY - JOUR
T1 - Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning
AU - Ag1000g Consortium
AU - Xue, Alexander T.
AU - Schrider, Daniel
AU - Kern, Andrew
AU - Della Torre, Alessandra
AU - Caputo, Beniamino
AU - Kabula, Bilali
AU - White, Bradley
AU - Godfray, Charles
AU - Edi, Constant
AU - Wilding, Craig
AU - Neafsey, Dan
AU - Conway, David
AU - Weetman, David
AU - Ayala, Diego
AU - Kwiatkowski, Dominic
AU - Sharakhov, Igor
AU - Midega, Janet
AU - Xu, Jiannong
AU - Pinto, João
AU - Essandoh, John
AU - Matowo, Johnson
AU - Vernick, Ken
AU - Djogbenou, Luc S.
AU - Coulibaly, Mamadou
AU - Lawniczak, Mara
AU - Donnelly, Martin
AU - Hahn, Matthew
AU - Fontaine, Michaël
AU - Riehle, Michelle
AU - Besansky, Nora
AU - Cornejo, Omar
AU - McCann, Robert
AU - O'Loughlin, Sam
AU - Robert, Vincent
AU - Miles, Alistair
AU - Clarkson, Chris
AU - Battey, C. J.
AU - Champion, Cody
AU - Labbe, Frederic
AU - Bottà, Giordano
AU - Adrion, Jeffrey
AU - Nelson, Joel
AU - Harding, Nick
AU - Wang, Richard
AU - Small, Scott T.
AU - Redmond, Seth
AU - Antão, Tiago
N1 - Publisher Copyright:
© 2020 The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
PY - 2021/3/1
Y1 - 2021/3/1
AB - Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
KW - convolutional neural networks
KW - machine learning
KW - partial sweeps
KW - population genomics
KW - selective sweeps
UR - http://www.scopus.com/inward/record.url?scp=85102911117&partnerID=8YFLogxK
U2 - 10.1093/molbev/msaa259
DO - 10.1093/molbev/msaa259
M3 - Article
C2 - 33022051
AN - SCOPUS:85102911117
SN - 0737-4038
VL - 38
SP - 1168
EP - 1183
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
IS - 3
ER -