PCI Evolutionary Biology

29 Nov 2022

Joint inference of adaptive and demographic history from temporal population genomic data

Vitor A. C. Pavinato, Stéphane De Mita, Jean-Michel Marin, Miguel de Navascués https://doi.org/10.1101/2021.03.12.435133

Inference of genome-wide processes using temporal population genomic data

Recommended by Aurelien Tellier based on reviews by Lawrence Uricchio and 2 anonymous reviewers

Evolutionary genomics, and population genetics in particular, aim to decipher the respective influences of neutral and selective forces shaping genetic polymorphism in a species or population. This is a prerequisite before scanning genome data for footprints of species adaptation to their biotic and abiotic environment (Johri et al. 2022). In general, we would like to quantify the proportion of the genome evolving neutrally and under selective (positive, balancing and negative) pressures (Kern and Hahn 2018, Johri et al. 2021). We thus need to understand patterns of linked selection along the genome, that is, how the distribution of genetic polymorphism is shaped by selected sites and by the recombination landscape. The present contribution by Pavinato et al. (2022) adds a method to the population genomics toolbox that quantifies the extent of linked positive and negative selection using temporal data.

The availability of genomic data for model and non-model species has led to improvements in the modeling frameworks for demography and selection (Johri et al. 2022), but also to new inference methods that make use of full-genome data, based on the Sequentially Markovian Coalescent (SMC, Li and Durbin 2011), Approximate Bayesian Computation (ABC, Jay et al. 2019), ABC combined with machine learning (Pudlo et al. 2016, Raynal et al. 2019) or deep learning (Sanchez et al. 2021). These methods rely on samples taken at a single point in time and use coalescent theory to reconstruct the past (demographic) history. However, for many species it is also possible to obtain temporal data sampled at several time points. For species with short generation times (in experimental evolution or monitored populations), a population can be sampled every couple of generations, as exemplified with Drosophila melanogaster (Bergland et al. 2014). For species with longer generation times that cannot easily be sampled at regular intervals, one can instead sequence museum specimens (e.g. Cridland et al. 2018) or ancient DNA samples. Methods using temporal data rely on the classical population genomics assumption that demography (migration, population subdivision, population size changes) leaves a genome-wide signal, while selection leaves a localized signal in the close vicinity of the causal mutation. Several methods assess the demography of a population (changes in effective population size, Ne, over time) from temporal data (e.g. Jorde and Ryman 2007), and these estimates can be used to calibrate the detection of loci under strong positive selection (Foll et al. 2015). Recently, Buffalo and Coop (2020) used the genome-wide covariance between allele frequency changes across time samples (and across replicates) to quantify the effects of linked selection over short timescales.
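
The temporal logic in the preceding paragraph can be illustrated with a minimal sketch, assuming idealized Wright-Fisher drift at independent neutral loci whose allele frequencies are known without sampling error. It simulates drift, estimates Ne by inverting the expected standardized variance of allele frequency change, F = 1 - (1 - 1/(2Ne))^t, and computes a scaled covariance of successive frequency changes in the spirit of Buffalo and Coop (2020). This is a simple moment estimator, not the sampling-corrected estimator of Jorde and Ryman (2007) nor the exact statistic of Buffalo and Coop; all function names and simulation settings are hypothetical.

```python
import random


def wright_fisher(p, two_n, generations, rng):
    """Drift one locus: each generation, binomially resample 2N gene copies."""
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(two_n)) / two_n
    return p


def standardized_drift_variance(p_start, p_end):
    """Mean of (p_t - p_0)^2 / (p_0 (1 - p_0)) over loci polymorphic at time 0."""
    terms = [(b - a) ** 2 / (a * (1 - a)) for a, b in zip(p_start, p_end) if 0 < a < 1]
    return sum(terms) / len(terms)


def temporal_ne(f, t):
    """Invert E[F] = 1 - (1 - 1/(2Ne))^t for Ne (assumes no sampling noise)."""
    return 1.0 / (2.0 * (1.0 - (1.0 - f) ** (1.0 / t)))


def temporal_covariance(p0, p1, p2):
    """Covariance across loci of successive changes (p1-p0, p2-p1), scaled by mean p0(1-p0)."""
    d1 = [b - a for a, b in zip(p0, p1)]
    d2 = [c - b for b, c in zip(p1, p2)]
    m1, m2 = sum(d1) / len(d1), sum(d2) / len(d2)
    cov = sum((x - m1) * (y - m2) for x, y in zip(d1, d2)) / (len(d1) - 1)
    het = sum(a * (1 - a) for a in p0) / len(p0)
    return cov / het


# Hypothetical setting: Ne = 50 diploids, 1000 unlinked neutral loci, two 10-generation intervals.
rng = random.Random(1)
NE, T, LOCI = 50, 10, 1000
p0 = [rng.uniform(0.2, 0.8) for _ in range(LOCI)]
p1 = [wright_fisher(p, 2 * NE, T, rng) for p in p0]
p2 = [wright_fisher(p, 2 * NE, T, rng) for p in p1]

f = standardized_drift_variance(p0, p1)
ne_hat = temporal_ne(f, T)
cov12 = temporal_covariance(p0, p1, p2)
print(f"F = {f:.3f}, Ne_hat = {ne_hat:.1f}, scaled cov(dp1, dp2) = {cov12:.4f}")
```

Under pure drift, allele frequency is a martingale, so successive changes are uncorrelated and the scaled covariance should be close to zero; a systematic non-zero covariance between intervals is the kind of signal Buffalo and Coop (2020) use to quantify linked selection.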

In the present contribution, Pavinato et al. (2022) use temporal data to jointly estimate demographic and selective parameters with a simulation-based method (ABC with random forests, ABC-RF). Their framework infers the census size of the population over time (N) separately from the effect of genetic drift, which is governed by the effective population size (Ne) over time, as well as genome-wide parameters of selection. In a nutshell, the authors use a forward simulator and summarize genome data in genomic windows using classic statistics (nucleotide diversity, Tajima’s D, FST, heterozygosity) computed for each sample and between time samples. Specifically, they use the distributions of these statistics across all windows, including their higher moments. As input for the ABC-RF, the authors combine vectors of summary statistics, model parameters and five latent variables: Ne, the ratio Ne/N, the number of beneficial mutations under strong selection, the average selection coefficient of strongly selected mutations, and the average substitution load. The authors are interested in three components of selection: 1) the adaptive potential of a population, estimated as the population mutation rate of beneficial mutations (θb), 2) the number of mutations under strong selection (irrespective of whether they reached fixation or not), and 3) the overall population fitness, which is a function of the genetic load. In other words, the novelty of this method lies not in detecting loci under selection, but in inferring key parameters and distributions that summarize the genome-wide signal of demography and of (positive and negative) selection. As a proof of principle, the authors then apply their method to a dataset of feral honey bee (Apis mellifera) populations in California, sampled across many years and recovered from museum specimens (Cridland et al. 2018).

The approach yields estimates of Ne of the same order of magnitude as previous estimates in hymenopterans, and the authors discuss why the different populations show different values of Ne and N, which can be explained by their different histories of admixture with both wild and domesticated lineages of bees.
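
The summarization step described above (statistics computed per genomic window, then the distribution of each statistic across windows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it takes known allele frequencies at two time samples, computes only expected heterozygosity (2pq) per sample and a simple FST = (HT - HS)/HT between samples, and summarizes each statistic by its mean, variance, skewness and excess kurtosis. Function names, window size and toy data are hypothetical; nucleotide diversity and Tajima’s D, which require sequence-level data, are omitted.

```python
import statistics
from collections import defaultdict


def window_stats(positions, p_a, p_b, window_size):
    """Per-window statistics from allele frequencies at two time samples (a, b)."""
    windows = defaultdict(list)
    for pos, a, b in zip(positions, p_a, p_b):
        windows[pos // window_size].append((a, b))
    rows = []
    for key in sorted(windows):
        snps = windows[key]
        h_a = statistics.fmean(2 * a * (1 - a) for a, _ in snps)
        h_b = statistics.fmean(2 * b * (1 - b) for _, b in snps)
        h_s = (h_a + h_b) / 2  # mean within-sample heterozygosity
        # heterozygosity of the pooled sample (equal weights for a and b)
        h_t = statistics.fmean(2 * ((a + b) / 2) * (1 - (a + b) / 2) for a, b in snps)
        fst = (h_t - h_s) / h_t if h_t > 0 else 0.0
        rows.append({"window": key, "het_a": h_a, "het_b": h_b, "fst": fst})
    return rows


def moments(values):
    """Mean, variance, skewness and excess kurtosis (population moments)."""
    m = statistics.fmean(values)
    var = statistics.pvariance(values, mu=m)
    if var == 0:
        return [m, 0.0, 0.0, 0.0]
    sd = var ** 0.5
    skew = statistics.fmean(((x - m) / sd) ** 3 for x in values)
    kurt = statistics.fmean(((x - m) / sd) ** 4 for x in values) - 3.0
    return [m, var, skew, kurt]


def summary_vector(rows):
    """Concatenate the moments of each statistic across windows into one vector."""
    vec = []
    for stat in ("het_a", "het_b", "fst"):
        vec.extend(moments([r[stat] for r in rows]))
    return vec


# Toy data: five SNPs, frequencies at two time samples, 1000-bp windows.
positions = [100, 900, 1500, 1800, 2500]
p_t0 = [0.5, 0.2, 0.5, 0.1, 0.3]
p_t1 = [0.5, 0.4, 0.1, 0.1, 0.3]
rows = window_stats(positions, p_t0, p_t1, window_size=1000)
stats_vector = summary_vector(rows)
print(len(rows), [round(r["fst"], 4) for r in rows])
```

In ABC-RF, a vector like this is computed for every simulated dataset and stored alongside the parameters and latent variables that generated it; the random forest is trained on that reference table, and the observed data are summarized in exactly the same way before prediction.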

This study focuses on quantifying the joint genome-wide footprints of demography and of strong positive and negative selection, in order to determine which proportion of the genome evolves neutrally. Further applications of this method can be anticipated, for instance to species with ecological and life-history traits that generate discrepancies between census size and Ne, such as plants with selfing or seed banking (Sellinger et al. 2020), for which the genome-wide effect of linked selection is not fully understood.

References

Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD (2022) Recommendations for improving statistical inference in population genomics. PLOS Biology, 20, e3001669. https://doi.org/10.1371/journal.pbio.3001669

Kern AD, Hahn MW (2018) The Neutral Theory in Light of Natural Selection. Molecular Biology and Evolution, 35, 1366–1371. https://doi.org/10.1093/molbev/msy092

Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD (2021) The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Molecular Biology and Evolution, 38, 2986–3003. https://doi.org/10.1093/molbev/msab050

Pavinato VAC, De Mita S, Marin J-M, de Navascués M (2022) Joint inference of adaptive and demographic history from temporal population genomic data. bioRxiv, 2021.03.12.435133, ver. 6 peer-reviewed and recommended by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2021.03.12.435133

Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature, 475, 493–496. https://doi.org/10.1038/nature10231

Jay F, Boitard S, Austerlitz F (2019) An ABC Method for Whole-Genome Sequence Data: Inferring Paleolithic and Neolithic Human Expansions. Molecular Biology and Evolution, 36, 1565–1579. https://doi.org/10.1093/molbev/msz038

Pudlo P, Marin J-M, Estoup A, Cornuet J-M, Gautier M, Robert CP (2016) Reliable ABC model choice via random forests. Bioinformatics, 32, 859–866. https://doi.org/10.1093/bioinformatics/btv684

Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A (2019) ABC random forests for Bayesian parameter inference. Bioinformatics, 35, 1720–1728. https://doi.org/10.1093/bioinformatics/bty867

Sanchez T, Cury J, Charpiat G, Jay F (2021) Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Molecular Ecology Resources, 21, 2645–2660. https://doi.org/10.1111/1755-0998.13224

Bergland AO, Behrman EL, O’Brien KR, Schmidt PS, Petrov DA (2014) Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila. PLOS Genetics, 10, e1004775. https://doi.org/10.1371/journal.pgen.1004775

Cridland JM, Ramirez SR, Dean CA, Sciligo A, Tsutsui ND (2018) Genome Sequencing of Museum Specimens Reveals Rapid Changes in the Genetic Composition of Honey Bees in California. Genome Biology and Evolution, 10, 458–472. https://doi.org/10.1093/gbe/evy007

Jorde PE, Ryman N (2007) Unbiased Estimator for Genetic Drift and Effective Population Size. Genetics, 177, 927–935. https://doi.org/10.1534/genetics.107.075481

Foll M, Shim H, Jensen JD (2015) WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Molecular Ecology Resources, 15, 87–98. https://doi.org/10.1111/1755-0998.12280

Buffalo V, Coop G (2020) Estimating the genome-wide contribution of selection to temporal allele frequency change. Proceedings of the National Academy of Sciences, 117, 20672–20680. https://doi.org/10.1073/pnas.1919039117

Sellinger TPP, Awad DA, Moest M, Tellier A (2020) Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data. PLOS Genetics, 16, e1008698. https://doi.org/10.1371/journal.pgen.1008698