Inference of genome-wide processes using temporal population genomic data
Joint inference of adaptive and demographic history from temporal population genomic data
Recommendation: posted 21 November 2022, validated 29 November 2022
Evolutionary genomics, and population genetics in particular, aim to decipher the respective influence of neutral and selective forces shaping genetic polymorphism in a species/population. This is a prerequisite for scanning genome data for footprints of species adaptation to their biotic and abiotic environment (Johri et al. 2022). In general, we would like to quantify the proportion of the genome evolving neutrally and under selective (positive, balancing and negative) pressures (Kern and Hahn 2018, Johri et al. 2021). We thus need to understand patterns of linked selection along the genome, that is, how the distribution of genetic polymorphisms is shaped by selected sites and the recombination landscape. The present contribution by Pavinato et al. (2022) adds a method to the population genomics toolbox for quantifying the extent of linked positive and negative selection using temporal data.
The availability of genomics data for model and non-model species has led to improvements in the modeling framework for demography and selection (Johri et al. 2022), but also to new inference methods making use of full genome data, based on the Sequential Markovian Coalescent (SMC, Li and Durbin 2011), Approximate Bayesian Computation (ABC, Jay et al. 2019), ABC and machine learning (Pudlo et al. 2016, Raynal et al. 2019) or Deep Learning (Sanchez et al. 2021). These methods are based on one sample in time and use coalescent theory to reconstruct the past (demographic) history. However, for many species it is also possible to obtain temporal data sampled over several time points. For species with short generation times (in experimental evolution or monitored populations), one can sample a population every couple of generations, as exemplified with Drosophila melanogaster (Bergland et al. 2014). For species with longer generation times that cannot easily be sampled regularly in time, it becomes possible to sequence available specimens from museums (e.g. Cridland et al. 2018) or ancient DNA samples. Methods using temporal data are based on the classical population genomics assumption that demography (migration, population subdivision, population size changes) leaves a genome-wide signal, while selection leaves a localized signal in the close vicinity of the causal mutation. Several methods assess the demography of a population (change in effective population size, Ne, in time) using temporal data (e.g. Jorde and Ryman 2007), which can be used to calibrate the detection of loci under strong positive selection (Foll et al. 2015). Recently, Buffalo and Coop (2020) used genome-wide covariance between allele frequency changes across time samples (and across replicates) to quantify the effects of linked selection over short timescales.
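The temporal-data logic described above, in which drift shifts allele frequencies genome-wide at a rate set by Ne, can be illustrated with a minimal Python sketch. It simulates pure Wright-Fisher drift and recovers Ne from the standardized variance of allele frequency change between two time points. This is a simple moment estimator chosen for illustration, not the estimator of Jorde and Ryman (2007) or any method from the cited papers; it assumes population frequencies are known exactly (no sampling correction), and all function names and parameter values are hypothetical.

```python
import random


def drift(p0, n_diploid, generations, rng):
    """Allele frequency after `generations` of pure Wright-Fisher drift."""
    copies = 2 * n_diploid
    p = p0
    for _ in range(generations):
        # Binomial sampling of 2N gene copies from the parental frequency.
        count = sum(rng.random() < p for _ in range(copies))
        p = count / copies
    return p


def temporal_ne(p_start, p_end, generations):
    """Moment estimate of Ne from allele frequency change across loci.

    Assumption: frequencies are population values, so no correction for
    finite sample size is applied.
    """
    num = sum((b - a) ** 2 for a, b in zip(p_start, p_end))
    den = sum(a * (1.0 - a) for a in p_start)
    f = num / den  # standardized variance of allele frequency change
    if not 0.0 < f < 1.0:
        raise ValueError("F must lie strictly between 0 and 1")
    # Under drift alone, E[F] = 1 - (1 - 1/(2 Ne))^t; solve for Ne.
    return 1.0 / (2.0 * (1.0 - (1.0 - f) ** (1.0 / generations)))


# Hypothetical setting: 500 unlinked neutral loci, N = 100, 10 generations.
rng = random.Random(42)
true_n, t, n_loci = 100, 10, 500
p_start = [rng.uniform(0.2, 0.8) for _ in range(n_loci)]
p_end = [drift(p, true_n, t, rng) for p in p_start]
ne_hat = temporal_ne(p_start, p_end, t)
print(f"true N = {true_n}, estimated Ne = {ne_hat:.1f}")
```

Because every locus here is neutral and unlinked, the estimate reflects drift alone; loci whose frequency change departs strongly from this genome-wide expectation are the ones a selection scan would flag.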
In the present contribution, Pavinato et al. (2022) make use of temporal data to jointly estimate demographic and selective parameters using a simulation-based method (ABC-Random Forests). The study builds a framework that allows inferring the census size of the population in time (N) separately from the effect of genetic drift, which is determined by change in effective population size (Ne) in time, as well as estimates of genome-wide parameters of selection. In a nutshell, the authors use a forward simulator and summarize genome data by genomic windows using classic statistics (nucleotide diversity, Tajima’s D, FST, heterozygosity) between time samples and for each sample. They specifically use the distributions (higher moments) of these statistics among all windows. The authors combine, as input for the ABC-RF, vectors of summary statistics, model parameters and five latent variables: Ne, the ratio Ne/N, the number of beneficial mutations under strong selection, the average selection coefficient of strongly selected mutations, and the average substitution load. Indeed, the authors are interested in three different types of selection components: 1) the adaptive potential of a population, which is estimated as the population mutation rate of beneficial mutations (θb), 2) the number of mutations under strong selection (irrespective of whether they reached fixation or not), and 3) the overall population fitness, which is a function of the genetic load. In other words, the novelty of this method is not to focus on the detection of loci under selection, but to infer key parameters/distributions summarizing the genome-wide signal of demography and (positive and negative) selection. As a proof of principle, the authors then apply their method to a dataset of feral populations of honey bees (Apis mellifera) collected in California across many years and recovered from museum samples (Cridland et al. 2018).
The approach yields estimates of Ne which are of the same order of magnitude as previous estimates in hymenopterans, and the authors discuss why the different populations show various values of Ne and N, which can be explained by different histories of admixture with wild but also domesticated lineages of bees.
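The idea of summarizing genome data by windows and then describing the distribution of a statistic across windows by its moments can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline: it computes only nucleotide diversity from derived-allele counts, uses 0-based positions, and the function names, window size, and toy data are hypothetical.

```python
def window_pi(positions, derived_counts, n_samples, window_size, genome_length):
    """Per-site nucleotide diversity in non-overlapping windows.

    positions: 0-based coordinates of segregating sites (< genome_length).
    derived_counts: number of derived alleles at each site among n_samples.
    """
    n_windows = -(-genome_length // window_size)  # ceiling division
    totals = [0.0] * n_windows
    pairs = n_samples * (n_samples - 1)
    for pos, k in zip(positions, derived_counts):
        # Expected pairwise differences contributed by one biallelic site.
        totals[pos // window_size] += 2 * k * (n_samples - k) / pairs
    pi = []
    for i, total in enumerate(totals):
        # The last window may be shorter than window_size.
        length = min(window_size, genome_length - i * window_size)
        pi.append(total / length)
    return pi


def moments(values):
    """Mean, variance, skewness and kurtosis of a list of window values."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    var = sum(c ** 2 for c in centred) / n
    if var == 0.0:
        return mean, 0.0, 0.0, 0.0
    skew = (sum(c ** 3 for c in centred) / n) / var ** 1.5
    kurt = (sum(c ** 4 for c in centred) / n) / var ** 2
    return mean, var, skew, kurt


# Toy data: five segregating sites in a 40 bp region, four sampled genomes.
positions = [5, 12, 18, 25, 37]
derived_counts = [1, 2, 3, 2, 1]
pi = window_pi(positions, derived_counts, n_samples=4,
               window_size=10, genome_length=40)
summary = moments(pi)
print(pi)
print(summary)
```

In an ABC setting, a vector such as `summary` would be computed for each time sample (and analogous vectors for between-sample statistics such as FST) on both simulated and observed data, so that the inference uses the genome-wide distribution of the statistic and not any single window.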
This study focuses on quantifying the genome-wide joint footprints of demography and of strong positive and negative selection, to determine which proportion of the genome evolves neutrally or not. Further applications of this method can be anticipated, for example to study species whose ecological and life-history traits generate discrepancies between census size and Ne, such as plants with selfing or seed banking (Sellinger et al. 2020), and for which the genome-wide effect of linked selection is not fully understood.
Johri P, Aquadro CF, Beaumont M, Charlesworth B, Excoffier L, Eyre-Walker A, Keightley PD, Lynch M, McVean G, Payseur BA, Pfeifer SP, Stephan W, Jensen JD (2022) Recommendations for improving statistical inference in population genomics. PLOS Biology, 20, e3001669. https://doi.org/10.1371/journal.pbio.3001669
Kern AD, Hahn MW (2018) The Neutral Theory in Light of Natural Selection. Molecular Biology and Evolution, 35, 1366–1371. https://doi.org/10.1093/molbev/msy092
Johri P, Riall K, Becher H, Excoffier L, Charlesworth B, Jensen JD (2021) The Impact of Purifying and Background Selection on the Inference of Population History: Problems and Prospects. Molecular Biology and Evolution, 38, 2986–3003. https://doi.org/10.1093/molbev/msab050
Pavinato VAC, Mita SD, Marin J-M, Navascués M de (2022) Joint inference of adaptive and demographic history from temporal population genomic data. bioRxiv, 2021.03.12.435133, ver. 6 peer-reviewed and recommended by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2021.03.12.435133
Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature, 475, 493–496. https://doi.org/10.1038/nature10231
Jay F, Boitard S, Austerlitz F (2019) An ABC Method for Whole-Genome Sequence Data: Inferring Paleolithic and Neolithic Human Expansions. Molecular Biology and Evolution, 36, 1565–1579. https://doi.org/10.1093/molbev/msz038
Pudlo P, Marin J-M, Estoup A, Cornuet J-M, Gautier M, Robert CP (2016) Reliable ABC model choice via random forests. Bioinformatics, 32, 859–866. https://doi.org/10.1093/bioinformatics/btv684
Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A (2019) ABC random forests for Bayesian parameter inference. Bioinformatics, 35, 1720–1728. https://doi.org/10.1093/bioinformatics/bty867
Sanchez T, Cury J, Charpiat G, Jay F (2021) Deep learning for population size history inference: Design, comparison and combination with approximate Bayesian computation. Molecular Ecology Resources, 21, 2645–2660. https://doi.org/10.1111/1755-0998.13224
Bergland AO, Behrman EL, O’Brien KR, Schmidt PS, Petrov DA (2014) Genomic Evidence of Rapid and Stable Adaptive Oscillations over Seasonal Time Scales in Drosophila. PLOS Genetics, 10, e1004775. https://doi.org/10.1371/journal.pgen.1004775
Cridland JM, Ramirez SR, Dean CA, Sciligo A, Tsutsui ND (2018) Genome Sequencing of Museum Specimens Reveals Rapid Changes in the Genetic Composition of Honey Bees in California. Genome Biology and Evolution, 10, 458–472. https://doi.org/10.1093/gbe/evy007
Jorde PE, Ryman N (2007) Unbiased Estimator for Genetic Drift and Effective Population Size. Genetics, 177, 927–935. https://doi.org/10.1534/genetics.107.075481
Foll M, Shim H, Jensen JD (2015) WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Molecular Ecology Resources, 15, 87–98. https://doi.org/10.1111/1755-0998.12280
Buffalo V, Coop G (2020) Estimating the genome-wide contribution of selection to temporal allele frequency change. Proceedings of the National Academy of Sciences, 117, 20672–20680. https://doi.org/10.1073/pnas.1919039117
Sellinger TPP, Awad DA, Moest M, Tellier A (2020) Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data. PLOS Genetics, 16, e1008698. https://doi.org/10.1371/journal.pgen.1008698
Aurelien Tellier (2022) Inference of genome-wide processes using temporal population genomic data. Peer Community in Evolutionary Biology, 100158. https://doi.org/10.24072/pci.evolbiol.100158
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.
Evaluation round #2
DOI or URL of the preprint: https://doi.org/10.1101/2021.03.12.435133
Version of the preprint: 4
Author's Reply, 09 Nov 2022
Decision by Aurelien Tellier, posted 10 Nov 2022, validated 17 Oct 2022
The three reviewers and myself acknowledge the efforts you have made in preparing this revision, which satisfactorily addresses most critical points and comments. The reviewers have a few last minor comments for you to incorporate before I can proceed with the recommendation. I anticipate that this can be done relatively quickly. I expect to evaluate your reply to these comments myself.
Best regards and sorry for the delay in obtaining all reviews.
Reviewed by anonymous reviewer, 14 Sep 2022
Reviewed by Lawrence Uricchio, 07 Oct 2022
Reviewed by anonymous reviewer, 13 Oct 2022
Evaluation round #1
DOI or URL of the preprint: https://doi.org/10.1101/2021.03.12.435133
Version of the preprint: 2
Author's Reply, 07 Sep 2022
Decision by Aurelien Tellier, posted 17 Jan 2022
I am sorry for the delay in the evaluation of your manuscript. I had trouble finding reviewers at first; I then found three very enthusiastic reviewers, but the end-of-year holidays delayed their reviews. Your manuscript is very interesting and I hope you will consider submitting a revised version.
You will find that all reviewers are very enthusiastic about the manuscript and very supportive (see attached reviews). The reviewers thoroughly evaluated your manuscript and suggest a number of minor changes to improve the clarity of the methods and results. These should be easily doable.
In addition, two reviewers suggest potential additional simulations to test for the effect of heterogeneous recombination, heterogeneity of deleterious mutations in genes, and demography on the accuracy of your estimations. While I am aware that it would require a substantial amount of work to do all these tests, I would recommend testing by simulation the effect of heterogeneous recombination on the accuracy of your method. The other potential biases can be dealt with in the discussion (you will also find some useful suggestions to improve the discussion and add some additional references).
I look forward to receiving your revised version and will make sure the review process is faster in the next round.
Many thanks for submitting to PCI,