

06 Sep 2022

Masculinization of the X-chromosome in aphid soma and gonads

Sex-biased gene expression is not tissue-specific in Pea Aphids

Recommended by and based on reviews by Ann Kathrin Huylmans and 1 anonymous reviewer

Sexual antagonism (SA), wherein the fitness interests of the sexes do not align, is inherent to organisms with two (or more) sexes. SA leads to intra-locus sexual conflict, in which an allele that confers higher fitness in one sex reduces fitness in the other [1, 2]. This situation leads to what has been referred to as "gender load", resulting from the segregation of SA alleles in the population. Gender load can be reduced by the evolution of sex-specific (or sex-biased) gene expression. A specific prediction is that gene duplication can lead to sub- or neo-functionalization, in which case the two duplicates partition the ancestral function between the sexes. The conditions for invasion by an SA allele differ between sex chromosomes and autosomes, leading to the prediction that (in XY or XO systems) the X should accumulate recessive male-favored alleles and dominant female-favored alleles; similar considerations apply in ZW systems [3, but see 4].
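The contrast between the X and the autosomes can be made concrete with the standard weak-selection invasion conditions. The sketch below is illustrative only. It assumes weak selection, a rare allele, an even sex ratio and no drift. Under those assumptions, females carry two-thirds of all X copies and, while the allele is rare, express it as heterozygotes (scaled by its dominance h). Hemizygous males carry the remaining third and express it fully. The function names and parameter values are hypothetical.

```python
"""Approximate invasion criteria for a rare sexually antagonistic allele.

Weak-selection sketch with hypothetical helper names: a positive value means
the rare allele tends to increase in frequency. Effects are signed fitness
changes (positive = beneficial) in homozygous females / hemizygous males.
"""


def x_linked_rate(s_female: float, h_female: float, s_male: float) -> float:
    """Rare X-linked allele; XY or X0 males are hemizygous.

    Females carry 2/3 of X copies and express the allele as heterozygotes
    (dominance h_female); males carry 1/3 and express it fully.
    """
    return (2.0 / 3.0) * h_female * s_female + (1.0 / 3.0) * s_male


def autosomal_rate(s_female: float, h_female: float,
                   s_male: float, h_male: float) -> float:
    """Rare autosomal allele: half of its copies in each sex, all heterozygous."""
    return 0.5 * (h_female * s_female + h_male * s_male)


if __name__ == "__main__":
    # Recessive (h = 0.1) male-favored allele: +5% in males, -10% in females.
    print(x_linked_rate(-0.10, 0.1, 0.05), autosomal_rate(-0.10, 0.1, 0.05, 0.1))
    # Dominant (h = 0.9) female-favored allele: +5% in females, -8% in males.
    print(x_linked_rate(0.05, 0.9, -0.08), autosomal_rate(0.05, 0.9, -0.08, 0.9))
```

With these made-up values, both the recessive male-favored allele and the dominant female-favored allele can invade only when X-linked, in line with the prediction above. Fry [4] explains why such predictions should be treated with caution.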

Aphids present an interesting special case, for several reasons: they have XO sex determination and three distinct reproductive morphs (sexual females, parthenogenetic females, and males). Previous theoretical work by the lead author predicted that the X should be optimized for male function, a prediction borne out by whole-animal transcriptome analysis [5].

Here [6], the authors extend that work to investigate “tissue”-specific (heads, legs and gonads) and sex-specific gene expression. They argue that, if intra-locus SA is the primary driver of sex-biased gene expression, masculinization of the X should be found in all tissues. As an alternative, they consider the possibility that sex-biased gene expression could also be driven by dosage compensation. They cite references supporting their argument that "dosage compensation (could be) stronger in the brain", although the underlying motivation for that argument appears to be based on empirical evidence rather than theoretical predictions.

At any rate, the results are clear: all tissues investigated show masculinization of the X.  Further, X-linked copies of gene duplicates were more frequently male-biased than duplicated autosomal genes or X-linked single-copy genes.

To sum up, this is a nice empirical study with clearly interpretable (and interpreted) results, the most obvious of which is the greater sex-biased expression in sexually dimorphic tissues. Unfortunately, as the authors emphasize, there is no general theory by which SA, variable dosage compensation, and meiotic sex chromosome inactivation can be integrated in a predictive framework. It is to be hoped that empirical studies such as this one will motivate deeper and more general theoretical investigations.

References

[1] Rice WR, Chippindale AK (2001) Intersexual ontogenetic conflict. Journal of Evolutionary Biology 14: 685-693. https://doi.org/10.1046/j.1420-9101.2001.00319.x

[2] Bonduriansky R, Chenoweth SF (2009) Intralocus sexual conflict. Trends Ecol Evol 24: 280-288. https://doi.org/10.1016/j.tree.2008.12.005

[3] Rice WR (1984) Sex chromosomes and the evolution of sexual dimorphism. Evolution 38: 735-742. https://doi.org/10.1086/595754

[4] Fry JD (2010) The genomic location of sexually antagonistic variation: some cautionary comments. Evolution 64: 1510-1516. https://doi.org/10.1111%2Fj.1558-5646.2009.00898.x

[5] Jaquiéry J, Rispe C, Roze D, Legeai F, Le Trionnaire G, Stoeckel S, et al. (2013) Masculinization of the X Chromosome in the Pea Aphid. PLoS Genetics 9. https://doi.org/10.1371/journal.pgen.1003690

[6] Jaquiéry J, Simon J-C, Robin S, Richard G, Peccoud J, Boulain H, Legeai F, Tanguy S, Prunier-Leterme N, Le Trionnaire G (2022) Masculinization of the X-chromosome in aphid soma and gonads. bioRxiv, 2021.08.13.453080, ver. 4 peer-reviewed and recommended by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2021.08.13.453080 

Title: Masculinization of the X-chromosome in aphid soma and gonads
Authors: Julie Jaquiery, Jean-Christophe Simon, Stephanie Robin, Gautier Richard, Jean Peccoud, Helene Boulain, Fabrice Legeai, Sylvie Tanguy, Nathalie Prunier-Leterme, Gael Letrionnaire
Abstract (excerpt): Males and females share essentially the same genome but differ in their optimal values for many phenotypic traits, which can result in intra-locus conflict between the sexes. Aphids display XX/X0 sex chromosomes and combine unusual X chromosome...
Thematic fields: Genetic conflicts, Genome Evolution, Reproduction and Sex
Recommender: Charles Baer
Submission date: 2021-08-16 08:56:08
12 Nov 2020

Limits and Convergence properties of the Sequentially Markovian Coalescent

Review and Assessment of Performance of Genomic Inference Methods based on the Sequentially Markovian Coalescent

Recommended by Stephan Schiffels based on reviews by 3 anonymous reviewers

The human genome not only encodes biological functions and what makes us human; it also encodes the population history of our ancestors. Changes in past population sizes, for example, affect the distribution of times to the most recent common ancestor (tMRCA) of genomic segments, which in turn can be inferred by sophisticated modelling along the genome.
A key framework for such modelling of local tMRCA tracts along genomes is the Sequentially Markovian Coalescent (SMC) (McVean and Cardin 2005, Marjoram and Wall 2006). The problem that the SMC solves is that the mosaic of local tMRCAs along the genome is unknown, both in their actual ages and in their positions along the genome. The SMC makes it possible to sum efficiently across all possible mosaics and to handle this uncertainty probabilistically. Several important tools for inferring the demographic history of a population have been built on top of the SMC, including PSMC (Li and Durbin 2011), diCal (Sheehan et al. 2013), MSMC (Schiffels and Durbin 2014), SMC++ (Terhorst et al. 2017), eSMC (Sellinger et al. 2020) and others.
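Summing over the unknown mosaic of local tMRCAs is what the forward algorithm of a hidden Markov model does. The toy below is only a sketch of that idea. It is not PSMC or any published SMC method.

- The hidden state is a discretized pairwise tMRCA.
- Observations are homozygous (0) or heterozygous (1) sites.
- The transition rule is a deliberately simplified stand-in for the SMC transition kernel: recombine with a probability that grows with branch length, then redraw the tMRCA from the prior.

θ and ρ are hypothetical per-site mutation and recombination rates in the same time units as t. The function names are illustrative.

```python
import numpy as np


def discretize_tmrca(k: int):
    """Split the pairwise coalescence time (Exp(1), coalescent units) into k
    intervals of equal prior mass; return representative times and priors."""
    left = -np.log(1.0 - np.arange(k) / k)
    right = np.append(left[1:], np.inf)
    prior = np.exp(-left) - np.exp(-right)
    # Midpoint for finite intervals; left edge + 1 (memoryless mean) for the last.
    t = np.where(np.isfinite(right), 0.5 * (left + right), left + 1.0)
    return t, prior


def forward_loglik(obs, t, prior, theta: float, rho: float) -> float:
    """Log-likelihood of a 0/1 site sequence, summed over all hidden tMRCA
    paths with the scaled forward algorithm."""
    k = len(t)
    # Toy transition (NOT the exact SMC kernel): recombination with probability
    # 1 - exp(-2*rho*t), after which a new tMRCA is drawn from the prior.
    p_rec = 1.0 - np.exp(-2.0 * rho * t)
    trans = (1.0 - p_rec)[:, None] * np.eye(k) + p_rec[:, None] * prior[None, :]
    # Emission: heterozygous if >= 1 mutation on total branch length 2t.
    p_het = 1.0 - np.exp(-2.0 * theta * t)
    emit = np.vstack([1.0 - p_het, p_het])  # rows: homozygous, heterozygous
    alpha = prior * emit[obs[0]]
    loglik = 0.0
    for i in range(len(obs)):
        if i > 0:
            alpha = (alpha @ trans) * emit[obs[i]]
        scale = alpha.sum()          # rescale to avoid numerical underflow
        loglik += np.log(scale)
        alpha = alpha / scale
    return float(loglik)


if __name__ == "__main__":
    t, prior = discretize_tmrca(8)
    obs = np.array([0, 0, 1, 0, 0, 0, 1, 1, 0, 0])
    for theta in (0.01, 0.1, 0.5):
        print(theta, forward_loglik(obs, t, prior, theta, rho=0.05))
```

Maximizing such a likelihood over a piecewise-constant population-size history is, broadly, what SMC-based tools do with far more careful transition kernels.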
In this paper, Sellinger, Abu Awad and Tellier (2020) review these SMC-based methods and provide a coherent simulation design to comparatively assess their strengths and weaknesses in a variety of demographic scenarios. In addition, they use these simulations to test how violating key assumptions of SMC methods, such as a constant recombination rate or the absence of false-positive SNP calls, affects the estimates.
As a result of this assessment, the authors provide not only practical guidance for researchers who want to use these methods, but also insights into how these methods work. For example, the paper carefully separates sources of error by measuring what the authors call the “Best-case convergence” of each method, obtained when the data behave perfectly, and distinguishing it from how the method performs on actual data. This approach provides deeper insight into the methods than we could gain from applying them to genomic data alone.
In the age of genomics, computational tools and their development are key for researchers in this field. It is therefore all the more important to provide the community with overviews, reviews and independent assessments of such tools. This is particularly important because the testing of new methods sometimes lacks visibility, as relevant material is relegated to the supplementary sections of papers owing to space constraints. As SMC-based methods have become such widely used tools in genomics, I think the detailed assessment by Sellinger et al. (2020) is timely and relevant.
In conclusion, I recommend this paper because it goes beyond a mere review of the different methods to provide an in-depth assessment of their performance. It thereby addresses both beginners in the field who seek an initial overview and experienced researchers who are interested in the theoretical boundaries and assumptions of the different methods.

References

[1] Li, H., and Durbin, R. (2011). Inference of human population history from individual whole-genome sequences. Nature, 475(7357), 493-496. doi: https://doi.org/10.1038/nature10231
[2] Marjoram, P., and Wall, J. D. (2006). Fast "coalescent" simulation. BMC genetics, 7(1), 16. doi: https://doi.org/10.1186/1471-2156-7-16
[3] McVean, G. A., and Cardin, N. J. (2005). Approximating the coalescent with recombination. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1459), 1387-1393. doi: https://doi.org/10.1098/rstb.2005.1673
[4] Schiffels, S., and Durbin, R. (2014). Inferring human population size and separation history from multiple genome sequences. Nature genetics, 46(8), 919-925. doi: https://doi.org/10.1038/ng.3015
[5] Sellinger, T. P. P., Abu Awad, D., Moest, M., and Tellier, A. (2020). Inference of past demography, dormancy and self-fertilization rates from whole genome sequence data. PLoS Genetics, 16(4), e1008698. doi: https://doi.org/10.1371/journal.pgen.1008698
[6] Sellinger, T. P. P., Abu Awad, D. and Tellier, A. (2020) Limits and Convergence properties of the Sequentially Markovian Coalescent. bioRxiv, 2020.07.23.217091, ver. 3 peer-reviewed and recommended by PCI Evolutionary Biology. doi: https://doi.org/10.1101/2020.07.23.217091
[7] Sheehan, S., Harris, K., and Song, Y. S. (2013). Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics, 194(3), 647-662. doi: https://doi.org/10.1534/genetics.112.149096
[8] Terhorst, J., Kamm, J. A., and Song, Y. S. (2017). Robust and scalable inference of population history from hundreds of unphased whole genomes. Nature genetics, 49(2), 303-309. doi: https://doi.org/10.1038/ng.3748

Title: Limits and Convergence properties of the Sequentially Markovian Coalescent
Authors: Thibaut Sellinger, Diala Abu Awad, Aurélien Tellier
Abstract (excerpt): Many methods based on the Sequentially Markovian Coalescent (SMC) have been and are being developed. These methods make use of genome sequence data to uncover population demographic history. More recently, new methods have extended the original...
Thematic fields: Population Genetics / Genomics
Recommender: Stephan Schiffels
Reviewers: Anonymous
Submission date: 2020-07-25 10:54:48
06 Oct 2017

Evolutionary analysis of candidate non-coding elements regulating neurodevelopmental genes in vertebrates

Combining molecular information on chromatin organisation with eQTLs and evolutionary conservation provides strong candidates for the evolution of gene regulation in mammalian brains

Recommended by Marc Robinson-Rechavi based on reviews by Marc Robinson-Rechavi and Charles Danko

In this manuscript [1], Francisco J. Novo proposes candidate non-coding genomic elements regulating neurodevelopmental genes.

What is very nice about this study is the way in which public molecular data, including physical interaction data, are used to leverage recent advances in our understanding of the molecular mechanisms of gene regulation in an evolutionary context. More specifically, evolutionarily conserved non-coding sequences are combined with enhancers from the FANTOM5 project, DNase hypersensitive sites, chromatin segmentation, ChIP-seq of transcription factors and of p300, gene expression and eQTLs from GTEx, and physical interactions from several Hi-C datasets. The candidate regulatory regions thus identified are linked to candidate target genes, and the author shows their potential involvement in brain development.
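At its core, this kind of integration intersects genomic intervals from independent datasets. The sketch below is minimal and illustrative, and uses hypothetical coordinates and feature names. A real analysis would typically rely on dedicated interval tools such as BEDTools and on the actual FANTOM5, GTEx and Hi-C files. The sketch keeps conserved elements that overlap at least one feature in every supporting track and reports which features they overlap.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open, as in BED files)
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals on the same chromosome overlap if they share >= 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(candidates, *tracks):
    """Keep candidates overlapping at least one feature in every track and
    return {candidate name: [overlapping feature names, one list per track]}."""
    result = {}
    for c in candidates:
        hits = [[f.name for f in track if overlaps(c, f)] for track in tracks]
        if all(hits):
            result[c.name] = hits
    return result


# Hypothetical data: two conserved elements, one enhancer, one Hi-C anchor
# whose other end is assumed to contact the promoter of a gene "GENE_X".
cnes = [Interval("chr2", 1000, 1200, "CNE_a"), Interval("chr2", 5000, 5100, "CNE_b")]
enhancers = [Interval("chr2", 1150, 1400, "enh_1")]
hic_anchors = [Interval("chr2", 900, 2000, "anchor_to_GENE_X")]
print(annotate(cnes, enhancers, hic_anchors))
# -> {'CNE_a': [['enh_1'], ['anchor_to_GENE_X']]}
```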

While the results focus on a small number of genes, this makes it possible to examine the features of these candidates in great detail. This study shows how functional genomics is increasingly allowing us to fulfill the promises of Evo-Devo: understanding the molecular mechanisms underlying conservation and differences in morphology.

References

[1] Novo, FJ. 2017. Evolutionary analysis of candidate non-coding elements regulating neurodevelopmental genes in vertebrates. bioRxiv, 150482, ver. 4 of Sept 29th, 2017. doi: 10.1101/150482

Title: Evolutionary analysis of candidate non-coding elements regulating neurodevelopmental genes in vertebrates
Authors: Francisco J. Novo
Abstract (excerpt): Many non-coding regulatory elements conserved in vertebrates regulate the expression of genes involved in development and play an important role in the evolution of morphology through the rewiring of developmental gene networks. Available biolo...
Thematic fields: Genome Evolution
Recommender: Marc Robinson-Rechavi
Reviewers: Marc Robinson-Rechavi, Charles Danko
Submission date: 2017-06-29 08:55:41
12 Feb 2024

How do plant RNA viruses overcome the negative effect of Muller's ratchet despite strong transmission bottlenecks?

How to survive the mutational meltdown: lessons from plant RNA viruses

Recommended by Kavita Jain based on reviews by Brent Allman, Ana Morales-Arce and 1 anonymous reviewer

Although most mutations are deleterious, strongly deleterious ones do not spread in a very large population, as their chance of fixation is very small. Deleterious mutations can also be eliminated through recombination, i.e. sexual reproduction. In a finite asexual population, however, the subpopulation carrying no deleterious mutations will eventually be lost, as its members acquire new deleterious mutations or fail to reproduce by chance. Without recombination (and barring rare back mutations), this mutation-free class cannot be regenerated. The resulting decline in fitness can reduce the population size or, in other words, increase genetic drift. This, in turn, leads the population to accumulate deleterious mutations at a faster rate, eventually causing a mutational meltdown.
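A classical quantitative handle on this process is the mutation-selection balance result usually attributed to Haigh (1978). It assumes an asexual population of size N, a genomic deleterious mutation rate U and a multiplicative cost s per mutation. At equilibrium, the number of mutations per genome is then Poisson distributed with mean U/s. The expected size of the mutation-free class is therefore

$$
n_0 = N\,e^{-U/s}
$$

For example, with N = 10^4, U = 0.1 and s = 0.01, n_0 = 10^4 e^{-10} ≈ 0.45, so the mutation-free class is easily lost and the ratchet turns quickly. With s = 0.05 instead, n_0 = 10^4 e^{-2} ≈ 1353. A transmission bottleneck that shrinks N shrinks n_0 proportionally.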

This irreversible (or at least long-lasting) accumulation of deleterious mutations is especially relevant to RNA viruses because of their high mutation rates. While prior work has dealt with bacteriophages and RNA viruses, the study by Lafforgue et al. [1] makes an interesting contribution to the existing literature by focusing on plant viruses.

In this study, the authors ask how RNA viruses manage to survive in plants despite repeated increases in the strength of genetic drift. Through a series of experiments and numerical simulations, the authors find that, as expected, the fitness of the population decreases significantly after severe bottlenecks. However, if the bottlenecks are followed by population expansion, Muller's ratchet can be halted owing to the genetic diversity generated during population growth. They propose this as a potential mechanism by which RNA viruses can escape mutational meltdown.
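Whether expansion after a bottleneck slows the ratchet can be explored with a toy serial-passage model. This is a hypothetical sketch, not the experimental design or simulation model of Lafforgue et al. [1]. It includes only deleterious mutations, with multiplicative costs and no back or compensatory mutations. As a result, the minimum load can never decrease, and a halted ratchet shows up as a minimum load that stays flat across passages. Setting `growth` to 1.0 gives a bottleneck without expansion, for comparison. All parameter names and values are illustrative, and no particular outcome is claimed here.

```python
import numpy as np


def simulate_passages(n_passages: int, bottleneck: int, generations: int,
                      growth: float, u: float, s: float,
                      cap: int = 10_000, seed: int = 0) -> list[int]:
    """Toy serial-passage model of an asexual virus population (hypothetical).

    Each passage samples `bottleneck` genomes at random (transmission), then
    runs `generations` rounds of fitness-weighted replication, multiplying the
    population size by `growth` (>= 1) up to `cap`. Each offspring gains a
    Poisson(u) number of new deleterious mutations, each with cost s.
    Returns the minimum mutation load after each passage.
    """
    if growth < 1.0 or cap < bottleneck or not 0.0 <= s < 1.0:
        raise ValueError("need growth >= 1, cap >= bottleneck and 0 <= s < 1")
    rng = np.random.default_rng(seed)
    pop = np.zeros(bottleneck, dtype=np.int64)  # mutations carried per genome
    min_load = []
    for _ in range(n_passages):
        # Transmission bottleneck: random sampling, no selection.
        pop = rng.choice(pop, size=bottleneck, replace=False)
        for _ in range(generations):
            size = min(int(pop.size * growth), cap)
            w = (1.0 - s) ** pop                 # multiplicative fitness
            parents = rng.choice(pop, size=size, p=w / w.sum())
            pop = parents + rng.poisson(u, size=size)
        min_load.append(int(pop.min()))
    return min_load


if __name__ == "__main__":
    # Same bottleneck and number of replication rounds, without vs with expansion.
    print(simulate_passages(20, bottleneck=5, generations=10, growth=1.0, u=0.1, s=0.05))
    print(simulate_passages(20, bottleneck=5, generations=10, growth=2.0, u=0.1, s=0.05))
```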

As a theoretician, I find this investigation quite interesting and would like to see more studies addressing, e.g., the minimum population growth rate required to avoid extinction for a given bottleneck size and deleterious mutation rate. It would also be interesting to see whether future work can test the hypothesis of this article in natural populations.

References

[1] Guillaume Lafforgue, Marie Lefebvre, Thierry Michon, Santiago F. Elena (2024) How do plant RNA viruses overcome the negative effect of Muller's ratchet despite strong transmission bottlenecks? bioRxiv, ver. 3 peer-reviewed and recommended by Peer Community In Evolutionary Biology. https://doi.org/10.1101/2023.08.01.550272

Title: How do plant RNA viruses overcome the negative effect of Muller's ratchet despite strong transmission bottlenecks?
Authors: Guillaume Lafforgue, Marie Lefebvre, Thierry Michon, Santiago F. Elena
Abstract (excerpt): Muller's ratchet refers to the irreversible accumulation of deleterious mutations in small populations, resulting in a decline in overall fitness. This phenomenon has been extensively observed in experiments involving microorganisms, including ...
Thematic fields: Experimental Evolution, Genome Evolution
Recommender: Kavita Jain
Submission date: 2023-08-04 09:37:08
05 Jun 2018

The dynamics of preferential host switching: host phylogeny as a key predictor of parasite prevalence and distribution

Shift or stick? Untangling the signatures of biased host switching, and host-parasite co-speciation

Recommended by Lucy Weinert based on reviews by Damien de Vienne and Nathan Medd

Many emerging diseases arise by parasites switching to new host species, while other parasites seem to remain with the same host lineage for very long periods of time, even over timescales where an ancestral host species splits into two or more new species. Understanding these dynamics is therefore an important part of understanding infectious disease.

Experiments are clearly important for understanding these processes, but so are comparative studies, investigating the variation that we find in nature. Such comparative data do show strong signs of non-randomness, and this suggests that the epidemiological and ecological processes might be predictable, at least in part. For example, when we map patterns of parasite presence/absence onto host phylogenies, we often find that certain host clades harbour many more parasites than expected, or that closely-related hosts harbour closely-related parasites. Nevertheless, it remains difficult to interpret these patterns to make inferences about ecological and epidemiological processes. This is partly because non-random associations can arise in multiple ways. For example, parasites might be inherited from the common ancestor of related hosts, or might switch to new hosts, but preferentially establish on novel hosts that are closely related to their existing host. Infection might also influence the shape of the host phylogeny, either by increasing the rate of host extinction or, conversely, increasing the rate of speciation (as with manipulative symbionts that might induce reproductive isolation).
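Preferential switching to closely related hosts can be sketched as a distance-dependent choice of a new host. This is a hypothetical illustration. The exponential decay with phylogenetic distance and the parameter `beta` are assumptions, not necessarily the functional form used by Engelstädter and Fortuna. A value of `beta` = 0 means no preference.

```python
import numpy as np


def host_shift_probs(dist, source: int, beta: float) -> np.ndarray:
    """Probability that a host shift from `source` lands on each other host.

    Assumes (hypothetically) that establishment success decays as
    exp(-beta * d) with phylogenetic distance d between hosts.
    """
    w = np.exp(-beta * np.asarray(dist, dtype=float)[source])  # new array
    w[source] = 0.0  # a shift must go to a different host species
    return w / w.sum()


# Hypothetical ultrametric tree ((A,B),(C,D)): sisters 2 units apart, others 6.
dist = np.array([[0, 2, 6, 6],
                 [2, 0, 6, 6],
                 [6, 6, 0, 2],
                 [6, 6, 2, 0]])
print(host_shift_probs(dist, source=0, beta=0.0))  # uniform over B, C and D
print(host_shift_probs(dist, source=0, beta=1.0))  # strongly favours sister B
```

Iterating such shifts along a host phylogeny that itself grows by speciation and extinction is the kind of combined process the model discussed here sets out to study.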

These various processes have, by and large, been studied in isolation, but the model introduced by Engelstädter and Fortuna [1] makes an important first step towards studying them together. Without such combined analyses, we cannot tell whether the processes leave their own unique signatures, or whether the same sort of non-randomness can arise in multiple ways.

A major finding of the work is that the size of a host clade can be an important determinant of its overall infection level. This had been shown in previous work assuming a fixed host phylogeny, but the current paper shows that it also holds when host extinction and speciation take place at rates comparable to that of host shifting. This finding calls into question the natural assumption that a highly parasite-ridden clade of host species must have some genetic or ecological characteristic that makes its members particularly prone to infection. It suggests instead that clade size, rather than any characteristic of the clade members, might be the important factor. It will be interesting to see whether this prediction about clade size is borne out by comparative studies.

Another feature of the study is that the framework is naturally extendable to include further processes, such as the influence of parasite presence on extinction or speciation rates. No doubt extensions of this kind will form the basis of important future work.

References

[1] Engelstädter J and Fortuna NZ. 2018. The dynamics of preferential host switching: host phylogeny as a key predictor of parasite prevalence and distribution. bioRxiv 209254 ver. 5 peer-reviewed by Peer Community In Evolutionary Biology. doi: 10.1101/209254

Title: The dynamics of preferential host switching: host phylogeny as a key predictor of parasite prevalence and distribution
Authors: Jan Engelstaedter & Nicole Fortuna
Abstract (excerpt): New parasites commonly arise through host-shifts, where parasites from one host species jump to and become established in a new host species. There is much evidence that the probability of host-shifts decreases with increasing phylogenetic dist...
Thematic fields: Bioinformatics & Computational Biology, Evolutionary Epidemiology, Evolutionary Theory, Macroevolution, Phylogenetics / Phylogenomics, Species interactions
Recommender: Lucy Weinert
Submission date: 2017-10-30 02:06:06
06 Jun 2019

Multi-model inference of non-random mating from an information theoretic approach

Tell me who you mate with, I’ll tell you what’s going on

Recommended by and based on reviews by Alexandre Courtiol and 2 anonymous reviewers

The study of sexual selection goes back to Darwin himself. Since then, elaborate theories concerning both intra- and inter-sexual selection have been developed, and elegant experiments have been designed to test this body of theory. It may thus come as a surprise that the community is still debating the correct way to measure simple components of sexual selection, such as the Bateman gradient (i.e., the slope of the regression of the number of offspring on the number of mates)[1,2], or to quantify complex behaviours such as mate choice (the non-random choice of individuals with particular characters as mates)[3,4] and their consequences.
One difficulty in the study of sexual selection is evaluating the consequences of non-random mating. Indeed, when non-random mating is observed in a population, it is often difficult to establish whether such a mating pattern leads to i) sexual selection per se (selection pressures favouring certain phenotypes), and/or ii) the non-random association of parental genes in their offspring. These two processes differ. In particular, assortative (and disassortative) mating can shape genetic covariances without changing gene frequencies in the population. The distinction matters because the two processes lead to different evolutionary outcomes, which can have large ripple effects on the evolution of sexual behaviours, sexual ornamentation, and speciation.
In his paper, entitled “Multi-model inference of non-random mating from an information theoretic approach” [5], Carvajal-Rodríguez tackles this issue. The author develops a simple model in which the consequences of non-random mating can be inferred from information on population frequencies before and after mating. The procedure is as follows. From the initial population frequencies of phenotypes (or genotypes) of both sexes, the model predicts the frequencies after mating under each assumed mating pattern. Each mating pattern thus yields different predicted phenotypic (or genotypic) frequencies after mating. The mating pattern giving the best fit to the observed frequencies is then identified via a model selection procedure (model averaging to combine different mating patterns is also possible).
This study builds on a framework introduced by Carvajal-Rodríguez’s colleagues [6] and encompasses later methodological developments involving the author himself [7]. Compared to earlier work, the new method builds on the relationship between mating pattern and information [8] to distinguish among scenarios in which non-random mating arises from different underlying processes, using simple model selection criteria such as the AICc.
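The model-selection step can be illustrated with a deliberately simplified sketch. This is not InfoMating and not the author's set of models. Observed mated-pair counts are compared with three hypothetical models:

- random mating, with pair frequencies given by the product of pre-mating female and male phenotype frequencies;
- a one-parameter assortative model that up-weights homotypic pairs by exp(a);
- a saturated model.

The models are ranked by AICc = AIC + 2k(k+1)/(n - k - 1). The grid search for `a` is a crude stand-in for a proper optimizer.

```python
import numpy as np


def loglik(counts: np.ndarray, probs: np.ndarray) -> float:
    """Multinomial log-likelihood of pair counts (constant term dropped)."""
    mask = counts > 0  # empty cells contribute nothing (avoids 0 * log 0)
    return float(np.sum(counts[mask] * np.log(probs[mask])))


def aicc(ll: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike information criterion."""
    return -2.0 * ll + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def compare_mating_models(counts, female_freq, male_freq) -> dict:
    """AICc of three hypothetical mating models.

    counts[i, j]: observed pairs of female phenotype i with male phenotype j;
    female_freq, male_freq: phenotype frequencies before mating (sum to 1).
    """
    counts = np.asarray(counts, dtype=float)
    n = int(counts.sum())
    base = np.outer(female_freq, male_freq)   # random-mating expectation
    same = np.eye(len(female_freq))           # marks homotypic pairs

    def assortative(a: float) -> np.ndarray:
        p = base * np.exp(a * same)
        return p / p.sum()

    grid = np.linspace(-5.0, 5.0, 4001)       # crude 1-D maximum likelihood
    a_hat = max(grid, key=lambda a: loglik(counts, assortative(a)))
    return {
        "random": aicc(loglik(counts, base), 0, n),
        "assortative": aicc(loglik(counts, assortative(a_hat)), 1, n),
        "saturated": aicc(loglik(counts, counts / n), counts.size - 1, n),
    }


observed = np.array([[40, 10], [10, 40]])     # made-up mated-pair counts
scores = compare_mating_models(observed, [0.5, 0.5], [0.5, 0.5])
print(min(scores, key=scores.get))            # -> assortative
```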
The great asset of the proposed method is that it can be applied to the study of natural populations in which the study of mate choice and sexual selection is notoriously difficult. In the manuscript, the procedure is tested on a population of marine gastropods (Littorina saxatilis). This allows the reader to grasp how the method can be applied to a real system. In fact, anyone can try out the method thanks to the freely available software InfoMating programmed by the author. One important assumption underlying the current method is that the frequencies of unmated individuals do not change during the mating season. If this is not the case, the reader may refer to another publication of the same author which relaxes this assumption [9]. These papers are both instrumental for empiricists interested in testing sexual selection theory.

References

[1] Bateman, A. J. (1948). Intra-sexual selection in Drosophila. Heredity, 2(3), 349-368. doi: 10.1038/hdy.1948.21
[2] Jones, A. G. (2009). On the opportunity for sexual selection, the Bateman gradient and the maximum intensity of sexual selection. Evolution: International Journal of Organic Evolution, 63(7), 1673-1684. doi: 10.1111/j.1558-5646.2009.00664.x
[3] Andersson, M., & Simmons, L. W. (2006). Sexual selection and mate choice. Trends in ecology & evolution, 21(6), 296-302. doi: 10.1016/j.tree.2006.03.015
[4] Kuijper, B., Pen, I., & Weissing, F. J. (2012). A guide to sexual selection theory. Annual Review of Ecology, Evolution, and Systematics, 43, 287-311. doi: 10.1146/annurev-ecolsys-110411-160245
[5] Carvajal-Rodríguez, A. (2019). Multi-model inference of non-random mating from an information theoretic approach. bioRxiv, 305730, ver. 5 peer-reviewed and recommended by PCI Evolutionary Biology. doi: 10.1101/305730
[6] Rolán‐Alvarez, E., & Caballero, A. (2000). Estimating sexual selection and sexual isolation effects from mating frequencies. Evolution, 54(1), 30-36. doi: 10.1111/j.0014-3820.2000.tb00004.x
[7] Carvajal-Rodríguez, A., & Rolan-Alvarez, E. (2006). JMATING: a software for the analysis of sexual selection and sexual isolation effects from mating frequency data. BMC Evolutionary Biology, 6(1), 40. doi: 10.1186/1471-2148-6-40
[8] Carvajal-Rodríguez, A. (2018). Non-random mating and information theory. Theoretical population biology, 120, 103-113. doi: 10.1016/j.tpb.2018.01.003
[9] Carvajal-Rodríguez, A. (2019). A generalization of the informational view of non-random mating: Models with variable population frequencies. Theoretical population biology, 125, 67-74. doi: 10.1016/j.tpb.2018.12.004

Title: Multi-model inference of non-random mating from an information theoretic approach
Authors: Antonio Carvajal-Rodríguez
Abstract (excerpt): Non-random mating has a significant impact on the evolution of organisms. Here, I developed a modelling framework for discrete traits (with any number of phenotypes) to explore different models connecting the non-random mating causes (mate comp...
Thematic fields: Evolutionary Ecology, Evolutionary Theory, Sexual Selection
Recommender: Sara Magalhaes
Submission date: 2019-02-08 19:24:03
13 Sep 2019

Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A.

New curation method for microsatellite markers improves population genetics analyses

Recommended by Aurélien Tellier based on reviews by Eric Petit, Martin Husemann and 2 anonymous reviewers

Genetic markers are used in modern population genetics/genomics to uncover the past neutral and selective history of populations and species. Besides single nucleotide polymorphisms (SNPs) obtained from whole-genome data, microsatellites (short tandem repeats, STRs, also called simple sequence repeats, SSRs) have been common markers of choice in numerous population genetics studies of non-model species with large sample sizes [1]. Microsatellites can be used to infer past population demography (e.g. expansion, decline, bottlenecks…), population splits, population structure and gene flow, but also life history traits and modes of reproduction (e.g. [2,3]). These markers are widely used in conservation genetics [4] and to study parasites or disease vectors [5]. Microsatellites have a higher mutation rate than SNPs. On the one hand, this increases the statistical power to infer recent events (for example crop domestication [2,3]). On the other hand, it decreases their statistical power over longer time scales due to homoplasy [6].
To perform such analyses, however, high-quality and reliable data are required. As emphasized in the article by De Meeûs et al. [7], three main issues bias the observed heterozygosity at microsatellites: null alleles, short allele dominance (SAD) and stuttering. These originate from poor PCR amplification. As a result, an excess of homozygotes is observed at the affected microsatellite loci, leading to overestimation of the F-statistics FIS and FST as well as increased linkage disequilibrium (LD). For null alleles, several methods and software packages help to reduce the bias. In the present study, De Meeûs et al. [7] propose a way to tackle SAD and stuttering.
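The inflation of FIS by apparent homozygotes follows directly from FIS = 1 - Ho/He. The sketch below is minimal and uses made-up genotypes. It uses the simple, uncorrected expected heterozygosity He = 1 - Σ p_i², whereas published analyses typically use Weir and Cockerham's estimators. It mimics SAD by rescoring some heterozygotes as homozygous for the shorter allele.

```python
from collections import Counter


def f_is(genotypes) -> float:
    """Single-locus F_IS = 1 - Ho/He from (allele, allele) genotype tuples.

    Uses He = 1 - sum(p_i^2) without sample-size correction; undefined for a
    monomorphic locus (He = 0).
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for g in genotypes for allele in g)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return 1.0 - h_obs / h_exp


# Made-up locus in Hardy-Weinberg proportions (allele sizes in bp).
true = [(100, 104)] * 50 + [(100, 100)] * 25 + [(104, 104)] * 25
# SAD mimic: 20 heterozygotes scored as homozygous for the shorter allele.
scored = [(100, 100)] * 20 + [(100, 104)] * 30 + [(100, 100)] * 25 + [(104, 104)] * 25
print(f_is(true), f_is(scored))  # 0.0 versus about 0.375
```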
The authors study a dataset consisting of 387 samples from 61 subsamples of Ixodes scapularis, a tick that transmits Lyme disease, genotyped at nine loci. Using correlation-based methods together with FST and FIS, they detect null alleles and SAD. Stuttering is detected by evaluating the heterozygote deficit between alleles that differ by a single repeat. Before correction, six loci are affected by one of these amplification problems. Together they generate a large heterozygote deficit (significant FIS and FST) that remains significant after correction for the false discovery rate (FDR). These results would classically be interpreted as a strong Wahlund effect and/or selection at several loci.
After correcting for null alleles, the authors apply two novel corrections: 1) a re-examination of the chromatograms reveals previously disregarded larger alleles, thus reducing SAD, and 2) alleles close in size are pooled, reducing stuttering. The corrected dataset then shows a significant excess of heterozygotes, as could be expected in a dioecious species with strong population structure. After FDR correction, the significant excess of homozygotes and the LD between pairs of loci disappear. FST on the cured dataset is used to demonstrate the strong population structure and small effective subpopulation sizes. This is confirmed by a clustering analysis using discriminant analysis of principal components (DAPC).
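Pooling alleles close in size amounts to recoding allele sizes into classes before recomputing F-statistics. The exact pooling rule used by De Meeûs et al. [7] is not given here. The rule below is a hypothetical illustration: after sorting, any allele exactly one motif length above the previous one joins that allele's class, and each class is labelled by its smallest size. Note that this rule pools whole chains of adjacent alleles, which may be coarser than intended for a given locus.

```python
def pool_stutter_alleles(sizes, motif: int) -> dict:
    """Map each allele size to a pooled class label (hypothetical rule).

    Alleles one motif length apart are chained into the same class, and each
    class is labelled by its smallest allele size.
    """
    mapping = {}
    label = prev = None
    for size in sorted(set(sizes)):
        if prev is None or size - prev != motif:
            label = size  # start a new class
        mapping[size] = label
        prev = size
    return mapping


def recode(genotypes, mapping):
    """Apply the pooled allele classes to (allele, allele) genotypes."""
    return [(mapping[a], mapping[b]) for a, b in genotypes]


# Dinucleotide locus with made-up allele sizes (bp).
classes = pool_stutter_alleles([104, 100, 102, 110, 112, 100], motif=2)
print(classes)  # -> {100: 100, 102: 100, 104: 100, 110: 110, 112: 110}
print(recode([(100, 102), (104, 110)], classes))  # -> [(100, 100), (100, 110)]
```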
Although the study is based on a specific dataset of ticks from populations sampled across the USA, the authors present a general approach in Figure 6: a step-by-step flowchart to cure microsatellite datasets of null alleles, SAD and stuttering. Several criteria based on FIS, FST and LD between loci are used as decision keys in the flowchart. An Excel file is also provided to help with the curation steps. This study and the proposed methodology are thus extremely useful for all population geneticists working on non-model species with large numbers of samples genotyped at microsatellite markers. The method not only allows more accurate estimates of heterozygosity but also prevents the thinning of datasets through the removal of problematic loci. As a follow-up and extension of this work, an exhaustive simulation study could investigate the influence of these data quality issues on past demographic and population structure inference under a wide range of scenarios. This would make it possible to quantify the current biases in the literature and the robustness of the methodology devised by De Meeûs et al. [7].

References

[1] Jarne, P., and Lagoda, P. J. (1996). Microsatellites, from molecules to populations and back. Trends in ecology & evolution, 11(10), 424-429. doi: 10.1016/0169-5347(96)10049-5
[2] Cornille, A., Giraud, T., Bellard, C., Tellier, A., Le Cam, B., Smulders, M. J. M., Kleinschmit, J., Roldan-Ruiz, I. and Gladieux, P. (2013). Postglacial recolonization history of the European crabapple (Malus sylvestris Mill.), a wild contributor to the domesticated apple. Molecular Ecology, 22(8), 2249-2263. doi: 10.1111/mec.12231
[3] Parat, F., Schwertfirm, G., Rudolph, U., Miedaner, T., Korzun, V., Bauer, E., Schön C.-C. and Tellier, A. (2016). Geography and end use drive the diversification of worldwide winter rye populations. Molecular ecology, 25(2), 500-514. doi: 10.1111/mec.13495
[4] Broquet, T., Ménard, N., & Petit, E. (2007). Noninvasive population genetics: a review of sample source, diet, fragment length and microsatellite motif effects on amplification success and genotyping error rates. Conservation Genetics, 8(1), 249-260. doi: 10.1007/s10592-006-9146-5
[5] Koffi, M., De Meeûs, T., Séré, M., Bucheton, B., Simo, G., Njiokou, F., Salim, B., Kaboré, J., MacLeod, A., Camara, M., Solano, P., Belem, A. M. G. and Jamonneau, V. (2015). Population genetics and reproductive strategies of African trypanosomes: revisiting available published data. PLoS neglected tropical diseases, 9(10), e0003985. doi: 10.1371/journal.pntd.0003985
[6] Estoup, A., Jarne, P., & Cornuet, J. M. (2002). Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Molecular ecology, 11(9), 1591-1604. doi: 10.1046/j.1365-294X.2002.01576.x
[7] De Meeûs, T., Chan, C. T., Ludwig, J. M., Tsao, J. I., Patel, J., Bhagatwala, J., and Beati, L. (2019). Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the USA. bioRxiv, 622373, ver. 4 peer-reviewed and recommended by Peer Community In Evolutionary Biology. doi: 10.1101/622373

Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A.
Thierry De Meeûs, Cynthia T. Chan, John M. Ludwig, Jean I. Tsao, Jaymin Patel, Jigar Bhagatwala, and Lorenza Beati
Null alleles, short allele dominance (SAD), and stuttering increase the perceived relative inbreeding of individuals and subpopulations as measured by Wright’s FIS and FST. Ascertainment bias, due to such amplifying problems are usually caused ...
Thematic fields: Evolutionary Ecology, Other, Population Genetics / Genomics
Recommender: Aurelien Tellier
Submission date: 2019-05-02 20:52:08
11 Dec 2020

Quantifying transmission dynamics of acute hepatitis C virus infections in a heterogeneous population using sequence data

Phylodynamics of hepatitis C virus reveals transmission dynamics within and between risk groups in Lyon

Recommended by David Rasmussen based on reviews by Chris Wymant and Louis DuPlessis

Genomic epidemiology seeks to better understand the transmission dynamics of infectious pathogens using molecular sequence data. Phylodynamic methods have given genomic epidemiology new power to track the transmission dynamics of pathogens by combining phylogenetic analyses with epidemiological modeling. In recent years, applications of phylodynamics to chronic viral infections such as HIV and hepatitis C virus (HCV) have provided some of the best examples of how phylodynamic inference can yield valuable insights into transmission dynamics within and between different subpopulations or risk groups, allowing for more targeted interventions.
However, conducting phylodynamic inference under complex epidemiological models comes with many challenges. In some cases, likelihood-based inference is not straightforward or even possible. Structured SIR-type models, in which infected individuals can belong to different subpopulations, provide a classic example. Such a model is both nonlinear and has a high-dimensional state space, because it tracks different types of hosts. Computing the likelihood of a phylogeny under such a model requires complex numerical integration or data augmentation methods [1]. In these situations, Approximate Bayesian Computation (ABC) provides an attractive alternative: Bayesian inference can be performed without computing likelihoods, provided that data can be simulated efficiently under the model and compared against empirical observations [2].
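The core idea of ABC can be illustrated with its simplest variant, rejection sampling. Draw a parameter from the prior, simulate data under the model, and keep the draw only if a summary statistic of the simulated data falls within a tolerance of the observed one. The Python sketch below uses a toy model of Poisson-distributed case counts with an unknown rate instead of a phylodynamic model. The model, the Uniform(0, 10) prior, the sample-mean summary statistic and the tolerance are all assumptions chosen for illustration. This sketch does not reproduce the ABC scheme used by Danesh et al.

```python
import math
import random
from typing import List, Sequence


def simulate_counts(rate: float, n: int, rng: random.Random) -> List[int]:
    """Toy simulator: n Poisson(rate) draws using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    counts = []
    for _ in range(n):
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


def abc_rejection(observed: Sequence[int], n_draws: int, tolerance: float,
                  rng: random.Random) -> List[float]:
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    obs_summary = sum(observed) / len(observed)      # summary statistic: sample mean
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 10.0)                # assumed Uniform(0, 10) prior
        simulated = simulate_counts(rate, len(observed), rng)
        if abs(sum(simulated) / len(simulated) - obs_summary) <= tolerance:
            accepted.append(rate)                    # sample from the approximate posterior
    return accepted


rng = random.Random(1)
observed = simulate_counts(4.0, 50, rng)             # stands in for empirical data
posterior = abc_rejection(observed, n_draws=5000, tolerance=0.2, rng=rng)
print(len(posterior), round(sum(posterior) / len(posterior), 2))
```

No likelihood is ever evaluated here; only forward simulation is needed. This property makes the approach attractive for models whose phylogenetic likelihood is intractable. Practical applications use more informative summary statistics and refinements such as regression adjustment.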
Previous work has shown how ABC approaches can be applied to fit epidemiological models to phylogenies [3,4]. Danesh et al. [5] further demonstrate the real-world merits of ABC by fitting a structured SIR model to HCV data from Lyon, France. Using this model, they infer viral transmission dynamics between “classical” hosts (typically injecting drug users) and “new” hosts (typically young men who have sex with men, MSM). They show that a recent increase in HCV incidence in Lyon is due to considerably higher transmission rates among “new” hosts. This study provides another great example of how phylodynamic analysis can help epidemiologists understand transmission patterns within and between different risk groups. It also illustrates the merits of expanding our toolkit of statistical methods for phylodynamic inference.
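To make the notion of a structured SIR model concrete, here is a minimal deterministic sketch in Python with two host groups. Each pair of groups has its own transmission rate, and the force of infection on a group sums the contributions of infected hosts from every group. The parameter values, the frequency-dependent form of transmission and the forward-Euler integration are assumptions for illustration. This is not the model fitted by Danesh et al.

```python
from typing import List, Sequence, Tuple


def structured_sir(beta: Sequence[Sequence[float]], gamma: float,
                   s0: Sequence[float], i0: Sequence[float], n_pop: Sequence[float],
                   dt: float, steps: int
                   ) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Forward-Euler integration of a k-group SIR model.

    beta[g][h] is the rate at which infected hosts of group h infect
    susceptible hosts of group g (frequency-dependent transmission).
    """
    k = len(n_pop)
    s, i = list(s0), list(i0)
    r = [n_pop[g] - s0[g] - i0[g] for g in range(k)]
    incidence = []  # new infections in each group at each time step
    for _ in range(steps):
        # Force of infection on group g sums contributions from every group h
        new_inf = [s[g] * sum(beta[g][h] * i[h] / n_pop[h] for h in range(k)) * dt
                   for g in range(k)]
        recovered = [gamma * i[g] * dt for g in range(k)]
        for g in range(k):
            s[g] -= new_inf[g]
            i[g] += new_inf[g] - recovered[g]
            r[g] += recovered[g]
        incidence.append(new_inf)
    return s, i, r, incidence


# Hypothetical values: group 0 = "classical" hosts, group 1 = "new" hosts
beta = [[0.15, 0.01],
        [0.01, 0.60]]
s, i, r, inc = structured_sir(beta, gamma=0.2, s0=[990, 990], i0=[10, 10],
                              n_pop=[1000, 1000], dt=0.1, steps=1000)
cumulative = [sum(step[g] for step in inc) for g in range(2)]
print([round(c) for c in cumulative])
```

Even this two-group version has six coupled state variables. This illustrates why the likelihood of a phylogeny under such models quickly becomes hard to compute, while simulating from them remains cheap.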

References

[1] Rasmussen, D. A., Volz, E. M., and Koelle, K. (2014). Phylodynamic inference for structured epidemiological models. PLoS Comput Biol, 10(4), e1003570. doi: https://doi.org/10.1371/journal.pcbi.1003570
[2] Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025-2035.
[3] Ratmann, O., Donker, G., Meijer, A., Fraser, C., and Koelle, K. (2012). Phylodynamic inference and model assessment with approximate bayesian computation: influenza as a case study. PLoS Comput Biol, 8(12), e1002835. doi: https://doi.org/10.1371/journal.pcbi.1002835
[4] Saulnier, E., Gascuel, O., and Alizon, S. (2017). Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study. PLoS computational biology, 13(3), e1005416. doi: https://doi.org/10.1371/journal.pcbi.1005416
[5] Danesh, G., Virlogeux, V., Ramière, C., Charre, C., Cotte, L. and Alizon, S. (2020) Quantifying transmission dynamics of acute hepatitis C virus infections in a heterogeneous population using sequence data. bioRxiv, 689158, ver. 5 peer-reviewed and recommended by PCI Evol Biol. doi: https://doi.org/10.1101/689158

Quantifying transmission dynamics of acute hepatitis C virus infections in a heterogeneous population using sequence data
Gonche Danesh, Victor Virlogeux, Christophe Ramière, Caroline Charre, Laurent Cotte, Samuel Alizon
Opioid substitution and syringes exchange programs have drastically reduced hepatitis C virus (HCV) spread in France but HCV sexual transmission in men having sex with men (MSM) has recently arisen as a significant public health concern. The fa...
Thematic fields: Evolutionary Epidemiology, Phylogenetics / Phylogenomics
Recommender: David Rasmussen
Submission date: 2019-07-11 13:37:23
06 Apr 2021

How robust are cross-population signatures of polygenic adaptation in humans?

Be careful when studying selection based on polygenic score overdispersion

Recommended by Torsten Günther based on reviews by Lawrence Uricchio, Mashaal Sohail, Barbara Bitarello and 1 anonymous reviewer

The advent of genome-wide association studies (GWAS) has held great promise for our understanding of the connection between genotype and phenotype. Today, the NHGRI-EBI GWAS Catalog contains 251,401 associations from 4,961 studies (1). This wealth of studies has also generated interest in using the summary statistics beyond the few top hits. One aim is to make predictions for individuals without known phenotypes, e.g. by computing polygenic risk scores. Another is to study polygenic selection by comparing different groups. For instance, polygenic selection acting on height, the most studied polygenic trait, has been the subject of multiple studies during the past decade (e.g. 2–6). These studies detected north-south gradients in Europe that were consistent with expectations. However, their GWAS summary statistics were based on the GIANT consortium data set, a meta-analysis of GWAS conducted in different European cohorts (7,8). The availability of large data sets with less stratification, such as the UK Biobank (9), has led to a re-evaluation of those results. It became clear that the nature of the GIANT consortium data set posed a potential problem for studies of polygenic adaptation. This led several of the authors of the original articles to caution against interpretations of polygenic selection on height (10,11). It was a great example of how the scientific community critically assessed its own earlier results as more data became available. At the same time, it left the question of whether there is detectable polygenic selection separating populations more open than ever.

More generally, recent years have seen several articles critically assessing the portability of GWAS results and risk score predictions to other populations (12–14). Refoyo-Martínez et al. (15) now present a systematic assessment of the robustness of cross-population signatures of polygenic adaptation in humans. They compiled GWAS results for complex traits that have been studied in more than one cohort. They then used allele frequencies from the 1000 Genomes Project data set (16) to detect signals of polygenic score overdispersion. Because the source of the allele frequencies is kept the same across all tests, differences between the signals must be caused by the underlying GWAS. The results are concerning, as the level of overdispersion largely depends on the choice of GWAS cohort. Cohorts with homogeneous ancestries show little to no overdispersion compared to cohorts of mixed ancestries, such as meta-analyses. It appears that the meta-analyses fail to fully account for stratification in their data sets.
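The quantity at stake is the population-level polygenic score Z_m = Σ_l 2 β_l p_{m,l}, where β_l is the GWAS effect size of SNP l and p_{m,l} is the frequency of the effect allele in population m. The Python sketch below computes these scores and a deliberately simplified overdispersion check. It compares the variance of the scores across populations with a null distribution obtained by randomly flipping the signs of the effect sizes. This is a hypothetical illustration. It ignores the population covariance structure that formal tests such as the Qx statistic (3) account for, and it is not the pipeline used by Refoyo-Martínez et al. The function names, the sign-flip null and the simulated input data are assumptions.

```python
import random
import statistics
from typing import List, Sequence


def polygenic_scores(freqs: Sequence[Sequence[float]], betas: Sequence[float]) -> List[float]:
    """Population-level score Z_m = sum_l 2 * beta_l * p_ml (one row of freqs per population)."""
    return [sum(2.0 * b * p for b, p in zip(betas, row)) for row in freqs]


def sign_flip_pvalue(freqs: Sequence[Sequence[float]], betas: Sequence[float],
                     n_perm: int = 1000, seed: int = 0) -> float:
    """Share of sign-randomized effect vectors giving a score variance >= the observed one."""
    rng = random.Random(seed)
    observed = statistics.pvariance(polygenic_scores(freqs, betas))
    hits = 0
    for _ in range(n_perm):
        # Keep |beta| but randomize its sign, breaking any link between
        # effect direction and between-population frequency differences
        flipped = [b if rng.random() < 0.5 else -b for b in betas]
        if statistics.pvariance(polygenic_scores(freqs, flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 3 populations x 50 SNPs in which trait-increasing alleles
# are systematically more frequent along a gradient
rng = random.Random(42)
betas = [rng.choice([-1, 1]) * rng.uniform(0.01, 0.05) for _ in range(50)]
freqs = [[min(max(0.5 + 0.1 * d * (1 if b > 0 else -1) + rng.gauss(0, 0.02), 0.0), 1.0)
          for b in betas] for d in (-1, 0, 1)]
print([round(z, 3) for z in polygenic_scores(freqs, betas)])
print(sign_flip_pvalue(freqs, betas))
```

A small p-value only says that the scores are overdispersed relative to this null. As the recommendation stresses, residual stratification in the GWAS can align effect estimates with frequency gradients and produce exactly the same pattern as selection.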

The authors based most of their analyses on height, a heavily studied trait. Additionally, they used educational attainment (measured as an individual’s number of years of schooling) as an example. This choice was motivated by the potential for over- or misinterpretation of results by the media, the general public and far-right hate groups. Such traits are potentially confounded by unaccounted-for cultural and socio-economic factors. Showing that previous results about polygenic selection on educational attainment are not robust is an important finding that needs to be communicated well. This is a valuable lesson for everyone working in human genomics. We need to be aware that our results can sometimes be misinterpreted. We also need to make an effort to write our papers and communicate our results in a way that is honest about the limitations of our research and that prevents the misuse of our results by hate groups.

This article represents an important contribution to the field. It is crucial to be aware of potential methodological biases and technical artifacts. Future studies of polygenic adaptation need to be cautious in their interpretation of polygenic score overdispersion. One recommendation would be to use GWAS results obtained in homogeneous cohorts. But even if different biobank-scale cohorts of homogeneous ancestry are employed, some risk of unaccounted-for stratification will always remain. These conclusions may seem sobering, but they are part of the scientific process. We need additional controls, as well as methods other than polygenic score overdispersion, for assessing polygenic selection. Last year also saw the presentation of a novel approach using sequence data and GWAS summary statistics to detect directional selection on a polygenic trait (17). This new method appears to be robust to bias stemming from stratification in the GWAS cohort, as well as to other confounding factors. Such developments show light at the end of the tunnel for the use of GWAS summary statistics in the study of polygenic adaptation.

References

1. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Research. 2019 Jan 8;47(D1):D1005–12. doi: https://doi.org/10.1093/nar/gky1120

2. Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics. 2012 Sep;44(9):1015–9. doi: https://doi.org/10.1038/ng.2368

3. Berg JJ, Coop G. A Population Genetic Signal of Polygenic Adaptation. PLOS Genetics. 2014 Aug 7;10(8):e1004412. doi: https://doi.org/10.1371/journal.pgen.1004412

4. Robinson MR, Hemani G, Medina-Gomez C, Mezzavilla M, Esko T, Shakhbazov K, et al. Population genetic differentiation of height and body mass index across Europe. Nature Genetics. 2015 Nov;47(11):1357–62. doi: https://doi.org/10.1038/ng.3401

5. Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015 Dec;528(7583):499–503. doi: https://doi.org/10.1038/nature16152

6. Racimo F, Berg JJ, Pickrell JK. Detecting polygenic adaptation in admixture graphs. Genetics. 2018 Apr;208(4):1565–84. doi: https://doi.org/10.1534/genetics.117.300489

7. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010 Oct;467(7317):832–8. doi: https://doi.org/10.1038/nature09410

8. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014 Nov;46(11):1173–86. doi: https://doi.org/10.1038/ng.3097

9. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018 Oct;562(7726):203–9. doi: https://doi.org/10.1038/s41586-018-0579-z

10. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019 Mar 21;8:e39725. doi: https://doi.org/10.7554/eLife.39725

11. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019 Mar 21;8:e39702. doi: https://doi.org/10.7554/eLife.39702

12. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019 Apr;51(4):584–91. doi: https://doi.org/10.1038/s41588-019-0379-x

13. Bitarello BD, Mathieson I. Polygenic Scores for Height in Admixed Populations. G3: Genes, Genomes, Genetics. 2020 Nov 1;10(11):4027–36. doi: https://doi.org/10.1534/g3.120.401658

14. Uricchio LH, Kitano HC, Gusev A, Zaitlen NA. An evolutionary compass for detecting signals of polygenic selection and mutational bias. Evolution Letters. 2019;3(1):69–79. doi: https://doi.org/10.1002/evl3.97

15. Refoyo-Martínez A, Liu S, Jørgensen AM, Jin X, Albrechtsen A, Martin AR, Racimo F. How robust are cross-population signatures of polygenic adaptation in humans? bioRxiv, 2021, 2020.07.13.200030, version 5 peer-reviewed and recommended by Peer community in Evolutionary Biology. doi: https://doi.org/10.1101/2020.07.13.200030

16. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015 Sep 30;526(7571):68–74. doi: https://doi.org/10.1038/nature15393

17. Stern AJ, Speidel L, Zaitlen NA, Nielsen R. Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies. bioRxiv. 2020 May 8;2020.05.07.083402. doi: https://doi.org/10.1101/2020.05.07.083402

How robust are cross-population signatures of polygenic adaptation in humans?
Alba Refoyo-Martínez, Siyang Liu, Anja Moltke Jørgensen, Xin Jin, Anders Albrechtsen, Alicia R. Martin, Fernando Racimo
Over the past decade, summary statistics from genome-wide association studies (GWASs) have been used to detect and quantify polygenic adaptation in humans. Several studies have reported signatures of natural selection at sets of SNPs associated...
Thematic fields: Bioinformatics & Computational Biology, Genetic conflicts, Human Evolution, Population Genetics / Genomics
Recommender: Torsten Günther
Submission date: 2020-08-14 15:06:54
04 Sep 2019

The discernible and hidden effects of clonality on the genotypic and genetic states of populations: improving our estimation of clonal rates

How to estimate clonality from genetic data: use large samples and consider the biology of the species

Recommended by Myriam Heuertz based on reviews by David Macaya-Sanz, Marcela Van Loo and 1 anonymous reviewer

Population geneticists frequently use the genetic and genotypic information of a population sample of individuals to make inferences about the reproductive system of a species. The detection of clones, i.e. individuals with the same genotype, can indicate whether clonal (vegetative) reproduction occurs in the species. If clonality is detected, population geneticists typically estimate the rate of clonality c from genotypic richness R. Here R is the number of distinct genotypes relative to the sample size, and c can be defined as the proportion of reproductive events that are clonal. Estimating the rate of clonality from genotypic richness is, however, problematic because, to date, this relationship has been characterized neither analytically nor by simulation. Furthermore, the effect of sampling on this relationship has never been critically examined.
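In the clonality literature, genotypic richness is commonly computed as R = (G - 1)/(N - 1), where G is the number of distinct multilocus genotypes in a sample of N individuals. Under this form, R = 0 when all sampled individuals share one genotype and R = 1 when all are distinct. The Python sketch below assumes this (G - 1)/(N - 1) form. It also uses a hypothetical encoding of each multilocus genotype as a tuple of per-locus allele pairs.

```python
from typing import Hashable, Sequence


def genotypic_richness(sample: Sequence[Hashable]) -> float:
    """R = (G - 1) / (N - 1): G distinct multilocus genotypes among N sampled individuals."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two individuals")
    g = len(set(sample))  # identical multilocus genotypes are treated as one clone
    return (g - 1) / (n - 1)


# Hypothetical sample of 6 individuals genotyped at 2 loci: 3 distinct genotypes
sample = ([((100, 102), (210, 214))] * 3
          + [((100, 100), (210, 210))] * 2
          + [((102, 104), (214, 214))])
print(genotypic_richness(sample))  # 0.4
```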
The paper by Stoeckel, Porro and Arnaud-Haond [1] contributes significantly to the characterization of the relationship between the rate of clonality and genetic and genotypic parameters in a population. The authors use an extensive individual-based simulation approach to assess the effects of the rate of clonality (fully sexual, fully clonal and a range of intermediate levels, i.e. partial clonality) on genetic and genotypic parameters. Their simulations consider variable population size, sample size, and numbers of generations elapsed since population initiation. Based on these simulations, they derive empirical formulae that, for the first time, link the rate of clonality to genotypic richness and to the size distribution of clones (genotypic parameters), as well as to the population inbreeding coefficient and to a metric of linkage disequilibrium (genetic parameters). They then use the simulated data to assess the accuracy of their predictions. In a second phase, the authors use a Bayesian supervised learning algorithm to estimate rates of clonality from the simulated data.
The authors show that the relationship between the rate of clonality and genotypic richness is not linear: genotypic richness decreases slowly with increasing clonality, and a large drop is only seen for rates of clonality ≥ 0.90. Genetic parameters are only sensitive to high rates of clonality. In practice, genotypic and genetic parameters can therefore complement each other for the estimation of rates of clonality. Genotypic parameters are most useful across most of the range of clonality values, and genetic parameters complement them meaningfully at higher values. The most important practical result of the paper is the demonstration of sampling bias in the estimation of genotypic richness. Sample sizes commonly used in population genetics studies (n ≤ 50) lead to a large overestimation of genotypic richness. This in turn leads to a severe underestimation of the rate of clonality in most systems, irrespective of whether they have reached stationary equilibrium. These effects are attenuated only in small populations.
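The direction of the sampling bias can be illustrated without any population-genetic machinery. Take a fixed population with a skewed clone-size distribution (a few large clones plus many unique genotypes). Then compare R computed on the whole population with the average R of random subsamples. The clone-size distribution, population size, sample sizes and replicate numbers below are arbitrary assumptions. R is again taken as (G - 1)/(N - 1). This toy Python calculation only shows the direction of the bias and does not reproduce the individual-based simulations of Stoeckel et al.

```python
import random
from typing import Hashable, List, Sequence


def genotypic_richness(sample: Sequence[Hashable]) -> float:
    """R = (G - 1) / (N - 1) for N individuals carrying G distinct genotypes."""
    return (len(set(sample)) - 1) / (len(sample) - 1)


def make_population(n_large: int, clone_size: int, n_unique: int) -> List[int]:
    """Individuals labeled by genotype ID: a few large clones plus unique genotypes."""
    pop = [c for c in range(n_large) for _ in range(clone_size)]
    pop.extend(range(n_large, n_large + n_unique))
    return pop


def mean_sample_richness(pop: Sequence[int], n: int, reps: int, rng: random.Random) -> float:
    """Average R over `reps` random samples of size n drawn without replacement."""
    return sum(genotypic_richness(rng.sample(pop, n)) for _ in range(reps)) / reps


rng = random.Random(7)
pop = make_population(n_large=5, clone_size=150, n_unique=250)  # N = 1000, G = 255
r_pop = genotypic_richness(pop)
r_n30 = mean_sample_richness(pop, 30, 500, rng)
r_n200 = mean_sample_richness(pop, 200, 500, rng)
print(round(r_pop, 3), round(r_n30, 3), round(r_n200, 3))
```

Small samples rarely draw the same large clone twice relative to their size, so almost every sampled individual looks distinct. As a result, R from a sample of 30 is well above the population value, and it approaches that value as the sample grows.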
Biologists interested in estimating rates of clonality will find this paper highly useful for designing their sampling and for choosing meaningful statistics for inference. This paper also calls for a careful reappraisal of previously published works that infer rates of clonality from genetic data. It further highlights the prime importance of complementary information on the life history of the species for a correct understanding of partial clonality.

References

[1] Stoeckel, S., Porro, B., and Arnaud-Haond, S. (2019). The discernible and hidden effects of clonality on the genotypic and genetic states of populations: improving our estimation of clonal rates. ArXiv:1902.09365 [q-Bio] v4 peer-reviewed and recommended by Peer Community in Evolutionary Biology. Retrieved from http://arxiv.org/abs/1902.09365v4

The discernible and hidden effects of clonality on the genotypic and genetic states of populations: improving our estimation of clonal rates
Solenn Stoeckel, Barbara Porro, Sophie Arnaud-Haond
Partial clonality is widespread across the tree of life, but most population genetics models are conceived for exclusively clonal or sexual organisms. This gap hampers our understanding of the influence of clonality on evolutionary trajectories...
Thematic fields: Population Genetics / Genomics, Reproduction and Sex
Recommender: Myriam Heuertz
Submission date: 2019-02-28 10:10:56