RAMOS-ONSINS Sebastian Ernesto
- Statistical and Population Genomics, Centre for Research in Agricultural Genomics (CRAG) Consorci CSIC-IRTA-UAB-UB, Bellaterra (Barcelona), Spain
- Adaptation, Bioinformatics & Computational Biology, Genome Evolution, Molecular Evolution, Population Genetics / Genomics
Modularity of genes involved in local adaptation to climate despite physical linkage
Differential effect of genes in diverse environments, their role in local adaptation and the interference between genes that are physically linked
The genome of eukaryotic species is a complex structure that experiences many different interactions, both internally and with the surrounding environment. The genetic architecture of a phenotype (that is, the set of genetic elements affecting a trait of the organism) is fundamental to understanding how a species adapts to, for example, different climatic environments, or to its interactions with other species. It is therefore essential to study the different aspects of a species' genetic architecture and their relationship with the surrounding environment. Aspects such as modularity (the number of genetic units and the degree to which each unit affects a trait of the organism), pleiotropy (the number of different effects that a genetic unit can have on an organism) and linkage (the degree of association between different genetic units) are key to understanding the genetic architecture and to interpreting the effects of selection on the genome. Indeed, knowledge of these aspects of the genetic architecture could clarify whether genes are affected by multiple aspects of the environment or, on the contrary, only by specific ones [1,2].
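The text does not say how the association between genetic units is quantified. One standard statistic for two biallelic loci is the linkage-disequilibrium r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], with D = p_AB - p_A p_B. A minimal Python sketch, assuming phased two-locus haplotypes coded as strings such as "AB" or "ab" (the function name and coding are hypothetical, chosen only for illustration):

```python
from collections import Counter

def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic loci.

    Hypothetical input format: each haplotype is a two-character string,
    first character = allele at locus 1 ("A"/"a"), second = locus 2 ("B"/"b").
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_ab = counts["AB"] / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # r^2 undefined if a locus is fixed

# Allele A always travels with B: complete association.
print(ld_r2(["AB", "AB", "ab", "ab"]))  # 1.0
# All four haplotypes equally frequent: no association.
print(ld_r2(["AB", "Ab", "aB", "ab"]))  # 0.0
```

Note that r² measures statistical association only; physical proximity on a chromosome is one cause of it, but not the only one.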
The work performed by Lotterhos et al. [3] sought to understand the genetic architecture of adaptation to different environments in lodgepole pine (Pinus contorta). As candidates, the authors considered SNPs previously detected because of their extreme patterns of association with different environmental variables or their extreme population differentiation. This choice is very important, because the study is only meaningful if the markers analysed are under selection. Otherwise, the genetic architecture of adaptation to different environments would be masked by other (neutral) associations that would be difficult to interpret [4,5]. To understand the relationship between genetic architecture and adaptation, it is relevant to detect the networks of association between the candidate SNPs and climate variables (a way to measure modularity) and to determine whether these SNPs (and loci) are affected by single or multiple environmental factors (a way to measure pleiotropy).
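A toy illustration of how pleiotropy might be scored in this framework: correlate each candidate SNP's allele frequencies across populations with each climate variable, and count how many variables pass a cut-off. This is a minimal sketch under stated assumptions (Pearson correlation, a hypothetical |r| ≥ 0.8 threshold, invented data and names), not the procedure used by Lotterhos et al.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pleiotropy_counts(freqs, env, threshold=0.8):
    """For each SNP, list the environmental variables it is strongly
    associated with (|r| >= threshold; the cut-off is hypothetical)."""
    return {
        snp: [var for var, e in env.items() if abs(pearson(f, e)) >= threshold]
        for snp, f in freqs.items()
    }

# Hypothetical data: values in five populations.
env = {
    "temperature": [5.0, 8.0, 11.0, 14.0, 17.0],
    "precipitation": [300.0, 900.0, 500.0, 1100.0, 700.0],
}
freqs = {
    "snp1": [0.10, 0.25, 0.40, 0.55, 0.70],  # tracks temperature
    "snp2": [0.20, 0.80, 0.40, 0.90, 0.50],  # tracks precipitation
    "snp3": [0.25, 0.50, 0.44, 0.69, 0.63],  # tracks both
}
for snp, hits in pleiotropy_counts(freqs, env).items():
    print(snp, len(hits), hits)
```

Here snp1 and snp2 each pass the cut-off for one variable and snp3 for two. The two invented climate variables are themselves correlated (r = 0.5), so a simple count like this can overstate pleiotropy when environmental variables covary.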
The authors used co-association networks, an innovative approach in this field, to analyse the interaction between environmental information and the genetic polymorphism of each individual. This methodology is more appropriate than other multivariate methods, such as analyses based on principal components, because it makes it possible to cluster SNPs according to their associations with similar environmental variables. In this way, co-association networks allowed the authors both to study genetic and physical linkage between different co-association modules and to compare two different models of evolution: a Modular environmental response architecture (specific genes are affected by specific aspects of the environment) and a Universal pleiotropic environmental response architecture (all genes are affected by all aspects of the environment). The plots of correlations between allele frequencies and environmental factors (named galaxy biplots) are especially informative for understanding the effect of the different clusters on specific aspects of the environment (for example, the ‘Aridity’ co-association network shows strong associations with hot/wet versus cold/dry environments).
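The core idea of clustering SNPs by the similarity of their environmental association profiles can be illustrated with a deliberately simplified stand-in: link two SNPs when their vectors of correlations with the environmental variables are close, and treat the connected components of that network as modules. The Euclidean distance, the `max_dist` cut-off, the function name and the data are all assumptions for illustration; this is not the clustering procedure of Lotterhos et al.

```python
from math import dist  # Python 3.8+

def coassociation_modules(profiles, max_dist=0.5):
    """Group SNPs whose environmental association profiles are similar.

    profiles: dict mapping SNP name -> tuple of correlations with each
    environmental variable. Two SNPs are linked when the Euclidean distance
    between their profiles is <= max_dist (a hypothetical cut-off); modules
    are the connected components of the resulting network.
    """
    names = list(profiles)
    neighbours = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(profiles[a], profiles[b]) <= max_dist:
                neighbours[a].add(b)
                neighbours[b].add(a)
    modules, seen = [], set()
    for start in names:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one connected component
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules

# Hypothetical correlations with (temperature, precipitation).
profiles = {
    "snpA": (0.90, 0.10),
    "snpB": (0.80, 0.05),
    "snpC": (0.10, -0.85),
    "snpD": (0.05, -0.90),
    "snpE": (0.60, 0.60),
}
print(coassociation_modules(profiles))
# [['snpA', 'snpB'], ['snpC', 'snpD'], ['snpE']]
```

Each (temperature, precipitation) pair can also be read as a point in a two-variable scatter of correlations, loosely in the spirit of the galaxy biplots: snpA and snpB sit along the temperature axis, snpC and snpD along the precipitation axis.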
Although the analysis performed by Lotterhos et al. [3] has some unavoidable limitations (e.g., only extreme candidate SNPs are selected, which limits the results to the strongest effects, and the genetic and physical maps of this species are incomplete), it yields relevant results and introduces new methodologies to the field. Among them: the preponderance of a Modular environmental response architecture (evolution in separate modules); the detection of physical linkage among SNPs that are co-associated with different aspects of the environment (which was not expected a priori); and the implementation of co-association networks and galaxy biplots to visualize the effects of modularity and pleiotropy on different aspects of the environment. Finally, this work contains remarkable introductory figures and tables that explain the main concepts of the study unambiguously. It can serve as a starting point for many future studies in the field.
[1] Hancock AM, Brachi B, Faure N, Horton MW, Jarymowycz LB, Sperone FG, Toomajian C, Roux F & Bergelson J. 2011. Adaptation to climate across the Arabidopsis thaliana genome. Science 334: 83–86. doi: 10.1126/science.1209244
[2] Wagner GP & Zhang J. 2011. The pleiotropic structure of the genotype-phenotype map: the evolvability of complex organisms. Nature Reviews Genetics 12: 204–213. doi: 10.1038/nrg2949
[3] Lotterhos KE, Yeaman S, Degner J, Aitken S & Hodgins K. 2018. Modularity of genes involved in local adaptation to climate despite physical linkage. bioRxiv 202481, ver. 4 peer-reviewed by Peer Community In Evolutionary Biology. doi: 10.1101/202481
[4] Lotterhos KE & Whitlock MC. 2014. Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Molecular Ecology 23: 2178–2192. doi: 10.1111/mec.12725
[5] Lotterhos KE & Whitlock MC. 2015. The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Molecular Ecology 24: 1031–1046. doi: 10.1111/mec.13100
[6] Paaby AB & Rockman MV. 2013. The many faces of pleiotropy. Trends in Genetics 29: 66–73. doi: 10.1016/j.tig.2012.10.010
A novel workflow to improve multi-locus genotyping of wildlife species: an experimental set-up with a known model system
Improving the reliability of genotyping of multigene families in non-model organisms
The reliability of published scientific papers has been the topic of much recent discussion, notably in the biomedical sciences [1]. Although small sample size is regularly blamed as one of the culprits, big data can also be a concern. The advent of high-throughput sequencing, and the processing of sequence data by opaque bioinformatics workflows, mean that sequences with often high error rates are produced, and that exact but slow analyses are not feasible.
The troubles with bioinformatics arise from the increasing complexity of the tools used by scientists, and from a lack of incentives and/or skills on the part of authors (but also reviewers and editors) to ensure the quality of those tools. As a much-discussed example, a bug in the widely used PLINK software [2] has been identified as the explanation [3] for an incorrect inference of selection for increased height in European human populations [4].
High-throughput sequencing often generates high rates of genotyping errors, so the development of bioinformatics tools to assess and correct data quality is a major issue. The work of Gillingham et al. [5] contributes to this goal. In this work, the authors propose a new bioinformatics workflow (ACACIA) for genotyping multigene complexes, such as self-incompatibility genes in plants, major histocompatibility complex (MHC) genes in vertebrates, and homeobox genes in animals, which are particularly challenging to genotype in non-model organisms. PCR amplification and sequencing of multigene families generate artefacts, and hence spurious alleles. A key feature of Gillingham et al.'s method is that candidate genes are called with Oligotyping, a software pipeline originally conceived for identifying variants in microbiome 16S rRNA amplicons [6]. Compared with previous workflows, this reduces both the number of false positives and the number of dropout alleles.
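As described by Eren et al., Oligotyping locates alignment positions with high Shannon entropy and groups reads by the bases they carry at those positions. The sketch below illustrates that idea together with a simple abundance filter that discards rare, likely spurious variants. It is a simplified, hypothetical stand-in, not code from ACACIA or from the Oligotyping pipeline; the function names, both thresholds and the toy reads are invented.

```python
from collections import Counter
from math import log2

def shannon_entropy(column):
    """Shannon entropy (bits) of the nucleotides observed at one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def oligotype(reads, min_entropy=0.5, min_count=2):
    """Group aligned, equal-length reads by their bases at high-entropy positions.

    Positions with entropy >= min_entropy are treated as informative; oligotypes
    seen fewer than min_count times are dropped as likely PCR/sequencing
    artefacts. Both thresholds are hypothetical.
    """
    length = len(reads[0])
    informative = [i for i in range(length)
                   if shannon_entropy([r[i] for r in reads]) >= min_entropy]
    types = Counter("".join(r[i] for i in informative) for r in reads)
    kept = {t: c for t, c in types.items() if c >= min_count}
    return informative, kept

reads = [
    "ACGTACGT", "ACGTACGT", "ACGTACGT",  # allele 1
    "ACGAACGA", "ACGAACGA", "ACGAACGA",  # allele 2
    "ACGTACCT",                          # single read carrying an error
]
print(oligotype(reads))
# ([3, 6, 7], {'TGT': 3, 'AGA': 3})
```

The erroneous read raises the entropy of position 6 enough to make it "informative", but its oligotype (TCT) occurs only once and is filtered out, which is the kind of spurious allele the workflow aims to remove.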
This method is not based on an explicit probability model, and thus it is not designed to control the error rate in the way that, say, a valid confidence interval does (a confidence interval with coverage c for a parameter should contain the parameter with probability c, so the error rate 1 - c is known and controlled by the user who selects the value of c). However, the authors suggest a way to adapt the settings of ACACIA to each application.
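To make the contrast concrete, the coverage guarantee of a confidence interval can be checked by simulation. A minimal Python sketch, assuming normally distributed data with known standard deviation (a textbook z-interval chosen purely for illustration; nothing here models ACACIA, and the parameter values are arbitrary):

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(c=0.95, mu=10.0, sigma=2.0, n=20, replicates=10_000, seed=1):
    """Fraction of simulated known-sigma z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + c / 2)  # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(replicates):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / replicates

for c in (0.80, 0.95):
    # Observed coverage should be close to c, i.e. the error rate is close to 1 - c.
    print(c, ci_coverage(c))
```

The user picks c and the procedure delivers an error rate of about 1 - c; a heuristic pipeline offers no such built-in link between its settings and its error rate, which is why empirical validation matters.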
To compare and validate the new workflow, the authors constructed new sets of genotypes representing different extents of copy number variation, using already known chicken MHC genotypes. Under these conditions, it was possible to assess how many alleles went undetected and what the rate of false positives was. Gillingham et al. additionally investigated the effect of using non-optimal primers. They found that ACACIA performed better than a preexisting pipeline, AmpliSAS [7], when both methods used optimal settings. However, they do not claim that ACACIA will always be better than AmpliSAS. Rather, they warn against the common practice of using the default settings of the latter pipeline. Altogether, this work and the ACACIA workflow should allow better ascertainment of genotypes from multigene families.
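When the true genotypes are known, as in the chicken MHC set-up, allele dropout and false positives can be counted by comparing called and known allele sets sample by sample. A minimal sketch; the function name, the exact rate definitions (dropout as missed true alleles over all true alleles, false positives as spurious calls over all calls) and the allele labels are assumptions for illustration and may differ from the definitions used by Gillingham et al.

```python
def genotype_error_rates(known, called):
    """Compare called alleles with known genotypes, sample by sample.

    known, called: dicts mapping sample ID -> set of allele names.
    Returns (dropout_rate, false_positive_rate), where
      dropout_rate        = missed true alleles / all true alleles
      false_positive_rate = spurious called alleles / all called alleles
    """
    true_total = missed = called_total = spurious = 0
    for sample, truth in known.items():
        calls = called.get(sample, set())  # a sample with no calls drops out entirely
        true_total += len(truth)
        called_total += len(calls)
        missed += len(truth - calls)    # true alleles not recovered
        spurious += len(calls - truth)  # calls absent from the true genotype
    dropout = missed / true_total if true_total else 0.0
    false_pos = spurious / called_total if called_total else 0.0
    return dropout, false_pos

# Hypothetical allele labels for two samples.
known = {"s1": {"a1", "a2", "a3"}, "s2": {"a1", "a4"}}
called = {"s1": {"a1", "a2", "x9"}, "s2": {"a1", "a4"}}
print(genotype_error_rates(known, called))  # (0.2, 0.2)
```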
[1] Ioannidis, J. P. A., Greenland, S., Hlatky, M. A., Khoury, M. J., Macleod, M. R., Moher, D., Schulz, K. F. and Tibshirani, R. (2014) Increasing value and reducing waste in research design, conduct, and analysis. The Lancet, 383, 166-175. doi: 10.1016/S0140-6736(13)62227-8
[2] Chang, C. C., Chow, C. C., Tellier, L. C. A. M., Vattikuti, S., Purcell, S. M. and Lee, J. J. (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4, 7, s13742-015-0047-8. doi: 10.1186/s13742-015-0047-8
[3] Robinson, M. R. and Visscher, P. (2018) Corrected sibling GWAS data release from Robinson et al. http://cnsgenomics.com/data.html
[4] Field, Y., Boyle, E. A., Telis, N., Gao, Z., Gaulton, K. J., Golan, D., Yengo, L., Rocheleau, G., Froguel, P., McCarthy, M. I. and Pritchard, J. K. (2016) Detection of human adaptation during the past 2000 years. Science, 354(6313), 760-764. doi: 10.1126/science.aag0776
[5] Gillingham, M. A. F., Montero, B. K., Wihelm, K., Grudzus, K., Sommer, S. and Santos, P. S. C. (2020) A novel workflow to improve multi-locus genotyping of wildlife species: an experimental set-up with a known model system. bioRxiv 638288, ver. 3 peer-reviewed and recommended by Peer Community In Evolutionary Biology. doi: 10.1101/638288
[6] Eren, A. M., Maignien, L., Sul, W. J., Murphy, L. G., Grim, S. L., Morrison, H. G. and Sogin, M. L. (2013) Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods in Ecology and Evolution, 4(12), 1111-1119. doi: 10.1111/2041-210X.12114
[7] Sebastian, A., Herdegen, M., Migalska, M. and Radwan, J. (2016) AMPLISAS: a web server for multilocus genotyping using next-generation amplicon sequencing data. Molecular Ecology Resources, 16, 498-510. doi: 10.1111/1755-0998.12453