Be careful when studying selection based on polygenic score overdispersion
How robust are cross-population signatures of polygenic adaptation in humans?
The advent of genome-wide association studies (GWAS) has been a great promise for our understanding of the connection between genotype and phenotype. Today, the NHGRI-EBI GWAS catalog contains 251,401 associations from 4,961 studies (1). This wealth of studies has also generated interest to use the summary statistics beyond the few top hits in order to make predictions for individuals without known phenotype, e.g. to predict polygenic risk scores or to study polygenic selection by comparing different groups. For instance, polygenic selection acting on the most studied polygenic trait, height, has been subject to multiple studies during the past decade (e.g. 2–6). They detected north-south gradients in Europe which were consistent with expectations. However, their GWAS summary statistics were based on the GIANT consortium data set, a meta-analysis of GWAS conducted in different European cohorts (7,8). The availability of large data sets with less stratification such as the UK Biobank (9) has led to a re-evaluation of those results. The nature of the GIANT consortium data set was realized to represent a potential problem for studies of polygenic adaptation which led several of the authors of the original articles to caution against the interpretations of polygenic selection on height (10,11). This was a great example on how the scientific community assessed their own earlier results in a critical way as more data became available. At the same time it left the question whether there is detectable polygenic selection separating populations more open than ever.
Generally, recent years have seen several articles critically assessing the portability of GWAS results and risk score predictions to other populations (12–14). Refoyo-Martínez et al. (15) are now presenting a systematic assessment on the robustness of cross-population signatures of polygenic adaptation in humans. They compiled GWAS results for complex traits which have been studied in more than one cohort and then use allele frequencies from the 1000 Genomes Project data (16) set to detect signals of polygenic score overdispersion. As the source for the allele frequencies is kept the same across all tests, differences between the signals must be caused by the underlying GWAS. The results are concerning as the level of overdispersion largely depends on the choice of GWAS cohort. Cohorts with homogenous ancestries show little to no overdispersion compared to cohorts of mixed ancestries such as meta-analyses. It appears that the meta-analyses fail to fully account for stratification in their data sets.
The authors based most of their analyses on the heavily studied trait height. Additionally, they use educational attainment (measured as the number of school years of an individual) as an example. This choice was due to the potential over- or misinterpretation of results by the media, the general public and by far right hate groups. Such traits are potentially confounded by unaccounted cultural and socio-economic factors. Showing that previous results about polygenic selection on educational attainment are not robust is an important result that needs to be communicated well. This forms a great example for everyone working in human genomics. We need to be aware that our results can sometimes be misinterpreted. And we need to make an effort to write our papers and communicate our results in a way that is honest about the limitations of our research and that prevents the misuse of our results by hate groups.
This article represents an important contribution to the field. It is cruicial to be aware of potential methodological biases and technical artifacts. Future studies of polygenic adaptation need to be cautious with their interpretations of polygenic score overdispersion. A recommendation would be to use GWAS results obtained in homogenous cohorts. But even if different biobank-scale cohorts of homogeneous ancestry are employed, there will always be some remaining risk of unaccounted stratification. These conclusions may seem sobering but they are part of the scientific process. We need additional controls and new, different methods than polygenic score overdispersion for assessing polygenic selection. Last year also saw the presentation of a novel approach using sequence data and GWAS summary statistics to detect directional selection on a polygenic trait (17). This new method appears to be robust to bias stemming from stratification in the GWAS cohort as well as other confounding factors. Such new developments show light at the end of the tunnel for the use of GWAS summary statistics in the study of polygenic adaptation.
1. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Research. 2019 Jan 8;47(D1):D1005–12. doi: https://doi.org/10.1093/nar/gky1120
2. Turchin MC, Chiang CW, Palmer CD, Sankararaman S, Reich D, Hirschhorn JN. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics. 2012 Sep;44(9):1015–9. doi: https://doi.org/10.1038/ng.2368
3. Berg JJ, Coop G. A Population Genetic Signal of Polygenic Adaptation. PLOS Genetics. 2014 Aug 7;10(8):e1004412. doi: https://doi.org/10.1371/journal.pgen.1004412
4. Robinson MR, Hemani G, Medina-Gomez C, Mezzavilla M, Esko T, Shakhbazov K, et al. Population genetic differentiation of height and body mass index across Europe. Nature Genetics. 2015 Nov;47(11):1357–62. doi: https://doi.org/10.1038/ng.3401
5. Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA, et al. Genome-wide patterns of selection in 230 ancient Eurasians. Nature. 2015 Dec;528(7583):499–503. doi: https://doi.org/10.1038/nature16152
6. Racimo F, Berg JJ, Pickrell JK. Detecting polygenic adaptation in admixture graphs. Genetics. 2018. Arp;208(4):1565–1584. doi: https://doi.org/10.1534/genetics.117.300489
7. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010 Oct;467(7317):832–8. doi: https://doi.org/10.1038/nature09410
8. Wood AR, Esko T, Yang J, Vedantam S, Pers TH, Gustafsson S, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014 Nov;46(11):1173–86. doi: https://doi.org/10.1038/ng.3097
9. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018 Oct;562(7726):203–9. doi: https://doi.org/10.1038/s41586-018-0579-z
10. Berg JJ, Harpak A, Sinnott-Armstrong N, Joergensen AM, Mostafavi H, Field Y, et al. Reduced signal for polygenic adaptation of height in UK Biobank. eLife. 2019 Mar 21;8:e39725. doi: https://doi.org/10.7554/eLife.39725
11. Sohail M, Maier RM, Ganna A, Bloemendal A, Martin AR, Turchin MC, et al. Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies. eLife. 2019 Mar 21;8:e39702. doi: https://doi.org/10.7554/eLife.39702
12. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 2019 Apr;51(4):584–91. doi: https://doi.org/10.1038/s41588-019-0379-x
13. Bitarello BD, Mathieson I. Polygenic Scores for Height in Admixed Populations. G3: Genes, Genomes, Genetics. 2020 Nov 1;10(11):4027–36. doi: https://doi.org/10.1534/g3.120.401658
14. Uricchio LH, Kitano HC, Gusev A, Zaitlen NA. An evolutionary compass for detecting signals of polygenic selection and mutational bias. Evolution Letters. 2019;3(1):69–79. doi: https://doi.org/10.1002/evl3.97
15. Refoyo-Martínez A, Liu S, Jørgensen AM, Jin X, Albrechtsen A, Martin AR, Racimo F. How robust are cross-population signatures of polygenic adaptation in humans? bioRxiv, 2021, 2020.07.13.200030, version 5 peer-reviewed and recommended by Peer community in Evolutionary Biology. doi: https://doi.org/10.1101/2020.07.13.200030
16. Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015 Sep 30;526(7571):68–74. doi: https://doi.org/10.1038/nature15393
17. Stern AJ, Speidel L, Zaitlen NA, Nielsen R. Disentangling selection on genetically correlated polygenic traits using whole-genome genealogies. bioRxiv. 2020 May 8;2020.05.07.083402. doi: https://doi.org/10.1101/2020.05.07.083402
Torsten Günther (2021) Be careful when studying selection based on polygenic score overdispersion. Peer Community in Evolutionary Biology, 100125. 10.24072/pci.evolbiol.100125
Revision round #22021-02-22
Decision round #2
Thank you very much for the revised version of the preprint. The reviewers and myself are very happy with the way you have addressed the comments from the previous round. There are still some minor edits to be done before we can start the recommendation process.
Reviewed by Lawrence Uricchio, 2021-02-17 06:51
Reviewed by Mashaal Sohail, 2021-01-19 20:18
Reviewed by Barbara Bitarello, 2021-02-19 19:38
Revision round #12020-09-24
Decision round #1
Thank you very much for your patience. Your preprint has been seen by three expert reviewers. They all provide a very detailed list of reasonable comments but no major criticism, so I think it should be possible to address them in a revised version of the manuscript.
I am looking forward to receiving your revised preprint.
Additional requirements of the managing board:
As indicated in the 'How does it work?’ section and in the code of conduct, please make sure that:
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”