838

Faster model-based estimation of ancestry proportions
Cindy G. Santander, Alba Refoyo Martinez, Jonas Meisner
2024
<p>Ancestry estimation from genotype data in unrelated individuals has become an essential tool in population and medical genetics to understand demographic population histories and to model or correct for population structure. The ADMIXTURE software is a widely used model-based approach to account for population stratification; however, it struggles with convergence issues and does not scale to modern human datasets or to the large number of variants in whole-genome sequencing data. Likelihood-free approaches optimize a least-squares objective and have gained popularity in recent years due to their scalability. However, this comes at the cost of accuracy in the ancestry estimates in more complex admixture scenarios. We present a new model-based approach, fastmixture, which adopts aspects of likelihood-free approaches for parameter initialization, followed by a mini-batch expectation-maximization procedure to model the standard likelihood. We demonstrate in a simulation study that the model-based approaches of fastmixture and ADMIXTURE are significantly more accurate than recent and likelihood-free approaches. We further show that fastmixture runs approximately 30 times faster than ADMIXTURE on both simulated and empirical data from the 1000 Genomes Project, such that our model-based approach scales to much larger sample sizes than previously possible. Our software is freely available at https://github.com/Rosemeis/fastmixture.</p>
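The "standard likelihood" referred to above is, in ADMIXTURE-style models, a binomial likelihood in which the genotype g_ij in {0, 1, 2} of individual i at SNP j has expected allele frequency h_ij = sum_k q_ik p_jk, giving (up to a constant) log L(Q, P) = sum_i sum_j [g_ij log h_ij + (2 - g_ij) log(1 - h_ij)]. The sketch below is a minimal, pure-Python illustration of one full-batch EM update for that likelihood, assuming this model. It is not fastmixture's implementation: the function names `random_init`, `log_likelihood` and `em_step` are hypothetical, the random start is a stand-in for fastmixture's least-squares-based initialization, and fastmixture's mini-batch scheme is not reproduced.

```python
import math
import random


def random_init(N, M, K, seed=0):
    """Hypothetical random start: Q rows sum to 1, P kept inside (0, 1).

    fastmixture instead initializes from a likelihood-free (least-squares)
    solution; that step is not reproduced here.
    """
    rng = random.Random(seed)
    Q = []
    for _ in range(N):
        w = [rng.random() + 0.1 for _ in range(K)]
        s = sum(w)
        Q.append([x / s for x in w])
    P = [[rng.uniform(0.05, 0.95) for _ in range(K)] for _ in range(M)]
    return Q, P


def log_likelihood(G, Q, P):
    """Binomial admixture log-likelihood, dropping the constant binomial term.

    G[i][j] in {0, 1, 2} is the allele count of individual i at SNP j,
    Q[i][k] the ancestry proportion of individual i from population k,
    P[j][k] the allele frequency of SNP j in population k.
    """
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            h = sum(q * p for q, p in zip(Q[i], P[j]))
            ll += g * math.log(h) + (2 - g) * math.log(1.0 - h)
    return ll


def em_step(G, Q, P, eps=1e-9):
    """One full-batch EM update of Q and P under the model above."""
    N, M, K = len(G), len(G[0]), len(Q[0])
    Q_acc = [[0.0] * K for _ in range(N)]
    A = [[0.0] * K for _ in range(M)]  # expected counted-allele copies per population
    B = [[0.0] * K for _ in range(M)]  # expected other-allele copies per population
    for i in range(N):
        for j in range(M):
            g = G[i][j]
            h = sum(Q[i][k] * P[j][k] for k in range(K))
            for k in range(K):
                # E-step: posterior-weighted allele copies assigned to population k
                a = g * Q[i][k] * P[j][k] / h
                b = (2 - g) * Q[i][k] * (1.0 - P[j][k]) / (1.0 - h)
                Q_acc[i][k] += a + b
                A[j][k] += a
                B[j][k] += b
    # M-step: each individual carries 2 * M allele copies, so rows of Q_new sum to 1
    Q_new = [[x / (2.0 * M) for x in row] for row in Q_acc]
    P_new = []
    for j in range(M):
        row = []
        for k in range(K):
            denom = A[j][k] + B[j][k]
            p = A[j][k] / denom if denom > 0 else P[j][k]
            row.append(min(max(p, eps), 1.0 - eps))  # keep log terms finite
        P_new.append(row)
    return Q_new, P_new
```

Calling `Q, P = random_init(len(G), len(G[0]), K)` and then repeating `Q, P = em_step(G, Q, P)` should never decrease `log_likelihood(G, Q, P)`, which makes a convenient sanity check. Because Q and P are updated from the same E-step, each call is an exact EM iteration; per-iteration cost grows with N x M x K, which is the scaling problem the abstract describes.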
https://doi.org/10.5281/zenodo.12683371
https://doi.org/10.5281/zenodo.12683371
https://github.com/Rosemeis/fastmixture
admixture, population structure, software
None
Bioinformatics & Computational Biology, Human Evolution, Population Genetics / Genomics
Matteo Fumagalli, m.fumagalli@qmul.ac.uk, Mashaal Sohail, mashaal@ccg.unam.mx, Diego Ortega-Del Vecchyo, dortega@liigh.unam.mx, Harald Ringbauer, harald_ringbauer@eva.mpg.de, Thomas Bataillon, tbata@birc.au.dk, Garrett Hellenthal, g.hellenthal@ucl.ac.uk, Lucy van Dorp, lucy.dorp.12@ucl.ac.uk, Benjamin Peter, benjamin_peter@eva.mpg.de; Harald Ringbauer (declined to review because of imminent paternity leave) suggested as alternative reviewers: Prof. John Novembre (jnovembre@uchicago.edu) and Dr. Benjamin Peter (benjamin_peter@eva.mpg.de); Oscar Lao Grueso suggested: Francesc Calafell (francesc.calafell@upf.edu); Frederic Austerlitz suggested: Paul Verdu (paul.verdu@mnhn.fr)
2024-07-14 11:48:39
Matteo Fumagalli