Recommendation

New insights into the dynamics of selective sweeps in seed-banked species

Renaud Vitalis based on reviews by Guillaume Achaz, Jere Koskela, William Shoemaker and Simon Boitard

A recommendation of:

Weak seed banks influence the signature and detectability of selective sweeps

Kevin Korfmann, Diala Abu Awad, Aurélien Tellier (2023), bioRxiv, ver.3, peer-reviewed and recommended by PCI Evolutionary Biology https://doi.org/10.1101/2022.04.26.489499

Read preprint in preprint server Now published in a journal

Codes used in this study

Scripts used to obtain or analyze results

Abstract

EN

AR

ES

FR

HI

JA

PT

RU

ZH-CN

Weak seed banks influence the signature and detectability of selective sweeps

Seed banking (or dormancy) is a widespread bet-hedging strategy, generating a form of population overlap, which decreases the magnitude of genetic drift. The methodological complexity of integrating this trait implies it is ignored when developing tools to detect selective sweeps. But, as dormancy lengthens the ancestral recombination graph (ARG), increasing times to fixation, it can change the genomic signatures of selection. To detect genes under positive selection in seed banking species it is important to 1) determine whether the efficacy of selection is affected, and 2) predict the patterns of nucleotide diversity at and around positively selected alleles. We present the first tree sequence-based simulation program integrating a weak seed bank to examine the dynamics and genomic footprints of beneficial alleles in a finite population. We find that seed banking does not affect the probability of fixation and confirm expectations of increased times to fixation. We also confirm earlier findings that, for strong selection, the times to fixation are not scaled by the inbreeding effective population size in the presence of seed banks, but are shorter than would be expected. As seed banking increases the effective recombination rate, footprints of sweeps appear narrower around the selected sites and due to the scaling of the ARG are detectable for longer periods of time. The developed simulation tool can be used to predict the footprints of selection and draw statistical inference of past evolutionary events in plants, invertebrates, or fungi with seed banks.

seed bank, weak dormancy, selection, tskit, tree sequence, forward simulation, fixation time, fixation probability, ancestral recombination graph

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

تؤثر بنوك البذور الضعيفة على التوقيع وإمكانية اكتشاف عمليات المسح الانتقائية

إن الخدمات المصرفية البذورية (أو السكون) هي استراتيجية واسعة النطاق للتحوط، وتولد شكلاً من أشكال التداخل السكاني، مما يقلل من حجم الانجراف الوراثي. يشير التعقيد المنهجي لدمج هذه السمة إلى أنه يتم تجاهلها عند تطوير أدوات للكشف عن عمليات المسح الانتقائية. ولكن، نظرًا لأن السكون يطيل مخطط إعادة تركيب الأجداد (ARG)، مما يزيد من أوقات التثبيت، فإنه يمكن أن يغير التوقيعات الجينومية للاختيار. للكشف عن الجينات تحت الانتقاء الإيجابي في الأنواع المخزنة في البذور، من المهم 1) تحديد ما إذا كانت فعالية الانتقاء قد تأثرت، و2) التنبؤ بأنماط تنوع النيوكليوتيدات عند الأليلات المختارة إيجابيًا وحولها. نقدم أول برنامج محاكاة قائم على تسلسل الأشجار، ويدمج بنك بذور ضعيفًا لفحص الديناميكيات والآثار الجينومية للأليلات المفيدة في مجموعة محدودة من السكان. نجد أن بنوك البذور لا تؤثر على احتمالية التثبيت وتؤكد التوقعات بزيادة أوقات التثبيت. نؤكد أيضًا النتائج السابقة التي تفيد بأنه، بالنسبة للاختيار القوي، لا يتم قياس أوقات التثبيت حسب حجم السكان الفعال لتزاوج الأقارب في وجود بنوك البذور، ولكنها أقصر مما هو متوقع. نظرًا لأن بنك البذور يزيد من معدل إعادة التركيب الفعال، فإن آثار عمليات المسح تظهر أضيق حول المواقع المحددة، وبسبب تحجيم ARG يمكن اكتشافها لفترات أطول من الوقت. يمكن استخدام أداة المحاكاة المطورة للتنبؤ بآثار الانتخاب واستخلاص الاستدلال الإحصائي للأحداث التطورية الماضية في النباتات أو اللافقاريات أو الفطريات الموجودة في بنوك البذور.

بنك البذور، السكون الضعيف، الاختيار، tskit، تسلسل الشجرة، المحاكاة الأمامية، وقت التثبيت، احتمال التثبيت، الرسم البياني لإعادة تركيب الأجداد

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Los bancos de semillas débiles influyen en la firma y detectabilidad de los barridos selectivos

El banco de semillas (o latencia) es una estrategia generalizada de cobertura de apuestas, que genera una forma de superposición de poblaciones, lo que disminuye la magnitud de la deriva genética. La complejidad metodológica de integrar este rasgo implica que se ignora al desarrollar herramientas para detectar barridos selectivos. Pero, a medida que la latencia alarga el gráfico de recombinación ancestral (ARG), aumentando los tiempos hasta la fijación, puede cambiar las firmas genómicas de selección. Para detectar genes bajo selección positiva en especies de bancos de semillas, es importante 1) determinar si la eficacia de la selección se ve afectada y 2) predecir los patrones de diversidad de nucleótidos en y alrededor de los alelos seleccionados positivamente. Presentamos el primer programa de simulación basado en secuencias de árboles que integra un banco de semillas débil para examinar la dinámica y las huellas genómicas de alelos beneficiosos en una población finita. Encontramos que el banco de semillas no afecta la probabilidad de fijación y confirma las expectativas de mayores tiempos de fijación. También confirmamos hallazgos anteriores de que, para una selección fuerte, los tiempos de fijación no dependen del tamaño efectivo de la población endogámica en presencia de bancos de semillas, sino que son más cortos de lo que se esperaría. A medida que los bancos de semillas aumentan la tasa de recombinación efectiva, las huellas de los barridos parecen más estrechas alrededor de los sitios seleccionados y, debido al escalamiento del ARG, son detectables durante períodos de tiempo más largos. La herramienta de simulación desarrollada se puede utilizar para predecir las huellas de la selección y extraer inferencias estadísticas de eventos evolutivos pasados en plantas, invertebrados u hongos con bancos de semillas.

banco de semillas, latencia débil, selección, tskit, secuencia de árbol, simulación directa, tiempo de fijación, probabilidad de fijación, gráfico de recombinación ancestral

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Des banques de graines faibles influencent la signature et la détectabilité des balayages sélectifs

La banque de semences (ou dormance) est une stratégie de couverture de risque largement répandue, générant une forme de chevauchement de population, ce qui diminue l'ampleur de la dérive génétique. La complexité méthodologique de l’intégration de ce trait implique qu’il soit ignoré lors du développement d’outils de détection de balayages sélectifs. Mais, à mesure que la dormance allonge le graphe de recombinaison ancestral (ARG), augmentant ainsi les temps de fixation, elle peut modifier les signatures génomiques de la sélection. Pour détecter les gènes soumis à une sélection positive dans les espèces de banques de graines, il est important de 1) déterminer si l'efficacité de la sélection est affectée et 2) de prédire les modèles de diversité nucléotidique au niveau et autour des allèles sélectionnés positivement. Nous présentons le premier programme de simulation basé sur la séquence d'arbres intégrant une banque de graines faibles pour examiner la dynamique et les empreintes génomiques d'allèles bénéfiques dans une population finie. Nous constatons que la banque de graines n’affecte pas la probabilité de fixation et confirmons les attentes d’augmentation des délais de fixation. Nous confirmons également des résultats antérieurs selon lesquels, pour une sélection forte, les délais de fixation ne dépendent pas de la taille effective de la population de consanguinité en présence de banques de semences, mais sont plus courts que prévu. À mesure que la banque de graines augmente le taux de recombinaison effectif, les empreintes de balayage semblent plus étroites autour des sites sélectionnés et, en raison de la mise à l'échelle de l'ARG, sont détectables pendant des périodes plus longues. L'outil de simulation développé peut être utilisé pour prédire les empreintes de sélection et tirer des inférences statistiques d'événements évolutifs passés chez les plantes, les invertébrés ou les champignons avec des banques de graines.

banque de graines, dormance faible, sélection, tskit, séquence d'arbres, simulation directe, temps de fixation, probabilité de fixation, graphe de recombinaison ancestrale

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

कमजोर बीज बैंक चयनात्मक स्वीप के हस्ताक्षर और पता लगाने की क्षमता को प्रभावित करते हैं

बीज बैंकिंग (या डॉर्मेंसी) एक व्यापक बेट-हेजिंग रणनीति है, जो जनसंख्या ओवरलैप का एक रूप उत्पन्न करती है, जो आनुवंशिक बहाव के परिमाण को कम करती है। इस विशेषता को एकीकृत करने की पद्धतिगत जटिलता का तात्पर्य है कि चयनात्मक स्वीप का पता लगाने के लिए उपकरण विकसित करते समय इसे नजरअंदाज कर दिया जाता है। लेकिन, जैसे-जैसे निष्क्रियता पैतृक पुनर्संयोजन ग्राफ (एआरजी) को लंबा करती है, निर्धारण के समय को बढ़ाती है, यह चयन के जीनोमिक हस्ताक्षर को बदल सकती है। बीज बैंकिंग प्रजातियों में सकारात्मक चयन के तहत जीन का पता लगाने के लिए यह महत्वपूर्ण है कि 1) यह निर्धारित करें कि क्या चयन की प्रभावकारिता प्रभावित होती है, और 2) सकारात्मक रूप से चयनित एलील्स पर और उसके आसपास न्यूक्लियोटाइड विविधता के पैटर्न की भविष्यवाणी करें। हम एक सीमित आबादी में लाभकारी एलील्स की गतिशीलता और जीनोमिक पदचिह्नों की जांच करने के लिए एक कमजोर बीज बैंक को एकीकृत करने वाला पहला वृक्ष अनुक्रम-आधारित सिमुलेशन कार्यक्रम प्रस्तुत करते हैं। हमने पाया कि सीड बैंकिंग निर्धारण की संभावना को प्रभावित नहीं करती है और निर्धारण के लिए बढ़े हुए समय की अपेक्षाओं की पुष्टि करती है। हम पहले के निष्कर्षों की भी पुष्टि करते हैं कि, मजबूत चयन के लिए, निर्धारण का समय बीज बैंकों की उपस्थिति में इनब्रीडिंग प्रभावी आबादी के आकार से नहीं बढ़ाया जाता है, बल्कि अपेक्षा से कम होता है। जैसे-जैसे बीज बैंकिंग प्रभावी पुनर्संयोजन दर को बढ़ाती है, चयनित साइटों के आसपास स्वीप के पदचिह्न संकीर्ण दिखाई देते हैं और एआरजी की स्केलिंग के कारण लंबे समय तक पता लगाया जा सकता है। विकसित सिमुलेशन टूल का उपयोग चयन के पदचिह्नों की भविष्यवाणी करने और बीज बैंकों के साथ पौधों, अकशेरुकी, या कवक में पिछले विकासवादी घटनाओं के सांख्यिकीय अनुमान निकालने के लिए किया जा सकता है।

बीज बैंक, कमजोर प्रसुप्ति, चयन, टीएसकिट, वृक्ष अनुक्रम, फॉरवर्ड सिमुलेशन, निर्धारण समय, निर्धारण संभाव्यता, पैतृक पुनर्संयोजन ग्राफ

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

弱いシードバンクは選択的スイープの特徴と検出性に影響を与える

シードバンキング (または休眠) は、広く普及している賭けヘッジ戦略であり、一種の集団重複を生成し、遺伝的浮動の大きさを減少させます。この特性を統合する方法論の複雑さは、選択的スイープを検出するツールを開発する際にこの特性が無視されることを意味します。しかし、休眠によって祖先組換えグラフ (ARG) が長くなり、固定までの時間が長くなるにつれて、選択のゲノム特徴が変化する可能性があります。シードバンク種においてポジティブセレクション下の遺伝子を検出するには、1) セレクションの有効性が影響を受けるかどうかを判断すること、2) ポジティブセレクションされた対立遺伝子およびその周囲のヌクレオチド多様性のパターンを予測することが重要です。我々は、有限集団における有益な対立遺伝子のダイナミクスとゲノムフットプリントを調べるために、弱いシードバンクを統合した最初のツリーシーケンスベースのシミュレーションプログラムを紹介します。シードバンキングが定着の確率に影響を及ぼさないことがわかり、定着までの時間が増加するという予想が裏付けられました。また、強力な選択の場合、固定までの時間は種子バンクの存在下での近親交配の有効個体群サイズには比例しないが、予想よりも短いという以前の発見も確認しました。シードバンキングにより有効な組換え率が高まるため、選択した部位の周囲でスイープのフットプリントが狭くなり、ARG のスケーリングにより長時間検出可能になります。開発されたシミュレーションツールを使用すると、シードバンクを使用して選択の足跡を予測し、植物、無脊椎動物、または菌類の過去の進化イベントの統計的推論を引き出すことができます。

シードバンク、弱い休眠、選択、tskit、ツリーシーケンス、フォワードシミュレーション、固定時間、固定確率、祖先組換えグラフ

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Bancos de sementes fracos influenciam a assinatura e a detectabilidade de varreduras seletivas

O banco de sementes (ou dormência) é uma estratégia generalizada de cobertura de apostas, gerando uma forma de sobreposição populacional, que diminui a magnitude da deriva genética. A complexidade metodológica da integração desta característica implica que ela seja ignorada no desenvolvimento de ferramentas para detectar varreduras seletivas. Mas, à medida que a dormência prolonga o gráfico de recombinação ancestral (ARG), aumentando o tempo de fixação, ela pode alterar as assinaturas genômicas da seleção. Para detectar genes sob seleção positiva em espécies de bancos de sementes, é importante 1) determinar se a eficácia da seleção é afetada e 2) prever os padrões de diversidade de nucleotídeos em e ao redor dos alelos selecionados positivamente. Apresentamos o primeiro programa de simulação baseado em sequência de árvores integrando um banco de sementes fraco para examinar a dinâmica e as pegadas genômicas de alelos benéficos em uma população finita. Descobrimos que o banco de sementes não afeta a probabilidade de fixação e confirma as expectativas de aumento do tempo de fixação. Confirmamos também descobertas anteriores de que, para uma seleção forte, os tempos de fixação não são dimensionados pelo tamanho efetivo da população de endogamia na presença de bancos de sementes, mas são mais curtos do que seria esperado. À medida que o banco de sementes aumenta a taxa efetiva de recombinação, as pegadas das varreduras parecem mais estreitas em torno dos locais selecionados e, devido ao dimensionamento do ARG, são detectáveis por períodos de tempo mais longos. A ferramenta de simulação desenvolvida pode ser usada para prever as pegadas de seleção e fazer inferências estatísticas de eventos evolutivos passados em plantas, invertebrados ou fungos com bancos de sementes.

banco de sementes, dormência fraca, seleção, tskit, sequência de árvores, simulação direta, tempo de fixação, probabilidade de fixação, gráfico de recombinação ancestral

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Слабые банки семян влияют на сигнатуру и обнаруживаемость выборочных проверок.

Семенное банкирование (или спячка) – это широко распространенная стратегия хеджирования ставок, порождающая своего рода перекрытие популяций, что уменьшает масштабы генетического дрейфа. Методологическая сложность интеграции этого признака подразумевает, что он игнорируется при разработке инструментов для обнаружения выборочных проверок. Но поскольку покой удлиняет граф предковой рекомбинации (ARG), увеличивая время фиксации, это может изменить геномные признаки отбора. Для обнаружения генов, подвергшихся положительному отбору, у видов из банков семян важно: 1) определить, влияет ли эффективность отбора, и 2) спрогнозировать закономерности нуклеотидного разнообразия в положительно выбранных аллелях и вокруг них. Мы представляем первую программу моделирования на основе древовидных последовательностей, объединяющую слабый банк семян для изучения динамики и геномного следа полезных аллелей в конечной популяции. Мы обнаружили, что банкирование семян не влияет на вероятность фиксации, и подтверждаем ожидания увеличения времени фиксации. Мы также подтверждаем более ранние выводы о том, что при сильном отборе время фиксации не зависит от эффективного размера популяции инбридинга при наличии банков семян, а короче, чем можно было бы ожидать. Поскольку банк семян увеличивает эффективную скорость рекомбинации, следы разверток вокруг выбранных сайтов становятся уже, и из-за масштабирования ARG их можно обнаружить в течение более длительных периодов времени. Разработанный инструмент моделирования можно использовать для прогнозирования последствий отбора и получения статистических выводов о прошлых эволюционных событиях у растений, беспозвоночных или грибов с банками семян.

банк семян, слабый покой, селекция, tskit, древовидная последовательность, прямое моделирование, время фиксации, вероятность фиксации, граф предковой рекомбинации

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

弱种子库影响选择性扫描的特征和可检测性 12f7f7f68a704413坏21a2351c6f80b 种子库、弱休眠、选择、tskit、树序列、正向模拟、固定时间、固定概率、祖先重组图

种子库（或休眠）是一种广泛使用的对冲策略，会产生某种形式的种群重叠，从而降低遗传漂变的程度。整合这一特征的方法的复杂性意味着在开发检测选择性扫描的工具时它被忽略。但是，随着休眠延长祖先重组图（ARG），增加固定时间，它可以改变选择的基因组特征。为了检测种子库物种中正选择下的基因，重要的是 1) 确定选择的功效是否受到影响，2) 预测正选择等位基因处及其周围的核苷酸多样性模式。我们提出了第一个基于树序列的模拟程序，集成了弱种子库，以检查有限种群中有益等位基因的动态和基因组足迹。我们发现种子库不会影响固定的可能性，并证实了固定时间增加的预期。我们还证实了早期的发现，即对于强选择，固定时间不会根据种子库存在下的近交有效种群规模而变化，但比预期的要短。随着种子库提高了有效重组率，所选位点周围的扫描足迹显得更窄，并且由于 ARG 的缩放，可在更长的时间内检测到。开发的模拟工具可用于预测选择的足迹，并通过种子库对植物、无脊椎动物或真菌过去的进化事件进行统计推断。

种子库、弱休眠、选择、tskit、树序列、正向模拟、固定时间、固定概率、祖先重组图

Submission: posted 23 May 2022
Recommendation: posted 22 May 2023, validated 22 May 2023

Cite this recommendation as:
Vitalis, R. (2023) New insights into the dynamics of selective sweeps in seed-banked species. Peer Community in Evolutionary Biology, 100552. https://doi.org/10.24072/pci.evolbiol.100552

Recommendation

Many organisms across the Tree of life have the ability to produce seeds, eggs, cysts, or spores, that can remain dormant for several generations before hatching. This widespread adaptive trait in bacteria, fungi, plants and animals, has a significant impact on the ecology, population dynamics and population genetics of species that express it (Evans and Dennehy 2005).

In population genetics, and despite the recognition of its evolutionary importance in many empirical studies, few theoretical models have been developed to characterize the evolutionary consequences of this trait on the level and distribution of neutral genetic diversity (see, e.g., Kaj et al. 2001; Vitalis et al. 2004), and also on the dynamics of selected alleles (see, e.g., Živković and Tellier 2018). However, due to the complexity of the interactions between evolutionary forces in the presence of dormancy, the fate of selected mutations in their genomic environment is not yet fully understood, even from the most recently developed models.

In a comprehensive article, Korfmann et al. (2023) aim to fill this gap by investigating the effect of germ banking on the probability of (and time to) fixation of beneficial mutations, as well as on the shape of the selective sweep in their vicinity. To this end, Korfmann et al. (2023) developed and released their own forward-in-time simulator of genome-wide data, including neutral and selected polymorphisms, that makes use of Kelleher et al.’s (2018) tree sequence toolkit to keep track of gene genealogies.

The originality of Korfmann et al.’s (2023) study is to provide new quantitative results for the effect of dormancy on the time to fixation of positively selected mutations, the shape of the genomic landscape in the vicinity of these mutations, and the temporal dynamics of selective sweeps. Their major finding is the prediction that germ banking creates narrower signatures of sweeps around positively selected sites, which are detectable for increased periods of time (as compared to a standard Wright-Fisher population).

The availability of Korfmann et al.’s (2023) code will allow a wider range of parameter values to be explored, to extend their results to the particularities of the biology of many species. However, as they chose to extend the haploid coalescent model of Kaj et al. (2001), further development is needed to confirm the robustness of their results with a more realistic diploid model of seed dormancy.

REFERENCES

Evans, M. E. K., and J. J. Dennehy (2005) Germ banking: bet-hedging and variable release from egg and seed dormancy. The Quarterly Review of Biology, 80(4): 431-451. https://doi.org/10.1086/498282

Kaj, I., S. Krone, and M. Lascoux (2001) Coalescent theory for seed bank models. Journal of Applied Probability, 38(2): 285-300. https://doi.org/10.1239/jap/996986745

Kelleher, J., K. R. Thornton, J. Ashander, and P. L. Ralph (2018) Efficient pedigree recording for fast population genetics simulation. PLoS Computational Biology, 14(11): e1006581. https://doi.org/10.1371/journal.pcbi.1006581

Korfmann, K., D. Abu Awad, and A. Tellier (2023) Weak seed banks influence the signature and detectability of selective sweeps. bioRxiv, ver. 3 peer-reviewed and recommended by Peer Community in Evolutionary Biology. https://doi.org/10.1101/2022.04.26.489499

Vitalis, R., S. Glémin, and I. Olivieri (2004) When genes go to sleep: the population genetic consequences of seed dormancy and monocarpic perenniality. American Naturalist, 163(2): 295-311. https://doi.org/10.1086/381041

Živković, D., and A. Tellier (2018). All but sleeping? Consequences of soil seed banks on neutral and selective diversity in plant species. Mathematical Modelling in Plant Biology, 195-212. https://doi.org/10.1007/978-3-319-99070-5_10

PDF recommendation

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
The authors gratefully acknowledge the computational and data resources provided by the Leibniz Supercomputing Centre (www.lrz.de). KK is supported by a grant from the Deutsche Forschungs-gemeinschaft (DFG) through the TUM International Graduate School of Science and Engineering (IGSSE), GSC 81, within the project GENOMIE QADOP. AT receives funding from the Deutsche Forschungsgemeinschaft (DFG) grant TE809/1-4, project 254587930. DAA was a Humboldt Post-Doctoral fellow

Reviews

Evaluation round #2

DOI or URL of the preprint: https://doi.org/10.1101/2022.04.26.489499

Version of the preprint: 2

Author's Reply, 08 May 2023

Download author's reply Download tracked changes file https://doi.org/10.24072/pci.evolbiol.100552.ar2

Decision by Renaud Vitalis, posted 06 Apr 2023, validated 11 Apr 2023

Dear Dr Korfmann,

I apologize for the delay. Three reviewers have now examined the revised version of your manuscript entitled "Weak seed banks influence the signature and detectability of selective sweeps" that you submitted for recommendation to PCI Evol Biol.

They all reckon that you made a very substantial effort to address the issues raised in the previous evaluation round. I agree with them that your revised manuscript clarifies the main take-home messages of your manuscript, thanks to the correction of the simulation code, the comparison of some limit results with theoretical expectations, the improvement of several figures, and a more in-depth analysis and interpretation of your main results.

That said, Dr. Boitard and Dr. Koskela have made some additional comments and suggestions, asked for clarifications, and raised some points on your revised manuscript, that I recommend you address.

All reviewers and I consider that your manuscript addresses an important topic, with a solid and thorough approach, and I am willing to consider a revised version of your manuscript for recommendation in PCI Evol Biol. However, before making a final decision, I would also like you to consider the following concern.

First, I would like to thank you for clarifying the model description, and for providing a more complete mathematical formalism (p. 4). However, with these clarifications, I now realize that I overlooked an important modeling choice: you state that the generation (or age) G of each parent of a seed is randomly sampled from a multinomial draw, parameterized with a probability vector $\textbf{Y}^{norm}$ that depends upon the rate of dormancy. Since the draws are independent (see Lines 142-143), this amounts to considering that the two parents of a seed may not belong to the same generation (i.e., that they do not have the same age) or, to put it another way, that the ovule and the pollen of a seed were not produced in the same generation (which, biologically speaking, seems a bit odd, at least for diploid plants). If my understanding is incorrect, I recommend that you revise the description of the model, to match what the simulation program actually does; if my understanding is correct, I urge you to justify that this (biologically odd) modeling choice provides accurate results for a weak seed bank model.

Consistent with this concern, I suggest that you review Figure 1, as this Figure illustrates the forward-in-time two-step process considered in Kaj et al. (2001) who assumed a haploid model, and not a diploid model as in this manuscript. A revised Figure could (should) represent diploid seeds, each seed choosing its two parents from the same generation in the past.

Last, I invite you to consider the following minor comments/suggestions:

Line 5 of the abstract: “signals of selection” -> “signatures of selection” or “footprints of selection”

Line 5 of the abstract: “more narrow” -> “narrower”

Keywords: “weak, dormancy” -> “weak dormancy” (remove the comma)

Lines 15-16: “spatial structure” -> “genetic structure” or “genetic differentiation”

Line 23: “the common ancestor of a population” -> “the common ancestor of a sample of genes from the active population”

Line 49: “representing the populations of size N from the past” -> “representing the past populations of size N”

Lines 79-80: “only the non-dormant lineage is affected by recombination” -> “only the non-dormant lineages are affected by recombination”

Line 127: “indicies” -> “indices”

Line 138: Completing Dr. Jore Koskela’s first comment (see below), the definition of the probability vector $\textbf{Y}^{norm}$ should read: $\textbf{Y}^{norm} = \frac{\textbf{Y}}{\sum_{j=1}^{m} Y_j}$, or: $Y_k^{norm} = \frac{Y_k}{\sum_{j=1}^{m} Y_j}$

Line 154 “Sequantially” -> “sequentially”

Line 187: “As previously stated the simulation process can ran, independently of tskit, but is required when planning to analyze the genealogy” -> “As previously stated the simulation process can be ran independently of tskit, but the latter is required when planning to analyze the genealogy.”

Lines 191-192: “(Figure S1 and Table S1 for empirically sufficient number of calibration generations given for a recombination rate)” -> “(see Figure S1 and Table S1 for characterizing empirically the sufficient number of calibration generations needed for a given recombination rate)”

Line 221: “if it stays at a size of 2N for m consecutive generations“ -> “if its number of copies is 2N for m consecutive generations”

Lines 285-286: “[...] decreasing the value of b (i.e. the longer seeds remain dormant)” -> “[...] decreasing the value of b (i.e., leaving seeds dormant longer)”

Figure 2b: indicate what shaded areas represent (as in Figures 3b and 3c).

Figure 3: The last sentence in the legend (“Dashed-blue lines indicate theoretical expectations of a Ne-scaled population corresponding to a given seed bank strength.”) repeats what is written above (“The blue lines indicate the time to fixation in a population without dormancy but with an effective population size scaled by”), does it not?

Line 331: “(e.g. Figure 4b, 4d and S10)” -> “(e.g., Figures 4b, 4d and S10)”

Figure 4: The legend for (3c) should rather read: “(c) $\pi$ assuming two recombination rates r = 10−7 per bp per generation (c1) and r = 5 × 10−8 per bp per generation (c2).”

Figure 4: “(b) Normalized $\pi$ as divided by the average neutral branch diversity from (a) using the values 2,000 and 16,000 for b = 1 and b = 0.35, respectively.” What are the values 2,000 and 16,000? (values of what?) It does not seem to relate to a scaling by $\frac{1}{b^2}$, and I am therefore not quite sure to understand how $\pi$ is normalized.

Lines 327-328: “Moreover, stronger dormancy also generates narrower selective sweeps around sites under positive selection which have reached fixation”: when comparing Figures S10 and 4, I would rather conclude that stronger selection (i.e., s = 1 vs. s = 0.2) generates narrower selective sweeps around sites under positive selection (since the rate of dormancy is the same in both Figures). Please, correct the sentence, or be more explicit.

Figure 5: in the legend, (a12,b12) should read (a21,b21) to be consistent with the Figure.

Line 370-372: I would move the sentence “Following the classic procedure to detect sweep [...]” above, since it provides the criteria used to obtain the results from lines 367-370.

Lines 379-380: “We note that there is a much sharper decrease in the rate of detection of false positive sweeps (neutral simulation line in Figure 5) under seed bank compared to the absence of a seed bank.” Do you have any insight about why would that be? It might seem a bit counterintuitive.

I thank you very much for submitting your manuscript to PCI Evol Biol, and I look forward to receiving your revised manuscript.

Best regards,
Renaud Vitalis

https://doi.org/10.24072/pci.evolbiol.100552.d2

Reviewed by Jere Koskela, 27 Feb 2023

I think this is a thorough paper on a topic which is of both mathematical and biological interest, and the comments I made in the previous round have been fully addressed. I only have two further, superficial comments to add:

1. Page 4, line 137: In the vector Y = (Y_1, ..., Y_m), it looks as if Y_k means the kth entry of Y. However, on the same line Pr(Y_k) seems to stand for the probability of the event that a random variable takes the value k. Do you mean Pr(Y_j = k)?

2. Page 6, line 187: "As previously stated the simulation process can ran..." seems to be missing a "be".

https://doi.org/10.24072/pci.evolbiol.100552.rev21

Reviewed by Simon Boitard, 08 Mar 2023

The authors have made substantial efforts to tackle the questions raised by all reviewers, including myself. They corrected some errors in the code and updated the simulation results, they included new analyzes on allele frequency dynamics and sweep detection, and they significantly revised the text and figures of the manuscript. Some clarifications and corrections are still needed but should be very easy to implement.

Line 27, 53 and many others : ‘coalescent’ is the name for the stochastic process describing the genealogy of a sample, so it is correct to talk about the Kingman ‘coalescent’ (for instance). But the events within this process are ‘coalescence’ events, and similary one should talk about ‘coalescence’ rates, ‘coalescence’ times …

L48-50 : Is it unclear what ‘representing the populations of size N from the past’ means.

L85 : ‘a’ Sequentially Markovian Coalescent.

L89-94 : I don’t understand the arguments here. If the fixation time of selected alleles is not multiplied by 1/b² but by a smaller rate, it is smaller than expected and the selection is thus more efficient, not ‘altered’. Similarly at line 97, why do seed banks ‘decrease’ the efficacy of selection. Based on the results of Figure 3, I assume that L91 is maybe wrong and that fixation time is multiplied by a ‘higher’ rate.

L100 : ‘on’ is probably missing between ‘seed bank’ and theta’.

L137 : Yk is a probability, so one should write Y_k=b(1-b)^(k-1), not P(Y_k)=…

L154-156 : I don’t understand where SMC is used, given that simulations are forward in time and SMC is a backward model.

L188 : Is it meant that ‘tskit is required when planning …’ ?

L190 : Since this calibration phase strongly dpends on the population size, it would be better to describe it after introducing the simulated population sizes.

Figure 2, S1 and S8b : Why recording the ‘oldest TMRCA per sequence’ ? This switches the focus from the mean coalescence time to the maximum coalescence time, which probably explains why this quantity depends on the recombination rate (Figure S1).

Line 307 and 311 : I am not sure that 3c is the correct reference here, is it not 3b ?

Line 317 : Is it not rather ‘decreasing b’ ?

Line 328 : ‘S10’ → ‘(Figure S10)’

Line 345 : ‘smooth’ would maybe fit better than ‘sharp’ ?

Line 365 : ‘these variations’

L372-375 : The link between the two sentences is not obvious to me. Decreasing the window size may increase the rate of true positives more than that of false positives, resulting in a higher power for a fixed false positive rate threshold.

Figure 5 : ‘0 generations’ is missing at line 4 of the legend.

L375-376 : It seems to me that these sweeps are already detectable in b21, not only in b22.

L414-417 : A verb is missing in this sentence.

L 425-428 : This sentence is a bit confusing, especially the part ‘which did not compute the probability of fixation of an advantageous allele’.

Figure S3 : The legend says that time is normalized, but it is not clear which normalization is used.

Figure S8 : « =0.35 » is missing in the legend (current text ‘b=1 and b for ... ».

Figure S9 : I don’t understand why Ne*s is 800 for b=1 but 400 for b=0.35. I thought the aim of this analysis was to have the same Ne for both values of b ?

Figure S10 : Same comment as for S9, although I don’t clearly see whether these two figures have the same or different objectives. I note that in Figure S9 the sweep is narrower with b=1, while in S10 it is narrower with b=0.35 : how can we explain this ?

https://doi.org/10.24072/pci.evolbiol.100552.rev22

Reviewed by William Shoemaker, 15 Mar 2023

The authors have done an admirable job improving the manuscript, both by addressing my comments and the comments of my fellow reviewers. I have no further comments and believe that it should be recommended by PCI Evol Biol.

https://doi.org/10.24072/pci.evolbiol.100552.rev23

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/2022.04.26.489499

Version of the preprint: 1

Author's Reply, 03 Feb 2023

Download tracked changes file

Dear Dr. Vitalis,

the following part contains original reviews and suggestions kindly provided by the reviewers and yourself, with an additional reply added from my colleagues and me under each paragraph, hoping to address your points sufficiently. We appreciate the time you take in looking at the manuscript again and apologize for the delay.

Kindly,

Kevin Korfmann

Dear Mr Korfmann,

your manuscript entitled "Weak seed banks influence the signature and detectability of selective sweeps" submitted to PCI Evol Biol has now been examined by four reviewers. All reviewers and I agree that your study is sound, and addresses an interesting and timely question in population genetics: how seed or egg dormancy influences evolutionary dynamics, in particular the fate of advantageous mutations, and the power to detect them. Furthermore, the release of a new simulation tool, which generates SNP data from evolutionary models including weak seed banking and selection, is much appreciated.

Based on the reviews and my own appreciation, I would be glad to consider a revised version of your manuscript for recommendation in PCI Evol Biol, provided you carefully address the reviewers’ points and comments, and provide a point-by-point response to their concerns.

Reply: We thank Dr Vitalis (recommender) and the four reviewers for spending their time to provide in depth reviews and for acknowledging the interesting aspects of our study. We address all points raised by the reviewers. In a nutshell, we have improved and corrected the simulation code, so we have rerun all analyses to demonstrate that all results are consistent with the theoretical expectations, and improve the interpretation of our results. We also run an additional set of analysis to quantify which phase of a sweep is most affected by seed bank. We also perform new analyses to include sweep detection with SweeD (in addition to OmegaPlus) which relies on the SFS signatures. We are sorry for the delay in performing these new analyses, and hope the new revised preprint can be recommended in PCI Evol Biol.

In addition to the reviewers’ comments, I invite you to consider the following points:

My understanding of the model description (section 2.1) is that, each generation, "parents" are picked from a randomly chosen age group (see lines 161-162), and one "gamete" is generated from each of these parents to create a "seed" (haploid model). This wording choice introduces the notion that parents are dormant (i.e., in a non-reproductive stage) rather than seeds. Although I agree that it amounts to the same from a modelling perspective, I wonder whether the text could be clearer by considering, e.g., that each generation parents produce seeds, and that seeds enter a dormant stage at rate (1 – b).

Reply: The description has been modified to clarify this point.

Although the manuscript is overall well-structured and nicely organized, I concur with the reviewers that some of your conclusions could be more clearly stated. For example, the claim made lines 325-326 that "Taken together, the results in Figure 3a, 3b and 3c demonstrate that [weak] selection is slowed down by dormancy", is opposed to the claim made lines 333-334 that "[...] yielding the counter-intuitive result that dormancy enhances the efficiency of [strong] selection compared to genetic drift": (i) it is not clear to me how the latter conclusion is reached from Figure 3; (ii) using different wordings ("slow down", "enhance the efficiency", "magnify", etc.) as synonyms does not help.

Reply: We have now clarified the description of the results by describing more precisely time to fixation and probability of fixation, to avoid unclear statements.

Figures

As the reviewers, I believe that Figures 2-5 (and their legends) could be improved. In addition to reviewers’ comments, I noticed that, in each panel, ticks are missing from the bottom and left axes.

Reply: All figures have been completely revised to include the suggestions from the reviewers.

In Figure 4d: "b 1.0 r 5e-5" should read "b 0.35 r 5e-5". Please provide the effective size for this set of parameter values, for the purpose of comparison with a model without a seed bank (in order to justify the claim lines 353-354 that "the results in Figure 4 cannot be produced by scaling only the effective population size in the absence of dormancy").

Reply: This information is included now in the legend.

Open access

The reviewers appreciated the release of simulation codes on a public repository. However, an anonymous reviewer found it difficult to run the example Jupyter scripts. Please, consider expanding the installation guide, as suggested by this reviewer.

Reply: We have improved the installation guide based on the new sets of codes.

Conflict of interest

The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article."

Reply: A conflict of interest statement has been added.

Minor points

Line 152: define explicitly the X_parent variable; the multinomial notation is cumbersome and should use vector notations instead. Idem in Equation (2).

Reply: The definition is now on line 137, and we use as suggested the vector notation + multinomial notation alongside a description of the constraints of suitable number spaces.

Cite "Maynard Smith and Haig 1974", instead of "Smith and Haig 1974" (see lines 343, 417, 458, and reference [38]).

Reply: Citation modified.

I thank you very much for submitting your manuscript to PCI Evol Biol, and I look forward to receiving your revised manuscript.

Best regards,

Renaud Vitalis

+++++++++++++++++++++++++++++++++++++++++++++

Review by Guillaume Achaz, 08 Jul 2022 16:41

The ms by Korfmann et al. describes a diploid Wright-Fisher model with seed banking together with a C++ implementation that performs Monte-Carlo simulations. The ms is sound, very clearly written and addresses an interesting question of evolutionary biology. I believe it has the potential to be recommanded by PCI Evol Biol. I nonethless have some suggestions that could help clarifying some of the points discussed by the authors.

Reply: We thank Dr Achaz for his kind comments.

I know realize that there are many points, but they are all easy to fix or be discussed.

:: Main comments ::

- I am unsure the interpretation of the Probability of fixation (Pfix hereafter) is sound. For a sweep in a finite population, there is a drift barrier of low frequency and the fate of the beneficial allele is settled in the first generations. I suspect that with seed dormancy, the probability to leave 0 descendants for the beneficial allele in the m first generations (meaning never leave any descendant) is simply higher. This would be the same for the probability of extinction in general after securing few descendants in the first generations. I am late in my review and only attempted the case of m=2, but this seems doable without much pain for m generations for any kind of distribution. The effect of sampling from several previous generations for each descendant cannot be simply synthesized as "longer time for drift to eliminate the beneficial allele". By some work, using birth and death processes (see specific point on l302 below), I suspect the probability of fixation can be computed with some extra-work (this is not what I expect from the authors in the present article, but it could be worth doing it on a future work).

Reply: The reviewer is correct that it should be doable to perform such theoretical analysis, which we hope to do in the near future. For the time being we first revised and corrected our code as we found a problem in the calculations for the probability of fixation. Our results now agree with previous analytical expectations: the fixation probability is approximately 2hs for all parameter values. Moreover, we analyze in response to reviewer 4 (Dr. Boitard) the dynamics of changes in allele frequency to investigate more precisely the impact of dormancy on the different phases of a sweep dynamics: low allele frequency and intermediate to high frequency. The new results are now shown in Supps Figure S2-4 and discussed.

The fixation time (which should be corrected all along this article by "mean fixation time") is very well approximated for a finite haploid population of size N by

E(Tfix) = 2 ln (N . Pfix +\gamma_e)/s

in a haploid WF model, where \gamma_e is the Euler constant. As the intensity of dormancy (\beta) will enter on Pfix within the log_e, we can immediately see that the relationship will not be trivial. So it does not come as a surprise that there is not simple relationship between beta and E(Tfix). I have not worked the diploid version with h != 0.5, but it may well add a layer of complication adding further complication to the relationship.

Reply: Thank you for pointing this out. We had neglected the non-linear relationship between Tfix and Ne.s, which led to a partial interpretation of the results. An approximated result is found in Koopmann et al. 2017 (equation 19), where it indeed appears that the relationship is not trivial and likely non-linear. We have added the expectations of Tfix to Figure 3, based on the equation above and the diffusion approximation given by Kimura and Ohta (1969) for weaker s, as the above equation does not give reliable results in that range. The time to fixation is actually much higher with dormancy than would be expected for a similar Ne. This has resulted in a change of our take home message: Dormancy makes selection less efficient. This is highlighted by the new Supp Figs (S2-4) where we show that as b decreases and s increases, more time is spent at very low and very high frequencies of the allele under selection with regards to the time spent at intermediate frequencies.

- There is a need to explain how the software OmegaPlus works and what is the value "Omega" for casual (understand lazy) readers that would like to read your article without having the duty of reading the OmegaPlus articles. Just the basic ideas would be a great added value.

Reply: More details and comments about OmegaPlus have been added in the section 2.3 Statistics and sweep detection section (second paragraph)

- Figure 2a, you are plotting average TMRCAs when the recombination rate is >0, right ?

Reply: We simulate 200 DNA sequences of 0.1Mb with recombination. From all the trees of a given simulation run, we use the time of the most ancestral node from the tree containing it as TMRCA. Or said differently, from all the ancestral nodes of a given simulation, we choose the oldest time as TMRCA per sequence. Therefore, when there is no recombination, we obtain one TMRCA per sequence, otherwise under recombination, we retain only the oldest TMRCA per sequence.

:: Specific comments ::

Abstract: l21-22: the sentence about magnifying and lowering selection is especially confusing. If it decreases the Pfix, how could it magnify selection? What "magnify" means here is unclear.

Reply: We acknowledge that our choice of words was confusing and have now streamlined our results description. This particular sentence has been removed.

l31- authors list all sorts of organisms but not fungi... but they produce spores. This is likely due to my ignorance, but I would have guessed that fungi tend to sporulate when the nutriments are less abundant (this is at least true for S cerevisiae and S pombe) and are likely concerned by dormancy.

Reply: Indeed, fungi are well-documented organisms portraying dormancy traits, and we now mention them.

l40-41: buffering mutation rate? What does this mean?

Buffering population size changes, not mutation rate. Added “affecting mutation rates”, to make it more clear.

l80: any intuition on the formula?

Reply: We are not sure which formula the reviewer meant, but guess that it refers to the 1/beta². The intuition in a coalescent framework (see Kaj et al. 2001) is that for two lineages to find a common ancestor, i.e. to coalesce, they need to choose the same parent in the above-ground population, and each have a probability beta to do so because only active lineages can coalesce. Thus the probability that two lineages are simultaneously present in the active population is beta², thus scaling the coalescent rate. This is now included in the next sentence.

l90 : effective population size is always fuzzy. I believe you refer to "inbreeding" population size. It would not cost much too state 'inbreeding effective population size".

Reply: Done, it is now stated in the introduction.

l96: there is an unnecessary ")" in the \theta definition

Reply: Fixed.

l102: "Only one" can be replaced by "The non-dormant" or "The active" or any other more precise wording.

Reply: Changed to “only the non-dormant”

l116 : we read again the "efficacy" of selection. This is confusing. Mean fixation time is longer but at the same time the "efficacy" is increased. What are we measuring then, since mean Tfix is longer and Pfix is smaller? How could it be more efficient?

Reply: Again we apologize for the poor choice of words. When analyzing the time spent in the two “stochastic” phases as well as deterministic phases (see Appendix Figure S2-4), and performing computation of expected fixation times (Figure 3b), we saw that selection is actually less “efficient” compared to a 1/b² rescaled population. This is because the allele spends comparatively longer in the stochastic phases. We adapted the message of the paper, and are more precise now when defining time to fixation or probability of fixation and we avoid using efficiency or efficacy.

l152 + l162 : please define properly X and G. I am not sure parent "1" is the best subscript as it is true for all individuals.

Reply: The section and notations have been revised.

l159: numerator is also clear when written as 1-(1-b)^m (and faster to computer)

Reply: Thank you for the suggestion. It is indeed improving the computational efficiency.

l160: a truncated geometric would be an adequate name for the distribution.

Reply: Changed.

par l163-l171: this implies there is a single chromosome of size 1, but later (par l208-l220) there is a variable map length. So I think Poisson (L\mu) and Poisson (Lr) can be introduced here and L can be defined as the map length. Please mention that a single chromosome is considered.

Reply: We now removed to the notion of map lengths and refer to chromosome or sequence only. We additionally added the simulated sequence length, ranging from 100kb up to 10MB.

l182-l184: As the probability of being chosen is inflated for all generations, it seems to me that the survival probability is overall inflated. So I am not sure whether the sentence is truly correct.

Reply: The description of the model was improved and this sentence modified.

l196-198: not easy to grasp on for someone who doesn't know the arcane of tskit. Can the author provide a sentence to at least get an idea of what is going on?

Reply: Changed. The paragraph has been partly rewritten, to emphasize that one can run the simulations without recording the genealogies, which is the purpose of tskit. The first paragraph of the methods explains why we use tskit (and the general idea). And the last paragraph of the code description paragraph justifies its usage over SLiM.

l200: I believe 5pN is better choice for the burn-in (where p is the ploidy). This stems from Malécot recursion on Heterozygosity that will be almost at equilibrium (to a 1% relative difference) at approximately 5 times the chromosome pool size (N for haploids).

Reply: Many thanks for this insight, and we have now studied this issue carefully and derive an ad hoc burn in time taking into account the dormancy rate. In the first implementation, there was an issue using to small burn-in generations for the high recombination rate. Now we have added an appendix figure S1 showing the absolute TMRCA dependent on the recombination rate to demonstrate that there is sufficient burn in time. Based on experiments of trying different recombination rates, we came up with a an empirically sufficient number of burn-in generations (see Appendix table) to reach theoretical expectations (especially for the neutral diversity background in the sweep figures, i.e. left and right sides of the chromosome). For a recombination rate around 10^-7 - 10^-8, we choose burn-in for 200,000 generations for a population of 500 diploids under b=0.35 dormancy. Choosing such a large number increases substantially the computational time for the large number of simulations we ran, but it was necessary. We now included the rationale for using different burn in times in Figure S1.

par l222-232: For the haploid case, an easy way to condition on fixation is simply to set a random individual at the selected type and attribute the others uniformly as in a regular WF model. I wonder if there would be a similar trick for this case (no fix is required, this is just a hint on how to speed up the simulator for future work).

Reply: Thanks for the idea, sounds interesting indeed.

l236: would it be appropriate to (also) cite Tajima 83 for "Tajima's \pi" (that was noted k in 83)?

Reply: Tajimas 83 citation is added now.

l258 - Is running time also very efficient for s=0.01 or lower s values?

Reply: By now we have rewritten the simulator so that lower selection coefficients can be easily tested. In fact we test in the new version values as low as Ne.s=0.1, corresponding to s=0.0001 and assess the probability and time to fixation.

l292- I think the recombination events are 1 per coalescent tree (in a tree sequence) or NLr/b per coalescent unit.

Reply: Indeed there are 1/b more recombination events per chromosome. The sentence has been modified.

Fig2a - make sure it is 2 in coalescent unit for b=0.

Reply: Yes. We have double-checked and made sure that values are within theoretical expectations. An appendix figure with absolute values has been added additionally.

l 302 - I am not sure the difference comes from the N=500. 1-exp(-2s) is only valid for small s. Another potential derivation is to use the probability of survival of a critical birth-death that starts with 1 individual, who has a Poisson number of descendants (of mean \lambda), leading to 1-Pfix = Exp(- \lambda Pfix ). The dominance may add some complications, though. I believe you used h=1 here ?

Reply: We used h=0.5 as our focal dominance coefficient. A sentence has been added to clarify this. Additionally, the probability of fixation is no longer affected by the germination rate.

Figure 3: please specify the parameters. N, h, range of s.

Reply: Added.

l 327- I don't understand the meaning of this sentence.

Reply: the sentence has been rewritten.

par l329-l334: see main points above.

Reply: the sentences have been rewritten to account for the updated simulator version.

Figure 4.d: what is N for b=0.35?

Reply: corrected.

l404: "a period of 1 / 2 Ne or 1 / 2 Ne s"? What do I don't understand?

Reply: Modified to time to fixation instead of period.

l431-432: see main points above

Reply: Because our results and main conclusions have changed, this is no longer an issue.

+++++++++++++++++++++++++++++

Reviewer 2

This article presents the first analysis of the joint effect of selective sweeps and so-called

weak seed bank dynamics, in which organisms are able to enter a reversible state of metabolic

dormancy lasting a small number of generations. Seed banks are known to slow down genetic

drift, elongate branches in ancestral trees, and increase the effective rate of recombination. The

findings of the article are consistent with these intuitions: fixation probabilities decline, and

the signatures of selective sweeps become narrower but are persist for longer. On the other

hand, predictions derived from deterministic or otherwise simplified models have shown that

the effect of seed banks on the efficiency of selection is subtle [1, 2]. This article presents a

scalable, tskit-based simulation tool for whole genome data under the joint effect of weak

seed banks and selective sweeps, clarifying the applicability of the various predictions to more

realistic models, and shedding light on the nonlinearities involved in the joint effect of dormancy

and selection. The simulation model is a natural extension of the classical Wright–Fisher model.

Given the universality of coalescent models, it is likely that findings are robust to other choices

of individual-based models (at least when suitably rescaled).

The tskit framework is natural for neutral evolution, but more cumbersome for non-neutral

processes since genotypes need to be tracked in addition to ancestral relations. The paper avoids

this issue by tracking genotypes outside the main simulation data structure. Simulation param-

eters are well-chosen, and span a biologically plausible range while remaining computationally

feasible.

The authors have provided their simulator as a GitHub repository sleepy, with brief instal-

lation instructions which I was able to follow. They have also provided an analysis repository,

sleepy-analysis, which I was able to clone. However, despite the installation of the sleepy

package running without errors, the Jupyter scripts in sleepy-analysis were not able to lo-

cate the sleepy method. I am by no means an experienced Jupyter user so may be missing

something obvious, but expanding the installation instructions to include details on how to run

the provided minimal code example would be a good way to improve the accessibility of the

code.

Reply:

A more detailed example notebook has been added. But notably the compilation does not require root privileges, because the executable is not transferred or linked to a directory, where root privilege is required. What we suggest instead is to append the bin directory to your path in the bashrc file:

export PATH=$PATH:/path/to//sleepy/bin

Now, after that is done. It is important to close the window and reopen the terminal window and then open jupyter-lab/notebook

Additionally in the jupyter notebook you can check if sleepy is found by typing bash commands as demonstrated in the example notebook

The simulation outputs consist of a collection of standard population genetic statistics (link-

age disequilibrium, time to most recent common ancestor, etc.), as well as the time until fixation

and an estimator of the fixation probability. The last two are estimated by repeatedly introduc-

ing a selective mutation at a given site until it fixates (rather than goes extinct), and storing

the number of generations until fixation and the number of trials. The latter amounts to esti-

mating the success probability p of a geometric distribution from iid replicates, which is known

to have a p(1 − p)/n bias, but the authors have performed n = 1000 replicates to render the

bias negligible. The simulator is validated using a neutral scenario, for which many expected

values are known analytically.

Overall, the simulations are an interesting catalogue of the joint, nonlinear impact of dor-

mancy and selection on genetic diversity, and of the prospect of detecting sweeps in the presence of dormancy. I agree with the authors that their results are of applied importance in the analysis of many biologically important species, particularly among plants and fungi for which the weak seed bank is a natural model. The main limitations of the results are the assumptions of a

constant population size, and that mutations occur in dormant individuals at the same rate as

in active ones. Both assumptions are discussed and justified by the authors, but the mapping

of diversity patterns arising from dormancy and selection without these assumptions remains

an open task.

Reply: We agree that these models are interesting to explore, but argue that they fall outside the scope of the project, because addressing them would require a substantial number of additional simulations, without adding much novelty to the manuscript.

I have two aesthetic comments on the figures:

1. The neutral line in Figure 3c is effectively invisible. I’m guessing it overlaps almost exactly

with the s = 0.01 line, but it would be useful to say so, or to reformat the plots to make

it obvious.

Reply: We completely redid the plot. Since the average time of fixation of a neutral allele for N=500 is 2000 (2cN with c indicating the ploidy), the neutral allele can be simplify calculated by scaling with 1/b^2. The line corresponds to Ne.s=0.1, i.e. increasing the time to fixation from to 2000 to 32000, which fits the neutral line perfectly now. This is also why we removed it.

2. The colour scheme in Figure 4 also makes it hard to distinguish the different trajectories,

particularly the two blue-hued ones.

Reply: The colour scheme has been changed and all figures are now in high-resolution.

References

[1] L Heinrich, J M ̈uller, A Tellier and D ˇZivkovi ́c. Effects of population- and seed bank

size fluctuations on neutral evolution and efficacy of natural selection. Theor Popul Biol

123:45–69, 2018. [2] D ˇZivkovi ́c and A Tellier. All but sleeping? Consequences of soil seed banks on neutral

and selective diversity in plant species. In Mathematical Modelling in Plant Biology, R

J Morris (ed), Springer, Cham, 2018

+++++++++++++++++++++++++++++++++++

Reviewer 3

Summary

Overall, I think this is an interesting study that leverages existing theory alongside

computational tools, yielding promising results. Ultimately this work should be published, but I

have several minor comments and I believe that the impact of the paper would be

strengthened by addressing them. Most of these comments can be grouped into requests to 1)

elaborate on points the authors made and their connection to population genetic theory and 2)

make alterations to the figures to increase their clarity.

While I am familiar with mathematical models of molecular evolutionary dynamics, I am less

familiar with models of selective sweeps. I have attempted to provide useful feedback and

questions regarding sweeps and their statistics throughout this review.

Major comments

- The authors model the evolution of a recombining population with a seedbank as a

diploid hermaphroditic population. While it may be outside the scope of the manuscript,

it’s worth considering that recombination can be modeled in a more general manner

where there is some probability of recombination per-generation in a haploid

population. Could the authors describe the extent that their model and results can be

extended to haploid populations?

Reply: Previous theoretical results on haploid model (Koopmann et al. 2017, Heinrich et al. 2018) confirm the reviewer’s intuition that the results we obtain are valid for haploids. We expect the same germination rate scaling of the coalescent tree by b^2 and recombination rate scaling of b. Thus, while it would have been possible to design the study using a haploid model, the results are expected to be similar regardless of the chosen ploidy. Initially, when reading how forward simulations work, we started from the work on SLiM by Ben Haller, which is a diploid simulator that inspired the current work. Yet, retrospectively, implementing a haploid model would have likely been easier.

- Line 39: I don’t see why the word “species” is needed here.

Reply: species has been removed.

- Line 200: It would be useful for the unfamiliar reader if the authors made note that the

number of generations they chose for the burn-in corresponds to the coalescent

timescale. My understanding is that this is the timescale of the Kingman coalescent, if so

or if not, please note it in the text.

Reply: This echoes a reply to reviewer 1 (Dr Achaz) above. We added a supplementary figure with the TMRCA times as well as a supplementary table with the burn-in-periods in generations, justifying the large chosen burn-in-period necessary when dealing with recombination and the presence of seed bank.

- Line 204-206: Is there any existing theory that the authors could reference for the order-

of-magnitude timescale required for signatures of a selective sweep to become

detectable?

Reply: The theoretical results are available for classic models (Kim and Stephan (2002) for example), but not in the presence of seed banks. The magnitude is simply 1/b^2, meaning the period that a sweep is potentially detectable is a reflection of the scaling of the coalescent. We added a supplementary figure (Figure S3), to emphasize this.

- Line 219-220: While I understand that an appropriate range of selection coefficients can

be hard to justify, selection coefficients ranging from 0.1-10 seems too high. Even with

the lowest value of the empirical population size (ignoring effective size) of 1,000, the

population-scaled strength of selection is much larger than one for all parameter

combinations (|N*s| >> 1).

Reply: We agree with the reviewer and have now expanded the range from Ne.s=0.1 to Ne.s=100.

I would think that it would be useful to examine signatures

of selective sweeps with and without fixation as the population-scaled strength of

selection (calculated using the effective population size) increases starting below 1 (drift

dominates).

Reply: We wanted to investigate the signatures of sweeps conditioned on fixation as these are the most likely to be detected in real data. Extending to softer sweeps or incomplete sweeps is indeed interesting but increase significantly the parameter space to search and simulate and the result section will be inflated, possibly without a real gain in terms of novelty. We keep this idea in mind for future work on selection and seed banks.

There may be some nuance I’m missing, but as a general pop gen reader

this is the range of parameter values I’d be looking for.-

Line 241–242: It would be helpful for the reader if the authors described the Omega

statistic, how it works, and any additional parameter settings the authors may have

chosen.

Reply: A short description and comments on the choice of parameters have been added.

- Line 271-274: This was a very helpful point, thank you for making it.

Reply: Thank you.

- The strength of selection is not necessarily the key parameter, but rather the

population-scaled strength of selection. Could the authors discuss that quantity in this

sentence instead?

Reply: Thank you for pointing this out. We have included the expectations for the times to fixation without dormancy for different Ne.s, so that the interpretation of our results has changed.

- Line 345-348: This was explained well.

Reply: Thanks

- Fig. 2: a) The recombination rates should be listed in a legend inside the figure. And

what do the boxplots represent? Are they quantile plots? Is so, given that these are

simulation results, it may be more useful for the reader if the spread around the mean

was plotted as 95% confidence intervals.

Reply: We changed the plotting, so the recombination rate is now at the x-Axis and added an explanation about the boxplots in the legend. We additionally plotted the mean and since we changed the axis. Boxplots are likely a sufficient way of representing the data. We agree that if we use the old axis, 95% line plots would have been better. And we further added a supplementary figure with the absolute values. We hope the plots are now clearer.

b) “r2” should be “r2 ” and there should be an

equal sign for “b” in the figure legend. And what are distance bins? Should these be

thought of as base pairs?

Reply: We have changed the legend for figure 2.

- Fig3: a) I do not understand the x axis. Are these selection coefficients ranging from 1 to

5? If so, this is the plot where I’d like to see the population scaled strength of selection

(|Ne *s|) over a logarithmic range of values with |Ne *s| = 1 as a midpoint. The shaded

areas around the lines are not described in the legend and it would be helpful if the

parameter b was included in the figure legend (e.g, b=0.25, b=0.35, etc).

Reply: We are sorry, the axis range was not a good representation of suitable selection coefficients. We completely overhauled the figure now. And are always plotting effective population size scaled values.

b) It would be helpful if the y axis was on a log base 10 scale. Also, “Germination rate, 1/b” should be

written on the x-axis. Again, it would be helpful if “s” was included on the figure legend

(e.g., s=0.01, s=0.1,). C) This is a useful plot, but I think it would drive the take-home

point home is a horizontal line at y=1 was included as a null as dormancy decreases.

Reply: We completely reformatted the figure, to address the points mentioned. And additionally added theoretical expectations based on rescaled N alone in blue.

Also, “Germination rate, 1/b” should be written on the x-axis.

Reply: We completely overhauled the figures, added a population size scaled legend, and added log-scaled subplots and adapted the legend.

- Fig. 4: These are nice plots, but they need some tweaking to help the reader. What does

the shaded area around a line represent? Are these standard errors or 95% CIs. All

parameters in the legends should have equal signs. And “sc” is used in the figure,

whereas “s” is used in the legend. Are these the selection coefficients?

Reply: Figures have been tweaked accordingly and notations are consistent between figures.

- Fig. 5: The y axis could be more informative. Could it be labelled “Probability of

detecting a sweep”? it would be helpful if the parameter under “model” was included in

the figure legend with an equal sign.

Reply: The axis has been changed and the parameters are now part of the legend's title.

Minor comments

- The authors occasionally use the first letter of an author’s first name in in-text citations.

Is there a reason for this?

If not, they should be removed from the revision. I noticed this on lines 42, 84, and 170, for example. I did not count all the occurrences in the

manuscript.

Reply: We changed the citation style to resolve this issue.

- Throughout the manuscript the authors use “e” notation to describe the orders-of-

magnitude of numbers. Could they switch to base-10 notation (e.g., 10-3 instead of 1e-

3?).

Reply: This “e” notation has been removed.

- I may have missed it, but I didn’t see the simulated data repository mentioned in the

manuscript. If it’s not there,

Reply: Indeed the simulated data repository was missing. It is now available on gitlab.

- Lines 24-26: Could bacteria be included here as well? I understand that bacteria are

typically thought of as asexual, but recent research efforts point towards their rate of

recombination being higher than previously thought (Garud, Good et al., 2019; Good,

2020; Sakoparnig et al., 2021).

Reply: Because of the underlying molecular mechanism of how recombination occurs in bacteria versus plants/animals, we are very cautious and do not think that our results are easily transferable. However a recent Bacterial Slimulator code has been developed which could be extended to introduce strong dormancy/seed bank in the future (https://peercommunityjournal.org/articles/10.24072/pcjournal.72/)

- Lines 49: This sentence is somewhat unclear to me. Unless I’m misunderstanding

something, existence of dormancy increases the true T_MRCA in a population, so it will

increase the T_MRCA estimated from a sample from a population. “of a sample or

population“ implies exchangeability between a sample and the true population, which I

don’t think is the authors’ intent.

Reply: Ideally, a sample large enough should have the same TMRCA as the population, so they should be ideally exchangeable. As it may be confusing, we remove the sample part and only leave the population.

- Lines 61-63: I think it’s worth mentioning that the weak seed bank model is also

applicable to cases where the existence of a seed bank is experimentally imposed,

where it is unfeasible for the observer to observe a system over the timescale required

for the strong seed bank effect. This could include bacteria (e.g., Shoemaker et al.,

2022).

Reply: Cool suggestion, thank you. It has been added.

- Lines 74-77: This was well said.

Reply: Thanks.

- Line 100: How do mutation rates work here? Seeds don’t reproduce, so they’re

acquiring mutations at some rate per-unit time (days, years, etc.), whereas the above-

ground plants are acquiring mutations at a rate of per-generation. Are there any rough

estimates of how these rates compare when they have the same units?

Reply: As seeds are essentially just “parents” from previous generations and time is discrete, we have exactly the same rate in the current generation and “seed generation”, which is in units of mutation events per bp per generation. And because of that, we can simply add the mutations to the branches afterwards, if required. There is no need to distinguish a parent that emerged from a seed or from the previous generations when applying the mutation rate. We hope that Figure 1 as well as justifications L.76 clarify this assumption. We choose this definition for convenience of modeling and due to the lack of empirical evidence regarding the empirical mutation rate in seeds per real time unit (days, year…). It could be expected that mutations accumulate in very long dormant seeds compared to short dormant seeds, but again the empirical evidence is so far lacking (and many mutations occurring in seeds maybe deleterious, see argument in Vitalis et al. 2004).

- Line 227-228: Should it be either “frequency of one” or “size of 2N”.

Reply: Modified “to size of”.

- Line 300-302: I’m unsure what the shaded areas are in the referenced figure, but if the

authors resample the simulated data with replacement to obtain 95% CIs, would they

fall within the theoretical prediction? This may allow you to confirm that stochasticity

was a contributing factor.

Reply: This statement was out of date as the updated simulator does not have this issue (the probability of fixation is unaffected by dormancy).

- Line 329-332: “However, when the beneficial allele reaches fixation, the time to

fixation” I do not understand the framing of this sentence. It seems to be implying that there is some non-zero time to fixation once an allele becomes fixed (i.e., all individuals

have the mutation).

Reply: This sentence has been corrected.

- Line 333-334: “yielding the counter-intuitive result that dormancy enhances the

efficiency of selection compared to genetic drift”. I don’t understand what “efficiency”

means here. The plot in Fig. 3c examined the time to fixation with a seedbank relative to

the time to fixation without a seed bank. This ratio seems to increase with germination

rate for all selection coefficients, so I do not see how the notion of “efficiency” fits in

here.

Reply: The sentence was removed, because it was not longer holding true with the updated simulator.

- Line 423: replace “>=” with “≥”

Reply: Fixed

- Line 460: What are the CLR tests?

Reply: Composite likelihood ratio test of SweeD. This sentence has been reformulated and we remove the “CLR-Test” as it was not needed here.

- Line 473: Should “reduce” be plural?

Reply: Fixed.

- Listing 1: Should “analysis of” be “analysis”?

Reply: corrected

- Fig. 1: Do the authors have a version of the figure with higher resolution? In its present

form, the resolution contrasts with the resolution of the PDF, drawing the eye towards

the difference.

Reply: All Figures are high-resolution pdfs now.

- Appendix: In both figures in the appendix, it would be helpful if equal signs were

included for the parameters. It is also unclear what the parameter “d” represents.

Reply: all figures have been adapted.

References

Garud, N. R., Good, B. H., Hallatschek, O., & Pollard, K. S. (2019). Evolutionary dynamics of

bacteria in the gut microbiome within and across hosts. PLOS Biology, 17(1), e3000102.

https://doi.org/10.1371/journal.pbio.3000102

Good, B. H. (2020). Linkage disequilibrium between rare mutations. BioRxiv,

2020.12.10.420042. https://doi.org/10.1101/2020.12.10.420042

Sakoparnig, T., Field, C., & van Nimwegen, E. (2021). Whole genome phylogenies reflect the

distributions of recombination rates for many bacterial species. ELife, 10, e65366.

https://doi.org/10.7554/eLife.65366Shoemaker, W. R., Polezhaeva, E., Givens, K. B., & Lennon, J. T. (2022). Seed banks alter the

molecular evolutionary dynamics of Bacillus subtilis. Genetics, 221(2), iyac071.

https://doi.org/10.1093/genetics/iyac071

++++++++++++++++++++++++++++++++++++++++

Review by Simon Boitard, 24 Jun 2022 10:03

In this study, Kevin Korfmann and colleagues describe a new software for the simulation of genome sequences with weak seed banking, an evolution model that is relevant to many plant, fungi or invertebrate species. This simulation tool exploits the recently developed tskit library, which makes it very efficient. It allows to simulate evolution models including both seed banking and selective sweeps, which had never been done so far. Thanks to this new tool, the authors investigate several important properties of selective sweeps under a weak seed bank model : the fixation probability and fixation time of new beneficial mutations, the genetic diversity patterns around a selective sweep, and the detection power of selective sweeps. Comparing these properties with those of a standard Wright-Fisher model, they conclude that (i) the efficacy of selection is increased by seed banking for strong selection, but not for weak selection ; (ii) the region of reduced diversity around a sweep is narrower for seed bank models ; (iii) past sweep signatures may be detectable for a longer time period in seed bank models.

These new results provide interesting insights on an important evolutionary process, and the new simulation software developed by the authors will certainly be very useful to researchers wishing to test hypotheses or perform simulation based inference under seed bank models. The context of the study and the underlying mathematical models are nicely introduced, and the manuscript is well structured and generally easy to read. However, some of the conclusions could be more clearly phrased or justified and a limited number of additional analyzes could help strengthening them. These suggestions and other minor points are detailed below.

Reply: Thanks to Dr Boitard for his kind words and summary of our work.

Dynamics of alleles under positive selection :

- The general conclusion that seed banks « magnify » the efficacy of selection in the case of strong selection, made for instance lines 411-413, is not obvious to me and should be more carefully justified, or rephrased. For strong selection, the time to fixation is indeed reduced by seed banking (relatively to the time of fixation of neutral alleles), but on the other hand the fixation probability is increased. It would be important to stress that we have here two opposite effects. Similarly, the discussion in section 4.1 is not precise enough concerning the existence of these two opposite effects, because the expression ‘selection is delayed’ is used to describe the effect of seed banks on both fixation time and fixation probability.

Reply: In echo to comment from Dr Achaz, this part has been rewritten/removed, because it was no longer correct. Please see the next comment, too.

- Figure 3 : to illustrate the idea that selection is more efficient with seed banks (l 334-334), panel c could maybe be normalized by the fixation time under neutrality ? To better compare the effect of seed banks on fixation probability and fixation time, it would maybe help to plot panel a as a function of 1/b, with one curve / color per selection coefficient, and possibly to also normalize it by the probability under neutrality ? Finally, panels b/c would be easier to read if the legend for ‘neutral’ evolution would be the top row, so that this comes just before s=0.01 (similarly, the theroretical prediction in the legend of panel a could be the last row).

Reply: We have revised Figure 3 now, and added in theoretical expectation of times to fixation of populations which are rescaled by 1/b². Additionally, we added appendix figures S2-4, showing which phase (stochastic, or deterministic phases) contribute most to the fixation/loss of alleles under seed bank. We also include a percentage figure to describe which phase contributes the most to allele fixation/loss with respect to the selection coefficients. All these new results show that selection is actually less efficient in the seed bank, because the seed spend more time in the stochastic phases, as we initially thought. Now, we can clearly see the scaling of 1/b² for the neutral case and 1/b for strong selection (as expected from Koopmann et al. 2017). Additionally, computational improvements/corrections made it clear that the probability of fixation is unaffected in the seedbank. We hope by adding all those new figures, as well as theoretical expectations, our message is clearer now.

- Besides these clarifications in the text and in Figure 3, one way to reach a global conclusion on the influence of seed banks on selection dynamics could be to compute the expected number of fixation events per time unit, which would account for both the frequency and the speed of sweep events.

Reply: Thanks for the suggestion. To reach a global conclusion and provide better evidence, we put substantial effort into deciphering the individual contribution of each phase (stochastic, deterministic) to the lengthening of the seed bank. We also include a percentage contribution figure (Figure S4), showing which phase contributes the most of the time spent to reach fixation, indicating that seed banks actually tend to lengthen the stochastic phase. Additionally, we added theoretical expectations in blue (Figure S3) to investigate our initial impression of “more efficient” selection compared to a rescaled population size, which turned out not to be true. So we tone down our conclusion here and revise our initial idea from Koopmann et al. (2017).

Genetic diversity patterns around a sweep :

- l 341-345 : it is clear from Figure 4 that the sweep window is reduced by seed banks, but the interpretation of this result is not so obvious to me. While effective recombination rate is indeed higher with seed banks, fixation time is also shorter (cf section 3.3) so there are fewer recombination opportunities during the sweep (in classical models of selective sweeps, the probability to escape the sweep is related to the product r*tau, where r is the recombination rate and tau the duration of the sweep). I would be interested to have the authors’ feedback on this point.

Reply: We now include a description of this effect along with Figure 4. We note that the new results show less of a difference in the width of a sweep signature between seed bank and absence of seed bank compared to the previous version (see also Fig S10).

- l 345-348 : I don’t understand the rationale here. Should we relate explanation 1) to absolute diversity and explanation 2) to relative diversity ?

Reply: This part has been rewritten. Relative diversity refers to the scaling by 1/b², which can easily be accounted for by scaling the population size by that factor.

- l 354-356 : the observation that no model without seed bank can mimic the genomic patterns of a specific model with seed bank is very interesting ; nice idea !

Reply: Thanks.

- l 368 : « narrower » is exact but « sharper » is not obvious, when looking at the normalized curves of Figure 4b.

Reply: We have added appendix figure S10 to make this effect more apparent.

Detection of selective sweeps :

- Linkage disequilibrium is one of the possible approaches to detect hard selective sweeps. Another very common one is to look at the SFS, as mentionned by the authors in the discusion (software Sweed). Comparing this approach with OmegaPlus on the data already simulated would strenghten the conclusions of the study.

Reply: We have now added a SweeD analysis.

- The comparison between models at lines 380-398 seems a bit approximative. This could easily be solved by indicating the exact detection threshold associated to 10% false positives on each panel of Figure 5 (the x axis of panels b/d could be rescaled to focus on the left part on the graph).

Reply: The figure has been replaced. In the new Figure, we indicated the 5% threshold with vertical dotted lines.

- l 471-474 : there are a number of claims here that would really require more justification, for instance with precise references to previous studies. None of these claims is really intuitive to me.

Reply: This section has been rewritten and streamline, and these sentences were removed.

Model definition :

- l 153 & 164 : in order to maintain a population of N diploids, I suppose that each sampled individual actually contributes two gametes, not one.

Reply: We sample two times N individuals, each contributing one gamete, or part of a gamete (in the case of recombination.) We added lines 121-122 to clarify this point.

- Equation (1) : if I refer to the last line, Y_i is a probability, not a stochastic variable. To be consistent with this notation, the first line should be something like Y_i=P(Generation=k_i)=…

Reply: corrected (line L130).

- Equation (2) : the first parameter of the multinomial is the number of draws; I think this is 1 here, not m.

Reply: Exactly, it should have been 1 indeed. In the new rewritten version, it is now N because we draw a N dormant generation, one for each parent.

Minor points :

- l 14 : « means » → « implies »

Reply: Changed.

- l 21-22 : this is an important conclusion of the study so I would avoid the use of ‘respectively’ within parentheses and write a specific sentence or sub-sentence for each type of selection.

Reply: it has been rewritten.

- l23-24 : In section 3.4 the authors show that selective sweeps are detected for longer time with seed banks, but explain it by the fact that new mutations need longer time to accumulate from the end of the sweep. The link with recombination rate made in this sentence is thus not justified.

Reply: The sentence has been corrected, the longer detectability is due to the rescaling of the ARG, not the increased recombination rate.

- l 62 : the formulation « a few ... compared to ... » is not correct.

Reply: Modified.

- l 65 : « as we ... » → « in order to » ?

Reply: corrected.

- l 220 : I suppose that 1.1 should be 1 ?

Reply: We specifically test dominance coefficients of 0.1, 0.5 and 1.1, representing recessive, co-dominant and overdominant beneficial mutations.

- l 248 : what is the meaning of these parameters ? Actually, maybe a few lines about how OmegaPlus works would be helpful.

Reply: this has been addressed now.

- l 250-251 : this last sentence is a bit confusing, because no previous reference was done to the notion of « recovery scenario ».

Reply: The sentence has been rephrased.

- l 308-309 : The first of these two sentences could be removed, to my opinion it makes things more complex than they actually are ; the important point here is that the ratio of the probabilities with and without seed banks decreases for stronger selection.

Reply: This sentence has been removed since the probability of fixation is not affected anymore.

- l 329-332 : Maybe say clearly here than fixation is faster than expected based on 1/b².

Reply: This section has been rewritten in light of new results, as the seed bank is not faster compared to a population rescaling but actually slower (see blue expectation lines in Figure 3).

- l 364 : « as a result of » → « based on »

Reply: corrected.

- l 366-367 : Figure 4 does not explore point 2 (influence of the time elapsed since the sweep).

Reply: We added appendix Figure S5 to explore the time since the sweep by plotting the sweep signatures from 1,000 – 16,000 generations after the fixation has occurred. This shows that sweeps can indeed be detected for much longer, when there is a seed bank.

https://doi.org/10.24072/pci.evolbiol.100552.ar1

Decision by Renaud Vitalis, posted 20 Jul 2022

Dear Dr Korfmann,

your manuscript entitled "Weak seed banks influence the signature and detectability of selective sweeps" submitted to PCI Evol Biol has now been examined by four reviewers. All reviewers and I agree that your study is sound, and addresses an interesting and timely question in population genetics: how seed or egg dormancy influences evolutionary dynamics, in particular the fate of advantageous mutations, and the power to detect them. Furthermore, the release of a new simulation tool, which generates SNP data from evolutionary models including weak seed banking and selection, is much appreciated.

Based on the reviews and my own appreciation, I would be glad to consider a revised version of your manuscript for recommendation in PCI Evol Biol, provided you carefully address the reviewers’ points and comments, and provide a point-by-point response to their concerns.

In addition to the reviewers’ comments, I invite you to consider the following points:

My understanding of the model description (section 2.1) is that, each generation, "parents" are picked from a randomly chosen age group (see lines 161-162), and one "gamete" is generated from each of these parents to create a "seed" (haploid model). This wording choice introduces the notion that parents are dormant (i.e., in a non-reproductive stage) rather than seeds. Although I agree that it amounts to the same from a modelling perspective, I wonder whether the text could be clearer by considering, e.g., that each generation parents produce seeds, and that seeds enter a dormant stage at rate (1 – b).

Although the manuscript is overall well-structured and nicely organized, I concur with the reviewers that some of your conclusions could be more clearly stated. For example, the claim made lines 325-326 that "Taken together, the results in Figure 3a, 3b and 3c demonstrate that [weak] selection is slowed down by dormancy", is opposed to the claim made lines 333-334 that "[...] yielding the counter-intuitive result that dormancy enhances the efficiency of [strong] selection compared to genetic drift": (i) it is not clear to me how the latter conclusion is reached from Figure 3; (ii) using different wordings ("slow down", "enhance the efficiency", "magnify", etc.) as synonyms does not help.

Figures

As the reviewers, I believe that Figures 2-5 (and their legends) could be improved. In addition to reviewers’ comments, I noticed that, in each panel, ticks are missing from the bottom and left axes.

In Figure 4d: "b 1.0 r 5e-5" should read "b 0.35 r 5e-5". Please provide the effective size for this set of parameter values, for the purpose of comparison with a model without a seed bank (in order to justify the claim lines 353-354 that "the results in Figure 4 cannot be produced by scaling only the effective population size in the absence of dormancy").

Open access

The reviewers appreciated the release of simulation codes on a public repository. However, an anonymous reviewer found it difficult to run the example Jupyter scripts. Please, consider expanding the installation guide, as suggested by this reviewer.

Conflict of interest

The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article."

Minor points

Line 152: define explicitly the X_parent variable; the multinomial notation is cumbersome and should use vector notations instead. Idem in Equation (2).

Cite "Maynard Smith and Haig 1974", instead of "Smith and Haig 1974" (see lines 343, 417, 458, and reference [38]).

I thank you very much for submitting your manuscript to PCI Evol Biol, and I look forward to receiving your revised manuscript.

Best regards,
Renaud Vitalis

https://doi.org/10.24072/pci.evolbiol.100552.d1

Reviewed by Guillaume Achaz, 08 Jul 2022

The ms by Korfmann et al. describes a diploid Wright-Fisher model with seed banking together with a C++ implementation that performs Monte-Carlo simulations. The ms is sound, very clearly written and addresses an interesting question of evolutionary biology. I believe it has the potential to be recommanded by PCI Evol Biol. I nonethless have some suggestions that could help clarifying some of the points discussed by the authors.

I know realize that there are many points, but they are all easy to fix or be discussed.

:: Main comments ::

- I am unsure the interpretation of the Probability of fixation (Pfix hereafter) is sound. For a sweep in a finite population, there is a drift barrier of low frequency and the fate of the beneficial allele is setelled in the first generations. I suspect that with seed dormancy, the probability to leave 0 descendants for the beneficial allele in the m first generations (meaning never leave any descendant) is simply higher. This would be the same for the probability of extinction in general after securing few descendants in the first generations. I am late in my review and only attempted the case of m=2, but this seems doable without much pain for m generations for any kind of distribution. The effect of sampling from several previous generations for each descendant cannot be simply synthesized as "longer time for drift to eliminate the beneficial allele". By some work, using birth and death processes (see specific point on l302 below), I suspect the probability of fixation can be computed with some extra-work (this is not what I expect from the authors in the present article, but it could be worth doing it on a future work).

- The fixation time (which should be corrected all along this article by "mean fixation time") is very well approximated for a finite haploid population of size N by

E(Tfix) = 2 ln (N . Pfix +\gamma_e)/s

in a haploid WF model, where \gamma_e is the Euler constant. As the intensity of dormancy (\beta) will enter on Pfix within the log_e, we can immediately see that the relationship will not be trivial. So it does not come as a surprise that there is not simple relationship between beta and E(Tfix). I have not worked the diploid version with h != 0.5, but it may well add a layer of complication adding further complication to the relationship.

- There is a need to explain how the software OmegaPlus works and what is the value "Omega" for casual (understand lazy) readers that would like to read your article without having the duty of reading the OmegaPlus articles. Just the basic ideas would be a great added value.

- Figure 2a, you are plotting average TMRCAs when the recombination rate is >0, right ?

:: Specific comments ::

Abstract: l21-22: the sentence about magnifying and lowering selection is especially confusing. If it decreases the Pfix, how could it magnify selection? What "magnify" means here is unclear.

l31- authors list all sorts of organisms but not fungi... but they produce spores. This is likely due to my ignorance, but I would have guessed that fungi tend to sporulate when the nutriments are less abundant (this is at least true for S cerevisiae and S pombe) and are likely concerned by dormancy.

l40-41: buffering mutation rate? What does this mean?

l80: any intuition on the formula?

l90 : effective population size is always fuzzy. I believe you refer to "inbreeding" population size. It would not cost much too state 'inbreeding effective population size".

l96: there is an unnecessary ")" in the \theta definition

l102: "Only one" can be replaced by "The non-dormant" or "The active" or any other more precise wording.

l116 : we read again the "efficacy" of selection. This is confusing. Mean fixation time is longer but at the same time the "efficacy" is increased. What are we measuring then, since mean Tfix is longer and Pfix is smaller? How could it be more efficient?

l152 + l162 : please define properly X and G. I am not sure parent "1" is the best subscript as it is true for all individuals.

l159: numerator is also clear when written as 1-(1-b)^m (and faster to computer)

l160: a truncated geometric would be an adequate name for the distribution.

par l163-l171: this implies there is a single chromosome of size 1, but later (par l208-l220) there is a variable map length. So I think Poisson (L\mu) and Poisson (Lr) can be introduced here and L can be defined as the map length. Please mention that a single chromosome is considered.

l182-l184: As the probability of being chosen is inflated for all generations, it seems to me that the survival probability is overall inflated. So I am not sure whether the sentence is truly correct.

l196-198: not easy to grasp on for someone who doesn't know the arcane of tskit. Can the author provide a sentence to at least get an idea of what is going on?

l200: I believe 5pN is better choice for the burn-in (where p is the ploidy). This stems from Malécot recursion on Heterozygosity that will be almost at equilibrium (to a 1% relative difference) at approximately 5 times the chromosome pool size (N for haploids).

par l222-232: For the haploid case, an easy way to condition on fixation is simply to set a random individual at the selected type and attribute the others uniformly as in a regular WF model. I wonder if there would be a similar trick for this case (no fix is required, this is just a hint on how to speed up the simulator for future work).

l236: would it be appropriate to (also) cite Tajima 83 for "Tajima's \pi" (that was noted k in 83)?

l258 - Is running time also very efficient for s=0.01 or lower s values?

l292- I think the recombination events are 1 per coalescent tree (in a tree sequence) or NLr/b per coalescent unit.

Fig2a - make sure it is 2 in coalescent unit for b=0.

l 302 - I am not sure the difference comes from the N=500. 1-exp(-2s) is only valid for small s. Another potential derivation is to use the probability of survival of a critical birth-death that starts with 1 individual, who has a Poisson number of descendants (of mean \lambda), leading to 1-Pfix = Exp(- \lambda Pfix ). The dominance may add some complications, though. I believe you used h=1 here ?

Figure 3: please specify the parameters. N, h, range of s.

l 327- I don't understand the meaning of this sentence.

par l329-l334: see main points above.

Figure 4.d: what is N for b=0.35?

l404: "a period of 1 / 2 Ne or 1 / 2 Ne s"? What do I don't understand?

l431-432: see main points above

https://doi.org/10.24072/pci.evolbiol.100552.rev11

Reviewed by Jere Koskela, 06 Jun 2022

Download the review https://doi.org/10.24072/pci.evolbiol.100552.rev12

Reviewed by William Shoemaker, 16 Jun 2022

Download the review https://doi.org/10.24072/pci.evolbiol.100552.rev13

Reviewed by Simon Boitard, 24 Jun 2022

In this study, Kevin Korfmann and colleagues describe a new software for the simulation of genome sequences with weak seed banking, an evolution model that is relevant to many plant, fungi or invertebrate species. This simulation tool exploits the recently developed tskit library, which makes it very efficient. It allows to simulate evolution models including both seed banking and selective sweeps, which had never been done so far. Thanks to this new tool, the authors investigate several important properties of selective sweeps under a weak seed bank model : the fixation probability and fixation time of new beneficial mutations, the genetic diversity patterns around a selective sweep, and the detection power of selective sweeps. Comparing these properties with those of a standard Wright-Fisher model, they conclude that (i) the efficacy of selection is increased by seed banking for strong selection, but not for weak selection ; (ii) the region of reduced diversity around a sweep is narrower for seed bank models ; (iii) past sweep signatures may be detectable for a longer time period in seed bank models.

These new results provide interesting insights on an important evolutionary process, and the new simulation software developed by the authors will certainly be very useful to researchers wishing to test hypotheses or perform simulation based inference under seed bank models. The context of the study and the underlying mathematical models are nicely introduced, and the manuscript is well structured and generally easy to read. However, some of the conclusions could be more clearly phrased or justified and a limited number of additional analyzes could help strengthening them. These suggestions and other minor points are detailed below.

Dynamics of alleles under positive selection :

- The general conclusion that seed banks « magnify » the efficacy of selection in the case of strong selection, made for instance lines 411-413, is not obvious to me and should be more carefully justified, or rephrased. For strong selection, the time to fixation is indeed reduced by seed banking (relatively to the time of fixation of neutral alleles), but on the other hand the fixation probability is increased. It would be important to stress that we have here two opposite effects. Similarly, the discussion in section 4.1 is not precise enough concerning the existence of these two opposite effects, because the expression ‘selection is delayed’ is used to describe the effect of seed banks on both fixation time and fixation probability.

- Figure 3 : to illustrate the idea that selection is more efficient with seed banks (l 334-334), panel c could maybe be normalized by the fixation time under neutrality ? To better compare the effect of seed banks on fixation probability and fixation time, it would maybe help to plot panel a as a function of 1/b, with one curve / color per selection coefficient, and possibly to also normalize it by the probability under neutrality ? Finally, panels b/c would be easier to read if the legend for ‘neutral’ evolution would be the top row, so that this comes just before s=0.01 (similarly, the theroretical prediction in the legend of panel a could be the last row).

- Besides these clarifications in the text and in Figure 3, one way to reach a global conclusion on the influence of seed banks on selection dynamics could be to compute the expected number of fixation events per time unit, which would account for both the frequency and the speed of sweep events.

Genetic diversity patterns around a sweep :

- l 341-345 : it is clear from Figure 4 that the sweep window is reduced by seed banks, but the interpretation of this result is not so obvious to me. While effective recombination rate is indeed higher with seed banks, fixation time is also shorter (cf section 3.3) so there are fewer recombination opportunities during the sweep (in classical models of selective sweeps, the probability to escape the sweep is related to the product r*tau, where r is the recombination rate and tau the duration of the sweep). I would be interested to have the authors’ feedback on this point.

- l 345-348 : I don’t understand the rationale here. Should we relate explanation 1) to absolute diversity and explanation 2) to relative diversity ?

- l 354-356 : the observation that no model without seed bank can mimic the genomic patterns of a specific model with seed bank is very interesting ; nice idea !

- l 368 : « narrower » is exact but « sharper » is not obvious, when looking at the normalized curves of Figure 4b.

Detection of selective sweeps :

- Linkage disequilibrium is one of the possible approaches to detect hard selective sweeps. Another very common one is to look at the SFS, as mentionned by the authors in the discusion (software Sweed). Comparing this approach with OmegaPlus on the data already simulated would strenghten the conclusions of the study.

- The comparison between models at lines 380-398 seems a bit approximative. This could easily be solved by indicating the exact detection threshold associated to 10% false positives on each panel of Figure 5 (the x axis of panels b/d could be rescaled to focus on the left part on the graph).

- l 471-474 : there are a number of claims here that would really require more justification, for instance with precise references to previous studies. None of these claims is really intuitive to me.

Model definition :

- l 153 & 164 : in order to maintain a population of N diploids, I suppose that each sampled individual actually contributes two gametes, not one.

- Equation (1) : if I refer to the last line, Y_i is a probability, not a stochastic variable. To be consistent with this notation, the first line should be something like Y_i=P(Generation=k_i)=…

- Equation (2) : the first parameter of the multinomial is the number of draws; I think this is 1 here, not m.

Minor points :

- l 14 : « means » → « implies »

- l 21-22 : this is an important conclusion of the study so I would avoid the use of ‘respectively’ within parentheses and write a specific sentence or sub-sentence for each type of selection.

- l23-24 : In section 3.4 the authors show that selective sweeps are detected for longer time with seed banks, but explain it by the fact that new mutations need longer time to accumulate from the end of the sweep. The link with recombination rate made in this sentence is thus not justified.

- l 62 : the formulation « a few ... compared to ... » is not correct.

- l 65 : « as we ... » → « in order to » ?

- l 220 : I suppose that 1.1 should be 1 ?

- l 248 : what is the meaning of these parameters ? Actually, maybe a few lines about how OmegaPlus works would be helpful.

- l 250-251 : this last sentence is a bit confusing, because no previous reference was done to the notion of « recovery scenario ».

- l 308-309 : The first of these two sentences could be removed, to my opinion it makes things more complex than they actually are ; the important point here is that the ratio of the probabilities with and without seed banks decreases for stronger selection.

- l 329-332 : Maybe say clearly here than fixation is faster than expected based on 1/b².

- l 364 : « as a result of » → « based on »

- l 366-367 : Figure 4 does not explore point 2 (influence of the time elapsed since the sweep).

https://doi.org/10.24072/pci.evolbiol.100552.rev14

User comments

No user comments yet

or Register
Submit a preprint