Recommendation

Is convergence an evidence for positive selection?

Guillaume Achaz based on reviews by Jeffrey Townsend and 1 anonymous reviewer

A recommendation of:

Convergent evolution as an indicator for selection during acute HIV-1 infection

Frederic Bertels, Karin J Metzner, Roland R Regoes (2018), bioRxiv, 168260, ver. 4 peer-reviewed and recommended by Peer Community in Evolutionary Biology https://doi.org/10.1101/168260

Read preprint in preprint server Now published in Peer Community Journal

Abstract

Convergent evolution as an indicator for selection during acute HIV-1 infection

Convergent evolution describes the process of different populations acquiring similar phenotypes or genotypes. Complex organisms with large genomes only rarely and only under very strong selection converge to the same genotype. In contrast, independent virus populations with very small genomes often acquire identical mutations. Here we test the hypothesis of whether convergence in early HIV-1 infection is common enough to serve as an indicator for selection. To this end, we measure the number of convergent mutations in a well-studied dataset of full-length HIV-1 env genes sampled from HIV-1 infected individuals during early infection. We compare this data to a neutral model and find an excess of convergent mutations. Convergent mutations are not evenly distributed across the env gene, but more likely to occur in gp41, which suggests that convergent mutations provide a selective advantage and hence are positively selected. In contrast, mutations that are only found in an HIV-1 population of a single individual are significantly affected by purifying selection. Our analysis suggests that comparisons between convergent and private mutations with neutral models allow us to identify positive and negative selection in small viral genomes. Our results also show that selection significantly shapes HIV-1 populations even before the onset of the adaptive immune system.

HIV-1, parallel evolution, convergent evolution, selection

Submission: posted 26 July 2017
Recommendation: posted 23 October 2018, validated 21 November 2018

Cite this recommendation as:
Achaz, G. (2018) Is convergence an evidence for positive selection?. Peer Community in Evolutionary Biology, 100060. https://doi.org/10.24072/pci.evolbiol.100060

Recommendation

The preprint by Bertels et al. [1] reports an interesting application of the well-accepted idea that positively selected traits (here, variants) can appear several times independently; think of the textbook example of flight. The authors assume that, reciprocally, convergence implies positive selection. The methodology then becomes, in principle, straightforward: one can simply count variants in independent datasets to detect convergent mutations.
In this preprint, the authors have applied this counting strategy to 95 available sequence alignments of the env gene of HIV-1 [2,3], which correspond to samples taken from different patients during the early phase of infection, at the very beginning of the onset of the immune response. They have compared the number and nature of the convergent mutations to a "neutral" model that assumes (a) a uniform distribution of mutations and (b) a substitution matrix estimated from the data. They show that there is an excess of convergent mutations when compared to the "neutral" expectations, especially for mutations that have arisen in 4+ patients. They also show that the gp41 region is enriched in these convergent mutations. The authors then discuss at length the potential artifacts that could have given rise to the observed pattern.
I think that this preprint is remarkable for its proposed methodology. Samples are taken from different individuals, whose viral populations were each founded by a single particle. Thus, there is no need for phylogenetic reconstruction of ancestral states, which is the typical first step of trait convergence analyses. The analysis simply becomes a matter of counting variants. This simple counting procedure nonetheless needs to be compared to a "neutral" expectation (a reference model) that includes the mutational process. In this article, the poor fit of a specifically designed reference model is interpreted as evidence for positive selection.
Whether the few mutations that are convergent in 4-7 samples out of 95 were selected or not is hard to assess with certainty. The authors have provided good evidence that they were, but only experimental validation will strengthen the claim. Nonetheless, beyond any definitive conclusion about the involvement of selection in these particular mutations, I found the methodological strategy and the discussions of the potential biases highly stimulating. This article is an excellent starting point for further methodological developments that could then be followed by large-scale analyses of convergence in many different organisms and case studies.
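
The counting strategy described in this recommendation can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy data, and the simplified null model (uniform placement of mutations, with all alternative bases equally likely instead of a substitution matrix estimated from the data) are assumptions made purely for illustration.

```python
from collections import Counter
import random


def convergence_counts(mutations_by_patient):
    """Map each mutation to the number of patients in which it was observed."""
    counts = Counter()
    for muts in mutations_by_patient:
        counts.update(set(muts))  # count each mutation at most once per patient
    return counts


def sharing_spectrum(counts):
    """Number of distinct mutations observed in exactly k patients, for each k."""
    return Counter(counts.values())


def simulate_neutral_spectrum(n_mutations_per_patient, n_sites, n_alt=3, rng=None):
    """One draw from a simplified null model (an assumption of this sketch):
    mutations fall uniformly on sites and each of the n_alt alternative
    bases is equally likely, i.e. no substitution matrix is used."""
    rng = rng or random.Random()
    simulated = []
    for k in n_mutations_per_patient:
        sites = rng.sample(range(n_sites), k)  # distinct sites within a patient
        simulated.append({(site, rng.randrange(n_alt)) for site in sites})
    return sharing_spectrum(convergence_counts(simulated))


# Hypothetical toy data: one set of (position, derived base) per patient.
observed = [
    {(10, "A"), (25, "G")},
    {(10, "A")},
    {(10, "A"), (300, "T")},
]
counts = convergence_counts(observed)
spectrum = sharing_spectrum(counts)
```

Repeating `simulate_neutral_spectrum` many times with the observed per-patient mutation counts would give a null distribution for the number of mutations shared by k patients, against which the observed spectrum could be compared.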

References

[1] Bertels, F., Metzner, K. J., & Regoes R. R. (2018). Convergent evolution as an indicator for selection during acute HIV-1 infection. BioRxiv, 168260, ver. 4 peer-reviewed and recommended by PCI Evol Biol. doi: 10.1101/168260
[2] Keele, B. F., Giorgi, E. E., Salazar-Gonzalez, J. F., Decker, J. M., Pham, K.T., Salazar, M. G., Sun, C., Grayson, T., Wang, S., Li, H. et al. (2008). Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA 105: 7552–7557. doi: 10.1073/pnas.0802203105
[3] Li, H., Bar, K. J., Wang, S., Decker, J. M., Chen, Y., Sun, C., Salazar-Gonzalez, J.F., Salazar, M.G., Learn, G.H., Morgan, C. J. et al. (2010). High multiplicity infection by HIV-1 in men who have sex with men. PLoS Pathogens 6:e1000890. doi: 10.1371/journal.ppat.1000890

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Reviews

Evaluation round #2

DOI or URL of the preprint: 10.1101/168260

Version of the preprint: 2

Author's Reply, 01 Oct 2018

Download author's reply https://doi.org/10.24072/pci.evolbiol.100100.ar2

Decision by Guillaume Achaz, posted 01 Oct 2018

The revised version by Bertels et al. shows a considerable improvement when compared to the previous version. It has a better flow and is much easier to read. I would like to congratulate the authors for the effort and work they have put into this revised version. It was worth it. The first reviewer has no further major comments, but the second reviewer (reviewer 3 of the previous version) is still unconvinced by the conclusions. I must confess that I myself am still unsure that the patterns reported here constitute strong support for selective effects, although they can be considered good clues. I nevertheless find that the approach proposed here is clever and is worth delivering to the community. Thus, I think that on top of the major improvements the authors have made so far, some extra work (mostly on writing) is still needed before I can recommend this preprint.

While revising this ms, please keep in mind that:

The evidence for the involvement of selection is still weak. I would thus suggest that the authors tone down their claim. Keep in mind that the indisputable pattern you describe here is that your null model does not fit. Rejecting H0 may have causes other than selection.
The second reviewer rightly points to a confusing argument on the effect of purifying selection (par L215-228). The same pattern (a positive correlation with diversity) is interpreted on the one hand as an effect of positive selection for the convergent mutations and on the other hand as an effect of purifying selection for the private ones. I recommend caution.

Personal suggestion for improvement:

To assess the independence between the mutations (current rev 2), the authors could first test for recombination (using a four-gamete-like test, the decay of LD, or any ρ estimation method) and, if there is no recombination, build phylogenetic trees with ancestral state reconstruction for each sample (and even use the MRCA sequence to orient the mutations if they include an outgroup). They could then see whether convergent mutations occurred once or several times in the samples and eventually test whether they hitchhike on each other (please take this only as a suggestion, not as mandatory extra work).
The remark of the ex-reviewer 2 of the previous version is still valid: why are 10/11 of the non-synonymous convergent mutations either G->A or A->G? This deserves at least to be reported in the results and discussed in the article. Do you observe the same for the synonymous convergent mutations? If you assessed the expected number of convergent mutations by type of mutation (and not globally), would the observed excess still be very unlikely?
The levelling-off of the decline reported for Figure 1 may be slightly overstated (L120). This is based on 11 mutations that cannot be below 1 (while the null model can go well below 1). What do you observe for the synonymous convergent mutations?
The paragraph L382-L388 needs to be clarified.
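
As an illustration of the four-gamete-like test mentioned in the first suggestion above, here is a minimal Python sketch. The function names and the toy alignment are hypothetical. The test flags pairs of biallelic sites at which all four two-site haplotypes are present; under an infinite-sites assumption this indicates recombination, but recurrent (convergent) mutation can produce the same signal, so a positive result would need cautious interpretation in this setting.

```python
from itertools import combinations


def four_gamete_violation(sequences, i, j):
    """Return True if all four two-site haplotypes occur at biallelic sites i and j."""
    alleles_i = {seq[i] for seq in sequences}
    alleles_j = {seq[j] for seq in sequences}
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        return False  # the test is only defined for pairs of biallelic sites
    haplotypes = {(seq[i], seq[j]) for seq in sequences}
    return len(haplotypes) == 4


def incompatible_pairs(sequences):
    """List all site pairs (i, j), i < j, that show all four gametes."""
    length = len(sequences[0])
    return [
        (i, j)
        for i, j in combinations(range(length), 2)
        if four_gamete_violation(sequences, i, j)
    ]


# Hypothetical toy alignment: sites 0 and 1 are biallelic, site 2 is fixed.
toy_alignment = ["AAC", "AGC", "GAC", "GGC"]
pairs = incompatible_pairs(toy_alignment)
```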

On a didactic level, a black-and-white version of this ms is almost impossible to follow, as the colors on the plots look identical. May I suggest that you use filled and empty circles and dashed, dotted and continuous lines on top of the colors (if you like colors) in all figures? Another possibility is to use dark vs light colors.

Typos: - L43: remove 'will' to change the sentence into present time - L411: positions -> position (delete the 's')

To conclude, I think this ms is evolving in the right direction, although it still deserves some extra work. I am almost convinced that the next version will be ripe for recommendation. Take all the suggestions of the reviewers as constructive feedback (or genuine misunderstandings) and include a point-by-point response to all comments along with your next version.

https://doi.org/10.24072/pci.evolbiol.100100.d2

Reviewed by Jeffrey Townsend, 02 Jul 2018

I am satisfied by the comprehensive revisions as performed. A few minor points for consideration:

1) It looks like only the maximum likelihood "model selection" clusters from MACML have been used / displayed. Model selection (linear hot/cold cluster detection) appears to have been informative in this way, but if it was not examined already it is worth mentioning that it may be illuminating to use the computationally intensive model averaging (over "hot" and "cold" spots hierarchically detected) to provide a pseudo-continuous profile of clustering across sites. See flag -m in the MACML user manual.

2) line 36, no "," after "are"

3) line 60, needs "," after "load"

https://doi.org/10.24072/pci.evolbiol.100100.rev21

Evaluation round #1

DOI or URL of the preprint: 10.1101/168260

Version of the preprint: 1

Author's Reply, 22 Jun 2018

Download author's reply https://doi.org/10.24072/pci.evolbiol.100100.ar1

Decision by Guillaume Achaz, posted 22 Jun 2018

The ms by Bertels et al. has been reviewed by three independent experts in population genetics and molecular evolution. All three reviewers found that this ms has good potential but also raised important points that need to be addressed before it can be recommended by PCI Evol Biol. Reviewers 1 and 2 suggested several articles that the authors must read and potentially include as references in their revised version. Reviewers 2 and 3 were convinced that the convergence approach is interesting but at the same time expressed some concerns about the power and the reliability of the method. I also agree with reviewer 3 that this study should not be oversold, as the results are not extremely robust as they stand.

Please carefully address all points raised by the reviewers and revise your manuscript accordingly. A point-by-point response to their comments must be included along with your revised version of the ms.

https://doi.org/10.24072/pci.evolbiol.100100.d1

Reviewed by Jeffrey Townsend, 28 Nov 2017

Download the review https://doi.org/10.24072/pci.evolbiol.100100.rev11

Reviewed by anonymous reviewer 2, 28 Nov 2017

The ms by Bertels et al. reports an analysis of nucleotide convergence patterns in HIV. It reads well, is mostly sound and is quite easy to follow. I have, however, a few remarks that could potentially help improve its content.

:: Major ::

Although the authors demonstrate clearly that some positions have mutated several times independently in different patients, I am not convinced this is really due to selection. One important part of the puzzle (that is never discussed) is the type of mutations the authors have found independently repeated. A summary table listing all types of recurrent mutations (i.e. the type of nucleotide change) is required in the main text. As they are mostly G->A mutations (Table S1), this is suspicious, as HIV has a very strong mutational bias in that direction. It would be much more convincing to find that the apparently selected mutations are not all of the same nature. If I understood Table S1, all are G->A or A->G, but the latter could simply be mis-oriented mutations (see the minor points below).

I am not sure how to interpret the value of H biologically. H mixes drift, selection and recurrent mutations. Some other metrics, such as the number of alleles (2, 3 or 4), measure the number of mutations at a site more directly.

As a general comment, I think there is room for improvement in the general flow of the article. Even after reading it a few times, I am still confused about the statements. Casual readers could easily get lost.

The weak overlap between the authors' list of potentially selected mutations and the one from Wood et al. may suggest that the data are quite noisy and that the overall power of the method(s) is simply quite weak. Although I believe this is a clever method, more discussion of this point (limitations of the method) would be welcome.
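
The summary of mutation types requested in the first major point could be tallied along the following lines. This is only a sketch: the function name, the input format (pairs of inferred ancestral and derived bases) and the toy data are hypothetical, and the option to pool a change with its reverse is an assumption meant to cover the case where the direction of a mutation is uncertain.

```python
from collections import Counter


def tally_mutation_types(mutations, pool_reverse=False):
    """Count mutations by type.

    mutations: iterable of (ancestral_base, derived_base) pairs.
    pool_reverse: if True, G->A and A->G are counted together as A<->G.
    """
    tally = Counter()
    for ancestral, derived in mutations:
        if pool_reverse:
            first, second = sorted((ancestral, derived))
            key = f"{first}<->{second}"
        else:
            key = f"{ancestral}->{derived}"
        tally[key] += 1
    return tally


# Hypothetical toy data, not taken from Table S1.
toy_mutations = [("G", "A"), ("G", "A"), ("A", "G"), ("C", "T")]
directed = tally_mutation_types(toy_mutations)
pooled = tally_mutation_types(toy_mutations, pool_reverse=True)
```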

:: Minor ::

l142 - why did the authors choose to report only the results for >= 3 populations? What about providing the full distribution? Can the authors also give the raw numbers (and not only the %)? Furthermore, although this is statistically significant, it leaves 30% that are outside the gene. This casts doubt on the strength of the reported pattern.

l70 - the dN/dS strategy would also work if selection affects dS less than dN; dS does not necessarily have to be immune to selection.

l303-307 - the ancestral sequence is not always the consensus. Mutations could simply reach high frequency. This is true even in the standard neutral model (see the expectation of the unfolded SFS). So I guess the direction of mutations may be uncertain, and the authors may therefore want to pool symmetrical mutations (i.e. G->A with A->G and, if mutations can occur on the two strands, also with C<->T).

l309-315 - why do the alignment with a reference sequence (rather than aligning all sequences together without the reference)? This seems odd.

l348 - Did you consider using an entropy between 0 and 1 instead of [0,2]? You would need to use log4 instead of log2. Alternatively, you could change the base of the log depending on the number of alleles.

l220-l223 - Please clarify; this passage is slightly confusing as it stands.
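
To make the l348 remark concrete, the following Python sketch computes a per-site Shannon entropy with a selectable log base. It assumes that H in the manuscript is the Shannon entropy of nucleotide frequencies at a site; the function name and the toy alignment columns are hypothetical. With four nucleotides, base 2 gives values in [0, 2] and base 4 gives values in [0, 1].

```python
import math
from collections import Counter


def site_entropy(column, base=2):
    """Shannon entropy of the allele frequencies in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log(p, base)
    return entropy


# Hypothetical columns: four equally frequent bases, and two equally frequent bases.
h_bits = site_entropy("ACGT", base=2)       # maximum for four alleles in base 2
h_scaled = site_entropy("ACGT", base=4)     # same column rescaled to [0, 1]
h_two_alleles = site_entropy("AACC", base=4)
```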

https://doi.org/10.24072/pci.evolbiol.100100.rev12