The preprint by Bertels et al. [1] reports an interesting application of the well-accepted idea that positively selected traits (here, variants) can appear several times independently; think of the textbook example of the capacity for flight. The authors assume that, conversely, convergence implies positive selection. The methodology then becomes, in principle, straightforward: one can simply count variants in independent datasets to detect convergent mutations.
In this preprint, the authors applied this counting strategy to 95 available sequence alignments of the env gene of HIV-1 [2,3], which correspond to samples taken from different patients during the early phase of infection, at the very onset of the immune response. They compared the number and nature of the convergent mutations to a “neutral” model that assumes (a) a uniform distribution of mutations and (b) a substitution matrix estimated from the data. They show that there is an excess of convergent mutations relative to the “neutral” expectation, especially for mutations that arose in 4+ patients. They also show that the gp41 gene is enriched in these convergent mutations. The authors then discuss at length the potential artifacts that could have given rise to the observed pattern.
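The counting strategy and its comparison to a reference model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input format (one set of `(site, derived_base)` pairs per patient), and the simplification that all patients share one ancestral sequence are assumptions made for illustration only.

```python
import random
from collections import Counter


def convergence_histogram(patient_mutations):
    """Return a Counter mapping k -> number of distinct mutations
    observed in exactly k patients.

    patient_mutations: list with one set of (site, derived_base)
    tuples per patient (hypothetical input format).
    """
    per_mutation = Counter()
    for muts in patient_mutations:
        per_mutation.update(set(muts))  # each patient counts once per mutation
    return Counter(per_mutation.values())


def simulate_neutral(n_mut_per_patient, ancestral, subst_probs,
                     n_reps=1000, seed=0):
    """Average convergence histogram under a simple reference model.

    Assumptions (illustrative, not taken from the preprint):
    - mutated sites are drawn uniformly along the sequence,
    - all patients share the same ancestral sequence,
    - subst_probs[a][d] is the probability that base a mutates to d.
    """
    rng = random.Random(seed)
    totals = Counter()
    sites = range(len(ancestral))
    for _ in range(n_reps):
        patients = []
        for m in n_mut_per_patient:
            muts = set()
            for site in rng.sample(sites, m):  # distinct sites within a patient
                targets = subst_probs[ancestral[site]]
                derived = rng.choices(list(targets),
                                      weights=list(targets.values()))[0]
                muts.add((site, derived))
            patients.append(muts)
        totals.update(convergence_histogram(patients))
    return {k: count / n_reps for k, count in totals.items()}
```

Comparing the observed histogram with the simulated average, class by class (mutations found in 1, 2, 3, 4+ patients), then shows where the data exceed the reference model. The result depends on how well that model captures the mutational process.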
I think that this preprint is remarkable for its proposed methodology. Samples are taken from different individuals, whose viral populations were each founded by a single particle. Thus, there is no need for the phylogenetic reconstruction of ancestral states that is the typical first step of trait convergence analyses; the analysis simply becomes a matter of counting variants. This simple counting procedure nonetheless needs to be compared to a “neutral” expectation (a reference model) that includes the mutational process. In this article, the poor fit of a specifically designed reference model is interpreted as evidence for positive selection.
Whether the few mutations that are convergent in 4-7 samples out of 95 were selected or not is hard to assess with certainty. The authors have provided good evidence that they were, but only experimental validation will strengthen the claim. Nonetheless, beyond the question of whether selection definitively acted on these particular mutations, I found the methodological strategy and the discussion of the potential biases highly stimulating. This article is an excellent starting point for further methodological developments, which could then be followed by large-scale analyses of convergence in many different organisms and case studies.
[1] Bertels, F., Metzner, K. J., & Regoes, R. R. (2018). Convergent evolution as an indicator for selection during acute HIV-1 infection. BioRxiv, 168260, ver. 4 peer-reviewed and recommended by PCI Evol Biol. doi: 10.1101/168260
[2] Keele, B. F., Giorgi, E. E., Salazar-Gonzalez, J. F., Decker, J. M., Pham, K. T., Salazar, M. G., Sun, C., Grayson, T., Wang, S., Li, H. et al. (2008). Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA 105: 7552–7557. doi: 10.1073/pnas.0802203105
[3] Li, H., Bar, K. J., Wang, S., Decker, J. M., Chen, Y., Sun, C., Salazar-Gonzalez, J. F., Salazar, M. G., Learn, G. H., Morgan, C. J. et al. (2010). High multiplicity infection by HIV-1 in men who have sex with men. PLoS Pathogens 6: e1000890. doi: 10.1371/journal.ppat.1000890
DOI or URL of the preprint: 10.1101/168260
Version of the preprint: 2
The revised version by Bertels et al. is a considerable improvement over the previous version. It has a better flow and is much easier to read. I would like to congratulate the authors for the effort and work they have put into this revised version; it was worth it. The first reviewer has no further major comments, but the second reviewer (reviewer 3 of the previous version) is still unconvinced by the conclusions. I have to confess that I am myself still unsure that the patterns reported here constitute strong support for selective effects, although they can be considered good clues. However, I found the approach proposed here clever and worth delivering to the community. Thus, I think that, on top of the major improvements the authors have made so far, some extra work (mostly on the writing) is still needed before I can recommend this preprint.
While revising this ms, please keep in mind that:
Personal suggestion for improvement:
To assess the independence between the mutations (current rev 2), the authors could first test for recombination (using a four-gamete-like test, the decay in LD, or any ρ estimation method) and, if there is no recombination, build phylogenetic trees with ancestral state reconstruction for each sample (and even use the MRCA sequence to orient the mutations if they include an outgroup). They could then see whether convergent mutations occurred once or several times in the samples and, if need be, test whether they hitchhike on each other (please take this only as a suggestion, not as mandatory extra work).
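As an illustration of the first step, a minimal four-gamete check can be sketched in Python as follows. The function name and the input format (a list of aligned haplotype strings from one sample) are assumptions made for this example.

```python
from itertools import combinations


def four_gamete_violations(haplotypes):
    """Return the pairs of sites (i, j) at which all four two-site
    allele combinations are present.

    haplotypes: list of equal-length strings (hypothetical input format).
    Only biallelic sites are considered.
    """
    n_sites = len(haplotypes[0])
    biallelic = [i for i in range(n_sites)
                 if len({h[i] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:  # all four combinations observed
            violations.append((i, j))
    return violations
```

Note that the four-gamete test assumes that each site mutates at most once; recurrent mutation at a site can also produce four gametes, so a violation is not on its own proof of recombination.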
The remark of the ex-reviewer 2 of the previous version is still valid. Why are 10/11 of the non-synonymous convergent mutations either G->A or A->G? This deserves at least to be reported in the results and discussed in the article. Do you observe the same for the synonymous convergent mutations? If you assessed the expected number of convergent mutations by type of mutation (and not globally), would the observed number still be very unlikely?
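One simple way to gauge how surprising the 10/11 figure is would be a binomial tail probability, sketched below in Python. The function is illustrative only, and the probability `p` that a single mutation is G->A or A->G under the reference model is a hypothetical parameter that would have to be estimated from the data.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of observing at least k successes out of n
    independent trials, each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


# Hypothetical usage: p_ga is a placeholder, NOT a value from the preprint.
p_ga = 0.5
p_value = binomial_tail(11, 10, p_ga)
```

This treats the 11 mutations as independent draws, which is itself an assumption; a stratified version of the reference model, with one expectation per mutation type, would be the more complete answer to the question above.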
The levelling-off of the decline reported for Figure 1 may be slightly overstated (L120). It is based on 11 mutations, whose counts cannot fall below 1 (whereas the null model can go well below 1). What do you observe for the synonymous convergent mutations?
The paragraph L382-L388 needs to be clarified.
On a didactic level, a black-and-white version of this ms is almost impossible to follow, as the colors on the plots look identical. May I suggest that you use filled and empty circles and dashed, dotted and continuous lines on top of the colors (if you like colors) in all figures? Another possibility is to use dark vs light colors.
Typos: - L43: remove 'will' to put the sentence in the present tense - L411: positions -> position (delete the 's')
To conclude, I think this ms is evolving in the right direction, although it still deserves some extra work. I am almost convinced that the next version will be ripe for recommendation. Take all the suggestions of the reviewers as constructive feedback (or genuine incomprehension) and include a point-by-point response to all comments along with your next version.
DOI or URL of the preprint: 10.1101/168260
Version of the preprint: 1
The ms by Bertels et al. has been reviewed by three independent experts in population genetics and molecular evolution. All three reviewers found that this ms has good potential but also raised important points that need to be addressed before it can be recommended by PCI Evol Biol. Reviewers 1 and 2 suggested several articles that the authors must read and potentially include as references in their revised version. Reviewers 2 and 3 were convinced that the convergence approach is interesting but at the same time expressed some concerns about the power and the reliability of the method. I also agree with reviewer 3 that this study should not be oversold, as the results are not extremely robust as they stand.
Please carefully address all points raised by the reviewers and revise your manuscript accordingly. A point-by-point response to their comments must be included along with your revised version of the ms.