Papillomaviruses (PVs) infect almost all mammals, and possibly other amniotes and bony fishes. While most of them have no significant effects on their hosts, some induce physical lesions. The phylogeny of PVs comprises a few crown groups (Bravo & Alonso, 2004), among which the AlphaPVs, which infect primates including humans, have been well studied. AlphaPVs are associated with markedly different clinical manifestations: non-oncogenic PVs causing anogenital warts, oncogenic and non-oncogenic PVs causing mucosal lesions, and non-oncogenic PVs causing cutaneous warts.
The PV genome is a double-stranded circular DNA molecule, roughly organized into three parts: an early region coding for six open reading frames (ORFs: E1, E2, E4, E5, E6 and E7) involved in multiple functions, including viral replication and cell transformation; a late region coding for the structural proteins (L1 and L2); and a non-coding upstream regulatory region (URR) that contains the cis-elements necessary for replication and transcription of the viral genome.
E5, E6 and E7 are known to act as oncogenes. The E6 protein binds to the cellular p53 protein (Werness et al., 1990). The E7 protein binds to the product of the retinoblastoma tumor suppressor gene, pRB (Dyson et al., 1989). However, E5 has been poorly studied, even though a strong correlation between the type of E5 protein and the infection phenotype has been observed. E5s, which are present in the E2/L2 intergenic region of the genomes of a few polyphyletic PV lineages, are so divergent that they can only be characterized by their high hydrophobicity. No similar sequences have been found in sequence databases.
Willemsen et al. (2019) provide valuable evidence on the origin and evolutionary history of E5 genes and their genomic environments. First, they tested common ancestry against independent origins (Theobald, 2010). Because alignment can bias the test toward the hypothesis of common ancestry (Yonezawa & Hasegawa, 2010), they took full account of alignment uncertainty (Redelings & Suchard, 2005) and conducted a random permutation test (de Oliveira Martins & Posada, 2014). Although the strong chemical similarity among E5 proteins prevented a decisive conclusion from this test, they could confirm that E5 ORFs do code for proteins and that they have a unique evolutionary history, with a topology far different from those of the neighboring genes.
Still, mysteries remain regarding the origin and evolution of E5 genes. One of the greatest points of interest may be the evolution of hydrophobicity, because it may be the main cause of the variable infection phenotypes. This inference is similar in nature to inferring the evolutionary history of G+C content in bacterial genomes (Galtier & Gouy, 1998). Such an inference could account for possible convergent or parallel evolution by anchoring the analysis to the topologies of the neighboring genes.
Bravo, I. G., & Alonso, Á. (2004). Mucosal human papillomaviruses encode four different E5 proteins whose chemistry and phylogeny correlate with malignant or benign growth. Journal of Virology, 78, 13613-13626. doi: 10.1128/JVI.78.24.13613-13626.2004
Werness, B. A., Levine, A. J., & Howley, P. M. (1990). Association of human papillomavirus types 16 and 18 E6 proteins with p53. Science, 248, 76-79. doi: 10.1126/science.2157286
Dyson, N., Howley, P. M., Munger, K., & Harlow, E. D. (1989). The human papilloma virus-16 E7 oncoprotein is able to bind to the retinoblastoma gene product. Science, 243, 934-937. doi: 10.1126/science.2537532
Willemsen, A., Félez-Sánchez, M., & Bravo, I. G. (2019). Genome plasticity in Papillomaviruses and de novo emergence of E5 oncogenes. bioRxiv, 337477, ver. 3 peer-reviewed and recommended by PCI Evol Biol. doi: 10.1101/337477
Theobald, D. L. (2010). A formal test of the theory of universal common ancestry. Nature, 465, 219-222. doi: 10.1038/nature09014
Yonezawa, T., & Hasegawa, M. (2010). Was the universal common ancestry proved? Nature, 468, E9. doi: 10.1038/nature09482
Redelings, B. D., & Suchard, M. A. (2005). Joint Bayesian estimation of alignment and phylogeny. Systematic Biology, 54(3), 401-418. doi: 10.1080/10635150590947041
de Oliveira Martins, L., & Posada, D. (2014). Testing for universal common ancestry. Systematic Biology, 63(5), 838-842. doi: 10.1093/sysbio/syu041
Galtier, N., & Gouy, M. (1998). Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Molecular Biology and Evolution, 15(7), 871-879. doi: 10.1093/oxfordjournals.molbev.a025991
MDS produces a map based on a distance matrix. Correspondence analysis produces a map that jointly represents samples and categories. Based on these properties of the two methods, I am afraid that Figures 2 and 6 were obtained not by correspondence analysis and MDS, respectively, but by MDS and correspondence analysis, respectively. Please confirm quickly whether the descriptions of these figures are correct.
Dear Anouk Willemsen,
Thank you for submitting the manuscript to PCI Evolutionary Biology. We have now received comments from two reviewers. Both reviewers appreciate the work. However, Leonardo de Oliveira Martins raised methodological concerns about the UCA test, which is a core part of the manuscript, and made a constructive suggestion. Please read the comments carefully and revise the manuscript, responding to each of them.
Sincerely yours, Hiro