Recommendation

E5, the third oncogene of Papillomavirus

Hirohisa Kishino based on reviews by Leonardo de Oliveira Martins and 1 anonymous reviewer

A recommendation of:

Genome plasticity in Papillomaviruses and de novo emergence of E5 oncogenes

Anouk Willemsen, Marta Félez-Sánchez, and Ignacio G. Bravo (2019), bioRxiv, 337477, ver. 3 peer-reviewed and recommended by Peer Community in Evolutionary Biology https://doi.org/10.1101/337477

Abstract

The clinical presentations of papillomavirus (PV) infections come in many different flavors. While most PVs are part of a healthy skin microbiota and are not associated with physical lesions, other PVs cause benign lesions, and only a handful of PVs are associated with malignant transformations linked to the specific activities of the E5, E6 and E7 oncogenes. The functions and origin of E5 remain to be elucidated. These E5 ORFs are present in the genomes of a few polyphyletic PV lineages, located between the early and the late viral gene cassettes. We have computationally assessed whether these E5 ORFs have a common origin and whether they display the properties of a genuine gene. Our results suggest that during the evolution of Papillomaviridae, at least four events led to the presence of a long non-coding DNA stretch between the E2 and the L2 genes. In three of these events, the novel regions evolved coding capacity, becoming the extant E5 ORFs. We then focused on the evolution of the E5 genes in AlphaPVs infecting humans. The sharp match between the type of E5 protein encoded in AlphaPVs and the infection phenotype (cutaneous warts, genital warts or anogenital cancers) supports the role of E5 in the differential oncogenic potential of these PVs. In our analyses, the best-supported scenario is that the five types of extant E5 proteins within the AlphaPV genomes may not have a common ancestor. However, the chemical similarities between E5s regarding amino acid composition prevent us from confidently rejecting the model of a common origin. Our evolutionary interpretation is that an originally non-coding region entered the genome of the ancestral AlphaPVs. This genetic novelty allowed the exploration of novel transcription potential, triggering an adaptive radiation that yielded three main viral lineages that encode different E5 proteins and display distinct infection phenotypes. Overall, our results provide an evolutionary scenario for the de novo emergence of viral genes and illustrate the impact of such genotypic novelty on the phenotypic diversity of the viral infections.

oncogenes, virus evolution, papillomavirus, genome evolution

Submission: posted 04 June 2018
Recommendation: posted 06 February 2019, validated 08 February 2019

Cite this recommendation as:
Kishino, H. (2019) E5, the third oncogene of Papillomavirus. Peer Community in Evolutionary Biology, 100067. https://doi.org/10.24072/pci.evolbiol.100067

Recommendation

Papillomaviruses (PVs) infect almost all mammals and possibly amniotes and bony fishes. While most of them have no significant effects on their hosts, some induce physical lesions. The phylogeny of PVs consists of a few crown groups [1], among which the AlphaPVs, which infect primates including humans, have been well studied. They are associated with markedly different clinical manifestations: non-oncogenic PVs causing anogenital warts, oncogenic and non-oncogenic PVs causing mucosal lesions, and non-oncogenic PVs causing cutaneous warts.
The PV genome is a double-stranded circular DNA, roughly organized into three parts: an early region containing six open reading frames (ORFs: E1, E2, E4, E5, E6 and E7) involved in multiple functions including viral replication and cell transformation; a late region coding for structural proteins (L1 and L2); and a non-coding regulatory region (URR) that contains the cis-elements necessary for replication and transcription of the viral genome.
E5, E6, and E7 are known to act as oncogenes. The E6 protein binds to the cellular p53 protein [2]. The E7 protein binds to the retinoblastoma tumor suppressor gene product, pRB [3]. E5, however, has been poorly studied, even though a high correlation between the type of E5 protein and the infection phenotype is observed. E5s, located in the E2/L2 intergenic region in the genomes of a few polyphyletic PV lineages, are so divergent that they can only be characterized by their high hydrophobicity. No similar sequences have been found in sequence databases.
Willemsen et al. [4] provide valuable evidence on the origin and evolutionary history of E5 genes and their genomic environments. First, they tested common ancestry against independent origins [5]. Because alignment can bias the test toward the hypothesis of common ancestry [6], they took full account of alignment uncertainty [7] and conducted a random permutation test [8]. Although the strong chemical similarity prevented a decisive conclusion from the test, they could confirm that the E5 ORFs do appear to code for proteins and have a unique evolutionary history, with a topology far different from that of the neighboring genes.
Still, mysteries remain about the origin and evolution of E5 genes. One of the most interesting questions may be the evolution of hydrophobicity, because it may be the main cause of the variable infection phenotypes. This inference resembles in nature the inference of the evolutionary history of G+C content in bacterial genomes [9]. It may account for possible convergent or parallel evolution by anchoring the analysis to the topologies of neighboring genes.
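
As a small illustration of the hydrophobicity signal discussed above, the following sketch computes a grand average of hydropathy (GRAVY) score with the Kyte-Doolittle scale. The choice of this scale, the function name `gravy`, and the toy peptides are assumptions made for illustration; they are not taken from Willemsen et al. [4].

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Positive values indicate an overall hydrophobic sequence.
    Raises ValueError on an empty sequence or a non-standard residue.
    """
    residues = protein.upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical toy peptides, not real E5 sequences.
    for name, seq in [("hydrophobic_toy", "LLIVFLLAIV"), ("charged_toy", "RKDERKDE")]:
        print(name, round(gravy(seq), 2))
```

A single averaged score hides where the hydrophobic stretches sit along the sequence; a sliding-window version of the same sum would be the natural next step for short transmembrane proteins.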

References

[1] Bravo, I. G., & Alonso, Á. (2004). Mucosal human papillomaviruses encode four different E5 proteins whose chemistry and phylogeny correlate with malignant or benign growth. Journal of virology, 78, 13613-13626. doi: 10.1128/JVI.78.24.13613-13626.2004
[2] Werness, B. A., Levine, A. J., & Howley, P. M. (1990). Association of human papillomavirus types 16 and 18 E6 proteins with p53. Science, 248, 76-79. doi: 10.1126/science.2157286
[3] Dyson, N., Howley, P. M., Munger, K., & Harlow, E. D. (1989). The human papilloma virus-16 E7 oncoprotein is able to bind to the retinoblastoma gene product. Science, 243, 934-937. doi: 10.1126/science.2537532
[4] Willemsen, A., Félez-Sánchez, M., & Bravo, I. G. (2019). Genome plasticity in Papillomaviruses and de novo emergence of E5 oncogenes. bioRxiv, 337477, ver. 3 peer-reviewed and recommended by PCI Evol Biol. doi: 10.1101/337477
[5] Theobald, D. L. (2010). A formal test of the theory of universal common ancestry. Nature, 465, 219–222. doi: 10.1038/nature09014
[6] Yonezawa, T., & Hasegawa, M. (2010). Was the universal common ancestry proved?. Nature, 468, E9. doi: 10.1038/nature09482
[7] Redelings, B. D., & Suchard, M. A. (2005). Joint Bayesian estimation of alignment and phylogeny. Systematic biology, 54(3), 401-418. doi: 10.1080/10635150590947041
[8] de Oliveira Martins, L., & Posada, D. (2014). Testing for universal common ancestry. Systematic biology, 63(5), 838-842. doi: 10.1093/sysbio/syu041
[9] Galtier, N., & Gouy, M. (1998). Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Molecular biology and evolution, 15(7), 871-879. doi: 10.1093/oxfordjournals.molbev.a025991

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
no declaration

Reviews

Evaluation round #2

DOI or URL of the preprint: 10.1101/337477

Version of the preprint: 2

Author's Reply, 06 Feb 2019

Dear Hirohisa Kishino,

Thank you for your revision. We have verified the legends of Figures 2 and 6, and we confirm the explanations are correct. The correspondence analysis in Figure 2 has been performed on a symmetrical matrix containing the unweighted and weighted Robinson-Foulds tree distances. The MDS in Figure 6 was performed on a matrix containing the distances in codon usage preferences for the different AlphaPV ORFs. I hope that this explanation clarifies any doubts.

Kind Regards,

Anouk

https://doi.org/10.24072/pci.evolbiol.100157.ar2

Decision by Hirohisa Kishino, posted 05 Feb 2019

MDS obtains a map from a distance matrix. Correspondence analysis obtains a map that relates samples to categories. Given these properties of the two methods, I am afraid that Figures 2 and 6 were obtained not by correspondence analysis and MDS, respectively, but by MDS and correspondence analysis, respectively. Please confirm quickly whether the explanations of these figures are correct.
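
The kind of symmetric distance matrix that MDS takes as input can be sketched as follows, here with unweighted Robinson-Foulds distances between toy trees. Each tree is represented by its non-trivial splits; the helper names and the three five-taxon trees are hypothetical and do not come from the manuscript.

```python
from typing import FrozenSet, Iterable, List, Set

TAXA = frozenset("ABCDE")  # hypothetical taxon labels


def canonical_split(side: Iterable[str], taxa: FrozenSet[str] = TAXA) -> FrozenSet[str]:
    """Return one fixed half of a bipartition so equal splits compare equal.

    The half that does NOT contain the smallest taxon label is kept.
    """
    side = frozenset(side)
    other = taxa - side
    return other if min(taxa) in side else side


def tree_splits(sides: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """Build the set of canonical non-trivial splits describing one unrooted tree."""
    return {canonical_split(s) for s in sides}


def rf_distance(a: Set[FrozenSet[str]], b: Set[FrozenSet[str]]) -> int:
    """Unweighted Robinson-Foulds distance: splits found in exactly one tree."""
    return len(a ^ b)


def rf_matrix(trees: List[Set[FrozenSet[str]]]) -> List[List[int]]:
    """Symmetric matrix of pairwise RF distances, zero on the diagonal."""
    return [[rf_distance(s, t) for t in trees] for s in trees]


# Three toy unrooted trees on five taxa, each given by its two internal splits.
tree1 = tree_splits(["AB", "DE"])   # ((A,B),C,(D,E))
tree2 = tree_splits(["AC", "DE"])   # ((A,C),B,(D,E))
tree3 = tree_splits(["AD", "CE"])   # ((A,D),B,(C,E))

if __name__ == "__main__":
    for row in rf_matrix([tree1, tree2, tree3]):
        print(row)
```

A matrix of this form, square and symmetric with zeros on the diagonal, is what an MDS routine expects; a contingency table of samples by categories is what correspondence analysis expects.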

https://doi.org/10.24072/pci.evolbiol.100157.d2

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/337477

Version of the preprint: 1

Author's Reply, 24 Jan 2019

https://doi.org/10.24072/pci.evolbiol.100157.ar1

Decision by Hirohisa Kishino, posted 16 Jul 2018

Dear Anouk Willemsen,

Thank you for submitting the manuscript to PCI Evolutionary Biology. We have now received comments from two reviewers. Both reviewers appreciate the work. However, Leonardo de Oliveira Martins raised methodological concerns about the UCA test, which is a core part of the manuscript, and made a constructive suggestion. Please read the comments carefully and revise the manuscript, responding to each of them.

Sincerely yours, Hiro

https://doi.org/10.24072/pci.evolbiol.100157.d1

Reviewed by Leonardo de Oliveira Martins, 14 Jun 2018

The authors describe independent insertions of a non-coding stretch of DNA in the intergenic E2–L2 region of Papillomavirus (PV) genomes, with subsequent acquisition of coding capacity leading to clinically important novel proteins. The manuscript is well written, and describes concisely an important problem ― de novo oncogene emergence ― using an ingenious solution. At different phylogenetic scales, the authors tested explicitly if the inter–E2–L2 region (a highly variable and clinically relevant region of the circular PV genome) has a single common ancestry or appeared independently, given its low similarity and diverse composition. Within a particularly important clade (AlphaPV) they furthermore explored the genetic characteristics of the so-called E5 ORFs. Although the common ancestry hypothesis is properly addressed, I am afraid that the particular model used may not be very convincing without a few modifications. I will describe this problem in more detail below, together with a few other suggestions.

◾ In [1] we give a hint on potential complications when using Bali-phy (and Bayesian models, in general) for model selection: the difficulty in achieving convergence, and the poor quality of the marginal likelihood estimation:

Convergence: For a moderate-to-high number of sequences, special attention must be paid to convergence when using Bali-phy. This is a bit different from the authors' solution of running the BF test three times: what is needed is to check if, under each hypothesis, two or more independent runs achieve equilibrium (similar alignments, trees, LnL,...). It is not uncommon that even for a very long run the MCMC algorithm remains trapped in a local optimum, given the complexity of the problem (assuming both the tree and the alignment are parameters).
Marginal Likelihood: The marginal likelihood calculated through the geometric mean is known to be problematic, and the problem may not go away by multiple runs, etc. We wrote a follow-up on this UCA test later, describing a simpler way to test for common ancestry based on random permutations of the alignments [2]. Basically we reshuffle the columns of one of the clades and recalculate our “statistics” (difference in log-likelihoods under CA and IO hypotheses, tree length, or even average similarity, which doesn’t rely on tree inference).
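
The column-reshuffling idea described above can be sketched with the simplest of the listed statistics, the average similarity between two clades. This is a minimal illustration under stated assumptions: both groups are already aligned to the same length, any realignment or tree inference after shuffling is skipped, and the function names and toy sequences are hypothetical rather than taken from [2].

```python
import random
from typing import List, Tuple


def mean_between_identity(group_a: List[str], group_b: List[str]) -> float:
    """Average fraction of identical columns over all between-group sequence pairs."""
    total = 0.0
    pairs = 0
    for a in group_a:
        for b in group_b:
            total += sum(x == y for x, y in zip(a, b)) / len(a)
            pairs += 1
    return total / pairs


def shuffle_columns(group: List[str], rng: random.Random) -> List[str]:
    """Apply one random column permutation to every sequence of a group.

    Columns stay intact, so within-group similarity is preserved while
    positional correspondence with the other group is destroyed.
    """
    order = list(range(len(group[0])))
    rng.shuffle(order)
    return ["".join(seq[i] for i in order) for seq in group]


def permutation_test(group_a: List[str], group_b: List[str],
                     n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return the observed statistic and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = mean_between_identity(group_a, group_b)
    hits = 0
    for _ in range(n_perm):
        permuted = shuffle_columns(group_b, rng)
        if mean_between_identity(group_a, permuted) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy alignment: two clades of two sequences each.
    clade_1 = ["ACGTTGCAAGCTTGACCATG", "ACGTTGCAAGCTTGACCTTG"]
    clade_2 = ["ACGTTGCATGCTTGACCATG", "ACGTAGCAAGCTTGACCATG"]
    print(permutation_test(clade_1, clade_2))
```

A small p-value means the observed between-clade similarity is higher than expected once positional homology is scrambled, which is the pattern expected under common ancestry.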

Therefore I would like to suggest a few options that may corroborate your conclusions and help convince readers of the independent ancestry of the inter-E2–L2 regions, at your discretion:

Show the phylogeny of the inter-E2–L2 regions assuming common ancestry used in the tests, from Bali-phy or even a faster method (muscle+RAxML). If the IO hypothesis is convincing, then the branch lengths leading to each cluster C1-C5 should be quite large, compared to the other branches.
Run convergence diagnostics for each Bali-phy analysis, to make sure the posterior distributions can be trusted. You can furthermore follow the alignment size or tree size along each sampling, and compare them to an optimal estimate (muscle+RAxML).
Decrease number of sequences. This may be essential in case the Bali-phy analysis is not converging ― which may well be the case for more than a few dozen sequences. You may choose the four or five most dissimilar sequences within each clade.
Use a permutation-based test described above, from [2]. This may be faster than running Bali-phy even for a restricted set of sequences, since you don’t need to worry about convergence.

Notice that you don’t need to add all the suggested analyses, but some further evidence for the independent origins hypothesis will be welcome.
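
One common way to compare independent MCMC runs, as suggested above, is the Gelman-Rubin potential scale reduction factor computed on a scalar trace such as the log-likelihood or the alignment length. The sketch below is a generic diagnostic, not a Bali-phy feature; the function name and the toy traces are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def gelman_rubin(chains: List[Sequence[float]]) -> float:
    """Potential scale reduction factor for equal-length chains of one scalar.

    Values close to 1 suggest the chains sample the same distribution;
    values well above 1 suggest at least one chain has not converged.
    Assumes at least one chain has non-zero variance.
    """
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least 2")
    # W: mean of the within-chain sample variances.
    within = mean([variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


if __name__ == "__main__":
    # Hypothetical post-burn-in traces from two independent runs.
    agreeing = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
    trapped = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
    print(round(gelman_rubin(agreeing), 3), round(gelman_rubin(trapped), 3))
```

In practice the traces would be far longer than these toy lists, and a single scalar cannot show that trees and alignments themselves agree between runs, so this check complements rather than replaces a comparison of the sampled trees.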

◾ I am a bit confused about the section “DNA Sequences in The inter-E2–L2 Region in AlphaPVs are Monophyletic but The E5 ORFs Therein Encoded are Not” (page 5): If E5β has an independent origin, then Cut should also be inferred as independently originated, unless they represent non-overlapping regions. Or maybe the inter-E2-L2 regions described in Table 2 exclude the E5 ORFs (the “non-coding regions” described in the discussion)? A diagram showing which regions are being included in each test, or at least a bit more info (e.g. if some sequences miss the E5 ORF, or about the non-coding regions) would help, even for the previous analysis (Table 1). Notice that this confusion may be a product of my limited knowledge of these genomes, but hopefully you can make these points clearer to other readers like me.

◾ Furthermore I have a few minor suggestions, that nonetheless can be easily addressed:

I would like to urge the authors to deposit the scripts and/or data on a publicly available repository (https://figshare.com/ or https://github.com/, for instance).
In general I missed some summary statistics about the sequence lengths and number of sequences in each analysis. Especially for the data sets subject to the common ancestry test, what is the average sequence length, and the equivalent alignment lengths (under each IO scenario and under CA)? This helps us get an idea about how the alignment optimisation may be influencing the homology assumptions, and is also helpful in interpreting the Bayes Factors (you may also describe the Bayes Factors normalised by the number of sites).
The authors may want to describe the multiple correspondence analysis in more detail; I could not see how this method is different from, e.g., an MDS plot. Furthermore, in this figure (Figure 2) I would also include the concatenated tree from Figure 1, since it is the only phylogeny actually displayed in the manuscript. In theory even IO sequences can be included, since their branch lengths would reveal the disagreement with other trees, but then a distance like the weighted RF distance or the branch score distance (https://rdrr.io/cran/phangorn/man/treedist.html) should be used.
There is a typo in the second paragraph of page 5, where you write “Common Ancestry (CO)” (should it be “CA”?). The authors might even drop the acronyms since they’re not used further down the text. It seems that the acronym “MCA” is also not used and may be removed.
Bali-phy is not “under a maximum-likelihood framework” (page 2), it uses a Bayesian model.

References

[1] de Oliveira Martins, L. & Posada, D. Testing for Universal Common Ancestry. Systematic Biology 63, 838–842 (2014). http://www.ncbi.nlm.nih.gov/pubmed/24958930

[2] de Oliveira Martins, L. & Posada, D. Infinitely Long Branches and an Informal Test of Common Ancestry. Biology Direct 11 (1): 19. (2016) http://dx.doi.org/10.1186/s13062-016-0120-y

https://doi.org/10.24072/pci.evolbiol.100157.rev11

Reviewed by anonymous reviewer 1, 21 Jun 2018

This paper uses computational analysis to examine the evolution of the papillomavirus (PV) E5 ORF, which is located between the early and late region (the inter-E2-L2 region) of the PV genome. First, it provides evidence that the nucleotide sequence of the inter-E2-L2 region among the various PV types is not derived from a common ancestor. Instead, at least five independent events, one occurring for each PV clade, resulted in the insertion of this region. This implies that the E5 ORFs in the AlphaPVs (e.g., HPV16) and those of the DeltaPVs (e.g., BPV-1) are evolutionarily unrelated, consistent with the fact that the E5 proteins of HPV16 and BPV-1 share little amino acid sequence similarity except for their hydrophobicity. The authors next focused on evolution of the E5 ORFs from the AlphaPVs, which includes the HPVs. They show that while the nucleotide sequence of the inter-E2-L2 region of these PVs arose from a common ancestor, their E5 ORFs did not. Specifically, the E5 ORFs from HPVs with mucosal tropism arose separately from those with cutaneous tropism. Since the oncogenic HPVs are mucosal and not cutaneous, the independent evolution of the E5 ORF in these HPV types suggests a role for E5 in the oncogenic potential of HPVs. Finally, this paper shows that the E5 ORFs in AlphaPVs display characteristics of actual coding sequences. The authors propose that the PV E5 genes evolved by the de novo emergence of new protein-coding sequences from non-coding regions. They speculate that the independent emergence of the E5 ORFs in different HPV types occurred by random nucleotide addition and/or recombination during viral DNA synthesis to insert a noncoding sequence, followed by mutation to generate a new protein coding sequence. But, although the PV E5 genes arose independently, they all encode a small hydrophobic protein. The occurrence of multiple independent selection events for a small hydrophobic protein suggests that modulating cellular membrane proteins or the membrane environment by such a protein is important for PV fitness.

Overall, this paper provides an interesting scenario for the evolution of a diverse class of small viral transmembrane proteins and should be accepted for publication with minimal revision.

Minor corrections:

Page 10, 4th paragraph, 5th and 6th sentences should read: "Experimentally, protein structures that have not been observed in nature have been isolated and shown to have biological activity. More specifically, Chacon et al., 2014, used genetic selection to isolate small artificial transmembrane proteins modeled after the BPV-1 E5 protein but lacking any preexisting sequences."

Page 10, 5th paragraph, 4th sentence: replace "rise" with "raise"

https://doi.org/10.24072/pci.evolbiol.100157.rev12