Recommendation

Fitting diversification models on undated or partially dated trees

Nicolas Lartillot based on reviews by Amaury Lambert, Dominik Schrempf and 1 anonymous reviewer

A recommendation of:

Probabilities of tree topologies with temporal constraints and diversification shifts

Gilles Didier (2019), bioRxiv, 376756, ver. 4 peer-reviewed by Peer Community in Evolutionary Biology https://doi.org/10.1101/376756

Read preprint in preprint server Now published in Peer Community Journal

Abstract

EN

AR

ES

FR

HI

JA

PT

RU

ZH-CN

Probabilities of tree topologies with temporal constraints and diversification shifts

Dating the tree of life is a task far more complicated than only determining the evolutionary relationships between species. It is therefore of interest to develop approaches apt to deal with undated phylogenetic trees. The main result of this work is a method to compute probabilities of undated phylogenetic trees under Markovian diversification models by constraining some of the divergence times to belong to given time intervals and by allowing diversification shifts on certain clades. If the diversification models considered are lineage-homogeneous, the time complexity of this computation is quadratic with the number of species of the phylogenetic tree and linear with the number of temporal constraints. The interest of this computation method is illustrated with three applications, namely, • to compute the distribution of the divergence times of a tree topology with temporal constraints, • to directly sample the divergence times of a tree topology, and • to test for a diversification shift at a given clade.

Phylogenetics, Datation, Shift Detection, Diversification, Birth-death process

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

احتمالات طوبولوجيا الأشجار مع القيود الزمنية وتحولات التنويع

إن تحديد تاريخ شجرة الحياة هو مهمة أكثر تعقيدًا بكثير من مجرد تحديد العلاقات التطورية بين الأنواع. ولذلك فمن المهم تطوير أساليب مناسبة للتعامل مع أشجار النشوء والتطور غير المؤرخة. والنتيجة الرئيسية لهذا العمل هي طريقة لحساب احتمالات أشجار النشوء والتطور غير المؤرخة في ظل نماذج التنويع الماركوفية عن طريق تقييد بعض أوقات التباعد لتنتمي إلى فترات زمنية معينة وبالسماح بتحولات التنويع في بعض الفروع الحيوية. إذا كانت نماذج التنويع التي تم النظر فيها متجانسة من حيث النسب، فإن التعقيد الزمني لهذا الحساب يكون تربيعيًا مع عدد أنواع شجرة النشوء والتطور وخطيًا مع عدد القيود الزمنية. يتم توضيح الفائدة من طريقة الحساب هذه من خلال ثلاثة تطبيقات، وهي: • حساب توزيع أوقات التباعد لطوبولوجيا الشجرة مع القيود الزمنية، • أخذ عينات مباشرة من أوقات التباعد لطوبولوجيا الشجرة، و • اختبار التنويع التحول في فرع حيوي معين.

علم الوراثة، البيانات، كشف التحول، التنويع، عملية الولادة والوفاة

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Probabilidades de topologías de árboles con restricciones temporales y cambios de diversificación.

Fechar el árbol de la vida es una tarea mucho más complicada que simplemente determinar las relaciones evolutivas entre especies. Por lo tanto, es de interés desarrollar enfoques aptos para abordar árboles filogenéticos sin fecha. El principal resultado de este trabajo es un método para calcular probabilidades de árboles filogenéticos sin fecha bajo modelos de diversificación de Markov, restringiendo algunos de los tiempos de divergencia para que pertenezcan a intervalos de tiempo dados y permitiendo cambios de diversificación en ciertos clados. Si los modelos de diversificación considerados son homogéneos en linaje, la complejidad temporal de este cálculo es cuadrática con el número de especies del árbol filogenético y lineal con el número de restricciones temporales. El interés de este método de cálculo se ilustra con tres aplicaciones, a saber: • calcular la distribución de los tiempos de divergencia de una topología de árbol con restricciones temporales, • muestrear directamente los tiempos de divergencia de una topología de árbol, y • probar una diversificación cambio en un clado determinado.

Filogenética, Datación, Detección de Cambios, Diversificación, Proceso nacimiento-muerte

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Probabilités de topologies arborescentes avec contraintes temporelles et changements de diversification

La datation de l'arbre de vie est une tâche bien plus compliquée que la simple détermination des relations évolutives entre les espèces. Il est donc intéressant de développer des approches aptes à traiter les arbres phylogénétiques non datés. Le principal résultat de ce travail est une méthode permettant de calculer les probabilités d'arbres phylogénétiques non datés sous des modèles de diversification markoviens en contraignant certains temps de divergence à appartenir à des intervalles de temps donnés et en autorisant des changements de diversification sur certains clades. Si les modèles de diversification considérés sont homogènes en lignées, la complexité temporelle de ce calcul est quadratique avec le nombre d'espèces de l'arbre phylogénétique et linéaire avec le nombre de contraintes temporelles. L'intérêt de cette méthode de calcul est illustré par trois applications, à savoir, • calculer la distribution des temps de divergence d'une topologie arborescente avec des contraintes temporelles, • échantillonner directement les temps de divergence d'une topologie arborescente, et • tester une diversification changement à un clade donné.

Phylogénétique, Datation, Détection de décalage, Diversification, Processus naissance-mort

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

अस्थायी बाधाओं और विविधीकरण बदलावों के साथ ट्री टोपोलॉजी की संभावनाएँ

जीवन के वृक्ष का निर्धारण करना केवल प्रजातियों के बीच विकासवादी संबंधों को निर्धारित करने से कहीं अधिक जटिल कार्य है। इसलिए अदिनांकित फ़ाइलोजेनेटिक पेड़ों से निपटने के लिए उपयुक्त दृष्टिकोण विकसित करना रुचिकर है। इस कार्य का मुख्य परिणाम मार्कोवियन विविधीकरण मॉडल के तहत कुछ विचलन समय को दिए गए समय अंतराल से संबंधित करने और कुछ समूहों पर विविधीकरण बदलाव की अनुमति देकर अदिनांकित फ़ाइलोजेनेटिक पेड़ों की संभावनाओं की गणना करने की एक विधि है। यदि विचार किए गए विविधीकरण मॉडल वंश-सजातीय हैं, तो इस गणना की समय जटिलता फ़ाइलोजेनेटिक पेड़ की प्रजातियों की संख्या के साथ द्विघात और अस्थायी बाधाओं की संख्या के साथ रैखिक है। इस गणना पद्धति की रुचि को तीन अनुप्रयोगों के साथ चित्रित किया गया है, अर्थात्, अस्थायी बाधाओं के साथ एक पेड़ टोपोलॉजी के विचलन समय के वितरण की गणना करने के लिए, एक पेड़ टोपोलॉजी के विचलन समय का सीधे नमूना लेने के लिए, और विविधीकरण के लिए परीक्षण करने के लिए। किसी दिए गए क्लैड पर बदलाव करें।

फ़ाइलोजेनेटिक्स, डेटाेशन, शिफ्ट डिटेक्शन, विविधीकरण, जन्म-मृत्यु प्रक्रिया

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

時間的制約と多様化シフトを伴うツリートポロジーの確率

生命の木の年代を特定することは、種間の進化上の関係を決定するだけよりもはるかに複雑な作業です。したがって、日付のない系統樹を扱うのに適したアプローチを開発することは興味深いことです。この研究の主な成果は、分岐時間の一部を所定の時間間隔に属するように制約し、特定のクレードでの多様化シフトを許可することによって、マルコフ多様化モデルの下で日付のない系統樹の確率を計算する方法です。考慮される多様化モデルが系統的に同種である場合、この計算の時間計算量は系統樹の種の数で二次になり、時間的制約の数で線形になります。この計算方法の興味深い点は、次の 3 つのアプリケーションで説明されます。つまり、次の 3 つです。 • 時間的制約のあるツリートポロジの発散時間の分布を計算する。 • ツリートポロジの発散時間を直接サンプリングする。 • 多様化をテストする。特定のクレードでシフトします。

系統発生学、データ化、シフト検出、多様化、誕生と死亡のプロセス

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Probabilidades de topologias em árvore com restrições temporais e mudanças de diversificação

Datar a árvore da vida é uma tarefa muito mais complicada do que apenas determinar as relações evolutivas entre as espécies. É, portanto, de interesse desenvolver abordagens capazes de lidar com árvores filogenéticas não datadas. O principal resultado deste trabalho é um método para calcular probabilidades de árvores filogenéticas não datadas sob modelos de diversificação Markoviana, restringindo alguns dos tempos de divergência a pertencerem a determinados intervalos de tempo e permitindo mudanças de diversificação em certos clados. Se os modelos de diversificação considerados são homogêneos de linhagem, a complexidade temporal deste cálculo é quadrática com o número de espécies da árvore filogenética e linear com o número de restrições temporais. O interesse deste método de cálculo é ilustrado com três aplicações, a saber, • calcular a distribuição dos tempos de divergência de uma topologia em árvore com restrições temporais, • amostrar diretamente os tempos de divergência de uma topologia em árvore, e • testar uma diversificação mudança em um determinado clado.

Filogenética, Datação, Detecção de Mudanças, Diversificação, Processo de nascimento-morte

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

Вероятности древовидных топологий с временными ограничениями и сдвигами диверсификации

Датирование древа жизни — задача гораздо более сложная, чем просто определение эволюционных связей между видами. Поэтому представляет интерес разработка подходов, способных работать с недатированными филогенетическими деревьями. Основным результатом этой работы является метод расчета вероятностей недатированных филогенетических деревьев в рамках марковских моделей диверсификации путем ограничения принадлежности некоторых времен дивергенции к заданным временным интервалам и путем разрешения сдвигов диверсификации в определенных кладах. Если рассматриваемые модели диверсификации однородны по происхождению, временная сложность этих вычислений квадратична в зависимости от количества видов филогенетического дерева и линейна в зависимости от количества временных ограничений. Интерес к этому методу вычислений иллюстрируется тремя приложениями, а именно: • для вычисления распределения времен расхождения древовидной топологии с временными ограничениями, • для непосредственной выборки времен расхождения древовидной топологии и • для проверки диверсификации. сдвиг в данной кладе.

9fcb38981285439папа66249495e6a05c Вероятности древовидных топологий с временными ограничениями и сдвигами диверсификации bca2474e40e445eeb66314f8d13f0806 Филогенетика, сбор данных, обнаружение сдвигов, диверсификация, процесс рождения и смерти

Филогенетика, сбор данных, обнаружение сдвигов, диверсификация, процесс рождения и смерти

This is an automatically generated version. The authors and PCI decline all responsibility concerning its content

具有时间约束和多样化变化的树拓扑的概率

确定生命之树的年代是一项比仅仅确定物种之间的进化关系复杂得多的任务。因此，开发适合处理未注明日期的系统发育树的方法是很有意义的。这项工作的主要结果是一种通过限制某些分歧时间属于给定时间间隔并允许某些进化枝上的多样化转移来计算马尔可夫多样化模型下未注明日期的系统发育树的概率的方法。如果所考虑的多样化模型是谱系同质的，则该计算的时间复杂度与系统发育树的物种数量成二次方，并且与时间约束的数量成线性。该计算方法的兴趣通过三个应用来说明，即： • 计算具有时间约束的树拓扑的发散时间的分布， • 直接对树拓扑的发散时间进行采样，以及 • 测试多样化在给定的进化枝上发生转变。

系统发育学、数据化、转变检测、多样化、生死过程

Submission: posted 30 January 2019
Recommendation: posted 09 January 2020, validated 10 January 2020

Cite this recommendation as:
Lartillot, N. (2020) Fitting diversification models on undated or partially dated trees. Peer Community in Evolutionary Biology, 100088. https://doi.org/10.24072/pci.evolbiol.100088

Recommendation

Phylogenetic trees can be used to extract information about the process of diversification that has generated them. The most common approach to conduct this inference is to rely on a likelihood, defined here as the probability of generating a dated tree T given a diversification model (e.g. a birth-death model), and then use standard maximum likelihood. This idea has been explored extensively in the context of the so-called diversification studies, with many variants for the models and for the questions being asked (diversification rates shifting at certain time points or in the ancestors of particular subclades, trait-dependent diversification rates, etc).
However, all this assumes that the dated tree T is known without error. In practice, trees (that is, both the tree topology and the divergence times) are inferred based on DNA sequences, possibly combined with fossil information for calibrating and informing the divergence times. Molecular dating is a delicate exercise, however, and much more so in fact than reconstructing the tree topology. In particular, a mis-specificied model for the relaxed molecular clock, or a mis-specifiied prior, can have a substantial impact on the estimation of divergence dates - which in turn could severely mislead the inference about the underlying diversification process. This thus raises the following question: would that be possible to conduct inference and testing of diversification models without having to go through the dangerous step of molecular dating?
In his article ""Probabilities of tree topologies with temporal constraints and diversification shifts"" [1], Gilles Didier introduces a recursive method for computing the probability of a tree topology under some diversification model of interest, without knowledge of the exact dates, but only interval constraints on the dates of some of the nodes of the tree. Such interval constraints, which are derived from fossil knowledge, are typically used for molecular dating: they provide the calibrations for the relaxed clock analysis. Thus, what is essentially proposed by Gilles Didier is to use them in combination with the tree topology only, thus bypassing the need to estimates divergence times first, before fitting a diversification model to a phylogenetic tree.
This article, which is primarily a mathematical and algorithmic contribution, is then complemented with several applications: testing for a diversification shift in a given subclade of the phylogeny, just based on the (undated) tree topology, with interval constraints on some of its internal nodes; but also, computing the age distribution of each node and sampling on the joint distribution on node ages, conditional on the interval constraints. The test for the presence of a diversification shift is particularly interesting: an application to simulated data (and without any interval constraint in that case) suggests that the method based on the undated tree performs about as well as the classical method based on a dated tree, and this, even granting the classical approach a perfect knowledge of the dates - given that, in practice, one in fact relies on potentially biased estimates. Finally, an application to a well-known example (rate shifts in cetacean phylogeny) is presented.
This article thus represents a particularly meaningful contribution to the methodology for diversification studies; but also, for molecular dating itself: it is a well known problem in molecular dating that computing and sampling from the conditional distributions on node ages, given fossil constraints, and more generally understanding and visualizing how interval constraints on some nodes of the tree impact the distribution at other nodes, is a particularly difficult exercise. For that reason, the algorithmic routines presented in the present article will be useful in this context as well.

References

[1] Didier, G. (2020) Probabilities of tree topologies with temporal constraints and diversification shifts. bioRxiv, 376756, ver. 4 peer-reviewed and recommended by PCI Evolutionary Biology. doi: 10.1101/376756

PDF recommendation

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
no declaration

Reviews

Evaluation round #2

DOI or URL of the preprint: https://doi.org/10.1101/376756

Author's Reply, 19 Dec 2019

Download author's reply

Dear Nicolas,

I uploaded the revised manuscript on BioRXiv (it is available on their site). I fixed the issues pointed out in the last review, except the remark about the paragraph at the end of page 7. Since Amaury Lambert does not plan to publish his alternative method, I briefly exposed it page 13 of the revision. The revised manuscript includes the modifications suggested in the e-mail from PCIEvolBiol (mentions of PCIEvolBiol in the aknowledgements section and in a footnote in the first page). I (temporarily) gave up using the PCI template since the combination font/text width led to issues in typesetting the formulae. Many thanks for your work.

with best regards,

Gilles

https://doi.org/10.24072/pci.evolbiol.100183.ar2

Decision by Nicolas Lartillot, posted 23 Nov 2019

Dear Gilles,

Your revised manuscript has been reviewed by Pr. Amaury Lambert. As you will see, only minor points remain to be fixed: if you could just have a look at them. In particular, I agree that it would be important to refer to the alternative method proposed by Amaury Lambert directly in the main text, referring to his review. The reviewing process is public, and thus it is probably a good thing to refer the Readers to it directly from the manuscript, so as to invite them to read it and compare the two algorithms.

We are very close to final acceptance. Once you have submitted your final version, I will proceed with the recommendation.

witth best regards,

nicolas lartillot

https://doi.org/10.24072/pci.evolbiol.100183.d2

Reviewed by Amaury Lambert, 19 Nov 2019

Download the review https://doi.org/10.24072/pci.evolbiol.100183.rev21

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/376756

Author's Reply, 04 Sep 2019

Download author's reply Download tracked changes file

Dear Nicolas,

Please find joined my detailed response to the reviewers comments and the revised version with the modifications in blue color.

Best wishes,

Gilles

https://doi.org/10.24072/pci.evolbiol.100183.ar1

Decision by Nicolas Lartillot, posted 24 Apr 2019

Dear Gilles Didier,

There is a general consensus among the reviewers that this manuscript represents an important contribution in the field of diversification studies. The algorithmic and computational results are potentially useful, and their derivation is tight and rigorous.

On the other hand, there is also a general feeling that, as it stands, the manuscript is very technical and does not sufficiently emphasize the intuitions behind the mathematical developments or the potential applications to specific research questions in diversification studies. In the end, there is a legitimate concern that this highly technical presentation will make the manuscript not accessible to most readers of the targeted audience and will not do justice to the practical significance of the work.

The reviewers have made several suggestions to improve the overall presentation and make it less arduous, among which:
- getting rid of the combinatorial factors related to the labelling of the tree, by labelling it from the start;
- doing the recursion only in terms of the constraints on node ages, leaving the piece-wise constant aspect of the model hidden in the details -- in fact, the whole derivation could even be conducted under a homogeneous birth-death, then just suggesting that the calculation could be generalized to arbitrary piecewise constant. or even other time-varying, versions of the process, without major modifications.
- using both simpler and more explicit notations;
- relying a graphical example for explaining the intuition behind the quadratic recursive algorithm (e.g. continuing on the example given in figure 3).

I agree with those suggestions. I would even go further, and suggest a different way to organize the manuscript: in the main text, a more general and more intuitive description of the main algorithmic ideas could be given, relying more heavily on a graphical example such as the one given in figure 3, and leaving all technical aspects of the derivation (much of the current main text) in an appendix. Then, as suggested by one of the reviewers, more emphasis could be put on the applications. This would give the reader with two options: either a fast track (to get the general idea and appreciate the significance of the work in terms of its potential applications), or the complete story, for the more theoretically inclined readers.

The english also needs improvement.

Of note: one of the reviewers point out an alternative integration method, which might have a better complexity as a function of tree size. This should probably be examined and discussed.

Concerning the application to testing for diversification shifts, I would have some additional comments:

(1) in practice, the shift time is not known, but one may have good fossil data giving an upper and/or lower bound for the age of the last common ancestor of the subclade. Similarly, the time of origin of the entire clade is not known either, but some interval constraint derived from fossil information might be available concerning the age of the root. I was wondering if the test could be designed so as to rely on this practically more relevant fossil information instead of relying on the knowledge of the shift time (and of the time of origin, which is fixed and assumed known, right?).

(2) comparing LambdaN with LambdaP is theoretically interesting, but not so useful in practice (since exact knowledge of divergence times, such as assumed by LambdaP, is lacking). In real-world applications, one would instead want to compare LambdaN with a plug-in version of LambdaP relying on an explicit dating of the tree obtained using relaxed clock approaches. In this context, a key question is whether LambdaN shows more robustness, without loosing so much in sensitivity. This point could be discussed.

(3) ideally, a empirical example could be presented based on a previously published case (this relates to the suggestion of one of the reviewers, to put more emphasis on the applications).

Additional requirements of the managing board:

As indicated in the 'How does it work?’ section and in the code of conduct, please make sure that (if adequate):
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad (to pay) or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”

All the best. The Managing Board of PCI Evol Biol.

https://doi.org/10.24072/pci.evolbiol.100183.d1

Reviewed by Amaury Lambert, 09 Apr 2019

Download the review https://doi.org/10.24072/pci.evolbiol.100183.rev11

Reviewed by anonymous reviewer 1, 15 Apr 2019

This manuscript focuses on the calculation of the joint probability density of a tree topology and internal node ages under the piecewise-constant birth-death and sampling model of diversification with shifts in the birth and death parameters during the course of evolution and time constraints. Being able to evaluate this density in an efficient manner is important as it is a the core of macro-evolutionary approaches that characterize the fluctuation of species diversity across taxa and time.

My comments are mainly about elements used to derive the main results and not about the main results themselves. First, I did not quite get what the times si, i=0..k exactly correspond to. They are not arbitrary values since s0 and sk are obviously not arbitrary times. When leaving s0 aside, they are neither random variables corresponding the times of sampling events or the times at which lineages die since, in Figure 1 left, there are four of these events but only two values of s (i.e., s1 and s2). Giving a precise definition for these times would help.

On page 6, the time tauni are not defined previously. Also, the relationship between the node ages and tree topology are not well defined in the current manuscript in my opinion. Indeed, a tree topology induces a partial ordering of internal node ages which interactions with the time constraints is not explicitly dealt with.

On Figure 3, my understanding is that the set of all start sets A with node b in A is {{b,a}}. I do not understand why the author considers then that the trees to the right of the summation sign represent the set of all start-sets with b in A (beside the fact that a set of start-sets is not a set of trees if I am not mistaken).

A brief illustration of how the quadratic computation works on the toy example of Figure 3 would probably be very helpful. Moreover, it was not clear to me whether the computation time would stay quadratic when increasing the number of shifts. It would be interesting to mention whether the proposed approach remains computationally efficient (or not) whenever the number of shifts increases.

On Section 7 onward, the author keeps referring to P(\tau, ... | \tau). I think the \tau on the left of the conditional should be removed. Also, corrolary 1 gives the cumulative density function for the age of a particular node. It is not obvious to me that obtaining the derivative of this function is straightforward (in order the get the pdf). I would recommend adding some explanations here. I also did not understand why birth, death and sampling parameters are considered here as three separate parameters as the birth-death-sampling process only has two identifiable parameters (see Equation 6 in Stadler, Journal of Theoretical Biology, 2009. 261: 58-66).

Moreover, it is not clear how one can derive the joint density of tree topology and all internal node ages from the results presented in this study. This joint density is needed in case one wants to use the piecewise constant birth-death-sampling model in standard phylogenetic inference using MCMC.

The caption of Figure 5 makes references to three row while only two are displayed here.

On page 15, Lemma ?? needs fixing.

https://doi.org/10.24072/pci.evolbiol.100183.rev12

Reviewed by Dominik Schrempf, 19 Feb 2019

Download the review https://doi.org/10.24072/pci.evolbiol.100183.rev13

User comments

No user comments yet

or Register
Submit a preprint