Predicting small ancestors using contemporary genomes of large mammals
Reconstruction of body mass evolution in the Cetartiodactyla and mammals using phylogenomic data
Recommendation: posted 05 December 2017, validated 05 December 2017
Recent methodological developments and increased genome sequencing efforts have introduced the tantalizing possibility of inferring ancestral phenotypes using DNA from contemporary species. One intriguing application of this idea is to exploit the apparent correlation between substitution rates and body size: patterns of substitution rate variation among lineages, inferred from the genomes of extant species, are used to infer the body sizes of ancestral species.
The recommended paper by Figuet et al. examines the utility of such approaches by analyzing the Cetartiodactyla, a clade of large mammals that has mostly well-resolved phylogenetic relationships and a reasonably good fossil record. This combination of genomic data and fossils allows a direct comparison between body size predictions obtained from the genomic data and empirical evidence from the fossil record. If predictions seem good in groups such as the Cetartiodactyla, where there is independent evidence from the fossil record, this would increase the credibility of predictions made for species with less abundant fossils.
Figuet et al. analyze transcriptome data for 41 species and report a significant effect of body mass on overall substitution rate, synonymous vs. non-synonymous rates, and the dynamics of GC-content, thus allowing a prediction of small ancestral body size in this group despite the fact that the extant species that were analyzed are nearly all large.
A comparative method based solely on morphology and phylogenetic relationships would be very unlikely to make such a prediction. There are many sources of uncertainty in the variables and parameters associated with these types of approaches: phylogenetic uncertainty (topology and branch lengths), uncertainty about inferred substitution rates, and so on. Although the authors do not account for all these sources of uncertainty, the fact that their predicted body sizes appear sensible is encouraging, and the methods will undoubtedly become more statistically sophisticated over time.
 Romiguier J, Ranwez V, Douzery EJP and Galtier N. 2013. Genomic evidence for large, long-lived ancestors to placental mammals. Molecular Biology and Evolution 30: 5–13. doi: 10.1093/molbev/mss211
 Figuet E, Ballenghien M, Lartillot N and Galtier N. 2017. Reconstruction of body mass evolution in the Cetartiodactyla and mammals using phylogenomic data. bioRxiv, ver. 3 of 4th December 2017. 139147. doi: 10.1101/139147
Bruce Rannala (2017) Predicting small ancestors using contemporary genomes of large mammals. Peer Community in Evolutionary Biology, 100042. https://doi.org/10.24072/pci.evolbiol.100042
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #1
DOI or URL of the preprint: 10.1101/139147
Version of the preprint: 1
Author's Reply, 08 Nov 2017
Decision by Bruce Rannala, posted 30 Sep 2017
Here are my comments to the authors:
This is an important paper that just needs a few minor changes/clarifications. The authors should revise according to the recommendations of the two reviewers (myself and an anonymous reviewer). In particular, the anonymous reviewer and I both had some concerns about the uncertainty of the phylogeny. I would like to see a bit more analysis to determine whether incomplete lineage sorting may be a source of phylogenetic ambiguity for these data, and I would like to see the RAxML tree with branch lengths included in the paper (as suggested by the anonymous reviewer). Please also respond directly to the following comments by myself and the anonymous reviewer:
Page 12, Correlations of substitution rates/ratios and LHTs: I am not familiar with the COEVOL program, but if it is producing the posterior distribution of the correlation coefficient, why not provide the posterior mean and credible set rather than a p-value (which is not a very Bayesian thing to do)?
Tables 1 and 2: I think the legends must be reversed.
Figure 2: Are the points on this graph mean posterior dN/dS versus log_10(BM)? This should be stated in the legend.
l88: how strong are the reported correlations? l101: same question.
l151: I am not sure what this sentence means. Does it refer to a better reconstruction of ancestral LHT?
l169: Are you referring to phylogenetic inertia for leaf nodes, meaning that there is no need to compute a correlation among extant species before proceeding to the inference?
l197: a few words on the "home-made scripts" would be welcome. How do they filter out misaligned regions?
l209: any justification for the log_10 transformation?
l261: any insight on why this index was chosen rather than any other?
l289: Kr/Kc ratio... As it is not so standard, can you define it?
Table 1: what about reporting the median/mean/mode? Plots of the posterior densities would also be very informative regarding the strength and robustness of the estimates.
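As an illustration of the summaries requested in the comments on Page 12 and Table 1, the following is a minimal sketch of how a posterior mean, median, mode and equal-tailed credible interval could be computed from MCMC draws of a single parameter, such as a correlation coefficient. It does not use or reproduce COEVOL output: the function name `summarize_posterior`, its parameters, and the simulated draws are all hypothetical, and the mode is only a crude histogram-based estimate.

```python
import random
import statistics


def summarize_posterior(samples, level=0.95, bins=30):
    """Summarize MCMC draws of one scalar parameter (hypothetical helper)."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    xs = sorted(samples)
    n = len(xs)
    tail = (1.0 - level) / 2.0

    # Equal-tailed credible interval from empirical quantiles (nearest rank).
    lower = xs[int(round(tail * (n - 1)))]
    upper = xs[int(round((1.0 - tail) * (n - 1)))]

    # Crude mode estimate: midpoint of the fullest histogram bin.
    span = xs[-1] - xs[0]
    if span == 0:
        mode = xs[0]
    else:
        width = span / bins
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - xs[0]) / width), bins - 1)
            counts[idx] += 1
        fullest = counts.index(max(counts))
        mode = xs[0] + (fullest + 0.5) * width

    return {
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "mode": mode,
        "interval": (lower, upper),
        # Posterior probability that the parameter is positive.
        "prob_positive": sum(x > 0 for x in xs) / n,
    }


if __name__ == "__main__":
    # Simulated draws standing in for real MCMC output (assumption).
    random.seed(1)
    draws = [random.gauss(0.6, 0.1) for _ in range(5000)]
    for name, value in summarize_posterior(draws).items():
        print(name, value)
```

Reporting the interval alongside the point summaries conveys both the strength of a correlation and the uncertainty around it, which a p-value alone does not.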