Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology

Michael Blum ?

doi:10.24072/pci.evolbiol.100090

Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology

Michael Blum ? based on reviews by Dennis Prangle and Michael Blum ?

A recommendation of:

ABC random forests for Bayesian parameter inference

Louis Raynal, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P. Robert, Arnaud Estoup (2017), arXiv, 1605.05537v4 [stat.ME], ver. 4 peer-reviewed and recommended by Peer Community in Evolutionary Biology https://doi.org/10.48550/arXiv.1605.05537

Read preprint in preprint server Now published in a journal

Abstract

ZH-CN

Submission: posted 06 July 2017
Recommendation: posted 17 November 2017, validated 17 November 2017

Cite this recommendation as:
?, M. (2017) Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology. Peer Community in Evolutionary Biology, 100090. 10.24072/pci.evolbiol.100090

Recommendation

It is my pleasure to recommend the paper by Raynal et al. [1] about using random forest for parameter inference. There are two reviews about the paper, one review written by Dennis Prangle and another review written by myself. Both reviews were positive and included comments that have been addressed in the current version of the preprint.

The paper nicely shows that modern machine learning approaches are useful for Approximate Bayesian Computation (ABC) and more generally for simulation-driven parameter inference in ecology and evolution.

The authors propose to consider the random forest approach, proposed by Meinshausen [2] to perform quantile regression. The numerical implementation of ABC with random forest, available in the abcrf package, is based on the RANGER R package that provides a fast implementation of random forest for high-dimensional data.

According to my reading of the manuscript, there are 3 main advantages when using random forest (RF) for parameter inference with ABC. The first advantage is that RF can handle many summary statistics and that dimension reduction is not needed when using RF.

The second advantage is very nicely displayed in Figure 5, which shows the main result of the paper. If correct, 95% posterior credibility intervals (C.I.) should contain 95% of the parameter values used in simulations. Figure 5 shows that posterior C.I. obtained with rejection are too large compared to other methods. By contrast, C.I. obtained with regression methods have been shrunken. However, the shrinkage can be excessive for the smallest tolerance rates, with coverage values that can be equal to 85% instead of the expected 95% value. The attractive property of RF is that C.I. have been shrunken but the coverage is of 100% resulting in a conservative decision about parameter values.

The last advantage is that no hyperparameter should be chosen. It is a parameter free approach, which is desirable because of the potential difficulty of choosing an appropriate acceptance rate.

The main drawback of the proposed approach concerns joint parameter inference. There are many settings where the joint parameter distribution is of interest and the proposed RF approach cannot handle that. In population genetics for example, estimation of the severity and of the duration of the bottleneck should be estimated jointly because of identifiability issues. The challenge of performing joint parameter inference with RF might constitute a useful research perspective.

References

[1] Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A. 2017. ABC random forests for Bayesian parameter inference. arXiv 1605.05537v4, https://arxiv.org/pdf/1605.05537
[2] Meinshausen N. 2006. Quantile regression forests. Journal of Machine Learning Research 7: 983-999. http://www.jmlr.org/papers/v7/meinshausen06a.html

PDF recommendation

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
A.E. acknowledges financial support by the National Research Fund ANR (France) through the European Union program ERA-Net BiodivERsA (project EXOTIC) and the project ANR-16-CE02-0015-01 (SWING), and the INRA scientific department SPE (AAP-SPE 2016). C.P.R. is supported by an IUF 2016– 2021 senior grant.

Reviews

Evaluation round #1

DOI or URL of the preprint: https://arxiv.org/abs/1605.05537

Version of the preprint: 3

Author's Reply, 15 Nov 2017

Download author's reply

Decision by Michael Blum ?, posted 21 Sep 2017

Dear authors,

Dennis Prangle and myself have reviewed your paper about using random forest for parameter inference. We both are very positive about this paper and I am willing to recommand it for PCI Evol Biol pending slight or minor modifications suggested by Dennis Prangle and myself.
Looking forward receiving a revised version of this preprint.
With my best regards
Michael Blum

Reviewed by Dennis Prangle, 30 Aug 2017

Download the review

Reviewed by Michael Blum ?, 21 Sep 2017

Download the review