Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology
ABC random forests for Bayesian parameter inference
Recommendation: posted 17 November 2017, validated 17 November 2017
It is my pleasure to recommend the paper by Raynal et al. (2017) on using random forests for parameter inference. The paper received two reviews, one written by Dennis Prangle and one by myself. Both reviews were positive, and their comments have been addressed in the current version of the preprint.
The paper nicely shows that modern machine learning approaches are useful for Approximate Bayesian Computation (ABC) and more generally for simulation-driven parameter inference in ecology and evolution.
The authors propose to use the random forest approach introduced by Meinshausen (2006) to perform quantile regression. The numerical implementation of ABC with random forests, available in the abcrf package, is based on the ranger R package, which provides a fast implementation of random forests for high-dimensional data.
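To make the quantile regression step concrete: in a quantile regression forest, the forest assigns a weight to each training response for a given query point, and conditional quantiles are read off the resulting weighted empirical distribution. The sketch below shows only that last step. The function name and the example weights are hypothetical, and it is not the abcrf or ranger implementation.

```python
def weighted_quantile(values, weights, q):
    """Return the q-quantile of the weighted empirical distribution.

    values  : responses of the training simulations (e.g. parameter values)
    weights : non-negative weights, assumed here to come from a fitted forest
    q       : quantile level between 0 and 1
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    cumulative = 0.0
    # Walk through the responses in increasing order and stop as soon as
    # the accumulated weight reaches the requested quantile level.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q * total:
            return value
    return max(values)  # guard against floating-point round-off


# Hypothetical weights: the last training response dominates the query point.
median_uniform = weighted_quantile([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25], 0.5)
median_skewed = weighted_quantile([1, 2, 3, 4], [0.1, 0.1, 0.1, 0.7], 0.5)
```

Taking the 2.5% and 97.5% weighted quantiles in this way would give an interval of the kind discussed in the recommendation.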
According to my reading of the manuscript, there are three main advantages to using random forests (RF) for parameter inference with ABC. The first advantage is that RF can handle many summary statistics, so dimension reduction is not needed.
The second advantage is very nicely displayed in Figure 5, which shows the main result of the paper. If correct, 95% posterior credibility intervals (C.I.) should contain 95% of the parameter values used in simulations. Figure 5 shows that posterior C.I. obtained with rejection are too wide compared to other methods. By contrast, C.I. obtained with regression methods are narrower. However, the shrinkage can be excessive for the smallest tolerance rates, with coverage values that can reach 85% instead of the expected 95%. The attractive property of RF is that the C.I. are shrunken while the coverage is 100%, resulting in a conservative decision about parameter values.
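The notion of coverage used here can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a toy conjugate model (standard normal prior, normal likelihood with unit variance) for which the exact posterior is known, and the `width_scale` argument is a hypothetical knob that artificially shrinks or widens the 95% interval.

```python
import random
from statistics import NormalDist


def empirical_coverage(width_scale=1.0, n_sims=5000, n_obs=10, level=0.95, seed=0):
    """Fraction of simulations whose true parameter lies in the interval."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)  # draw the "true" parameter from the prior
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        # Exact posterior for the normal-normal model: N(post_mean, post_var)
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = n_obs * xbar * post_var
        half_width = width_scale * z * post_var ** 0.5
        if post_mean - half_width <= theta <= post_mean + half_width:
            hits += 1
    return hits / n_sims


exact = empirical_coverage(width_scale=1.0)   # correct interval
shrunk = empirical_coverage(width_scale=0.7)  # too narrow: under-coverage
wide = empirical_coverage(width_scale=1.5)    # too wide: conservative
```

In this toy setting the correct interval covers close to 95% of the simulated parameters, the over-shrunken interval covers fewer, and the widened interval covers more, which is the sense in which a coverage above the nominal level is conservative.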
The last advantage is that no hyperparameter needs to be chosen. It is a parameter-free approach, which is desirable because choosing an appropriate acceptance rate can be difficult.
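For readers less familiar with ABC, the acceptance rate referred to here is the fraction of simulations retained by rejection-based methods. A minimal sketch of rejection ABC follows, assuming a toy model (uniform prior on [0, 10], normal observations with unit variance, sample mean as the single summary statistic). All names and settings are illustrative and none come from the paper.

```python
import random
from statistics import mean, pstdev


def rejection_abc(s_obs, tol_rate, n_sims=10000, n_obs=20, seed=1):
    """Keep the parameters whose simulated summary is closest to s_obs."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 10.0)  # draw from the prior
        s_sim = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        sims.append((abs(s_sim - s_obs), theta))
    sims.sort()  # smallest distance between summaries first
    n_keep = max(1, int(tol_rate * n_sims))
    return [theta for _, theta in sims[:n_keep]]


# The same simulations, two acceptance rates, two different posterior samples.
strict = rejection_abc(s_obs=4.0, tol_rate=0.01)
loose = rejection_abc(s_obs=4.0, tol_rate=0.5)
spread_strict, spread_loose = pstdev(strict), pstdev(loose)
```

The accepted sample is much more dispersed with the larger acceptance rate, which illustrates why the choice of this hyperparameter matters for methods that require it.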
The main drawback of the proposed approach concerns joint parameter inference. There are many settings where the joint parameter distribution is of interest, and the proposed RF approach cannot handle that. In population genetics, for example, the severity and the duration of a bottleneck should be estimated jointly because of identifiability issues. The challenge of performing joint parameter inference with RF might constitute a useful research perspective.
 Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A. 2017. ABC random forests for Bayesian parameter inference. arXiv 1605.05537v4, https://arxiv.org/pdf/1605.05537
 Meinshausen N. 2006. Quantile regression forests. Journal of Machine Learning Research 7: 983-999. http://www.jmlr.org/papers/v7/meinshausen06a.html
Michael Blum (2017) Machine learning methods are useful for Approximate Bayesian Computation in evolution and ecology. Peer Community in Evolutionary Biology, 100036. 10.24072/pci.evolbiol.100036
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #1
DOI or URL of the preprint: https://arxiv.org/abs/1605.05537
Version of the preprint: 3
Author's Reply, 15 Nov 2017
Decision by Michael Blum, posted 21 Sep 2017
Dennis Prangle and I have reviewed your paper about using random forests for parameter inference. We are both very positive about this paper, and I am willing to recommend it for PCI Evol Biol pending the minor modifications suggested by Dennis Prangle and myself.
Looking forward to receiving a revised version of this preprint.
With my best regards