Genomic epidemiology seeks to better understand the transmission dynamics of infectious pathogens using molecular sequence data. Phylodynamic methods have given genomic epidemiology new power to track the transmission dynamics of pathogens by combining phylogenetic analyses with epidemiological modeling. In recent years, applications of phylodynamics to chronic viral infections such as HIV and hepatitis C virus (HCV) have provided some of the best examples of how phylodynamic inference can provide valuable insights into transmission dynamics within and between different subpopulations or risk groups, allowing for more targeted interventions.
However, conducting phylodynamic inference under complex epidemiological models comes with many challenges. It is not always straightforward or even possible to perform likelihood-based inference. Structured SIR-type models where infected individuals can belong to different subpopulations provide a classic example. In this case, the model is both nonlinear and has a high-dimensional state space due to tracking different types of hosts. Computing the likelihood of a phylogeny under such a model involves complex numerical integration or data augmentation methods [1]. In these situations, Approximate Bayesian Computation (ABC) provides an attractive alternative, as Bayesian inference can be performed without computing likelihoods as long as one can efficiently simulate data under the model to compare against empirical observations [2].
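The basic rejection form of ABC can be sketched on a toy problem. This is only an illustration of the general idea, not the regression-ABC procedure used by Danesh et al.; the exponential simulator, the uniform prior range, the tolerance, and all function names are assumptions chosen for the example.

```python
import random
import statistics


def simulate(rate, n, rng):
    """Toy simulator (assumed for illustration): n exponential waiting times."""
    return [rng.expovariate(rate) for _ in range(n)]


def summary(data):
    """A single summary statistic: the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed_summary, n_obs, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.1, 5.0)  # draw from a hypothetical uniform prior
        simulated_summary = summary(simulate(rate, n_obs, rng))
        if abs(simulated_summary - observed_summary) < tolerance:
            accepted.append(rate)  # no likelihood is ever evaluated
    return accepted


rng = random.Random(1)
true_rate = 2.0
observed = simulate(true_rate, 200, rng)
posterior = abc_rejection(summary(observed), 200, 5000, 0.02, rng)
print(len(posterior), statistics.fmean(posterior))
```

The accepted draws approximate the posterior distribution of the rate. In a phylodynamic setting the simulator would instead generate phylogenies under the epidemiological model, and the summary statistics would be computed from the trees.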
Previous work has shown how ABC approaches can be applied to fit epidemiological models to phylogenies [3,4]. Danesh et al. [5] further demonstrate the real-world merits of ABC by fitting a structured SIR model to HCV data from Lyon, France. Using this model, they infer viral transmission dynamics between “classical” hosts (typically injected drug users) and “new” hosts (typically young MSM) and show that a recent increase in HCV incidence in Lyon is due to considerably higher transmission rates among “new” hosts [5]. This study provides another great example of how phylodynamic analysis can help epidemiologists understand transmission patterns within and between different risk groups, and of the merits of expanding our toolkit of statistical methods for phylodynamic inference.
[1] Rasmussen, D. A., Volz, E. M., and Koelle, K. (2014). Phylodynamic inference for structured epidemiological models. PLoS Comput Biol, 10(4), e1003570. doi: https://doi.org/10.1371/journal.pcbi.1003570
[2] Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4), 2025-2035.
[3] Ratmann, O., Donker, G., Meijer, A., Fraser, C., and Koelle, K. (2012). Phylodynamic inference and model assessment with approximate Bayesian computation: influenza as a case study. PLoS Comput Biol, 8(12), e1002835. doi: https://doi.org/10.1371/journal.pcbi.1002835
[4] Saulnier, E., Gascuel, O., and Alizon, S. (2017). Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study. PLoS Comput Biol, 13(3), e1005416. doi: https://doi.org/10.1371/journal.pcbi.1005416
[5] Danesh, G., Virlogeux, V., Ramière, C., Charre, C., Cotte, L., and Alizon, S. (2020). Quantifying transmission dynamics of acute hepatitis C virus infections in a heterogeneous population using sequence data. bioRxiv, 689158, ver. 5 peer-reviewed and recommended by PCI Evol Biol. doi: https://doi.org/10.1101/689158
Given that the manuscript has already gone through two rounds of review and that the reviewers' major concerns from the second round have largely been addressed, I have decided not to send the manuscript out for another round of review. However, there are still a few issues I hope the authors can quickly address before I write my recommendation.
It's stated that the origin of the epidemic in classic hosts is estimated to be in 1957 but that it was not possible to estimate when the epidemic in 'new' hosts started. But doesn't the 1957 estimate reflect the MRCA of all samples, regardless of whether they are "classic" or "new"?
Line 119 goodness-of-fit [test]?
Fig. 3: Do the summary statistics from the true phylogenies fall within the regions predicted by posterior simulations for individual statistics or just for the PCA on the summary statistics? Would it be more convincing to show the original summary statistics?
Lines 192-194: This is hard to understand. Why would adding stages of infection make it "almost impossible to simulate phylogenies"?
Lines 201-203: "Although the multi-type birth-death model is unlikely to be directly applicable... because it links the two epidemics via mutation... whereas in our case the linking here the links is done via transmission events". This is not true. The multi-type birth-death model can handle type changes due to transmission or mutation/migration. Please remove!!
Lines 204-205: "We were unable to conclude anything from this analysis which rises the limitation of the likelihood-based approach for this dataset". In fairness to likelihood-based approaches, it is probably worth noting why the MTBD models implemented in BEAST did not work on this data set. In the response letter, the authors say that this is due to poor mixing. But is this due to difficulties in jointly estimating the phylogeny and evolutionary parameters along with the epidemiological parameters? Does the MCMC converge if the phylogeny is fixed (as was done for ABC)?
Both of the original reviewers have now reviewed your revised manuscript. Unfortunately, both still feel that their original concerns were not adequately addressed or deserve more attention.
I strongly agree with the opinion of the reviewers, especially since their criticisms concern the main findings of the paper regarding transmission dynamics in the different risk groups. I would especially urge the authors to carefully address the concerns of the reviewers with respect to:
1) Sampling differences between the "new" and "classic" risk groups and how this impacts the estimated epidemic growth rates in each group.
2) The comments from Reviewer #2 about the priors on the ratio of the transmission rates between risk groups. I agree with the reviewer here that the priors should give symmetric or equal prior probability to either risk group having a higher transmission rate.
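One way to obtain such a symmetric prior on a ratio of transmission rates is to make it uniform on the log scale over an interval centered on zero, so that ratios above and below one are equally probable a priori. The following is a minimal sketch; the tenfold bound and the function name are assumptions for illustration and are not taken from the manuscript.

```python
import math
import random


def sample_ratio(rng, max_fold=10.0):
    """Draw a ratio from a log-uniform prior on [1/max_fold, max_fold].

    max_fold is a hypothetical bound chosen for this example.
    """
    log_bound = math.log(max_fold)
    return math.exp(rng.uniform(-log_bound, log_bound))


rng = random.Random(0)
draws = [sample_ratio(rng) for _ in range(20000)]

# Under this prior, roughly half of the mass lies on each side of a ratio of one.
fraction_above_one = sum(d > 1.0 for d in draws) / len(draws)
print(round(fraction_above_one, 3))
```

By contrast, a prior restricted to values greater than one assigns zero probability to the scenario in which the other risk group has the higher transmission rate, so the posterior cannot support that scenario whatever the data show.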
I have read over your revised manuscript on bioRxiv and I would like to send it back out for review. However, I noticed that several very helpful and productive comments made by the reviewers, who are very knowledgeable experts in this field, were not addressed. For example, the prior on nu in Figure 2 is still constrained to be greater than one, so it is technically impossible for the new hosts to have an estimated transmission rate less than classic hosts. This is, however, just one example; it appears that several other helpful suggestions made by the reviewers were also ignored.
To make the most of the reviewers' time, please submit another version either with further edits or a response letter detailing why you have not followed the recommendations of the reviewers.
Your preprint has been reviewed by two experts with substantial experience in the field of viral phylodynamics. Both the reviewers and I appreciate that identifying populations and risk factors driving the transmission dynamics of HCV and other chronic infections is an extremely important topic relevant to public health. However, both reviewers raise substantial concerns about the analysis that I believe need to be addressed before I can offer a recommendation. In particular, one reviewer raises serious concerns about the quality of the phylogenetic reconstruction and therefore the conclusions that can be drawn from a single ML tree. The other reviewer points out that some of the main conclusions, such as in which risk group the epidemic is growing faster, are not clearly supported by the data and may in fact be largely influenced by the authors' choice of priors (i.e. no prior support is given to the alternative scenario that the epidemic is growing faster in 'classic' hosts). Both reviewers also point out that the model needs to be more clearly defined, including how individuals are classified into risk groups, and I would suggest briefly describing the model and how different host types are defined before the results section in your revision.
Many of the reviewers' concerns could be addressed by performing a second analysis using the multi-type birth-death models implemented in the BDMM package in BEAST 2. The authors make a point in the Discussion that birth-death models might not be applicable here because the two epidemics are linked by transmission rather than mutation, but the 'types' assigned to lineages can either represent the type of the host (as in new or classical) or the type of pathogen (mutant or non-mutant) under multi-type birth-death models. Performing this additional analysis with BDMM would allow the authors to compare their ABC methods with more traditional likelihood-based phylodynamic methods, which would lend trust to the authors' conclusions, especially since ABC methods are still in their infancy and many readers might be interested in this comparison. Furthermore, fitting a multi-type birth-death model in BEAST would allow for joint inference of the phylogeny with the epidemic parameters, addressing the first reviewer's point about tree uncertainty.
In addition to the reviewers' many thoughtful comments, I would add the following points as well:
Line 64: "date of the second epidemic" -- what is this second epidemic?
Line 54: "The width of the posterior distribution indicates our ability to infer a parameter" -- This is not necessarily true... the width of the posterior could be very wide with very long tails, but most of the posterior density could still be centered around a narrow range of values.
Bayesian should be capitalized throughout.
Could the authors comment on why the infectious period was inferred to be so short for "new" hosts?
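The point above about Line 54 can be illustrated numerically. In the sketch below, a made-up posterior sample has most of its mass tightly concentrated around one value plus a small long-tailed component, so the 95% interval is wide even though the central 50% of the mass is very narrow. The mixture weights and parameter values are assumptions chosen purely for illustration.

```python
import random


def quantile(sorted_values, q):
    """Empirical quantile by index into an already sorted list."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


rng = random.Random(42)
samples = []
for _ in range(20000):
    if rng.random() < 0.9:
        samples.append(rng.gauss(1.0, 0.05))   # tightly concentrated bulk
    else:
        samples.append(rng.uniform(0.0, 20.0))  # diffuse long-tailed component
samples.sort()

width_95 = quantile(samples, 0.975) - quantile(samples, 0.025)
width_50 = quantile(samples, 0.75) - quantile(samples, 0.25)
print(round(width_95, 2), round(width_50, 2))
```

Reporting a narrower interval (or the full density) alongside the 95% interval would therefore say more about how well a parameter is identified than the overall width alone.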
Additional requirements of the managing board:
Please ignore this message if you already took these requirements into consideration. As indicated in the 'How does it work?' section and in the code of conduct, please make sure that:
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad (to pay) or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”