Not all genetic elements composing genomes are there for the benefit of their carrier. Many have no consequences on fitness, or too mild ones to be eliminated by selection, and thus stem from neutral processes. Many others are indeed the product of selection, but of selection acting at a different level, increasing the fitness of some elements of the genome only, at the expense of the “organism” as a whole. These can be called selfish genetic elements, and come in a wide variety of flavours [1], illustrating many possible means to cheat “fair” reproductive processes such as meiosis, and thus get overrepresented in the offspring of their hosts. Producing copies of itself through transposition is one such strategy; a very successful one indeed, explaining a large part of the genomic content of many organisms. Killing non-carrier gametes following meiosis in heterozygous carriers is another one. Less known and less common is the ability of some elements to turn heterozygous carriers into homozygous ones, which will thus transmit the selfish elements to all offspring instead of half. This is achieved by nucleic sequences encoding so-called “homing endonucleases” (HEs). These proteins tend to induce double-strand breaks of DNA specifically in regions homologous to their own insertion sites. The recombination machinery is such that the intact homologous region, that is, the one carrying the HE sequence, is then used as a template for the repair of the break, resulting in the effective conversion of a non-carrier allele into a carrier allele. Such elements can also occur in the mitochondrial genomes of organisms where mitochondria are not strictly transmitted by one parent only, offering mitochondrial HEs some opportunities for “homing” into new non-carrier genomes. This is the case in yeasts, where HEs were first reported [2,3].
In this new study, based on genomic experimental data from the fungal maize pathogen Ustilago maydis, Julien Dutheil and colleagues [4] document one possible evolutionary pathway for which little evidence existed before: the passage of a mitochondrial HE into the nuclear genome. The GC content of this region leaves little doubt about its mitochondrial origin, and homologs can indeed be found in the mitochondrial genomes of close relatives. Strangely enough, U. maydis itself does not appear to carry this selfish element in its own mitochondria, suggesting it may have been acquired from a different species, or be subject to a sufficiently rapid turnover to have been recently lost.
Many elements of the story uncovered by this study remain mysterious. How, in the first place, was this HE gene inserted in a nuclear genomic region that shows no apparent homology with its original insertion site, making typical “homing” a not-so-likely explanation? This question may in fact be generalised to many HE systems: is the first insertion into a homing site always the product of a typical homing event, which implies the presence of a homologous template DNA fragment, or can HE genes insert through other means? But then, why specifically in regions that would be targeted by the nuclease they encode? What is the evolutionary fate of this newly inserted element? The new gene may well be on its way to pseudogenisation, as suggested by the truncation of its upper part, precluding its functioning as a HE, and the lack of evidence of selective constraints through dN/dS analysis; but the mutation generated by the insertion event may have phenotypic implications, possibly through the partial truncation of another gene, encoding a helicase. How old is this insertion? The fact that it has accumulated some mutations makes a very recent event rather unlikely, but this insertion has been detected in only one isolate of U. maydis, suggesting it is not frequent in natural populations.
Whatever the answers to these open questions, which will hopefully be addressed by further work on this system, the present study has revealed that horizontal transmission enlarges the scope of possible evolutionary consequences of HE genes, which may move not only between mitochondrial genomes, but also occasionally into a nucleus.
[1] Burt, A., and Trivers, R. (2006). Genes in Conflict: The Biology of Selfish Genetic Elements. Belknap Press.
[2] Coen, D., Deutch, J., Netter, P., Petrochillo, E., and Slonimski, P. (1970). Mitochondrial genetics. I. Methodology and phenomenology. Symposia of the Society for Experimental Biology, 24, 449–496.
[3] Colleaux, L., D’Auriol, L., Betermier, M., Cottarel, G., Jacquier, A., Galibert, F., and Dujon, B. (1986). Universal code equivalent of a yeast mitochondrial intron reading frame is expressed into E. coli as a specific double strand endonuclease. Cell, 44, 521–533. doi: 10.1016/0092-8674(86)90262-X
[4] Dutheil, J. Y., Münch, K., Schotanus, K., Stukenbrock, E. H., and Kahmann, R. (2020). The insertion of a mitochondrial selfish element into the nuclear genome and its consequences. bioRxiv, 787044, ver. 4 peer-reviewed and recommended by PCI Evolutionary Biology. doi: 10.1101/787044
Decision on “The transfer of a mitochondrial selfish element to the nuclear genome and its consequences” by Dutheil et al. (doi.org/10.1101/787044).
I thank the authors for their additional work, which has clarified a number of points. Although both reviewers consider the new version sufficiently revised, I must confess I still have some concerns that I try to summarise below. In brief, with the clearer overall picture we now have in hand, this genomic change looks very much to me like a very recent mutational event, with few, if any, evolutionary implications. It is of interest to note that some HEGs can occasionally insert somewhere in the nuclear genome, and I acknowledge that this insertion had to be investigated in more detail. But the finding that it is most likely on its way to pseudogenisation arguably mitigates the significance of this observation. I would find it unfair to simply not recommend this study for this reason, but I suggest below some further revisions to try to clarify this point, and to avoid giving the reader excessive expectations. I apologise in advance if some of the issues raised below stem from my too limited knowledge of the system. If this is the case, maybe some additional clarifications are needed to make the paper accessible to a wide readership.
I will start with a summary of what I understood, so that the authors can correct me wherever there is a misunderstanding:
strong similarity with mitochondrial sequences of close relatives (but not present in Ustilago maydis itself)
More specifically, this sequence is an insertion of a mitochondrial homing endonuclease. It is not clear how this insertion got there specifically.
(I have one question on this point, that can hopefully be addressed in the paper. The question is general at first: how do HEGs integrate IN THE FIRST PLACE at their homing site? I do understand how HEGs can spread through gene conversion following cleavage; BUT how does the first insertion occur at the target site? In my understanding, HEGs do not themselves carry the capacity to insert; they are DNA-cutting enzymes. Strangely, I did not find an answer to that seemingly basic question in the literature; did I misunderstand something? => The question then becomes more specific: how did this HEG insert at this site in the nuclear genome? The authors mentioned in their rebuttal letter that not much can be said on the insertion site; this should then be stated explicitly, saying that we have no idea how the HEG got there; obviously, the answer to the general question above will affect the answer to this specific one).
The gene is no longer acting as an HEG (anyway, it cannot be, since it is not inserted in a homing site; right?). It has lost the HEG active sites and its start codon. It does have a potential start codon, but it is not transcribed. The dN/dS is hard to estimate because the branch is very short, but it may be high. Most likely, it is on its way to pseudogenisation.
The integration of this sequence correlates with the possible truncation of a neighbouring gene (UMAG_11065), which has many paralogs in the genome. Some paralogs code for fully functional helicase enzymes. Maybe UMAG_11065 was truncated upon the insertion of UMAG_11064, maybe it was truncated before (but the latter hypothesis is not considered by the authors?). Now UMAG_11065 is expressed a little bit.
Experimental knockout of UMAG_11064 has no phenotypic consequence; the knockout of UMAG_11065 has some effects in some particular experimental conditions, which may be related to the loss of its activity (which does not mean this activity is a function maintained by selection). In brief, there is no evidence that UMAG_11065 is functional either.
UMAG_11064 is only found in this particular strain. This suggests it could be a very recent mutational event. Based on the phylogeny of UMAG_11064 and its homologues, the authors suggest the insertion may have predated the split between Sporisorium reilianum and Sporisorium scitamineum, but in fact, (1) the (S. reilianum, S. scitamineum) node is poorly supported and (2) it may just as well be that UMAG_11064 came very recently from another source, not sampled here (from a strain that would indeed branch at that position in the tree). The fact that it is not present in other natural populations argues against an ancient insertion, and against a new functional role.
Could the authors comment on this summary to help us reach a decision? In the additional comments below, I suggest some modifications on the basis of my understanding of the story.
The term “transfer” tends to suggest that the “mitochondrial selfish element” has retained its capacity to act as such. I would suggest the following: “Accidental insertion of a mitochondrial selfish element into the nuclear genome and its consequences”.
Some revisions of the abstract would reduce the risk of misunderstanding.
L18: “Some HE genes are found within Group I introns, where they further facilitate their excision”. Without further explanation, one wonders here in what sense the HEGs facilitate excision of the introns. The sentence would also suggest that HEGs alone are not selfish invasive elements, although they are. Being in an intron just reduces harmful effects. Overall, I would thus suggest removing this sentence from the abstract.
L22: “HE that integrated into”; I suggest adding again the adverb “accidentally”
L24: “or a horizontal transfer”. In some sense, transfer from mtDNA to the nucleus is already a horizontal transfer, even if this occurs from your own mitochondria. I would thus suggest “or a horizontal transfer from a different species”
L25: “acquired a new start codon,” => in fact, the start codon is just a remnant of a methionine codon that happened to be there; the current phrasing would suggest a new start codon was selected for. Something like this may be more appropriate: “The telomeric HE underwent mutations in its active site and lost its original start codon. A potential other start codon was retained downstream, but we did not detect significant transcription of the newly created open reading frame, suggesting the inserted gene is not functional.”
L29: the last two sentences starting with “This unusual homing event” are problematic in my view. “Creation of two new genes” seems inappropriate. The “homing” term can also be questioned because it should be restricted, in my understanding, to the conversion event following a DSB at the homing site. Here we rather have an accidental insertion event, through an unknown mechanism. Instead of those two sentences, I would suggest a more modest ending of the abstract. First, the absence of the insertion in other strains should be mentioned, as an indication that this event is likely recent and, in any case, not fixed. The abstract could end by stating that such mutations may be important in some cases, although this is not apparently the case here. Something like: “The absence of this insertion in other field isolates suggests it likely represents a recent mutational event, and brings no support for a putative adaptive significance. These findings indicate that mitochondrial HEGs can occasionally insert in the nuclear genome, a particular mutational event that may constitute a source of adaptation, although we found no support for such evolutionary implications in this case.”
L58: “using the HEG itself as a template”; if I understood correctly, what is used as a template is the homologous chromosome, which happens to carry the HEG. I find it slightly unclear to state that the HEG is used as a template: it just happens to be part of the template.
L119: “…which suggests that UMAG_11064 is an authentic nuclear gene.” It seems to me that the nuclear location of the gene is already well established at that stage by the genome assembly + PCR control. So, to me, this new piece of data (the fact that there is no copy of this gene in the mitochondria) should rather be seen as an argument that the gene was either lost from mitochondria following its transfer, or acquired from a different species.
L136: “The amino-acid sequence of UMAG_11064 matches the N-terminal…” Maybe an indication of the level of identity at the protein level would be useful here.
L163: “To further assess the possibility that the UMAG_11064 gene is evolving under positive selection, …”. I would suggest a rephrasing with something like: “the possibly high dN/dS seen in the UMAG_11064 branch could be explained either by relaxed purifying selection or by positive selection. To assess the validity of these two explanations…”
L189: “the cox1 gene seems to be a hotspot of Group I introns in smut fungi”; to make the idea more explicit, I would suggest: “… of Group I introns encoding HEGs in smut fungi”
L190: “Lastly, intron 1 in S. reilianum was not detected in U. maydis.” The next bit of this paragraph deals, if I understood well, with the UMAG_11064 gene alignment. But strangely the paragraph starts with this sentence about the absence of this gene in U. maydis (presumably in the CO1 gene?). It seems to me this information should not be presented here, or not in this way.
L199: “…detected 13 homologous sequences…”; maybe use “paralogous” here instead of “homologous” to make it clear that you are looking for homologous sequences in the same genome?
L202: Could the authors state why they go for phylogenetic reconstruction here? What do they want to know with this analysis? I can see why it was required for the inserted gene, but it is not so clear for this gene. If it is only part of the dN/dS analysis, it should not be presented and discussed in detail, and does not warrant a figure.
L208: “… but rather to its truncation”: but this assumes UMAG_04486 corresponds to the ancestral form of UMAG_11065. Are there good reasons to think this is the case?
L215: “Our results suggested, however, that the UMAG_11065 gene evolved under purifying selection (dN/dS ratio equal to 0.342)”; but it seems to me you do not know if this selection regime still holds after the insertion of UMAG_11064. The question is: is UMAG_11065 still functional and subject to purifying selection following this truncation? I suspect it is difficult to answer this question with the data in hand.
A remark on figure and supplementary figure legends: it is currently very difficult to follow which figure is which, since the figure numbers are not in the pdf.
Figure 6: It is slightly disturbing not to see UMAG_11065 on these pictures, considering it is the closest to UMAG_11064.
L233: “an ancestor of the two strains 518 and 521”; shouldn’t that be “an ancestor of the three strains 518, 521 and SG200”? But in any case, are all of these coming from a single spore in the lab? I think this should be emphasised: only one occurrence of this insertion was found, likely very recent.
L318: “potentially had non-neutral effects”; but this is highly hypothetical; the neutral explanation proposed for the other story by Louis and Haber seems to correspond rather well to the current one.
L321: “However, an alternative start codon was detected,…”; yes, but the gene is not expressed. This would tend to suggest it is not functional.
L333 and following ones: in my view, arguing for an adaptive role is too speculative based on the data in hand.
Dear Julien and colleagues,
Many thanks for submitting your manuscript “The transfer of a mitochondrial selfish element to the nuclear genome and its consequences”.
I now have in hand two reviews, which are both positive about your work, but also highlight some potential points for improvement.
I concur with the suggestions they have made. In particular, I feel a dN/dS analysis, as suggested by reviewer 1, may give support to your suggestion that the inserted gene may be on its way to pseudogenisation. I also found that reviewer 2 made some suggestions that could lead to strong improvements, and would also make your paper more accessible to non-specialists. Just like him, I was wondering what kind of selective forces may turn a HEG into an intron. In that respect, it may be useful to clarify if this type I intron is still able to act as a HEG in the lineages where it is present. In other words, is it correct to denote this element as a selfish element in its original location and in what respect? It may also make sense to ask why this particular nuclear site became a new insertion site. Can there be any prediction on the specificity of insertion sites? I also concur with Jan to say that horizontal transmission from a different species appears to be the easiest scenario; especially considering the fact that natural strains don't carry the mitochondrial or the nuclear version. Correct?
I also found that not enough emphasis was given to the finding that natural strains don’t appear to carry this insertion. If this is confirmed, the data may be interpreted as the result of lab rearing conditions, which may allow a slightly deleterious mutation, such as this insertion, to be maintained because of reduced population size or special environmental conditions. I would also suggest computing a tree of the various homologues; this may help the reader to understand the various plausible scenarios.
Finally, I have some minor remarks that are listed below.
Provided that these various comments are taken into account, including those mentioned in their evaluation, but which I have not reported here, I think your manuscript can be made suitable for recommendation by PCI.
Hoping that these comments will seem relevant to you.
L57: “As the recognized sequence is highly specific, the insertion typically happens at a homologous position”: it should be made clearer that specificity could target any region; but that only those targeting homologous positions do invade. Correct?
L73 (and abstract) “all kingdoms of life”: what living groups are referred to here? All domains (bacteria, archaea and eukaryotes) or more specifically different groups within eukaryotes?
How do HEGs invade mitochondrial genomes? Is there an equivalent to homologous double-strand break repair in mitochondria and chloroplasts?
L126: I assume this is nucleotide identity? Please confirm.
L127: “Two other very similar sequences…” are they also HEGs?
=> A tree showing the different homologs, their assigned functions and origins may be helpful here. This tree should also show the branch where the frameshift most likely occurred.
Sup tabs: I had difficulties when trying to visualise the tables because there are commas inside definition fields; tab-delimited fields would make them easier to read.
Are there good reasons to believe that the ancestor HEG targeted this insertion site?
L177: “suggesting that the latter was truncated because of the UMAG_11064 insertion.” But why would the insertion generate the truncation? Maybe “following” instead of “because of” would be more appropriate?
L184: “Interestingly, this gene family also contains the gene UMAG_03394…” In what sense is this interesting? Is there an implicit inference that the reader should make?
L197: “The UMAG_11072 gene…” what does this information tell us? Is this a positive control? Of what exactly?
L202: wouldn’t a secondary loss be equally likely?
One question that could be addressed in the discussion: is there a link between the loss of this intron in CO1 and this nuclear insertion?
With regard to the scenario proposed in figure 7: it seems to me that one should highlight that there is basically no selective explanation for any of the transitions that seem to have taken place. Why did this insertion remain? Why did CO1 lose this intron? Why was UMAG_11065 gene shortened?
L244: “but the former cannot have happened…” unless the nuclear insert comes from another species?
L249: not clear why this hypothesis is not presented as the most likely, on the basis of these divergence levels…
I am not sure the arguments are strong enough to suggest that this insertion has any effect, either positive or negative, on fitness. The fact that it appears to be polymorphic does not argue for a strong positive effect anyhow.
L287: “It likely represents a snapshot of evolution, when a mutational event occurred, but selection did not have time yet to act.” I don’t get this idea. Is it argued that there are selective effects, but very mild ones?
L289: “Its absence in any field isolates of U. maydis sequenced so far…” I had not noticed before that this is only found in the lab. To me, this substantially changes the take-home message: it is in fact very plausible that this mutation would be selected against in the field; more generally, the fact that there is no indication of any fitness consequence of this insertion would lead me to take this as an example of a special mutational event. The fact that natural strains don’t have the intron and don’t have the nuclear insertion either also argues against the view that the intron was transferred to the nucleus => more likely a horizontal transfer.
Could such a transfer have happened in the lab? Are the two species kept in close contact?
Additional requirements of the managing board:
Please ignore this message if you already took these requirements into consideration.
As indicated in the “How does it work?” section and in the code of conduct, please make sure that:
-Data are available to readers, either in the text or through an open data repository such as Zenodo (free), Dryad (to pay) or some other institutional repository. Data must be reusable, thus metadata or accompanying text must carefully describe the data.
-Details on quantitative analyses (e.g., data treatment and statistical scripts in R, bioinformatic pipeline scripts, etc.) and details concerning simulations (scripts, codes) are available to readers in the text, as appendices, or through an open data repository, such as Zenodo, Dryad or some other institutional repository. The scripts or codes must be carefully described so that they can be reused.
-Details on experimental procedures are available to readers in the text or as appendices.
-Authors have no financial conflict of interest relating to the article. The article must contain a "Conflict of interest disclosure" paragraph before the reference section containing this sentence: "The authors of this preprint declare that they have no financial conflict of interest with the content of this article." If appropriate, this disclosure may be completed by a sentence indicating that some of the authors are PCI recommenders: “XXX is one of the PCI XXX recommenders.”