In this manuscript, FJ Novo integrated genome-wide "epigenetic" marks (histone modifications, DNA methylation, chromatin accessibility, transcription factor binding) with chromatin contact and gene expression data to detect putative regulatory elements in the human brain. The evolution of these elements was then studied by comparative genomics.
I am very sympathetic to the aims of this paper, and the starting point of integrating functional genomics in one species with comparative genomics is sound. But I was disappointed both by the results and by the writing. I recommend to the author
I was disappointed that all this functional genomics integration led to the study of only three genes. Moreover, while correlative evidence is sufficient to discuss large-scale patterns, I expect stronger evidence than that presented on page 8 before inferring the specific function of a regulatory element. This is especially important given the "manual inspection" step, which means that the analysis cannot be reproduced and is inherently subjective.
Page 10: the link with educational attainment is interesting, but it should be noted that such complex phenotypes, like size or life expectancy, can be affected by an extremely large number of pathways. Thus, this association does not in itself imply a role in the brain.
The manuscript systematically represents evolution as a progression from "lamprey or earlier species" to fishes, to "chicken onwards", which is erroneous. These are all present-day species, which have evolved for the same amount of time since their last common ancestor. We have no functional genomics data for the ancestral "earlier" species. It is possible, and interesting, to infer some of their characteristics from comparative data in a phylogenomic framework, but that is not done here.
"BRE1 is a vertebrate innovation appearing in Gnathostomes": since homology was determined by Blastn, it is possible that other species have an ortholog that is too divergent to be detected. Even for protein sequences, it is not unusual for Blastp to miss true orthologs that psi-Blast detects.
"We observed that coelacanth, spotted gar and elephant shark have orthologs for TANK, PSDM14 and TBR1 in the same order and orientation than mammals": how does this compare with the expectation for three randomly chosen genes?
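One way to frame this comparison is as a background rate: across all sets of three adjacent human genes, what fraction also appear as a consecutive block with consistent orientation in another species? The minimal sketch below assumes each genome is given as a single ordered list of (gene, strand) tuples, that orthologs share the same name across species, and that a fully inverted block counts as conserved; the function names and toy data are hypothetical and are not taken from the manuscript.

```python
def is_collinear_triplet(triplet, other_index):
    """True if three human genes are consecutive in the other genome, either in
    the same order with all strands preserved, or as a fully inverted block
    with all strands flipped (assumption: inversions count as conserved)."""
    positions, strand_kept = [], []
    for gene, strand in triplet:
        if gene not in other_index:  # no ortholog found: not conserved
            return False
        pos, other_strand = other_index[gene]
        positions.append(pos)
        strand_kept.append(strand == other_strand)
    start = positions[0]
    forward = positions == [start, start + 1, start + 2]
    inverted = positions == [start, start - 1, start - 2]
    return (forward and all(strand_kept)) or (inverted and not any(strand_kept))


def background_conservation_rate(human_order, other_order):
    """Fraction of all adjacent human gene triplets that are collinear in the
    other genome (hypothetical helper; one chromosome per list for simplicity)."""
    if len(human_order) < 3:
        raise ValueError("need at least three genes")
    other_index = {gene: (i, strand) for i, (gene, strand) in enumerate(other_order)}
    triplets = [human_order[i:i + 3] for i in range(len(human_order) - 2)]
    conserved = sum(is_collinear_triplet(t, other_index) for t in triplets)
    return conserved / len(triplets)
```

On real genome annotations, the observed TANK, PSDM14 and TBR1 block could then be set against this background rate for each species.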
It is surprising that the manuscript discusses a duplication in teleost fishes (pp. 11-12) without mentioning the teleost-specific whole-genome duplication, or the enrichment of transcription factors and brain-expressed genes among the genes retained in duplicate after it.
"The classical and largely outdated view of promoter-enhancer interactions suggested that a regulatory element would most likely regulate the activity of the closest gene": reference needed, or you risk attacking a straw man.
The manuscript by Francisco Novo, Identification and evolutionary analysis of eight non-coding genomic elements regulating neurodevelopmental genes, describes a detailed evolutionary analysis of candidate non-coding regulatory elements. Eight regulatory elements were selected based on their proximity to three genes – TBR1, EMX2, and LMO4 – which encode transcription factors likely to play roles in nervous system development. The bulk of the study describes an analysis of publicly available genomic data to identify the location of regulatory elements, combined with an effort to characterize the evolutionary origin of these candidate enhancers using a number of sequence-based analyses. Overall, this study is well done and will be of substantial interest to researchers in the field.
(1) The candidate enhancers selected for detailed analysis were largely chosen based on the frequency of contacts in Hi-C data collected from human fetal brain. Novo assumes that these regulatory elements, which bear the marks associated with enhancers and form loop interactions with the target genes of interest, regulate the transcription of these genes. Although there is mounting evidence to support the notion that such enhancers are likely to regulate expression of the candidate genes (see especially Fulco CP et al. (2016) Science, PMID# 27708057), there are undoubtedly exceptions to this assumption, and no direct functional validation is available for most of the regulatory elements in the present study. The manuscript would benefit from toning down the language that implies a causal relationship between candidate enhancers and the genes of interest (including in the title). Some discussion of the limitations of Hi-C data for this task, chiefly that it does not provide direct functional validation of enhancer activity, would also be useful.
(2) Special care should be taken when interpreting contact frequencies near the anchor points in the Virtual 4C plots (Figures 2, 4 and Supplementary Fig. S6), especially near EMX2 (Fig. 4). Hi-C data, and indeed all chromosome conformation capture data, show high signal between nearby regions, which lie along the “diagonal” of a Hi-C heatmap. Many authors interpret this signal as “background”. The Y-axis reads “Hi-C read value”, which I take to mean the un-normalized contact frequencies between two loci; it would help readers if the text stated clearly whether normalization was applied to correct for the decay in contact frequency with genomic distance that is commonly observed in Hi-C data. In either case, these contacts may still be biologically relevant, but this limitation should be considered carefully, and noted in the text, when interpreting the biological function of these putative loop interactions.
(3) Many enhancers in mammals recruit RNA polymerase II, which transcribes short, unstable non-coding RNAs (Kim et al. (2010) Nature, PMID# 20393465). Could the poorly characterized non-coding RNAs overlapping several of the BREs be such enhancer RNAs, transcribed from the enhancers themselves?
(4) The author tracks the evolutionary origin of DNA sequences identified as candidate enhancers in experiments in either human or mouse. Many enhancers that have an orthologous sequence in the genome of another species are not conserved at the functional level (see a variety of work by Duncan Odom’s lab, as well as others). While Novo is careful throughout the manuscript not to imply that DNA sequence conservation reflects functional conservation, an explicit statement in the text that there is a major disconnect between conservation at these two levels would be useful for readers.
In addition, several of the enhancers described herein are conserved at the DNA sequence level between human and mouse. In these cases, a direct comparison of publicly available functional data from human and mouse may help to determine whether function is also conserved.
(5) Fig. 4 would be easier to read if the positions of the BREs near EMX2 were indicated.
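Regarding the distance-decay normalization raised in point (2): one common correction divides each contact count by the mean count observed at the same genomic separation (the observed/expected, or O/E, transformation), often after matrix balancing. The minimal sketch below assumes a square, symmetric intra-chromosomal contact matrix binned at fixed resolution; it illustrates one standard choice among several and is not necessarily what was done in the manuscript. The function name is hypothetical.

```python
def observed_over_expected(matrix):
    """Divide each entry of a square, symmetric contact matrix by the mean of
    its diagonal, i.e. the mean contact count at the same bin separation."""
    n = len(matrix)
    # Expected contact count for each separation d (in bins).
    expected = []
    for d in range(n):
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        expected.append(sum(diagonal) / len(diagonal))
    normalized = []
    for i in range(n):
        row = []
        for j in range(n):
            e = expected[abs(i - j)]
            # Guard against empty diagonals (all-zero counts).
            row.append(matrix[i][j] / e if e > 0 else 0.0)
        normalized.append(row)
    return normalized
```

A virtual 4C profile from an anchor bin would then be the corresponding row of the normalized matrix, so that enrichment near the diagonal is judged relative to the distance-matched expectation rather than to raw counts.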