Retroviral Elements in Human Evolution and Neural Development

Human embryogenesis and the development of its most distinctive product, the human brain, are believed to be precisely regulated by factors acquired during human evolution that differentiate us from other species. Nevertheless, increasing evidence shows that an unexpected “alien” factor may have contributed to the process. Pervasive horizontal gene transfer between species, mediated by retroviruses, is one such defining factor of evolution [1]. Retroviral infections that occurred in germline cells allowed the viral genetic material to be transmitted vertically from parent to offspring. Once integrated into the host chromosome, these genes can disperse and persist as multiple mutated copies throughout the host genome. As a result, retroviral genes and other retro elements make up about 50% of the human genome. LINEs account for about 20% of the genome and HERVs for over 8%; the HERVs are relatively intact because they were acquired more recently [2]. From an evolutionary point of view, these retroviral elements have at least a few known functions that could benefit the human host. In general, the vast number of such “relic” genes in the genome can provide a buffer zone that protects functional genes from further viral infections and other mutagenic events. In addition, the similarity of their sequences and functions to those of infecting viruses provides more specific competition that limits further infection by related viruses [3]. These functions are supported by the disproportionate share of mutations and translocations found within retroviral elements compared with functional genes. Other functions that HERV proteins lend to the host include immune regulation, such as the immunosuppressive function mediated by a domain located in the transmembrane subunit of the HERV-W envelope [4,5]. In the present review, we focus on the effects of retroviral elements on human embryogenesis and neural development.

These retro elements include LINE-1 [6,7,8], the older HERVs, and the most recently incorporated HERV, HERV-K, which is also the best-preserved family in the human genome [9,10,11]. The oldest full-length envelope gene identified to date in humans, HEMO [human endogenous MER34 (medium-reiteration-frequency-family-34) ORF], was captured as part of a retrovirus, MER34, which was incorporated into the mammalian genome more than 100 Mya [12]. This provirus retains an intact open reading frame only for the envelope gene. The HEMO protein is detectable in the blood of pregnant women and is highly expressed in pluripotent stem cells and tumors [12]. Other relatively old HERV genes include two almost identical HERV-V envelope genes on chromosome 19, ENVV1 and ENVV2. These viral genes are found in simians and humans but not in prosimians. ENVV1 has an open reading frame of 477 amino acids and ENVV2 has an open reading frame of 535 amino acids. The two differ only at the C-terminus, where ENVV1 is truncated by ~60 amino acids because a single-nucleotide insertion causes a frameshift. Both ENVV1 and ENVV2 show placenta-specific expression in humans and in a baboon species [13]. The HERV-R (ERV3) envelope is also expressed in the human placenta, as well as in developing tissues such as the adrenal cortex, kidneys, tongue, heart, liver and CNS [14].
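The frameshift mechanism described for ENVV1 can be illustrated with a toy example. The following is a minimal sketch, not the actual ENVV1 or ENVV2 sequence: the short DNA string, the inserted base, and the insertion position are all hypothetical and were chosen only so that a single-nucleotide insertion changes the downstream codons and exposes a premature stop codon, shortening the C-terminus of the translated product.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy ORF (NOT a real HERV-V sequence).
wild_type = "ATGGCTCTGACCGCTGACTGGAAATAA"

# Insert one nucleotide after the third codon (position is an assumption).
insert_at = 9
mutant = wild_type[:insert_at] + "A" + wild_type[insert_at:]

wild_type_protein = translate(wild_type)
mutant_protein = translate(mutant)

print(wild_type_protein)  # MALTADWK
print(mutant_protein)     # MALNR
```

In this toy case the first three residues are unchanged, the residues after the insertion differ, and the shifted frame reaches a stop codon early, which mirrors in miniature how one inserted nucleotide can yield a C-terminally truncated envelope protein.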
HERV transcripts are increased during cell transformation and in human pluripotent stem cells [15]. We compared the expression of HERV elements in induced pluripotent stem cells (iPSCs) and differentiated neural cells using RNA-Seq analysis [16]. We found that 4,305 HERV-annotated regions of the human genome (hg38) were expressed in at least one of the cell types profiled; 1,302 regions were expressed exclusively in iPSCs and 574 regions were differentially expressed between one or more cell types. Most of the differential expression was between iPSCs and other cell types, suggesting that maximal expression of these genes occurs in early embryogenesis, after which they are silenced during neural development.
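The two simplest categories used above (regions expressed in at least one cell type, and regions expressed exclusively in iPSCs) can be sketched as set operations on an expression table. This is an illustrative sketch only and not the analysis pipeline of [16]: the region names, cell-type labels, expression values, and the cutoff of 1.0 are all hypothetical assumptions, and calling differential expression would additionally require a statistical test that is not shown here.

```python
# Hypothetical expression values (for example TPM) per HERV region and cell type.
# Region names, cell-type labels and numbers are invented for illustration.
expression = {
    "region_A": {"iPSC": 12.0, "NSC": 0.0, "neuron": 0.0},
    "region_B": {"iPSC": 8.5, "NSC": 6.1, "neuron": 0.4},
    "region_C": {"iPSC": 0.2, "NSC": 0.1, "neuron": 0.0},
    "region_D": {"iPSC": 0.0, "NSC": 3.3, "neuron": 5.0},
}

THRESHOLD = 1.0  # assumed cutoff for calling a region "expressed"


def expressed_in(profile: dict, threshold: float = THRESHOLD) -> set:
    """Return the set of cell types in which a region passes the cutoff."""
    return {cell for cell, value in profile.items() if value >= threshold}


# Regions expressed in at least one profiled cell type.
expressed_any = sorted(
    region for region, profile in expression.items() if expressed_in(profile)
)

# Regions expressed in iPSCs and in no other cell type.
ipsc_only = sorted(
    region
    for region, profile in expression.items()
    if expressed_in(profile) == {"iPSC"}
)

print(expressed_any)  # ['region_A', 'region_B', 'region_D']
print(ipsc_only)      # ['region_A']
```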
The mechanisms by which retro elements influence embryonic development are only beginning to be understood. The LTRs function in a stage-specific manner by activating transcription, altering protein-coding sequences, producing noncoding RNAs, and even supporting the evolution of new protein-coding genes, resulting in mRNAs, lncRNAs, or proteins with regulatory roles (46). At the 2-cell stage, many gene transcripts are initiated from ERV LTRs, such as MERVL (47). Some ERV families contain preserved splice sites that join the ERV segment with non-ERV exons in their genomic vicinity (48). The envelope protein of HERV-W is expressed in mammals as syncytin. It has been adopted by the human species as an essential cellular protein for trophoblast and placental development [1,17,18]. Another HERV, HERV-Fb1, encodes the protein suppressyn, which is expressed only in the placenta in vivo, competes with syncytin-1 for its receptor (ASCT-2), and inhibits syncytin-1-mediated trophoblast cell-cell fusion [19]. HERV-H activation is another marker for pluripotent stem cells [20] and plays a regulatory role in the stemness and differentiation potential of pluripotent stem cells [21]. HERV-H is expressed as a non-coding RNA that regulates transcription factors [22]. Along these lines, we have shown that the envelope protein of the HERV-K subtype HML-2 is expressed in human pluripotent stem cells but not in differentiated neural cells. HERV-K envelope regulates stemness and the differentiation potential of the cells by interacting with cell membrane molecules and cell matrix networks, leading to the activation of signaling mechanisms implicated in cell proliferation [16].

Retroviral Elements and Neural Development
The recently acquired HERV-K subfamily HML-2 has nearly full-length viral sequences with open reading frames for the gag, pro, pol, and env genes [23]. HML-2 activation has been observed in pluripotent stem cells [24], mesenchymal stem cells [25] and certain tumors [26]. We found that HML-2 envelope expression was increased in pluripotent stem cells but diminished during neuronal differentiation [16]. When HML-2 envelope expression was forced by transfecting iPSCs with an HML-2 env-containing plasmid, the neuronal induction process was inhibited, as indicated by the cells' morphology and lower nestin expression. Nestin is a marker of neural stem cells that is not expressed in pluripotent stem cells. Conversely, inhibition of HML-2 env in iPSCs facilitated induction of neural stem cells and eventually neuronal differentiation [16].
LINE-1s are abundant retrotransposons that comprise approximately 20% of mammalian genomes. Activation of LINE-1 occurs mainly in early embryonic development and during hippocampal neurogenesis [27]. Active LINE-1 retrotransposons can create insertions [28], deletions, and new splice sites in the genome [27,29]. Somatic LINE-1 retrotransposition during neurogenesis is a source of genotypic variation among neurons. The highest concentration of such LINE-1 activation in the adult human brain is in the dentate gyrus, the hotspot of adult neurogenesis [30]. A single-cell retrotransposon capture sequencing (RC-seq) study of individual human hippocampal neurons estimated that 13.7 somatic LINE-1 insertions occurred per hippocampal neuron. These genomic loci carried the sequence hallmarks of target-primed reverse transcription, suggesting pervasive LINE-1 mosaicism in hippocampal neurons [31].

Mechanisms of HERV Control During Embryonic and Neural Development
Many mechanisms regulate HERV expression throughout development and in differentiated cells. Transcriptional activity of HERVs is regulated by binding of epigenetic modifiers and transcription factors to the long terminal repeats (LTRs) that flank the 5' and 3' ends of the viral genome. Many HERVs exist only as solo LTRs, most likely due to homologous recombination between the 5' and 3' LTRs that results in deletion of the internal coding sequence [32]. Transcription profiles of HERV expression in healthy human tissues indicate that HERV proviruses that retain intact coding sequences are differentially expressed in a cell type-specific manner, with the highest expression observed in the thyroid glands, skin, reproductive organs and tissues of embryonic origin, and the lowest expression in non-dividing, terminally differentiated cells [33]. Aberrant expression of endogenous retroviral genes at an inappropriate embryonic stage, or in tissues where their transcription is normally suppressed, is strongly associated with development of disease. Therefore, control of HERV expression by epigenetic regulation is a crucial part of tissue homeostasis. DNA (CpG) methylation performed by DNA methyltransferase 1 (DNMT1) is an important mechanism of retroelement silencing, particularly for the most intact and transcriptionally active member, HERV-K/HML-2; the LTRs of HERVs are often hypermethylated in healthy tissues, and loss of CpG methylation results in increased HML-2 transcription that may contribute to oncogenesis [34,35]. In addition to DNA methylation, chromatin remodeling due to acetylation or methylation of histone tails plays an important role in regulating the accessibility of the LTR to transcription factors and other proteins. Specifically, trimethylation of lysine 9 on histone H3 (H3K9me3) by the lysine methyltransferase SETDB1 is strongly associated with repression of both HERV and LINE-1 elements. Loss of this histone mark resulted in global upregulation of retroelements in colorectal cancer cells [36]. These methylation patterns are established during embryogenesis, when members of the Krüppel-associated box zinc finger protein (KRAB-ZFP) family bind to HERV promoters in a sequence-specific manner and recruit the corepressor TRIM28/KAP1. A repressive complex that includes DNMTs and SETDB1 is then recruited to induce formation of heterochromatin [37].
It has been hypothesized that nucleosomal positioning plays a role in HERV transcription by a mechanism observed in latent HIV infections. In a transcriptionally inactive HIV promoter, a nucleosome is positioned immediately downstream of the transcription start site, which blocks access by the transcriptional machinery. Binding of specific transcription factors to enhancers in the LTR, or differential methylation or acetylation of the histones within the nucleosome, induces repositioning of the nucleosome to allow transcription [38]. HERV-K/HML-2 contains many intact promoter and enhancer elements in its LTRs that can regulate its expression [39]. Importantly, there is genetic variation in the LTR among HML-2 elements in the genome; therefore, the promoter, enhancer, and transcription factor binding sites at a given locus are variable. There is a positive correlation between HERV-K/HML-2 LTR sequence variation and promoter expression patterns [40]. Analysis of sequencing data from the 1000 Genomes Project showed that an active form of the chromosome 11p15.4 HML-2 locus is polymorphic in the human population, with an allele frequency of 51% [40]. In addition, the 3q12.3 locus was observed to be fixed in humans but absent from the orthologous locus in chimpanzees and gorillas [40]. These data suggest that differences in transcription factor binding sites between HML-2 LTRs may play a role in the differential expression seen among individuals.
Recent work in breast cancer cell lines has revealed a role for the progesterone-response element and the octamer-binding transcription factor 4 (OCT4) binding sites in some HML-2 LTRs [41]. An isoform of the progesterone receptor was found to bind the progesterone-response element in the LTR of HML-2. This binding was mediated by a physical interaction between the progesterone receptor and OCT4 [41]; however, the role of this interaction in brain development or disease has not yet been studied. The HML-2 promoter also contains several putative binding sites for the ubiquitously expressed RNA-binding protein TDP-43, which is dysregulated in neurodegenerative diseases such as amyotrophic lateral sclerosis (ALS) and frontotemporal lobar degeneration (FTLD) [42]. Overexpression of TDP-43 in iPSC-derived neurons caused upregulation of HML-2 transcripts and induced cytotoxicity, which suggests that TDP-43 may function as a transcriptional regulator of HERVs in the central nervous system. The HERV-K LTR also contains multiple transcriptional initiator sites, which can be targeted by other transcription factors such as microphthalmia-associated transcription factor. The latter is responsible for the significantly enhanced HERV-K expression in malignant melanoma [43]. In another case, an HML-2 insertion upstream of the PRODH gene exhibits tissue-specific enhancer activity, with maximal expression in the hippocampus. The enhancer activity is regulated by methylation and involves the binding of SOX2. PRODH encodes the mitochondrial proline dehydrogenase that regulates proline catabolism. PRODH is critical for normal CNS function and has been associated with schizophrenia [44].

Role of Envelope Proteins in Development
Retroviral envelope proteins are known to facilitate cell-to-cell adhesion and in some cases cause cell fusion. For example, HERV-W syncytin leads to fusion of trophoblasts to form the placenta [17,45]. Interestingly, HERV proteins often use membrane transport proteins as receptors, such as Alanine, Serine, Cysteine Transporter 2 (also named SLC1A5) for syncytin-1 (HERV-W) [46,47], and the CD98 heavy chain (also named solute carrier family 3 member 2) for HERV-K envelope [16]. Interactions between HML-2 envelope and the CD98 heavy chain lead to activation of signaling pathways in pluripotent stem cells that maintain cell adhesion and stem cell morphology [16]. Among them are the mTOR and LPCAT1 pathways. LPCAT1 is downstream of mTOR and catalyzes the conversion of lysophosphatidylcholine to phosphatidylcholine [48] and the palmitoylation of histone H4 to open the chromatin and maintain stemness [49]. HML-2 envelope expression is also associated with expression of ribosomal protein S6 (rpS6), a crucial effector of the mTOR signaling pathway [16]. Active mTOR regulates cell size [50] and cell proliferation by increasing protein synthesis and regulating ribosome biogenesis and autophagy, subsequently affecting cytosol viscosity [51]. Thus, HML-2 envelope is also critical for pluripotent stem cell function, acting through the evolutionarily conserved mTOR pathway. Increased mTOR activation in human outer subventricular zone radial glia is a critical factor distinguishing the human brain from those of non-human primates [52]. We observed that transfection of rhesus neural stem cells with HML-2 env resulted in high levels of rpS6 and LPCAT1 expression, further suggesting that HML-2 incorporation into the human genome increased mTOR activity in human stem cells and may have played a role in human brain evolution [16].

Role of Gag Proteins in Synaptic Function
The neuronal gene Arc/Arg3.1 plays an essential role in the consolidation of synaptic plasticity and long-term memory [53], and dysregulation of Arc is implicated in cognitive diseases [54]. Arc is highly conserved among mammals, birds, reptiles, and amphibians, but is not present in fish [55,56]. A crystal structure of the Arc protein revealed significant structural homology to the capsid region of the human immunodeficiency virus (HIV) gag protein [55]. Furthermore, Arc is known to contain an internal ribosomal entry site, a feature common in the translation of viral genes [57]. Sequence analysis indicated that Arc evolved from the Ty3/gypsy family of retrotransposons, which are present in the animal, plant, and fungal kingdoms [58]. During the domestication of Arc in vertebrates, the N-terminal domain, which mediates binding with synaptic proteins, was acquired, and the zinc knuckle and reverse transcriptase portions were lost [55,58].
In the presence of RNA, purified Arc protein spontaneously assembles into oligomeric, virus-like capsid structures [56]. Arc protein binds its own mRNA. Through release in extracellular vesicles, Arc transfers its mRNA between neurons. This mRNA can then be translated locally in the dendrites [56]. Disruption of this transfer in Drosophila results in aberrations in synapse maturation and activity-dependent plasticity [59]. Tetrapod and fly Arc genes originated independently, from distinct lineages of Ty3/gypsy retrotransposons, but both are involved in intercellular trafficking of RNA and are essential for proper neuronal function [56,59]. The human genome contains at least 85 genes that encode proteins resembling viral gag proteins [58]. Therefore, it is likely that additional retroviral gag elements have been co-opted for essential physiological functions. It is also possible that other viral elements, such as the viral protease, have played a role in the evolution of the human nervous system.

Neural Development
The RNA of LINE-1 is localized to the nucleus in embryonic stem cells and preimplantation embryos. DNMT1 is responsible for the high CpG methylation of LINE-1 lineages that are younger than 12.5 million years, corresponding to hominoid-specific elements, including many that are human-specific. Deletion of DNMT1 resulted in DNA hypomethylation, chromatin remodeling, and increased activation of these younger LINE-1s, which are responsible for transcriptional enhancement of many protein-coding genes involved in neuronal functions and psychiatric disorders [60]. LINE-1 neuronal transcription and retrotransposition are also increased in the absence of methyl-CpG-binding protein 2 (MeCP2), another protein involved in global DNA methylation [61].
LINE-1 functions as a nuclear RNA scaffold that recruits Nucleolin and Kap1/Trim28 to repress Dux, the master activator of a transcriptional program specific to the 2-cell embryo. In humans, there is a strong association between the establishment of accessible chromatin and embryonic genome activation. A large proportion of the early activated genes and HERVs are bound by DUX4 and become accessible as early as the 2- to 4-cell stages [62].
Many transcription factors important to neural development regulate, or are regulated by, LINE-1. For example, one LINE-1 element contains overlapping Sox2 and T-cell factor/lymphoid enhancer factor (TCF/LEF)-binding sites (Sox/LEF), which make up a transcriptional site regulated by Wnt and β-catenin signals. Wnt3a ligand and β-catenin increased acetylated histone H3 levels in the LINE-1 genomic region, inducing an active chromatin state and increasing the amount of LINE-1 ORF2 mRNA in neuronal stem cells. Wnt3a ligand and β-catenin signaling may also affect nearby genes such as DCX and neurogulin-4, which are important in regulating neural development [63]. The SOX11 protein also binds the LINE-1 promoter, inducing LINE-1 transcription under neural differentiation conditions [64].

Summary and Future Directions
We have presented evidence from recent literature that retroviral elements incorporated into the human genome have played important roles during human evolution. Although most of the insertions have been silenced, a few of the retroviral elements, such as HERVs and LINE-1s, can still be active in human embryogenesis and play important roles in human development. The fine regulation of HERV-K and LINE-1 is especially important in human brain development. HERV-K activation is a key event that regulates mTOR activation, a feature that may differentiate human beings from other primates. Additionally, the increased LINE-1 activity during neurogenesis contributes to neuronal diversity in the adult brain.
The regulation of retroviral elements is influenced by a general mechanism governing DNA methylation status, which is lower at certain stages of embryogenesis. There are also regulatory mechanisms specific to species, tissues, and even cell types that lead to activation of individual retroviral elements. Notably, retroviral elements also interact with one another, and these interactions may be either positive or negative. Our knowledge of the role of retroviral elements in brain development and disease is still rudimentary, but this is a fertile and promising area of investigation that can provide novel insights into disease pathogenesis and identify new targets for treatment.

Retro element activation and function in human embryogenesis and nervous system development.
Abbreviation: CD98HC, CD98 heavy chain.