Differential Expression of Long Non-coding Ribonucleic Acid (RNA) Genes in Endometriosis-associated Ovarian Cancer (EAOC): A Pilot Meta-analysis for Pathological Insights and Potential Diagnostic Biomarker Identification

Introduction: Endometrioid and clear cell carcinomas of the ovary are the most common subtypes of epithelial malignancy arising from endometriosis and are often termed endometriosis-associated ovarian carcinomas (EAOCs). There is a paucity of experimental evidence in the medical literature regarding the role of long non-coding ribonucleic acid (RNA) gene expression in the pathogenesis of these carcinomas. Purpose: There is a need to develop understanding of the pathogenesis of these carcinomas for neoplastic risk stratification in endometriosis and to develop novel diagnostic biomarkers. Clear cell carcinoma of the ovary, in particular, has a poor prognosis as a result of resistance to standard platinum-based chemotherapy. Methods: RNAseq datasets from EAOCs were downloaded from Gene Expression Omnibus (GEO) and compared with normal ovarian control sequences using a customized bioinformatic pipeline. Results: We found 88 differentially expressed non-coding RNA molecules present in both endometrioid and clear cell carcinoma types compared with controls. A further 117 were specifically differentially expressed in the endometrioid carcinoma group and 128 in clear cell carcinoma samples alone. Genes of interest for further study from the 88 shared set in both EAOC types include CASC9, RP4-561L24.3, SLC2A1-AS1, LUCAT1, XIST, CASC15, and MIR99AHG. These genes appear to influence ferroptosis as a common pathway. Conclusions: Alterations in the ferroptosis pathway may be a key event in development of EAOC in ovarian endometriosis patients. Further work is required to elucidate the function of the candidate RNA genes identified in this study by in-vitro, cell line and cultured organoid experiments. These candidate RNA gene biomarkers have potential clinical utility in early diagnosis, risk stratification of endometriosis, and post-surgical monitoring.


Clinical pathology
Ovarian cancer affects 15 women per 100,000 in Europe [1] but it is not one disease. Epithelial malignancy the most common type of ovarian malignancy and defines the groups termed carcinoma [2,3]. Other malignant subtypes include sarcomas, germ cell tumors and sex-cord stromal tumors [1]. Ovarian carcinomas are subdivided based on histological features, the most common being high-grade serous carcinoma which makes up around 70% of ovarian carcinomas [2,4]. Up to 10% of ovarian carcinomas are endometrioid subtype, having phenotypic and molecular resemblances to endometroid adenocarcinomas that arise in the endometrial cavity [3,5] Clear cell carcinoma (OCCC) and endometrioid carcinoma of the ovary (EnOC) occur on a background of ovarian endometriosis in as many as 70% of cases [6,7]. OCCC is equally as common as endometroid type, perhaps reflecting this shared origin and collectively are often referred to as Endometriosis-Associated Ovarian Carcinomas (EAOCs). Mucinous carcinoma is less common than endometriosisrelated carcinomas at around 3% of ovarian carcinomas [2]. Mucinous and low-grade serous carcinomas are rare. Lowgrade serous carcinoma has a distinct molecular origin from high-grade serous carcinoma and is regarded as entirely separate entity despite the similarity in their names [2, 4,8].
Ovarian endometriosis is a common estrogen dependent disease affecting up to 10% of reproductive-age women around the world [18,19]. It is characterized by the presence of endometrial glands and stroma in sites outside the endometrial cavity [19]. There is an increased lifetime risk of ovarian endometriosis progressing to malignancy of around 1% [20][21][22]. There is genomic and histological data to suggest that malignancy occurs through an intermediate, dysplastic stage called atypical endometriosis [20,[23][24][25][26].There is genomic, gene expression, and immunohistochemical evidence to support the theory that endometriosis is a premalignant condition [27][28][29][30][31][32] (Table 1).

Non-coding RNA
There is emerging data to suggest that elements of the human non-coding genome make a contribution to the pathogenesis of endometriosis-associated ovarian cancers (EAOC) [37]. The non-coding genome plays a part in the development of malignancies across a range of tumor types through transcriptional regulation and control of protein translation by non-coding RNA molecules [38][39][40].
The RNA molecules responsible for regulating the protein coding genome are divided into long (more than 200 nucleotides) and short molecules (<200 nucleotides). Small RNA molecules include microRNAs (miRNAs) which can direct messenger RNA for degradation before translation. piRNA molecules are PIWI-protein interacting and responsible for silencing transposons in the human genome [41]. Short RNA molecules may be derived from transfer RNA molecules (tsRNA) and these can stabilize messenger RNA for translation in opposition to microRNAs [42]. LncRNA genes play a role in human carcinogenesis by binding to and regulating transcription factors for protein coding genes, inactivating microRNAs that target messenger RNA transcripts for degradation, modifying protein function and cellular localization, influencing chromatin and histone modification, and regulating alternative splicing of mRNA [43]. These functions can affect a number of cell-signaling pathways in the development of cancer including control of cell proliferation, apoptosis and propagating epithelial-mesenchymal transition which is said to confer the ability of epithelial cells to invade connective tissue and metastasize [44][45][46][47][48] (Figure 1).

Long non-coding RNA
It has been shown that many of the non-coding somatic mutations present in EAOC converge on the PAX8 pathway in a range of ovarian cancer subtypes including endometrioid and clear cell subtypes [49]. Endometrial endometrioid adenocarcinoma of the uterine corpus has overlapping molecular pathogenetic characteristics compared with endometrioid adenocarcinoma of the ovary [50]. LncRNA molecule MALAT1 has been shown to be involved in the pathogenesis of endometrioid adenocarcinoma arising from the endometrial cavity by promoting epithelial-mesenchymal transition [51]. NEAT 1, OVAAL, H19, and HOTAIR have also been shown to have altered expression profiles in endometrioid adenocarcinoma [52][53][54][55][56].
Other lncRNA molecules that have been implicated in ovarian carcinogenesis are derived from studies that do not specify the histological subtype of ovarian malignancy. This is largely due to the use of ovarian carcinoma cell lines, most of which derive from high-grade serous carcinoma. The lncRNA   [56][57][58][59][60][61].
This study is a meta-analysis of published RNA sequencing (RNA-seq) data sets generated through high-throughput sequencing methods for differential expression analysis using a customized bioinformatics pipeline. The aim was to document the differential gene expression profile of EAOC with focus on lncRNA genes. Secondarily, the function and pathway involvement of these lncRNA genes was to be sought from in silico tools and databases for insights into EAOC pathogenesis.

Methods
This study is a meta-analysis of data generated by RNA sequencing by other researchers posted in a public access online database for the purpose of further analysis.

Data set identification
An online search of the NCBI (National Center for Biotechnology Information) gene expression omnibus (GEO) [62] repository was performed using keywords ovary, endometriosis-associated ovarian carcinoma, ovarian cancer, atypical endometriosis, endometriosis, clear cell carcinoma and ovarian endometrioid carcinoma to identify suitable RNA-seq datasets. The search results were further filtered using the terms 'homo sapiens' , and 'expression profiling by high throughput sequencing' . Tissue samples from ovarian carcinomas other than EAOCs, metastatic disease, fetal and embryonic tissues, fluid cytology samples, stem cells, circulating tumor cells and cell lines were excluded. Application of exclusion criteria yielded 1960 human RNA-seq data sets; five were for normal ovarian tissue (Geo Accession numbers GSM1010948, GSE127873, GSE137608, GSE135485, GSE18927), four were for endometriosis-related controls (eutopic and ectopic endometrial tissues and normal endometrium from healthy patients; accession numbers GSE118928, GSE99949, GSE87809, GSE87810), and one for EAOC samples (GSE121103). There were no RNA-seq data sets available that included atypical endometriosis samples.
Datasets were downloaded using the NCBI SRA-toolkit.

Clinical details of ovarian carcinoma tissue samples
Patient information for each of the source samples used to generate the RNA-seq data for GSE121103 is given in Table  2. One of the endometrioid carcinoma samples (EnOC) failed to generate reads of sufficient quality for publication to GEO but the investigators do not specify which of the samples this refers to in their paper [37].

Bioinformatic pipeline
Fastq files are first assessed for quality using FastQC [63]. The RNA sequencing reads were aligned to the reference genome using STAR aligner using the GeneCounts argument for the quantMode flag, which generates a gene count table file labelled ReadsPerGene.out.tab. The resulting alignment from STAR generates a file of summary mapping statistics annotated as Log.final.out. This gives an indication of the quality of the sample analyzed by reflecting the proportion of input reads, the average RNA molecule read length versus the number of unmapped and chimeric reads and mismatch rate.
A data matrix of factors informed the normalization step which was carried out by using the DESeq2 package. DESeq2 adjusts for the variation in expression counts according to variation in the read depth for each gene by using a generalized linear  [64]. Following normalization, DESeq2 performs differential expression analysis to calculates the fold change in expression for each transcript between a control sample set and a sample of interest. DESeq2 calculates a P-value, corrected for multiple sampling, to indicate whether the fold change is statistically significant. Statistical significance was set at the 0.05 level. The gene lists were filtered for base mean expression (absolute expression level) >10 and log fold change >2.

Biological interpretation
The location-based display function in Ensembl was used to find the region detail map for each lncRNA gene and interrogated to find the nearest protein coding gene according to current annotations in the GRCh38 reference [65,66]. There is evidence that most long non-coding transcripts exert their transcriptional influence by acting on protein coding genes in cis, that is to say, by acting upon genes that are located near to them in the genome [67][68][69][70]. RNA databases 'RNAcentral' , 'Rfam' , 'NONCODE' , 'LNCipedia' , 'LNCBook' , and 'lncrnadb' were also interrogated to provide up-to-date information regarding all aspects of lncRNA genes identified. Online bioinformatics tools and databases, including NCBI, Ensembl, OMIM, Clinvar, Genecards, RISE, Diana and PubMed were used to identify functional annotations and pathway interactions for lncRNA transcripts and their cis-located protein coding genes [71][72][73][74][75][76][77][78].

Ethical considerations
Meta-analysis was chosen as the method of study, in part, due to the temporal constraints on ethical review but also because of financial limitations. Specific ethical approval was not required for this study as it was a re-analysis of published patient sequencing data in the public domain. The tissue samples used in the original study by Lin et al, 2019 [61] were collected with informed consent and given approval by the ethical review board of the University of Southern California.

Results
The samples for normal endometrium were of insufficient quality to use as control material for this study. Normal ovarian tissue was therefore used as control material as sequencing read outs were of good quality.
The primary aim of describing the differential gene expression of EAOCs was achieved. The secondary aim of functional characterization was also achieved but required assumption of in cis function of all lncRNA genes to inform interpretation. A total of 35,697 transcripts were differentially expressed in the ovarian endometrioid adenocarcinoma (EnOC) sample set from 4 patients (n=4) and 33,939 transcripts from the ovarian clear cell carcinoma (OCCC) samples (n=5). Both transcript expression lists were filtered by removing all protein coding genes, microRNAs (less than 200 nucleotides in length), processed transcripts, pseudogenes (processed and unprocessed), small nuclear RNA (snRNA) and small nucleolar (snoRNA), mitochondrial RNA and molecules classified as miscellaneous, leaving only transcripts annotated as lncRNA. The total transcript expression list included 333 lncRNAs significantly up-regulated or down-regulated in endometrioid and clear cell adenocarcinoma groups with 88 being present in both the endometrioid and clear cell adenocarcinoma groups. There was differential expression of 117 lncRNA transcripts in the endometrioid group alone and 128 differentially expressed transcripts in the clear cell group (Figure 2).   Table 5.
The Ensembl genome region detail map showed protein coding genes located near to the lncRNA transcripts differentially expressed in our meta-analysis. Of note, the    is located near to CASC9 within the group of differentially expressed transcripts found in both endometrioid and clear cell carcinomas (see central two columns of Table 6). NRF2 (Nuclear-Factor Erythroid2-Related Factor) is located near to LUCAT1 within the data for endometrioid adenocarcinoma transcripts. The protein coding gene HIF-1α is also near to XIST, which was differentially expressed in the clear cell carcinoma group. Examination of KEGG [79] pathways, PathCards [80] and other integrated functional databases [81,82] showed that genes in red (see Table 6) were involved in ferroptosis, an iron-dependent form of programmed cell death [83][84][85][86][87][88][89].

Discussion
Long non-coding RNAs have several modes of function in human cells and these fall into three broad categories; post-mRNA processing, chromatin reprogramming and regulation of protein coding gene transcription and enhancer sites [90]. Most lncRNA transcripts are said regulate transcription factors of nearby protein coding gene and are thus cis-acting [68][69][70]91]. There is a paucity of published literature regarding lncRNA function with mRNA and miRNA interactions for the majority of lncRNA genes listed in the results above (see Table  6). Understanding that many long non-coding RNA molecules function in cis allowed detailed exploration of the genomic sites of origin of these molecules and generation of a list of protein coding genes that represent the most likely targets to be controlled by the lncRNAs identified. Analysis of these genes with long-non-coding RNA species in the Genecards database has identified potential molecular function and highlighted biological pathways in which they function. For example, lncRNA cancer susceptibility 9 (CASC9) is situated next to the hepatocyte nuclear factor 4 gamma (HNF4G) protein coding gene (also known as NR2A2).
OVAAL (ovarian adenocarcinoma amplified lncRNA) expression has been reported as amplified in ovarian highgrade serous carcinoma [92]. In this study, OVAAL showed reduced expression in ovarian clear cell carcinoma (log fold change -6.16, p=0.001). Research suggests that OVAAL behaves as an oncogene by enhancing cell survival through initiation of the RAF/MEK/ERK pathway and avoiding cellular senescence mechanisms [93]. However, down-regulated expression of OVAAL, as identified in this study, appears to contradict this evidence and would suggest a tumor suppressor function in vivo.
Findings by Zou et al., 2015 support the hypothesis that CASC9 interacts with protein coding HNF4G by acting in-cis based on evidence from datamining bioinformatic online databases [94]. HNF4G is one of a subfamily of liver-specific transcription factors important for organ development in utero [95]. HNF4 has also been shown to be overexpressed in Islet of Langerhans cells of the pancreas in young people with maturity onset diabetes [96]. The morphological feature of cytoplasmic clearing in OCCC is due to intracytoplasmic glycogen accumulation and this may be due to alterations in glycogen metabolism as a result of alterations in HNF4G function (Ji et al, 2018). HNF4G has been documented as being involved in OCCC pathogenesis [37]. Also, strong immunohistochemical expression of a different subset of HNF, HNF1beta, has been shown in OCCC but this is not used in routine diagnostic histopathological practice due to a lack of specificity and distinct morphology [97]. Furthermore, the genes for the group of transcription factors comprising HNF4 and HNF1 are found on different chromosomes and activate transcription of different cytochrome p450 enzymes [98].
Cdc42 interacts with breast cancer anti-estrogen resistance protein 3 (BCAR3) which interacts with RP4-561L24.3. Cdc42 codes for a cell membrane protein found in macrophages and is responsible, in part, for coordinated and effective phagocytosis [99]. Reduced expression of cdc42 protein on the surface of macrophages has been shown to differ between tissues with endometriosis and EAOC. Loss of cdc42 expression may play a role in the malignant transformation of endometriosis. Further, cdc42 protein also plays a key role in the MAPK pathway that controls cellular proliferation, antiapoptosis and cellular differentiation.
In addition, BCAR3, RP4-561L24.3 is co-located with GCLM and DNTTIP2 on chromosome 1 [100]. In this meta-analysis, RP4-561L24.3 was expressed at the highest overall level in the combined EAOC group but simultaneously differentially under-expressed in comparison with controls. RP4-561L24.3 may influence ferroptotic pathways in the pathogenesis of endometriosis-associated carcinoma via its interaction with GCLM [101]. Ferroptosis is a form of programmed cell death resulting from a combination of iron and lipid peroxidation in mitochondria with a toxic accumulation of reactive oxygen species [102]. Resistance to ferroptosis is thought to be critical development in the pathogenesis of endometriosis [103]. This makes biological sense in a context of endometriosis where the local tissue environment will contain increased amounts of iron-rich hemosiderin as a consequence of menstrual bleeding [103]. However, the role of ferroptosis in the development of endometriosis-associated malignancy has yet to be described in full [89]. A recent report showed an increased sensitivity of EAOCs to Erastin therapy, which targets the ferroptosis pathway, leads to increased rates of cell death [89]. If resistance to ferroptosis is an important mechanism in the development of EAOC, one could hypothesize that gene expression studies will show increased expression of tumor suppressor genes in normal cells with reduced differential expression in malignant cells [104,105]. This suggests that RP4-561L24.3 functions as a tumor suppressor as it is the most down regulated lncRNA Further evidence of a role for ferroptosis in the development of EAOC comes from the finding that LUCAT1 controls NRF2 expression via miRNA-495 [106]. NRF2 (also known as NFE2L2) has been shown to have a critical role in ferroptosis [101]. NRF2 (Nuclear-Factor Erythroid2-Related Factor) is a transcription factor that regulates genes with promotors containing antioxidant response elements [85]. Genes regulated by NRF2 are upregulated in response to oxygen free radicals released in a context of inflammation and injury as seen in ovarian tissues affected by endometriosis [107].
It is also interesting to note that cells with high levels of nrf2 protein activity are enriched for gamma-glutamyl peptides which are produced as a result of GCLC/GCLM activity [86]. Kang et al. suggest that GCLM may be regulated in cis by RP4-561L24. 3. It would make sense that reduced differential expression of genes known to act in the ferroptosis pathway play a key role in EAOC pathogenesis as it is known that menstrual cycling in endometriotic lesions results in generation of excessive irons and free radicals [89].
SLC2A1 is a member of the family of solute carriers. The antisense lncRNA molecule SLC2A1-AS1 is expressed at a high level in our EAOC data. This regulates transcription of SLC2A1 on the sense strand of DNA [108]. Transcription of SLC2A1 is induced under hypoxic conditions and influenced by increasing levels of hypoxia inducible factor 1 alpha (HIF-1alpha) [109].
SOX4 is said to be involved in the pathway regulating ferroptosis [110,111]. This may be regulated in cis by CASC15 due to co-location with SOX4 in the human genome [100]. HMG-box 4) is a transcription factor that has been implicated in carcinogenesis as high expression of SOX4 is thought to contribute to dedifferentiation, cell survival and epithelial-mesenchymal transition [112]. On the basis of its molecular interactions, CASC15 may have an important role to play in development of EAOC. Researchers suggest that CASC15 functions as a tumor suppressor gene [113] in which case one would expect to see reduced levels of expression compared with controls in this study. This is not the case, however, as CASC15 was shown to be expressed at a high level and was not differentially expressed. This casts some doubt on the role of CASC15 in the development of EAOC and requires further experimental investigation.

SOX4 (SRY-related
Mir-99a-Let7c Cluster Host Gene (MIR99AHG), otherwise known as LINC00478, resides near to protein coding gene Ubiquitin Specific Peptidase 25 (USP25). USP25 is a peptidase enzyme responsible for cleavage of ubiquitin from proteins destined for cellular degradation by proteolysis in the proteosome [114]. It thereby prevents target proteins from breakdown and has been shown to interact with components of the MAPK pathway [115]. USP25 also inhibits induction of ferroptosis in malignancy [116].

Strengths and Limitations of This Study
Limitations of this study include not having access to the source tissue to confirm the disease entities listed in the patient clinical information table were histologically accurate. There is concern regarding the specimen classified as FIGO grade 3 endometrioid adenocarcinoma of the ovary as this is a difficult histological diagnosis to make. The morphology of grade 3 endometrioid adenocarcinoma is often solid, poorly differentiated and similar to high-grade serous carcinoma. The distinction relies upon the use of a panel of immunohistochemistry markers which were not described in original publication [37].
Another limitation of the study was the use of normal ovarian tissue as a source of control RNA sequence for comparison with the malignant tissues. It would have been more accurate to use ovarian endometriosis samples given that this is said to be the cell of origin for most EAOCs [6] but these were of insufficient quality for use.
The online databases used to study the pathways are at an early stage in evolution with gaps in knowledge content. In many cases there is no known functional information or interactions regarding the lncRNA species identified in this meta-analysis. Furthermore, many of the assertions made are based on in silico prediction rather than experimental data. This indicates a need for further work by, for example, performing knockdown experiments in organoid models of endometriosis and EAOCs to help classify function and contribute to the knowledge base. A major assumption of the interpretation of our data rests on a cis-rather than transregulatory function of the lncRNA molecules identified in this work [69].

Conclusions
LncRNA genes RP4-561L24.3, CASC15, CASC9, SLC2A-AS1, LUCAT1, XIST and MIR99AHG are candidate biomarkers for further exploration in the pathogenesis of EAOC. Some of these lncRNA molecules may have an influence on the ferroptosis pathway by cis-acting regulation of transcription factors of nearby protein coding genes [69]. In vitro experiments are required to provide evidence in respect of this hypothesis. Numerous online databases have incomplete information relating to lncRNA function and interactions that makes further experimentation necessary.
Ferroptosis is a form of programmed cell death that is induced in response to cellular stresses involving iron and lipid metabolism in mitochondria and might play a part in the pathogenesis of EAOC. LncRNA molecules identified in this meta-analysis may inform clinical diagnostic pathways by development of non-invasive methods of detection for early diagnosis, risk stratification of endometriosis and more effective post-surgical patient monitoring for recurrence.

Conflicts of Interest
The authors declare no conflicts of interest.