Inference of Clonal Copy Number Alterations from RNA-Sequencing Data

Tissues are composed of many types of interacting cells [1]. Understanding cellular organization and function in a tissue requires identifying each cell type and its location within the tissue structure. Advances in experimental and computational methods will help us build a detailed map of tissues and study how tissue organization influences a cell's molecular state and interactions in healthy and diseased tissue [1-6].
Over the past few years, the development and application of single-cell sequencing methods have transformed biology and enabled the study of cellular heterogeneity in cancer tissues [7]. Understanding the clonal architecture of a tumor and the interplay between malignant and non-malignant cells within the tumor ecosystem provides significant insight into tumor initiation, progression, metastasis, recurrence, and treatment [8-10]. For example, the ratios of specific immune cell types in the tumor predict overall survival and response to different immunotherapies [11]. Researchers have used single-cell RNA sequencing (scRNA-Seq) to examine heterogeneity in malignant and non-malignant cell types and states in various cancers, such as melanoma and glioma [7,12,13]. With the advent of scRNA-Seq, it is now possible to identify tumor-infiltrating immune cell types and tumor-associated malignant and non-malignant cell types, including endothelial cells (ECs) and cancer-associated fibroblasts (CAFs), and to identify transcriptional alterations within these cell groups [7,12-17]. Single-cell DNA sequencing is another new approach for elucidating the genomic diversity of tumor clonal architecture [2,10]. However, we still lack the technology to probe the genome and the transcriptome simultaneously at single-cell resolution and at large scale.
Many algorithms have been developed for detecting copy number variation (CNV) events from DNA sequencing data using depth-of-coverage analysis [18,19]. These tools rely on uniform coverage of the genome by DNA-sequencing reads. Statistical approaches for CNV detection from RNA-sequencing data, however, are very limited, because it is hard to discriminate between differential expression and an underlying copy number variation using RNA-Seq data alone. Another challenge is that the RNA-Seq signal is concentrated in exonic regions, leaving most of the genome uncovered. The identified CNVs therefore reflect the copy number states of genes, and the copy number of intergenic regions may not be well represented. Regardless, gene-level copy number is extremely useful for characterizing CNV architecture, for example the copy number of oncogenes and tumor suppressor genes. This limitation is shared by CNV detection from whole-exome sequencing, which covers only the targeted exonic regions of the genome.
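The difficulty of separating differential expression from copy number change can be illustrated with a small sketch. It is not the method of any cited tool: it assumes depth-normalized per-gene counts for a tumor and a reference sample, and the function names, window size, and thresholds are hypothetical choices made for illustration. A single strongly upregulated gene is indistinguishable from a gain when each gene is thresholded on its own, whereas a median over neighboring genes keeps only shifts shared by a contiguous segment.

```python
import math
from statistics import median

def log2_ratios(tumor, reference, pseudocount=1.0):
    """Per-gene log2(tumor / reference); inputs are assumed depth-normalized."""
    return [math.log2((t + pseudocount) / (r + pseudocount))
            for t, r in zip(tumor, reference)]

def rolling_median(values, window=5):
    """Median over a centered window of neighboring genes (truncated at the ends)."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def call_states(ratios, gain=0.3, loss=-0.3):
    """Label each gene by thresholding its log2 ratio (thresholds are illustrative)."""
    return ["gain" if r > gain else "loss" if r < loss else "neutral"
            for r in ratios]

# Toy data: 20 genes ordered along one chromosome.
reference = [100] * 20
tumor = [100] * 14 + [150] * 6   # genes 14-19: one extra copy (3 vs. 2)
tumor[4] = 400                   # gene 4: strongly upregulated, no copy change

raw = log2_ratios(tumor, reference)
raw_states = call_states(raw)                        # per-gene thresholding
smoothed_states = call_states(rolling_median(raw))   # segment-level signal
```

In this toy example, per-gene thresholding labels the upregulated gene 4 as a gain, while the smoothed profile labels it neutral and still recovers the six-gene gained segment. Real data would additionally require normalization, noise modeling, and segmentation, which this sketch omits.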
Although many tools identify CNVs from exome sequencing data, there is a lack of methods for detecting CNVs solely from RNA sequencing data [7,20]. We developed one of the first methods that identifies, visualizes, and integrates CNV events using scRNA-Seq data [21]. In addition to CNVs, SNPs and indels must also be estimated from scRNA-Seq data to understand the clonal architecture of the tumor. Identifying SNPs and indels from scRNA-Seq data, however, is also very challenging because of allelic dropout and non-uniform, low coverage, which make it hard to distinguish real variants from technical artifacts.
Inference of CNVs, SNPs, and indels from RNA-Seq data is essential for understanding the correlation between the genomic and transcriptomic properties of different cell types and clones within the tumor ecosystem [22]. These correlations will provide significant insight into tumor initiation, progression, and metastasis. Because it remains technically challenging to assay both the genome and the transcriptome of the same cell, the gene signatures of different tumor clones have not yet been well studied. With the growing number of scRNA-Seq studies, there is an increasing need for algorithms that infer CNVs, SNPs, and indels from scRNA-Seq data.
Another important motivation for calling CNVs, SNPs, and indels from scRNA-Seq datasets is the ability to detect low-allele-frequency variants. Somatic variants in even a small minority of cells can have large phenotypic effects, so detecting somatic mutations with a low allele frequency is especially important in cancer. Such mutations are very challenging to detect with bulk DNA sequencing. Although several methods, such as MuTect and Strelka, have been developed to detect low-allele-frequency mutations from bulk DNA sequencing [23,24], they still cannot reliably detect mutations with an allele frequency below 0.1 in samples with average sequencing depth. Cancer single-cell DNA sequencing data are currently scarce, whereas cancer scRNA-Seq datasets are growing in number. Therefore, there is an increasing need for CNV, SNP, and indel inference algorithms for scRNA-Seq data.
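A back-of-the-envelope calculation shows why low allele frequencies are hard to detect in bulk data. The sketch below is a deliberate simplification and does not reproduce how MuTect or Strelka work: it assumes that the number of variant-supporting reads at a site follows a binomial distribution with the variant allele frequency (VAF) as success probability, ignores sequencing error, and uses a hypothetical rule that a call needs at least three variant reads. The 30x depth is likewise an assumed value.

```python
from math import comb

def detection_power(depth, vaf, min_alt_reads=3):
    """P(at least min_alt_reads variant reads) when reads ~ Binomial(depth, vaf)."""
    p_too_few = sum(comb(depth, k) * vaf**k * (1 - vaf)**(depth - k)
                    for k in range(min_alt_reads))
    return 1.0 - p_too_few

depth = 30  # assumed "average" depth, for illustration only
powers = {vaf: detection_power(depth, vaf) for vaf in (0.5, 0.1, 0.02)}
for vaf, power in powers.items():
    print(f"VAF {vaf:.2f}: P(>= 3 variant reads at {depth}x) = {power:.3f}")
```

Under these assumptions a heterozygous germline-like variant (VAF 0.5) is almost always supported by enough reads, the probability drops to a little under 0.6 at a VAF of 0.1, and it falls to a few percent at a VAF of 0.02, before any allowance is made for sequencing error.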
In addition to the above points, the analysis framework that CaSpER utilizes can be extended to other functional genomics datasets, such as ChIP-Sequencing, epigenomics, and spatial transcriptomics datasets, which are currently generated less often than RNA-sequencing datasets. Spatial approaches help us build spatially resolved gene expression patterns of different cell types within a tissue. Inferring CNV events from spatial transcriptomics datasets reveals the clonal architecture of the tumor together with its spatial context.
Single-cell RNA-Seq is powerful in detecting different cell types or states but requires tissue dissociation, which loses the information about the original location of the cells. Spatial approaches, on the other hand, preserve spatial information but either measure the expression levels of only a small number of transcripts (in situ hybridization) or lack single-cell resolution (spatial transcriptomics assays) [3-6,25]. The spatial location of cells can be inferred by integrating scRNA-Seq with spatial transcriptomics. We believe that future advances in approaches that integrate spatial transcriptomics data with scRNA-Seq data will enable us to generate a detailed map of the tumor that reveals the interactions of different cell types within the tumor ecosystem (Figure 1).

Figure 1: Approaches that integrate spatial transcriptomics data with scRNA-Seq data.