What Is a Fusion Gene? What Are Chimeric RNAs?
A fusion gene is a chimeric gene formed by joining partial sequences of two separate genes, usually as a result of chromosomal translocations or deletions. These chimeric genes can be transcribed and translated into abnormal transcripts or proteins, which can cause or promote tumor development.
For example, chronic myeloid leukemia (CML, also called chronic granulocytic leukemia) is characterized at the molecular level by the BCR-ABL fusion gene. This fusion gene encodes a fusion protein with strong tyrosine kinase activity, which leads to excessive cell proliferation, inhibition of apoptosis and, consequently, disease.
The first evidence of gene fusions dates back to 1960, when Hungerford and Nowell reported that two patients with CML carried a characteristic small chromosome, later named the “Philadelphia chromosome”.
A “chimeric RNA” is any transcript that contains exons from different parental genes. Not all fusion transcripts are derived from fusion genes. In addition to being transcribed from fusion genes, chimeric RNAs can originate from the trans-splicing of two independent precursor mRNAs and from the cis-splicing of read-through transcripts that span two adjacent genes.
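The definition above can be turned into a simple check: a transcript is chimeric when its exons are annotated to more than one parental gene. The following is a minimal sketch, assuming each exon has already been assigned to a gene by an annotation step; the `Exon` class and the `parental_genes` and `is_chimeric` helpers are hypothetical names introduced for illustration, and the exon numbers are placeholders rather than real breakpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str    # parental gene this exon is annotated to (assumed known)
    number: int  # exon index within that gene (placeholder values below)


def parental_genes(exons):
    """Return the distinct parental genes of a transcript, in 5'-to-3' order."""
    genes = []
    for exon in exons:
        if exon.gene not in genes:
            genes.append(exon.gene)
    return genes


def is_chimeric(exons):
    """A transcript is chimeric if its exons come from more than one gene."""
    return len(parental_genes(exons)) > 1


# Toy transcripts: one joining BCR and ABL1 exons, one from ABL1 only.
fusion_like = [Exon("BCR", 1), Exon("BCR", 2), Exon("ABL1", 2), Exon("ABL1", 3)]
single_gene = [Exon("ABL1", 1), Exon("ABL1", 2), Exon("ABL1", 3)]

print(parental_genes(fusion_like), is_chimeric(fusion_like))
print(parental_genes(single_gene), is_chimeric(single_gene))
```

Note that this check only establishes that a transcript is chimeric. It cannot tell whether the transcript came from a fusion gene, trans-splicing or read-through of adjacent genes; that distinction requires genomic evidence.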
Why Is It Important to Identify Fusion Transcripts in Cancer Research?
Chromosomal abnormalities occur frequently in human tumors. Chromosomal translocations and gene fusions were originally identified in hematological malignancies, where disease subtypes can be defined by the type of chromosomal abnormality detected. Certain recurrent gene fusions are now used as diagnostic markers for cancer, and some have been targeted therapeutically with substantial clinical success. The development and widespread use of sequencing technologies has accelerated the identification and detection of these genetic aberrations.
Accurate detection of fusion genes and fusion transcripts is therefore important for the prevention, treatment and overall understanding of these cancers.
Methods of Fusion Characterization
Traditional methods of cell biology analysis
There are many experimental and computational methods for detecting fusion transcripts. Prior to next-generation sequencing (NGS), fusion identification in hematological malignancies relied on traditional cytogenetic karyotyping, which detects relatively large chromosomal rearrangements. Later techniques identified more rearrangements; these include fluorescence in situ hybridization (FISH), spectral karyotyping (SKY), multicolor FISH (M-FISH), comparative genomic hybridization (CGH) and high-density array comparative genomic hybridization (a-CGH).
However, these traditional methods, whether cytogenetic or not, are based on predefined fusion targets. They therefore require a priori knowledge and are not suitable for large-scale de novo gene fusion discovery. In contrast, sequencing-based methods such as whole genome sequencing (WGS) and RNA sequencing (RNA-Seq) are widely used to find previously unidentified gene fusions.
High-throughput de novo gene fusion discovery has become a reality with the development of NGS technology, which can analyze the entire genome and transcriptome to comprehensively identify copy number alterations, somatic point mutations, structural rearrangements and gene expression alterations. High-throughput, deep-sequencing platforms are now widely used to characterize cancer genomes, at a throughput that FISH cannot match. The Cancer Genome Atlas (TCGA) has shown that DNA and RNA sequence aberrations in at least 25 different cancer types can be identified at the genome-wide level using NGS. Targeted RNA sequencing further increases the sensitivity of fusion detection and provides a more comprehensive characterization of the tumor transcriptome.
NGS not only produces a large amount of data in a single run, allowing the discovery of new transcripts, but also expands the potential to predict fusion loci. Combining these data with phenotypic data helps to identify fusions and other somatic sequence variations, and to single out the changes in the cancer genome that are functionally relevant. Several bioinformatics tools have been developed to detect fusion transcripts from RNA-Seq data, such as ChimeraScan, SnowShoes-FTD, TopHat-Fusion, FusionMap and FusionSeq.
However, the greatest computational challenge in identifying fusion transcripts is the high rate of false positives that results from applying short-read mappers directly. PacBio SMRT and Nanopore sequencing technologies address this by reading complete full-length transcripts directly, without the need for assembly. This yields higher-quality transcripts and facilitates the study of mRNA structure, including alternative splicing, fusion genes and allelic expression.
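One common way to reduce false positives from short-read data is to filter candidate fusions by how much read evidence supports them. The sketch below illustrates that idea only; it is not the algorithm of any tool named above. The `FusionCandidate` fields, the `classify` function and all thresholds (`min_junction`, `min_total`, `readthrough_dist`) are assumptions chosen for illustration, and the gene names and counts are toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCandidate:
    gene5: str           # 5' partner gene
    gene3: str           # 3' partner gene
    chrom5: str
    chrom3: str
    pos5: int            # breakpoint position in the 5' partner
    pos3: int            # breakpoint position in the 3' partner
    junction_reads: int  # reads aligned across the fusion junction
    spanning_pairs: int  # read pairs with one mate in each gene


def classify(c, min_junction=2, min_total=5, readthrough_dist=100_000):
    """Label a candidate using illustrative (assumed) thresholds."""
    if c.gene5 == c.gene3:
        return "reject: same gene"
    total = c.junction_reads + c.spanning_pairs
    if c.junction_reads < min_junction or total < min_total:
        return "reject: low support"
    # Nearby genes on the same chromosome may yield read-through chimeras
    # rather than transcripts of a rearranged fusion gene.
    if c.chrom5 == c.chrom3 and abs(c.pos5 - c.pos3) <= readthrough_dist:
        return "flag: possible read-through"
    return "keep"


# Toy candidates with hypothetical gene names and made-up counts.
well_supported = FusionCandidate("GENE_A", "GENE_B", "chr9", "chr22", 1_000, 2_000, 8, 12)
weak = FusionCandidate("GENE_A", "GENE_B", "chr9", "chr22", 1_000, 2_000, 1, 1)
neighbors = FusionCandidate("GENE_C", "GENE_D", "chr1", "chr1", 50_000, 62_000, 6, 9)

for candidate in (well_supported, weak, neighbors):
    print(candidate.gene5, candidate.gene3, classify(candidate))
```

In practice, suitable thresholds depend on sequencing depth and library type, so the defaults here should be treated as placeholders to tune rather than recommended values.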