In a recent study published in Nature, a group of researchers provided a comprehensive genomic characterization of colorectal carcinoma (CRC), a type of cancer that starts in the colon or rectum, through whole-genome sequencing (WGS) of 2,023 samples, identifying novel driver genes, molecular subgroups, and potential clinical implications.
Background
CRC is the third most common cancer globally. Previous CRC sequencing efforts were limited in scope, focusing on a few hundred cases and primarily using whole-exome or gene panel sequencing, leaving the full range of genomic alterations and clinical associations unclear. Further research is needed to explore the functional significance of newly identified driver mutations and to develop targeted therapies for diverse CRC subgroups.
About the study
Sample collection for the present study followed a detailed protocol, beginning with ethics approval granted by the Health Research Authority (HRA) Committee East of England–Cambridge South research ethics committee. The 100,000 Genomes Project (100kGP) cancer program, a high-throughput tumor sequencing initiative for National Health Service (NHS) cancer patients, facilitated the collection of samples across thirteen Genomic Medicine Centres (GMCs) established by the NHS and 100kGP.
Specialist nurses and other staff identified patients scheduled for CRC resections, and all participants provided written informed consent. Blood samples were taken, and tumor samples were evaluated in histopathology cut-ups, with associated clinicopathological data retrieved from health records.
Frozen tumor sub-samples were assessed for purity and other histological characteristics, after which blood and tumor samples that passed quality control were sent to regional genetics laboratories for deoxyribonucleic acid (DNA) extraction. Extracted DNA was transferred to the 100kGP central national biorepository, where Illumina performed WGS of paired tumor-constitutional DNA.
Processed Binary Alignment/Map (BAM) files were then transferred to Genomics England for quality checks, additional processing, and data storage. All sequencing and clinicopathological data were subsequently transferred to the Colorectal Cancer Domain of the Genomics England Clinical Interpretation Partnership (GECIP) for further quality control and data analysis.
Study results
CRCs were classified into three established subtypes: DNA polymerase ε proofreading-deficient (POL), microsatellite instability-positive (MSI, mismatch repair-deficient), and microsatellite-stable (MSS). Among the 2,023 samples analyzed, 18 were POL, 364 were MSI, and 1,641 were MSS, with nearly all metastasis samples falling into the MSS category. While MSI and POL cancers exhibited near-diploid genomes, MSS cancers displayed highly variable ploidy.
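To put the subtype counts in proportion, the short Python sketch below converts the three reported figures into percentages of the cohort. Only the counts come from the article; the variable names are illustrative, and the percentages are simple arithmetic on those counts rather than values quoted from the study.

```python
# Subtype counts as reported in the article (POL, MSI, MSS).
subtype_counts = {"POL": 18, "MSI": 364, "MSS": 1641}

# The three subtypes should account for the full cohort of 2,023 samples.
total = sum(subtype_counts.values())

# Share of each subtype as a percentage of the cohort, to one decimal place.
subtype_share = {
    name: round(100 * count / total, 1)
    for name, count in subtype_counts.items()
}

for name, share in subtype_share.items():
    print(f"{name}: {subtype_counts[name]} samples ({share}%)")
```

On these figures, MSS tumors make up roughly four-fifths of the cohort, MSI a little under one-fifth, and POL under 1%.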
The mutational signature activities of single-base substitutions (SBS), doublet-base substitutions (DBS), and small insertions–deletions (indels) were generally consistent with existing research, though some novel patterns emerged. Notably, SBS93, a signature associated with oesophageal and gastric cancers, was found in approximately 40% of MSS primary tumors but was almost absent in MSI cases.
Driver gene identification was conducted separately for MSI, MSS primary, POL, and MSS metastasis CRCs, leading to the discovery of 193 putative CRC driver genes. Of these, 89 were identified in the MSS primary group, 49 in POL, 96 in MSI, and 39 in MSS metastasis. A total of 57 drivers were found in more than one group, while the remaining 136 were specific to a single group. Several of the newly identified candidate driver genes had not previously been reported in cancer, including genes involved in ribonucleic acid (RNA) regulation and transcriptional control.
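The per-group counts add up to more than 193 because shared drivers are counted once in every group where they appear. The sketch below checks that arithmetic using only the figures quoted above. It assumes each subtype-specific gene is listed in exactly one group and that the per-group counts are otherwise consistent with the total; the derived average is an illustration, not a number reported by the study.

```python
# Driver genes identified per analysis group, as reported in the article.
per_group = {"MSS primary": 89, "POL": 49, "MSI": 96, "MSS metastasis": 39}

shared = 57             # drivers found in more than one group
subtype_specific = 136  # drivers found in a single group

# Unique driver genes: shared plus subtype-specific.
total_unique = shared + subtype_specific

# Total group listings, counting a shared gene once per group it appears in.
listings = sum(per_group.values())

# Assumption: each subtype-specific gene contributes exactly one listing,
# so the remaining listings all belong to the shared genes.
shared_listings = listings - subtype_specific
avg_groups_per_shared = round(shared_listings / shared, 1)

print(f"Unique drivers: {total_unique}")
print(f"Group listings: {listings}")
print(f"Average groups per shared driver: {avg_groups_per_shared}")
```

Under those assumptions, the 57 shared drivers account for 137 of the 273 group listings, or about 2.4 groups per shared gene.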
The study also highlighted new roles for minor Rat Sarcoma (RAS) genes (RAS being a family of related proteins involved in transmitting signals within cells) and Mitogen-Activated Protein (MAP) kinase pathway genes, which appear to function as modifiers of major RAS drivers rather than as substitutes.
MSS tumors typically harbored four pathogenic driver mutations, compared to 23 in primary MSI and 30 in POL tumors. The study identified 30 driver genes shared between MSS and MSI cancers, emphasizing common pathways such as Wingless-related Integration Site (WNT); RAS–Rapidly Accelerated Fibrosarcoma (RAF, a family of serine/threonine-specific protein kinases)–Mitogen-Activated Protein Kinase Kinase (MEK)–Extracellular Signal-Regulated Kinase (ERK); Phosphoinositide 3-Kinase (PI3K); and Transforming Growth Factor Beta (TGFβ)–Bone Morphogenetic Protein (BMP). However, distinct functional differences were observed between MSS and MSI tumors, particularly in immune escape mechanisms and the tolerance for multiple or non-canonical changes in driver pathways.
The identification of driver mutations remains a challenge, especially in hypermutated cancers and low-quality samples. This study replicated only 7% of nearly 1,000 previously reported CRC drivers. Structural variants (SVs) and copy number alterations (CNAs) were also analyzed, revealing nine SV signatures across the cohort, including previously unreported unbalanced inversions and translocations.
The study found 45 non-fragile SV hotspots in MSS primary tumors and three in MSI tumors, identifying several candidate driver changes and recurrent SV hotspots. Moreover, extrachromosomal DNA (ecDNA) was more prevalent in MSS primary tumors, with a modest role in oncogene amplification compared to other cancer types.
Conclusions
To summarize, this study provides a comprehensive analysis of the genomic landscape of CRC, identifying numerous novel driver mutations, SVs, and CNAs and highlighting the distinct molecular characteristics of MSI, MSS, and POL subtypes. The findings offer valuable insights into the complex biology of CRC and potential avenues for targeted therapies.