A deep-learning method creates a comprehensive reference atlas of CD8+ T cells, integrating single-cell RNA sequencing data and T cell receptor diversity to enhance understanding of immune responses across diseases.
In a recent study published in Nature Methods, researchers developed "scAtlasVAE," a deep-learning model to integrate large-scale single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) data, creating a comprehensive CD8+ T cell atlas with paired T cell receptor (TCR) information.
They found associations between CD8+ T cell subtypes, characterized three distinct exhausted T cell subtypes, and revealed diverse clonal and transcriptomic patterns in autoimmune and immune-related inflammation.
Background
Emerging research highlights the involvement of CD8+ T cells in autoimmune diseases, where they contribute to both disease progression and protective mechanisms. Advances in single-cell immune profiling now allow simultaneous analysis of CD8+ T cell transcriptomes and TCR repertoires, revealing cellular heterogeneity, clonal dynamics, and functional transitions. However, our understanding of CD8+ T cell subtypes and their clonal landscapes across diverse conditions remains incomplete.
Variational autoencoder (VAE)-based methods excel at integrating large datasets but struggle with cross-atlas comparisons and semi-supervised training across diverse annotation criteria. In the present study, researchers developed a VAE-based deep-learning model for integrating and aligning scRNA-seq datasets across studies, enabling cell subtype annotation transfer.
About the study
A CD8+ T cell atlas was created using data from 68 studies, 961 single-cell immune profiling samples, and over 1.1 million cells across 42 conditions. The data were processed and quality-controlled, filtering out cells with insufficient gene expression or TCR information.
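As a rough illustration of the filtering step described above, the sketch below drops cells that express too few genes or lack TCR chain information. The field names (`n_genes`, `tra_cdr3`, `trb_cdr3`), the 200-gene cutoff, and the requirement that both chains be present are assumptions made for illustration; the article does not state the thresholds the researchers used.

```python
# Hypothetical quality-control filter; names and thresholds are illustrative.
MIN_GENES = 200  # assumed cutoff, not taken from the study


def passes_qc(cell, min_genes=MIN_GENES):
    """Return True if a cell has enough detected genes and a paired TCR."""
    has_alpha = bool(cell.get("tra_cdr3"))
    has_beta = bool(cell.get("trb_cdr3"))
    return cell.get("n_genes", 0) >= min_genes and has_alpha and has_beta


def filter_cells(cells, min_genes=MIN_GENES):
    """Keep only the cells that pass the quality-control check."""
    return [cell for cell in cells if passes_qc(cell, min_genes)]


# Toy input, invented for this example.
cells = [
    {"id": "c1", "n_genes": 1500, "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF"},
    {"id": "c2", "n_genes": 120, "tra_cdr3": "CAVRDSNYQLIW", "trb_cdr3": "CASSLGQAYEQYF"},
    {"id": "c3", "n_genes": 900, "tra_cdr3": "", "trb_cdr3": "CASSPGTGELFF"},
]
kept_ids = [cell["id"] for cell in filter_cells(cells)]
```

Here only the first toy cell survives: the second has too few detected genes and the third is missing its alpha chain.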
To integrate and analyze this dataset, scAtlasVAE, a deep learning framework, was developed. It employs a batch-unconditional encoder and batch-conditional decoder to correct batch effects and reconstruct gene expression data using a zero-inflated negative binomial distribution. scAtlasVAE supports both unsupervised and supervised modes, enabling tasks like atlas integration, cell subtype annotation, and transfer learning.
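To make the reconstruction step more concrete, here is a minimal sketch of a zero-inflated negative binomial (ZINB) log-likelihood and of how a batch-conditional decoder might receive its input. It uses the standard ZINB parameterization by mean `mu`, inverse dispersion `theta`, and dropout probability `pi`. The function names are hypothetical, and this is not the scAtlasVAE implementation, only the general idea under those assumptions.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial (mean mu, inverse dispersion theta)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a ZINB; pi is the extra probability of a zero."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come from the dropout component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def decoder_input(z, batch_index, n_batches):
    """Concatenate a latent vector with a one-hot batch label (assumed conditioning scheme)."""
    one_hot = [1.0 if i == batch_index else 0.0 for i in range(n_batches)]
    return list(z) + one_hot
```

In this scheme the encoder would see only gene expression, so the latent vector `z` carries no batch label, while the decoder is told which batch it is reconstructing. That division is one way to push batch-specific variation out of the shared latent space.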
scAtlasVAE was benchmarked against existing methods. Clonotype analysis was performed to identify distinct T cell subtypes with unique TCR repertoires, as well as shared clonotypes across different conditions. The model was further validated for its ability to predict CD8+ T cell subtypes in query datasets.
Further, the study used various bioinformatics tools for gene expression analysis and regulatory network inference to explore the functional roles of CD8+ T cells in different diseases.
Results and discussion
scAtlasVAE demonstrated superior performance across multiple benchmarking tasks, including single-atlas integration, cross-atlas integration, and cell subtype annotation transfer. In comparisons with existing methods like scVI, scANVI, scPoli, SCALEX, Scanorama, Harmony, and Seurat, scAtlasVAE showed enhanced batch effect correction and preservation of biological variance.
Benchmarking on two established atlases, TCellLandscape and TCellMap, confirmed its effectiveness in both zero-shot and full-shot transfer learning modes. Unlike other methods, scAtlasVAE incorporates independent predictors for distinct cell subtypes, enabling better alignment of annotations across datasets.
Using this framework, CD8+ T cells were clustered into 18 subtypes grouped into eight major categories, including naive T cells, central/effector memory T cells, recently activated effector T cells, mucosal-associated invariant T (MAIT) cells, innate-like T cells with high cytotoxic potential (ILTCK), tissue-resident memory T cells, exhausted T cells (Tex), and proliferating cells. Cross-atlas integration validated these subtypes and revealed novel populations like ILTCK-like cells (ILTCK-LCs). Paired TCR analysis showed reduced diversity and increased clonal expansion in disease states and tumors. Among exhausted T cells, three subtypes (GZMK+, ITGAE+, XBP1+) showed distinct transcriptomic profiles and tumor-specific enrichment, reflecting diverse roles in cancer immunity.
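Reduced diversity and increased clonal expansion can be quantified in several ways. The sketch below uses Shannon entropy and a normalized clonality score (1 minus normalized entropy), which are common choices in repertoire analysis; whether the study used these exact metrics is an assumption, and the clonotype labels are invented.

```python
import math
from collections import Counter


def shannon_entropy(clonotypes):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def clonality(clonotypes):
    """1 - normalized entropy: 0 for an even repertoire, approaching 1 when one clone dominates."""
    if not clonotypes:
        raise ValueError("need at least one cell")
    n_clones = len(set(clonotypes))
    if n_clones == 1:
        return 1.0  # a single clonotype is maximally clonal
    return 1.0 - shannon_entropy(clonotypes) / math.log(n_clones)


# Toy repertoires: one label per cell, invented for illustration.
diverse = ["A", "B", "C", "D"]
expanded = ["A"] * 9 + ["B", "C", "D"]
```

With these toy inputs, `clonality(expanded)` is higher than `clonality(diverse)`, mirroring the pattern of lower diversity alongside clonal expansion.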
Clonotype sharing analysis revealed that GZMK+ Tex cells shared clonotypes with both tissue-resident and circulating subtypes, while ITGAE+ and XBP1+ Tex cells had more restricted sharing patterns. The high-resolution analysis identified Tex subtypes, including ISG+, DUSP1+, TCF7+, and TNFRSF9+ cells. In checkpoint inhibitor (CPI)-induced inflammation, an immune-related adverse event (irAE), ITGAE+ Tex cells were predominant, a pattern that differed from autoimmune inflammation. ITGAE+ Tex cells also displayed enhanced cytotoxic pathways.
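Clonotype sharing between subtypes can be summarized as an overlap score between the sets of clonotypes observed in each subtype. The following sketch uses the Jaccard index for this purpose. The metric, the function names, and the placeholder subtype and clonotype labels are assumptions for illustration and do not reproduce the study's analysis or results.

```python
from itertools import combinations


def clonotype_sets(cells):
    """Group clonotype IDs by subtype from (subtype, clonotype) pairs."""
    by_subtype = {}
    for subtype, clonotype in cells:
        by_subtype.setdefault(subtype, set()).add(clonotype)
    return by_subtype


def jaccard(a, b):
    """Jaccard index: shared clonotypes divided by all clonotypes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_sharing(cells):
    """Jaccard overlap for every pair of subtypes, keyed by a sorted name pair."""
    sets = clonotype_sets(cells)
    return {
        (s1, s2): jaccard(sets[s1], sets[s2])
        for s1, s2 in combinations(sorted(sets), 2)
    }


# Placeholder data: subtype names and clonotype IDs are invented.
cells = [
    ("subtype_A", "clone1"), ("subtype_A", "clone2"),
    ("subtype_B", "clone2"), ("subtype_B", "clone3"),
    ("subtype_C", "clone4"),
]
sharing = pairwise_sharing(cells)
```

A subtype with broad sharing would show nonzero overlap with many other subtypes, whereas a restricted subtype would overlap with few or none, as `subtype_C` does in this toy input.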
The scAtlasVAE model successfully annotated external datasets, confirming Tex subtype alignments across atlases and highlighting similarities between irAE inflammation and tumor-infiltrating lymphocytes (TILs) in cancer. Clonotype sharing between ILTCK-LCs and MAIT cells suggested functional diversity.
The study's limitations include reliance on user-defined annotations for rare subtypes, restricted application to CD8+ T cells, and the need for experimental validation of newly defined subtypes.
Conclusion
In conclusion, the TCR-integrated atlas and scAtlasVAE developed in this study offer a valuable resource for studying CD8+ T cell heterogeneity and dynamics. They enable cross-dataset integration and enhance insights into diverse biological systems.