Transformer-Based Integrative Patient Representations from Single-Cell RNA Data

Abstract

Single-cell RNA sequencing (scRNA-Seq) is a powerful tool for exploring cellular heterogeneity in healthy and diseased states, yet its translation into clinical insights has been limited. To bridge the gap between detailed cellular analysis and broader patient-level representations usable for phenotyping, we introduce a novel transformer-based architecture that aggregates single-cell data into meaningful patient-level embeddings. The approach uses a self-supervised learning phase to construct integrative patient representations, which are then refined with contrastive learning. On a dataset covering 7 million cells across 1223 individuals with diverse disease states, we show that the learned embeddings are meaningful representations for a variety of downstream analytical tasks. Our approach proves robust to unbalanced datasets and shows indications of learning similarities between related diseases, such as COVID-19 and flu.
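To make the two ideas in the abstract concrete, the following is a minimal sketch of (1) pooling a set of per-cell vectors into a single patient vector and (2) a contrastive objective that pulls related patients together. It is an illustration only: the abstract does not specify the pooling mechanism or the loss, so the softmax attention pooling, the InfoNCE-style loss, and the names `attention_pool`, `contrastive_loss`, `query`, and `temperature` are all assumptions, not the authors' implementation.

```python
import math


def attention_pool(cells, query):
    """Collapse a list of cell vectors into one patient vector.

    Assumption: scaled dot-product attention against a single query vector.
    A real model would learn `query`; here it is passed in by hand.
    """
    scale = math.sqrt(len(query))
    scores = [sum(c * q for c, q in zip(cell, query)) / scale for cell in cells]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(cells[0])
    return [sum(w * cell[d] for w, cell in zip(weights, cells)) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not taken from the paper).

    Low when `anchor` is close to `positive` and far from all `negatives`.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy data: each patient is a small set of 2-dimensional cell vectors.
query = [1.0, 0.0]  # hypothetical query vector
patient_a = attention_pool([[1.0, 0.1], [0.9, 0.0]], query)
patient_b = attention_pool([[1.0, 0.0], [0.8, 0.2]], query)    # similar to A
patient_c = attention_pool([[-1.0, 0.3], [-0.9, 0.1]], query)  # dissimilar

loss_related = contrastive_loss(patient_a, patient_b, [patient_c])
loss_unrelated = contrastive_loss(patient_a, patient_c, [patient_b])
print(loss_related < loss_unrelated)
```

In this toy setup the loss is lower when the positive pair consists of the two similar patients, which is the behavior a contrastive refinement stage is meant to reward. Real single-cell inputs would be high-dimensional expression profiles with thousands of cells per patient, which this sketch does not attempt to model.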

Publication
Learning Meaningful Representations of Life Workshop (LMRL ’25), April 28, 2025, Singapore, Singapore
Johannes Lohmöller
Researcher of Computer Science

My research interests include privacy-preserving methods for confidential computing.