scpFormer: Foundation Model for Single-Cell Proteomics Integration
Researchers have introduced scpFormer, a transformer-based foundation model for single-cell proteomics, pre-trained on more than 390 million cells. The work targets a central obstacle to integrating single-cell proteomic data: targeted antibody panels are fragmented and vary from study to study. Instead of index-based tokenization, scpFormer uses a continuous, sequence-anchored scheme. It combines Evolutionary Scale Modeling (ESM) protein-sequence embeddings with value-aware expression embeddings, so that variable antibody panels map into a shared semantic space without artificially discretizing expression values. The resulting global cell representations excel at large-scale batch integration and unsupervised clustering. Because the vocabulary is open, the model also supports in silico panel expansion, which helps reconstruct biological manifolds in sparse clinical datasets. The study further examines the protein co-expression logic the model learns.
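To make the contrast with index-based tokenization concrete, here is a minimal sketch of a continuous, sequence-anchored token. It assumes each token is the sum of a protein-sequence anchor and a continuous embedding of the measured value. `pseudo_esm_embedding` is a hypothetical stand-in that hashes the sequence instead of running a real ESM model, the sinusoidal `value_embedding` over log1p(x) is an assumed design rather than scpFormer's published one, and the sequence strings are placeholders, not real proteins.

```python
import hashlib
import math
from typing import Dict, List

DIM = 8  # toy embedding width (assumption); real models use far wider vectors


def pseudo_esm_embedding(sequence: str, dim: int = DIM) -> List[float]:
    """HYPOTHETICAL stand-in for an ESM protein embedding.

    A real pipeline would run a pretrained ESM model over the amino-acid
    sequence. Here a SHA-256 digest gives a deterministic vector so the
    sketch runs without model weights: same sequence, same anchor.
    """
    digest = hashlib.sha256(sequence.encode("utf-8")).digest()
    # Map each of the first `dim` bytes (0..255) into [-1, 1].
    return [(b / 127.5) - 1.0 for b in digest[:dim]]


def value_embedding(x: float, dim: int = DIM) -> List[float]:
    """Continuous, value-aware embedding: the value is never binned.

    Sinusoidal features of log1p(x) (an assumption) make nearby
    expression values map to nearby vectors. `dim` must be even.
    """
    v = math.log1p(max(x, 0.0))
    half = dim // 2
    out: List[float] = []
    for i in range(half):
        freq = 1.0 / (10.0 ** (i / half))
        out.append(math.sin(v * freq))
        out.append(math.cos(v * freq))
    return out


def tokenize_cell(panel: Dict[str, float],
                  sequences: Dict[str, str]) -> List[List[float]]:
    """One token per measured protein: sequence anchor + value embedding."""
    tokens = []
    for protein, value in panel.items():
        anchor = pseudo_esm_embedding(sequences[protein])
        val = value_embedding(value)
        tokens.append([a + b for a, b in zip(anchor, val)])
    return tokens


if __name__ == "__main__":
    # Placeholder sequences; NOT real protein sequences.
    seqs = {"P1": "toy-sequence-A", "P2": "toy-sequence-B", "P3": "toy-sequence-C"}
    panel_a = {"P1": 3.2, "P2": 0.4}   # one study's panel
    panel_b = {"P1": 3.2, "P3": 1.7}   # another study's panel, overlapping on P1
    print(tokenize_cell(panel_a, seqs)[0] == tokenize_cell(panel_b, seqs)[0])
```

Because a token depends on the protein's sequence rather than a position in a fixed vocabulary, a protein shared by two panels gets the same anchor in both, and a small change in expression moves the token smoothly instead of jumping between bins. In a real model the ESM anchor would also place related proteins near each other; the hash stand-in preserves only the mechanics.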
Key facts
- scpFormer is a transformer-based foundation model for single-cell proteomics.
- Pre-trained on over 390 million cells.
- Uses continuous, sequence-anchored tokenization instead of index-based tokenization.
- Combines Evolutionary Scale Modeling (ESM) with value-aware expression embeddings.
- Maps variable antibody panels into a shared semantic space without artificial discretization.
- Generates global cell representations for batch integration and clustering.
- Open-vocabulary architecture facilitates in silico panel expansion.
- Aids reconstruction of biological manifolds in sparse clinical datasets.
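The idea of a global cell representation that spans different panels can be illustrated with a minimal sketch: pool a variable number of per-protein token vectors into one fixed-width cell vector, then compare cells directly. Mean pooling and cosine similarity are assumptions chosen for simplicity; scpFormer's actual readout may differ, and `mean_pool` and `cosine` are hypothetical helper names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average per-protein token vectors into one fixed-width cell vector.

    Works for any panel size, so cells profiled with different antibody
    panels land in the same vector space. (Assumed readout; the real
    model may pool differently.)
    """
    if not tokens:
        raise ValueError("a cell needs at least one measured protein")
    dim = len(tokens[0])
    if any(len(t) != dim for t in tokens):
        raise ValueError("all tokens must share the same width")
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


if __name__ == "__main__":
    # Toy token vectors: cell A has a 3-protein panel, cell B a 2-protein panel.
    cell_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    cell_b = [[1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
    za, zb = mean_pool(cell_a), mean_pool(cell_b)
    print(len(za), len(zb), cosine(za, zb))
```

Both cells end up as width-4 vectors despite different panel sizes, which is what lets downstream batch integration or clustering treat them uniformly.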
Entities
—