TriProRep: Structure-Aware Pretraining for Protein Structure Prediction

other · 2026-05-23

A recent paper introduces TriProRep, a structure-aware pretraining method that jointly models three aligned residue-level views: amino-acid identity, backbone geometry, and local full-atom geometry, each discretized by a VQ-VAE tokenizer. Pretraining asks the model to recover the original tokens from views that a generator has corrupted, so TriProRep learns to tell plausible but incorrect cross-view substitutions apart from the authentic protein. The authors also present RepSP, a benchmark for assessing protein representations in structure-predictive settings. RepSP comprises three tasks, including homodimer co-folding from apo-chain representations and residue-level prediction of homodimer-derived interaction properties. The paper, available on arXiv (2605.22133v1), reports that pretrained representations benefit structure prediction, not just traditional function-annotation tasks.
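To make the tokenization step concrete, here is a minimal sketch of the vector-quantization lookup at the core of any VQ-VAE tokenizer: each residue's feature vector is mapped to the index of its nearest codebook entry. The codebook, the 2-D features, and the function name `quantize` are illustrative assumptions, not details taken from the paper, which would use learned encoders and much larger codebooks.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        # Squared Euclidean distance from this vector to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical toy codebook (3 entries) and per-residue features (4 residues).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 1]
```

Each view (identity, backbone, full-atom) would get its own token sequence this way, aligned position by position along the chain.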

Key facts

TriProRep is a structure-aware pretraining method for protein representation learning.
It models three aligned residue-level views: amino-acid identity, backbone geometry, and local full-atom geometry.
Views are discretely encoded via VQ-VAE tokenizers.
Pretraining recovers original tokens from generator-corrupted views.
RepSP is a benchmark for evaluating protein representations in structure-predictive settings.
RepSP tests homodimer co-folding from apo-chain representations.
RepSP tests residue-level prediction of homodimer-derived interaction properties.
The study is published on arXiv with ID 2605.22133v1.
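The recovery objective listed above can be sketched as a data-preparation step: corrupt some positions in each token view, keep the originals as targets, and score a model by how many original tokens it restores. This is only an illustration under stated assumptions. The uniform random sampler stands in for the paper's generator, and the vocabulary sizes, corruption rate, and the names `corrupt_view` and `recovery_accuracy` are hypothetical.

```python
import random


def corrupt_view(tokens, vocab_size, rate, rng):
    """Replace a fraction of tokens with different tokens from a stand-in generator."""
    corrupted = list(tokens)
    replaced = [False] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            # Stand-in generator: sample uniformly among all tokens except the original.
            choice = rng.randrange(vocab_size - 1)
            corrupted[i] = choice if choice < tok else choice + 1
            replaced[i] = True
    return corrupted, replaced


def recovery_accuracy(predicted, original):
    """Fraction of positions where the predicted token equals the original."""
    hits = sum(p == o for p, o in zip(predicted, original))
    return hits / len(original)


rng = random.Random(0)

# Three aligned residue-level token views for a toy 6-residue chain (made-up values).
views = {
    "sequence": [3, 7, 7, 1, 0, 4],
    "backbone": [12, 5, 5, 9, 2, 2],
    "full_atom": [8, 8, 1, 6, 3, 0],
}
vocab_sizes = {"sequence": 20, "backbone": 16, "full_atom": 16}  # assumed sizes

corrupted, flags = {}, {}
for name, toks in views.items():
    corrupted[name], flags[name] = corrupt_view(toks, vocab_sizes[name], 0.3, rng)

# A model would take `corrupted` as input and be trained to output `views`.
# Feeding the corrupted tokens back unchanged gives the "do nothing" baseline.
for name in views:
    print(name, recovery_accuracy(corrupted[name], views[name]))
```

Because the views are aligned per residue, a model can use the uncorrupted positions in one view as evidence for what the original token was in another.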
