CoRe-Gen: Robust Spectrum-to-Structure Generation under Imperfect Fingerprint Conditions
CoRe-Gen addresses molecular structure elucidation from tandem mass spectra (MS/MS), targeting de novo generation of structures that lie beyond database coverage. It decomposes the task into two stages, spectrum-to-fingerprint prediction and fingerprint-to-structure decoding, a split that makes use of large-scale molecular datasets. At deployment, however, the decoder is conditioned on predicted fingerprints rather than oracle ones, and the resulting structured errors can degrade generation. CoRe-Gen targets this imperfect intermediate condition in three ways: synthetic-spectrum pretraining of the encoder, frequency-aware fingerprint corruption during decoder training to account for deployment-time noise, and structure-aware strategies that mitigate the remaining errors.
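To make the idea of frequency-aware fingerprint corruption concrete, here is a minimal Python sketch. It is not the paper's method: the summary does not define the noise model, so the one used here is an assumption. In this sketch, set bits are dropped more often when they are rare in the training set, and unset bits are switched on more often when they are common. The function names and the `drop_scale` and `add_scale` parameters are hypothetical.

```python
import numpy as np


def bit_frequencies(fingerprints: np.ndarray) -> np.ndarray:
    """Fraction of training molecules in which each fingerprint bit is set."""
    return fingerprints.mean(axis=0)


def corrupt_fingerprint(fp, freq, rng, drop_scale=0.3, add_scale=0.05):
    """Return a noisy copy of the binary fingerprint `fp`.

    Assumed (hypothetical) noise model, not taken from the paper:
      - a set bit is dropped with probability drop_scale * (1 - freq),
        so rare bits are lost more often;
      - an unset bit is switched on with probability add_scale * freq,
        so common bits are hallucinated more often.
    """
    fp = fp.astype(bool)
    p_drop = drop_scale * (1.0 - freq)
    p_add = add_scale * freq
    u = rng.random(fp.shape)          # one uniform draw per bit
    dropped = fp & (u < p_drop)       # false negatives
    added = ~fp & (u < p_add)         # false positives
    return ((fp & ~dropped) | added).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy training set: 1000 random 32-bit fingerprints (illustrative only).
    train = (rng.random((1000, 32)) < 0.2).astype(np.uint8)
    freq = bit_frequencies(train)
    noisy = corrupt_fingerprint(train[0], freq, rng)
    print("bits changed:", int((noisy != train[0]).sum()))
```

In a setup like this, the decoder would be trained on corrupted fingerprints rather than oracle ones, so that the conditions it sees in training resemble the imperfect predictions it receives at deployment.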
Key facts
- CoRe-Gen is a method for molecular structure elucidation from tandem mass spectra.
- It addresses de novo generation beyond database coverage.
- The approach decomposes the task into spectrum-to-fingerprint prediction and fingerprint-to-structure decoding.
- Deployment relies on predicted fingerprints, not oracle ones, causing structured errors.
- CoRe-Gen uses synthetic-spectrum pretraining of the encoder.
- It employs frequency-aware fingerprint corruption during decoder training.
- The method mitigates residual errors using structure-aware techniques.
- The paper is available on arXiv with ID 2605.12980.
Entities
Institutions
- arXiv