PASA Watermarking Algorithm for LLM-Generated Text Under Semantic Attacks
Researchers have introduced PASA, a watermarking algorithm for identifying text generated by LLMs that is designed to stay robust against semantic-invariant attacks such as paraphrasing. Described in arXiv preprint 2605.10977, PASA works at the semantic level. It operates on semantic clusters in a latent embedding space and establishes a distributional relationship between the generated token sequence and an auxiliary sequence, using shared randomness synchronized by a secret key and the semantic history of the text. The design follows from a theoretical framework that characterizes an optimal embedding-detection pair with respect to the trade-offs among detection accuracy, robustness, and distortion. In evaluations across multiple LLMs and attack scenarios, PASA remained robust even under aggressive paraphrasing, addressing a significant weakness of existing watermarking techniques intended to support responsible AI use.
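The idea of keyed, history-dependent randomness at the semantic level can be illustrated with a small sketch. Everything in it is an assumption for illustration and not PASA's actual method: semantic clusters are approximated by random-hyperplane (SimHash) buckets of an embedding vector, and the hypothetical helpers `make_planes`, `simhash_cluster`, `shared_uniform`, and `auxiliary_sequence` derive one pseudorandom value per position from the secret key and the previous cluster ids using HMAC-SHA256.

```python
import hashlib
import hmac
import random

# Illustrative sketch only; NOT the PASA algorithm from arXiv 2605.10977.
# All helper names and the SimHash-style clustering are hypothetical assumptions.


def make_planes(dim: int, n_bits: int, seed: int) -> list[list[float]]:
    """Random Gaussian hyperplanes used to bucket embeddings (assumed clustering)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash_cluster(embedding: list[float], planes: list[list[float]]) -> int:
    """Map an embedding to a cluster id: one bit per side of each hyperplane."""
    cluster = 0
    for plane in planes:
        dot = sum(e * p for e, p in zip(embedding, plane))
        cluster = (cluster << 1) | (1 if dot >= 0 else 0)
    return cluster


def shared_uniform(key: bytes, history: list[int]) -> float:
    """Keyed pseudorandom value in [0, 1), determined only by the key and cluster history."""
    msg = ",".join(str(c) for c in history).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    # Keep 53 bits so the float division is exact and never reaches 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def auxiliary_sequence(key: bytes, clusters: list[int], window: int = 2) -> list[float]:
    """One shared value per position, seeded by the previous `window` cluster ids."""
    return [
        shared_uniform(key, clusters[max(0, i - window):i])
        for i in range(len(clusters))
    ]


if __name__ == "__main__":
    planes = make_planes(dim=8, n_bits=6, seed=42)
    sentence = [0.3, -1.2, 0.5, 0.0, 2.1, -0.7, 0.9, 0.4]
    rescaled = [2 * x for x in sentence]  # same direction, so same cluster
    c1 = simhash_cluster(sentence, planes)
    c2 = simhash_cluster(rescaled, planes)
    print(c1 == c2)  # True
    print(auxiliary_sequence(b"secret-key", [c1, 5, 17]))
```

Because the values depend only on the key and the cluster ids, a rewrite that keeps each unit of text in the same semantic cluster yields the same auxiliary sequence. That is the intuition for working on clusters rather than on surface tokens, which paraphrasing changes freely.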
Key facts
- PASA is a watermarking algorithm for LLM-generated text.
- It is robust against semantic-invariant attacks like paraphrasing.
- PASA operates on semantic clusters in a latent embedding space.
- It uses shared randomness synchronized by a secret key and semantic history.
- The algorithm is grounded in a theoretical analysis of the fundamental trade-offs among detection accuracy, robustness, and distortion.
- Evaluations were conducted across multiple LLMs and semantic-invariant attacks.
- PASA remains robust even under strong paraphrasing.
- The paper is available on arXiv with ID 2605.10977.
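Detection accuracy in watermarking is often framed as a hypothesis test. The sketch below is a swapped-in technique, not PASA's detector: a green-list z-test, common in token-level watermarking, applied here to hypothetical semantic cluster ids. The names `GAMMA`, `is_green`, `pick_green`, and `z_score` are illustrative assumptions. The embedding side prefers clusters that a keyed, history-dependent rule marks "green", and the detector counts green transitions and compares them with the rate expected when no watermark is present.

```python
import hashlib
import hmac
import math

# Illustrative sketch only; NOT PASA's detector. It swaps in a green-list
# z-test over (hypothetical) semantic cluster ids.

GAMMA = 0.5  # assumed fraction of clusters treated as "green" at each step


def is_green(key: bytes, prev_cluster: int, cluster: int, gamma: float = GAMMA) -> bool:
    """Keyed, history-dependent test: is `cluster` green given the previous cluster?"""
    msg = f"{prev_cluster}:{cluster}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53 < gamma


def pick_green(key: bytes, prev_cluster: int, candidates: list[int], gamma: float = GAMMA) -> int:
    """Embedding side: choose the first candidate cluster that is green."""
    for c in candidates:
        if is_green(key, prev_cluster, c, gamma):
            return c
    return candidates[0]  # no green candidate: fall back, adding no signal


def z_score(key: bytes, clusters: list[int], gamma: float = GAMMA) -> float:
    """Detection side: z-statistic of the green count versus gamma * n under no watermark."""
    n = len(clusters) - 1
    if n <= 0:
        raise ValueError("need at least two cluster ids")
    hits = sum(is_green(key, a, b, gamma) for a, b in zip(clusters, clusters[1:]))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


if __name__ == "__main__":
    key = b"secret-key"
    marked = [0]
    for _ in range(199):
        marked.append(pick_green(key, marked[-1], list(range(64))))
    print(z_score(key, marked))  # about 14.1: every transition is green
    print(z_score(b"wrong-key", marked))  # wrong key: no systematic bias expected
```

Choosing a detection threshold on the z-score trades false positives against the amount of text needed, and forcing the choice toward green clusters more strongly increases distortion. This is the kind of accuracy, robustness, and distortion trade-off that PASA's theoretical framework addresses.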
Entities
Institutions
- arXiv