PASA Watermarking Algorithm for LLM-Generated Text Under Semantic Attacks

ai-technology · 2026-05-13

Researchers have introduced PASA, a watermarking algorithm designed to identify LLM-generated text while remaining robust to semantic-invariant attacks such as paraphrasing. Described in a paper on arXiv (2605.10977), the algorithm operates at the semantic level rather than on individual tokens: it works with clusters in a latent embedding space and establishes a distributional relationship between token sequences and auxiliary sequences, using shared randomness synchronized through a secret key and semantic history. The method is grounded in a theoretical framework that characterizes an optimal embedding-detection pair, one that achieves the fundamental trade-offs among detection accuracy, robustness, and distortion. Evaluations across multiple LLMs and semantic-invariant attacks show that PASA remains robust even under strong paraphrasing, addressing a significant weakness of existing watermarking techniques for responsible AI use.
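To make the idea of randomness synchronized by a secret key and semantic history more concrete, here is a minimal Python sketch. It is not the paper's construction: it swaps in a much simpler "favored cluster set" scheme, and every name in it (nearest_cluster, shared_seed, favored_clusters, detection_score, simulate_watermarked, the window and fraction parameters) is hypothetical. It assumes text has already been mapped to a sequence of semantic cluster IDs, for example by taking the nearest centroid of a sentence embedding.

```python
import hashlib
import hmac
import random


def nearest_cluster(vec, centroids):
    """Toy stand-in for the embedding step: index of the closest centroid."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
    )


def shared_seed(secret_key: bytes, history) -> int:
    """Derive a seed from the secret key and recent cluster IDs (assumed scheme)."""
    msg = ",".join(str(c) for c in history).encode()
    digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def favored_clusters(secret_key, history, n_clusters, fraction=0.5):
    """Keyed pseudo-random subset of clusters that the generator would favor."""
    rng = random.Random(shared_seed(secret_key, history))
    k = max(1, int(n_clusters * fraction))
    return set(rng.sample(range(n_clusters), k))


def detection_score(secret_key, cluster_ids, n_clusters, window=2):
    """Fraction of positions whose cluster lies in the favored set for its history."""
    if not cluster_ids:
        return 0.0
    hits = 0
    for t, cid in enumerate(cluster_ids):
        history = cluster_ids[max(0, t - window):t]
        if cid in favored_clusters(secret_key, history, n_clusters):
            hits += 1
    return hits / len(cluster_ids)


def simulate_watermarked(secret_key, n_clusters, length, window=2, seed=0):
    """Simulate a generator that always picks a favored cluster (idealized)."""
    rng = random.Random(seed)
    ids = []
    for _ in range(length):
        history = ids[max(0, len(ids) - window):]
        candidates = sorted(favored_clusters(secret_key, history, n_clusters))
        ids.append(rng.choice(candidates))
    return ids


if __name__ == "__main__":
    key = b"demo-secret-key"
    ids = simulate_watermarked(key, n_clusters=8, length=40)
    print("score with correct key:", detection_score(key, ids, 8))
    print("score with wrong key:  ", detection_score(b"other-key", ids, 8))
```

In this toy setup the detector only ever sees cluster IDs, so a paraphrase that changes the wording but keeps each sentence in the same semantic cluster would leave the score unchanged. That is the intuition behind operating at the semantic level; the actual robustness guarantees and the distributional coupling with auxiliary sequences are those described in the paper, not in this sketch.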

Key facts

PASA is a watermarking algorithm for LLM-generated text.
It is robust against semantic-invariant attacks like paraphrasing.
PASA operates on semantic clusters in a latent embedding space.
It uses shared randomness synchronized by a secret key and semantic history.
The algorithm achieves fundamental trade-offs among detection accuracy, robustness, and distortion.
Evaluations were conducted across multiple LLMs and semantic-invariant attacks.
PASA remains robust even under strong paraphrasing.
The paper is available on arXiv with ID 2605.10977.
