ARTFEED — Contemporary Art Intelligence

PASA Watermarking Algorithm for LLM-Generated Text Under Semantic Attacks

ai-technology · 2026-05-13

Researchers have introduced PASA, a watermarking algorithm for identifying LLM-generated text that is designed to withstand semantic-invariant attacks such as paraphrasing. Described in an arXiv paper (2605.10977), the algorithm works at the semantic level: it operates on clusters in a latent embedding space and relies on a distributional relationship between token sequences and auxiliary sequences, with shared randomness synchronized by a secret key and the semantic history. The method is built on a theoretical model that defines an ideal embedding-detection combination, one that balances detection accuracy, robustness, and distortion. Evaluations across multiple LLMs and semantic-invariant attacks show that PASA remains robust even under strong paraphrasing, addressing a significant weakness of current watermarking techniques for responsible AI use.
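To make the idea of randomness synchronized by a secret key and semantic history more concrete, here is a minimal Python sketch. It is not PASA's actual construction, which this summary does not spell out. The function names, the toy two-dimensional centroids, the use of HMAC-SHA256, and the nearest-centroid rule are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import math


def nearest_cluster(embedding, centroids):
    """Return the index of the centroid closest to the embedding (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(embedding, centroids[i]))


def shared_randomness(secret_key, cluster_history):
    """Derive a value in [0, 1) from the secret key and the recent cluster ids.

    Hypothetical helper: HMAC-SHA256 is an assumed choice, not taken from the paper.
    """
    message = ",".join(str(c) for c in cluster_history).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


# Toy semantic clusters in a 2-D "embedding space" (assumed values).
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

# Embeddings of two original sentences and of their paraphrases (assumed values).
original = [(0.1, -0.2), (0.9, 1.2)]
paraphrased = [(0.2, 0.1), (1.1, 0.8)]

history_original = [nearest_cluster(e, centroids) for e in original]
history_paraphrased = [nearest_cluster(e, centroids) for e in paraphrased]

# The generator and the detector hold the same key, so they can derive the same value.
u_embed = shared_randomness(b"demo-key", history_original)
u_detect = shared_randomness(b"demo-key", history_paraphrased)
```

In this toy setup the paraphrased sentences fall into the same clusters as the originals, so the detector recomputes the same value the generator used. A token-level scheme keyed on exact previous tokens would lose that synchronization once the wording changes.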

Key facts

  • PASA is a watermarking algorithm for LLM-generated text.
  • It is robust against semantic-invariant attacks like paraphrasing.
  • PASA operates on semantic clusters in a latent embedding space.
  • It uses shared randomness synchronized by a secret key and semantic history.
  • The algorithm achieves fundamental trade-offs among detection accuracy, robustness, and distortion.
  • Evaluations were conducted across multiple LLMs and semantic-invariant attacks.
  • PASA remains robust even under strong paraphrasing.
  • The paper is available on arXiv with ID 2605.10977.

Entities

Institutions

  • arXiv

Sources