ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport

ai-technology · 2026-05-25

Researchers have introduced ArcMark, a watermarking method for large language models (LLMs) that embeds multiple bytes of information in generated text without distorting the model's next-token predictions on average. Existing watermarks typically encode a single bit per token, which limits their capacity. ArcMark, built on coding and information-theoretic principles, is designed to reliably embed data such as user IDs, model versions, or even the prompt itself, which broadens the potential applications for responsible LLM use. The method is described in a paper on arXiv (2602.07235).

Key facts

ArcMark is a new multi-byte LLM watermarking method.
It embeds information without perturbing average next-token predictions.
Existing watermarks typically encode a single bit per token.
ArcMark can embed user IDs, model versions, or prompts.
The method is based on coding and information-theoretic principles.
The paper is available on arXiv with ID 2602.07235.
It aims to promote responsible use of large language models.
The approach is described as distortion-free.
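
To make "distortion-free" concrete, here is a minimal Python sketch of a generic keyed sampling scheme. It is not ArcMark's algorithm, which this summary does not describe. It is a textbook-style illustration in which a secret key yields a pseudo-random number, the message shifts that number around the unit circle, and the token is chosen by inverse-CDF sampling. All names (keyed_uniform, sample_token, empirical_distribution) and the shift rule are assumptions made for illustration. Because the shifted number is still uniform for every message value, the token distribution averaged over keys matches the model's original probabilities.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def keyed_uniform(key: bytes, position: int) -> float:
    """Pseudo-random number in [0, 1) derived from a secret key and a token position."""
    digest = hashlib.sha256(key + position.to_bytes(8, "big")).digest()
    # Keep 53 bits so the value is exactly representable as a float below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_token(probs, key, position, message, num_messages):
    """Inverse-CDF sampling driven by a message-shifted keyed uniform (illustrative rule)."""
    u = (keyed_uniform(key, position) + message / num_messages) % 1.0
    cdf = list(accumulate(probs))
    # First index whose cumulative probability exceeds u; clamp against rounding.
    return min(bisect_right(cdf, u), len(probs) - 1)


def empirical_distribution(probs, message, num_messages, trials=20000):
    """Token frequencies for a fixed message, averaged over many different keys."""
    counts = [0] * len(probs)
    for t in range(trials):
        key = t.to_bytes(8, "big")  # a fresh hypothetical key per trial
        counts[sample_token(probs, key, 0, message, num_messages)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]  # toy next-token distribution
    for message in (0, 3, 255):  # one byte of payload: 256 possible values
        print(message, empirical_distribution(probs, message, 256))
```

For each message value the printed frequencies should be close to the toy distribution, which is what "without perturbing average next-token predictions" means in this simplified setting. Recovering the message from the text, which is the hard part of a real scheme, is deliberately left out.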
