ArcMark: Distortion-Free Multi-Byte LLM Watermark via Optimal Transport
Researchers have introduced ArcMark, a watermarking method for large language models (LLMs) that embeds multiple bytes of information in generated text without changing the model's next-token predictions on average. Existing watermarks typically encode a single bit per token, which limits their capacity. ArcMark, built on coding and information-theoretic principles, is designed to reliably embed data such as user IDs, model versions, or even the prompt itself, broadening the range of applications for responsible LLM use. The method is described in a paper on arXiv (2602.07235).
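The distortion-free property can be illustrated with a toy sketch in Python. This is not ArcMark's construction, and the paper's optimal-transport design is not reproduced here. The sketch assumes a simple cyclic-shift rule: a keyed uniform random number is shifted by a message-dependent offset before inverse-transform sampling. All function names and parameters are hypothetical. Because the shifted value is still uniform when averaged over keys, the next-token distribution stays the same for every message.

```python
import random
from collections import Counter


def sample_token(probs, u):
    """Inverse-transform sampling: return the index whose CDF interval contains u."""
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def watermarked_token(probs, key_uniform, message, num_messages):
    """Toy rule (an assumption, not ArcMark): shift the keyed uniform by message / num_messages, mod 1."""
    shifted = (key_uniform + message / num_messages) % 1.0
    return sample_token(probs, shifted)


def empirical_distribution(probs, message, num_messages, trials=100_000, seed=0):
    """Average over many keyed uniforms to estimate the token distribution for one message."""
    rng = random.Random(seed)
    counts = Counter(
        watermarked_token(probs, rng.random(), message, num_messages)
        for _ in range(trials)
    )
    return [counts[i] / trials for i in range(len(probs))]


if __name__ == "__main__":
    next_token_probs = [0.5, 0.3, 0.2]  # made-up model output for illustration
    for message in range(4):
        print(message, empirical_distribution(next_token_probs, message, 4))
```

Running the script prints one estimated distribution per message. Each should be close to the original probabilities, while for a fixed key the chosen token still depends on the message, which is what would let a decoder holding the key recover it.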
Key facts
- ArcMark is a new multi-byte LLM watermarking method.
- It embeds information without perturbing average next-token predictions.
- Existing watermarks typically encode a single bit per token.
- ArcMark can embed user IDs, model versions, or prompts.
- The method is based on coding and information-theoretic principles.
- The paper is available on arXiv with ID 2602.07235.
- It aims to promote responsible use of large language models.
- The approach is described as distortion-free.
Entities
Institutions
- arXiv