ARTFEED — Contemporary Art Intelligence

SLAM: Structural Linguistic Activation Marking for LLM Watermarks

ai-technology · 2026-05-09

A new watermarking technique for large language models, SLAM (Structural Linguistic Activation Marking), has been posted to arXiv. Unlike conventional schemes that bias token distributions, SLAM embeds the watermark in linguistic structure: sparse autoencoders identify residual-stream directions that encode attributes such as voice, tense, and clause order, and SLAM steers those directions at generation time without constraining lexical sampling or semantics. On Gemma-2 2B and 9B, SLAM achieves 100% detection accuracy at a quality cost of only 1-2 reward points, versus 7.5-11.5 for KGW, EWD, and Unigram, while matching the naturalness and diversity of unwatermarked output. It resists word-level edits but remains vulnerable to other attack types, giving it a robustness profile complementary to distribution-based schemes. The paper is available at arXiv:2605.05443.
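
The core idea of steering a residual-stream direction can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the dimensions, the steering strength `alpha`, and the random stand-in for an SAE feature direction are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 64    # hypothetical residual-stream width
N_TOKENS = 200  # length of the generated passage

# Stand-in for an SAE feature direction encoding a structural
# attribute (e.g. active vs. passive voice); SLAM would obtain this
# from a sparse autoencoder trained on the model's residual stream.
direction = rng.normal(size=D_MODEL)
direction /= np.linalg.norm(direction)

def steer(resid: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Watermark: nudge each token's residual along `direction`.

    Logit sampling is untouched, so the lexical distribution is
    unconstrained; only the structural geometry shifts.
    """
    return resid + alpha * direction

def score(resid: np.ndarray) -> float:
    """Detector statistic: mean projection onto the marked direction."""
    return float((resid @ direction).mean())

plain = rng.normal(size=(N_TOKENS, D_MODEL))  # toy unwatermarked activations
marked = steer(plain)
print(round(score(marked) - score(plain), 6))  # shift equals alpha -> 2.0
```

Because the perturbation is added in activation space rather than at the sampling step, the same text can in principle carry the mark regardless of which tokens were chosen.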

Key facts

  • SLAM stands for Structural Linguistic Activation Marking
  • It is a white-box watermarking scheme for LLMs
  • Uses sparse autoencoders to identify residual-stream directions encoding linguistic structure
  • Steers those directions at generation time without constraining lexical sampling or semantics
  • Tested on Gemma-2 2B and 9B models
  • Achieves 100% detection accuracy
  • Quality cost of 1-2 reward points vs 7.5-11.5 for KGW, EWD, and Unigram
  • Resists word-level edits but is vulnerable to other attack types, a robustness profile complementary to distribution-based schemes

Entities

Institutions

  • arXiv

Sources