ARTFEED — Contemporary Art Intelligence

SemanticZip: Lossy Text Compression via LLM Decompression

ai-technology · 2026-05-26

A new framework called SemanticZip proposes lossy text compression in which an LLM expands compact codes into task-relevant meaning instead of reconstructing the exact bytes. The pilot study formalizes LLM-mediated decompression with a protected/lossy packet architecture and evaluates six representation regimes (structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji) over five author-constructed diagnostic cases. The approach treats model-based decompression as part of the codec and assesses recovery of semantic commitments rather than exact text. The paper makes no benchmark claims and presents itself as a proof of concept.
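
To make the protected/lossy split concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Packet` class, its field names, the `decompress` function, and the `toy_expand` lookup table are all hypothetical, and the decoder LLM is replaced by a plain callable so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: one verbatim part, one compact lossy part."""
    protected: Dict[str, str]  # values that must survive unchanged
    lossy: str                 # compact code the decoder is free to rephrase


def decompress(packet: Packet, expand: Callable[[str], str]) -> Dict[str, object]:
    """Expand only the lossy part; copy the protected part through untouched.

    `expand` stands in for the decoder LLM (an assumption of this sketch).
    """
    return {
        "protected": dict(packet.protected),
        "body": expand(packet.lossy),
    }


def toy_expand(code: str) -> str:
    """Toy stand-in for an LLM: a fixed lookup table, falling back to the code."""
    table = {"mtg>fri": "The meeting moved to Friday."}
    return table.get(code, code)


packet = Packet(protected={"amount": "1,250.00 USD"}, lossy="mtg>fri")
result = decompress(packet, toy_expand)
```

In this sketch the protected field reaches the output byte for byte, while the lossy code only has to come back with the same meaning.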

Key facts

  • SemanticZip is a lossy text compression framework using LLMs as semantic decompressors.
  • Unlike lossless compression, it does not require byte-identical reconstruction.
  • The framework defines a protected/lossy packet architecture.
  • Six representation regimes are evaluated: structured prose, JSON, CCL-Core, CCL-Min, SemanticZip ASCII, and SemanticZip emoji.
  • Five author-constructed diagnostic cases are used.
  • An independent decoder LLM reconstructs typed semantic atoms from compressed codes.
  • The paper presents a pilot framework and makes no benchmark claims.
  • Published on arXiv with ID 2605.24541.
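
One way to picture "recovery of typed semantic atoms" is as a set comparison between the atoms in the source text and those the decoder produced. The sketch below is an illustration under assumptions, not the paper's scoring method: the `Atom` type, the `atom_recall` function, and the sample atoms are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Atom(NamedTuple):
    """Hypothetical typed semantic atom: a kind label plus a value."""
    kind: str
    value: str


def atom_recall(reference: Iterable[Atom], recovered: Iterable[Atom]) -> float:
    """Fraction of reference atoms that appear exactly in the recovered set."""
    ref = set(reference)
    if not ref:
        return 1.0  # nothing to recover, so nothing was lost
    return len(ref & set(recovered)) / len(ref)


reference = [
    Atom("date", "2026-05-26"),
    Atom("actor", "decoder"),
    Atom("action", "reconstruct"),
]
recovered = [
    Atom("date", "2026-05-26"),
    Atom("action", "reconstruct"),
    Atom("tone", "formal"),  # extra atom, ignored by recall
]
score = atom_recall(reference, recovered)
```

Here two of the three reference atoms are recovered, so `score` is 2/3. Exact matching is the simplest possible choice; a real evaluation might need a looser notion of equivalence between atoms.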

Entities

Institutions

  • arXiv
