ARTFEED — Contemporary Art Intelligence

LLMs Enable Taxonomy-Agnostic PII Annotation in HTTP Traffic

ai-technology · 2026-05-09

A new arXiv paper (2605.06305) proposes using Large Language Models (LLMs) to annotate Personally Identifiable Information (PII) in HTTP traffic without committing to a fixed label taxonomy, addressing the scarcity of manually labelled traffic. The multi-stage pipeline combines deterministic pre-processing with LLM-based classification and validation, and the paper also includes an LLM-based synthetic traffic generator for evaluation.

Key facts

  • Paper ID: arXiv:2605.06305
  • Published on arXiv
  • Focuses on automated privacy audits of web and mobile applications
  • Analyses outbound HTTP traffic for PII leakage
  • Existing detectors rely on scarce manually labelled traffic
  • Existing detectors are tied to fixed label taxonomies
  • Proposes LLM-based pipeline for taxonomy-agnostic annotation
  • Pipeline includes deterministic pre-processing, label-level classification, instance-level annotation, and output validation
  • Includes LLM-based generator for synthetic HTTP traffic
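
The four pipeline stages listed above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the Annotation type, and the two stub "models" are all hypothetical, and the stubs use simple keyword matching in place of real LLM calls. The stubs deliberately return one invented label and one invented value so that the validation stage has something to reject. The taxonomy is passed in as a plain list, which is what makes the sketch taxonomy-agnostic.

```python
import json
from dataclasses import dataclass
from typing import Callable
from urllib.parse import parse_qsl, urlsplit

Pairs = list[tuple[str, str]]


@dataclass(frozen=True)
class Annotation:
    label: str   # label from the caller-supplied taxonomy
    key: str     # request field where the value was found
    value: str   # the annotated value itself


def preprocess(url: str, body: str = "") -> Pairs:
    """Deterministic step: flatten the query string and a flat JSON body."""
    pairs = list(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if body:
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            pairs.extend((str(k), str(v)) for k, v in data.items())
    return pairs


def validate(candidates: list[Annotation], pairs: Pairs,
             taxonomy: list[str]) -> list[Annotation]:
    """Output validation: keep only known labels on values that really occur."""
    seen = set(pairs)
    return [a for a in candidates
            if a.label in taxonomy and (a.key, a.value) in seen]


def run_pipeline(url: str, body: str, taxonomy: list[str],
                 label_model: Callable[[Pairs, list[str]], list[str]],
                 instance_model: Callable[[Pairs, str], Pairs]) -> list[Annotation]:
    pairs = preprocess(url, body)                      # 1. pre-processing
    labels = label_model(pairs, taxonomy)              # 2. label-level classification
    candidates = [Annotation(label, k, v)              # 3. instance-level annotation
                  for label in labels
                  for k, v in instance_model(pairs, label)]
    return validate(candidates, pairs, taxonomy)       # 4. output validation


# Hypothetical stand-ins for LLM calls (keyword matching, plus one fake result).
def stub_label_model(pairs: Pairs, taxonomy: list[str]) -> list[str]:
    keys = " ".join(k.lower() for k, _ in pairs)
    return [label for label in taxonomy if label in keys] + ["made_up_label"]


def stub_instance_model(pairs: Pairs, label: str) -> Pairs:
    hits = [(k, v) for k, v in pairs if label in k.lower()]
    return hits + [("ghost", "not-in-request")]


annotations = run_pipeline(
    "https://api.example.com/track?email=a%40example.com&session=xyz",
    '{"device_id": "abc-123"}',
    ["email", "device_id"],
    stub_label_model,
    stub_instance_model,
)
for a in annotations:
    print(a)
```

In a real setting the two stubs would be replaced by prompts to an LLM, and the validation step would remain deterministic so that labels outside the taxonomy or values absent from the request never reach the output.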

Entities

Institutions

  • arXiv
