LLMs Enable Taxonomy-Agnostic PII Annotation in HTTP Traffic

ai-technology · 2026-05-09

A new paper on arXiv (2605.06305) proposes using Large Language Models (LLMs) for taxonomy-agnostic annotation of Personally Identifiable Information (PII) in HTTP traffic, addressing the scarcity of labelled data. The multi-stage pipeline combines deterministic pre-processing with LLM-based classification and validation, and includes a synthetic traffic generator for evaluation.

Key facts

Paper ID: arXiv:2605.06305
Published on arXiv
Focuses on automated privacy audits of web and mobile applications
Analyses outbound HTTP traffic for PII leakage
Existing detectors rely on scarce manually labelled traffic
Existing detectors are tied to fixed label taxonomies
Proposes LLM-based pipeline for taxonomy-agnostic annotation
Pipeline includes deterministic pre-processing, label-level classification, instance-level annotation, and output validation
Includes LLM-based generator for synthetic HTTP traffic
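
The four pipeline stages listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`preprocess`, `validate`, `annotate`), the annotation dictionary shape, and the `llm` callable interface are all hypothetical, and `fake_llm` is a keyword heuristic standing in for a real model call so the example runs offline.

```python
import json
from urllib.parse import urlsplit, parse_qsl


def preprocess(url, body=""):
    """Deterministic step: flatten query string and body into (key, value) pairs."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        # Not JSON: fall back to form-encoded parsing.
        payload = dict(parse_qsl(body, keep_blank_values=True))
    if isinstance(payload, dict):
        pairs.extend((str(k), str(v)) for k, v in payload.items())
    return pairs


def validate(annotations, pairs, allowed_labels):
    """Output validation: drop labels outside the taxonomy and values
    that do not literally occur in the request (e.g. hallucinated ones)."""
    values = {v for _, v in pairs}
    return [a for a in annotations
            if a["label"] in allowed_labels and a["value"] in values]


def annotate(url, body, taxonomy, llm):
    """Run the stages in order; `llm` is a hypothetical callable
    llm(task, pairs, labels) supplied by the caller."""
    pairs = preprocess(url, body)
    # Label-level classification: which taxonomy labels occur at all?
    labels = [l for l in llm("classify", pairs, taxonomy) if l in taxonomy]
    if not labels:
        return []
    # Instance-level annotation: which concrete values carry each label?
    annotations = llm("annotate", pairs, labels)
    return validate(annotations, pairs, set(labels))


def fake_llm(task, pairs, labels):
    """Stand-in for a model call: a keyword heuristic, NOT an LLM."""
    hints = {"email": "email", "device_id": "idfa"}
    if task == "classify":
        return [l for l in labels
                if any(hints.get(l, l) in k.lower() for k, _ in pairs)]
    return [{"label": l, "key": k, "value": v}
            for l in labels for k, v in pairs
            if hints.get(l, l) in k.lower()]


result = annotate(
    "https://tracker.example/collect?idfa=ABC-123&lang=en",
    '{"user_email": "a@b.example"}',
    ["email", "device_id", "phone"],  # taxonomy is an input, not hard-coded
    fake_llm,
)
```

Because the taxonomy is passed in as a plain list, swapping label sets requires no retraining in this sketch, which is the sense of "taxonomy-agnostic" illustrated here. The validation step only checks that each annotated value appears verbatim in the request; a real pipeline would likely need stricter checks.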
