LLMs Enable Taxonomy-Agnostic PII Annotation in HTTP Traffic
A new paper on arXiv (arXiv:2605.06305) proposes using Large Language Models (LLMs) to annotate Personally Identifiable Information (PII) in HTTP traffic without committing to a fixed label taxonomy, which addresses the scarcity of manually labelled traffic. The multi-stage pipeline combines deterministic pre-processing with LLM-based label-level classification, instance-level annotation, and output validation. The work also includes an LLM-based generator of synthetic HTTP traffic for evaluation.
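To make the deterministic pre-processing stage more concrete, here is a minimal Python sketch of what such a step could look like: it splits a request into flat (location, key, value) triples before any LLM is involved. The summary does not describe the paper's actual pre-processing, so the function names `flatten` and `extract_fields`, the triple format, and the example URL are all hypothetical.

```python
import json
from urllib.parse import urlsplit, parse_qsl


def flatten(obj, prefix=""):
    """Yield (path, value) pairs for every leaf of a decoded JSON value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, "" if obj is None else str(obj)


def extract_fields(url, body=""):
    """Split a request into (location, key, value) triples (hypothetical format)."""
    fields = []
    # Query-string parameters, in the order they appear in the URL.
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        fields.append(("query", key, value))
    if body:
        try:
            decoded = json.loads(body)
        except json.JSONDecodeError:
            # Not JSON: fall back to form-encoded parsing.
            for key, value in parse_qsl(body, keep_blank_values=True):
                fields.append(("body", key, value))
        else:
            for path, value in flatten(decoded):
                fields.append(("body", path, value))
    return fields


# Illustrative request only; the host and field names are made up.
fields = extract_fields(
    "https://tracker.example/collect?uid=42&lang=en",
    '{"user": {"email": "alice@example.com"}, "tags": ["a", "b"]}',
)
for location, key, value in fields:
    print(location, key, value)
```

Because this step uses no model, the same request always produces the same candidate fields, which keeps the later LLM stages focused on classification rather than parsing.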
Key facts
- Paper ID: arXiv:2605.06305
- Published on arXiv
- Focuses on automated privacy audits of web and mobile applications
- Analyses outbound HTTP traffic for PII leakage
- Existing detectors rely on scarce manually labelled traffic
- Existing detectors are tied to fixed label taxonomies
- Proposes LLM-based pipeline for taxonomy-agnostic annotation
- Pipeline includes deterministic pre-processing, label-level classification, instance-level annotation, and output validation
- Includes LLM-based generator for synthetic HTTP traffic
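The output validation step listed above can be illustrated with a short Python sketch. It assumes, hypothetically, that the model returns (label, value) pairs and that validation means two checks: the label belongs to the taxonomy supplied by the caller, and the value occurs verbatim in the request. The summary does not state the paper's actual validation rules, and the `Annotation` class, `validate` function, and sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    label: str  # label name chosen by the model
    value: str  # exact substring the model claims is PII


def validate(annotations, allowed_labels, request_text):
    """Split annotations into accepted ones and (annotation, reason) rejections."""
    accepted, rejected = [], []
    for ann in annotations:
        if ann.label not in allowed_labels:
            rejected.append((ann, "unknown label"))
        elif not ann.value or ann.value not in request_text:
            rejected.append((ann, "value not in request"))
        else:
            accepted.append(ann)
    return accepted, rejected


# The taxonomy is plain data passed in at call time, so it can be swapped
# without retraining or changing the code (the "taxonomy-agnostic" idea).
taxonomy = {"email", "device_id"}
request_text = 'POST /collect {"email": "alice@example.com", "adid": "abc-123"}'

# Stand-in for model output; no real LLM is called in this sketch.
model_output = [
    Annotation("email", "alice@example.com"),
    Annotation("device_id", "abc-123"),
    Annotation("phone", "555-0100"),         # label outside the taxonomy
    Annotation("email", "bob@example.com"),  # value absent from the request
]

accepted, rejected = validate(model_output, taxonomy, request_text)
print([a.label for a in accepted])
print([reason for _, reason in rejected])
```

Checks of this kind are deterministic, so they can filter out hallucinated labels or values regardless of which model produced the annotations.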
Entities
Institutions
- arXiv