Feature-Inversion Trap: New Benchmark Exposes LLM Detector Failures on Personalized Text
Researchers have introduced the first benchmark for detecting personalized machine-generated text (MGT), showing that current detectors, including state-of-the-art ones, suffer significant performance drops when faced with LLM-generated imitations of a specific author's style. The study, posted to arXiv (2510.12476v3), identifies a 'feature-inversion trap': features that are discriminative for general MGT become misleading in personalized contexts. The benchmark is built from literary and blog texts paired with LLM-generated imitations. The authors also propose a simple method to predict detector reliability. The work addresses the growing risk of identity impersonation as LLMs become more adept at imitating personal writing styles.
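To make the idea of a feature inversion concrete, here is a minimal toy sketch in Python. It is not the paper's method or data: the single feature, the threshold, and every score below are hypothetical values chosen only to show how a rule that separates classes in one setting can point the wrong way in another.

```python
# Toy illustration only: all scores are made-up, not taken from the study.

def threshold_detector(score, threshold=0.5):
    """Flag a text as machine-generated when its feature score exceeds the threshold."""
    return score > threshold


def accuracy(samples, threshold=0.5):
    """Fraction of (score, is_machine) pairs the threshold rule labels correctly."""
    correct = sum(
        threshold_detector(score, threshold) == is_machine
        for score, is_machine in samples
    )
    return correct / len(samples)


# Hypothetical general setting: machine text scores high on the feature.
general = [
    (0.9, True), (0.8, True), (0.7, True),
    (0.2, False), (0.3, False), (0.1, False),
]

# Hypothetical personalized setting: the same feature now points the other way,
# so imitations score low and the genuine author scores high.
personalized = [
    (0.2, True), (0.3, True), (0.4, True),
    (0.8, False), (0.7, False), (0.9, False),
]

general_acc = accuracy(general)
personalized_acc = accuracy(personalized)

print(f"general accuracy:      {general_acc:.2f}")
print(f"personalized accuracy: {personalized_acc:.2f}")
```

In this contrived case the same rule is right on every general sample and wrong on every personalized one. Real detectors combine many features, so the effect would be a drop in performance rather than a complete reversal.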
Key facts
- First benchmark for personalized MGT detection introduced
- Benchmark built from literary and blog texts with LLM imitations
- State-of-the-art detectors show significant performance drops
- Feature-inversion trap identified as cause of detector failures
- Simple method proposed to predict detector reliability
- Study posted to arXiv (2510.12476v3)
- Addresses risk of identity impersonation by LLMs
- No prior work had examined personalized MGT detection
Entities
Institutions
- arXiv