TorchSight: Open-Source Local LLM for Security Document Classification

ai-technology · 2026-05-22

TorchSight is an open-source system that classifies security documents locally using a fine-tuned Qwen 3.5 27B model. It was trained on 78,358 samples drawn from 13 permissively licensed sources plus GPT-4-generated synthetic data, covering seven security categories and 51 subcategories. On a 1,000-document benchmark it reached 95.0% category-level accuracy (95% CI: 93.5-96.2), compared with 75.4-79.9% for commercial models evaluated under the same prompting protocol. On a separate set of 500 held-out external samples, its accuracy was 93.8%. Because inference runs locally, documents can be scanned for sensitive information without sending them to cloud services and without relying on rule-based tools.

Key facts

TorchSight is an open-source local system for security document classification.
It uses a fine-tuned Qwen 3.5 27B model.
Trained on 78,358 samples from 13 permissively licensed sources plus GPT-4-generated synthetic data.
Covers seven security categories and 51 subcategories.
Achieved 95.0% category-level accuracy on 1,000 documents (95% CI: 93.5-96.2).
Commercial models scored 75.4-79.9% under the same prompting protocol.
On 500 held-out samples, accuracy was 93.8%.
Designed to avoid sending data to external cloud infrastructure.
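The reported interval can be sanity-checked with a standard binomial confidence interval. Below is a minimal Python sketch, assuming that 95.0% on 1,000 documents means 950 correct predictions and using a Wilson score interval. The summary does not say which interval method the authors used, and the function name wilson_interval is illustrative only.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% coverage. This is an assumed method,
    chosen for illustration; the source does not name the one it used.
    """
    p = correct / total
    denom = 1 + z**2 / total
    # Center is shifted slightly toward 0.5 compared with the raw proportion p.
    center = (p + z**2 / (2 * total)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half, center + half


# Assumption: 95.0% accuracy on 1,000 documents = 950 correct predictions.
lo, hi = wilson_interval(950, 1000)
print(f"{lo:.1%} - {hi:.1%}")  # prints "93.5% - 96.2%"
```

With these assumptions the Wilson interval reproduces the reported 93.5-96.2 bounds. A plain normal approximation (0.95 ± 1.96 × sqrt(0.95 × 0.05 / 1000)) gives roughly 93.6-96.4 instead.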

Sources

arXiv cs.AI — 2026-05-21