DNA Synthesis Hazard Screening Fails Under Taxonomic Shift
A recent study finds that existing DNA-synthesis screening techniques break down on hazardous sequences from taxonomic families absent from their reference databases. The researchers show that when the decision threshold is set to meet Conformal Risk Control's (CRC) certified miss rate, low-discrimination signals push that threshold below the scores of the safe test sequences, so the baseline flags every safe sequence: a 100% false-flag rate. To address this, the team composes three signals: k-mer Jaccard similarity to known toxins, the trimmed mean of scores from a panel of five LLM judges, and cosine similarity to clustered embedding centroids. The signals are fused by a monotone logistic aggregator and calibrated with CRC, so the screener certifies an expected false-negative rate of at most α. In leave-one-taxonomic-family-out folds at α = 0.05 on reviewed UniProt toxins (keyword KW-0800), the calibrated screener achieves a 0% test miss rate. The paper is available on arXiv.
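The three signals described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the k-mer length (k = 3 on amino-acid strings), taking the maximum over reference toxins and over centroids, and trimming one score from each end of the five-judge panel are all assumptions, and every helper name is hypothetical. The LLM judge scores and the clustered centroids are treated as given inputs.

```python
import math
from statistics import mean


def kmer_set(seq, k=3):
    """Overlapping k-mers of an upper-cased sequence (k=3 is an assumed default)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def max_kmer_jaccard(query, toxins, k=3):
    """Highest k-mer Jaccard similarity between the query and any reference toxin."""
    q = kmer_set(query, k)
    best = 0.0
    for ref in toxins:
        r = kmer_set(ref, k)
        union = q | r
        if union:
            best = max(best, len(q & r) / len(union))
    return best


def trimmed_mean(judge_scores, trim=1):
    """Drop the `trim` lowest and highest judge scores, then average the rest."""
    if len(judge_scores) <= 2 * trim:
        raise ValueError("need more than 2 * trim scores")
    kept = sorted(judge_scores)[trim:len(judge_scores) - trim]
    return mean(kept)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_centroid_cosine(embedding, centroids):
    """Similarity to the closest cluster centroid (assumed: max over centroids)."""
    return max((cosine(embedding, c) for c in centroids), default=0.0)


if __name__ == "__main__":
    print(max_kmer_jaccard("MKTAYIAK", ["MKTAYIAKQR", "GGGGGG"]))  # 0.75
    print(trimmed_mean([0.2, 0.9, 0.8, 0.7, 0.6]))  # mean of 0.6, 0.7, 0.8
    print(max_centroid_cosine([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]]))
```

With five judges and `trim=1`, the single most extreme score at each end is discarded, which limits how far one outlier judge can move the signal.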
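One way to read "monotone logistic aggregator" is logistic regression whose signal weights are constrained to be non-negative, so that raising any single signal can never lower the fused score. The sketch below assumes that reading and fits the model by projected gradient descent, clipping weights at zero after each step. The class name, learning rate, epoch count, and toy data are hypothetical choices, not taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MonotoneLogistic:
    """Logistic fusion with non-negative weights (hypothetical sketch).

    Because every weight is >= 0, the fused score is non-decreasing in each
    input signal: a sequence that looks more toxin-like never scores lower.
    """

    def __init__(self, n_signals, lr=0.5, epochs=2000):
        self.w = [0.0] * n_signals
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x):
        return sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, target in zip(X, y):
                err = self.score(x) - target  # d(log loss)/d(logit)
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj / n
                grad_b += err / n
            # Gradient step, then project the weights onto w >= 0.
            self.w = [max(0.0, wj - self.lr * g) for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b
        return self


if __name__ == "__main__":
    # Rows: (k-mer Jaccard, trimmed judge mean, centroid cosine); 1 = hazardous.
    X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.9], [0.1, 0.2, 0.1], [0.2, 0.1, 0.3]]
    y = [1, 1, 0, 0]
    model = MonotoneLogistic(n_signals=3).fit(X, y)
    print(model.w, model.b)
    print(model.score(X[0]), model.score(X[2]))
```

The projection step is what enforces monotonicity; an unconstrained fit could assign a negative weight to a noisy signal and let a higher toxin similarity reduce the risk score.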
Key facts
- Current DNA-synthesis screening fails for sequences from taxonomic families not in reference sets.
- Under taxonomic shift, the baseline collapses to a 100% false-flag rate.
- Three signals are composed: k-mer Jaccard similarity to known toxins, the trimmed mean of a five-LLM judge panel, and cosine similarity to clustered embedding centroids.
- Signals fused under monotone logistic aggregator and calibrated by Conformal Risk Control.
- Certifies expected false negative rate ≤ α.
- Tested on UniProt KW-0800 reviewed toxins with leave-one-taxonomic-family-out folds at α=0.05.
- Calibrated screener achieves 0% test miss rate.
- Paper published on arXiv with ID 2605.00074.
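For a binary miss loss bounded by B = 1, Conformal Risk Control chooses the threshold t from n hazardous calibration scores so that (n/(n+1))·R̂(t) + 1/(n+1) ≤ α, where R̂(t) is the empirical miss rate and a sequence is flagged when its fused score is at least t. The sketch below implements that rule and then reproduces, on made-up toy scores, the collapse described above: when hazardous and safe scores overlap, the certified threshold falls beneath the safe test scores and every safe sequence is flagged. The function names and all numbers are illustrative assumptions, not the paper's data.

```python
import math


def crc_threshold(hazard_cal_scores, alpha):
    """Largest t with (n/(n+1)) * miss_rate(t) + 1/(n+1) <= alpha (loss bound B = 1).

    A hazardous sequence is missed when its score is < t, so the condition
    becomes (#{s_i < t} + 1) / (n + 1) <= alpha.
    """
    n = len(hazard_cal_scores)
    # Largest number of calibration misses the bound tolerates (epsilon guards float error).
    k = math.floor(alpha * (n + 1) + 1e-9) - 1
    if k < 0:
        return -math.inf  # too few calibration positives for this alpha: flag everything
    if k >= n:
        return math.inf   # only when alpha >= 1: flagging nothing already meets the bound
    return sorted(hazard_cal_scores)[k]


def miss_rate(t, hazardous_scores):
    return sum(s < t for s in hazardous_scores) / len(hazardous_scores)


def false_flag_rate(t, safe_scores):
    return sum(s >= t for s in safe_scores) / len(safe_scores)


if __name__ == "__main__":
    alpha = 0.05
    safe_test = [0.2, 0.3, 0.4, 0.5, 0.6]
    # Low discrimination: hazardous calibration scores overlap the safe range.
    weak_cal = [0.05 * i for i in range(1, 20)]        # 19 scores, 0.05 .. 0.95
    # High discrimination: hazardous calibration scores sit above the safe range.
    strong_cal = [0.80 + 0.01 * i for i in range(19)]  # 19 scores, 0.80 .. 0.98
    t_weak = crc_threshold(weak_cal, alpha)      # 0.05
    t_strong = crc_threshold(strong_cal, alpha)  # 0.8
    print(false_flag_rate(t_weak, safe_test))    # 1.0
    print(false_flag_rate(t_strong, safe_test))  # 0.0
```

With n = 19 and α = 0.05 the rule tolerates zero calibration misses, so t is the lowest hazardous calibration score; with fewer than 19 hazardous calibration sequences it returns -inf and flags everything regardless of signal quality. This sketch does not reflect how the paper sizes its calibration sets.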
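Leave-one-taxonomic-family-out evaluation holds out every sequence from one family at a time, so each test fold contains only a family the screener never saw during fitting or calibration. A minimal sketch, assuming each record is a dict with a hypothetical "family" key (for example, a family label attached to each UniProt entry); the family labels below are placeholders, and how the paper divides the remaining families between aggregator fitting and CRC calibration, or where its safe sequences come from, is not specified here.

```python
from collections import defaultdict


def leave_one_family_out(records, key="family"):
    """Yield (held_out_family, train_records, test_records), one fold per family."""
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec[key]].append(rec)
    for held_out in sorted(by_family):
        # Training data: every record whose family differs from the held-out one.
        train = [rec for fam, recs in by_family.items() if fam != held_out for rec in recs]
        yield held_out, train, by_family[held_out]


if __name__ == "__main__":
    toxins = [
        {"id": "t1", "family": "A"},
        {"id": "t2", "family": "A"},
        {"id": "t3", "family": "B"},
        {"id": "t4", "family": "C"},
    ]
    for fam, train, test in leave_one_family_out(toxins):
        print(fam, [r["id"] for r in train], [r["id"] for r in test])
    # A ['t3', 't4'] ['t1', 't2']
    # B ['t1', 't2', 't4'] ['t3']
    # C ['t1', 't2', 't3'] ['t4']
```

Grouping by family rather than splitting sequences at random is what creates the taxonomic shift: near-duplicate toxins from the same family cannot leak from training into the test fold.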
Entities
Institutions
- arXiv
- UniProt