CNN-CodeBERT Framework for Three-Class Credential Leakage Detection

ai-technology · 2026-06-01

A new hybrid framework combining CNNs and CodeBERT achieves state-of-the-art results in detecting credential leaks in source code. The three-class model distinguishes genuine credentials from placeholders and weak credentials, reducing false positives. On a dataset of 9,426 samples across 10 languages, it achieves an MCC of 0.86 and macro F1 of 0.90, with 93% recall and 89% precision for genuine leaks. High severity alerts dropped by 33% without compromising security. The work addresses the 23.8 million secrets exposed in 2024.

Key facts

23.8 million secrets exposed in 2024
Three-class classification framework
CNN-CodeBERT hybrid model
Dataset of 9,426 samples across 10 programming languages
Matthews Correlation Coefficient of 0.86
Macro F1-score of 0.90
93% recall and 89% precision for genuine credential leaks
33% reduction in high severity alerts (from 373 to 250)

Entities

—

Sources

arXiv cs.AI — 2026-06-01