CNN-CodeBERT Framework for Three-Class Credential Leakage Detection
A new hybrid framework combining CNNs and CodeBERT achieves state-of-the-art results in detecting credential leaks in source code. The three-class model distinguishes genuine credentials from placeholders and weak credentials, which reduces false positives. On a dataset of 9,426 samples across 10 languages, it achieves an MCC of 0.86 and a macro F1 of 0.90, with 93% recall and 89% precision for genuine leaks. High-severity alerts dropped by 33% without compromising security. The work addresses the 23.8 million secrets exposed in 2024.
Key facts
- 23.8 million secrets exposed in 2024
- Three-class classification framework
- CNN-CodeBERT hybrid model
- Dataset of 9,426 samples across 10 programming languages
- Matthews Correlation Coefficient of 0.86
- Macro F1-score of 0.90
- 93% recall and 89% precision for genuine credential leaks
- 33% reduction in high-severity alerts (from 373 to 250)
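The following Python is an added illustration of the evaluation the summary describes, not code from the paper: a hand-written three-class heuristic (placeholder / weak / genuine) stands in for the CNN-CodeBERT model, a small labelled sample of candidate values is constructed inline, and the multiclass Matthews correlation coefficient and macro F1 are computed from the resulting confusion matrix. The placeholder patterns, the weak-value list, the 12-character and 3.0-bit thresholds, and the rule that only predicted genuine credentials become high-severity alerts are all assumptions of this example.

```python
"""Three-class secret triage and the metrics quoted in the summary (MCC, macro F1).

Added example, not the paper's model: `triage` is a heuristic stand-in for the
CNN-CodeBERT classifier so the evaluation pipeline can run end to end offline.
"""
import math
import re
from collections import Counter

LABELS = ("genuine", "placeholder", "weak")

# Assumed placeholder markers: template syntax and "fill me in" words.
PLACEHOLDER_RE = re.compile(
    r"<[^>]+>|\$\{[^}]+\}|your|example|sample|dummy|changeme|replace|todo|xxx+",
    re.IGNORECASE,
)
# Constructed list of weak values that should not be rated as a live credential.
COMMON_WEAK = {"admin", "password", "letmein", "secret", "123456", "qwerty"}


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character of s (0.0 for the empty string)."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def triage(candidate: str) -> str:
    """Heuristic stand-in classifier: placeholder, then weak, else genuine."""
    if PLACEHOLDER_RE.search(candidate):
        return "placeholder"
    if (candidate.lower() in COMMON_WEAK or len(candidate) < 12
            or shannon_entropy(candidate) < 3.0):
        return "weak"
    return "genuine"


def confusion_matrix(y_true, y_pred):
    """cm[(true_label, predicted_label)] = count, with zeros for unseen pairs."""
    cm = {(t, p): 0 for t in LABELS for p in LABELS}
    for t, p in zip(y_true, y_pred):
        cm[(t, p)] += 1
    return cm


def macro_f1(cm) -> float:
    """Unweighted mean of the per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for k in LABELS:
        tp = cm[(k, k)]
        fp = sum(cm[(t, k)] for t in LABELS if t != k)
        fn = sum(cm[(k, p)] for p in LABELS if p != k)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def mcc(cm) -> float:
    """Multiclass Matthews correlation coefficient (Gorodkin's generalisation)."""
    n = sum(cm.values())
    correct = sum(cm[(k, k)] for k in LABELS)
    true_tot = {k: sum(cm[(k, p)] for p in LABELS) for k in LABELS}
    pred_tot = {k: sum(cm[(t, k)] for t in LABELS) for k in LABELS}
    cov_tp = correct * n - sum(true_tot[k] * pred_tot[k] for k in LABELS)
    cov_tt = n * n - sum(v * v for v in true_tot.values())
    cov_pp = n * n - sum(v * v for v in pred_tot.values())
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


def high_severity_count(preds, three_class: bool) -> int:
    """Assumed alerting policy: a two-class scanner escalates every non-placeholder,
    the three-class scanner escalates only predicted genuine credentials."""
    if three_class:
        return sum(1 for p in preds if p == "genuine")
    return sum(1 for p in preds if p != "placeholder")


def reduction_percent(before: int, after: int) -> float:
    """Percentage drop in alert volume, rounded to one decimal."""
    return round(100.0 * (before - after) / before, 1)


# Constructed evaluation sample: (candidate value, ground-truth label).
SAMPLES = [
    ("<YOUR_API_KEY>", "placeholder"),
    ("${DB_PASSWORD}", "placeholder"),
    ("example-secret-value", "placeholder"),
    ("not-a-real-key-here-1234567890", "placeholder"),  # heuristic will miss this
    ("password123", "weak"),
    ("admin", "weak"),
    ("letmein", "weak"),
    ("q7Vz9kLp2mXa4RtB8nWd", "genuine"),
    ("F3hK9sLm2pQx7vTz", "genuine"),
    ("Tr0ub4dor&3", "genuine"),  # short real credential, downgraded to weak
]


def evaluate(samples=SAMPLES):
    """Run the stand-in classifier over the sample and score it."""
    y_true = [label for _, label in samples]
    y_pred = [triage(value) for value, _ in samples]
    cm = confusion_matrix(y_true, y_pred)
    return {
        "mcc": mcc(cm),
        "macro_f1": macro_f1(cm),
        "high_two_class": high_severity_count(y_pred, three_class=False),
        "high_three_class": high_severity_count(y_pred, three_class=True),
    }


if __name__ == "__main__":
    result = evaluate()
    print(f"MCC={result['mcc']:.3f} macroF1={result['macro_f1']:.3f}")
    print(f"high-severity alerts: {result['high_two_class']} -> {result['high_three_class']}")
    print(f"summary's reported drop: {reduction_percent(373, 250)}%")  # page figures
```

The two deliberate misclassifications in the sample (a descriptive placeholder that passes the entropy check, and a short but real credential that the length rule downgrades) show why a learned model is needed: the alert reduction from a third class only helps if genuine credentials are not pushed into the weak bucket along with the noise, which is exactly what the 93% recall figure on genuine leaks measures.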