ARTFEED — Contemporary Art Intelligence

CNN-CodeBERT Framework for Three-Class Credential Leakage Detection

ai-technology · 2026-06-01

A new hybrid framework combining CNNs and CodeBERT achieves state-of-the-art results in detecting credential leaks in source code. The three-class model distinguishes genuine credentials from placeholders and weak credentials, reducing false positives. On a dataset of 9,426 samples across 10 languages, it achieves an MCC of 0.86 and macro F1 of 0.90, with 93% recall and 89% precision for genuine leaks. High severity alerts dropped by 33% without compromising security. The work addresses the 23.8 million secrets exposed in 2024.

Key facts

  • 23.8 million secrets exposed in 2024
  • Three-class classification framework
  • CNN-CodeBERT hybrid model
  • Dataset of 9,426 samples across 10 programming languages
  • Matthews Correlation Coefficient of 0.86
  • Macro F1-score of 0.90
  • 93% recall and 89% precision for genuine credential leaks
  • 33% reduction in high severity alerts (from 373 to 250)

Entities

Sources