SecureVibeBench: Benchmarking AI Code Security via Real Vulnerabilities
Researchers have introduced SecureVibeBench, a benchmark of 105 secure coding tasks in C/C++ drawn from 41 projects in OSS-Fuzz. It evaluates code agents built on large language models by reconstructing scenarios in which human developers inadvertently introduced vulnerabilities. Tasks require realistic multi-file edits in large repositories, are grounded in real open-source vulnerabilities with precisely identified introduction points, and are scored by an evaluation pipeline that combines functionality testing with security checking via static and dynamic oracles. Five popular code agents were evaluated. The benchmark fills a gap in existing benchmarks, which overlook human-introduced vulnerability scenarios, enabling fair comparison between human developers and AI agents.
Key facts
- SecureVibeBench includes 105 C/C++ secure coding tasks
- Tasks sourced from 41 projects in OSS-Fuzz
- Reconstructs scenarios in which human developers introduced vulnerabilities
- Requires multi-file edits in large repositories
- Uses real-world open-source vulnerabilities with precisely identified introduction points
- Evaluation combines functionality testing and security checking with static and dynamic oracles
- Five popular code agents were evaluated
- Addresses gap in existing benchmarks for fair human-AI comparison
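The combined pass criterion implied above can be sketched as follows. This is a hypothetical illustration, not the authors' actual harness: the `OracleResult` type and oracle names are assumptions, standing in for the benchmark's functionality tests and its static and dynamic security oracles (OSS-Fuzz projects are typically exercised under sanitizers such as ASan).

```python
# Hypothetical sketch (assumed names, not SecureVibeBench's real harness):
# an agent's edit counts as a pass only if the repository's functional
# tests still pass AND neither security oracle flags a vulnerability.

from dataclasses import dataclass

@dataclass
class OracleResult:
    functional_tests_pass: bool  # repository test suite still passes
    static_oracle_clean: bool    # e.g. no new static-analysis finding
    dynamic_oracle_clean: bool   # e.g. no sanitizer crash under fuzzing

def task_passes(r: OracleResult) -> bool:
    """A task is solved only when all three checks succeed."""
    return (r.functional_tests_pass
            and r.static_oracle_clean
            and r.dynamic_oracle_clean)

# A functionally correct edit that reintroduces a memory bug still fails:
print(task_passes(OracleResult(True, True, False)))  # False
print(task_passes(OracleResult(True, True, True)))   # True
```

Requiring all three checks is what distinguishes this setup from functionality-only benchmarks: an edit can pass every unit test yet still reintroduce the original vulnerability.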
Entities
Institutions
- arXiv
- OSS-Fuzz