ARTFEED — Contemporary Art Intelligence

SecureVibeBench: Benchmarking AI Code Security via Real Vulnerabilities

ai-technology · 2026-04-27

Researchers have unveiled SecureVibeBench, a benchmark of 105 secure coding tasks in C/C++ drawn from 41 OSS-Fuzz projects. It evaluates code agents built on large language models (LLMs) by reconstructing scenarios in which human developers inadvertently introduced vulnerabilities. Tasks require realistic multi-file edits in large repositories, and their context is aligned with real open-source vulnerabilities whose introduction points are precisely identified. Evaluation combines functionality testing with security checking through both static and dynamic oracles. Five widely used code agents were evaluated. Existing benchmarks overlook scenarios in which vulnerabilities were introduced by humans. By filling that gap, SecureVibeBench enables fair comparisons between human developers and AI agents.

Key facts

  • SecureVibeBench includes 105 C/C++ secure coding tasks
  • Tasks sourced from 41 projects in OSS-Fuzz
  • Benchmark reconstructs scenarios in which human developers introduced vulnerabilities
  • Requires multi-file edits in large repositories
  • Uses real-world open-source vulnerabilities with precisely identified introduction points
  • Evaluation combines functionality testing and security checking with static and dynamic oracles
  • Five popular code agents were evaluated
  • Addresses a gap in existing benchmarks, enabling fair human-AI comparison
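The article does not say exactly how the functionality and security results are combined into a single verdict. Below is a minimal sketch, assuming that a task counts as solved only when the agent's patch passes the functionality tests and neither the static nor the dynamic oracle flags the vulnerability. The names `TaskResult`, `Outcome`, `classify`, and `secure_correct_rate`, and the fields `tests_passed`, `static_findings`, and `dynamic_crash`, are hypothetical illustrations, not SecureVibeBench's actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FUNCTIONAL_FAIL = "functional_fail"        # patch breaks required behavior
    INSECURE = "insecure"                      # works, but an oracle flags the vulnerability
    SECURE_AND_CORRECT = "secure_and_correct"  # passes tests and both oracles


@dataclass
class TaskResult:
    # All fields are hypothetical stand-ins for the benchmark's real signals.
    task_id: str
    tests_passed: bool    # outcome of the project's functionality tests
    static_findings: int  # static-oracle warnings tied to the target vulnerability
    dynamic_crash: bool   # whether the dynamic oracle (e.g. a sanitizer run) detects it


def classify(r: TaskResult) -> Outcome:
    # Functionality is checked first: an insecure-but-broken patch is still a failure.
    if not r.tests_passed:
        return Outcome.FUNCTIONAL_FAIL
    # Either oracle firing marks the patch as insecure.
    if r.static_findings > 0 or r.dynamic_crash:
        return Outcome.INSECURE
    return Outcome.SECURE_AND_CORRECT


def secure_correct_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks whose patches are both functional and secure.
    if not results:
        return 0.0
    good = sum(1 for r in results if classify(r) is Outcome.SECURE_AND_CORRECT)
    return good / len(results)


if __name__ == "__main__":
    results = [
        TaskResult("task-001", tests_passed=True, static_findings=0, dynamic_crash=False),
        TaskResult("task-002", tests_passed=True, static_findings=0, dynamic_crash=True),
        TaskResult("task-003", tests_passed=False, static_findings=0, dynamic_crash=False),
    ]
    for r in results:
        print(r.task_id, classify(r).value)
    print(f"secure-and-correct rate: {secure_correct_rate(results):.2f}")
```

Checking functionality before security keeps the two failure modes distinct, so a comparison across agents (or against human developers) can separate patches that do not work from patches that work but reintroduce the vulnerability.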

Entities

Institutions

  • arXiv
  • OSS-Fuzz
