MultiVul: Multimodal Contrastive Framework for Software Vulnerability Detection
MultiVul is a multimodal contrastive framework that improves software vulnerability detection by aligning representations of code with representations of its comments. Existing approaches typically rely on code alone, a single modality, and so miss the semantic information that comments carry. MultiVul combines dual similarity learning with consistency regularization and is augmented with diverse code-text pairs. On the DiverseVul and Devign datasets, across four LLMs (DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, CodeLlama-7B), it improves F1 by up to 27.07% over prompting-based methods.
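To make the idea of aligning code and comment representations concrete, here is a minimal sketch of a generic symmetric contrastive (InfoNCE-style) loss over paired embeddings. It is an illustration under stated assumptions, not MultiVul's actual objective: this summary does not specify how dual similarity learning or consistency regularization are defined, and the function names, the temperature value, and the toy vectors below are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy where anchors[i] should match candidates[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def symmetric_alignment_loss(code_vecs, comment_vecs, temperature=0.1):
    """Average of the code-to-comment and comment-to-code InfoNCE terms."""
    code_to_comment = info_nce(code_vecs, comment_vecs, temperature)
    comment_to_code = info_nce(comment_vecs, code_vecs, temperature)
    return 0.5 * (code_to_comment + comment_to_code)


# Toy embeddings (hypothetical values): index i in each list is a matched pair.
code_vecs = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1]]
aligned_comments = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
swapped_comments = [aligned_comments[1], aligned_comments[0]]

aligned_loss = symmetric_alignment_loss(code_vecs, aligned_comments)
swapped_loss = symmetric_alignment_loss(code_vecs, swapped_comments)
print(f"aligned pairs: {aligned_loss:.4f}")
print(f"swapped pairs: {swapped_loss:.4f}")
```

The loss is small when each code vector sits closest to its own comment vector and large when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward. In a real system the vectors would come from an LLM encoder rather than being written by hand.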
Key facts
- MultiVul is a multimodal contrastive framework for vulnerability detection.
- It aligns code and comment representations.
- It uses dual similarity learning and consistency regularization.
- It is augmented with diverse code-text pairs.
- It was tested on the DiverseVul and Devign datasets.
- It was evaluated across four LLMs: DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, CodeLlama-7B.
- It achieves up to a 27.07% F1 improvement over prompting-based methods.
- It addresses the limitation of single-modality code representations, which ignore the semantics carried by comments.