ARTFEED — Contemporary Art Intelligence

Code Evaluation Metrics Tested for Plagiarism Detection

other · 2026-04-30

A new study on arXiv (2604.25778) examines how well Code Evaluation Metrics (CEMs) can detect source code plagiarism across six levels of modification, L1 through L6. The researchers evaluated five metrics, CodeBLEU, CrystalBLEU, RUBY, Tree Structured Edit Distance (TSED), and CodeBERTScore, on the ConPlag and IRPlag datasets, and compared them against the established detection tools JPlag and Dolos. The results indicate that, without preprocessing, these metrics cannot reliably identify plagiarized code.
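
For intuition, here is a minimal sketch of the underlying setup: a similarity metric scores a suspect submission against each reference program, and the highest-scoring pairs become plagiarism candidates. Everything in the sketch (the tokenize and similarity helpers, and the use of difflib.SequenceMatcher as the metric) is an illustrative stand-in, not one of the CEMs or tools the paper evaluates.

```python
import difflib
import re


def tokenize(source: str) -> list[str]:
    """Crude lexer: split source code into identifier, number,
    and punctuation tokens; whitespace is discarded."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def similarity(candidate: str, reference: str) -> float:
    """Token-level similarity in [0, 1]; a simplistic stand-in for a
    real CEM such as CodeBLEU or TSED."""
    matcher = difflib.SequenceMatcher(None, tokenize(candidate), tokenize(reference))
    return matcher.ratio()


def rank_references(suspect: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Score the suspect against every reference program and return
    (name, score) pairs, most similar first."""
    scores = [(name, similarity(suspect, src)) for name, src in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    original = "int total = 0; for (int i = 0; i < n; i++) { total += a[i]; }"
    renamed = "int sum = 0; for (int k = 0; k < n; k++) { sum += a[k]; }"
    unrelated = "String s = in.readLine(); System.out.println(s.trim());"
    for name, score in rank_references(renamed, {"original": original, "unrelated": unrelated}):
        print(f"{name}: similarity {score:.2f}")
```

Simple identifier renaming (a low-level modification) barely changes the token stream, so such a pair scores high; higher-level modifications restructure the code and erode this kind of surface similarity, which is the regime the study probes.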

Key facts

  • Study compares five CEMs against the state-of-the-art source code plagiarism detection tools (SCPDTs) JPlag and Dolos
  • Uses ConPlag (raw and template-free) and IRPlag datasets
  • Evaluates plagiarism across modification levels L1-L6
  • CEMs tested: CodeBLEU, CrystalBLEU, RUBY, TSED, CodeBERTScore
  • Threshold-free ranking-based measures used for evaluation (see the sketch after this list)
  • Findings indicate CEMs cannot reliably detect plagiarism without preprocessing
  • Published on arXiv with ID 2604.25778
  • Focuses on academic integrity in software engineering education
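
As referenced above, a threshold-free ranking-based evaluation judges a metric by how it orders labelled code pairs rather than by a fixed similarity cutoff. The sketch below uses one common ranking measure, an AUC-style statistic: the probability that a plagiarized pair receives a higher score than a non-plagiarized pair. The digest does not say which ranking measures the paper adopts, so treat this as an illustrative assumption.

```python
def ranking_auc(plagiarized: list[float], innocent: list[float]) -> float:
    """Threshold-free ranking measure: probability that a randomly chosen
    plagiarized pair outscores a randomly chosen non-plagiarized pair
    (ties count as half). Equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in plagiarized:
        for q in innocent:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(plagiarized) * len(innocent))


# Hypothetical similarity scores a metric assigned to labelled code pairs.
plagiarized_scores = [0.92, 0.81, 0.64, 0.55]  # known plagiarism cases
innocent_scores = [0.70, 0.48, 0.33, 0.21]     # independent solutions

print(f"ranking AUC = {ranking_auc(plagiarized_scores, innocent_scores):.2f}")
# 1.0: every plagiarized pair outranks every innocent pair; 0.5: chance level.
```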

Entities

Institutions

  • arXiv

Sources