LLMs Show Promise in Code Analysis Across Multiple Languages
A recent study on arXiv evaluates 21 state-of-the-art large language models (LLMs) on core code analysis capabilities across four programming languages: C, Java, Python, and Solidity. The study covers syntax parsing, static semantics inference, and dynamic reasoning, testing the models on nine tasks that include generating abstract syntax trees (ASTs), constructing control flow graphs (CFGs), and conducting taint analysis. Using a three-layer evaluation protocol (automated metrics, expert assessments, and consistency checks), the researchers analyzed 3,124 code samples. Results indicate that LLMs can perform zero-shot code analysis, but their performance varies by task and language. The study points to the potential for LLMs to complement or even replace traditional language-specific analysis tools in software engineering, particularly for debugging and security assessments.
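To make the AST generation task concrete, here is a minimal sketch of how a traditional, language-specific tool produces a reference answer that a model's output could be compared against. It uses Python's standard `ast` module. The sample function, the helper names `node_type_counts` and `type_recall`, and the scoring rule are all assumptions for illustration; the summary does not describe the study's actual tooling or metrics.

```python
import ast
from collections import Counter

# Hypothetical sample; not taken from the study's dataset.
SAMPLE = '''
def add(a, b):
    return a + b
'''


def node_type_counts(source: str) -> Counter:
    """Parse Python source and count AST node types (the reference answer)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def type_recall(reference: Counter, candidate: Counter) -> float:
    """Multiset overlap of node-type counts, divided by the reference total.

    Illustrative score only; the study's automated metrics may differ.
    """
    total = sum(reference.values())
    if total == 0:
        return 1.0
    matched = sum((reference & candidate).values())
    return matched / total


reference = node_type_counts(SAMPLE)

# Pretend a model reported these node types but missed one parameter.
llm_answer = Counter(reference)
llm_answer["arg"] -= 1

print(sorted(reference.items()))
print(f"recall = {type_recall(reference, llm_answer):.2f}")
```

Counting node types ignores tree shape, so this is a deliberately coarse check; a stricter comparison would match the trees structurally.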
Key facts
- 21 state-of-the-art LLMs evaluated
- Nine code analysis tasks tested
- Four languages: C, Java, Python, Solidity
- 3,124 code samples analyzed
- Three-layer evaluation protocol used
- Tasks include AST generation, CFG construction, data dependency analysis, taint analysis, and flaky test reasoning
- Study structured around syntax parsing, static semantics inference, and dynamic reasoning
- arXiv:2305.12138v5
Entities
Institutions
- arXiv