LLMs Unreliable for Autonomous Smart Contract Security Auditing
A recent investigation published on arXiv examines whether Large Language Models (LLMs) can replace conventional static-analysis tools for detecting vulnerabilities in smart contracts. The findings indicate that LLMs exhibit lexical bias and fail to rigorously validate external data, resulting in elevated false-positive rates. Prompting techniques expose a trade-off between precision and recall. The researchers built a custom automated framework that reached 92% accuracy in classifying the model's outputs. The study concludes that while LLMs are not ready to act as autonomous security auditors, they could augment existing tools.
Key facts
- Study assesses LLMs as replacements for, or complements to, static-analysis tools in smart contract vulnerability detection.
- LLM efficacy undermined by lexical bias and lack of rigorous validation of external data inputs.
- Reliance on non-semantic heuristics such as identifier naming drives high false-positive rates (see the first sketch after this list).
- Prompting techniques reveal a trade-off between precision and recall (see the second sketch after this list).
- Custom automated framework achieves 92% accuracy in classifying model outputs (see the third sketch after this list).
- Paper available on arXiv under Computer Science > Cryptography and Security.
- The irreversibility of blockchain transactions makes catching vulnerabilities before deployment essential.
- LLMs are increasingly integrated into developer workflows, but their reliability as autonomous auditors remains unproven.
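
To make the lexical-bias finding concrete, here is a minimal sketch, with assumed code fragments rather than anything from the paper, of how a purely name-based heuristic flags one of two semantically identical functions:

```python
# Minimal sketch (hypothetical code fragments, not the study's data) of a
# non-semantic, name-based heuristic producing a false positive.

# Fragment whose identifiers sound dangerous but whose logic is benign.
scary_names = """
function unsafeRawTransfer(address payable dest, uint256 amount) external {
    require(msg.sender == owner, "only owner");
    dest.transfer(amount);
}
"""

# The same logic with innocuous identifiers.
benign_names = """
function payout(address payable dest, uint256 amount) external {
    require(msg.sender == owner, "only owner");
    dest.transfer(amount);
}
"""

# Tokens a lexical heuristic might treat as red flags.
SUSPICIOUS_TOKENS = ("unsafe", "raw", "delegatecall", "selfdestruct")

def lexical_flag(source: str) -> bool:
    """Flag code whenever a 'suspicious' token appears in the text.

    This mimics the failure mode the study describes: the check
    inspects words, not program behavior.
    """
    lowered = source.lower()
    return any(tok in lowered for tok in SUSPICIOUS_TOKENS)

# Identical semantics, opposite verdicts: a false positive driven
# entirely by identifier naming.
print(lexical_flag(scary_names))   # True  (flagged on the name alone)
print(lexical_flag(benign_names))  # False (same logic, no flag)
```

A semantic tool would reason about state mutations and call behavior rather than the surface vocabulary, which is why name-driven flags count as false positives.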
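The precision/recall trade-off can be illustrated with hypothetical counts (the paper's actual figures are not reproduced here): a conservative prompt raises fewer false alarms but misses more bugs, while a permissive prompt does the opposite:

```python
# Worked example with invented counts: precision vs. recall under two
# prompting styles, over a hypothetical set of 50 truly vulnerable contracts.

def precision(tp: int, fp: int) -> float:
    """Fraction of reported findings that are real: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of real bugs that get reported: TP / (TP + FN)."""
    return tp / (tp + fn)

# "Report only if certain" vs. "report anything possibly unsafe".
conservative = {"tp": 20, "fp": 5, "fn": 30}
permissive = {"tp": 45, "fp": 90, "fn": 5}

for name, c in (("conservative", conservative), ("permissive", permissive)):
    print(f"{name}: precision={precision(c['tp'], c['fp']):.2f} "
          f"recall={recall(c['tp'], c['fn']):.2f}")
# conservative: precision=0.80 recall=0.40
# permissive: precision=0.33 recall=0.90
```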
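The summary does not describe the classification framework's internals, so the following is only a speculative sketch of the general idea: parse each free-text model verdict into a yes/no claim, bucket it against ground truth, and measure agreement with manual labels. All names, parsing rules, and data below are illustrative assumptions:

```python
# Speculative sketch only: the paper's actual framework is not detailed
# in this summary. Illustrates classifying LLM verdicts as TP/FP/TN/FN
# and scoring agreement with manually assigned labels.
import re

def parse_verdict(llm_output: str) -> bool:
    """Heuristically decide whether the model claims a vulnerability."""
    text = llm_output.lower()
    negated = re.search(r"\b(no|not|none)\b.*\bvulnerab", text)
    return "vulnerab" in text and not negated

def classify(llm_output: str, truly_vulnerable: bool) -> str:
    """Bucket one model output against the ground-truth label."""
    claimed = parse_verdict(llm_output)
    if claimed:
        return "TP" if truly_vulnerable else "FP"
    return "FN" if truly_vulnerable else "TN"

# Agreement with hand-assigned labels is the quantity the paper's 92%
# figure refers to; the samples here are toy data invented for the sketch.
samples = [
    ("Reentrancy vulnerability in withdraw().", True, "TP"),
    ("No vulnerability found in this contract.", False, "TN"),
    ("The function name suggests a vulnerability.", False, "FP"),
]
agree = sum(classify(out, gt) == label for out, gt, label in samples)
print(f"agreement with manual labels: {agree / len(samples):.0%}")
```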
Entities
Institutions
- arXiv