MLLMs Fail Vision-to-Code: 'Mirage' Defect in Circuit Translation
A recent investigation uncovers a significant vulnerability in multimodal large language models (MLLMs) that translate circuit diagrams into register-transfer-level (RTL) code. The researchers identify a phenomenon they term 'Mirage': when the circuit diagram is replaced with a blank image, the model's Pass@k score stays the same or even improves. The models are effectively ignoring the visual input, relying instead on identifier semantics in the module header (module and port names) to retrieve standard RTL templates. Because this failure is silent, it undermines trust in AI-assisted code generation for safety-critical hardware. The study, posted as arXiv:2604.27969v1, frames circuit-to-Verilog translation as an extreme reliability test for vision-to-code generation, since circuit diagrams encode timing, topology, and bit-level semantics that must be preserved exactly for silicon fabrication.
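For context on the metric at the center of the finding: Pass@k is the probability that at least one of k sampled generations passes the test bench. The standard unbiased estimator (from Chen et al.'s HumanEval paper; the Mirage study may use a variant) can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    pass, is correct. If fewer than k generations fail, some passing
    sample is guaranteed to be drawn."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# E.g. 3 of 10 sampled RTL candidates pass the test bench:
print(pass_at_k(10, 3, 1))  # 0.3
print(pass_at_k(10, 3, 5))  # ~0.917 -- any of 5 tries may pass
```

The Mirage test compares this score across two conditions (real diagram vs. blank image); an unchanged or higher blank-image score is the reported defect.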
Key facts
- MLLMs are used to translate visual artifacts into code, including circuit diagrams to RTL code.
- The study reveals the 'Mirage' phenomenon: blank images yield unchanged or higher Pass@k scores.
- Models bypass visual input and exploit identifier semantics in module headers.
- This defect is covert and undermines trustworthiness in AI-assisted code generation.
- Circuit diagrams encode timing, topology, and bit-level semantics critical for hardware safety.
- The research is published as arXiv:2604.27969v1.
- Circuit-to-Verilog translation is an extreme reliability test for vision-to-code generation.
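The blank-image ablation described above can be sketched as a simple probe. The model wrapper `gen` below is a hypothetical interface (not from the paper): it takes image bytes and a module header and returns per-sample pass/fail results from a test bench.

```python
from math import comb
from typing import Callable, List

def pass_at_k(n: int, c: int, k: int) -> float:
    # Standard unbiased Pass@k estimator (Chen et al., 2021).
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mirage_gap(gen: Callable[[bytes, str], List[bool]],
               diagram: bytes, blank: bytes, header: str, k: int) -> float:
    """Pass@k(real diagram) minus Pass@k(blank image). A gap at or
    below zero is the 'Mirage' signature: the image is being ignored."""
    real = gen(diagram, header)
    blind = gen(blank, header)
    return (pass_at_k(len(real), sum(real), k)
            - pass_at_k(len(blind), sum(blind), k))

# Toy stand-in for an MLLM that keys only on the header text,
# illustrating the defect: pixels never influence the output.
def header_only_model(image: bytes, header: str) -> List[bool]:
    correct = "adder" in header  # template lookup by identifier
    return [correct] * 10

gap = mirage_gap(header_only_model, b"<diagram>", b"<blank>",
                 "module adder(input [3:0] a, b, output [4:0] s);", k=5)
print(gap)  # 0.0 -- blank image scores identically
```

A vision-faithful model should show a clearly positive gap; the study's finding is that real MLLMs often do not.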
Entities
Institutions
- arXiv (preprint repository hosting the study; author affiliations are not stated in this summary)