Qwen3.6-35B-A3B Outperforms Claude Opus 4.7 in AI-Generated Pelican Illustration Test

ai-technology · 2026-04-17

Simon Willison's informal benchmark found that Alibaba's Qwen3.6-35B-A3B generated better SVG illustrations of a pelican riding a bicycle than Anthropic's Claude Opus 4.7. The Qwen model, running as a 20.9GB quantized version on a MacBook Pro M5 via LM Studio, produced more accurate bicycle frames and added clever SVG comments, such as one about sunglasses on the flamingo. Willison ran the tests on April 16, 2026, using both his established pelican benchmark and a secret backup test, a flamingo riding a unicycle. While acknowledging that the benchmark is absurd, he noted that pelican quality has historically correlated with model usefulness, a link that these latest results break. Despite Qwen's win on this specific task, Willison doubts that the quantized model surpasses Anthropic's proprietary release in overall power or usefulness.

Key facts

On April 16, 2026, Simon Willison published benchmark results comparing AI models
Alibaba's Qwen3.6-35B-A3B outperformed Anthropic's Claude Opus 4.7 in generating SVG illustrations
The test involved creating images of a pelican riding a bicycle
Qwen ran as a 20.9GB quantized model on a MacBook Pro M5 using LM Studio
A secondary test involved generating "a flamingo riding a unicycle"
Willison noted Qwen produced better bicycle frames and included clever SVG comments
Pelican quality has historically correlated with model usefulness, but Willison noted that the link broke with these results
Willison expressed doubt that Qwen is more powerful overall than Claude Opus 4.7

Entities

People

Simon Willison

Institutions

Alibaba
Anthropic
Unsloth
LM Studio

Sources

Simon Willison — 2026-04-16