Qwen3.6-35B-A3B Outperforms Claude Opus 4.7 in AI-Generated Pelican Illustration Test
Simon Willison's comparative benchmark found that Alibaba's Qwen3.6-35B-A3B model generated better SVG illustrations of a pelican riding a bicycle than Anthropic's Claude Opus 4.7. The Qwen model, running as a 20.9GB quantized version on a MacBook Pro M5 via LM Studio, produced more accurate bicycle frames and annotated its SVG with clever comments, such as one about sunglasses on a flamingo. Willison conducted the tests on April 16, 2026, using both his established pelican benchmark and a secret backup test: a flamingo riding a unicycle. While acknowledging that the benchmark is absurd, he noted that pelican quality has historically correlated with model usefulness, though that correlation broke with these latest results. Despite Qwen's win on this specific task, Willison doubted that the quantized model surpasses Anthropic's proprietary release in overall power or utility.
Key facts
- Simon Willison published benchmark results comparing AI models on April 16, 2026
- Alibaba's Qwen3.6-35B-A3B outperformed Anthropic's Claude Opus 4.7 in generating SVG illustrations
- The test involved creating images of a pelican riding a bicycle
- Qwen ran as a 20.9GB quantized model on a MacBook Pro M5 using LM Studio
- A secondary test involved generating "a flamingo riding a unicycle"
- Willison noted Qwen produced better bicycle frames and included clever SVG comments
- The pelican benchmark has historically correlated with model usefulness, though Willison noted that the correlation broke with these results
- Willison expressed doubt that Qwen is more powerful overall than Claude Opus 4.7
Entities
People
- Simon Willison
Institutions
- Alibaba
- Anthropic
- Unsloth
- LM Studio