Frontier model cuts LLM costs by routing 80% of failures to cheap agent

other · 2026-04-29

Mendral reduced LLM costs by using a model hierarchy: a cheap Haiku agent triages CI failures, routing only 20% to expensive Opus. Haiku detects duplicates via exact and semantic search, costing 25x less than full investigation. Opus plans investigations and spawns Haiku sub-agents for focused tasks, never reading raw logs. The system uses SQL interface to ClickHouse for log access, avoiding prompt bloat. A case study shows Opus diagnosing a pnpm install failure (missing 'make') by querying trend data and git history via sub-agents. Haiku handles 65% of tokens but only 36% of spend; without hierarchy, daily bill doubles. The pattern applies to high-volume event streams like security logs or IoT telemetry.

Key facts

Opus 4.6 costs less than Sonnet 4.0 due to model hierarchy
80% of CI failures never reach Opus; triager match costs 25x less
Haiku agent uses exact matching and semantic search (pgvector) for duplicate detection
Agents access logs via SQL interface to ClickHouse, not pushed prompts
Opus spawns Haiku sub-agents for digging; sub-agents capped at one level deep
Case study: Opus diagnosed 'gyp ERR! not found: make' via sub-agents querying git log and ClickHouse
Haiku handles ~65% of input tokens but ~36% of LLM spend
Pattern generalizes to security logs, IoT telemetry, financial data

Frontier model cuts LLM costs by routing 80% of failures to cheap agent

Key facts

Entities

Institutions

Sources