Frontier model cuts LLM costs by routing 80% of failures to cheap agent
Mendral reduced its LLM costs with a model hierarchy: a cheap Haiku agent triages CI failures and routes only 20% of them to the expensive Opus model. Haiku detects duplicates via exact matching and semantic search, and a triager match costs 25x less than a full investigation. Opus plans investigations and spawns Haiku sub-agents for focused tasks, never reading raw logs itself. Agents reach logs through a SQL interface to ClickHouse, which avoids prompt bloat. In a case study, Opus diagnosed a pnpm install failure (missing 'make') by having sub-agents query trend data and git history. Haiku handles about 65% of input tokens but only about 36% of spend; without the hierarchy, the daily bill would double. The pattern applies to other high-volume event streams such as security logs or IoT telemetry.
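The triage step described above (exact match first, then semantic search, otherwise escalate) can be sketched roughly as follows. This is a minimal illustration, not Mendral's implementation: the names `KnownFailure`, `fingerprint`, and `triage`, the 0.9 similarity threshold, and the in-memory cosine search (standing in for pgvector) are all assumptions, and the embeddings are taken as precomputed inputs.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class KnownFailure:
    """A previously investigated failure (hypothetical record shape)."""
    issue_id: str
    signature: str
    embedding: list[float]


def fingerprint(signature: str) -> str:
    # Normalize case and whitespace so trivially different logs hash the same.
    normalized = " ".join(signature.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def triage(signature, embedding, known, threshold=0.9):
    """Return (decision, issue_id); only 'escalate' would reach the expensive model."""
    # Step 1: exact match on the normalized signature.
    fp = fingerprint(signature)
    for failure in known:
        if fingerprint(failure.signature) == fp:
            return ("duplicate_exact", failure.issue_id)

    # Step 2: semantic match. The threshold is an assumed value, not from the source.
    best = max(
        known,
        key=lambda f: cosine_similarity(embedding, f.embedding),
        default=None,
    )
    if best is not None and cosine_similarity(embedding, best.embedding) >= threshold:
        return ("duplicate_semantic", best.issue_id)

    # Step 3: nothing matched, so hand off for a full investigation.
    return ("escalate", None)
```

In a real deployment the linear scan in step 2 would presumably be replaced by a nearest-neighbor query against the vector store; the control flow is the part that matters here.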
Key facts
- Opus 4.6 costs less than Sonnet 4.0 due to model hierarchy
- 80% of CI failures never reach Opus; a triager match costs 25x less than a full investigation
- Haiku agent uses exact matching and semantic search (pgvector) for duplicate detection
- Agents access logs via a SQL interface to ClickHouse instead of having logs pushed into their prompts
- Opus spawns Haiku sub-agents for digging; sub-agents capped at one level deep
- Case study: Opus diagnosed 'gyp ERR! not found: make' via sub-agents querying git log and ClickHouse
- Haiku handles ~65% of input tokens but ~36% of LLM spend
- Pattern generalizes to security logs, IoT telemetry, financial data
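As a rough consistency check on the token and spend figures above, the two shares can be turned into an implied price ratio and an estimate of the bill without the hierarchy. This is a back-of-the-envelope sketch under strong assumptions that the source does not state: only two models are involved, spend is proportional to input tokens alone, and the same number of tokens would be processed if everything ran on the expensive model. The function names are illustrative.

```python
def implied_price_ratio(cheap_token_share: float, cheap_spend_share: float) -> float:
    """Expensive-to-cheap per-token price ratio implied by the two shares.

    With the cheap price normalized to 1 and the expensive price r:
        cheap_spend / expensive_spend = cheap_tokens / (expensive_tokens * r)
    """
    expensive_token_share = 1.0 - cheap_token_share
    expensive_spend_share = 1.0 - cheap_spend_share
    return (cheap_token_share * expensive_spend_share) / (
        expensive_token_share * cheap_spend_share
    )


def bill_multiplier_without_hierarchy(cheap_token_share: float, ratio: float) -> float:
    """How much larger the bill gets if every token is billed at the expensive price."""
    current_bill = cheap_token_share * 1.0 + (1.0 - cheap_token_share) * ratio
    all_expensive_bill = ratio
    return all_expensive_bill / current_bill


ratio = implied_price_ratio(cheap_token_share=0.65, cheap_spend_share=0.36)
multiplier = bill_multiplier_without_hierarchy(cheap_token_share=0.65, ratio=ratio)
print(f"implied price ratio: {ratio:.2f}x")       # about 3.30x
print(f"bill without hierarchy: {multiplier:.2f}x")  # about 1.83x
```

Under these assumptions the bill would grow by roughly 1.8x, which is in line with the claim that the daily bill would approximately double.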
Entities
Institutions
- Mendral
- Hacker News
- ClickHouse
- GitHub CLI