Terminus-4B: Small Model Matches Frontier LLMs in Agentic Execution

ai-technology · 2026-05-07

A new paper on arXiv (2605.03195) introduces Terminus-4B, a finetuned small language model that rivals frontier models at agentic terminal execution. Modern coding agents often delegate specialized subtasks to subagents: smaller loops with narrow responsibilities such as search, debugging, or terminal execution. Isolating the verbose output of these subtasks in a subagent keeps the main agent's context clean. Frontier models are typically used as these subagents. The researchers instead post-trained Qwen3-4B with supervised finetuning and reinforcement learning, using a rubric-based LLM-as-judge reward. Their evaluation spans several frontier models, training ablations, and main agent configurations, and shows that the 4B-parameter model can achieve comparable performance.
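
The delegation pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Terminus-4B's harness: the names `terminal_subagent`, `MainAgent`, and `fake_run` are hypothetical, the subagent runs a fixed command list instead of letting a model choose commands, and a stub replaces a real shell. It shows only the context-isolation idea: a 500-line test log stays inside the subagent, and a one-line summary is all that reaches the main agent's context.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical command runner: takes a shell command, returns (exit_code, output).
Runner = Callable[[str], Tuple[int, str]]


@dataclass
class SubagentResult:
    success: bool
    summary: str  # the only thing handed back to the main agent


def terminal_subagent(task: str, commands: List[str], run: Runner,
                      tail_lines: int = 3) -> SubagentResult:
    """Run commands in order; keep verbose output local, return a short summary."""
    for cmd in commands:
        code, output = run(cmd)
        if code != 0:
            # Report only the last few lines of the failing command's output.
            tail = output.strip().splitlines()[-tail_lines:]
            return SubagentResult(
                False, f"{task}: '{cmd}' failed (exit {code}): " + " | ".join(tail))
    return SubagentResult(True, f"{task}: {len(commands)} command(s) succeeded")


@dataclass
class MainAgent:
    context: List[str] = field(default_factory=list)

    def delegate(self, task: str, commands: List[str], run: Runner) -> SubagentResult:
        result = terminal_subagent(task, commands, run)
        self.context.append(result.summary)  # summary enters main context; raw logs do not
        return result


def fake_run(cmd: str) -> Tuple[int, str]:
    """Hypothetical stand-in for a real shell so the sketch runs anywhere."""
    if cmd == "pytest":
        log = "\n".join(f"test_{i} PASSED" for i in range(500))
        return 1, log + "\ntest_500 FAILED\nAssertionError: 1 != 2"
    return 0, f"ok: {cmd}"


agent = MainAgent()
agent.delegate("run tests", ["pip install -e .", "pytest"], fake_run)
print(agent.context[0])
# prints: run tests: 'pytest' failed (exit 1): test_499 PASSED | test_500 FAILED | AssertionError: 1 != 2
```

Injecting the runner keeps the sketch testable; a real subagent would presumably call a model in a loop to pick each command and write the summary itself.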

Key facts

Terminus-4B is a post-trained Qwen3-4B model
Uses supervised finetuning (SFT) and reinforcement learning (RL)
RL reward is rubric-based LLM-as-judge
Task: agentic terminal execution
Compared against frontier models
Evaluation includes training ablations and main agent configurations
Published on arXiv with ID 2605.03195
Modern coding agents use subagents for specialized subtasks
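
The rubric-based LLM-as-judge reward listed above can be illustrated with a small aggregation function. This summary does not say how the paper's rubric is structured or scored, so everything here is an assumption: the rubric items, their weights, the weighted-average aggregation, and `keyword_judge`, a hypothetical stand-in for an LLM that would normally be prompted with the task, the subagent's trajectory, and one rubric criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RubricItem:
    name: str
    description: str
    weight: float


# A judge scores one rubric item for one trajectory, returning a value in [0, 1].
# In a real setup this would wrap an LLM call; here it is any callable.
Judge = Callable[[str, str, RubricItem], float]


def rubric_reward(task: str, trajectory: str, rubric: List[RubricItem], judge: Judge) -> float:
    """Weighted average of per-item judge scores, normalized to [0, 1]."""
    total_weight = sum(item.weight for item in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    score = 0.0
    for item in rubric:
        s = min(max(judge(task, trajectory, item), 0.0), 1.0)  # clamp judge output
        score += item.weight * s
    return score / total_weight


# Hypothetical rubric for a terminal-execution subagent.
RUBRIC = [
    RubricItem("completed", "The requested command sequence succeeded.", 2.0),
    RubricItem("concise_report", "The report to the main agent is short.", 1.0),
    RubricItem("safe_commands", "No destructive commands were issued.", 1.0),
]


def keyword_judge(task: str, trajectory: str, item: RubricItem) -> float:
    """Hypothetical stand-in for an LLM judge, using simple string checks."""
    checks = {
        "completed": "exit 0" in trajectory,
        "concise_report": len(trajectory.splitlines()) <= 5,
        "safe_commands": "rm -rf" not in trajectory,
    }
    return 1.0 if checks.get(item.name, False) else 0.0


good = "$ pytest\nexit 0\nREPORT: all tests pass"
bad = "$ rm -rf build\nexit 1"
print(rubric_reward("run tests", good, RUBRIC, keyword_judge))  # 1.0
print(rubric_reward("run tests", bad, RUBRIC, keyword_judge))   # 0.25
```

A weighted average gives partial credit, producing a graded reward in [0, 1] rather than a binary pass/fail signal; whether the paper aggregates rubric scores this way is not stated in this summary.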
