ARTFEED — Contemporary Art Intelligence

Terminus-4B: Small Model Matches Frontier LLMs in Agentic Execution

ai-technology · 2026-05-07

A new paper on arXiv (2605.03195) introduces Terminus-4B, a finetuned small language model that rivals frontier models in agentic terminal execution. Modern coding agents often delegate specialized subtasks to subagents: smaller agent loops that handle a narrow responsibility such as search, debugging, or terminal execution. Delegation keeps verbose tool output out of the main agent's context. Today these subagents typically run on frontier models as well. The researchers instead post-trained Qwen3-4B with supervised finetuning followed by reinforcement learning, using a rubric-based LLM-as-judge reward. Their evaluation covers several frontier models, training ablations, and main agent configurations, and reports that the smaller model achieves comparable performance in the terminal-execution subagent role.
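
The delegation pattern described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `MainAgent`, `terminal_subagent`, `SubagentReport`, and `run_shell` are hypothetical, and simple truncation of the output stands in for the summary that a model-driven subagent would write. The point is only that the main agent's context receives a short report rather than the full terminal output.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical executor type: takes a command, returns (exit code, output).
Executor = Callable[[str], tuple[int, str]]


def run_shell(command: str) -> tuple[int, str]:
    """Run a shell command and return its exit code and combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


@dataclass
class SubagentReport:
    command: str
    exit_code: int
    summary: str


def terminal_subagent(
    command: str, execute: Executor = run_shell, max_chars: int = 200
) -> SubagentReport:
    """Run one command and return a short report.

    Assumption: a real subagent would loop with a language model and write
    its own summary. Here the last `max_chars` characters stand in for it.
    """
    exit_code, output = execute(command)
    summary = output[-max_chars:]
    return SubagentReport(command, exit_code, summary)


@dataclass
class MainAgent:
    # Only short reports are appended here, never raw terminal output.
    context: list[str] = field(default_factory=list)

    def delegate(self, command: str, execute: Executor = run_shell) -> SubagentReport:
        report = terminal_subagent(command, execute=execute)
        self.context.append(
            f"$ {command} -> exit {report.exit_code}: {report.summary}"
        )
        return report
```

Passing the executor as a parameter is a design choice for this sketch: it lets the isolation behavior be checked with a fake command runner instead of a real shell.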

Key facts

  • Terminus-4B is a post-trained Qwen3-4B model
  • Uses supervised finetuning (SFT) and reinforcement learning (RL)
  • RL reward is rubric-based LLM-as-judge
  • Task: agentic terminal execution
  • Compared against frontier models
  • Evaluation includes training ablations and main agent configurations
  • Published on arXiv with ID 2605.03195
  • Modern coding agents use subagents for specialized subtasks
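
A rubric-based LLM-as-judge reward, as named in the key facts above, can be sketched as follows. The summary does not describe the paper's rubric, judge prompt, or aggregation rule, so everything here is an assumption: `RubricItem`, `rubric_reward`, the weights, and the weighted-fraction scoring are hypothetical, and the `judge` callable is a placeholder for a call to a judge model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RubricItem:
    description: str  # criterion the judge checks, in natural language
    weight: float     # relative importance of the criterion


# Hypothetical judge signature: (trajectory, criterion) -> satisfied or not.
Judge = Callable[[str, str], bool]


def rubric_reward(trajectory: str, rubric: list[RubricItem], judge: Judge) -> float:
    """Return the weighted fraction of rubric items the judge marks satisfied.

    Assumption: the reward is a scalar in [0, 1]. The actual aggregation used
    in the paper is not stated in this summary.
    """
    total = sum(item.weight for item in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(
        item.weight for item in rubric if judge(trajectory, item.description)
    )
    return earned / total
```

In reinforcement learning, a scalar like this would be computed per sampled trajectory and passed to the policy update. In practice the `judge` callable would wrap a language model prompted with one criterion at a time.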

Entities

Institutions

  • arXiv
