FT-Dojo: Benchmark for Autonomous LLM Fine-Tuning

ai-technology · 2026-05-22

Researchers have introduced FT-Dojo, an interactive benchmark environment for the autonomous fine-tuning of large language models, comprising 13 tasks across 5 domains. It standardizes a task interface, a shared raw-data repository, a sandboxed execution environment, a structured feedback protocol, and a held-out evaluation procedure. The team also presents FT-Agent, a fine-tuning-oriented autonomous framework that uses structured iteration planning, fail-fast validation, and multi-level feedback analysis to refine data and training strategies. In the reported experiments, FT-Agent shows stable improvement over baselines.

Key facts

FT-Dojo is an interactive benchmark environment for autonomous LLM fine-tuning.
It comprises 13 tasks across 5 domains.
FT-Dojo standardizes a task interface, shared raw-data repository, sandboxed execution environment, structured feedback protocol, and held-out evaluation procedure.
FT-Agent is a fine-tuning-oriented autonomous framework.
FT-Agent uses structured iteration planning, fail-fast validation, and multi-level feedback analysis.
Experiments show FT-Agent provides stable improvement over baselines.
The work addresses the labor-intensive nature of fine-tuning LLMs for vertical domains.
According to the authors, end-to-end LLM fine-tuning had not previously been studied systematically as an interactive agent task.
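
To make the idea of fail-fast validation inside an iteration loop more concrete, here is a minimal Python sketch. It is not FT-Agent's implementation or FT-Dojo's interface: the names `Plan`, `Feedback`, `validate`, `run_iteration`, and `improve` are all hypothetical, and the checks and scoring callback are assumptions made for illustration. The sketch only shows the general pattern of rejecting a bad plan with cheap checks before paying for training, and of returning feedback tagged by the stage that produced it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Plan:
    """Hypothetical plan for one iteration: a data recipe plus training settings."""
    data_recipe: str
    learning_rate: float
    epochs: int


@dataclass
class Feedback:
    """Hypothetical structured feedback, tagged with the stage that produced it."""
    stage: str                      # "validation" or "training"
    ok: bool
    message: str
    score: Optional[float] = None   # set only when training completed


def validate(plan: Plan) -> List[str]:
    """Cheap checks that run before any expensive training (the fail-fast step)."""
    errors = []
    if not plan.data_recipe:
        errors.append("data_recipe is empty")
    if not (0 < plan.learning_rate < 1):
        errors.append("learning_rate must be in (0, 1)")
    if plan.epochs < 1:
        errors.append("epochs must be >= 1")
    return errors


def run_iteration(plan: Plan, train_and_score: Callable[[Plan], float]) -> Feedback:
    """Validate first; call the (assumed expensive) training callback only if checks pass."""
    errors = validate(plan)
    if errors:
        return Feedback("validation", False, "; ".join(errors))
    score = train_and_score(plan)
    return Feedback("training", True, "completed", score)


def improve(
    plans: List[Plan], train_and_score: Callable[[Plan], float]
) -> Tuple[Optional[Tuple[Plan, float]], List[Feedback]]:
    """Run each candidate plan, keep the best-scoring one, and record all feedback."""
    best: Optional[Tuple[Plan, float]] = None
    history: List[Feedback] = []
    for plan in plans:
        feedback = run_iteration(plan, train_and_score)
        history.append(feedback)
        if feedback.ok and (best is None or feedback.score > best[1]):
            best = (plan, feedback.score)
    return best, history
```

In this sketch the training callback stands in for whatever a sandboxed run would do. A real agent would presumably derive the next plan from the accumulated feedback instead of iterating over a fixed list, but that planning step is outside what this summary describes.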

Entities

—

Sources

arXiv cs.AI — 2026-05-21