DockSmith: AI Agent for Reliable Docker Environment Building
DockSmith is an agentic Docker builder that targets a bottleneck in training and evaluating software engineering agents: constructing reliable environments. It treats environment construction as a core agentic skill that exercises long-horizon tool use, dependency reasoning, and failure recovery. The model is trained on large-scale Docker-building trajectories produced by a SWE-Factory-style pipeline augmented with loop detection and a cross-task success memory. A 30B-A3B model trained on these trajectories achieves leading results on Multi-Docker-Eval, with a 39.72% Fail-to-Pass rate and a 58.28% Commit Rate. DockSmith also improves performance in out-of-distribution scenarios.
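The summary names loop detection and a cross-task success memory but does not describe how either works. The sketch below is one plausible reading, not DockSmith's implementation: `LoopDetector`, `SuccessMemory`, the `max_repeats` threshold, and the tuple-based task signature are all hypothetical names and choices introduced for illustration.

```python
from collections import Counter, defaultdict


class LoopDetector:
    """Hypothetical helper: flags an agent that keeps repeating the same failing step."""

    def __init__(self, max_repeats=3):
        # Assumed threshold; the source does not give one.
        self.max_repeats = max_repeats
        self.seen = Counter()

    def record(self, action, error):
        """Record one failed step; return True once it has repeated max_repeats times."""
        key = (action.strip(), error.strip())
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats


class SuccessMemory:
    """Hypothetical helper: stores Dockerfiles that built successfully, keyed by task signature."""

    def __init__(self):
        self.recipes = defaultdict(list)

    def add(self, signature, dockerfile):
        self.recipes[signature].append(dockerfile)

    def lookup(self, signature):
        # Return a copy so callers cannot mutate the stored list.
        return list(self.recipes.get(signature, []))
```

In this reading, a builder loop would call `record` after each failed command and change strategy when it returns `True`, and would call `lookup` at the start of a new task to seed the prompt with recipes that worked on similar repositories.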
Key facts
- DockSmith is an agentic Docker builder for software engineering agents.
- It addresses the bottleneck of reliable Docker environment construction.
- Training uses Docker-building trajectories from a SWE-Factory-style pipeline with loop detection and cross-task success memory.
- A 30B-A3B model achieves 39.72% Fail-to-Pass and 58.28% Commit Rate on Multi-Docker-Eval.
- DockSmith improves out-of-distribution performance.
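The summary reports a Fail-to-Pass rate without defining it. The sketch below assumes the common SWE-bench-style meaning: a task counts when the environment builds, the target tests fail before the reference patch, and they pass after it. `TaskResult` and `fail_to_pass_rate` are hypothetical names, and Multi-Docker-Eval may compute the metric differently.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task outcome record."""
    built: bool               # the Docker image built successfully
    fails_before_patch: bool  # target tests fail on the unpatched code
    passes_after_patch: bool  # target tests pass once the patch is applied


def fail_to_pass_rate(results):
    """Percentage of tasks meeting the assumed fail-to-pass criterion."""
    if not results:
        return 0.0
    hits = sum(
        1 for r in results
        if r.built and r.fails_before_patch and r.passes_after_patch
    )
    return 100.0 * hits / len(results)
```

Under this assumption, a task whose tests already pass before the patch does not count, since the environment then gives no evidence that the patch changed behavior.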
Entities
Institutions
- arXiv