Reinforcement Learning Achieves Expert-Level Chip Placement via Reward Learning
A new reinforcement learning (RL) framework for chip placement achieves expert-level layouts by learning from expert designs rather than optimizing wirelength alone. The researchers identify reward design as the main reason RL falls short of human experts. Their method infers step-by-step expert trajectories from final layouts and uses them as demonstrations or preferences to train a reward model that captures the experts' implicit rewards. Experiments show that the framework learns efficiently from as little as a single design and generalizes well to unseen cases. The work targets chip placement, a critical step in physical design where prior RL methods often failed to match expert quality.
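For context on the wirelength objective mentioned above, the sketch below computes half-perimeter wirelength (HPWL), a standard proxy for wirelength in placement. The summary does not say which wirelength metric prior RL methods use, so HPWL is an assumption here, and the names `hpwl`, `positions`, and `nets` are illustrative only.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(positions: Dict[str, Point], nets: List[List[str]]) -> float:
    """Sum, over all nets, of the half-perimeter of each net's bounding box."""
    total = 0.0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Toy example: three cells and two nets (hypothetical data).
positions = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0)}
nets = [["a", "b"], ["a", "b", "c"]]
total_wirelength = hpwl(positions, nets)  # 7.0 + 7.0 = 14.0
```

A reward built only from a quantity like this scores a layout on wire cost and ignores whatever other considerations an expert applied, which is the gap the summarized work attributes to reward design.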
Key facts
- Chip placement is a critical step in physical design.
- Existing RL-based methods focus on wirelength optimization and often fail to achieve expert-quality layouts.
- Reward design is identified as the primary cause of the performance gap with experts.
- The new approach learns directly from expert layouts to derive a reward model.
- The method infers step-by-step expert trajectories from final expert layouts.
- The trajectories are used as demonstrations or preferences to train a reward model that captures the experts' implicit rewards.
- The framework can learn efficiently from even a single design.
- The framework generalizes well to unseen cases.
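The pipeline described in the key facts (final layout, then inferred trajectory, then learned reward) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the summary does not say how trajectories are inferred or which loss is used. Here the placement order is assumed to be largest macro first, the two step features are hypothetical, and the reward is a linear model fit with a Bradley-Terry preference loss that prefers the expert trajectory over randomly generated ones.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Layout = Dict[str, Point]
# One step: (macros placed before this step, macro placed now, its position).
Step = Tuple[Tuple[str, ...], str, Point]


def infer_trajectory(layout: Layout, areas: Dict[str, float]) -> List[Step]:
    """Replay a finished layout as sequential placements.

    ASSUMPTION: macros are placed largest-first (ties broken by name).
    """
    order = sorted(layout, key=lambda m: (-areas[m], m))
    return [(tuple(order[:i]), m, layout[m]) for i, m in enumerate(order)]


def step_features(step: Step, layout: Layout) -> List[float]:
    """Hypothetical per-step features on a unit-square canvas."""
    placed, _, (x, y) = step
    edge_dist = min(x, 1.0 - x, y, 1.0 - y)
    nearest = min((math.dist((x, y), layout[p]) for p in placed), default=0.0)
    return [edge_dist, nearest]


def trajectory_features(layout: Layout, areas: Dict[str, float]) -> List[float]:
    """Sum step features, so a linear reward's return is w . features."""
    totals = [0.0, 0.0]
    for step in infer_trajectory(layout, areas):
        for i, value in enumerate(step_features(step, layout)):
            totals[i] += value
    return totals


def train_reward(expert: List[float], others: List[List[float]],
                 lr: float = 0.1, epochs: int = 100) -> List[float]:
    """Fit linear reward weights with a Bradley-Terry preference loss."""
    w = [0.0] * len(expert)
    for _ in range(epochs):
        for other in others:
            diff = [e - o for e, o in zip(expert, other)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            p_expert = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on log P(expert preferred over other).
            w = [wi + lr * (1.0 - p_expert) * di for wi, di in zip(w, diff)]
    return w


# Toy data (hypothetical): one expert layout with every macro on the canvas edge.
areas = {"cpu": 4.0, "sram": 2.0, "io": 1.0}
expert_layout = {"cpu": (0.0, 0.5), "sram": (0.5, 0.0), "io": (1.0, 0.5)}
rng = random.Random(0)
random_layouts = [{m: (rng.random(), rng.random()) for m in areas} for _ in range(5)]

expert_f = trajectory_features(expert_layout, areas)
other_f = [trajectory_features(layout, areas) for layout in random_layouts]
weights = train_reward(expert_f, other_f)
```

In this toy setup the expert keeps every macro on the canvas edge, so the learned weight on edge distance ends up negative: the model recovers a preference that was never written down as an explicit objective. A real system would replace the linear model and hand-picked features with a learned network over the placement state.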
Entities
Institutions
- arXiv