SweRank: Efficient Software Issue Localization via Code Ranking

other · 2026-04-24

SweRank is a retrieve-and-rerank framework for software issue localization: the task of pinpointing the code locations (files, classes, or functions) that correspond to a natural language description such as a bug report or feature request. LLM-based agentic methods show promise on this task, but their dependence on multi-step reasoning and closed-source LLMs makes them slow and expensive. Conventional code ranking models, which are built for query-to-code or code-to-code retrieval, struggle with the verbose, failure-oriented queries typical of issue localization. SweRank is designed to improve both efficiency and effectiveness. To train it, the researchers constructed SweLoc, a large-scale dataset drawn from public GitHub repositories that pairs real-world issue descriptions with the relevant code locations. The work is presented in arXiv paper 2505.07849.
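To make the retrieve-and-rerank idea concrete, here is a minimal Python sketch of a generic two-stage pipeline. It is not SweRank's implementation: both scorers are toy token-overlap functions standing in for learned models, and every name in it (CodeUnit, retrieve, rerank, localize, the sample units) is hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeUnit:
    """A candidate code location (hypothetical schema, not taken from the paper)."""
    path: str
    name: str
    source: str


def tokenize(text: str) -> list[str]:
    # Lowercased alphabetic runs, so snake_case names split on underscores.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(issue: str, units: list[CodeUnit], k: int) -> list[CodeUnit]:
    """Stage 1: score every candidate cheaply and keep the top k."""
    query = Counter(tokenize(issue))
    ranked = sorted(
        units,
        key=lambda u: cosine(query, Counter(tokenize(u.source))),
        reverse=True,
    )
    return ranked[:k]


def rerank(issue: str, candidates: list[CodeUnit]) -> list[CodeUnit]:
    """Stage 2: rescore only the shortlist (toy stand-in for a stronger model)."""
    query = set(tokenize(issue))

    def score(unit: CodeUnit) -> float:
        # Fraction of issue tokens found in the unit's name or source.
        unit_tokens = set(tokenize(unit.name + " " + unit.source))
        return len(query & unit_tokens) / len(query) if query else 0.0

    return sorted(candidates, key=score, reverse=True)


def localize(issue: str, units: list[CodeUnit], k: int = 2) -> list[CodeUnit]:
    # Retrieve a shortlist first, then rerank only that shortlist.
    return rerank(issue, retrieve(issue, units, k))


# Made-up sample data for illustration only.
units = [
    CodeUnit("cache.py", "evict_expired",
             "def evict_expired(cache, now): remove entries whose ttl has expired"),
    CodeUnit("http.py", "parse_headers",
             "def parse_headers(raw): split raw header lines into a dict"),
    CodeUnit("cache.py", "get",
             "def get(cache, key): return cache entry for key"),
]
issue = "Expired cache entries are never removed, ttl is ignored"
ranked = localize(issue, units, k=2)
print([u.name for u in ranked])
```

The point of the split is cost: the first stage touches every candidate and so must be cheap, while the second stage can afford a more expensive scorer because it only sees the k shortlisted candidates.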

Key facts

SweRank is a retrieve-and-rerank framework for software issue localization.
It addresses limitations of LLM-based agents (latency, cost) and traditional code ranking models.
SweLoc is a large-scale dataset constructed from public GitHub repositories.
The dataset pairs real-world issue descriptions with code locations.
The paper is available on arXiv with ID 2505.07849.
Issue localization identifies files, classes, or functions relevant to a description.
LLM-based approaches often use closed-source models and complex reasoning.
Traditional code ranking models are not optimized for verbose issue descriptions.
