Sliceformer: Dataflow-Aware LM for Static Program Slicing
A team of researchers has introduced Sliceformer, a method that recasts static program slicing as a sequence-to-sequence task for compact language models such as CodeT5+. The approach adds dataflow-aware pretraining objectives built on data flow graphs (DFGs): dataflow-preserving statement permutation and dataflow-aware span corruption, both designed to teach the model data dependencies. This targets two weaknesses of existing learning-based slicers: inaccurate dependency modeling and unconstrained generation, which yields slices containing hallucinated tokens and statements. The work is described in arXiv:2604.26961.
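The paper's exact procedure is not reproduced here; the sketch below shows one plausible form of dataflow-preserving statement permutation, assuming statements arrive pre-split and DFG edges are given as (definition, use) index pairs. The helper name `dataflow_preserving_permutation` and its interface are illustrative, not taken from the paper. The idea is to sample a random topological order of the statement-level DFG so every definition still precedes its uses.

```python
import random
from collections import defaultdict

def dataflow_preserving_permutation(statements, dfg_edges, seed=None):
    """Return a random reordering of `statements` that keeps every
    definition before its uses, i.e. a random topological order of the
    statement-level data-flow graph.

    `dfg_edges` is a list of (def_idx, use_idx) pairs between statements.
    """
    rng = random.Random(seed)
    succ = defaultdict(list)
    indeg = [0] * len(statements)
    for d, u in dfg_edges:
        succ[d].append(u)
        indeg[u] += 1

    ready = [i for i, k in enumerate(indeg) if k == 0]
    order = []
    while ready:
        # Random choice among currently dependency-free statements
        # makes the permutation non-deterministic but still valid.
        i = ready.pop(rng.randrange(len(ready)))
        order.append(i)
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)

    assert len(order) == len(statements), "DFG must be acyclic in this sketch"
    return [statements[i] for i in order]

# Example: `b` and `c` both read `a`, so `a = 1` must stay first.
stmts = ["a = 1", "b = a + 2", "c = a * 3"]
edges = [(0, 1), (0, 2)]
print(dataflow_preserving_permutation(stmts, edges, seed=0))
```

Training on inputs permuted this way is one plausible way to expose the model to reorderings that preserve dataflow semantics, which is the property the objective's name suggests.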
Key facts
- Static program slicing extracts the statements that may affect the value of a chosen variable at a given program point.
- Sliceformer uses small language models such as CodeT5+.
- Dataflow-aware pretraining leverages data flow graphs (DFG).
- Pretraining includes dataflow-preserving statement permutation.
- Pretraining includes dataflow-aware span corruption (see the sketch after this list).
- Existing LMs suffer from inaccurate dependency modeling.
- Existing LMs produce slices with hallucinated tokens and statements.
- The approach is detailed in arXiv:2604.26961.
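As referenced in the span-corruption bullet above, the following is a minimal sketch of what dataflow-aware span corruption could look like, assuming a T5-style sentinel scheme (`<extra_id_k>`) and a precomputed set of token positions that participate in DFG edges. The function name and interface are hypothetical, not the paper's API.

```python
from typing import List, Set, Tuple

def dataflow_aware_span_corruption(tokens: List[str],
                                   dfg_positions: Set[int]) -> Tuple[str, str]:
    """T5-style span corruption biased toward dataflow-carrying tokens.

    Token indices in `dfg_positions` (positions that appear as a definition
    or use in the data-flow graph) are replaced by sentinel tokens; the
    target pairs each sentinel with the tokens it hides, which the decoder
    must reconstruct.
    """
    corrupted, target = [], []
    sentinel, i = 0, 0
    while i < len(tokens):
        if i in dfg_positions:
            # Grow the masked span over consecutive dataflow-carrying tokens.
            span = []
            while i < len(tokens) and i in dfg_positions:
                span.append(tokens[i])
                i += 1
            corrupted.append(f"<extra_id_{sentinel}>")
            target.append(f"<extra_id_{sentinel}> " + " ".join(span))
            sentinel += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return " ".join(corrupted), " ".join(target)

# Example: mask the occurrences of `a`, which carry the def-use edges.
toks = ["a", "=", "x", "+", "1", ";", "y", "=", "a", "*", "2"]
inp, tgt = dataflow_aware_span_corruption(toks, dfg_positions={0, 8})
print(inp)  # <extra_id_0> = x + 1 ; y = <extra_id_1> * 2
print(tgt)  # <extra_id_0> a <extra_id_1> a
```

Compared with uniform random span selection, concentrating corruption on def-use positions forces the model to reconstruct exactly the tokens that encode data dependencies, which is the intuition the objective's name points to.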