Sliceformer: Dataflow-Aware LM for Static Program Slicing
A team of researchers has introduced Sliceformer, a method that recasts static program slicing as a sequence-to-sequence task for compact language models such as CodeT5+. The approach adds dataflow-aware pretraining objectives built on data flow graphs (DFGs): dataflow-preserving statement permutation and dataflow-aware span corruption, both designed to teach the model data dependencies. This targets two weaknesses of existing learning-based slicers: inaccurate dependency modeling and unconstrained generation, which yields slices containing hallucinated tokens and statements. The work is described in arXiv:2604.26961.
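The paper's exact procedure is not reproduced here; the sketch below shows one plausible form of dataflow-preserving statement permutation, assuming statements arrive pre-split and DFG edges are given as (definition, use) index pairs. The helper name `dataflow_preserving_permutation` and its interface are illustrative, not taken from the paper. The idea is to sample a random topological order of the statement-level DFG so every definition still precedes its uses.

```python
import random
from collections import defaultdict

def dataflow_preserving_permutation(statements, dfg_edges, seed=None):
    """Return a random reordering of `statements` that keeps every
    definition before its uses, i.e. a random topological order of the
    statement-level data-flow graph.

    `dfg_edges` is a list of (def_idx, use_idx) pairs between statements.
    """
    rng = random.Random(seed)
    succ = defaultdict(list)
    indeg = [0] * len(statements)
    for d, u in dfg_edges:
        succ[d].append(u)
        indeg[u] += 1

    ready = [i for i, k in enumerate(indeg) if k == 0]
    order = []
    while ready:
        # Random choice among currently dependency-free statements
        # makes the permutation non-deterministic but still valid.
        i = ready.pop(rng.randrange(len(ready)))
        order.append(i)
        for j in succ[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)

    assert len(order) == len(statements), "DFG must be acyclic in this sketch"
    return [statements[i] for i in order]

# Example: `b` and `c` both read `a`, so `a = 1` must stay first.
stmts = ["a = 1", "b = a + 2", "c = a * 3"]
edges = [(0, 1), (0, 2)]
print(dataflow_preserving_permutation(stmts, edges, seed=0))
```

Training on inputs permuted this way is one plausible way to expose the model to reorderings that preserve dataflow semantics, which is the property the objective's name suggests.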
Key facts
- Static program slicing extracts the statements that may affect the value of a chosen variable at a given program point.
- Sliceformer uses small language models such as CodeT5+.
- Dataflow-aware pretraining leverages data flow graphs (DFG).
- Pretraining includes dataflow-preserving statement permutation.
- Pretraining includes dataflow-aware span corruption (see the sketch after this list).
- Existing LMs suffer from inaccurate dependency modeling.
- Existing LMs produce slices with hallucinated tokens and statements.
- The approach is detailed in arXiv:2604.26961.
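As referenced in the span-corruption bullet above, the following is a minimal sketch of what dataflow-aware span corruption could look like, assuming a T5-style sentinel scheme (`<extra_id_k>`) and a precomputed set of token positions that participate in DFG edges. The function name and interface are hypothetical, not the paper's API.

```python
from typing import List, Set, Tuple

def dataflow_aware_span_corruption(tokens: List[str],
                                   dfg_positions: Set[int]) -> Tuple[str, str]:
    """T5-style span corruption biased toward dataflow-carrying tokens.

    Token indices in `dfg_positions` (positions that appear as a definition
    or use in the data-flow graph) are replaced by sentinel tokens; the
    target pairs each sentinel with the tokens it hides, which the decoder
    must reconstruct.
    """
    corrupted, target = [], []
    sentinel, i = 0, 0
    while i < len(tokens):
        if i in dfg_positions:
            # Grow the masked span over consecutive dataflow-carrying tokens.
            span = []
            while i < len(tokens) and i in dfg_positions:
                span.append(tokens[i])
                i += 1
            corrupted.append(f"<extra_id_{sentinel}>")
            target.append(f"<extra_id_{sentinel}> " + " ".join(span))
            sentinel += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return " ".join(corrupted), " ".join(target)

# Example: mask the occurrences of `a`, which carry the def-use edges.
toks = ["a", "=", "x", "+", "1", ";", "y", "=", "a", "*", "2"]
inp, tgt = dataflow_aware_span_corruption(toks, dfg_positions={0, 8})
print(inp)  # <extra_id_0> = x + 1 ; y = <extra_id_1> * 2
print(tgt)  # <extra_id_0> a <extra_id_1> a
```

Compared with uniform random span selection, concentrating corruption on def-use positions forces the model to reconstruct exactly the tokens that encode data dependencies, which is the intuition the objective's name points to.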