Sheet as Token: Graph-Enhanced Framework for Multi-Sheet Spreadsheet Understanding
A recent research paper introduces Sheet as Token, a graph-enhanced framework for multi-sheet spreadsheet retrieval. The approach treats each worksheet as a unified semantic unit: it extracts schema-aware records from sheet names, column headers, representative values, and layout features, then encodes each worksheet into a compact dense token. Given a natural-language query, a Graph Retriever constructs a query-specific candidate graph to improve retrieval precision. The work targets workbook-scale spreadsheet understanding for language-model-based data analysis agents, where relevant information is often spread across multiple sheets with differing schemas and implicit connections. Existing retrieval methods decompose spreadsheets into rows, columns, or blocks, which can fragment worksheets; Sheet as Token instead aims to preserve sheet-level semantics while keeping retrieval efficient. The paper is available on arXiv with the identifier 2605.05811.
Key facts
- Sheet as Token is a graph-enhanced framework for multi-sheet spreadsheet retrieval.
- Each worksheet is treated as a unified semantic unit.
- Schema-aware records are extracted from sheet names, column headers, representative values, and layout features.
- Each worksheet is encoded into a compact dense token.
- A Graph Retriever constructs a query-specific candidate graph.
- The method addresses challenges in workbook-scale spreadsheet understanding for language-model-based data analysis agents.
- Existing approaches decompose spreadsheets into rows, columns, or blocks, which can fragment worksheets.
- The paper is available on arXiv with identifier 2605.05811.
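The pipeline described in the facts above can be illustrated with a toy sketch. This is not the paper's implementation: the names `Worksheet`, `schema_record`, `embed`, and `candidate_graph`, and the parameters `DIM`, `top_k`, and `edge_threshold`, are all hypothetical. A hashed bag-of-words vector stands in for the learned sheet encoder, and the edge rule (link two candidate sheets when their vectors are similar) is an assumption, since the summary does not say how the Graph Retriever forms edges.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field
from typing import List, Tuple

DIM = 64  # toy vector size (assumption; the real token dimension is not given here)


@dataclass
class Worksheet:
    name: str
    headers: List[str]
    sample_values: List[str] = field(default_factory=list)
    n_rows: int = 0
    n_cols: int = 0


def schema_record(ws: Worksheet) -> str:
    """Flatten sheet name, headers, representative values and layout into one text record."""
    return (
        f"sheet: {ws.name} | columns: {', '.join(ws.headers)} | "
        f"values: {', '.join(ws.sample_values)} | layout: {ws.n_rows}x{ws.n_cols}"
    )


def embed(text: str) -> List[float]:
    """Hashed bag-of-words vector, L2-normalised. A stand-in for a learned encoder."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def candidate_graph(
    query: str,
    sheets: List[Worksheet],
    top_k: int = 2,
    edge_threshold: float = 0.2,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Pick the top_k sheets for a query and link similar ones (assumed edge rule)."""
    q = embed(query)
    # One vector per worksheet: the whole sheet is represented by a single "token".
    tokens = {ws.name: embed(schema_record(ws)) for ws in sheets}
    nodes = sorted(tokens, key=lambda n: cosine(q, tokens[n]), reverse=True)[:top_k]
    edges = [
        (a, b)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if cosine(tokens[a], tokens[b]) >= edge_threshold
    ]
    return nodes, edges


if __name__ == "__main__":
    workbook = [
        Worksheet("Orders", ["order_id", "customer_id", "total"], ["1001", "42"], 500, 3),
        Worksheet("Customers", ["customer_id", "name", "region"], ["42", "EMEA"], 80, 3),
        Worksheet("Inventory", ["sku", "warehouse", "stock"], ["A-17", "Berlin"], 200, 3),
    ]
    nodes, edges = candidate_graph("total orders per customer region", workbook)
    print("candidate sheets:", nodes)
    print("candidate edges:", edges)
```

The point of the sketch is the granularity: each worksheet contributes exactly one vector, so a query is compared against whole sheets rather than against row, column, or block fragments.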
Entities
Institutions
- arXiv