Data Language Models: A New Foundation Model Class for Tabular Data

ai-technology · 2026-05-09

Researchers have introduced the Data Language Model (DLM), a new class of foundation model designed to work on tabular data natively. Existing approaches, such as gradient-boosted trees and earlier tabular foundation models, depend on preprocessing or serialization pipelines that sit between the raw data and the AI system. DLMs instead process raw cell values directly, much as language models process sentences, which removes the need for those pipelines. The work is presented in a paper on arXiv (2605.06290).
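
To make the serialization step concrete, here is a minimal sketch of the kind of pipeline code that often sits between a table and a text-based model. It is an illustration only and is not taken from the paper: the function name serialize_row, the "column is value" template, and the sample row are all assumptions chosen for this example.

```python
# Hypothetical illustration only: this is NOT code from the DLM paper.
# It shows a typical hand-written serialization step that converts a
# table row into text before a language model can consume it.
from typing import Mapping, Optional


def serialize_row(row: Mapping[str, Optional[object]]) -> str:
    """Turn one table row into a sentence-like string."""
    parts = []
    for column, value in row.items():
        # Missing cells have to be handled explicitly by the pipeline.
        text = "unknown" if value is None else str(value)
        parts.append(f"{column} is {text}")
    return "; ".join(parts)


# Assumed sample row; the column names are made up for the example.
row = {"age": 42, "city": "Lisbon", "income": None}
print(serialize_row(row))  # age is 42; city is Lisbon; income is unknown
```

Every choice in a step like this (the template, the handling of missing values, the column order) is made outside the model. According to the article, this is the layer a DLM is meant to make unnecessary by reading raw cell values directly.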

Key facts

DLM is a new foundation model class for tabular data.
It understands tables natively without preprocessing.
Existing tabular AI methods require preprocessing pipelines.
DLM processes raw cell values directly.
The paper is available on arXiv with ID 2605.06290.
DLM is presented as playing the role for tables that language models play for text and vision models play for images.
It aims to serve as a data layer for AI models and agents.
The approach eliminates the gap between raw data and AI systems.
