SOB: Multi-Source Benchmark for Structured Output in LLMs
Researchers have launched the Structured Output Benchmark (SOB) to assess structured output generation in large language models. The benchmark spans three source modalities: native text, images, and audio conversations. Every model receives a text-normalized representation of its context, which isolates structured-output capability from the quality of raw vision or speech processing. SOB comprises 5,000 text evaluation records derived from multi-hop QA over a corpus of 25,091 records, 209 image records drawn from OCR-processed PDFs, and additional audio records. The benchmark aims to overcome the shortcomings of existing benchmarks that focus solely on schema compliance, or on correctness within a single domain.
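The source does not describe SOB's scoring pipeline, but the two evaluation axes it names, schema compliance and correctness, can be sketched for a single hypothetical record. Here is a minimal Python illustration in which the expected schema, the record format, and the scoring function are all assumptions for the example, not SOB's actual implementation:

```python
import json

# Hypothetical flat schema for one record: expected keys and value types.
# SOB's real schemas are not specified in the source.
EXPECTED_SCHEMA = {"answer": str, "evidence": list}

def check_schema(output: dict, schema: dict) -> bool:
    """True if the output has exactly the expected keys with expected types."""
    return set(output) == set(schema) and all(
        isinstance(output[k], t) for k, t in schema.items()
    )

def score_record(raw_output: str, gold_answer: str) -> dict:
    """Score one model response on the two axes: schema compliance and correctness."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Unparseable output fails both axes.
        return {"schema_ok": False, "correct": False}
    schema_ok = isinstance(parsed, dict) and check_schema(parsed, EXPECTED_SCHEMA)
    correct = schema_ok and parsed["answer"].strip().lower() == gold_answer.lower()
    return {"schema_ok": schema_ok, "correct": correct}

result = score_record('{"answer": "Paris", "evidence": ["doc1"]}', "Paris")
```

Separating the two checks matters because a model can emit perfectly valid JSON with a wrong answer, or the right answer in a malformed structure; a benchmark that reports only one axis conflates these failure modes.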
Key facts
- SOB stands for The Structured Output Benchmark
- Benchmark spans three source modalities: native text, images, and audio conversations
- All models receive a text-normalized representation of their context
- Design isolates structured-output capability from raw vision or speech-processing quality
- 5,000 text evaluation records derived from multi-hop QA
- Full corpus contains 25,091 records
- 209 image records from OCR-processed PDFs
- Published on arXiv with ID 2604.25359
Entities
Institutions
- arXiv