ARTFEED — Contemporary Art Intelligence

SOB: Multi-Source Benchmark for Structured Output in LLMs

publication · 2026-04-30

Researchers have released the Structured Output Benchmark (SOB) to evaluate how well large language models generate structured output. The benchmark draws on three source modalities: native text, images, and audio conversations. Every model receives a text-normalized representation of its context, which isolates structured-output capability from the quality of vision or speech processing. SOB includes 5,000 text evaluation records derived from multi-hop QA, out of a full corpus of 25,091 records, along with 209 image records from OCR-processed PDFs and a set of audio records. The benchmark aims to address a limitation of existing benchmarks, which focus only on schema compliance or on correctness within a single domain.
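
The distinction between schema compliance and value correctness can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the invoice example, and the names `schema_compliant`, `value_accuracy`, and `score` are hypothetical and are not taken from SOB or its paper. The sketch only shows why an output can satisfy a schema while still containing a wrong value.

```python
import json

# Hypothetical record, invented for illustration; not taken from SOB itself.
# The context is already text, mirroring the idea of text-normalized input.
RECORD = {
    "modality": "image",
    "context": "Invoice 1042. Total due: 318.50 USD.",
    "schema": {"invoice_id": int, "total": float, "currency": str},
    "expected": {"invoice_id": 1042, "total": 318.5, "currency": "USD"},
}


def schema_compliant(output, schema):
    """Return True if output has exactly the schema's keys with the exact types."""
    if not isinstance(output, dict) or set(output) != set(schema):
        return False
    return all(type(output[key]) is expected_type
               for key, expected_type in schema.items())


def value_accuracy(output, expected):
    """Return the fraction of expected fields whose values match exactly."""
    if not isinstance(output, dict):
        return 0.0
    hits = sum(1 for key, value in expected.items() if output.get(key) == value)
    return hits / len(expected)


def score(raw_output, record):
    """Score a raw model output on parsing, schema compliance, and values."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"parses": False, "schema_ok": False, "value_accuracy": 0.0}
    return {
        "parses": True,
        "schema_ok": schema_compliant(parsed, record["schema"]),
        "value_accuracy": value_accuracy(parsed, record["expected"]),
    }


# A well-formed output with one wrong value (381.5 instead of 318.5).
well_formed_but_wrong = '{"invoice_id": 1042, "total": 381.5, "currency": "USD"}'
result = score(well_formed_but_wrong, RECORD)
print(result)
```

In this sketch the example output passes the schema check but matches only two of the three expected fields, which is the kind of error a compliance-only benchmark would miss.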

Key facts

  • SOB stands for the Structured Output Benchmark
  • Benchmark spans three source modalities: native text, images, and audio conversations
  • All models receive a text-normalized representation of their context
  • Design isolates structured-output capability from raw vision or speech-processing quality
  • 5,000 text evaluation records derived from multi-hop QA
  • Full corpus contains 25,091 records
  • 209 image records from OCR-processed PDFs
  • Published on arXiv with ID 2604.25359

Entities

Institutions

  • arXiv
