SOB: Multi-Source Benchmark for Structured Output in LLMs
Researchers have launched the Structured Output Benchmark (SOB) to assess structured output generation in large language models. The benchmark spans three source modalities: native text, images, and audio conversations. Every model receives a text-normalized representation of its context, which isolates structured-output capability from the quality of raw vision or speech processing. SOB comprises 5,000 text evaluation records derived from multi-hop QA over a corpus of 25,091 records, 209 image records drawn from OCR-processed PDFs, and additional audio records. The benchmark aims to overcome the shortcomings of existing benchmarks that focus solely on schema compliance, or on correctness within a single domain.
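The source does not describe SOB's scoring pipeline, but the two evaluation axes it names, schema compliance and correctness, can be sketched for a single hypothetical record. Here is a minimal Python illustration in which the expected schema, the record format, and the scoring function are all assumptions for the example, not SOB's actual implementation:

```python
import json

# Hypothetical flat schema for one record: expected keys and value types.
# SOB's real schemas are not specified in the source.
EXPECTED_SCHEMA = {"answer": str, "evidence": list}

def check_schema(output: dict, schema: dict) -> bool:
    """True if the output has exactly the expected keys with expected types."""
    return set(output) == set(schema) and all(
        isinstance(output[k], t) for k, t in schema.items()
    )

def score_record(raw_output: str, gold_answer: str) -> dict:
    """Score one model response on the two axes: schema compliance and correctness."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Unparseable output fails both axes.
        return {"schema_ok": False, "correct": False}
    schema_ok = isinstance(parsed, dict) and check_schema(parsed, EXPECTED_SCHEMA)
    correct = schema_ok and parsed["answer"].strip().lower() == gold_answer.lower()
    return {"schema_ok": schema_ok, "correct": correct}

result = score_record('{"answer": "Paris", "evidence": ["doc1"]}', "Paris")
```

Separating the two checks matters because a model can emit perfectly valid JSON with a wrong answer, or the right answer in a malformed structure; a benchmark that reports only one axis conflates these failure modes.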
Key facts
- SOB stands for The Structured Output Benchmark
- Benchmark spans three source modalities: native text, images, and audio conversations
- All models receive a text-normalized representation of their context
- Design isolates structured-output capability from raw vision or speech-processing quality
- 5,000 text evaluation records derived from multi-hop QA
- Full corpus contains 25,091 records
- 209 image records from OCR-processed PDFs
- Published on arXiv with ID 2604.25359
Entities
Institutions
- arXiv