ARTFEED — Contemporary Art Intelligence

Microservice Architecture for Production OCR and LLM Pipelines

ai-technology · 2026-05-20

A new paper on arXiv (2605.18818) presents a microservice architecture for running document AI at scale, aiming to bridge the gap between academic model research and production deployment. The system encapsulates pipelines for classification, OCR, and LLM-based structured field extraction, and processes thousands of multi-page documents per hour. Key design decisions include hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, asynchronous handling of IO-bound operations, and independent horizontal scaling. Batch profiling produced two surprising findings: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates under certain conditions. The paper also details practical insights for deploying document AI in production environments.
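To make the split between orchestration and inference concrete, here is a minimal Python sketch. It is not the paper's implementation: `fetch_document`, `run_ocr`, and the limit of two GPU slots are hypothetical stand-ins chosen for illustration. The orchestrator awaits IO-bound work on the event loop and hands the inference call to a separate worker pool, with a semaphore capping concurrent inference.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_ocr(page: bytes) -> str:
    """Hypothetical stand-in for a GPU-bound OCR call (here: just uppercases text)."""
    return page.decode("utf-8", errors="replace").upper()


async def fetch_document(doc_id: str) -> list[bytes]:
    """Hypothetical stand-in for an IO-bound fetch from storage or a queue."""
    await asyncio.sleep(0)  # yield to the event loop, as real IO would
    return [f"{doc_id}-page{i}".encode() for i in range(2)]


async def process(doc_id: str, pool: ThreadPoolExecutor,
                  gpu_slots: asyncio.Semaphore) -> dict:
    # IO-bound step: awaited, so other documents make progress meanwhile.
    pages = await fetch_document(doc_id)
    loop = asyncio.get_running_loop()
    texts = []
    for page in pages:
        # Inference step: runs off the event loop, bounded by gpu_slots.
        async with gpu_slots:
            texts.append(await loop.run_in_executor(pool, run_ocr, page))
    return {"doc_id": doc_id, "text": texts}


async def main(doc_ids: list[str]) -> list[dict]:
    gpu_slots = asyncio.Semaphore(2)  # assumed inference capacity
    with ThreadPoolExecutor(max_workers=2) as pool:
        # gather returns results in the same order as doc_ids.
        return await asyncio.gather(
            *(process(d, pool, gpu_slots) for d in doc_ids)
        )


results = asyncio.run(main(["a", "b", "c"]))
```

In a real deployment the worker pool would more plausibly be a separate inference service reached over the network, which is what would let the two tiers scale independently; the in-process pool here only illustrates the boundary.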

Key facts

  • arXiv paper 2605.18818
  • Microservice architecture for OCR and LLM pipelines
  • Processes thousands of multi-page documents per hour
  • Hybrid classification approach
  • Separates GPU-bound inference from CPU-bound orchestration
  • Asynchronous processing for IO-bound operations
  • Independent horizontal scaling strategy
  • OCR, not language-model parsing, dominates end-to-end latency
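A finding such as the OCR latency result typically comes from timing each stage separately. The sketch below shows one way to do that in Python; `StageProfiler` and the stage names are hypothetical, and the durations fed to it are made-up numbers from a fake clock, not measurements from the paper.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageProfiler:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock  # injectable so the example is deterministic
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            self.totals[name] += self.clock() - start

    def shares(self):
        """Fraction of total measured time spent in each stage."""
        total = sum(self.totals.values())
        if not total:
            return {}
        return {name: t / total for name, t in self.totals.items()}

    def dominant(self):
        """Name of the stage with the largest accumulated time, or None."""
        if not self.totals:
            return None
        return max(self.totals, key=self.totals.get)


# Fake clock readings (illustrative only): OCR takes 3.0 s, LLM parsing 1.0 s.
ticks = iter([0.0, 3.0, 3.0, 4.0])
profiler = StageProfiler(clock=lambda: next(ticks))

with profiler.stage("ocr"):
    pass  # the OCR call would go here
with profiler.stage("llm_parse"):
    pass  # the LLM extraction call would go here

dominant_stage = profiler.dominant()
stage_shares = profiler.shares()
```

With the default `time.perf_counter` clock, the same context manager can wrap real calls across a batch to see which stage accounts for most of the end-to-end time.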

Entities

Institutions

  • arXiv

Sources