ARTFEED — Contemporary Art Intelligence

LLMOps Stack for Fraud and AML Compliance

ai-technology · 2026-05-13

A recent study presents a specialized LLMOps stack for fraud detection and anti-money laundering (AML) compliance. Unlike standard chat workloads, compliance prompts are prefix-heavy, schema-constrained, and evidence-rich, which calls for effective prefix reuse, KV-cache management, runtime tuning, model orchestration, and output validation. The stack self-hosts open-weight models (Meta Llama and Alibaba Qwen) and combines vLLM-style runtime tuning, PagedAttention, Automatic Prefix Caching, multi-adapter serving, adapter- and prompt-length-aware batching, sleep/wake lifecycle management, speculative decoding, and optional pruning. The study is available on arXiv under ID 2605.11232.
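Prefix reuse is the key lever for prefix-heavy compliance prompts: when many requests share the same long policy/schema/evidence prefix, their KV-cache blocks can be computed once and reused. The toy sketch below models Automatic Prefix Caching at block granularity (block size, hashing scheme, and class names are illustrative assumptions, not the paper's implementation):

```python
import hashlib

BLOCK_SIZE = 16  # tokens per KV-cache block, a typical vLLM-style granularity

class PrefixBlockCache:
    """Toy model of automatic prefix caching: each KV block is keyed by a
    hash of all tokens up to and including that block, so any request that
    shares a prompt prefix reuses the already-computed blocks."""

    def __init__(self):
        self.blocks = {}   # block hash -> placeholder for computed KV block
        self.hits = 0
        self.misses = 0

    def _block_hashes(self, tokens):
        hashes = []
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE):
            prefix = tuple(tokens[:end])
            hashes.append(hashlib.sha256(repr(prefix).encode()).hexdigest())
        return hashes

    def prefill(self, tokens):
        """Return (blocks reused, blocks freshly computed) for one request."""
        reused = computed = 0
        for h in self._block_hashes(tokens):
            if h in self.blocks:
                self.hits += 1
                reused += 1
            else:
                self.misses += 1
                self.blocks[h] = object()  # stand-in for real KV tensors
                computed += 1
        return reused, computed

# A long shared compliance prefix (policy text, schema, few-shot evidence),
# followed by per-case suffixes.
shared_prefix = list(range(64))                    # 4 full blocks
case_a = shared_prefix + list(range(1000, 1016))   # 5 blocks total
case_b = shared_prefix + list(range(2000, 2016))   # 5 blocks total

cache = PrefixBlockCache()
print(cache.prefill(case_a))  # (0, 5): cold cache, everything computed
print(cache.prefill(case_b))  # (4, 1): the 4 shared prefix blocks are reused
```

Only the final block differs between the two cases, so the second request recomputes a single block; in a real serving stack the reused blocks avoid the corresponding attention prefill work.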

Key facts

  • The paper focuses on LLMOps for fraud detection and AML compliance.
  • Compliance prompts are prefix-heavy, schema-constrained, and evidence-rich.
  • The stack uses self-hosted open-weight models: Meta Llama and Alibaba Qwen.
  • Techniques include vLLM-style runtime tuning, PagedAttention, and Automatic Prefix Caching.
  • Multi-adapter serving and adapter/prompt-length-aware batching are employed.
  • Sleep/wake lifecycle management and speculative decoding are part of the stack.
  • The paper is published on arXiv with ID 2605.11232.
  • The stack is designed for structured outputs like JSON labels or risk factors.
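Since the stack targets structured outputs such as JSON labels or risk factors, downstream validation can reject malformed responses before they reach a case queue. A minimal sketch of such a validator follows; the field names, label set, and thresholds here are illustrative assumptions, not taken from the paper:

```python
import json

# Illustrative output schema: required fields and allowed label values
# are assumptions for this sketch, not the paper's actual schema.
ALLOWED_LABELS = {"fraud", "aml_suspicious", "clear"}
REQUIRED_FIELDS = {"label": str, "risk_factors": list, "confidence": float}

def validate_output(raw: str) -> dict:
    """Parse a model response and enforce the output schema, raising
    ValueError on any violation so an orchestrator can retry or escalate."""
    obj = json.loads(raw)
    for field, typ in REQUIRED_FIELDS.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], typ):
            raise ValueError(f"wrong type for field: {field}")
    if obj["label"] not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {obj['label']}")
    if not all(isinstance(r, str) for r in obj["risk_factors"]):
        raise ValueError("risk_factors must be strings")
    if not 0.0 <= obj["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return obj

good = '{"label": "aml_suspicious", "risk_factors": ["structuring"], "confidence": 0.87}'
print(validate_output(good)["label"])  # aml_suspicious
```

In practice a validation failure would trigger a retry with constrained decoding or a hand-off to human review rather than silently dropping the case.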

Entities

Institutions

  • arXiv
  • Meta
  • Alibaba

Sources