ARTFEED — Contemporary Art Intelligence

MANTA: Multi-turn framework for LLM animal welfare alignment

ai-technology · 2026-05-20

A new evaluation framework called MANTA (Multi-turn Assessment for Nonhuman Thinking and Alignment) has been developed by researchers, utilizing the Inspect AI platform. In contrast to traditional single-turn benchmarks such as AnimalHarmBench (AHB), MANTA rigorously tests advanced LLMs in both professional and everyday contexts through the use of adversarially crafted follow-up questions. This innovative framework dynamically creates pressure turns based on the actual responses of each model, thereby generating specific adversarial challenges. It assesses models across a maximum of 13 scoring dimensions derived from AHB, utilizing a continuous scale from 0 to 1. Initial findings are detailed in arXiv:2605.16301.

Key facts

  • MANTA is a multi-turn evaluation framework for LLM animal welfare alignment
  • Built on the Inspect AI platform
  • Uses adversarially generated follow-up questions
  • Generates pressure turns dynamically from model responses
  • Evaluates across up to 13 AHB-derived scoring dimensions
  • Continuous 0-1 scale
  • Preliminary results from arXiv:2605.16301
  • Addresses failure mode where models capitulate under economic, social, or authority-based arguments

Entities

Institutions

  • Inspect AI

Sources