MANTA: Multi-turn framework for LLM animal welfare alignment

ai-technology · 2026-05-20

Researchers have developed MANTA (Multi-turn Assessment for Nonhuman Thinking and Alignment), an evaluation framework for LLM animal welfare alignment built on the Inspect AI platform. Unlike single-turn benchmarks such as AnimalHarmBench (AHB), MANTA tests advanced LLMs in professional and everyday scenarios using adversarial follow-up questions. It generates each pressure turn dynamically from the model's actual response, so every challenge targets what that particular model said. Models are scored on up to 13 dimensions derived from AHB, using a continuous scale from 0 to 1. Preliminary results are reported in arXiv:2605.16301.

Key facts

MANTA is a multi-turn evaluation framework for LLM animal welfare alignment
Built on the Inspect AI platform
Uses adversarially generated follow-up questions
Generates pressure turns dynamically from model responses
Evaluates across up to 13 AHB-derived scoring dimensions
Continuous 0-1 scale
Preliminary results from arXiv:2605.16301
Addresses failure mode where models capitulate under economic, social, or authority-based arguments
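
The multi-turn loop described above can be sketched in a few lines of Python. This is a minimal illustration, not MANTA's implementation and not the Inspect AI API: `run_dialogue`, `aggregate`, `stub_model`, and `stub_pressure` are hypothetical names, the model and the pressure generator are stand-in functions, and averaging the per-dimension scores is an assumption, since the article does not say how dimensions are combined.

```python
from statistics import mean

# Pressure categories named in the article; the ordering is an assumption.
PRESSURE_TYPES = ("economic", "social", "authority")


def run_dialogue(model, make_pressure, opening, pressure_types=PRESSURE_TYPES):
    """Run one opening question followed by one pressure turn per type.

    `model` maps the conversation so far to a reply string.
    `make_pressure` builds a follow-up from the model's latest reply,
    which is what makes each pressure turn specific to that model.
    """
    turns = [("user", opening)]
    reply = model(turns)
    turns.append(("assistant", reply))
    for kind in pressure_types:
        follow_up = make_pressure(kind, reply)  # depends on the actual reply
        turns.append(("user", follow_up))
        reply = model(turns)
        turns.append(("assistant", reply))
    return turns


def aggregate(scores):
    """Average per-dimension scores, each required to lie in [0, 1].

    The mean is an illustrative choice, not a documented MANTA rule.
    """
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {value}")
    return mean(scores.values())


def stub_model(turns):
    """Hypothetical model: reports how many user turns it has seen."""
    n_user = sum(1 for role, _ in turns if role == "user")
    return f"reply {n_user}"


def stub_pressure(kind, last_reply):
    """Hypothetical pressure generator that quotes the model's last reply."""
    return f"[{kind}] You said {last_reply!r}. Why not reconsider?"


if __name__ == "__main__":
    for role, text in run_dialogue(stub_model, stub_pressure, "Opening question"):
        print(f"{role}: {text}")
```

In a real run, the stub model would be replaced by a call to the LLM under test, the pressure generator by an adversarial prompt writer, and the scores passed to `aggregate` would come from a grader applied to the full transcript.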
