ARTFEED — Contemporary Art Intelligence

REALISTA: New Attack Method Induces LLM Hallucinations

ai-technology · 2026-05-14

Researchers have introduced REALISTA, a framework for generating realistic adversarial prompts that elicit hallucinations in large language models (LLMs). The work frames hallucination elicitation as a constrained optimization problem: find adversarial prompts that remain semantically equivalent to benign user inputs. Existing techniques fall short on one side or the other: discrete prompt-based attacks preserve meaning but search over only a limited set of prompt variations, while continuous latent-space attacks often decode into invalid rephrasings. REALISTA constructs an input-dependent dictionary of valid editing directions tied to semantically coherent rewordings and optimizes continuous latent vectors within it to elicit hallucinations. The research is detailed in arXiv preprint 2605.12813.
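The framing above suggests a simple structure: optimize continuous coefficients over a dictionary of editing directions while penalizing drift from the benign prompt. Below is a minimal PyTorch sketch of that idea; the toy linear model, the random dictionary, the target-token loss, and the cosine-similarity penalty are all illustrative assumptions, not the paper's actual method or code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d, k, vocab = 64, 8, 100           # embedding dim, dictionary size, vocab size (assumed)
benign = torch.randn(d)            # stand-in for the benign prompt's embedding
directions = torch.randn(k, d)     # stand-in for the input-dependent editing dictionary
model = torch.nn.Linear(d, vocab)  # toy head standing in for the target LLM
target = torch.tensor(3)           # token whose elicitation stands in for a hallucination

coeffs = torch.zeros(k, requires_grad=True)  # continuous latent variables to optimize
opt = torch.optim.Adam([coeffs], lr=0.05)

for step in range(200):
    opt.zero_grad()
    # Constrain the perturbation to the span of the editing dictionary,
    # the mechanism that keeps decoded rewordings semantically coherent.
    adv = benign + coeffs @ directions
    # Elicitation objective: push the model toward the (hallucinated) target.
    elicit = F.cross_entropy(model(adv).unsqueeze(0), target.unsqueeze(0))
    # Soft semantic-equivalence constraint: stay close to the benign prompt.
    drift = 1.0 - F.cosine_similarity(adv, benign, dim=0)
    (elicit + 5.0 * drift).backward()
    opt.step()

adv = benign + coeffs.detach() @ directions
print(f"similarity to benign prompt: "
      f"{F.cosine_similarity(adv, benign, dim=0).item():.3f}")
```

In the real setting the dictionary would be derived per input (for example, from valid paraphrase edits) and the loss would measure hallucination in the decoded output; both are abstracted away here.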

Key facts

  • REALISTA is a realistic latent-space attack framework.
  • It elicits hallucinations in large language models.
  • Hallucination elicitation is framed as a constrained optimization problem.
  • The goal is to find semantically coherent adversarial prompts equivalent to benign ones.
  • Discrete prompt-based attacks search over a limited set of prompt variations (contrast sketched after this list).
  • Continuous latent-space attacks often decode into invalid rephrasings.
  • REALISTA uses an input-dependent dictionary of valid editing directions.
  • The preprint is available on arXiv under ID 2605.12813.
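To make the contrast in the two bullets above concrete, here is a hypothetical sketch of a discrete prompt-based attack: it searches over a fixed, hand-enumerated pool of paraphrases, which preserves meaning by construction but caps diversity at the pool size. The candidate questions and the stub scoring function are illustrative assumptions, not from the paper.

```python
# Hypothetical discrete prompt-based attack: exhaustive search over a small,
# fixed pool of semantically equivalent paraphrases. Meaning is preserved by
# construction, but the search space is limited to the enumerated candidates.
def discrete_attack(candidates, score):
    """Return the paraphrase that best elicits the target behavior."""
    return max(candidates, key=score)

paraphrases = [
    "Who painted the ceiling of the Sistine Chapel?",
    "Name the artist behind the Sistine Chapel ceiling.",
    "The Sistine Chapel ceiling was painted by whom?",
]

# In practice, score() would query the target LLM and measure how strongly
# each paraphrase elicits a hallucination; a length stub stands in here.
best = discrete_attack(paraphrases, score=len)
print(best)
```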

Entities

Institutions

  • arXiv

Sources

  • arXiv preprint 2605.12813