ARTFEED — Contemporary Art Intelligence

SciHorizon-GENE Benchmark Tests LLM Gene Reasoning

ai-technology · 2026-05-25

Researchers have introduced SciHorizon-GENE, a large-scale benchmark for evaluating large language models (LLMs) on gene-to-function reasoning. The benchmark covers over 190,000 human genes and includes more than 540,000 questions derived from authoritative biological databases. It assesses LLMs from four biological perspectives: research attention sensitivity, hallucination tendency, answer consistency, and reasoning depth. The work aims to close a gap in reliable, knowledge-driven interpretation of cell atlases.

Key facts

  • SciHorizon-GENE is a benchmark for LLMs in life sciences.
  • It covers over 190,000 human genes.
  • The benchmark includes more than 540,000 questions.
  • Questions are derived from authoritative biological databases.
  • It evaluates LLMs from four biological perspectives: research attention sensitivity, hallucination tendency, answer consistency, and reasoning depth.
  • The benchmark targets gene-to-function reasoning.
  • It is intended to support reliable, knowledge-driven interpretation of cell atlases.
  • The study is available on arXiv (2601.12805).

Entities

Institutions

  • arXiv
