SciHorizon-GENE Benchmark Tests LLM Gene Reasoning

ai-technology · 2026-05-25

Researchers introduced SciHorizon-GENE, a large-scale benchmark for evaluating large language models (LLMs) on gene-to-function reasoning. The benchmark covers over 190,000 human genes and includes more than 540,000 questions derived from authoritative biological databases. It assesses LLMs across four biological perspectives: research attention sensitivity, hallucination tendency, answer consistency, and reasoning depth. The work aims to address a gap in reliable knowledge-driven interpretation for cell atlas analysis.

Key facts

SciHorizon-GENE is a benchmark for LLMs in life sciences.
It covers over 190,000 human genes.
The benchmark includes more than 540,000 questions.
Questions are derived from authoritative biological databases.
It evaluates LLMs on four biological perspectives.
The benchmark targets gene-to-function reasoning.
It aims to support reliable, knowledge-driven interpretation in cell atlas analysis.
The study is available as a preprint on arXiv (2601.12805).
