SciHorizon-GENE Benchmark Tests LLM Gene Reasoning
Researchers introduced SciHorizon-GENE, a large-scale benchmark for evaluating large language models (LLMs) on gene-to-function reasoning. The benchmark covers over 190,000 human genes and includes more than 540,000 questions derived from authoritative biological databases. It assesses LLMs across four biological perspectives: research attention sensitivity, hallucination tendency, answer consistency, and reasoning depth. The work aims to address a gap in reliable knowledge-driven interpretation for cell atlas analysis.
Key facts
- SciHorizon-GENE is a benchmark for LLMs in life sciences.
- It covers over 190,000 human genes.
- The benchmark includes more than 540,000 questions.
- Questions are derived from authoritative biological databases.
- It evaluates LLMs on four biological perspectives.
- The benchmark targets gene-to-function reasoning.
- It aims to close a gap in reliable, knowledge-driven interpretation for cell atlas analysis.
- The study is available on arXiv (arXiv:2601.12805).
Entities
Institutions
- arXiv