MetaGAI Benchmark for Generative AI Documentation
Researchers have released MetaGAI, a benchmark of 2,541 verified document triplets for evaluating the generation of Model Cards and Data Cards in generative AI. The dataset was built through semantic triangulation of academic papers, GitHub repositories, and Hugging Face artifacts, using a multi-agent framework with specialized Retriever, Generator, and Editor agents. Ground truth refined by the Editor agent was validated through a four-dimensional human-in-the-loop assessment. The evaluation framework combines automated metrics with validated LLM-as-a-Judge methods. Results indicate that sparse Mixture-of-Experts architectures offer superior cost-efficiency. The work responds to the need for rigorous documentation standards that support transparency and governance in generative AI.
Key facts
- MetaGAI includes 2,541 verified document triplets
- Constructed via semantic triangulation of academic papers, GitHub repositories, and Hugging Face artifacts
- Uses a multi-agent framework with Retriever, Generator, and Editor agents
- Validation via four-dimensional human-in-the-loop assessment
- Evaluation combines automated metrics with LLM-as-a-Judge frameworks
- Sparse Mixture-of-Experts architectures show superior cost-efficiency
- Aims to improve transparency and governance in generative AI
- Published on arXiv as 2604.23539
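The combination of automated metrics with LLM-as-a-Judge scoring mentioned above can be sketched as a weighted blend. The unigram-F1 metric, the 0-1 judge score, and the 50/50 weighting below are illustrative assumptions; the paper's actual metrics and judge protocol may differ.

```python
# Hedged sketch: blending an automated overlap metric with a judge rating.
# Metric choice, score scale, and weighting are assumptions for illustration.

def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated card and its reference."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def combined_score(candidate: str, reference: str,
                   judge_score: float, w_auto: float = 0.5) -> float:
    """Blend the automated metric with a 0-1 LLM-as-a-Judge rating."""
    return w_auto * token_f1(candidate, reference) + (1 - w_auto) * judge_score
```

Blending the two signals hedges against the weaknesses of each: surface-overlap metrics miss paraphrase, while judge models need validation against human ratings, which is presumably why the benchmark validates its judge before use.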
Entities
Platforms
- arXiv
- GitHub
- Hugging Face