STELLAR-E: Automated Synthetic Dataset Generation for LLM Evaluation

ai-technology · 2026-04-29

STELLAR-E (Synthetic, Tailored, End-to-end LLM Application Rigorous Evaluator) is a system for automatically generating high-quality synthetic datasets to evaluate large language models (LLMs). It addresses a practical bottleneck: domain-specific and language-specific evaluation datasets are hard to collect because of privacy concerns, regulatory restrictions, and the time required to build them by hand. STELLAR-E works in two phases. First, it adapts the TGRT Self-Instruct framework into a synthetic data engine that generates customizable datasets with minimal human intervention. Second, it runs an evaluation pipeline that scores the relevance of the generated datasets using both statistical and LLM-based metrics. Because the process is fully automated and needs no existing data, it scales to multilingual and multi-domain generation and gives researchers a way to benchmark LLMs in specialized areas.
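The source gives no detail on how STELLAR-E modifies TGRT Self-Instruct. For orientation, the original Self-Instruct recipe grows a pool of instructions from a small seed set and admits a newly generated instruction only when its ROUGE-L similarity to every instruction already in the pool is below 0.7. The sketch below implements that novelty filter in plain Python, assuming regex word tokenization and a hard-coded list standing in for LLM outputs. The names `filter_novel`, `rouge_l_f1`, and `lcs_length` are hypothetical, and whether STELLAR-E keeps this filter or this threshold is an assumption.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def lcs_length(a: List[str], b: List[str]) -> int:
    # Length of the longest common subsequence, computed one DP row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall.
    c, r = tokenize(candidate), tokenize(reference)
    lcs = lcs_length(c, r) if c and r else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def filter_novel(candidates: List[str], pool: List[str], threshold: float = 0.7) -> List[str]:
    # Keep a candidate only if it is dissimilar to every instruction accepted so far,
    # including candidates accepted earlier in this same batch.
    accepted = list(pool)
    new_items: List[str] = []
    for cand in candidates:
        if all(rouge_l_f1(cand, existing) < threshold for existing in accepted):
            accepted.append(cand)
            new_items.append(cand)
    return new_items


if __name__ == "__main__":
    # Hypothetical seed and candidates; a real engine would obtain candidates from an LLM.
    seed = ["Summarize the attached insurance claim in two sentences."]
    candidates = [
        "Summarize the attached insurance claim in three sentences.",  # near-paraphrase, F1 = 0.875
        "Translate the policy clause into Italian.",
        "Translate the policy clause into Italian.",  # exact duplicate of the previous line
    ]
    print(filter_novel(candidates, seed))
    # ['Translate the policy clause into Italian.']
```

The threshold trades diversity against yield: lowering it rejects more near-paraphrases but also discards more candidates per generation round.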

Key facts

STELLAR-E is a fully automated system for generating synthetic datasets.
It uses a modified TGRT Self-Instruct framework for data generation.
The system requires minimal human input and no existing datasets.
It includes an evaluation pipeline with statistical and LLM-based metrics.
STELLAR-E addresses privacy and regulatory concerns in dataset collection.
The system supports multilingual and multi-domain generation.
It is designed for domain-specific and language-specific LLM evaluation.
The approach improves scalability over existing automated benchmarking methods.
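The summary says the evaluation pipeline combines statistical and LLM-based metrics but does not name them. The sketch below is a hypothetical illustration of how such a dataset report could be assembled: distinct-2 (the ratio of unique to total word bigrams) and glossary coverage stand in for the statistical metrics, and a caller-supplied `judge` function stands in for an LLM-based relevance scorer. The function names, the chosen metrics, and the `DatasetReport` structure are all assumptions, not STELLAR-E's actual metrics.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def distinct_n(texts: List[str], n: int = 2) -> float:
    # Unique n-grams divided by total n-grams across the dataset (a diversity statistic).
    ngrams = []
    for text in texts:
        toks = tokenize(text)
        ngrams.extend(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def glossary_coverage(texts: List[str], domain_terms: Set[str]) -> float:
    # Fraction of items mentioning at least one glossary term (terms must be lowercase).
    if not texts:
        return 0.0
    hits = sum(1 for text in texts if domain_terms & set(tokenize(text)))
    return hits / len(texts)


@dataclass
class DatasetReport:
    distinct_2: float
    domain_coverage: float
    judge_mean: float


def evaluate_dataset(
    texts: List[str], domain_terms: Set[str], judge: Callable[[str], float]
) -> DatasetReport:
    # `judge` is a placeholder for an LLM-as-judge call returning a score in [0, 1].
    scores = [judge(text) for text in texts]
    return DatasetReport(
        distinct_2=distinct_n(texts, 2),
        domain_coverage=glossary_coverage(texts, domain_terms),
        judge_mean=sum(scores) / len(scores) if scores else 0.0,
    )


if __name__ == "__main__":
    # Hypothetical generated items, glossary, and a stub judge that only checks for a question mark.
    items = [
        "What deductible applies to a flood claim?",
        "What deductible applies to a fire claim?",
    ]
    report = evaluate_dataset(items, {"deductible", "claim", "premium"},
                              lambda t: 1.0 if t.endswith("?") else 0.0)
    print(report)
```

Keeping the judge as an injected callable lets the statistical metrics run offline and cheaply, while the expensive LLM call can be swapped or batched independently.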

Sources

arXiv cs.AI — 2026-04-28