RDDG Framework Uses LLMs and Bayesian Calibration for Relational Data Synthesis
A new framework called RDDG (Relational Data generator with Dynamic Guidance) addresses the challenge of imbalanced data in real-world applications by synthesizing rare-class relational data. Documented in arXiv preprint 2604.16817v1, the approach employs large language models (LLMs) within an in-context learning framework to generate structured tabular data. Unlike existing methods, RDDG incorporates a feedback mechanism that continuously optimizes data quality throughout synthesis. The framework first selects representative samples from the original data via core set selection, then uses in-context learning to discover patterns and correlations among attributes. It employs progressive chain-of-thought steps to enhance downstream imbalanced-classification performance. The work highlights the underexplored application of LLMs to relational data synthesis while addressing the lack of effective feedback mechanisms in current approaches. This research contributes to mitigating data scarcity through controllable synthesis techniques.
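The pipeline described above (core set selection followed by in-context prompting) can be sketched in Python. This is a minimal illustration, not the paper's implementation: the greedy k-center selection criterion, the prompt format, and the `core_set_select`/`build_icl_prompt` helper names are all assumptions for illustration only.

```python
import numpy as np

def core_set_select(X, k):
    """Greedy k-center selection: pick k rows that spread out over
    the feature space (an assumed stand-in for RDDG's core set step;
    the paper's exact selection criterion may differ)."""
    # Start from the row closest to the dataset mean.
    idx = [int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))]
    for _ in range(k - 1):
        # Distance from every row to its nearest already-selected row.
        d = np.min(np.linalg.norm(X[:, None] - X[idx][None], axis=2), axis=1)
        idx.append(int(np.argmax(d)))  # farthest uncovered row
    return idx

def build_icl_prompt(rows, header):
    """Format the selected rows as few-shot examples so an LLM can
    infer attribute patterns and emit a new rare-class row."""
    lines = [", ".join(header)]
    lines += [", ".join(map(str, r)) for r in rows]
    return "\n".join(lines) + "\nGenerate one more rare-class row:"

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # toy tabular features
sel = core_set_select(X, 5)
prompt = build_icl_prompt(X[sel].round(2).tolist(), ["f1", "f2", "f3"])
```

The resulting `prompt` would be sent to an LLM; the response would then be parsed back into a table row before entering the feedback stage.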
Key facts
- RDDG framework synthesizes relational data for imbalanced classification
- Uses LLMs with in-context learning for structured tabular data generation
- Incorporates dynamic feedback mechanism for continuous optimization
- Employs core set selection to identify representative samples
- Utilizes progressive chain-of-thought steps in synthesis process
- Addresses data scarcity problems for rare classes in real-world applications
- Documented in arXiv preprint 2604.16817v1 as a cross-listed announcement
- Focuses on discovering inherent patterns and correlations among attributes
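The dynamic feedback mechanism listed above can be sketched as an accept/reject loop in which low-quality synthetic rows are fed back as guidance for later generations. This is a hedged sketch only: the `feedback_loop` name, the threshold test, and the use of rejected rows as negative guidance are assumptions, not RDDG's published scoring or guidance signals.

```python
def feedback_loop(generate, score, rounds=5, threshold=0.5):
    """Accept a synthetic row only when its quality score passes a
    threshold; otherwise keep it as guidance so the generator can
    steer away from it next round (illustrative, not the paper's
    actual mechanism)."""
    accepted, guidance = [], []
    for _ in range(rounds):
        row = generate(guidance)   # e.g., an LLM call conditioned on guidance
        if score(row) >= threshold:
            accepted.append(row)
        else:
            guidance.append(row)   # rejected row becomes feedback
    return accepted, guidance

# Deterministic stand-ins for the LLM and the quality scorer:
generate = lambda guidance: len(guidance)   # improves as guidance accrues
score = lambda row: row / 4                 # toy quality score in [0, 1]
accepted, guidance = feedback_loop(generate, score)
```

In a real pipeline, `score` might be a downstream classifier's confidence or a distributional similarity measure, and `guidance` would be folded back into the in-context prompt.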
Entities
Institutions
- arXiv