Structure-BiEval: Self-Supervised Framework for LLM Structural Fidelity in Web Systems
Structure-BiEval is a self-supervised framework for assessing the structural fidelity of Large Language Models (LLMs) used in Web-based autonomous agents and Web Information Systems. It addresses the difficulty of measuring how well LLMs convert natural language into structured formats suitable for Web API calls and data interchange: conventional text metrics do not capture topological consistency in semi-structured Web data, and manual assessment is expensive. Structure-BiEval uses deterministic Intermediate Representations to separate structure from content, and it scores outputs with two metrics, Content Semantic Accuracy and Normalized Tree Edit Distance. The framework was benchmarked on 15 state-of-the-art LLMs, demonstrating annotation-free, quantitative evaluation aimed at Web data engineering. The work is available as arXiv:2601.19923.
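To make the idea of separating structure from content concrete, the following is a minimal sketch and not the paper's implementation. It assumes JSON as the structured format, a hypothetical `skeleton` helper that drops leaf values and ignores key order, and a simplified top-down (Selkow-style) tree edit distance with unit node costs, normalized by the combined size of both trees. The summary does not define the paper's Intermediate Representation or its exact normalization, so all of these choices are illustrative assumptions.

```python
import json


def skeleton(value):
    """Map a parsed JSON value to a content-free tree: (label, children).

    Hypothetical intermediate representation: leaf values are discarded and
    object keys are sorted, so only the shape of the data remains.
    """
    if isinstance(value, dict):
        return ("object", [("key:" + k, [skeleton(v)]) for k, v in sorted(value.items())])
    if isinstance(value, list):
        return ("array", [skeleton(v) for v in value])
    return ("leaf", [])


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


def tree_distance(a, b):
    """Top-down (Selkow-style) edit distance with unit cost per node.

    Children are aligned in order; deleting or inserting a child costs the
    size of its whole subtree. This is a simplification, not general
    tree edit distance.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j]: cost of turning the first i children of a into the first j of b
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),  # delete subtree from a
                dp[i][j - 1] + size(ys[j - 1]),  # insert subtree from b
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[len(xs)][len(ys)]


def normalized_distance(a, b):
    """Distance divided by the combined node count (assumed normalization)."""
    return tree_distance(a, b) / (size(a) + size(b))


# Same keys, different leaf values, one array element missing in the candidate.
reference = json.loads('{"user": {"id": 1, "tags": ["a", "b"]}}')
candidate = json.loads('{"user": {"id": 2, "tags": ["x"]}}')

ref_tree, cand_tree = skeleton(reference), skeleton(candidate)
print(tree_distance(ref_tree, cand_tree))
print(normalized_distance(ref_tree, cand_tree))
```

In this sketch the differing leaf values (`1` versus `2`, `"a"` versus `"x"`) do not contribute to the score; only the missing array element does. Judging content correctness would be a separate step, which is the role the summary assigns to Content Semantic Accuracy.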
Key facts
- Structure-BiEval is a self-supervised framework for evaluating LLM structural fidelity.
- It targets Web-based autonomous agents and Web Information Systems.
- The framework decouples structure from content using deterministic Intermediate Representations.
- Metrics used: Content Semantic Accuracy and Normalized Tree Edit Distance.
- Benchmarked on 15 state-of-the-art LLMs.
- Addresses limitations of traditional text metrics and manual evaluation.
- Evaluates conversion of natural language into structured formats for Web API invocation and data exchange.
- Published on arXiv with ID 2601.19923.
Entities
Institutions
- arXiv