Knowledge Graphs from Sparse Autoencoder Features

ai-technology · 2026-04-29

A new method extracts domain-specific knowledge graphs from sparse autoencoder features in language models. The approach filters millions of features using contrastive activations, then builds co-occurrence and transcoder-based graphs with automated edge labeling. A case study on a biology textbook demonstrates the technique.
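The contrastive filtering step can be pictured with a small sketch. This is an illustration only, not the paper's implementation: the function name `contrastive_filter`, the two thresholds, and the toy activation values are all assumptions. It keeps a feature only if it activates more strongly on domain text than on generic text and also fires on a reasonable share of domain tokens.

```python
def contrastive_filter(domain_acts, generic_acts, min_gap=0.5, min_rate=0.5):
    """Return indices of features that look domain-specific.

    domain_acts, generic_acts: lists of rows (one row per token), each row a
    list of non-negative feature activations. Both inputs, and both
    thresholds, are hypothetical choices made for this sketch.
    """
    n_features = len(domain_acts[0])
    kept = []
    for j in range(n_features):
        d_col = [row[j] for row in domain_acts]
        g_col = [row[j] for row in generic_acts]
        # Stage 1: contrastive score, mean domain activation minus mean generic activation.
        gap = sum(d_col) / len(d_col) - sum(g_col) / len(g_col)
        # Stage 2: fraction of domain tokens on which the feature fires at all.
        rate = sum(1 for v in d_col if v > 0) / len(d_col)
        if gap >= min_gap and rate >= min_rate:
            kept.append(j)
    return kept


# Toy data: 4 tokens x 3 features (made-up numbers).
domain = [
    [2.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 4.0],
]
generic = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
selected = contrastive_filter(domain, generic)
```

In this toy input, feature 0 passes both stages, feature 1 is dropped because it fires equally on generic text, and feature 2 is dropped because it fires on only one domain token.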

Key facts

Sparse autoencoders extract millions of interpretable features from language models.
Among those features, domain concepts are mixed with generic and weakly grounded ones.
Contrastive activations and multi-stage filtering construct a domain-specific concept universe.
Two aligned graph views are built: a co-occurrence graph and a transcoder-based mechanism graph.
Automated edge labeling turns graph views into readable knowledge graphs.
A case study was conducted on a biology textbook.
The method addresses the scattering of related ideas across many units.
The approach organizes conceptual structure at multiple levels of granularity.
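As a rough illustration of the co-occurrence view named in the list above, the sketch below counts how often pairs of concepts are active in the same passage and keeps the pairs that recur. It is a minimal sketch under stated assumptions: the function name `cooccurrence_graph`, the `min_count` threshold, and the concept labels are hypothetical, and the source does not describe how edge weights are actually computed. The transcoder-based mechanism graph and the edge labeling step are not covered here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(passages, min_count=2):
    """Build an undirected, weighted co-occurrence graph.

    passages: iterable of sets, each holding the concept labels active in one
    passage (hypothetical input). Returns a dict that maps a sorted pair of
    labels to the number of passages in which both appear.
    """
    counts = Counter()
    for active in passages:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(active)), 2):
            counts[(a, b)] += 1
    # Drop pairs that co-occur too rarely to count as an edge.
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up concept labels, loosely in the spirit of a biology textbook.
passages = [
    {"cell membrane", "osmosis"},
    {"cell membrane", "osmosis", "diffusion"},
    {"diffusion", "mitosis"},
]
edges = cooccurrence_graph(passages)
```

With these three passages, only the pair ("cell membrane", "osmosis") appears together twice, so it is the only edge that survives the threshold.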

Entities

—

Sources

arXiv cs.AI — 2026-04-28