ARTFEED — Contemporary Art Intelligence

Knowledge Graphs from Sparse Autoencoder Features

ai-technology · 2026-04-29

A new method extracts domain-specific knowledge graphs from sparse autoencoder features in language models. The approach filters millions of features using contrastive activations, then builds co-occurrence and transcoder-based graphs with automated edge labeling. A case study on a biology textbook demonstrates the technique.
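The contrastive filtering step can be pictured with a small sketch. It assumes a simple rule that the article does not specify: keep a feature when its mean activation on domain text exceeds its mean activation on generic text by a margin. The function name `contrastive_filter`, the `min_gap` threshold, and the toy activation values are all hypothetical, and the actual method applies further filtering stages not shown here.

```python
from statistics import mean


def contrastive_filter(domain_acts, generic_acts, min_gap=0.5):
    """Return feature ids that fire more strongly on domain text.

    domain_acts / generic_acts map a feature id to a list of activation
    values observed on domain and generic text (hypothetical inputs).
    """
    kept = []
    for feature_id, domain_values in domain_acts.items():
        # A feature never seen on generic text is treated as inactive there.
        generic_values = generic_acts.get(feature_id, [0.0])
        gap = mean(domain_values) - mean(generic_values)
        if gap >= min_gap:
            kept.append(feature_id)
    return sorted(kept)


# Toy activations, invented for illustration only.
domain_acts = {
    "mitochondria": [0.9, 0.8, 1.0],
    "punctuation": [0.7, 0.6, 0.7],
    "enzyme": [0.6, 0.9, 0.75],
}
generic_acts = {
    "mitochondria": [0.0, 0.1, 0.0],
    "punctuation": [0.7, 0.65, 0.7],
    "enzyme": [0.1, 0.0, 0.05],
}

domain_features = contrastive_filter(domain_acts, generic_acts)
print(domain_features)
```

In this toy data the generic "punctuation" feature is dropped because it is about equally active on both kinds of text, while the two biology features pass the margin.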

Key facts

  • Sparse autoencoders extract millions of interpretable features from language models.
  • In the raw feature set, domain concepts are mixed with generic and weakly grounded features.
  • Contrastive activations and multi-stage filtering construct a domain-specific concept universe.
  • Two aligned graph views are built: a co-occurrence graph and a transcoder-based mechanism graph.
  • Automated edge labeling turns graph views into readable knowledge graphs.
  • A case study was conducted on a biology textbook.
  • The method addresses the scattering of related ideas across many units.
  • The approach organizes conceptual structure at multiple levels of granularity.
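As a rough illustration of the co-occurrence view, the sketch below counts how often two features are active in the same passage and keeps pairs that meet a minimum count. This is an assumption about how such a graph could be built, not the method's documented procedure; `cooccurrence_graph`, `min_count`, and the toy passages are hypothetical. The transcoder-based mechanism graph and the automated edge labeling are not covered.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(passages, min_count=2):
    """Build weighted undirected edges from co-active features.

    passages is a list of sets, each holding the feature labels active in
    one passage. Returns {(a, b): count} with a < b alphabetically, keeping
    only pairs seen together in at least min_count passages.
    """
    counts = Counter()
    for active in passages:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(active), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy passages, invented for illustration only.
passages = [
    {"cell", "membrane", "protein"},
    {"cell", "membrane"},
    {"protein", "enzyme"},
    {"enzyme", "protein", "cell"},
]

edges = cooccurrence_graph(passages)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Raising `min_count` prunes weak edges at the cost of dropping rarer but possibly meaningful links, so the threshold would need tuning per corpus.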
