ARTFEED — Contemporary Art Intelligence

CodeDistiller: Automated Code Library Generation for Scientific AI

other · 2026-05-18

CodeDistiller is a system that automatically extracts and vets working code examples from scientific GitHub repositories for use by Automated Scientific Discovery (ASD) agents. Current ASD systems are limited to manually written examples or to what the underlying model already knows (its parametric knowledge). In an evaluation on 250 materials science repositories, the best model produced functional examples for 74% of repositories. In downstream tests, ASD agents augmented with CodeDistiller-generated libraries produced more effective experiments. The system reduces manual effort and expands the scope of automated scientific coding.

Key facts

  • CodeDistiller automatically distills scientific GitHub repositories into vetted code libraries.
  • Current ASD systems are limited to manually written examples or the model's parametric knowledge.
  • In an evaluation on 250 materials science repositories, the best model produced functional examples for 74% of repositories.
  • Downstream evaluation shows improved ASD agent performance with CodeDistiller libraries.
  • The system reduces manual effort in creating domain-specific code examples.
  • CodeDistiller targets Automated Scientific Discovery (ASD) systems.
  • The evaluation combines automatic checks with domain-expert review.
  • CodeDistiller expands ASD capabilities without manual curation.
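
The summary does not say how CodeDistiller vets its examples, so the following is only a minimal sketch of one plausible approach: run each candidate snippet in a separate process and keep those that exit cleanly. The names `Candidate`, `runs_cleanly`, `build_library`, and `coverage` are hypothetical and are not taken from CodeDistiller itself.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Candidate:
    """Hypothetical record for one extracted example (not CodeDistiller's schema)."""
    repo: str    # repository identifier the example was drawn from
    source: str  # the candidate example code


def runs_cleanly(source: str, timeout_s: float = 30.0) -> bool:
    """Assumed vetting rule: the snippet must exit with status 0 before the timeout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "example.py"
        script.write_text(source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0


def build_library(candidates: list[Candidate]) -> dict[str, list[str]]:
    """Keep only the candidates that pass vetting, grouped by repository."""
    library: dict[str, list[str]] = {}
    for candidate in candidates:
        if runs_cleanly(candidate.source):
            library.setdefault(candidate.repo, []).append(candidate.source)
    return library


def coverage(library: dict[str, list[str]], repos: list[str]) -> float:
    """Fraction of repositories with at least one vetted example."""
    return sum(1 for repo in repos if library.get(repo)) / len(repos)
```

Under this sketch, the reported result corresponds to a `coverage` value of 0.74 over 250 repositories, which is 185 repositories with at least one functional example. A real pipeline would also need sandboxing and dependency installation, which are omitted here.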
