Language Models Tested on Philosophical Counterexample Generation
A recent study published on arXiv (2605.03936) examines whether language models can perform conceptual analysis through iterated counterexample-and-repair cycles. In the setup, one model generates counterexamples to a candidate definition while a second model revises the definition in response, repeated across 20 concepts for thousands of cycles. The study finds that many of the generated counterexamples are judged invalid by both expert humans and a language-model judge, but the LM judge accepts roughly twice as many as the human evaluators do. Agreement on validity is only moderate, both among the humans themselves and between humans and the LM judge. Longer iteration yields increasingly elaborate definitions without improving accuracy, and some concepts never converge on a stable definition. The authors conclude that language models can engage in this kind of philosophical work, though with clear limitations.
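The counterexample-and-repair loop described above can be sketched as a simple control flow. This is a minimal illustration, not the authors' implementation: the functions `generate_counterexample`, `judge_counterexample`, and `repair_definition` are hypothetical stand-ins for the language-model calls, stubbed here so the loop is runnable.

```python
def generate_counterexample(definition: str) -> str:
    # Stub: a real system would prompt a language model with the definition
    # and ask for a case the definition wrongly includes or excludes.
    return f"a case not covered by: {definition}"


def judge_counterexample(definition: str, counterexample: str) -> bool:
    # Stub: a real system would ask a judge (LM or human expert) whether
    # the counterexample is valid. Here we use a trivial placeholder rule.
    return len(counterexample) > len(definition)


def repair_definition(definition: str, counterexample: str) -> str:
    # Stub: a real system would prompt a second model to patch the
    # definition so it handles the counterexample.
    return f"{definition} (amended for: {counterexample})"


def analyze(concept: str, definition: str, max_cycles: int = 3) -> str:
    """Iterate counterexample -> judge -> repair until the budget is spent
    or no valid counterexample is found."""
    for _ in range(max_cycles):
        cx = generate_counterexample(definition)
        if not judge_counterexample(definition, cx):
            break  # no valid counterexample; the definition stands
        definition = repair_definition(definition, cx)
    return definition


if __name__ == "__main__":
    print(analyze("knowledge", "justified true belief"))
```

Note how the loop mirrors the study's finding: each repair makes the definition longer, so unbounded iteration grows verbosity without any guarantee of greater accuracy.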
Key facts
- Study on arXiv: 2605.03936
- 20 concepts tested
- Thousands of counterexample-repair cycles
- LM judge accepts twice as many counterexamples as humans
- Moderate consistency between human and LM judgments
- Extended iteration increases verbosity without accuracy gain
- Some concepts resist stable definitions
- LMs can engage in conceptual analysis, within limits