Language Models Tested on Philosophical Counterexample Generation
A recent study published on arXiv (2605.03936) examines whether language models can perform conceptual analysis through iterated counterexample-and-repair cycles. In the setup, one model generates counterexamples to a candidate definition while a second model revises the definition in response, repeated across 20 concepts for thousands of cycles. The study finds that many of the generated counterexamples are judged invalid by both expert humans and a language-model judge, but the LM judge accepts roughly twice as many as the human evaluators do. Agreement on validity is only moderate, both among the humans themselves and between humans and the LM judge. Longer iteration yields increasingly elaborate definitions without improving accuracy, and some concepts never converge on a stable definition. The authors conclude that language models can engage in this kind of philosophical work, though with clear limitations.
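The counterexample-and-repair loop described above can be sketched as a simple control flow. This is a minimal illustration, not the authors' implementation: the functions `generate_counterexample`, `judge_counterexample`, and `repair_definition` are hypothetical stand-ins for the language-model calls, stubbed here so the loop is runnable.

```python
def generate_counterexample(definition: str) -> str:
    # Stub: a real system would prompt a language model with the definition
    # and ask for a case the definition wrongly includes or excludes.
    return f"a case not covered by: {definition}"


def judge_counterexample(definition: str, counterexample: str) -> bool:
    # Stub: a real system would ask a judge (LM or human expert) whether
    # the counterexample is valid. Here we use a trivial placeholder rule.
    return len(counterexample) > len(definition)


def repair_definition(definition: str, counterexample: str) -> str:
    # Stub: a real system would prompt a second model to patch the
    # definition so it handles the counterexample.
    return f"{definition} (amended for: {counterexample})"


def analyze(concept: str, definition: str, max_cycles: int = 3) -> str:
    """Iterate counterexample -> judge -> repair until the budget is spent
    or no valid counterexample is found."""
    for _ in range(max_cycles):
        cx = generate_counterexample(definition)
        if not judge_counterexample(definition, cx):
            break  # no valid counterexample; the definition stands
        definition = repair_definition(definition, cx)
    return definition


if __name__ == "__main__":
    print(analyze("knowledge", "justified true belief"))
```

Note how the loop mirrors the study's finding: each repair makes the definition longer, so unbounded iteration grows verbosity without any guarantee of greater accuracy.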
Key facts
- Study on arXiv: 2605.03936
- 20 concepts tested
- Thousands of counterexample-repair cycles
- LM judge accepts twice as many counterexamples as humans
- Moderate consistency between human and LM judgments
- Extended iteration increases verbosity without accuracy gain
- Some concepts resist stable definitions
- LMs can engage in conceptual analysis, within limits