AI Model Predicts Technological Combinations from Patent Language
A new study on arXiv introduces TechToken, a transformer-based model that predicts first-time technological combinations by analyzing the collective language of patents. The model treats International Patent Classification (IPC) codes as words, learning a "language of technologies" through learned embeddings. The researchers find that forthcoming combinations leave detectable signals in patent language decades in advance, emerging as a collective shift across thousands of patents rather than from any single inventor. TechToken outperforms state-of-the-art models in general representation quality and accurately predicts novel combinations, addressing a fundamental challenge in forecasting innovation for science and policy.
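The summary does not include TechToken's implementation details, so the sketch below is only a minimal, hypothetical illustration of the core idea: each patent's set of IPC codes is treated as a "sentence," the codes are embedded as tokens, and a small transformer encoder produces contextual representations from which candidate code pairs can be scored. The class name, architecture, scoring rule, and hyperparameters are illustrative assumptions, not the paper's.

```python
# Hypothetical sketch of "IPC codes as words" -- not the paper's actual model.
import torch
import torch.nn as nn

class PatentCodeModel(nn.Module):
    """Embeds sequences of IPC codes and scores candidate code pairs."""
    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, codes: torch.Tensor) -> torch.Tensor:
        # codes: (batch, seq_len) integer IDs of the IPC codes on each patent
        return self.encoder(self.embed(codes))

    def pair_score(self, code_a: int, code_b: int) -> float:
        # Cosine similarity between code embeddings as a crude proxy for how
        # likely two codes are to co-occur on a future patent (assumption;
        # the paper's scoring method may differ).
        ea, eb = self.embed.weight[code_a], self.embed.weight[code_b]
        return torch.cosine_similarity(ea, eb, dim=0).item()

# Toy vocabulary: each IPC code gets an integer ID.
vocab = {"A61K": 0, "G06N": 1, "H01L": 2, "C07D": 3}
model = PatentCodeModel(vocab_size=len(vocab))

# A "sentence" is the set of IPC codes assigned to one patent.
patent = torch.tensor([[vocab["A61K"], vocab["C07D"]]])
contextual = model(patent)                             # contextual code embeddings
print(contextual.shape)                                # (1, 2, 64)
print(model.pair_score(vocab["G06N"], vocab["A61K"]))  # untrained pair score
```

Under this framing, predicting a first-time combination reduces to scoring code pairs that have never co-occurred and flagging those whose embeddings have drifted close together across the patent corpus.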
Key facts
- TechToken is a transformer-based model for predicting technological combinations.
- It uses International Patent Classification (IPC) codes as its vocabulary.
- Predictive signals are detectable decades before a combination occurs.
- Signals emerge collectively across thousands of patents.
- The model outperforms state-of-the-art models in representation quality.
- The study is published on arXiv with ID 2605.04875.
- The research focuses on forecasting innovation for science and policy.
- The approach treats technologies as words in a language model.
Entities
Institutions
- arXiv