Chronicle: A Multimodal Foundation Model for Joint Language and Time Series Understanding
Chronicle is a 324M-parameter decoder-only transformer trained from scratch on both natural language and time series data. Earlier multimodal models adapt a pretrained language model post hoc; Chronicle instead passes both modalities through the same transformer blocks, attention mechanism, and residual stream. Pretraining uses primarily unimodal batches, and cross-modal capability emerges from the shared parameters. The model is evaluated against both unimodal and multimodal baselines, addressing a gap in prior work, which compared only against other multimodal models. The paper is available on arXiv as 2605.20268.
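To make the shared-backbone idea concrete, here is a minimal NumPy sketch, not Chronicle's actual implementation. It assumes that text enters through a token embedding table and time series through a linear projection of fixed-length patches; the summary does not say how either modality is embedded, so both adapters are assumptions. All sizes and names (`D_MODEL`, `VOCAB`, `PATCH`, `run_shared_stack`) are hypothetical, and the blocks use single-head attention for brevity. The point illustrated is only that one set of block weights processes both input types.

```python
import numpy as np

D_MODEL = 16  # hypothetical width, far smaller than the real model
VOCAB = 50    # hypothetical text vocabulary size
PATCH = 4     # hypothetical number of time steps per time-series patch

rng = np.random.default_rng(0)

def init(shape):
    """Small random weights, just enough to run the sketch."""
    return rng.normal(0.0, 0.02, shape)

# Modality-specific input adapters (an assumption of this sketch).
token_embedding = init((VOCAB, D_MODEL))
patch_projection = init((PATCH, D_MODEL))

# One stack of block weights, shared by both modalities.
shared_blocks = [
    {
        "wq": init((D_MODEL, D_MODEL)),
        "wk": init((D_MODEL, D_MODEL)),
        "wv": init((D_MODEL, D_MODEL)),
        "wo": init((D_MODEL, D_MODEL)),
        "w1": init((D_MODEL, 4 * D_MODEL)),
        "w2": init((4 * D_MODEL, D_MODEL)),
    }
    for _ in range(2)
]

def layer_norm(x):
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + 1e-5)

def causal_self_attention(x, p):
    """Single-head attention where position i attends only to positions <= i."""
    q, k, v = x @ p["wq"], x @ p["wk"], x @ p["wv"]
    scores = q @ k.T / np.sqrt(x.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return (weights @ v) @ p["wo"]

def block(x, p):
    """Pre-norm transformer block writing into a single residual stream."""
    x = x + causal_self_attention(layer_norm(x), p)
    hidden = np.maximum(layer_norm(x) @ p["w1"], 0.0)  # ReLU MLP
    return x + hidden @ p["w2"]

def embed_text(token_ids):
    return token_embedding[np.asarray(token_ids)]

def embed_series(values):
    # Split the series into non-overlapping patches, then project each patch.
    patches = np.asarray(values, dtype=float).reshape(-1, PATCH)
    return patches @ patch_projection

def run_shared_stack(x):
    for p in shared_blocks:
        x = block(x, p)
    return x

# The same weights process a text sequence and a time-series sequence.
text_hidden = run_shared_stack(embed_text([3, 14, 15, 9]))
series_hidden = run_shared_stack(embed_series(np.sin(np.arange(12))))
print(text_hidden.shape, series_hidden.shape)
```

Only the two input adapters differ by modality in this sketch; every gradient step on either kind of batch would update the same block weights, which is the mechanism the summary credits for cross-modal capability.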
Key facts
- Chronicle is a 324M-parameter decoder-only transformer.
- It is trained from scratch on natural language and time series.
- Both modalities share the same transformer blocks, attention mechanism, and residual stream.
- Pretraining primarily uses unimodal batches.
- Cross-modal capability emerges from shared parameters.
- The model is evaluated against unimodal and multimodal baselines.
- Prior models adapted pretrained language models post hoc.
- The paper is on arXiv: 2605.20268.
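The pretraining bullet above ("primarily unimodal batches") can be pictured with a small batch-type scheduler. This is a hypothetical sketch: the summary gives no mixing ratios, so the probabilities below, the existence of a "mixed" batch type, and the function name `sample_batch_kinds` are all assumptions chosen only for illustration.

```python
import random
from collections import Counter

def sample_batch_kinds(num_steps, p_text=0.45, p_series=0.45, seed=0):
    """Draw one batch type per training step.

    p_text and p_series are assumed values, not figures from the paper.
    The leftover probability mass goes to hypothetical mixed-modality batches.
    """
    rng = random.Random(seed)
    kinds = ["text", "series", "mixed"]
    weights = [p_text, p_series, 1.0 - p_text - p_series]
    return rng.choices(kinds, weights=weights, k=num_steps)

schedule = sample_batch_kinds(1000)
counts = Counter(schedule)
unimodal_share = (counts["text"] + counts["series"]) / len(schedule)
print(counts, unimodal_share)
```

Under these assumed weights most steps see only one modality, so any cross-modal transfer has to come through the parameters that both batch types update.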
Entities
Institutions
- arXiv