VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones
This repository hosts VisionTS++, a state-of-the-art time series foundation model built by continually pre-training a visual masked autoencoder (MAE) on large-scale time series data. It excels at multivariate and probabilistic time series forecasting by bridging the modality gap between vision and time series data.
The model was introduced in the paper: VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones
Official GitHub repository: https://github.com/HALF111/VisionTSpp
Experience VisionTS++ directly in your browser on the Hugging Face Space! You can upload your own time series CSV file for zero-shot forecasting.
About
VisionTS++ is built upon continual pre-training of a vision model on large-scale time series, addressing key discrepancies in cross-modal transfer from vision to time series. It introduces three key innovations:
- Vision-model-based filtering: Identifies high-quality sequences to stabilize pre-training and mitigate the data-modality gap.
- Colorized multivariate conversion: Encodes multivariate series as multi-subfigure RGB images to enhance cross-variate modeling.
- Multi-quantile forecasting: Uses parallel reconstruction heads to generate quantile forecasts for probabilistic predictions without parametric assumptions.
These innovations allow VisionTS++ to achieve state-of-the-art performance in both in-distribution and out-of-distribution forecasting, demonstrating that vision models can generalize effectively to time series forecasting with appropriate adaptation.
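To make the colorized multivariate conversion concrete, here is an illustrative sketch: each variate is segmented by its period into a 2D subfigure, and the subfigures are tiled into a single image so the vision backbone can attend across variates. This is a simplified grayscale stand-in for the paper's RGB encoding; the function names, min-max scaling, and vertical tiling layout are assumptions for illustration, not the official implementation.

```python
# Illustrative sketch of the multivariate-to-image conversion (hypothetical
# helper names; not the official VisionTS++ code).
import numpy as np

def series_to_subfigure(x: np.ndarray, period: int) -> np.ndarray:
    """Segment a univariate series of length n_periods * period into a
    [n_periods, period] 2D array, min-max scaled to [0, 1]."""
    img = x.reshape(-1, period)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)

def multivariate_to_image(X: np.ndarray, period: int) -> np.ndarray:
    """Tile one subfigure per variate vertically, so the vision backbone
    sees all variates in one image and can model cross-variate structure.
    X has shape [time, n_vars]."""
    subfigs = [series_to_subfigure(X[:, v], period) for v in range(X.shape[1])]
    return np.concatenate(subfigs, axis=0)

# Toy example: 3 variates, 96 time steps, period 24 -> image of shape (12, 24).
X = np.random.randn(96, 3)
img = multivariate_to_image(X, period=24)
print(img.shape)  # (12, 24)
```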


Installation
The VisionTS++ model is available through the visionts package on PyPI.
First, install the package:
pip install visionts
If you want to develop or modify the inference code, you can instead build from source:
git clone https://github.com/HALF111/VisionTSpp.git
cd VisionTSpp
pip install -e .
For detailed inference examples and usage, with clear visualizations of the image reconstruction, please refer to the demo.ipynb notebook in the official GitHub repository.
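As a quick orientation before opening the notebook, below is a minimal zero-shot forecasting sketch. The VisionTS class name, constructor arguments, and update_config/forward signatures follow the original visionts package and are assumptions here; consult demo.ipynb for the authoritative VisionTS++ usage.

```python
# Minimal zero-shot forecasting sketch (API names assumed from the
# original visionts package; see demo.ipynb for authoritative usage).
import torch
from visionts import VisionTS  # assumed entry point of the pip package

# Configure context/horizon; periodicity controls how the context is
# segmented into image rows before the MAE backbone reconstructs it.
context_len, pred_len, periodicity = 96, 24, 24
model = VisionTS("mae_base", ckpt_dir="./ckpt")  # checkpoint location (assumed)
model.update_config(context_len=context_len, pred_len=pred_len,
                    periodicity=periodicity)

x = torch.randn(1, context_len, 2)  # toy input: [batch, context, n_vars]
with torch.no_grad():
    forecast = model(x)             # expected output: [batch, pred_len, n_vars]
```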
Citation
If you're using VisionTS++ or VisionTS in your research or applications, please cite them using this BibTeX:
@misc{chen2024visionts,
      title={VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters},
      author={Mouxiang Chen and Lefei Shen and Zhuo Li and Xiaoyun Joy Wang and Jianling Sun and Chenghao Liu},
      year={2024},
      eprint={2408.17253},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2408.17253},
}

@misc{shen2025visiontspp,
      title={VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones},
      author={Lefei Shen and Mouxiang Chen and Xu Liu and Han Fu and Xiaoxue Ren and Jianling Sun and Zhuo Li and Chenghao Liu},
      year={2025},
      eprint={2508.04379},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.04379},
}