VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones

This repository hosts the VisionTS++ model, a state-of-the-art time series foundation model built by continually pre-training a visual Masked Autoencoder (MAE) on large-scale time series data. It excels at multivariate and probabilistic time series forecasting by bridging the modality gap between vision and time series data.

The model was introduced in the paper: VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones

Official GitHub repository: https://github.com/HALF111/VisionTSpp

Experience VisionTS++ directly in your browser on the Hugging Face Space! You can upload your own custom time series CSV file for zero-shot forecasting.

About

VisionTS++ is built by continually pre-training a vision model on large-scale time series data, addressing key discrepancies in cross-modal transfer from vision to time series. It introduces three key innovations:

  1. Vision-model-based filtering: Identifies high-quality sequences to stabilize pre-training and mitigate the data-modality gap.
  2. Colorized multivariate conversion: Encodes multivariate series as multi-subfigure RGB images to enhance cross-variate modeling (a simplified conversion sketch follows this list).
  3. Multi-quantile forecasting: Uses parallel reconstruction heads to generate quantile forecasts for probabilistic predictions without parametric assumptions (a minimal loss sketch appears at the end of this section).
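
To make the colorized multivariate conversion concrete, the sketch below folds each variate into a 2D subfigure by an assumed seasonality, tints it with its own color, and stacks the subfigures into one RGB image. The layout, palette, and normalization here are simplified assumptions for illustration, not the paper's exact conversion:

```python
import numpy as np

def series_to_image(series: np.ndarray, period: int) -> np.ndarray:
    """Render a multivariate series as a colorized multi-subfigure RGB image.

    series: [n_vars, length]; period: assumed seasonality used to fold each
    variate into a 2D subfigure. All layout and coloring details here are
    simplified assumptions for illustration.
    """
    n_vars, length = series.shape
    n_cols = length // period
    palette = np.eye(3)[np.arange(n_vars) % 3]  # toy palette: cycle R, G, B
    panels = []
    for var, color in zip(series, palette):
        lo, hi = var.min(), var.max()
        gray = (var - lo) / (hi - lo + 1e-8)     # min-max normalize to [0, 1]
        # Fold the 1D series into a (period x n_cols) grayscale subfigure.
        panel = gray[: period * n_cols].reshape(n_cols, period).T
        # Tint the subfigure with this variate's color: [3, period, n_cols].
        panels.append(panel[None, :, :] * color[:, None, None])
    # Stack subfigures vertically, one panel per variate.
    return np.concatenate(panels, axis=1)        # [3, n_vars*period, n_cols]

img = series_to_image(np.random.randn(4, 96), period=24)
print(img.shape)  # (3, 96, 4)
```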

These innovations allow VisionTS++ to achieve state-of-the-art performance in both in-distribution and out-of-distribution forecasting, demonstrating that vision models can effectively generalize to time series forecasting with appropriate adaptation.
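
Multi-quantile heads like those in innovation 3 are commonly trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically per quantile level. The following is a minimal sketch of that loss, assuming one forecast per quantile head; the paper's actual training objective may differ:

```python
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, quantiles) -> torch.Tensor:
    """Pinball (quantile) loss for multi-quantile forecasts.

    pred:      [batch, n_quantiles, horizon], one forecast per quantile head.
    target:    [batch, horizon], ground-truth future values.
    quantiles: levels such as (0.1, 0.5, 0.9).
    """
    q = torch.as_tensor(quantiles, dtype=pred.dtype).view(1, -1, 1)
    err = target.unsqueeze(1) - pred          # positive where we under-predict
    # Under-prediction costs q * err; over-prediction costs (1 - q) * |err|.
    return torch.maximum(q * err, (q - 1) * err).mean()

pred = torch.randn(8, 3, 24)                  # 3 quantile heads, 24-step horizon
target = torch.randn(8, 24)
print(pinball_loss(pred, target, (0.1, 0.5, 0.9)))
```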

Installation

The VisionTS++ model is available through the visionts package on PyPI.

First, install the package:

```bash
pip install visionts
```

If you want to develop the inference code, you can instead install from source:

```bash
git clone https://github.com/HALF111/VisionTSpp.git
cd VisionTSpp
pip install -e .
```

For detailed inference examples, including visualizations of the image reconstruction, please refer to the demo.ipynb notebook in the official GitHub repository.
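
As a rough orientation, a zero-shot forecasting call might look like the sketch below. The class name, constructor arguments, and forward signature are assumptions modeled on the original VisionTS interface, so treat demo.ipynb as the authoritative reference:

```python
import torch
from visionts import VisionTS  # class name is an assumption; see demo.ipynb

# Hypothetical zero-shot forecasting flow; argument names are assumptions.
model = VisionTS(arch="mae_base", ckpt_dir="./ckpt")
model.update_config(context_len=336, pred_len=96, periodicity=24)

x = torch.randn(1, 336, 7)   # [batch, context_len, n_vars] toy input
with torch.no_grad():
    y_hat = model(x)         # expected shape: [batch, pred_len, n_vars]
print(y_hat.shape)
```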

Citation

If you use VisionTS++ or VisionTS in your research or applications, please cite them with the following BibTeX:

```bibtex
@misc{chen2024visionts,
      title={VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters},
      author={Mouxiang Chen and Lefei Shen and Zhuo Li and Xiaoyun Joy Wang and Jianling Sun and Chenghao Liu},
      year={2024},
      eprint={2408.17253},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2408.17253},
}

@misc{shen2025visiontspp,
      title={VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Visual Backbones},
      author={Lefei Shen and Mouxiang Chen and Xu Liu and Han Fu and Xiaoxue Ren and Jianling Sun and Zhuo Li and Chenghao Liu},
      year={2025},
      eprint={2508.04379},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.04379},
}
```