This is the codebase for the paper: TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [Paper] [Slides]
TimeSiam pre-trains Siamese encoders to capture temporal correlations between past and current subseries. It benefits from diverse masking-augmented subseries and learns time-dependent representations through past-to-current reconstruction. Lineage embeddings are introduced to further foster the learning of diverse temporal correlations.
- In the spirit of learning temporal correlations, we propose TimeSiam, which leverages Siamese networks to capture correlations among temporally distanced subseries.
- With Siamese encoders to reconstruct current masked subseries based on past observation and lineage embeddings to capture subseries disparity, TimeSiam can learn diverse time-dependent representations.
- TimeSiam achieves consistent state-of-the-art fine-tuning performance across thirteen standard benchmarks, excelling in various time series analysis tasks.
Figure 1. Overview of TimeSiam.
TimeSiam pre-training involves the following two modules: Siamese subseries sampling and Siamese modeling.
(1) Siamese Subseries Sampling
We construct Siamese subseries pairs by randomly sampling a past sample preceding the current sample in the same time series. Furthermore, we adopt a simple masking augmentation to generate augmented current subseries.
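Below is a minimal sketch of this sampling step, assuming the series is a (time, channels) NumPy array; the function and argument names (`sample_siamese_pair`, `max_distance`, `mask_ratio`) are illustrative and not the repo's actual API.

```python
import numpy as np

def sample_siamese_pair(series, seq_len, max_distance, mask_ratio=0.25, rng=None):
    """Sample a past subseries plus a masked copy of the current subseries.

    Assumes len(series) >= 2 * seq_len and max_distance >= seq_len.
    """
    rng = rng if rng is not None else np.random.default_rng()
    T = len(series)
    # Place the current window late enough that a full past window fits before it.
    cur_start = int(rng.integers(seq_len, T - seq_len + 1))
    current = series[cur_start:cur_start + seq_len]
    # The past window starts at most `max_distance` steps before the current one
    # and ends no later than the current window begins.
    past_start = int(rng.integers(max(0, cur_start - max_distance),
                                  cur_start - seq_len + 1))
    past = series[past_start:past_start + seq_len]
    # Simple masking augmentation: zero out a random fraction of time steps.
    mask = rng.random(seq_len) < mask_ratio
    masked_current = current.copy()
    masked_current[mask] = 0.0
    # The relative distance selects the lineage embedding during pre-training.
    return past, masked_current, current, cur_start - past_start
```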
(2) Siamese Modeling
Our Siamese sampling strategy natively derives a past-to-current reconstruction task, in which the masked current subseries is reconstructed based on the past observation.
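The sketch below illustrates this objective in PyTorch, assuming a generic Transformer encoder shared between the two branches and a single cross-attention decoder layer; the module names and the exact placement of lineage embeddings are simplifications for illustration, not the repo's code.

```python
import torch
import torch.nn as nn

class SiameseReconstruction(nn.Module):
    """Minimal sketch of Siamese past-to-current reconstruction.
    Module names and lineage-embedding placement are illustrative;
    the repo's actual encoder/decoder may differ."""

    def __init__(self, d_model=64, n_heads=4, n_lineages=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # shared weights
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, d_model)
        self.lineage = nn.Embedding(n_lineages, d_model)  # one per distance bucket

    def forward(self, past_tok, masked_cur_tok, lineage_id):
        # One encoder (Siamese, shared weights) processes both subseries;
        # the past branch is tagged with the lineage embedding of its distance.
        z_past = self.encoder(past_tok + self.lineage(lineage_id).unsqueeze(1))
        z_cur = self.encoder(masked_cur_tok)
        # Decoder: masked-current tokens query the past representations.
        dec, _ = self.cross_attn(query=z_cur, key=z_past, value=z_past)
        return self.head(dec + z_cur)

# Usage: reconstruct the unmasked current tokens with an MSE loss.
model = SiameseReconstruction()
past = torch.randn(8, 48, 64)           # (batch, length, d_model) past tokens
cur_masked = torch.randn(8, 48, 64)     # masking-augmented current tokens
cur_target = torch.randn(8, 48, 64)     # reconstruction target
lineage_id = torch.randint(0, 3, (8,))  # distance bucket of each pair
loss = nn.functional.mse_loss(model(past, cur_masked, lineage_id), cur_target)
```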
With the cooperation of lineage embeddings, TimeSiam can further derive two types of fine-tuning paradigms, covering both fixed- and extended-input series settings.
(1) Fixed-Input-Multiple-Lineages
TimeSiam innovatively pre-trains Siamese encoders with diverse lineage embeddings to capture temporal correlations at different distances, which allows TimeSiam to derive diverse representations with different lineages for the same input series.
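For instance, building on the hypothetical `SiameseReconstruction` sketch above, the fixed-input paradigm can encode the same input once per lineage embedding and aggregate the resulting diverse representations; averaging is one possible aggregation, used here purely for illustration.

```python
import torch

def multi_lineage_repr(model, x_tok, n_lineages=3):
    """Encode one fixed input under every lineage embedding and average.
    `model` is the (hypothetical) SiameseReconstruction sketch above."""
    reps = []
    for k in range(n_lineages):
        lid = torch.full((x_tok.size(0),), k, dtype=torch.long)
        # Same input, different lineage tag -> diverse representations.
        reps.append(model.encoder(x_tok + model.lineage(lid).unsqueeze(1)))
    return torch.stack(reps).mean(dim=0)  # (batch, length, d_model)
```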
(2) Extended-Input-Multiple-Lineages
TimeSiam can apply multiple lineage embeddings, pre-trained under pairs of different temporal distances, to different segments of an extended input, which natively preserves the temporal order of the segments. This advantage is achieved by associating each segment with its respective lineage embedding.
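A sketch of this paradigm, again building on the hypothetical `SiameseReconstruction` model above: each segment of the extended input is tagged with the lineage embedding matching its temporal distance before encoding.

```python
import torch

def extended_input_repr(model, segments):
    """Encode an extended input split into segments (oldest -> newest),
    each of shape (batch, length, d_model); assumes len(segments) does
    not exceed the model's n_lineages. `model` is the (hypothetical)
    SiameseReconstruction sketch above."""
    outs = []
    n = len(segments)
    for i, seg in enumerate(segments):
        # Older segments get the lineage ids of larger temporal distances,
        # so the temporal order of segments is preserved by construction.
        lid = torch.full((seg.size(0),), n - 1 - i, dtype=torch.long)
        outs.append(model.encoder(seg + model.lineage(lid).unsqueeze(1)))
    return torch.cat(outs, dim=1)  # concatenated along the time dimension
```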
- Install PyTorch and the necessary dependencies.
pip install -r requirements.txt
- The datasets can be obtained from Google Drive, Tsinghua Cloud, and TSLD.
- Experiment scripts can be found under the folder `./scripts`.
The reconstruction effect across various datasets with different data distributions is shown below.
Figure 2. Showcases of TimeSiam in reconstructing time series from different datasets with a 25% masking ratio.
We employ Principal Component Analysis (PCA) to visualize the distribution of temporal representations on the ECL dataset. When a time series is fed into the pre-trained Siamese network with different lineage embeddings, the model generates divergent temporal representations: representations derived from the same lineage embedding tend to be closely clustered together, while representations from different lineage embeddings exhibit significant dissimilarity.
Figure 3. Visualizing the effect of lineage embeddings on temporal representations. (a) Test set distribution under three types of lineage embeddings. (b) Test set distribution under six types of lineage embeddings.
If you find this repo useful, please cite our paper.
@inproceedings{dong2024timesiam,
title={TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling},
author={Dong, Jiaxiang and Wu, Haixu and Wang, Yuxuan and Qiu, Yunzhong and Zhang, Li and Wang, Jianmin and Long, Mingsheng},
booktitle={ICML},
year={2024}
}
If you have any questions, please contact [email protected].
We appreciate the following GitHub repositories for their valuable code bases and datasets:
Users need to request download permission on the official TDBrain website and process the raw data themselves.