This is the 2nd place solution to the Kaggle competition: https://www.kaggle.com/competitions/leap-atmospheric-physics-ai-climsim
This repo contains the code we used to train the models. Training can be very time-consuming, so we also provide trained model files for direct inference.
Our team has 5 members, and each member has their own environment and training/inference details.
- Common part
- ADAM's part
- FDZM's part
  1. Preprocessing
  2. ForcewithMe's part
  3. Joseph's part
  4. Max2020's part
  5. Zuiye's part
- Ensemble
- Download the Kaggle competition data into the `raw_data/kaggle-data` folder:
  - test.csv
  - sample_submission.csv
  - train.csv [if you only plan to run inference, this file can be skipped.]
- [If you only plan to run inference, this step can be skipped.] Download the data from https://huggingface.co/datasets/LEAP/ClimSim_low-res into the `raw_data/ClimSim_low-res` folder. The expected structure is `raw_data/ClimSim_low-res/train/0009-01/*.nc`.
- Attachment: Google Drive link
HARDWARE
- RAM: at least 360 GiB [if only running inference, 120 GiB is enough]
- GPU: 3 x RTX 4090 [if only running inference, 1 RTX 4090 is enough]
- Disk: at least 1000 GiB [if only running inference, 300 GiB is enough]
SOFTWARE
- Python 3.8.10
- Python packages: see adam_part/requirements.txt
- CUDA 12.2
- NVIDIA driver v535.129.03
This part runs inference only, using the model files we uploaded.
- STEP1: preprocessing
- Download the model files `195.pt`, `197.pt`, and `200.pt` from the `adam_attachment` folder of the link into the `adam_part/src/infer/saved_model` folder.

```bash
cd adam_part/src/preprocessing
python process_test.py
```
- STEP2: inference
- The outputs are in the `adam_part/src/infer/subs` folder.

```bash
cd adam_part/src/infer
sh run_only_infer.sh
```
- STEP1: preprocessing
- From the `adam_attachment` folder of the link, download the file `v3_index.pt` into the `adam_part/data/middle_result` folder.
- I used sampling when creating the datasets; `v3_index.pt` is the sampling index used during dataset creation (see the illustration after the commands below).
```bash
cd adam_part/src
sh run_preprocess.sh
```
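To illustrate how such a sampling index is typically consumed (the array names and paths below are illustrative, not taken from `run_preprocess.sh`):

```python
# Illustrative use of a precomputed sampling index; the actual preprocessing in
# run_preprocess.sh may apply it differently.
import numpy as np
import torch

index = torch.load("adam_part/data/middle_result/v3_index.pt")  # sampled row ids
all_rows = np.load("all_training_rows.npy")                     # hypothetical array of all training rows
sampled_rows = all_rows[np.asarray(index)]                      # keep only the sampled subset
```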
- STEP2: training
In this part, the models are trained from scratch and inference is also run. The outputs are in the `adam_part/src/outputs` folder:
- oof.npy: out-of-fold (validation) predictions
- log.txt: training log
- log_old.txt: the log from when I originally trained the models
- exp{xxx}_new.parquet: submission file
```bash
cd adam_part/src/exp
sh run.sh
```
- First, we run generate_par_data.py and cal_mean_std.py on the Kaggle data to compute mean.json and std.json for normalization (a rough sketch of this step appears after the commands below).
- Second, we process the ClimSim data (.nc) to get the input data for training. All processed data is placed in `fdzm_part/data/months`.
```bash
cd fdzm_part/data
sh train_data_generator.sh
```
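As a rough sketch of what the normalization step produces (array names and paths here are assumptions, not the exact code in generate_par_data.py / cal_mean_std.py):

```python
# Minimal sketch of the normalization statistics, assuming the processed
# training inputs are available as one float32 array of shape
# (n_samples, n_features). Paths and array names are hypothetical; the real
# generate_par_data.py / cal_mean_std.py scripts may organize this differently.
import json
import numpy as np

train_x = np.load("fdzm_part/data/train_inputs.npy")  # hypothetical file

mean = train_x.mean(axis=0)
std = train_x.std(axis=0)
std[std < 1e-12] = 1e-12  # avoid dividing by zero for constant columns

with open("mean.json", "w") as f:
    json.dump(mean.tolist(), f)
with open("std.json", "w") as f:
    json.dump(std.tolist(), f)

# At train/inference time, inputs are standardized as (x - mean) / std.
```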
- We process the Kaggle test data to get the model inputs (mean.json & std.json already exist).
- Download all folders from the `fdzm_models` folder of the link into the `fdzm_part/weights` folder. The expected structure is `fdzm_part/weights/exp907/*.pt` or `fdzm_part/weights/021_fork14_moredim_gf/*.pt`.
```bash
cd fdzm_part/data
sh test_data_generator.sh
```
- RAM: at least 360 GiB [if only running inference, 120 GiB is enough]
- GPU: 4 x RTX 4090 [if only running inference, 1 RTX 4090 is enough]
- Run `pip install -r requirements.txt`. If you run into problems installing `mamba-ssm`, please check the installation guidance at https://github.com/state-spaces/mamba.
- Place all the downloaded weights in the `fdzm_part/weights` folder. For example, for the `forcewithme_gf_reslstm_cv0.790_lb0.785` model, place `forcewithme_gf_reslstm_cv0.790_lb0.785.pt` at `fdzm_part/weights/forcewithme_gf_reslstm_cv0.790_lb0.785/forcewithme_gf_reslstm_cv0.790_lb0.785.pt`.
- All models have a global ID shared by their training scripts, inference scripts, checkpoints, and prediction files. See the solution document for more details about each model.
- We have provided the checkpoints trained during the LEAP competition, so if you don't want to reproduce the training process, you can skip this part and focus on the inference part. If you want to re-train the models, please follow the steps below:
```bash
cd fdzm_part/exp
bash train_force
```

This script will train all 6 of ForcewithMe's models. The training has 2 stages:
(1) optimize on all 368 targets; (2) resume from the weights produced in stage (1) and fine-tune on each of the 7 target groups (60-60-60-60-60-60-8), respectively.
- The training outputs are in `fdzm_part/outputs`, including checkpoints, oof predictions, prediction files, and logs.
```bash
cd fdzm_part/infer
bash infer_force.sh
```

This script will run inference for all 6 models trained by ForcewithMe.
- The inference outputs (parquet prediction files) are in `fdzm_part/outputs`.
- Please manually move those prediction files (`.parquet` files) into the directory required by the ensemble part.
```bash
cd fdzm_part/exp
sh train_joseph.sh
```

```bash
cd fdzm_part/infer
sh infer_joseph.sh
```
My environment requirements are the same as those of Joseph and ForcewithMe.
```bash
cd fdzm_part/exp
sh train_max2020.sh
```

```bash
cd fdzm_part/infer
sh infer_max2020.sh
```
In my section, I focused exclusively on fine-tuning the LSTM models. Model 10 follows the approach designed by @zui0711: its architecture consists of two connected LSTM layers with different hidden sizes, followed by a MultiheadAttention layer. Models 14, 15, 21, and 22 are all improvements on the model by @forcewithme, integrating LSTM with skip connections. Model 22 is our team's highest-performing single model, giving us the best results in local CV, public leaderboard, and private leaderboard scoring.
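As a minimal sketch of the "LSTM with skip connections" idea (layer sizes, normalization, and other details are assumptions, not the exact Model 14/15/21/22 architecture):

```python
import torch
import torch.nn as nn

class ResLSTMBlock(nn.Module):
    """Bidirectional LSTM block whose output is added back to its input
    (a skip connection). Illustrative only; `dim` must be even here."""
    def __init__(self, dim: int):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):              # x: (batch, seq_len, dim)
        out, _ = self.lstm(x)          # (batch, seq_len, dim) since 2 * (dim // 2) == dim
        return self.norm(x + out)      # residual connection

# Example: ResLSTMBlock(256)(torch.randn(4, 60, 256)) -> shape (4, 60, 256)
```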
Regarding the learning rate schedule, I used cosine decay, with decays occurring at epochs three and six.
For the loss function, I utilized smooth L1 loss with a beta of 0.5.
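A minimal sketch of this setup; the SmoothL1Loss beta comes from the description above, while the optimizer choice and the exact way the cosine schedule is segmented at epochs three and six are assumptions:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]            # placeholder parameters
criterion = torch.nn.SmoothL1Loss(beta=0.5)              # beta = 0.5, as described above
optimizer = torch.optim.AdamW(params, lr=1e-3)           # optimizer and lr are assumptions

# One plausible reading of "decays occurring at epochs three and six": cosine
# annealing with warm restarts every 3 epochs (scheduler stepped once per epoch).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=3)
```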
In deep learning, a continuously discussed topic within multi-objective learning tasks is the interaction between different learning objectives, specifically whether they promote or inhibit each other. In our experiments on the LEAP dataset, we found that in the early stages of training, the seven different target groups promoted each other. However, towards the end of training, these learning objectives began to interfere with each other, potentially due to complex semantic constraints.
Inspired by the top solution from the 2021 VPP competition, we divided the 368 targets into seven groups, six of which are series of measurements of different metrics along the atmospheric column, and one group consists of eight unique single targets. After the training process on all 368 outputs was completed, we fine-tuned these groups again. This allowed each model architecture to achieve an improvement ranging from 0.0005 to 0.0015. Due to time and resource constraints, we only fine-tuned each group for one epoch.
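A schematic of this per-group fine-tuning, assuming the 368 targets are laid out as 8 scalar targets followed by six 60-level series (consistent with the Diff Loss code in Zuiye's section below); the loop and helper names are a simplified sketch, not the exact competition training script:

```python
# Per-group fine-tuning over the 368 targets: one group of 8 scalar targets
# plus six groups of 60 vertical levels (scalars-first layout assumed).
import torch

GROUPS = [slice(0, 8)] + [slice(8 + 60 * i, 8 + 60 * (i + 1)) for i in range(6)]
criterion = torch.nn.SmoothL1Loss(beta=0.5)

def finetune_one_group(model, loader, group, optimizer, device="cuda"):
    """One epoch of fine-tuning on a single target group, starting from the
    weights of the model trained on all 368 targets."""
    model.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs)
        loss = criterion(preds[:, group], labels[:, group])  # loss on this group only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```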
According to the final hill-climb result, my models were not used in the final ensemble submission, so I simply describe my method here without code.
My models are mainly based on two architectures. The first consists of 2 LSTM layers followed by a MultiheadAttention layer. The other consists of 3 parallel convolutional layers with 3 different kernel sizes, followed by 2 LSTM layers and a MultiheadAttention layer, just like the first architecture. My best single model gets LB 0.78696 / PB 0.78205, and the ensemble of my own models (with hill climb) gets LB 0.79050 / PB 0.78614.
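A rough sketch of the second architecture; the kernel sizes, hidden size, head count, and bidirectionality are guesses rather than the exact configuration:

```python
import torch
import torch.nn as nn

class ConvLSTMAttn(nn.Module):
    """Three parallel 1D convolutions with different kernel sizes, followed by
    two LSTM layers and a MultiheadAttention layer. Hyperparameters are
    illustrative, not the competition model."""
    def __init__(self, in_dim: int, hidden: int = 256, heads: int = 4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(in_dim, hidden, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)
        ])
        self.lstm = nn.LSTM(3 * hidden, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)

    def forward(self, x):                       # x: (batch, seq_len, in_dim)
        c = x.transpose(1, 2)                   # (batch, in_dim, seq_len) for Conv1d
        c = torch.cat([conv(c) for conv in self.convs], dim=1)
        c = c.transpose(1, 2)                   # back to (batch, seq_len, 3 * hidden)
        h, _ = self.lstm(c)                     # (batch, seq_len, 2 * hidden)
        out, _ = self.attn(h, h, h)             # self-attention over the sequence
        return out

# Example: ConvLSTMAttn(in_dim=25)(torch.randn(2, 60, 25)) -> shape (2, 60, 512)
```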
I designed an auxiliary loss we call Diff Loss to help our models learn better; almost all of our team's models benefit from it. For each group of targets with 60 vertical levels, we calculate the difference between the real values at level N and level N+1, and the difference between the predicted values at level N and level N+1. The error between the predicted differences and the real differences is computed with smooth L1 loss to capture the changes between adjacent levels, and is then added to the main loss. The code is as follows.
```python
# Diff Loss: in addition to the main smooth L1 loss on all 368 targets, add the
# smooth L1 loss between adjacent-level differences of predictions and labels.
# The first 8 outputs are scalar targets; the remaining 6 groups each have 60
# vertical levels. (Computed with gradients enabled so it can be backpropagated.)
out_puts = model(inputs)
loss = criterion(out_puts, labels)
for i in range(6):
    # difference between level N+1 and level N within group i
    output_diff = out_puts[:, 8+60*i+1:8+60*(i+1)] - out_puts[:, 8+60*i:8+60*(i+1)-1]
    label_diff = labels[:, 8+60*i+1:8+60*(i+1)] - labels[:, 8+60*i:8+60*(i+1)-1]
    loss += criterion(output_diff, label_diff) / 6
```
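Dividing each group's term by 6 averages the auxiliary loss over the six 60-level groups, keeping it on roughly the same scale as the main loss.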
Finally, we use hill climbing to search for the blend weights. `submission/blend/weight_df_dict_all_group_all_v10.pt` stores the ensemble weight of each model.

Code:

```bash
cd submission/blend
python hill_climb_blend.py
```

This will generate `submission/blend/final_blend_v10.parquet` for the final submission.
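For reference, a simplified sketch of a greedy hill-climb weight search of this kind (the actual hill_climb_blend.py, its step sizes, and the competition metric may differ; note that negative weights are allowed, as in the table below):

```python
# Simplified greedy hill-climb over per-model blend weights, scored on
# out-of-fold predictions. Placeholder metric: mean per-column R^2, similar in
# spirit to the competition metric.
import numpy as np

def score(pred, target):
    ss_res = ((target - pred) ** 2).sum(axis=0)
    ss_tot = ((target - target.mean(axis=0)) ** 2).sum(axis=0) + 1e-12
    return float((1 - ss_res / ss_tot).mean())

def hill_climb(oof_preds, target, steps=200, lr=0.05):
    """oof_preds: list of (n_samples, n_targets) validation prediction arrays."""
    n = len(oof_preds)
    weights = np.full(n, 1.0 / n)
    best = score(sum(w * p for w, p in zip(weights, oof_preds)), target)
    for _ in range(steps):
        improved = False
        for i in range(n):
            for delta in (lr, -lr):          # try nudging one weight up or down
                trial = weights.copy()
                trial[i] += delta
                blend = sum(w * p for w, p in zip(trial, oof_preds))
                s = score(blend, target)
                if s > best:
                    best, weights, improved = s, trial, True
        if not improved:
            break
    return weights, best
```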
Scores of the final blend:
- CV: 0.7955
- public leaderboard: 0.79211
- private leaderboard: 0.78856
The ensemble weights of the best blend are as follows:

exp_id | weight | cv | public leaderboard | private leaderboard |
---|---|---|---|---|
forcewithme_exp32 | 0.166556 | 0.790 | 0.7865 | 0.78398 |
forcewithme_exp37 | 0.158625 | 0.7896 | 0.78618 | 0.78293 |
forcewithme_exp38 | 0.139194 | 0.7897 | 0.78719 | 0.78362 |
max_exp22 | 0.120125 | 0.7908 | 0.78793 | 0.78434 |
Jo_exp912 | 0.111971 | 0.78935 | 0.78528 | 0.78150 |
max_exp21 | 0.104738 | 0.7904 | 0.78752 | 0.78425 |
forcewithme_exp39 | 0.098977 | 0.789 | 0.78699 | 0.78257 |
max_exp14 | 0.093088 | 0.7905 | 0.78641 | 0.78214 |
max_exp10 | 0.092157 | 0.7888 | 0.78619 | 0.78213 |
forcewithme_exp40 | 0.082941 | 0.7885 | 0.7853 | 0.78261 |
max_exp015 | 0.052500 | 0.7905 | 0.78695 | 0.78244 |
adam_exp197 | 0.048994 | 0.7855 | 0.78269 | 0.777 |
adam_exp200 | -0.047132 | 0.7836 | 0.78010 | 0.77434 |
adam_exp195 | -0.049875 | 0.78569 | 0.78334 | 0.77753 |
Jo_exp907 | -0.083779 | 0.7855 | 0.78289 | 0.77873 |
forcewithme_exp18 | -0.089079 | 0.7890 | 0.7863 | 0.78272 |
It's worth noting that we've provided the parameter weight files for all models on Google Drive. Using these for the ensemble submission results in slightly higher LB and PB scores, by a margin of about 0.0002. The discrepancy stems from two main issues:
First, the jo_exp907.pt model file was missing, and the model had to be rerun after the competition, which led to some differences from the original. Second, during the competition, an incorrect model file was used for forcewithme_exp18 (corresponding to the forcewithme_reslstm_cv0.789_lb0.783 folder); this has now been corrected. These two points cause a very minor difference in our final results. Although we believe this difference does not affect the reproducibility of our overall approach, we mention it here to avoid any confusion.