This Python machine learning repository focuses on forecasting the time evolution of the principal components of sea surface temperature and sea surface height associated with the El Niño-Southern Oscillation (ENSO) phenomenon, using data from the Community Earth System Model 2 (CESM2). Various sequence-to-sequence (S2S) neural network architectures are implemented, and the Linear Inverse Model (LIM) serves both as a baseline for comparison and as a source of additional data points for enhancing ENSO forecasts. The aim is to examine whether the additional data significantly improves performance on the forecasting task.
The data can be found here: https://csegweb.cgd.ucar.edu/experiments/public/
The following files are used:
- sftlf_fx_CESM2_historical_r1i1p1f1.nc
- ts_Amon_CESM2_piControl_r1i1p1f1.nc
- zos_Amon_CESM2_piControl_r1i1p1f1.nc
First, install the dependencies:
# clone project
git clone https://github.com/Felix6464/ML_Climate_Science_Research_Project.git
# install project
cd ML_Climate_Science_Research_Project
conda create --name <env> --file requirements.txt
conda activate <env>
Next, navigate to any file and run it.
- Linear Inverse Model: In the directory LIM, you'll find:
  - LIM_implementation_1994.pdf - The base paper for the LIM implementation.
  - LIM_class.py - Contains the implementation of the Linear Inverse Model (LIM).
  - lim_integration_plots.ipynb - A Jupyter Notebook that:
    - Loads raw data
    - Computes principal components
    - Plots Empirical Orthogonal Functions (EOFs)
    - Crops the data to the respective ENSO region
    - Fits the LIM model
    - Checks whether the time series is stationary
    - Creates multiple plots to validate the implementation
  - Plots folder - Contains the plots generated by lim_integration_plots.ipynb in both PNG and SVG formats.
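The "Fits the LIM model" step above can be illustrated with a minimal sketch. It assumes the common formulation in which the propagator is estimated from lagged covariances as G = C(tau) C(0)^-1 and the dynamical operator follows from G = exp(L tau). The function name `fit_lim` and the toy data are hypothetical and are not taken from LIM_class.py, which may be organised differently.

```python
import numpy as np


def fit_lim(pcs, tau=1):
    """Estimate LIM operators from zero-mean PC time series.

    pcs has shape (n_components, n_time). Returns the propagator G for
    lag tau and the operator L with G = exp(L * tau).
    """
    x0 = pcs[:, :-tau]
    x_tau = pcs[:, tau:]
    n = x0.shape[1]
    c0 = x0 @ x0.T / n         # lag-0 covariance
    c_tau = x_tau @ x0.T / n   # lag-tau covariance
    g = c_tau @ np.linalg.inv(c0)

    # Matrix logarithm via eigendecomposition. Assumes G is diagonalisable
    # and has no negative real eigenvalues, so the principal log is real.
    eigvals, eigvecs = np.linalg.eig(g)
    log_vals = np.log(eigvals.astype(complex)) / tau
    l_op = eigvecs @ np.diag(log_vals) @ np.linalg.inv(eigvecs)
    return g, np.real(l_op)


# Toy data (an assumption for illustration): a damped, noisy 2-D linear system.
rng = np.random.default_rng(0)
g_true = np.array([[0.9, 0.3], [-0.3, 0.8]])
pcs = np.zeros((2, 20000))
for t in range(1, pcs.shape[1]):
    pcs[:, t] = g_true @ pcs[:, t - 1] + rng.standard_normal(2)

g_est, l_est = fit_lim(pcs, tau=1)
```

With enough samples, `g_est` should approach `g_true`, and the eigenvalues of `l_est` should have negative real parts, which is the usual stability check for a fitted LIM.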
- Neural Networks: The directory LIM/neural_networks holds the implementation of the deep learning approaches to ENSO prediction. Inside the models folder, you'll find various neural network implementations:
  - FNN_model.py - FeedForwardNeuralNetwork
  - GRU_enc_dec.py - GatedRecurrentUnit
  - LSTM.py - LongShortTermModel
  - LSTM_enc_dec.py - Encoder-Decoder-LSTM with only hidden state of encoder as input for prediction
  - LSTM_enc_dec_input.py - Encoder-Decoder-LSTM with last state as input for prediction
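To make the difference between the two encoder-decoder variants more concrete, here is a minimal PyTorch sketch. It is one plausible reading of the descriptions above, not the code in the models folder: the class name `Seq2SeqLSTM`, the `seed_with_last_input` flag, and the choice of a zero tensor as the first decoder input in the "hidden state only" case are all assumptions.

```python
import torch
from torch import nn


class Seq2SeqLSTM(nn.Module):
    """Hypothetical encoder-decoder LSTM for multi-step forecasting."""

    def __init__(self, n_features, hidden_size):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_features)

    def forward(self, src, horizon, seed_with_last_input=True):
        # src has shape (batch, input_len, n_features).
        _, state = self.encoder(src)  # keep only the final (h, c) state
        if seed_with_last_input:
            # Variant A: the last observed time step seeds the decoder.
            step_input = src[:, -1:, :]
        else:
            # Variant B: the decoder relies on the encoder state alone.
            step_input = torch.zeros_like(src[:, -1:, :])
        outputs = []
        for _ in range(horizon):
            out, state = self.decoder(step_input, state)
            step_input = self.head(out)  # prediction becomes the next input
            outputs.append(step_input)
        return torch.cat(outputs, dim=1)  # (batch, horizon, n_features)


model = Seq2SeqLSTM(n_features=5, hidden_size=16)
window = torch.randn(4, 12, 5)    # 4 samples, 12 input steps, 5 PCs
forecast = model(window, horizon=6)
```

The decoder here feeds its own predictions back in at every step, so no target values are needed at inference time.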
- Raw Data: The raw_data folder contains the CESM2 piControl data for sea surface height and sea surface temperature, as well as the sea-land mask. Place the raw data here so that it can be loaded.
- Final Models Trained: The final_models_trained folder contains trained PyTorch models used for evaluation. Model names consist of a randomly generated integer for identification, as well as "np" (NumPy) or "xr" (xarray) to indicate the type of data on which the model was trained.
- Synthetic Data: The synthetic_data/data folder contains files for generating synthetic data based on the CESM2 piControl data, as well as the final multidimensional NumPy data used for training.
  - checking_timeseries.ipynb - Checks the shape of the time series of the data and compares it to the piControl data.
  - lim_data_generation.py - Generates synthetic data by integrating the LIM using the Euler method and saves the new data to the data folder.
  - testing_lim_integration.ipynb - Verifies the integration of the LIM for stationarity and plots the resulting time series.
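The Euler integration of the LIM mentioned above can be sketched as follows. This assumes the stochastic form dx/dt = L x + S xi, stepped with the Euler-Maruyama scheme, where the noise covariance Q = S S^T is supplied by the caller. The function `integrate_lim`, its parameters, and the toy operator are hypothetical and do not reproduce lim_data_generation.py.

```python
import numpy as np


def integrate_lim(l_op, q, x0, n_steps, dt=0.01, seed=0):
    """Integrate dx/dt = L x + noise with Euler-Maruyama steps.

    l_op: (n, n) dynamical operator, q: (n, n) noise covariance
    (symmetric positive definite), x0: (n,) initial state.
    Returns an array of shape (n_steps + 1, n).
    """
    rng = np.random.default_rng(seed)
    s = np.linalg.cholesky(q)  # noise amplitude matrix with S S^T = Q
    x = np.empty((n_steps + 1, x0.size))
    x[0] = x0
    for k in range(n_steps):
        noise = s @ rng.standard_normal(x0.size)
        # Deterministic drift scales with dt, the noise with sqrt(dt).
        x[k + 1] = x[k] + (l_op @ x[k]) * dt + noise * np.sqrt(dt)
    return x


# Toy operator (an assumption): a damped oscillator with isotropic noise.
l_toy = np.array([[-0.5, 1.0], [-1.0, -0.5]])
q_toy = 0.5 * np.eye(2)
synthetic = integrate_lim(l_toy, q_toy, x0=np.zeros(2), n_steps=1000)
```

For a stable L, the variance of a long run should settle to a constant level. That is the kind of stationarity check the testing notebook is described as performing.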
- Training: The following scripts train various models using synthetic data:
  - fnn_training.py: Trains a feedforward neural network model with specified parameters on the train split of the synthetically generated data.
  - rnn_training.py: Trains a recurrent model with specified parameters on the train split of the synthetic data. The type of recurrent model can be changed by importing the respective class from the models folder.
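Sequence-to-sequence training of this kind typically needs the time series cut into input/target windows. The sketch below shows one simple way to do that with NumPy. The helper `make_windows` and the array layout are assumptions for illustration, and the training scripts may prepare their batches differently.

```python
import numpy as np


def make_windows(series, input_len, horizon):
    """Cut a (n_time, n_features) series into overlapping samples.

    Returns inputs of shape (n_samples, input_len, n_features) and
    targets of shape (n_samples, horizon, n_features).
    """
    n_samples = series.shape[0] - input_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested windows")
    inputs = np.stack([series[i:i + input_len] for i in range(n_samples)])
    targets = np.stack(
        [series[i + input_len:i + input_len + horizon] for i in range(n_samples)]
    )
    return inputs, targets


# Toy series: 20 time steps of 2 features, numbered so windows are easy to read.
toy_series = np.arange(40, dtype=float).reshape(20, 2)
inputs, targets = make_windows(toy_series, input_len=4, horizon=2)
```

Each target window starts exactly where its input window ends, so the model is always asked to predict the steps that directly follow what it has seen.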
- Testing: The following test scripts were used during development to validate and experiment with the code:
  - fnn_testing.py: Loads a pretrained feedforward neural network model from file and evaluates it for prediction horizons ranging from 1 to 24. It calculates the loss for each prediction horizon over the test set and plots the loss distribution curve.
  - rnn_testing.py: Evaluates a pretrained RNN model over prediction horizons of 1 to 24. The type of recurrent model can be changed by importing the respective class from the models folder.
  - plot_saved_model.py: Loads a pretrained model from file and evaluates it on both the test and train sets for the horizon chosen during training. It also plots the time series forecast of the first principal component (the component can be varied) and the training loss curve of the model.
  - testing_combined.py: Loads multiple pretrained models with different architectures, as well as the LIM, and evaluates them together to create a single plot comparing their performance.
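The per-horizon evaluation described above can be sketched independently of any particular model. The example assumes mean squared error as the loss and a generic `forecast_fn(x0, h)` callable; both are illustrative choices, and the testing scripts may use a different loss or interface.

```python
import numpy as np


def mse_per_horizon(forecast_fn, series, max_horizon=24):
    """Mean squared error for each horizon from 1 to max_horizon.

    forecast_fn(x0, h) predicts the state h steps after x0.
    series has shape (n_time, n_features).
    """
    losses = []
    for h in range(1, max_horizon + 1):
        preds = np.stack([forecast_fn(x0, h) for x0 in series[:-h]])
        losses.append(np.mean((preds - series[h:]) ** 2))
    return np.array(losses)


# Toy series (an assumption): a noise-free damped rotation x[t+1] = G x[t].
theta = 0.3
g_rot = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
series = np.empty((100, 2))
series[0] = [1.0, 0.0]
for t in range(1, series.shape[0]):
    series[t] = g_rot @ series[t - 1]

# A LIM-style forecast applies the propagator h times.
lim_losses = mse_per_horizon(
    lambda x0, h: np.linalg.matrix_power(g_rot, h) @ x0, series
)
# Persistence (predicting no change) is a common reference forecast.
persistence_losses = mse_per_horizon(lambda x0, h: x0, series)
```

Plotting such loss arrays against the horizon for several models gives the kind of comparison that testing_combined.py is described as producing.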
- Utilities: The utilities.py file contains multiple utility functions for data preprocessing, cropping, eigenvalue decomposition, principal component analysis, and small helper tools.
- Plots: The plots.py file contains functions for plotting the prediction horizons and loss curves.
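As an illustration of the principal component analysis listed under Utilities, the sketch below computes EOFs and principal components with a singular value decomposition. This is a stand-in technique: utilities.py is described as using eigenvalue decomposition, and real climate fields would also need steps such as land masking and latitude weighting, which are omitted here. The function `pca_svd` and the random test field are hypothetical.

```python
import numpy as np


def pca_svd(data, n_components):
    """PCA of a (n_time, n_space) field via SVD.

    Returns EOFs (n_components, n_space), principal components
    (n_time, n_components), and the explained variance ratio.
    """
    anomalies = data - data.mean(axis=0)  # remove the time mean per grid point
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_components]                   # spatial patterns
    pcs = u[:, :n_components] * s[:n_components]  # time series per pattern
    explained = (s ** 2) / np.sum(s ** 2)
    return eofs, pcs, explained[:n_components]


# Random stand-in for a flattened field: 50 time steps, 8 grid points.
rng = np.random.default_rng(0)
field = rng.standard_normal((50, 8))
eofs, pcs, explained = pca_svd(field, n_components=8)
```

Keeping all components reconstructs the anomalies exactly as `pcs @ eofs`. Truncating to the leading few gives the reduced state that the forecasting models operate on.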
To get started, refer to the documentation and Python files within the respective folders for detailed instructions on running experiments and training neural network models.
Make sure you have the required Python libraries and packages installed. The dependencies are listed in the requirements.txt file.