Machine Learning in Climate Science Research Project

How much Data do S2S-Neural-Networks need? An ENSO Showcase

Description

This Python machine learning repository focuses on forecasting the time evolution of the principal components of sea surface temperature and sea surface height associated with the El Niño-Southern Oscillation (ENSO) phenomenon, using data from the Community Earth System Model 2 (CESM2). Several sequence-to-sequence (S2S) neural network architectures are deployed, and the "Linear Inverse Model" (LIM) serves both as a baseline for comparison and as a generator of additional data points. The aim is to examine whether additional training data significantly improves performance on the forecasting task.

Download Raw Data

CESM2 piControl Data

The data can be found here: https://csegweb.cgd.ucar.edu/experiments/public/

Sea-Land-Mask

sftlf_fx_CESM2_historical_r1i1p1f1.nc

Sea Surface Temperature

ts_Amon_CESM2_piControl_r1i1p1f1.nc

Sea Surface Height

zos_Amon_CESM2_piControl_r1i1p1f1.nc

How to run

First, install the dependencies:

# clone project   
git clone https://github.com/Felix6464/ML_Climate_Science_Research_Project.git

# install project   
cd ML_Climate_Science_Research_Project  
conda create --name <env> --file requirements.txt
conda activate <env>

Next, navigate to the script or notebook you want to use and run it.

Repository Structure

Linear Inverse Model: In the directory LIM, you'll find:
- LIM_implementation_1994.pdf - The base paper for the LIM implementation.
- LIM_class.py - Contains the implementation of the Linear Inverse Model (LIM).
- lim_integration_plots.ipynb - A Jupyter Notebook that:
  - Loads raw data
  - Computes principal components
  - Plots Empirical Orthogonal Functions (EOFs)
  - Crops the data to the respective ENSO region
  - Fits the LIM model
  - Checks for stationary time series
  - Creates multiple plots to validate the implementation
- Plots folder - Contains plots generated by lim_integration_plots.ipynb in both PNG and SVG formats.
Neural Networks: The directory LIM/neural_networks holds the implementation of the deep learning approaches to ENSO prediction. Inside the models folder, you'll find various neural network implementations:
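The LIM fitting step listed above can be illustrated with a minimal NumPy sketch. It follows the standard textbook estimate (propagator from the lag covariances, then a matrix logarithm) and is not taken from LIM_class.py; the names `fit_lim` and `forecast` and the two-dimensional demo data are assumptions made for illustration only.

```python
import numpy as np

def fit_lim(pcs, tau=1):
    """Estimate LIM operators from principal components.

    pcs: array of shape (n_components, n_time).
    Returns the propagator G(tau) and the dynamical operator L.
    """
    x = pcs - pcs.mean(axis=1, keepdims=True)
    x0, x_tau = x[:, :-tau], x[:, tau:]
    n = x0.shape[1]
    c0 = x0 @ x0.T / n          # lag-0 covariance
    c_tau = x_tau @ x0.T / n    # lag-tau covariance
    g_tau = c_tau @ np.linalg.inv(c0)
    # Matrix logarithm via eigendecomposition: L = log(G) / tau
    eigvals, eigvecs = np.linalg.eig(g_tau)
    log_g = eigvecs @ np.diag(np.log(eigvals.astype(complex))) @ np.linalg.inv(eigvecs)
    return g_tau, log_g.real / tau

def forecast(g_tau, x0, steps):
    """Deterministic LIM forecast: apply the propagator `steps` times."""
    return np.linalg.matrix_power(g_tau, steps) @ x0

# Hypothetical demo data: a stable 2-D linear system driven by white noise.
rng = np.random.default_rng(0)
a_true = np.array([[0.9, 0.1], [-0.1, 0.8]])
demo = np.zeros((2, 20000))
for t in range(1, demo.shape[1]):
    demo[:, t] = a_true @ demo[:, t - 1] + rng.standard_normal(2)

g_est, l_est = fit_lim(demo, tau=1)
```

On the demo data the estimated propagator `g_est` should be close to `a_true`, and the eigenvalues of `l_est` should have negative real parts, which is the stability condition a fitted LIM is usually checked against.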
- FNN_model.py - FeedForwardNeuralNetwork
- GRU_enc_dec.py - GatedRecurrentUnit
- LSTM.py - LongShortTermModel
- LSTM_enc_dec.py - Encoder-Decoder-LSTM with only hidden state of encoder as input for prediction
- LSTM_enc_dec_input.py - Encoder-Decoder-LSTM with last state as input for prediction
Raw Data: The raw_data folder contains the CESM2 piControl data for sea surface height and sea surface temperature, as well as the sea-land mask. Place the raw data here so that the scripts and notebooks can load it.
Final Models Trained: The final_models_trained folder contains trained PyTorch models used for evaluation. Model names consist of a randomly generated integer for identification as well as "np" (NumPy) or "xr" (xarray) to indicate which type of data the model was trained on.
Synthetic Data: The synthetic_data/data folder contains files for generating synthetic data based on the CESM2 piControl data, as well as the final multidimensional NumPy data used for training.
- checking_timeseries.ipynb - Checks the shape of the synthetic time series and compares it to the piControl data
- lim_data_generation.py - Generates synthetic data by integrating the LIM using the Euler method and saves the new data to the data folder
- testing_lim_integration.ipynb - Verifies the integration of the LIM for stationarity and plots the resulting time series
Training: The following scripts train various models using synthetic data. These include:
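As a rough illustration of generating synthetic data by integrating the LIM, the sketch below uses a plain Euler-Maruyama step for dx/dt = L x + S ξ. This is an assumption about the general approach, not the code in lim_data_generation.py; the function name `integrate_lim`, the noise amplitude matrix, the step size, and the demo operator are all hypothetical.

```python
import numpy as np

def integrate_lim(l_op, noise_amp, x0, n_steps, dt=0.01, seed=0):
    """Integrate dx/dt = L x + S * white noise with an Euler-Maruyama scheme.

    l_op: (n, n) dynamical operator L.
    noise_amp: (n, n) noise amplitude matrix S.
    Returns an array of shape (n, n_steps + 1) including the initial state.
    """
    rng = np.random.default_rng(seed)
    x = np.empty((x0.size, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        noise = noise_amp @ rng.standard_normal(x0.size)
        # Deterministic drift scales with dt, stochastic forcing with sqrt(dt).
        x[:, k + 1] = x[:, k] + l_op @ x[:, k] * dt + np.sqrt(dt) * noise
    return x

# Hypothetical stable operator (damped rotation) and isotropic noise.
l_demo = np.array([[-0.5, 1.0], [-1.0, -0.5]])
s_demo = 0.3 * np.eye(2)
synthetic = integrate_lim(l_demo, s_demo, np.zeros(2), n_steps=5000)
```

Because the operator is stable, the generated series stays bounded and can be checked for stationarity before it is used as training data.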
- fnn_training.py: Trains a feedforward neural network model with specified parameters using synthetically generated data on the train split of the data.
- rnn_training.py: Trains a recurrent model with specified parameters on the synthetic data of the train split. The type of recurrent model used can be changed by importing the respective class from the models folder.
Testing: The following scripts were used during development to validate and experiment with the code. These include:
- fnn_testing.py: Loads a pretrained feedforward neural network model from file and evaluates it for prediction horizons ranging from 1 to 24. It calculates the loss for each prediction horizon over the test set and plots the loss distribution curve.
- rnn_testing.py: Evaluates a pretrained RNN model over prediction horizons of 1 to 24. The type of recurrent model used can be changed by importing the respective class from the models folder.
- plot_saved_model.py: Loads a pretrained model from file and evaluates it on both the test and train sets for the horizon chosen during training. It also plots the time series forecast of the first principal component (the component can be varied) and the training loss curve of the model.
- testing_combined.py: Loads multiple pretrained models, as well as the LIM, with different architectures and evaluates them simultaneously to create a plot that compares the performance of different models at once.
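The evaluation over prediction horizons 1 to 24 described above can be sketched independently of any particular model. The example assumes mean squared error as the loss and a generic `forecast_fn(x0, h)` callable; both are illustrative assumptions and do not reflect the interface of the testing scripts.

```python
import numpy as np

def loss_per_horizon(series, forecast_fn, max_horizon=24):
    """Mean squared error for each prediction horizon from 1 to max_horizon.

    series: array of shape (n_components, n_time).
    forecast_fn(x0, h): returns the prediction h steps ahead for every
    column of x0 (each column is one initial state).
    """
    losses = []
    for h in range(1, max_horizon + 1):
        x0 = series[:, :-h]      # all initial states that have a target
        target = series[:, h:]   # the states h steps later
        pred = forecast_fn(x0, h)
        losses.append(np.mean((pred - target) ** 2))
    return np.array(losses)

# Hypothetical test series: pure exponential decay, x_t = 0.9 ** t.
decay = (0.9 ** np.arange(100)).reshape(1, -1)

# A forecast that knows the decay rate versus a persistence forecast.
perfect_loss = loss_per_horizon(decay, lambda x0, h: x0 * 0.9 ** h)
persistence_loss = loss_per_horizon(decay, lambda x0, h: x0)
```

Plotting such loss arrays for several models against the horizon gives the kind of comparison curve that testing_combined.py is described as producing.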
Utilities: The utilities.py file contains multiple utility functions for data preprocessing, cropping, eigenvalue decomposition, principal component analysis, and small helper tools.
Plots: The plots.py file contains different functions for plotting the prediction horizons and loss curves.
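The principal component analysis mentioned under Utilities can be sketched with a singular value decomposition. This is a minimal example under simplifying assumptions (no latitude weighting, no land masking) and does not reproduce the functions in utilities.py; the name `compute_pcs` and the demo field are hypothetical.

```python
import numpy as np

def compute_pcs(field, n_components):
    """Compute EOFs and principal components of a (n_time, n_space) field.

    Returns EOFs of shape (n_components, n_space), PCs of shape
    (n_components, n_time), and the explained variance fractions.
    """
    anomalies = field - field.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt[:n_components]                          # spatial patterns
    pcs = (u[:, :n_components] * s[:n_components]).T  # time series
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return eofs, pcs, explained

# Hypothetical demo field: one dominant spatial pattern plus weak noise.
rng = np.random.default_rng(0)
pattern = np.sin(np.linspace(0, np.pi, 30))
amplitude = rng.standard_normal(200)
demo_field = np.outer(amplitude, pattern) + 0.1 * rng.standard_normal((200, 30))

eofs, pcs, explained = compute_pcs(demo_field, n_components=3)
```

The EOFs returned this way are orthonormal, and the PCs are the projections of the anomalies onto them, which is the representation the forecasting models operate on.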

Getting Started

To get started, refer to the documentation and Python files within the respective folders for detailed instructions on running experiments and training neural network models.

Dependencies

Make sure you have the required Python libraries and packages installed. The dependencies are listed in the requirements.txt file.
