AtmoSeer

About

This project provides a pipeline to build rainfall forecast models. The pipeline can be configured with different meteorological data sources.

Install

In the root directory of this repository, run the following command (conda must be installed on your system):

./setup.sh

Project pipeline

The project pipeline is a sequence of three steps: (1) data retrieval, (2) data preprocessing and (3) model training. These steps are implemented as Python scripts in the ./src directory.

Data retrieval

All datasets retrieved and/or generated by the scripts will be stored in the ./data folder.

  • retrieve_ws_cor.py: this script retrieves observations from a user-provided weather station.
  • retrieve_ws_inmet.py: this script retrieves observations from a user-provided weather station.
  • retrieve_as.py: this script retrieves atmospheric sounding data.
  • retrieve_ERA5.py: this script retrieves numerical simulation data from the ERA5 portal.

Script gen_sounding_indices.py

This script generates atmospheric instability indices from the data retrieved by retrieve_as.py. Data from the SBGL sounding station (located at Galeão Airport, Rio de Janeiro, Brazil) are used to calculate the indices, producing a new dataset with one entry per sounding probe. The SBGL station produces two probes per day (at 00:00 and 12:00 UTC). Each entry in the produced dataset contains the values of the instability indices computed for one probe. The following instability indices are computed:

  • CAPE
  • CIN
  • Lifted index
  • K index
  • Total Totals
  • Showalter index
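
To make two of the indices listed above concrete, here is a minimal sketch of the standard textbook formulas for the K index and Total Totals. It is not taken from gen_sounding_indices.py. The function names, argument names and sample values are hypothetical, and the inputs are assumed to be temperatures and dewpoints in degrees Celsius at the 850, 700 and 500 hPa levels.

```python
def k_index(t850, td850, t700, td700, t500):
    """K index (degrees Celsius) from temperature (t) and dewpoint (td)
    at the 850, 700 and 500 hPa levels."""
    # Lapse-rate term + low-level moisture - 700 hPa dewpoint depression.
    return (t850 - t500) + td850 - (t700 - td700)


def total_totals(t850, td850, t500):
    """Total Totals index: Vertical Totals plus Cross Totals."""
    vertical_totals = t850 - t500
    cross_totals = td850 - t500
    return vertical_totals + cross_totals


# Hypothetical values for a single probe (not real SBGL data).
k = k_index(t850=20.0, td850=16.0, t700=8.0, td700=4.0, t500=-10.0)
tt = total_totals(t850=20.0, td850=16.0, t500=-10.0)
print(f"K index: {k:.1f}, Total Totals: {tt:.1f}")
```

CAPE, CIN, the Lifted index and the Showalter index require lifting an air parcel through the sounding profile, so they are not reducible to a one-line formula like the two above.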

Preprocessing

The preprocessing scripts perform several operations on the original dataset, such as creating new variables or aggregating data, that may benefit model training and the final results.
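
As an illustration of the two kinds of operation mentioned above, the following sketch aggregates sub-hourly precipitation readings into hourly totals and creates a cyclical hour-of-day variable. It is only an assumption about what such a step could look like. The function names, the (timestamp, millimetres) input layout and the sample values are hypothetical and do not come from the repository's scripts.

```python
import math
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(observations):
    """Sum precipitation per hour.

    observations: iterable of (datetime, precipitation_mm) pairs.
    Returns a dict mapping the start of each hour to its total.
    """
    totals = defaultdict(float)
    for timestamp, precipitation_mm in observations:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour_start] += precipitation_mm
    return dict(sorted(totals.items()))


def hour_features(timestamp):
    """Encode the hour of day as (sin, cos) so that 23:00 and 00:00
    end up close together in feature space."""
    angle = 2 * math.pi * timestamp.hour / 24
    return math.sin(angle), math.cos(angle)


# Hypothetical 15-minute readings (not real station data).
readings = [
    (datetime(2020, 1, 1, 0, 0), 0.2),
    (datetime(2020, 1, 1, 0, 15), 0.4),
    (datetime(2020, 1, 1, 1, 0), 1.0),
]
hourly = aggregate_hourly(readings)
for hour_start, total_mm in hourly.items():
    print(hour_start, round(total_mm, 2), hour_features(hour_start))
```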

Dataset building

These scripts build the training, validation and test datasets from the time series produced in the previous steps. These datasets are the input to the model training step.
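
This README does not state how the series is divided. A common choice for time series is a chronological split, sketched below under that assumption. The function name and the 70/15/15 default proportions are hypothetical.

```python
def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split a time-ordered sequence into train, validation and test parts
    without shuffling, so later samples never leak into earlier sets."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    n = len(samples)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


# Hypothetical series of 100 time-ordered samples.
series = list(range(100))
train, val, test = chronological_split(series)
print(len(train), len(val), len(test))
```

Keeping the order intact means the validation and test sets contain only observations that come after the training period, which mirrors how a forecast model is used in practice.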

Model training and evaluation

The model generation script trains the model and exports the results it obtains on the test dataset.

Notebooks

There are several Jupyter notebooks in the notebooks directory. They were used for initial experiments and exploratory data analysis. Because the code was refactored afterwards, these notebooks are not guaranteed to run correctly.