Pipelines to store raster data as Cloud-Optimized GeoTiffs in private Azure storage

OCHA-DAP/ds-raster-pipelines

Pipelines for Analysis-Ready COGs

This repository contains code to create stores of Cloud-Optimized GeoTIFFs (COGs) from input raster data. Data is ingested from various sources and stored in a private Azure Storage container.

Data Sources

1. ECMWF SEAS5 Seasonal Forecasts

These forecasts contain 0.4 degree resolution global data on precipitation rates across 0-6 month lead-times. Historical data from as early as 1981 has been accessed via ECMWF's Meteorological Archival and Retrieval System (MARS). See this User Manual for more details. Note: For more timely access than is provided by MARS, recent forecast data is populated from a private data order from ECMWF.

2. ECMWF ERA5 Reanalysis

The ERA5 reanalysis provides averaged monthly and hourly estimates of total precipitation across a 0.25 degree global grid. See these docs for more information on the full family of ERA5 datasets.
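To give a feel for what the 0.4 and 0.25 degree resolutions quoted for SEAS5 and ERA5 mean in terms of raster size, here is a rough sketch. It assumes a regular latitude/longitude grid with points on both poles and wrapping longitudes; the grids actually delivered by a given request may differ. The helper name `global_grid_shape` is hypothetical and not part of this repository.

```python
def global_grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular global lat/lon grid.

    Assumption: grid points sit on both poles (hence the +1 on latitude)
    and longitude wraps around, so 360 degrees is not repeated.
    """
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat, n_lon


if __name__ == "__main__":
    for name, res in [("SEAS5", 0.4), ("ERA5", 0.25)]:
        n_lat, n_lon = global_grid_shape(res)
        print(f"{name}: {n_lat} x {n_lon} = {n_lat * n_lon:,} cells per field")
```

Under these assumptions a single ERA5 field holds roughly 2.5 times as many cells as a SEAS5 field, which is worth keeping in mind when estimating storage per COG.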

3. IMERG Global Precipitation Measurement

NASA's Integrated Multi-satellitE Retrievals for GPM (IMERG) generates estimated precipitation over the majority of Earth's surface based on information from the GPM satellite constellation. See this Technical Spec for more details.

4. FloodScan: Near real-time and historical flood mapping

Atmospheric and Environmental Research (AER) FloodScan's flood extent depiction products provide daily algorithmic delineation of temporarily flooded and unflooded areas from satellite remote sensing observations. See this Technical Spec for more details.

Usage

All pipelines can be run as a CLI, via the run_pipeline.py entrypoint. For detailed usage instructions and options, see our Pipeline Usage Guide.

Pipelines are run in production as Jobs on Databricks. Please reach out if you require access.

Development Setup

  1. Clone this repository and create a Python 3.12.4 virtual environment:
git clone https://github.com/OCHA-DAP/ds-raster-pipelines.git
cd ds-raster-pipelines
python3 -m venv venv
source venv/bin/activate
  2. Install Python dependencies:
pip install -r requirements.txt
pip install -r requirements-dev.txt
  3. If processing .grib files using xarray, the cfgrib engine also requires the ecCodes system library. On Debian or Ubuntu, this can be installed with:
sudo apt-get install libeccodes-dev
  4. Create a local .env file with the following environment variables:
# Connection to Azure blob storage
DSCI_AZ_SAS_DEV=<provided-on-request>
DSCI_AZ_SAS_PROD=<provided-on-request>

# MARS API requests
ECMWF_API_URL=<provided-on-request>
ECMWF_API_EMAIL=<provided-on-request>
ECMWF_API_KEY=<provided-on-request>

# ECMWF AWS bucket
AWS_ACCESS_KEY_ID=<provided-on-request>
AWS_SECRET_ACCESS_KEY=<provided-on-request>
AWS_BUCKET_NAME=<provided-on-request>
AWS_DEFAULT_REGION=<provided-on-request>

# CDS API credentials
CDSAPI_URL=<provided-on-request>
CDSAPI_KEY=<provided-on-request>

# IMERG Authentication
IMERG_USERNAME=<provided-on-request>
IMERG_PASSWORD=<provided-on-request>

# FloodScan access urls
FLOODSCAN_SFED_URL=<provided-on-request>
FLOODSCAN_MFED_URL=<provided-on-request>


CONTAINER_RASTER='raster'
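
Before running a pipeline locally, it can help to confirm that the .env file is complete. The following is a minimal sketch using only the standard library. The variable names are copied from the template above, but the helpers `parse_env_file` and `missing_vars` are hypothetical and not part of this repository, and it is an assumption that every pipeline needs every variable. How the pipelines themselves load the .env file is not shown here.

```python
from pathlib import Path

# Variable names copied from the .env template; adjust per pipeline.
REQUIRED_VARS = [
    "DSCI_AZ_SAS_DEV", "DSCI_AZ_SAS_PROD",
    "ECMWF_API_URL", "ECMWF_API_EMAIL", "ECMWF_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
    "AWS_BUCKET_NAME", "AWS_DEFAULT_REGION",
    "CDSAPI_URL", "CDSAPI_KEY",
    "IMERG_USERNAME", "IMERG_PASSWORD",
    "FLOODSCAN_SFED_URL", "FLOODSCAN_MFED_URL",
    "CONTAINER_RASTER",
]


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. CONTAINER_RASTER='raster'
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return required names that are absent, empty, or still a <placeholder>."""
    return [
        name for name in required
        if not values.get(name) or values[name].startswith("<")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_vars(parse_env_file(env_path))
        print("Missing or unset:", ", ".join(missing) if missing else "none")
    else:
        print("No .env file found in the current directory")
```

Running this from the repository root prints the names of any variables that still hold the `<provided-on-request>` placeholder.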

Pre-Commit

All code is formatted according to black and flake8 guidelines. The repo is set up to use pre-commit. Before you start developing in this repository, you will need to run:

pre-commit install

You can run every hook against all files in the repository using:

pre-commit run --all-files
