Commit

Feature/v2024-06-bcs (#227)
* update to GC 14.4.0

* parse_yaml fixes

* createRunDir.sh fix

* small changes to latitude filler

* update comments

* GFED 2024 fix

* use condaFile

* switch hc order, update date

* update enddate for available geos-fp

* fix for missing met files

* more cores and mem

* update BC version number
nicholasbalasus authored Jun 21, 2024
1 parent 023deba commit e18648e
Showing 8 changed files with 71 additions and 70 deletions.
6 changes: 3 additions & 3 deletions config.yml
@@ -193,13 +193,13 @@ RestartDownload: true

## Path to initial GEOS-Chem restart file + prefix
## ("YYYYMMDD_0000z.nc4" will be appended)
RestartFilePrefix: "/home/ubuntu/ExtData/BoundaryConditions/v2024-03/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/home/ubuntu/ExtData/BoundaryConditions/v2024-03/GEOSChem.BoundaryConditions."
RestartFilePrefix: "/home/ubuntu/ExtData/BoundaryConditions/v2024-06/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/home/ubuntu/ExtData/BoundaryConditions/v2024-06/GEOSChem.BoundaryConditions."

## Path to GEOS-Chem boundary condition files (for regional simulations)
## BCversion will be appended to the end of this path. ${BCpath}/${BCversion}
BCpath: "/home/ubuntu/ExtData/BoundaryConditions"
BCversion: "v2024-03"
BCversion: "v2024-06"

## Options to download missing GEOS-Chem input data from AWS S3
## NOTE: You will be charged if your ec2 instance is not in the
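The config comments above say that `BCversion` is appended to `BCpath` and that `YYYYMMDD_0000z.nc4` is appended to the restart prefix. A minimal Python sketch of that composition follows; the helper names `bc_directory` and `restart_file` are hypothetical and are not part of the IMI code, and POSIX-style paths are assumed.

```python
import posixpath


def bc_directory(bc_path, bc_version):
    """Join BCpath and BCversion the way ${BCpath}/${BCversion} would expand."""
    return posixpath.join(bc_path, bc_version)


def restart_file(prefix, yyyymmdd):
    """Append the date suffix described in the config comment to a restart prefix."""
    return f"{prefix}{yyyymmdd}_0000z.nc4"


# Example with the values set in this commit
bc_dir = bc_directory("/home/ubuntu/ExtData/BoundaryConditions", "v2024-06")
prefix = posixpath.join(bc_dir, "GEOSChem.BoundaryConditions.")
example = restart_file(prefix, "20230601")
```

Keeping the version in a single variable means the restart prefixes and the boundary-condition directory cannot drift apart, which is the mismatch this commit fixes in the Harvard-Cannon configs (`v2023-10` prefixes alongside `BCversion: "v2024-03"`).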
2 changes: 1 addition & 1 deletion docs/source/getting-started/imi-config-file.rst
@@ -317,7 +317,7 @@ the IMI on a local cluster<../advanced/local-cluster>`).
* - ``BCpath``
- Path to GEOS-Chem boundary condition files (for regional simulations).
* - ``BCversion``
- Version of TROPOMI smoothed boundary conditions to use (e.g. ``v2023-04``). Note: this will be appended onto BCpath as a subdirectory.
- Version of TROPOMI smoothed boundary conditions to use (e.g. ``v2024-06``). Note: this will be appended onto BCpath as a subdirectory.
* - ``PreviewDryRun``
- Boolean to download missing GEOS-Chem data for the preview run. Default value is ``true``.
* - ``SpinupDryRun``
6 changes: 3 additions & 3 deletions envs/Harvard-Cannon/config.harvard-cannon.global_inv.yml
@@ -199,13 +199,13 @@ RestartDownload: false

## Path to initial GEOS-Chem restart file + prefix
## ("YYYYMMDD_0000z.nc4" will be appended)
RestartFilePrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2023-10/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2023-10/GEOSChem.BoundaryConditions."
RestartFilePrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2024-06/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2024-06/GEOSChem.BoundaryConditions."

## Path to GEOS-Chem boundary condition files (for regional simulations)
## BCversion will be appended to the end of this path. ${BCpath}/${BCversion}
BCpath: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions"
BCversion: "v2024-03"
BCversion: "v2024-06"

## Options to download missing GEOS-Chem input data from AWS S3
## NOTE: Must have AWS CLI enabled
6 changes: 3 additions & 3 deletions envs/Harvard-Cannon/config.harvard-cannon.yml
@@ -201,13 +201,13 @@ RestartDownload: false

## Path to initial GEOS-Chem restart file + prefix
## ("YYYYMMDD_0000z.nc4" will be appended)
RestartFilePrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2023-10/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2023-10/GEOSChem.BoundaryConditions."
RestartFilePrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2024-06/GEOSChem.BoundaryConditions."
RestartFilePreviewPrefix: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions/v2024-06/GEOSChem.BoundaryConditions."

## Path to GEOS-Chem boundary condition files (for regional simulations)
## BCversion will be appended to the end of this path. ${BCpath}/${BCversion}
BCpath: "/n/holylfs05/LABS/jacob_lab/imi/ch4/tropomi-boundary-conditions"
BCversion: "v2024-03"
BCversion: "v2024-06"

## Options to download missing GEOS-Chem input data from AWS S3
## NOTE: Must have AWS CLI enabled
15 changes: 8 additions & 7 deletions src/write_BCs/README.md
@@ -13,12 +13,13 @@
- if your simulation starts on 1 April 2018, this won't be used (`GEOSChem.Restart.20180401_0000z.nc4` will).
- this file comes from a CH4 simulation by Todd Mooring that is constrained by NOAA surface observations.
- if your simulation starts on another date, it should use a restart file from the simulation that started on 1 April 2018.
- this is accomodated by the fact that the simulation is setup to write daily restart files.
- to determine what day your restart file should be for, subtract 15 days from `startDate`.
- for example, if `startDate: 20230501`, your restart file should be `GEOSChem.Restart.202304016_0000z.nc4`.
- this allows for a 15 day previous average (using 17 April 2023 to 1 May 2023) and the 1 day at the start where no first hour will be written by GEOS-Chem.
- this is accommodated by the fact that the simulation run here is setup to write daily restart files.
- to determine what day your restart file should be for, subtract 30 days from `startDate`.
- for example, if `startDate: 20230601`, your restart file should be `GEOSChem.Restart.20230502_0000z.nc4`.
- this allows for a 15 day previous average (using 18 May 2023 to 1 June 2023) and the 1 day at the start where no first hour will be written by GEOS-Chem.
- in rare cases of low data density, a 30 day previous average is used (which would use 3 May 2023 to 1 June 2023).
- `debug` - whether or not to delete the `debug.log`.
- this inlcudes information about your environment file and the build of GEOS-Chem.
- this includes information about your environment file and the build of GEOS-Chem.
- all important information and errors are written to `boundary_conditions.log`.
2. Run `sbatch -p huce_cascade run_boundary_conditions.sh`.
- GEOS-Chem will be run first (2.0 x 2.5, GEOS-FP, CH4, 47 L, daily restart files).
@@ -28,7 +29,7 @@
## Directions for doing this operationally at Harvard
1. Run from `20180401` until the last day you have both satellite data and met fields.
- example: `startDate: 20180401`, `endDate: 20230531`.
2. **Before deleting your `workDir`**, in `workDir/gc_run/Restarts/`, copy the restart file from > 15 days before your the next day you will need boundary conditions to a persistent storage location.
2. **Before deleting your `workDir`**, in `workDir/gc_run/Restarts/`, copy the restart file from > 15 days before the next day you will need boundary conditions to a persistent storage location.
- example: `GEOSChem.Restart.20230430_0000z.nc4`.
3. When the satellite data and met fields become available, generate boundary conditions up until your new end date, but start with a little overlap to check for consistentcy.
3. When the satellite data and met fields become available, generate boundary conditions up until your new end date, but start with a little overlap to check for consistency.
- example: `startDate: 20230515_0000z.nc4`, `endDate: 20230630_0000z.nc4` (using `GEOSChem.Restart.20230430_0000z.nc4`), then check consistency between your new and previously generated boundary conditions for `20230515` until `20230531`.
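The restart-date rule described in the updated README (subtract 30 days from `startDate`, but never start before 1 April 2018) can be sketched in Python as below. This is an illustration only: the function names are hypothetical, and the `20180501` cutoff is taken from the date check in `run_boundary_conditions.sh` in this same commit.

```python
from datetime import datetime, timedelta

FIRST_DAY = "20180401"  # earliest GEOS-Chem start date used by the workflow


def gc_start_date(start_date, lookback_days=30):
    """Return the YYYYMMDD date GEOS-Chem should start from for a given startDate."""
    # YYYYMMDD strings sort chronologically, so a plain string comparison is safe
    if start_date < "20180501":
        return FIRST_DAY
    shifted = datetime.strptime(start_date, "%Y%m%d") - timedelta(days=lookback_days)
    return shifted.strftime("%Y%m%d")


def restart_file_name(start_date):
    """Name of the restart file that a run with this startDate would need."""
    return f"GEOSChem.Restart.{gc_start_date(start_date)}_0000z.nc4"
```

For `startDate: 20230601` this gives `GEOSChem.Restart.20230502_0000z.nc4`, matching the README example.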
11 changes: 6 additions & 5 deletions src/write_BCs/config_boundary_conditions.yml
@@ -1,17 +1,18 @@
## Date range (inclusive) that you want to generate BCs for
startDate: "20180401"
endDate: "20231231"
endDate: "20240229"

## A directory with a lot of space where you can write the BCs (and intermediate files)
workDir: "/n/holylfs05/LABS/jacob_lab/Users/lestrada/IMI/BCs/v2024-01-temp-generation/IMI_BCs_14.2.3_v2024-01-temp"
workDir: "/n/holylfs05/LABS/jacob_lab/Users/nbalasus/v2024-06-IMI-BCs"

## Paths to TROPOMI data and blended TROPOMI+GOSAT data
tropomiDir: "/n/holylfs05/LABS/jacob_lab/Lab/imi/ch4/tropomi"
blendedDir: "/n/holylfs05/LABS/jacob_lab/Lab/imi/ch4/blended"
tropomiDir: "/n/holylfs05/LABS/jacob_lab/Everyone/imi/ch4/tropomi"
blendedDir: "/n/holylfs05/LABS/jacob_lab/Everyone/imi/ch4/blended"

## Conda environment to use, GEOS-Chem environment to use, GEOS-Chem input data path, and partitions to run on
condaEnv: imi_env
geosChemEnv: "/n/home03/lestrada/holylfs/IMI/BCs/v2024-01-temp-generation/integrated_methane_inversion/envs/Harvard-Cannon/gcclassic.rocky+gnu12.minimal.env"
condaFile: ~/.bashrc
geosChemEnv: "../../envs/Harvard-Cannon/gcclassic.rocky+gnu12.minimal.env"
geosChemDataPath: "/n/holyscratch01/external_repos/GEOS-CHEM/gcgrid/gcdata/ExtData"
partition: huce_cascade,seas_compute,huce_ice

66 changes: 32 additions & 34 deletions src/write_BCs/run_boundary_conditions.sh
@@ -7,7 +7,12 @@
cwd="$(pwd)"

# Read in the config file and source the environment file
eval $(python ../utilities/parse_yaml.py config_boundary_conditions.yml)
condaEnv=$(grep -Po 'condaEnv:\s*\K.*' config_boundary_conditions.yml)
condaFile=$(grep -Po 'condaFile:\s*\K.*' config_boundary_conditions.yml)
condaFile=$(eval echo "$condaFile")
source ${condaFile}
conda activate ${condaEnv}
eval $(python ../../src/utilities/parse_yaml.py config_boundary_conditions.yml)
source ${geosChemEnv}
echo "Environment file --> ${geosChemEnv}" >> "${cwd}/boundary_conditions.log"

@@ -26,43 +31,46 @@ mkdir -p "${workDir}/tropomi-boundary-conditions"
mkdir -p "${workDir}/blended-boundary-conditions"
cd "${workDir}"

# Get GCClassic v14.2.3 and create the run directory
# Get GCClassic v14.4.0 and create the run directory
git clone https://github.com/geoschem/GCClassic.git
cd GCClassic
git checkout 14.2.3
git checkout 14.4.0
git submodule update --init --recursive
cd run
runDir="gc_run"
c="3\n2\n2\n2\n${workDir}\n${runDir}\nn\n" # CH4, GEOS-FP, 2.0 x 2.5, 47L
c="9\n2\n2\n2\n${workDir}\n${runDir}\nn\n" # CH4, GEOS-FP, 2.0 x 2.5, 47L
printf ${c} | ./createRunDir.sh
cd "${workDir}/${runDir}/build"
cmake ../CodeDir -DRUNDIR=..
make -j
make install
cd "${workDir}/${runDir}"

# Modify HISTORY.rc (hourly instantaneous CH4/pressure and 3-hourly BCs)
# Modify HISTORY.rc (hourly instantaneous CH4/pressure/air and 3-hourly BCs)
sed -i -e "s|'CH4',|#'CH4',|g" \
-e "s|'Metrics',|#'Metrics',|g" \
-e "s|'StateMet',|#'StateMet',|g" \
-e "s|#'LevelEdgeDiags',|'LevelEdgeDiags',|g" \
-e "s|Restart.frequency: 'End',|Restart.frequency: '00000001 000000',|g" \
-e "s|Restart.duration: 'End',|Restart.duration: '00000001 000000',|g" \
-e "s|SpeciesConc.frequency: 00000100 000000|SpeciesConc.frequency: 00000000 010000|g" \
-e "s|SpeciesConc.duration: 00000100 000000|SpeciesConc.duration: 00000000 010000|g" \
-e "s|SpeciesConc.mode: 'time-averaged'|SpeciesConc.mode: 'instantaneous'|g" \
-e "s|'SpeciesConcMND_?ALL? ',|#'SpeciesConcMND_?ALL? ',|g" \
-e "s|LevelEdgeDiags.frequency: 00000100 000000|LevelEdgeDiags.frequency: 00000000 010000|g" \
-e "s|LevelEdgeDiags.duration: 00000100 000000|LevelEdgeDiags.duration: 00000000 010000|g" \
-e "s|LevelEdgeDiags.mode: 'time-averaged'|LevelEdgeDiags.mode: 'instantaneous'|g" \
-e "s|#'BoundaryConditions',|'BoundaryConditions',|g" HISTORY.rc
-e "s|00000100 000000|00000000 010000|g" \
-e "s|time-averaged|instantaneous|g" \
-e "s|Met_CMFMC|Met_PEDGE|g" \
-e "s|#'BoundaryConditions',|'BoundaryConditions',|g" \
-e "s|'Met_AD ',|'Met_AIRVOL ',|g" HISTORY.rc

# Remove unnecessary StateMet and then LevelEdge variables
sed -i '269,344d' HISTORY.rc
sed -i '199,204d' HISTORY.rc

# Modify HEMCO_Config.rc so that GEOS-Chem can run into 2024
sed -i '/GFED4/s/ RF/ C/g' HEMCO_Config.rc

# Modify geoschem_config.yml
# - run GC earlier than you want BCs to accommodate a 15 day average going back in time
# - e.g., the BCs for 15 May 2023 require 1 May 2023-15 May 2023 data
# - run GC earlier than you want BCs to accommodate a 15/30 day average going back in time
# - e.g., the BCs for 15 May 2023 require 1 May 2023-15 May 2023 data (and sometimes 16 April 2023-15 May 2023)
# - run one day earlier than that because GEOS-Chem won't write SpeciesConc/LevelEdgeDiag for t = 0
if [[ ${startDate} -ge "20180416" ]]; then
gcStartDate=$(date -d "$startDate -15 days" +%Y%m%d)
if [[ ${startDate} -ge "20180501" ]]; then
gcStartDate=$(date -d "$startDate -30 days" +%Y%m%d)
else
gcStartDate="20180401"
fi
@@ -92,33 +100,23 @@ else
fi
fi

# Remove debug log file if not debug (everything written via >> debug.log 2>&1)
# Remove debug log file if not debug (everything written via >> debug.log 2>&1)
if ! ${debug}; then
rm "${cwd}/debug.log"
fi

# Modify HEMCO_Config.rc to allow for post-2022 GFED4
sed -i -e "s|DM_TEMP 1997-2022/1-12/01/0 RF|DM_TEMP 1997-2022/1-12/01/0 C|g" \
-e "s|DM_AGRI 1997-2022/1-12/01/0 RF|DM_AGRI 1997-2022/1-12/01/0 C|g" \
-e "s|DM_DEFO 1997-2022/1-12/01/0 RF|DM_DEFO 1997-2022/1-12/01/0 C|g" \
-e "s|DM_BORF 1997-2022/1-12/01/0 RF|DM_BORF 1997-2022/1-12/01/0 C|g" \
-e "s|DM_PEAT 1997-2022/1-12/01/0 RF|DM_PEAT 1997-2022/1-12/01/0 C|g" \
-e "s|DM_SAVA 1997-2022/1-12/01/0 RF|DM_SAVA 1997-2022/1-12/01/0 C|g" \
-e "s|GFED_FRACDAY 2003-2022/1-12/1-31/0 RF|GFED_FRACDAY 2003-2022/1-12/1-31/0 C|g" \
-e "s|GFED_FRAC3HR 2003-2022/1-12/1/0-23 RF|GFED_FRAC3HR 2003-2022/1-12/1/0-23 C|g" HEMCO_Config.rc

# Modify and submit the run script
cp runScriptSamples/operational_examples/harvard_cannon/geoschem.run .
sed -i -e "s|huce_intel,seas_compute,shared|${partition}|g" \
-e "s|--mem=15000|--mem=64000|g" \
sed -i -e "s|sapphire,huce_cascade,seas_compute,shared|${partition}|g" \
-e "s|--mem=15000|--mem=128000|g" \
-e "s|-t 0-12:00|-t 07-00:00|g"\
-e "s|-c 8|-c 24|g" geoschem.run
-e "s|-c 8|-c 48|g" geoschem.run
sbatch -W geoschem.run; wait;

# Write the boundary conditions using write_boundary_conditions.py
cd "${cwd}"
sbatch -W -J blended -o boundary_conditions.log --open-mode=append -p ${partition} -t 7-00:00 --mem 96000 -c 40 --wrap "source ~/.bashrc; conda activate $condaEnv; python write_boundary_conditions.py True $blendedDir $gcStartDate $gcEndDate"; wait; # run for Blended TROPOMI+GOSAT
sbatch -W -J tropomi -o boundary_conditions.log --open-mode=append -p ${partition} -t 7-00:00 --mem 96000 -c 40 --wrap "source ~/.bashrc; conda activate $condaEnv; python write_boundary_conditions.py False $tropomiDir $gcStartDate $gcEndDate"; wait; # run for TROPOMI data
sbatch -W -J blended -o boundary_conditions.log --open-mode=append -p ${partition} -t 7-00:00 --mem 96000 -c 40 --wrap "source $condaFile; conda activate $condaEnv; python write_boundary_conditions.py True $blendedDir $gcStartDate $gcEndDate"; wait; # run for Blended TROPOMI+GOSAT
sbatch -W -J tropomi -o boundary_conditions.log --open-mode=append -p ${partition} -t 7-00:00 --mem 96000 -c 40 --wrap "source $condaFile; conda activate $condaEnv; python write_boundary_conditions.py False $tropomiDir $gcStartDate $gcEndDate"; wait; # run for TROPOMI data
echo "" >> "${cwd}/boundary_conditions.log"
echo "Blended TROPOMI+GOSAT boundary conditions --> ${workDir}/blended-boundary-conditions" >> "${cwd}/boundary_conditions.log"
echo "TROPOMI boundary conditions --> ${workDir}/tropomi-boundary-conditions" >> "${cwd}/boundary_conditions.log"
29 changes: 15 additions & 14 deletions src/write_BCs/write_boundary_conditions.py
@@ -36,9 +36,8 @@ def get_TROPOMI_times(filename):
def apply_tropomi_operator_to_one_tropomi_file(filename):

"""
Run apply_tropomi_operator from src/inversion_scripts/operators/TROPOMI_operator.py for a single TROPOMI file (then saves it to a pkl file)
Run apply_tropomi_operator from src/inversion_scripts/operators/TROPOMI_operator.py for a single TROPOMI file
Example input (str): S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624.nc
Example output: write the file config["workdir"]/step1/S5P_RPRO_L2__CH4____20220725T152751_20220725T170921_24775_03_020400_20230201T100624_GCtoTROPOMI.pkl
"""

result = apply_tropomi_operator(
@@ -137,28 +136,28 @@ def calculate_bias(daily_means):
bias = bias.rolling(lat=5, # five lat grid boxes (10 degrees)
lon=5, # five lon grid boxes (12.5 degrees)
center=True, # five boxes includes the one we are centered on
min_periods=25/2 # half of the grid cells have a value to not output NaN
min_periods=25/2 # half (13) of the grid cells have a value to not output NaN
).mean(skipna=True)

# Smooth temporally
bias_15 = bias.rolling(time=15, # average 15 days back in time (including the time we are centered on)
min_periods=1, # only one of the time values must have a value to not output NaN
).mean(skipna=True)
min_periods=1, # only one of the time values must have a value to not output NaN
).mean(skipna=True)

bias_30 = bias.rolling(time=30, # average 30 days back in time (including the time we are centered on)
min_periods=1, # only one of the time values must have a value to not output NaN
).mean(skipna=True)
min_periods=1, # only one of the time values must have a value to not output NaN
).mean(skipna=True)

bias = bias_15.fillna(bias_30) # fill in NaN values with the 30 day average
bias = bias_15.fillna(bias_30) # fill in NaN values with the 30 day average

# Create a dataarray with latitudinal average for each time step
# We will fill the NaN values in bias with these averages
nan_value_filler_2d = bias.copy()
nan_value_filler_2d = (nan_value_filler_2d.where(nan_value_filler_2d.count("lon") >= 30) # there needs to be 30 grid boxes
.mean(dim=["lon"], skipna=True) # at this lat to define a mean
.interpolate_na(dim="lat", method="nearest") # fill in "middle" NaN values
.bfill(dim="lat") # fill in NaN values towards -90 deg
.ffill(dim="lat") # fill in NaN values towards +90 deg
nan_value_filler_2d = (nan_value_filler_2d.where(nan_value_filler_2d.count("lon") >= 15) # there needs to be 15 grid boxes
.mean(dim=["lon"], skipna=True) # at this lat to define a mean
.interpolate_na(dim="lat", method="linear") # fill in "middle" NaN values
.bfill(dim="lat") # fill in NaN values towards -90 deg
.ffill(dim="lat") # fill in NaN values towards +90 deg
)

# Expand to 3 dimensions
@@ -195,6 +194,7 @@ def write_bias_corrected_files(bias):
& (np.datetime64(re.search(r'(\d{8})_(\d{4}z)', f).group(1)) <= np.datetime64(config["endDate"]))
)
]
assert len(files) == len(strdate), "ERROR -> bias dimension is not the same as number of boundary condition files"

# For each file, remove the total column bias from each level of the GEOS-Chem boundary condition
for filename in files:
@@ -239,8 +239,9 @@ def write_bias_corrected_files(bias):
(2) Make a gridded (2.0 x 2.5 x daily) field of the bias between TROPOMI and GEOS-Chem
- subtract the TROPOMI and GEOS-Chem grids from part 1 to get a starting point for the bias
- smooth this field spatially (5 lon grid boxes, 5 lat grid boxes) then temporally (15 days backwards)
- use a temporal smoothing of 30 days back in time if there are no data in the past 15 days
- fill NaN values with the latitudinal average at that time
- for a latitudinal average to be defined, there must be >= 30 grid cells at that latitude
- for a latitudinal average to be defined, there must be >= 15 grid cells at that latitude
- when a latitudinal average cannot be found, the closest latitudinal average is used
(3) Write the boundary conditions
- using the bias from Part 2, subtract the (GC-TROPOMI) bias from the GC boundary conditions
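The temporal smoothing in part (2), a 15-day backward average that falls back to a 30-day backward average when the shorter window has no data, can be illustrated for a single grid cell with plain Python. This is a simplified sketch, not the script's implementation: the real code uses xarray `rolling(...).mean()` and `fillna` on a gridded field, and the function names here are hypothetical.

```python
import math


def trailing_mean(values, window):
    """Mean over the `window` entries ending at each index, ignoring NaN.

    Returns NaN where the window holds no valid value, which mimics a
    backward-looking rolling mean with min_periods=1.
    """
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if not math.isnan(v)]
        out.append(sum(chunk) / len(chunk) if chunk else math.nan)
    return out


def smooth_with_fallback(values, short=15, long=30):
    """Use the short-window mean, filling its NaN entries from the long-window mean."""
    short_mean = trailing_mean(values, short)
    long_mean = trailing_mean(values, long)
    return [l if math.isnan(s) else s for s, l in zip(short_mean, long_mean)]


# One observation on day 0 followed by 34 days without data
series = [2.0] + [math.nan] * 34
smoothed = smooth_with_fallback(series)
```

With this input, the 15-day window stops seeing the day-0 value after index 14, the 30-day fallback carries it through index 29, and later days stay NaN; in the script, such remaining gaps are the ones filled by the latitudinal average.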
