Which is the current status of the tool? #7

Open
josiasritter opened this issue Jul 25, 2022 · 75 comments

Comments

@josiasritter

Dear Developers,

I would like to use this tool to calibrate my Lisflood model setup for the Mekong Basin. From the Readme and the Lisflood homepage, it is unclear to me what the current version of the tool is and what its requirements are.
When I clone the repository, I see that some work has been done to migrate the code to Python 3, which is great! Is the migration close to being finished? Or will development still take time, so that I should use the original code with Python 2.7 in the meantime?

A quick update on the best way to use the tool at the moment would be hugely appreciated.

Cheers and thanks for sharing this resource!

Josias

@doc78
Collaborator

doc78 commented Jul 26, 2022

Dear Josias

The Lisflood calibration tool is now compatible with Python 3.7. Unfortunately, the Readme file is outdated.
The documentation will be updated as soon as possible. In the meantime, if you have any questions, please write to us and we will try to help you use the tool.

Cheers
Carlo

@josiasritter
Author

Thanks for the quick reply, Carlo! Great to hear the tool is already fully available in Python 3.

Luckily the code is quite well commented (thanks!), so I have been able to prepare all input and settings files, and I have advanced up to the start of the actual calibration (CAL_7_PERFORM_CAL.py). First, a couple of questions to make sure I have been on the right track so far:

  • CAL_4_EXTRACT_STATION.py: Am I right to assume that this script is not needed if I want multiple stations to be included in the calibration?
  • CAL_6_CUT_MAPS_list.py: Does this script cut only the static Lisflood layers, or should the climate forcings also be placed in the input directory to be cut here?

When I now try to run the calibration (CAL_7_PERFORM_CAL.py), I immediately get the error below at the first calibration station; the code then advances to the next station and gets stuck, running without progress:

Calling "qsub -l nodes=1:ppn=32 -q long -N LF_cal_on_10501 /Users/ritterj1/PythonProjects/lisflood-calibration/scripts/runLF_on_10501.sh"
sh: qsub: command not found

I understand that my system is missing some software that provides qsub for queuing jobs. Do I need this software if I am running the calibration locally? And if so, how do I install it? I am running the calibration on macOS Catalina with 8 cores.

Thanks a lot for your help!

Josias

@doc78
Collaborator

doc78 commented Aug 4, 2022

Dear Josias

CAL_4_EXTRACT_STATION.py performs some checks and generates the station_data.csv file and the observations file for each catchment in its dedicated "station" directory. You will then need to call the scripts CAL_7A_CALIBRATION.py and CAL_7B_LONGTERM_RUN.py on each single catchment, OR the script CAL_7_PERFORM_CAL.py on a list of catchments.

CAL_6_CUT_MAPS_list.py: you should also include the climate forcings in the same input directory to be cut.

The "qsub" command is used to submit a job on our cluster of nodes, but you don't need it. You can just run the generated script "/Users/ritterj1/PythonProjects/lisflood-calibration/scripts/runLF_on_10501.sh".

Hope this helps
Cheers
Carlo
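
For a local run without a scheduler, the generated job script can be executed directly. The sketch below is one way to do that from Python. It assumes the generated file is a plain shell script that `sh` can run; the helper name `run_job_script` is made up for this illustration and is not part of the calibration tool.

```python
import subprocess


def run_job_script(script_path, shell="sh"):
    """Run a generated job script directly instead of submitting it with qsub.

    Returns the exit code of the script (0 means success).
    """
    # Equivalent to typing `sh <script_path>` in a terminal
    completed = subprocess.run([shell, script_path])
    return completed.returncode
```

For the script named in the error message above, the call would be `run_job_script("/Users/ritterj1/PythonProjects/lisflood-calibration/scripts/runLF_on_10501.sh")`.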

@EdgarEspitia

EdgarEspitia commented Aug 10, 2022

Dear Carlo @doc78,

I am trying to use the calibration tool, but I got confused by the names of the Python scripts in the documentation and in release 0.2. After reading the comments above, I understood that the readme is outdated, so I decided to follow the documentation with the current Python scripts, for example using the script CAL_1_FILTER_STATIONS.py while following the explanation of CAL_1_CAL_VAL_PERIODS.py in the documentation, but I got some errors:

CAL_1_FILTER_STATIONS.py [-h] settings_file stations_csv stations_type
CAL_1_FILTER_STATIONS.py: error: the following arguments are required: stations_csv, stations_type

I would appreciate some guidelines for calibration.
Finally, I have some questions. Which settings .txt file is required, and should it follow a template? Should the LISFLOODSettings file be the same one used to initialize LISFLOOD? runLF_linux_cut.sh depends on the system we are running on, so should I rewrite it?
Is it required to have LISFLOOD installed in the Python virtual environment used for calibration?

Thank you very much!
Edgar

@doc78
Collaborator

doc78 commented Aug 10, 2022

Dear Edgar

CAL_1_FILTER_STATIONS.py takes 3 arguments:

  1. settings_file: you can find settings file templates in the "integration" folder (setting.txt, settings_fast.txt and settings_slow.txt)
  2. stations_csv: this is the csv file "stations.csv" in the "integration" folder. You can add rows for new stations and fill in the required info in the columns of the csv file
  3. stations_type: this is the filter applied to the list of stations in the stations.csv file. Only the stations that have the "stations_type" value in the "EC_calib" column will be processed.
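
To make the third argument concrete, here is a minimal sketch of that kind of filter. It is not the tool's own code: it assumes the filter is a plain equality test on the "EC_calib" column, and the "ObsID" column name and the sample values are hypothetical.

```python
import csv
import io


def filter_stations(csv_text, stations_type, column="EC_calib"):
    """Return the rows whose `column` value equals `stations_type`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare as text so that 1 and "1" select the same rows
    return [row for row in reader if row[column].strip() == str(stations_type)]


# Hypothetical stations.csv content with two of three stations flagged as type 1
sample = "ObsID,EC_calib\n1,1\n2,0\n3,1\n"
selected = filter_stations(sample, 1)
print([row["ObsID"] for row in selected])
```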

You can find some example LisfloodSettings xml files in the "integration" folder. The structure of the xml is the same as the one used by Lisflood, so you can get info from the Lisflood 4.0.0 documentation (which is up to date). The calibration tool uses this file to tell the DEAP algorithm which modules should be used and the map paths (or fixed values) for the variables.

Yes, you need to install the lisflood package in your conda environment to use the calibration tool. You can install it with the command "pip install lisflood-model", or compile it from source and then install it with the command "pip install ." in the lisflood-code folder.

runLF_linux_cut.sh is an old script. You should instead use the CAL_4 and CAL_7 scripts in the current master branch to execute the cutmaps and the calibration.

The readme will be updated as soon as possible with further details on all the new steps. In the meantime I hope this helps.

Best Regards
Carlo

@EdgarEspitia

Dear Carlo,

Thank you for your explanation, and quick reply.

I am still getting one error in CAL_1_FILTER_STATIONS.py: KeyError: 'Spinup_days'

How is Spinup_days defined? Is it defined in the station.csv file? I suppose that CAL_TYPE and Min_calib_days will also generate errors if they are not defined in the station.csv file.

Best regards,
Edgar

@doc78
Collaborator

doc78 commented Aug 15, 2022

Yes, you are right, the station.csv file needs to be updated as well. You should add the Spinup_days column to the station.csv file, as well as Min_calib_days. Furthermore, you should update the column "CAL_TYPE" with the values "6" and "24" instead of the current "NRT_6h" and "HIST_24h" values.
Spinup days are used to shift the calibration start date forward from the forcing start date, as you can see in liscal/station.py at line 78:

# A calibration requires a spinup
# first valid observation point will be at forcing start + spinup

Min_calib_days is used to exclude stations that do not have enough days of observations for the calibration.
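
The two columns can be illustrated with a short sketch. This is not the code in liscal/station.py: the function names are made up, and counting distinct days with observations after the spinup is an assumption about how "enough days" is measured.

```python
from datetime import datetime, timedelta


def first_valid_observation(forcing_start, spinup_days):
    """Forcing start shifted forward by the spinup period."""
    return forcing_start + timedelta(days=spinup_days)


def has_enough_observations(observation_times, forcing_start, spinup_days, min_calib_days):
    """True if at least `min_calib_days` distinct days are observed after spinup."""
    start = first_valid_observation(forcing_start, spinup_days)
    # Observations that fall inside the spinup period are ignored
    usable_days = {t.date() for t in observation_times if t >= start}
    return len(usable_days) >= min_calib_days
```

With this sketch, a station whose record ends shortly after the spinup period would be excluded, which matches the purpose of Min_calib_days described above.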

Best Regards
Carlo

@josiasritter
Author

Dear @EdgarEspitia,

By browsing through the code and looking at the input requirements of the scripts, I prepared my station.csv file as shown in the screenshot below. With this station.csv file, all the preprocessing scripts worked (up to and including CAL_6_CUT_MAPS_list.py). Hope it helps!

Josias

Screenshot 2022-08-15 at 17 13 47

@EdgarEspitia

EdgarEspitia commented Aug 16, 2022

Dear @josiasritter and @doc78

Thank you very much for your help!

Now, in the script CAL_3_PREP_MISC.py, I got the error:

=================== START ===================
>> Reading stations data csv file...
>> Make map with station locations (outlet.map)...
col2map version: 4.3.3 (linux/x86_64)
nr. of records read: 1
nr. of records with mv value: 0
nr. of records with mv (x,y): 0
nr. of records outside map: 1
nr. of cells with mv: 34398
nr. of cells with more than one record: 0
nr. of cells with majority conflict: 0
>> Check for station conflicts...
pcrcalc version: 4.3.3 (linux/x86_64)
map2col version: 4.3.3 (linux/x86_64)
Station ID 1234 not found in outlet.map! Is there another station with the same location?
Number of station location conflicts: 1
Fix these! Enter 'c' to continue for now
Traceback (most recent call last):
  File "/my_system/lisflood-calibration/CAL_3_PREP_MISC.py", line 96, in <module>
    raise Exception("ERROR")
Exception: ERROR

I think the error is related to the coordinates of the outlet. Am I right?
I would appreciate any suggestion.

Best regards,
Edgar

@EdgarEspitia

EdgarEspitia commented Aug 16, 2022

I am looking into what is wrong with the maps or coordinates.
I think the message "nr. of records outside map: 1" is related to the coordinates in ldd.map, so I checked the coordinates in the map and CSV files (all maps are in the WGS84 system), but everything seems to be right. The only map I suspect is ldd.map, because it was converted from NetCDF to the .map format using the nc2pcr command of lisflood-utilities.

nc2pcr takes 3 parameters (nc2pcr -i /path/to/input/ldd.nc -o /path/to/output/ldd.map -c /path/to/clone.map -l), but I am not sure what clone.map is, so I ran it without this parameter.

@doc78
Collaborator

doc78 commented Aug 17, 2022

nc2pcr should work without clone.map in the latest lisflood-utilities, v0.12.19 (in this case, clone.map is only used to get the coordinates needed to generate the ldd.map file).
However, can you please check whether the ldd coordinates are correct? You can plot your ldd map together with tmp.map (in your temp folder) and gauges.map (in your output folder) and check which map is not correct. ldd.map should cover the station location, tmp.map should have the value 1 at the coordinates of the station, and gauges.map should have the ID of the station at the correct station coordinates. You can also check whether tmp.txt (in the temp folder) contains the correct coordinates and ID of the station.

@EdgarEspitia

I found the source of the errors. There were two mistakes: first, the coordinates were not the same in all CSV files; second, the station ID and the calibration ID were both too long (more than 8 numeric characters).

After solving these errors, new ones arose:

?:1:84:ERROR: RUNTIME function accuflux: Unsound ldd
?:1:168:ERROR: /my_system/lisflood-calibration/tests/my_catchment/temp/accuflux.map: File '/my_system/lisflood-calibration/tests/my_catchment/temp/accuflux.map': No such file or directory

Thanks!

@doc78
Collaborator

doc78 commented Aug 18, 2022

The new issue is related to the ldd map being "unsound", which means that not all the downstream paths end in a pit cell.
You can fix it with the "lddrepair" command, which generates a repaired map from your ldd map. You can find the documentation of the pcraster command here: https://pcraster.geo.uu.nl/pcraster/4.3.3/documentation/pcraster_manual/sphinx/op_lddrepair.html

@EdgarEspitia

EdgarEspitia commented Aug 22, 2022

Thanks for your help!
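The command lddrepair worked well. I adjusted the file names and ran the following script:

import pcraster as pcr
# Read the original (unsound) local drain direction map
ldd = pcr.readmap("ldd.map")
# Repair it so that every downstream path ends in a pit cell
result = pcr.lddrepair(ldd)
# Write the repaired map to a new file
pcr.report(result, "ldd_corrected.map")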

I continued with the next steps and ran CAL_6_CUT_MAPS_list.py for just one station, so I have a question: if I have multiple stations to calibrate, should I add one column with the stations and the observations?

I supposed that CAL_6_CUT_MAPS.py does the same as CAL_6_CUT_MAPS_list.py, but only for the chosen station. However, when I run it I get the error "input mask file". How can I check the outputs at this step?

When I tried to run CAL_7A_CALIBRATION.py, I got the error ModuleNotFoundError: No module named 'lisf1'. I solved it by adding the path to the file lisf1.py in lisflood-code.
Finally, I got the error CAL_7A_CALIBRATION.py: error: unrecognized arguments: 2 when I ran
CAL_7A_CALIBRATION.py /my_system/settings.txt 633 1 2
As I understand it, the parameters are the settings file, the ID of the station to calibrate, the number of CPUs, and a number used as the seed for the random number generator.

@doc78
Collaborator

doc78 commented Aug 29, 2022

Dear Edgar

CAL_6_CUT_MAPS_list.py: yes, you should just add the list of the stations in one column in a text file and use the text filename as a parameter of the python script:

python CAL_6_CUT_MAPS_list.py <settings_file> <path_maps> <list_of_stations.txt>

If you get the error "wrong input mask file", you should check the subcatchment_path in your settings_file and the mask.map file in your maps folder of the subcatchment_path. You need to provide a valid "mask.map" pcraster file to cut the maps.

CAL_7A_CALIBRATION.py: The seed is an optional parameter. You can just omit it or, if you want to use it, you should write the following command:

python CAL_7A_CALIBRATION.py /my_system/settings.txt 633 1 --seed=2
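
The "unrecognized arguments: 2" message is what Python's argparse reports when an extra positional value is passed to a parser that only defines the seed as an option. The sketch below reproduces that behaviour with a stand-in parser; the argument names are hypothetical and are not taken from CAL_7A_CALIBRATION.py.

```python
import argparse


def build_parser():
    """Stand-in parser: three positional arguments plus an optional --seed."""
    parser = argparse.ArgumentParser(description="Illustrative parser, not the tool's own")
    parser.add_argument("settings_file")
    parser.add_argument("station", type=int)
    parser.add_argument("n_cpus", type=int)
    # Optional arguments must be passed by name, e.g. --seed=2
    parser.add_argument("--seed", type=int, default=None)
    return parser


args = build_parser().parse_args(["/my_system/settings.txt", "633", "1", "--seed=2"])
print(args.station, args.n_cpus, args.seed)
```

Passing `2` as a fourth bare value instead of `--seed=2` makes this parser exit with the same "unrecognized arguments" error.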

@EdgarEspitia

EdgarEspitia commented Sep 13, 2022

I am still having problems, now related to the subcatchments. I got the following errors:

  File "/mysystem/lisflood-calibration/CAL_7A_CALIBRATION.py", line 66, in <module>
    subcatch = subcatchment.SubCatchment(cfg, obsid)
  File "/mysystem/lisflood-calibration/liscal/subcatchment.py", line 42, in __init__
    self.inflowflag, n_inflows = self.prepare_inflows(cfg)
  File "/mysystem/lisflood-calibration/liscal/subcatchment.py", line 81, in prepare_inflows
    upstream_catchments = [int(i) for i in stations_links.loc[self.obsid].values if not np.isnan(i)]

I also have some questions:

  • How should the subcatchment files (data and maps) be arranged?

  • What is the structure of the stations_links.csv file?
    For example:
    Ai=428, Aj=380
    ID,IDs of directly connected nested subcatchments,,,,,,,,,,,,,,,,,,,
    380,,,,,,,,,,,,,,,,,,,,
    428,380,,,,,,,,,,,,,,,,,,,
    image
    Is it right?

Thanks for your help!

@doc78
Collaborator

doc78 commented Sep 22, 2022

Dear Edgar,
Can you please send us the full error log? The error message itself is missing from the log you posted.
Thanks
Carlo

@EdgarEspitia

EdgarEspitia commented Sep 23, 2022

Dear Carlo,

I think the error is related to the paths in the settings file for the calibration, or in the settings-Run.xml that defines the outputs (lzavin.nc and dis.tss). The error happens because the calibration algorithm cannot read the outputs of lisflood, so I think the problem lies in the structure of the subcatchments (links and file structure) and the paths in the calibration settings file (for example: subcatchment_path, inlets, and interstation_regions). If you manually copy the outputs of the pre-run and the run (the files lzavin.nc, avgdis.nc, chanqWin.tss, and dis.tss) to the path reported in the error log, the calibration starts, but the error shows up again in the next iteration.

error_log.txt

Thanks for your help!

@EdgarEspitia

EdgarEspitia commented Sep 26, 2022

This is an example of the link structure

image

The stations_links.csv file looks like this:

6337350
6337340
6337610, 6337340, 6337350
6337330
6337320, 6337330, 6337610, 6337340, 6337350
6337310
6337512, 6337310 ,6337320, 6337330, 6337610, 6337340, 6337350
6337511, 6337512, 6337310 ,6337320, 6337330, 6337610, 6337340, 6337350

Is it right?

@doc78
Collaborator

doc78 commented Sep 26, 2022

Dear Edgar.
The stations_links.csv file seems OK to me.

I think the issue is in the xml file. You should check that pathOut and pathRoot are in the format:

<textvar name="PathOut" value="$(PathRoot)/out/%run_rand_id">

And variables are in the format:

<textvar name="DisTS" value="$(PathOut)/dis.tss">

The old style of these paths is instead:

<textvar name="PathOut" value="$(PathRoot)/out">
<textvar name="DisTS" value="$(PathOut)/dis%run_rand_id.tss">

The old format is not compatible with the new one. Please note that "%run_rand_id" is replaced at runtime by the generation number and the run number of the current calibration step, e.g. for generation "0", run nr. 1, you will get "0_1", so the output folder of this run will be:

/lisflood-calibration/tests/my_catchment/CATCHMENTS_DIR/633/out/0_1/
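
The substitution can be pictured with a few lines of Python. This is only a sketch of the idea, assuming a plain text replacement of the placeholder; the function name `render_settings` is hypothetical and the tool's own template code may work differently.

```python
def render_settings(template_text, generation, run):
    """Replace the %run_rand_id placeholder with '<generation>_<run>'."""
    run_rand_id = "{}_{}".format(generation, run)
    return template_text.replace("%run_rand_id", run_rand_id)


line = '<textvar name="PathOut" value="$(PathRoot)/out/%run_rand_id">'
print(render_settings(line, 0, 1))
```

With the new-style template each run gets its own output folder, whereas the old style put the identifier into the file name inside a shared folder.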

If you still have the issue or have any doubts, please send me your xml file and I will have a look at it.
Best Regards

Carlo

@EdgarEspitia

Dear Carlo,

I am still having issues running the script CAL_7A_CALIBRATION.py. I think there is something wrong in the Path section of settings.txt, specifically with: subcatchment_path, gauges_path, summary_path, interstation_regions, inlets

I would appreciate it if you have any comments on the attached setup.

Thanks for your help.

Best regards,
Edgar
error_log.txt
settup.zip

@doc78
Collaborator

doc78 commented Sep 28, 2022

Dear Edgar.
The error you get is due to the wrong format of stations_links.csv. You have:

ID IDs of directly connected nested subcatchments
508
509
200,100,508,509

While it should be:
ID, IDs of directly connected nested subcatchments,,
508,,,
509,,,
200,100,508,509

Please note that you need at least 3 commas, at least in the first row (the header), so that pandas knows it has to read 4 columns. Otherwise the pandas csv parser will fail.
The trick here is to add many commas to the first row, as in the stations_links file in the repository, to avoid issues if any catchment has a large number of inflows.
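
One way to avoid counting commas by hand is to pad every row to the width of the longest one. The helper below is a hypothetical convenience, not part of the calibration tool; it assumes one station per line with comma-separated IDs, and it also strips stray spaces and tabs around each field.

```python
def pad_links_rows(lines):
    """Pad all rows of a stations_links file to the same number of fields."""
    rows = [
        [field.strip() for field in line.split(",")]
        for line in lines
        if line.strip()  # skip empty lines
    ]
    width = max(len(row) for row in rows)
    # Append empty fields so that every row has `width` columns
    return [",".join(row + [""] * (width - len(row))) for row in rows]


broken = [
    "ID,IDs of directly connected nested subcatchments",
    "508",
    "509",
    "200,100,508,509",
]
for row in pad_links_rows(broken):
    print(row)
```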

Best Regards
Carlo

@EdgarEspitia

Dear Carlo,

I fixed stations_links.csv and ran all the calibration steps, but I am still having issues; see the error log below.

---------------------------------------
Gauge location 9.125 52.964
Upstream station(s):
Traceback (most recent call last):
  File "/my_system/lisflood-calibration/bin/CAL_7A_CALIBRATION.py", line 66, in <module>
    subcatch = subcatchment.SubCatchment(cfg, obsid)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 42, in __init__
    self.inflowflag, n_inflows = self.prepare_inflows(cfg)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 81, in prepare_inflows
    upstream_catchments = [int(i) for i in stations_links.loc[self.obsid].values if not np.isnan(i)]
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 81, in <listcomp>
    upstream_catchments = [int(i) for i in stations_links.loc[self.obsid].values if not np.isnan(i)]
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Thanks a lot!

Best regards,
Edgar

@doc78
Collaborator

doc78 commented Sep 29, 2022

Can you send me your updated csv file? Please note that catchments IDs should be just numbers, no letters nor special chars. Please check also for tabs or spaces.
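
A quick check for such characters can be scripted. The sketch below is a hypothetical helper that treats the first line as the header and flags every data field that is neither empty nor made of digits only, which covers letters, spaces and tabs.

```python
def find_bad_fields(lines):
    """Return (line_number, field) pairs for fields that are not empty or all digits."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if line_number == 1:
            continue  # the header row contains text by design
        # Remove only the line ending, so trailing tabs and spaces stay visible
        for field in line.rstrip("\r\n").split(","):
            if field != "" and not field.isdigit():
                problems.append((line_number, field))
    return problems
```

It can be applied to a file with `find_bad_fields(open("stations_links.csv").readlines())`; an empty list means no suspicious fields were found.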

@EdgarEspitia

You are right, there was a tab at the end of the lines in the file stations_links.csv. I deleted it and ran the script again, but now the problem is related to the subcatchment inflows. See the error log below.

---------------------------------------
Gauge location 9.125 52.964
Upstream station(s):
Retrieving inflow for subcatchment 100
Traceback (most recent call last):
  File "/my_system/lisflood-calibration/bin/CAL_7A_CALIBRATION.py", line 66, in <module>
    subcatch = subcatchment.SubCatchment(cfg, obsid)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 42, in __init__
    self.inflowflag, n_inflows = self.prepare_inflows(cfg)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 93, in prepare_inflows
    raise Exception("ERROR: Missing " + Qsim_tss)
Exception: ERROR: Missing /my_system/lisflood-calibration/my_catchment/data/subcatchments/100/out/chanq_simulated_best.tss

@doc78
Collaborator

doc78 commented Sep 29, 2022

OK, this is fine. You just have to calibrate catchment "100" before "200", since "200" needs the "100" time series (as you wrote 200,100,508,509 in your stations_links.csv file).
Once you have calibrated "100", you will get the chanq_simulated_best.tss file in the 100/out folder, and you will be able to go on with the calibration.

@EdgarEspitia

So that means we should follow the river network and run the calibration scripts station by station: first stations 508 or 509, then 100, and finally 200.

image

@doc78
Collaborator

doc78 commented Sep 29, 2022

Yes, correct. There is also a script, "CAL_7_PERFORM_CAL.py", that runs all catchments automatically in the correct order. It takes the full or a partial list of catchments as input, but it is built to submit jobs on our cluster system, so you need to make some changes to use it on your system.
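
For a local setup, the upstream-first order can also be derived from the station links with a small depth-first traversal. This is a sketch under the assumption that the links are available as a dictionary mapping each station to its upstream stations; it is not the ordering logic of CAL_7_PERFORM_CAL.py.

```python
def calibration_order(links):
    """Return the stations so that every upstream station comes before its downstream one."""
    order, visiting, done = [], set(), set()

    def visit(station):
        if station in done:
            return
        if station in visiting:
            raise ValueError("cycle detected at station {}".format(station))
        visiting.add(station)
        # Calibrate everything upstream first
        for upstream in links.get(station, []):
            visit(upstream)
        visiting.discard(station)
        done.add(station)
        order.append(station)

    for station in sorted(links):
        visit(station)
    return order


# Links from the example discussed in this thread
links = {508: [], 509: [], 100: [508], 200: [100, 508, 509]}
print(calibration_order(links))
```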

@EdgarEspitia

I have tried, but I got an error. Could it be related to the general inflow.map? What are the characteristics of the general inflow.map?

---------------------------------------
Gauge location 9.72 51.004
Upstream station(s):
No upstream inflow needed

Found 0 inflows
Traceback (most recent call last):
  File "/my_system/lisflood-calibration/bin/CAL_7A_CALIBRATION.py", line 66, in <module>
    subcatch = subcatchment.SubCatchment(cfg, obsid)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 44, in __init__
    self.resample_inflows(cfg)
  File "/my_system/lisflood-calibration/bin/liscal/subcatchment.py", line 59, in resample_inflows
    raise FileNotFoundError('inflow map missing: {}'.format(subcatchinlets_map))
FileNotFoundError: inflow map missing: /my_system/lisflood-calibration/my_catchment/data/subcatchments/508/inflow/inflow.map

@doc78
Collaborator

doc78 commented Sep 29, 2022

I think the general inflow.map is just missing. You should copy it to /my_system/lisflood-calibration/my_catchment/data/subcatchments/508/inflow/inflow.map
It is needed to generate a new file, "inflow_cut.map", in the same folder, which is cut out of the general inflow map using the masksmall.map in the maps folder.
Additional info on inflow map are here:
https://ec-jrc.github.io/lisflood-model/3_09_optLISFLOOD_inflow-hydrograph/

@doc78
Collaborator

doc78 commented Oct 7, 2022

I see in your setting file another issue. You have:

<setoption choice="0" name="InitLisflood"/>

while it should be:

<setoption choice="%InitLisflood" name="InitLisflood"/>

This is because the calibration generates the initialization (prerun) settings files by updating this variable.
Please use this settings file as a reference to fix all the issues in yours:

https://github.com/ec-jrc/lisflood-calibration/blob/master/integration/settings_EFAS5.xml

This is a working settings file, so you should keep all of its variables and its style in yours, changing only the options, paths and filenames of your maps.

@EdgarEspitia

Dear Carlo,

I fixed the issues with the variable names, and the calibration script now works for the stations that are not nested. I tried to run it for station 100 (see the figure), but I got some errors (see the error log file).

River network
image

error_log.txt

@doc78
Collaborator

doc78 commented Oct 17, 2022

Is this just because the row for the catchment '100' is missing in the stations_links file? It should be 100,508

@EdgarEspitia

Thanks! It works fine.

Do you have any advice for choosing the parameters of the optimization algorithm?

For example:

[DEAP]
numcpus = NCPUS	# 
min_gen =  5		# number of MIN generation to run  
max_gen = 16		# number of MAX generation to run  
mu = 18			# initial population  
lambda_ = 56		# size of generation of offsprings
pop = 15 		# initial population
gen_offset = 3		# 
effmax_tol = 0.003	# 

# NOTES:
# numcpus    = ?
# min_gen    = Run at least this number of generations
# max_gen    = Maximum number of generations to run, used as failsafe stop criterion. Takes precedence over min_gen (handy for fast debugging)
# mu         = # of best children chosen to feed into the next generation (JRC decided to make it 2x calibration parameters)
#              This must always be at least 2, otherwise DEAP cannot crossover-mutate the children
# lambda_    = # of children spawned at every generation (= # threads to run in parallel)
#              This must always be twice mu
# pop        = initial population
# gen_offset = ?
# effmax_tol = ?

@doc78
Collaborator

doc78 commented Oct 18, 2022

You should set these parameters according to the system on which you will run the calibration.
"numcpus" is the number of cores to use on your system.
The individuals (i.e. the "runs", or executions of lisflood simulations) of each generation are run in parallel, so if you have 32 cores on your system, you should set lambda_ to 32 or to a multiple of your number of CPU cores to make the best use of your resources (e.g. if you run on 8 cores, you can use lambda_ = 8, 16, 24 or 32).
If you set a low lambda_, you should increase the minimum and maximum number of generations so that the DEAP algorithm gets enough data: in GloFAS we set min_gen=8 and max_gen=24 with lambda_=32. With a lower lambda_ (e.g. 16), you should use min_gen=16 and max_gen=48, because each generation will produce half as many individuals; this way you still end up with enough lisflood runs to find the best parameters for your calibration.
Apart from this resource optimization, mu is the number of individuals to use in the next generation, so it should be at least 2 and is usually set to lambda_/2. "pop" is the population of the first generation; it is usually set to 2*lambda_ or more, so that the DEAP algorithm can start from the first generation and select the best simulations for the next iterations.
"gen_offset" and "effmax_tol" are used to check whether the KGE is still improving during the calibration: the termination criterion is that the number of generations is greater than min_gen and the KGE difference over the latest "gen_offset" generations is less than effmax_tol. If these requirements are not satisfied, the calibration continues to the next generation until max_gen generations are reached.

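The rules of thumb above can be written down as a small sketch. All three functions are illustrative and not part of the calibration tool. The generation scaling extrapolates from the single GloFAS example (min_gen=8, max_gen=24 at lambda_=32), and the stop test assumes that the "KGE difference" is the best KGE of the latest generation minus the best KGE from gen_offset generations earlier.

```python
import math


def suggest_population_sizes(num_cores, batches_per_generation=1):
    """lambda_ as a multiple of the cores, mu = lambda_/2 (at least 2), pop = 2*lambda_."""
    lambda_ = num_cores * batches_per_generation
    return {"lambda_": lambda_, "mu": max(2, lambda_ // 2), "pop": 2 * lambda_}


def scale_generations(lambda_, ref_lambda=32, ref_min_gen=8, ref_max_gen=24):
    """Keep the total number of runs roughly constant when lambda_ changes."""
    factor = ref_lambda / lambda_
    return math.ceil(ref_min_gen * factor), math.ceil(ref_max_gen * factor)


def should_stop(best_kge_per_generation, min_gen, max_gen, gen_offset, effmax_tol):
    """Stop at max_gen, or after min_gen once the KGE gain falls below effmax_tol."""
    n_gen = len(best_kge_per_generation)
    if n_gen >= max_gen:
        return True  # failsafe stop
    if n_gen <= min_gen or n_gen <= gen_offset:
        return False  # not enough generations yet
    improvement = best_kge_per_generation[-1] - best_kge_per_generation[-1 - gen_offset]
    return improvement < effmax_tol


print(suggest_population_sizes(8, batches_per_generation=4))
print(scale_generations(16))
```

On an 8-core machine this suggests running four batches of 8 simulations per generation, or fewer batches with proportionally more generations.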
@MehdiHosseinipour

Dear Developers,

I am trying to use the calibration tool. From the comments above, I understand that the readme is not up to date. Is the static data, including meteo variables, not defined for the calibration test?

Thank you,
Mahdi

@doc78
Collaborator

doc78 commented Apr 26, 2024

Dear @MehdiHosseinipour

What do you mean by "not defined for calibration test"? If you get an error, can you please share the details?
Thanks

Carlo

@MehdiHosseinipour

Dear @doc78

When I run the CAL_1_FILTER_STATIONS code, it cannot find the cfg.observed_discharges file and it gives the following error:

Found 14 calibration stations to check
Traceback (most recent call last):
  File "/usr/local/bin/CAL_1_FILTER_STATIONS.py", line 48, in <module>
    observed_data = pd.read_csv(cfg.observed_discharges, sep=",", index_col=0)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
    self.handles = get_handle(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/common.py", line 859, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'OBS'

@doc78
Collaborator

doc78 commented Apr 29, 2024

Dear @MehdiHosseinipour

We will release the updated documentation soon. In the meantime to solve your issue you should replace "OBS" with the filename and path to your observation file in the settings.txt file.
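
Leftover placeholders such as "OBS" can be caught before running the scripts. The sketch below assumes that settings.txt is an INI-style file with a [Stations] section containing the keys shown in the settings dumps in this thread (stations_data, stations_links, observed_discharges); the function name is hypothetical.

```python
import configparser
import os


def find_missing_files(settings_text, exists=os.path.exists):
    """Return (key, value) pairs from [Stations] that do not point to an existing file."""
    parser = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=("#",))
    parser.read_string(settings_text)
    missing = []
    for key in ("stations_data", "stations_links", "observed_discharges"):
        value = parser.get("Stations", key, fallback=None)
        # A template placeholder such as OBS is reported because no such file exists
        if value is None or not exists(value):
            missing.append((key, value))
    return missing
```

Calling it as `find_missing_files(open("settings.txt").read())` lists the entries that still need a real path.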

Cheers
Carlo

@Nooshdokht-Bayatafshary

Dear developers,

I'm currently working on calibrating my basin using your calibration codes. While running CAL_4_EXTRACT_STATION.py, I encountered the following error:

(lis3.8) python CAL_4_EXTRACT_STATION.py ../SettingFiles/settings.txt ../CSVs/Calibration/stations_code.txt
=================== START ===================
Settings:
- Main
  - forcing_start: 22/09/2015 00:00
  - forcing_end: 31/12/2022 23:00
  - timestep: 360
  - prerun_start: 01/01/2015 0:00
  - prerun_end: 22/09/2015 00:00
  - prerun_timestep: 1440
  - fast_debug: 0
  - min_obs_years: 3.5
- Stations
  - stations_data: ../CSVs/Calibration/stations_data.csv
  - stations_links: ../CSVs/Calibration/stations_links.csv
  - observed_discharges: ../CSVs/Calibration/observation.csv
- Path
  - param_ranges: TEMPLATES/param_ranges.csv
  - subcatchment_path: CATCHMENTS_DIR
  - summary_path: SUMMARY_DIR
- Templates
  - lisfloodsettings: ./settings_lisflood.xml
- DEAP
  - numcpus: NCPUS
  - min_gen: 6
  - max_gen: 16
  - mu: 18
  - lambda_: 36
  - pop: 72
  - gen_offset: 3
  - effmax_tol: 0.003
>> Reading stations_data file...
Traceback (most recent call last):
  File "CAL_4_EXTRACT_STATION.py", line 48, in <module>
    obsid = int(args.station)
ValueError: invalid literal for int() with base 10: '../CSVs/Calibration/stations_code.txt'

From previous responses in this issue, I gathered that to run the script for multiple stations, I need to create a text file listing the stations and then use this file as a parameter for the script. I've followed these instructions and created a file stations_code.txt containing the list of stations.

Could you please assist me in identifying what might be causing this error? Any guidance or insights you can provide would be greatly appreciated.

Thank you,
Nooshdokht

@doc78
Collaborator

doc78 commented May 2, 2024

Dear @Nooshdokht-Bayatafshary
For this script you should provide just the station ID as the argument, instead of the file with the list of stations. The script should be called once for each station in your file. For example, to process station "21105" you should run the following command:

(lis3.8) python CAL_4_EXTRACT_STATION.py ../SettingFiles/settings.txt 21105

Cheers
Carlo

@Nooshdokht-Bayatafshary

Dear @doc78,

Thank you for clarifying the usage of the script. Just to confirm, I understand that I need to run the script separately for each station, as exemplified by the command you provided for station "21105".

Considering I have around 50 stations, is there any way to streamline this process to avoid running the script individually for each station? Your guidance on a more efficient approach would be greatly appreciated.

Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented May 2, 2024

Dear @Nooshdokht-Bayatafshary
You can create a Python script for this. Please have a look at CAL_6_CUT_MAPS_list.py, which runs CAL_6_CUT_MAPS.py for each station listed in a text file, similar to what you are looking for.
Hope this helps.
Cheers
Carlo
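
A minimal sketch of such a wrapper is shown below. It is not taken from CAL_6_CUT_MAPS_list.py: it assumes a text file with one station ID per line and that CAL_4_EXTRACT_STATION.py is called as in the command above (settings file, then station ID). The function names are made up for the example.

```python
import subprocess
import sys


def read_station_ids(list_file):
    """Read one station ID per line, skipping empty lines."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_command(settings_file, station_id, script="CAL_4_EXTRACT_STATION.py"):
    """Command line for a single station, run with the current Python interpreter."""
    return [sys.executable, script, settings_file, station_id]


def extract_all_stations(settings_file, list_file, runner=subprocess.run):
    """Run the script once per station and return the IDs that failed."""
    failed = []
    for station_id in read_station_ids(list_file):
        result = runner(build_command(settings_file, station_id))
        if result.returncode != 0:
            failed.append(station_id)
    return failed
```

For the setup above, the call would be `extract_all_stations("../SettingFiles/settings.txt", "../CSVs/Calibration/stations_code.txt")`.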

@Nooshdokht-Bayatafshary

Nooshdokht-Bayatafshary commented May 8, 2024

Dear Developers,

Thank you for your earlier advice. To run CAL_6_CUT_MAPS_list.py or CAL_6_CUT_MAPS.py, I've realized that the climate forcing maps need to be in the input folder alongside the other maps. Since my maps are currently in two separate folders, would it be feasible to run the code twice — once for the static maps directory and once for the climate maps directory? I'm concerned whether this approach might lead to issues with subsequent code executions. Any insights on potential implications for future code would be greatly appreciated.

Additionally, I'm encountering an error when running CAL_6_CUT_MAPS.py to cut maps that include a time dimension (e.g., climate forcing maps, water use maps). The code throws the following error:

11:12:22 : creating... ../Karkheh/outcal/CATCHMENTS_DIR/21105/maps/thetas1_f.nc
11:12:22 : creating... ../Karkheh/outcal/CATCHMENTS_DIR/21105/maps/chan.nc
11:12:22 : creating... ../Karkheh/outcal/CATCHMENTS_DIR/21105/maps/ind.nc
Segmentation fault

Or for climate forcing maps:

(lis3.8) python CAL_6_CUT_MAPS.py ../SettingFiles/settings.txt ../Karkheh/meteo/ 21105
11:15:16 : creating... ../Karkheh/outcal/CATCHMENTS_DIR/21105/maps/ta_2022.nc
Segmentation fault

Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented May 8, 2024

Dear @Nooshdokht-Bayatafshary
You can run the script on separate folders, no problem. You will just need to set the correct paths in the XML file for the subsequent scripts.
Regarding the segmentation fault, we are aware of this issue and need to investigate it further: it does not seem to be related to a specific map but rather to memory and resource availability, and it happens randomly on different maps.
So my suggestion is: when you get this error, just run the script again on the same map and check whether it works.
Cheers
Carlo
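
The "run it again" advice can be automated with a small retry wrapper. This is only a sketch under the assumption that the crash is intermittent and that a non-zero exit code is a good enough failure signal; the function name and the default of three attempts are arbitrary choices, not part of the calibration tool.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with code 0 or max_attempts is reached.

    Returns (returncode, attempts). On POSIX a negative return code means the
    process was killed by a signal (e.g. -11 for a segmentation fault).
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return returncode, attempt
        print(f"attempt {attempt} exited with code {returncode}", file=sys.stderr)
    return returncode, max_attempts
```

For the command shown earlier in this thread, the call would be run_with_retries([sys.executable, "CAL_6_CUT_MAPS.py", "../SettingFiles/settings.txt", "../Karkheh/meteo/", "21105"]). If the same map fails on every attempt, the problem is probably not random and deserves a closer look.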

@Nooshdokht-Bayatafshary

Dear @doc78,
I encountered a FileNotFoundError while running the CAL_7A_CALIBRATION.py script. Below is the error traceback:

  File "CAL_7A_CALIBRATION.py", line 68, in <module>
    calibrate_subcatchment(cfg, obsid, subcatch)
  File "CAL_7A_CALIBRATION.py", line 26, in calibrate_subcatchment
    lis_template = templates.LisfloodSettingsTemplate(cfg, subcatch)
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/liscal/templates.py", line 16, in __init__
    with open(os.path.join('templates', cfg.lisflood_template), "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'templates/./settings_calibration.xml'

Here is my settings file (settings.txt):

[Main]
forcing_start = 01/01/2015 00:00
forcing_end = 31/12/2022 23:00
timestep = 360
prerun_start = 01/01/2015 0:00
prerun_end = 22/09/2015 00:00
prerun_timestep = 1440
fast_debug = 0
min_obs_years = 3.5

[Stations]
stations_data = ../Karkheh/CSVs/Calibration/stations_data.csv
stations_links = ../Karkheh/outcal/stations_links.csv
observed_discharges = ../Karkheh/CSVs/Calibration/observation.csv

[Path]
param_ranges = ../Karkheh/CSVs/Calibration/param_ranges.csv
subcatchment_path = ../Karkheh/outcal/CATCHMENTS_DIR
gauges_path = ../Karkheh/outcal/gauges.map
interstation_regions = ../Karkheh/outcal/interstation_regions.map
inlets = ../Karkheh/outcal/inlets.map
summary_path = ../Karkheh/outcal/SUMMARY_DIR

[Templates]
LISFLOODSettings = ./settings_calibration.xml

[DEAP]
numcpus = NCPUS
min_gen = 6
max_gen = 16
mu = 18
lambda_ = 36
pop = 72
gen_offset = 3
effmax_tol = 0.003

It seems the error occurs because the path to LISFLOODSettings is not being resolved correctly. The relevant line in templates.py is:
with open(os.path.join('templates', cfg.lisflood_template), "r") as f:
This joins 'templates' with cfg.lisflood_template, which implies that the settings_calibration.xml file should be located in a templates directory within the CAL_7A_CALIBRATION.py script directory.
Should I place the settings_calibration.xml file inside a folder named templates in the directory where the CAL_7A_CALIBRATION.py script is located? Or am I missing something else?

Thank you for your assistance.
Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented Jun 3, 2024

Dear @Nooshdokht-Bayatafshary

Yes, you will just need to put the template XML settings file into the "templates" folder.
Best Regards

Carlo
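
A small sketch for checking where the template is expected before starting a run. It mimics the os.path.join call from the traceback above; because that call uses a relative path, the "templates" folder is looked up relative to the current working directory, which is assumed here to be the folder containing CAL_7A_CALIBRATION.py. The helper names are hypothetical.

```python
import os


def template_path(lisflood_template, base_dir="."):
    """Mimic the join seen in the traceback: <base_dir>/templates/<LISFLOODSettings>."""
    return os.path.normpath(os.path.join(base_dir, "templates", lisflood_template))


def check_template(lisflood_template, base_dir="."):
    """Return the resolved path and whether a file exists there."""
    path = template_path(lisflood_template, base_dir)
    return path, os.path.isfile(path)


if __name__ == "__main__":
    # "./settings_calibration.xml" is the LISFLOODSettings value from settings.txt
    path, found = check_template("./settings_calibration.xml")
    print(path, "found" if found else "missing")
```

With LISFLOODSettings = ./settings_calibration.xml, the "./" prefix is harmless: after normalisation the file is simply expected at templates/settings_calibration.xml.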

@Nooshdokht-Bayatafshary

Nooshdokht-Bayatafshary commented Jun 8, 2024

Dear @doc78 and other developers,

I encountered a segmentation fault while running CAL_7A_CALIBRATION.py. After backtracking, I found that the error occurs when reading NetCDF files using the xarray package for meteorological data in netcdf.py.

When MapsCaching is set to true, the error occurs at line 334:
data = XarrayCached(data_path, dates, indexer, climatology)

When MapsCaching is set to false, the error occurs at line 181:

ds = xr.open_mfdataset(
    data_path, engine='netcdf4', 
    chunks={'time': time_chunk}, combine='by_coords',
    mask_and_scale=True
)

I am working with split meteorological files (7 files in total for each parameter). Do you have any suggestions or ideas on how to resolve this issue?
Thank you for your assistance with this issue, and for your previous help!

Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented Jun 10, 2024

Dear @Nooshdokht-Bayatafshary

Can you please provide further info about this issue? Please attach the full log of the error.
Best Regards

Carlo

@Nooshdokht-Bayatafshary

Dear @doc78,

Thank you for your response.
Here is the full log of the run: log.txt
The run generates two setting files in the CATCHMENTS_DIR/21413/settings folder, which I have attached in the data.zip file. Additionally, it creates an empty folder at CATCHMENTS_DIR/21413/out/0.

I have also included the tp files in the data.zip attachment. The code encounters a segmentation fault when attempting to open these files at the line I mentioned in my previous comment.

Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented Jun 11, 2024

Dear @Nooshdokht-Bayatafshary

It seems there is something wrong with one of your maps. Can you please check whether there is any corrupted NC file in the maps folder?
Thanks

Carlo

@Nooshdokht-Bayatafshary

Nooshdokht-Bayatafshary commented Jun 11, 2024

Dear @doc78,

Thank you very much for your time and assistance.
I have thoroughly checked the files in the maps folder using the xarray package in Python (data = xr.open_dataset(ncfile)). I was able to open all the files successfully without any issues. Additionally, I verified that each file contains 16 pixels in longitude and 20 pixels in latitude, and I also checked the time dimension for the relevant files.
I also checked my tp_2015.nc file using ncdump; the output is below. Based on your experience with the LISFLOOD model, is there anything wrong with this, such as the units or fill values?

netcdf tp_2015 {
dimensions:
        time = 1460 ;
        lat = 1764 ;
        lon = 1368 ;
variables:
        int time(time) ;
                time:unit = "hours since 1970-01-01 00:00:00" ;
                time:frequency = 6 ;
                time:standard_name = "time" ;
                time:units = "hours since 2015-01-01 00:00:00" ;
                time:calendar = "proleptic_gregorian" ;
        double lat(lat) ;
                lat:_FillValue = NaN ;
        double lon(lon) ;
                lon:_FillValue = NaN ;
        float tp(time, lat, lon) ;
                tp:_FillValue = -999999.f ;

// global attributes:
                :_FillValue = -999999. ;
}

I haven't been able to identify any problems with the files. If you have any suggestions on what else I should check, I would greatly appreciate your guidance.
For your convenience, I have attached all the files in the maps folder (maps.zip).
Thank you once again for your help.

Best regards,
Nooshdokht
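
A sketch of a slightly stricter file check than a plain xr.open_dataset(ncfile) call. xarray opens datasets lazily, so a successful open mostly confirms that the metadata is readable; calling load() forces the values to be read as well. The helper name and the "maps" folder are hypothetical, and a hard crash such as a segmentation fault cannot be caught this way. Loading large files needs enough memory.

```python
import glob
import os


def find_unreadable(paths, open_func):
    """Try to fully read each file; return {path: error message} for failures.

    open_func must return an object with .load() and .close() methods,
    e.g. xarray.open_dataset.
    """
    problems = {}
    for path in paths:
        try:
            ds = open_func(path)
            try:
                ds.load()  # force the data to be read, not only the metadata
            finally:
                ds.close()
        except Exception as exc:  # report any read failure and keep going
            problems[path] = f"{type(exc).__name__}: {exc}"
    return problems


if __name__ == "__main__":
    maps_dir = "maps"  # hypothetical folder name
    files = sorted(glob.glob(os.path.join(maps_dir, "*.nc")))
    if files:
        import xarray as xr  # imported here so the helper has no hard dependency
        for path, err in find_unreadable(files, xr.open_dataset).items():
            print(path, "->", err)
```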

@Nooshdokht-Bayatafshary

Dear @doc78,

I wanted to let you know that after extensive research and testing, I was able to identify the cause of the segmentation fault error. By thoroughly backtracing the code, I discovered that the issue originates from the xarray package, specifically in dataset.py at line 211 where dask.array is imported.
Upon further investigation, I found a suggestion that the problem might be related to parallel execution. This led me to realize that I might be missing the appropriate dependencies for xarray to run in parallel.
Installing the necessary dependencies resolved the issue. To fix the problem, you can run the following command:

pip install "xarray[parallel]" (Reference)

I wanted to share this solution in case anyone else encounters a similar issue.

Best regards,
Nooshdokht
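
A quick way to see whether the relevant packages are present in the active environment before starting a calibration run. This is a sketch: the module list is an assumption, and it only checks that a module can be found, not that the installed versions are compatible with each other.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Chunked reading in xarray (chunks=..., open_mfdataset) relies on dask.
    print("missing:", missing_modules(["xarray", "dask", "netCDF4"]))
```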

@Nooshdokht-Bayatafshary

Dear @doc78,

I am running CAL_7A_CALIBRATION.py and encountered the following error. It seems to be related to Lisflood, so I'm unsure whether I should raise this issue here or in the Lisflood section. Please let me know if I should post it there instead.

Traceback (most recent call last):
  File "CAL_7A_CALIBRATION.py", line 68, in <module>
    calibrate_subcatchment(cfg, obsid, subcatch)
  File "CAL_7A_CALIBRATION.py", line 43, in calibrate_subcatchment
    calib_deap.run(subcatch.path, lock_mgr)
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/liscal/calibration.py", line 336, in run
    population = self.generate_population(halloffame)
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/liscal/calibration.py", line 278, in generate_population
    for ind, fit in zip(invalid_ind, fitnesses): # DD this updates the fitness (=KGE) for the individuals in the global pool of individuals which we just calculated. ind are
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/liscal/hydro_model.py", line 99, in run
    lisf1.main(prerun_file, '-v')
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/main.py", line 226, in main
    lisfloodexe(lissettings)
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/main.py", line 157, in lisfloodexe
    model_to_run.run()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/zusatz.py", line 147, in run
    self._runDynamic()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/pcraster/framework/frameworkBase.py", line 371, in _runDynamic
    self._userModel().dynamic()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/Lisflood_dynamic.py", line 248, in dynamic
    self.output_module.dynamic()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/output.py", line 581, in dynamic
    self.output_maps.write()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/output.py", line 446, in write
    out.write()
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/output.py", line 298, in write
    return self.writer.write(self._start_date, self._rep_steps)
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/output.py", line 96, in write
    nf1 = write_netcdf_header(self.settings, self.map_name, self.map_path, self.var.DtDay,
  File "/home/n.bayatafshary74.student.sharif/.conda/envs/lis3.8/lib/python3.8/site-packages/lisflood/global_modules/netcdf.py", line 479, in write_netcdf_header
    dtype = binding['OutputMapsDataType']
KeyError: 'OutputMapsDataType'

The error appears to be connected to the settings file, but I couldn't find the related key for OutputMapsDataType in the manual. I based my settings file on the GLOFAS example and only changed the input paths (settings_calibration.zip). I would greatly appreciate any assistance you can provide. This is the full error log (log.txt).

Best regards,
Nooshdokht

@doc78
Collaborator

doc78 commented Jul 8, 2024

Dear @Nooshdokht-Bayatafshary
Please refer to https://github.com/ec-jrc/lisflood-code/blob/master/src/lisfloodSettings_reference.xml for the missing key. You should add the following variable in the setting file:

in lfuser section:

in lfbinding section:

You can now choose between float64 and float32 as the data type for the outputs; this is a feature of the new Lisflood versions.
Cheers
Carlo
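
To verify whether a settings template already defines a given key in both sections, a check along these lines may help. It is a sketch that assumes the usual layout of a LISFLOOD settings file, with textvar elements carrying a name attribute inside the lfuser and lfbinding sections; the function name is hypothetical, and the exact entries to add should be taken from the reference settings file linked above.

```python
import xml.etree.ElementTree as ET


def sections_defining(root, key, sections=("lfuser", "lfbinding")):
    """Return {section: bool}: does the section hold a <textvar name=key>?

    root is the parsed root element of the settings file.
    """
    result = {}
    for section in sections:
        node = root if root.tag == section else root.find(f".//{section}")
        result[section] = node is not None and any(
            tv.get("name") == key for tv in node.iter("textvar"))
    return result
```

For example, sections_defining(ET.parse("templates/settings_calibration.xml").getroot(), "OutputMapsDataType") should report True for both sections once the key has been added.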

@Nooshdokht-Bayatafshary

Nooshdokht-Bayatafshary commented Jul 24, 2024

Dear Developers,

I am seeking clarification on the next steps following the execution of CAL_7B_LONGTERM_RUN.py. I'm unsure about which code should be run next. I see that the LISCal codes are being updated in the develop branch, and up to CAL_7B_LONGTERM_RUN.py, everything seems consistent, except for the file names.

For CAL_9, I am uncertain whether to use CAL_9_PARAMETER_MAPS.py or CAL_9_basic_mosaickedparameterspcrastermaps.py from the master branch, or CAL_8_POSTPROCESSING.py from the develop branch. If the correct code is in the master branch, could you also provide the order in which the CAL_9 scripts should be executed? Are there specific commands or steps that I should follow?

Additionally, I have questions regarding the GwLossStations.txt and soilmoisture_stations.txt files referenced in lines 106 and 111 of CAL_9_PARAMETER_MAPS.py. Could you please clarify their roles and where I can find them?

Thank you for your assistance.

Best regards,
Nooshdokht

@StefaniaGrimaldi
Contributor

Dear @Nooshdokht-Bayatafshary,

after the execution of https://github.com/ec-jrc/lisflood-calibration/blob/master/bin/CAL_7B_LONGTERM_RUN.py,
you might consider running https://github.com/ec-jrc/lisflood-calibration/blob/master/bin/CAL_7C_optional_diagnostic_plots.py to analyse the results of your calibration (meteorological forcings, discharge, model states and internal fluxes) for each catchment (as identified by the calibration points).
The script https://github.com/ec-jrc/lisflood-calibration/blob/master/bin/CAL_9_basic_mosaickedparameterspcrastermaps.py concatenates the calibrated parameters for each catchment (calibration point) into parameter maps having the same extent as your basin.

We hope that our answer helps,
kind regards,
on behalf of the developers team,
Stefania

@Nooshdokht-Bayatafshary

Dear @doc78 and @StefaniaGrimaldi,

I hope this message finds you well. I would like to request some clarification regarding the validation and calibration periods in the latest version of LISCAL.

In a previous version, it appeared that if the record length was at least twice the MinQlength, the available streamflow record was split into equally long validation and calibration periods. If the record length was shorter than twice the MinQlength, the calibration period used MinQlength, with the remaining data allocated to validation (Reference).

Could you please explain how the streamflow data is now split between the validation and calibration periods in the new version? Additionally, I would appreciate it if you could inform me where the validation criteria, such as KGE and correlation, are saved. I noticed that the KGE for the calibration period is saved in the paramsHistory.csv file for each generation.

Thank you for your assistance.

Best regards,
Nooshdokht

@StefaniaGrimaldi
Contributor

Dear @Nooshdokht-Bayatafshary,

thank you for your inquiry.

The current version of the tool implements the following rules (https://github.com/ec-jrc/lisflood-calibration/blob/master/liscal/stations.py#L60):

  • if the number of available years is lower than 8 (but larger than the minimum number of calibration days), all the data are used for calibration.
  • if the number of available years is between 8 and 16, then the last 8 years are used for calibration.
  • if the number of available years is larger than 16, then the available time series is equally divided, and the most recent years are used for calibration.

The number of years and the temporal interval of the data used for calibration are computed individually for each station. Depending on discharge data availability, two calibration points on the same river might use different calibration years.
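
The three rules above can be sketched as a small function. This is an illustration of the rules as described in this comment, not the code in liscal/stations.py; the function name is hypothetical, and min_years stands for the minimum record length (for example min_obs_years = 3.5 in the settings file shown earlier in this thread).

```python
def calibration_years(available_years, min_years):
    """Return how many of the most recent years go into calibration.

    Rules as described in the comment above:
      * fewer than 8 years (but at least min_years): use everything
      * 8 to 16 years: use the last 8 years
      * more than 16 years: use the most recent half
    Returns None when the record is shorter than min_years.
    """
    if available_years < min_years:
        return None
    if available_years < 8:
        return available_years
    if available_years <= 16:
        return 8
    return available_years / 2
```

With min_years = 3.5, a station with 12 years of observations would be calibrated on its last 8 years, and a station with 20 years on its last 10.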

You are correct: paramsHistory.csv and pHistoryWRanks.csv show the scores for the calibration period.
A script to compute the evaluation metrics using ALL the observations (calibration period and older data, if any) is being added to the develop branch of the calibration tool. However, this script does not compute the metrics using only the observations not included in the calibration period.

We hope that our answer clarifies some of your doubts,
Kind regards,
Stefania and Carlo
