Using Dask distributed may slow down preprocessor steps running before concatenate #2073
Maybe your compute cluster doesn't have Infiniband? Could you have a look at the network interfaces you have available on the compute nodes? You can list them with something like the snippet below.
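A minimal sketch of one way to list the interfaces (the original command is not preserved in this thread; using psutil is an assumption, and running `ip a` directly on the node works just as well):

```python
# Sketch: list network interfaces and whether they are up, using psutil (assumed available).
import psutil

addrs = psutil.net_if_addrs()
for name, stats in psutil.net_if_stats().items():
    addresses = [a.address for a in addrs.get(name, [])]
    print(f"{name}: up={stats.isup}, speed={stats.speed} Mb/s, addresses={addresses}")
```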
It should have Infiniband according to the technical specs, and it shows up in the network interfaces:
But setting either configuration (though ib1 seems to be down) leads to this error:
I guess it would be best to ask the sysadmins. In any case, whether or not the connection between nodes is faster using Infiniband, I think the two-by-two concatenation in the `concatenate` preprocessor step is also part of the problem.
Ethernet networks are a lot slower than Infiniband, so communicating the data will be slower and could explain the performance issue reported here. My recommendation would be to try and get Infiniband configured. Are you running ESMValCore on the head node or on a compute node? If you're running it on the head node, it is possible that it has a different network setup from the compute nodes. In that case, you can configure the network interface separately for the head node which is running the scheduler and for the workers as in this example: dask/dask-jobqueue#382 (comment).
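A minimal sketch of that split set-up with dask-jobqueue, along the lines of the linked comment (the interface names `eth0` and `ib0` and the job sizes are assumptions):

```python
# Sketch: use different network interfaces for the scheduler (on the head/login node)
# and the workers (on the compute nodes). Interface names and sizes are assumptions.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=16,
    memory="64GiB",
    interface="ib0",                          # workers talk to each other over Infiniband
    scheduler_options={"interface": "eth0"},  # scheduler listens on the head node's Ethernet
)
cluster.scale(jobs=2)
```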
not to be laughed at please - but isn't the network card more important than the actual cable/connection type? 😁 Also, we should not assume any Infini or Efini or Fini types of connection in our suggested settings (Efini is a Mazda, not a connection, OK 😁)
Well, I managed to use Infiniband by running the recipe on a compute node instead of on the login node, but there was not much improvement either. In any case, I think there are many issues in the concatenation. The graph in the dashboard looks quite bad when the concatenation starts. For instance, on our side we are realising the data if the cubes overlap (see `esmvalcore/preprocessor/_io.py`, line 533 at commit ac98c69).
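For context, a minimal sketch of lazy versus realised data in Iris (the file name is a placeholder); realising a high-resolution cube pulls the whole array into memory at once, which is what makes this expensive:

```python
# Sketch: lazy vs. realised cube data in Iris. The file path is a placeholder.
import iris

cube = iris.load_cube("tos_Omon_EC-Earth3P-HR_control-1950.nc")
print(cube.has_lazy_data())   # True: the data is still a dask array, nothing is in memory

lazy = cube.core_data()       # returns the dask array without computing it
realised = cube.data          # triggers computation and loads the full array into memory
print(cube.has_lazy_data())   # False: the cube now holds a realised NumPy array
```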
I think it would be worth including the concatenation in the list of performance issues that are pending to be solved.
that's an overlap-gated cube @sloosvel - not the biggest problem you have on your hands, it's usually a few years long at most 😁
Indeed!
In our case we have a cube with 18000 timesteps getting realised...
oh boy, that's one chunky overlap; you guys using millisecond-mean data? 🤣
That is something that would need to be fixed first.
As discussed in past meetings, the main issue with the concatenation of many cubes, especially if they contain HR data, comes from the fact that auxiliary coordinate array values need to be compared to ensure they are equal. This comparison happens sequentially in iris and requires realising the coordinate arrays. I will open a PR with the changes.
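A minimal sketch of the kind of lazy, batched coordinate comparison meant here (the coordinate names and the use of `dask.compute` are assumptions about the approach, not the actual implementation in the PR):

```python
# Sketch: compare auxiliary coordinate arrays of two cubes lazily, so the expensive
# comparisons can be evaluated in one batch (and in parallel on a distributed cluster)
# instead of being realised and compared one by one.
import dask
import dask.array as da

def lazy_coord_checks(cube_a, cube_b, coord_names=("latitude", "longitude")):
    checks = {}
    for name in coord_names:
        points_a = cube_a.coord(name).core_points()  # stays lazy if the coord is lazy
        points_b = cube_b.coord(name).core_points()
        checks[name] = da.allclose(points_a, points_b)
    # A single compute() evaluates all comparisons together instead of sequentially.
    return dask.compute(checks)[0]
```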
Did you also open the iris issue? I couldn't find it yet.
Iris issue opened: SciTools/iris#5750
@sloosvel I'm working on parallel coordinate comparison in Iris in SciTools/iris#5926, would you have time to try it out and provide me with some feedback?
With the great new possibility of configuring Dask distributed introduced in #2049, I tried to run a test recipe using our model's high-res configuration. The model files are split in chunks of one year.
High-res data at the same resolution is also available at DKRZ, for example here:
/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ/HighResMIP/EC-Earth-Consortium/EC-Earth3P-HR/control-1950/r1i1p2f1/Omon/tos/gn/v20181119
So the preprocessor steps `load`, `check_metadata` and `concatenate` (`fix_file` can be ignored) have to be applied to multiple iris cubes before all the data is gathered in a single cube after concatenation. High-res data on irregular grids also has the extra challenge that the coordinate and bound arrays are quite heavy, and if they get realised multiple times it can make your run hang due to memory issues (see SciTools/iris#5115).

Running a recipe with two variables (2 tasks, without a dask.yml file), 92 years, on a single default node in our infrastructure tends to take in these steps:
Whereas on a SLURMCluster with this configuration (not sure if it's an optimal configuration), using two regular nodes:
And using even more resources, with 4 nodes but keeping the other parameters the same (again, not sure if optimal):
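The configurations used for these runs are not reproduced in this thread; purely as an illustration of the kind of dask-jobqueue set-up being varied here (every number below is an assumption, not the configuration from the test), a 4-node cluster could be started like this:

```python
# Sketch: a SLURMCluster spread over several nodes. All numbers are illustrative only;
# in ESMValCore the equivalent settings would go into its Dask configuration file.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=128,            # cores per SLURM job (one job per node, assuming exclusive nodes)
    processes=16,         # dask worker processes per job
    memory="256GiB",      # memory per job
    walltime="02:00:00",
    interface="ib0",      # assumed Infiniband interface name
)
cluster.scale(jobs=4)     # 4 jobs, i.e. 4 nodes in this sketch
client = Client(cluster)
```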
Our VHR data (which is not available on ESGF) behaves even worse because the files are split in chunks of one month for monthly variables. So you can get stuck concatenating files for 30 min. My guess is that the loop over the cubes maybe does not scale well? All examples are run on nodes that were requested exclusively for the jobs. But I also don't know if the cluster configuration is just plain bad. I tried many other configurations (less memory, more cores, more processes, more nodes, a combination of more everything) and none seemed to do better, though.
I also had to run this with the changes in SciTools/iris#5142, otherwise it would not have been possible to use our default nodes. Requesting higher-memory nodes is not always a straightforward solution because it may leave your jobs in the queue for several days.