Skip to content
This repository has been archived by the owner on Oct 23, 2020. It is now read-only.

Ocean/layer volume weighted mask #1292

Open
wants to merge 11 commits into
base: ocean/develop
Choose a base branch
from

Conversation

vanroekel
Copy link
Contributor

These changes are to partially address slow performance in the layer weighted volume average ocean AM. In the 18to6 case, this AM was nearly 60% of the time integration timer. With this PR, the time for this AM is reduced by 80%, it is now about 10% of the time integration timer.

@vanroekel
Copy link
Contributor Author

@mark-petersen here are my proposed changes to the layer volume weighted AM. I have the don't merge tag for now until I can confirm that the output is consistent with what is produced by the head of ocean/develop. There may be other areas for performance enhancements as well, and I would greatly appreciate any suggestions. Also if @philipwjones or @pwolfram had additional thoughts on areas for improvement please let me know.

Copy link
Contributor

@pwolfram pwolfram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Luke,

Great work! I've noted some places that may present additional performance gain opportunities in this file.

Specific ideas are highlighted but here are some high-level ideas:

  1. If we could reduce
    kBufferLength=nOceanRegionsTmp*nLayerVolWeightedAvgFields*nVertLevels it would be helpful because this must be communicated in parallel.

  2. Could loop unrolling be used to speed up computation? Is it possible the compiler isn't doing this automatically, for example?

Lastly, and least importantly: do loops aren't indented properly in several places in the file, e.g., line 498.

workBufferSum(kBuffer) = workBufferSum(kBuffer) + workSum(iDataField)
workBufferMin(kBuffer) = min(workBufferMin(kBuffer), workMin(iDataField))
workBufferMax(kBuffer) = max(workBufferMax(kBuffer), workMax(iDataField))
do iRegion=1,nOceanRegionsTmp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanroekel, doesn't first index vary fastest in FORTRAN, e.g., http://jblevins.org/log/efficient-code? Unless I'm confused here the loop order may be switchable for some performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, you are certainly right. I will fix this.

workSum(3) = workSum(3) + cellVolume ! volume sum
do iDataField=4,nDefinedDataFields
workSum(iDataField) = workSum(iDataField) + cellVolume*workArray(iDataField,iCell) ! volume-weighted sum
do iRegion=1,nOceanRegionsTmp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vanroekel, it looks like the indentation may be off here.

workSum(iRegion, 3) = workSum(iRegion, 3) + cellVolume ! volume sum
do iDataField=4,nDefinedDataFields
workSum(iRegion, iDataField) = workSum(iRegion, iDataField) + cellVolume*workArray(iDataField,iCell) ! volume-weighted sum
enddo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the code below right? iDatafield appears to be the last value of nDefinedDataFields. Should this code be moved up in the loop? This isn't 100% obvious at first glance. A comment here would be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into what loop? It is in the iRegion loop (sorry for the poor indenting). I'm not sure why this would be moved into the cells loop. This loops over all the physical variables (1,2,3 are mask, area, and volume respectively).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you were referencing here. It is the two lines below the iDataField loop and yes you are right these needed to be moved and have been.

@@ -333,21 +336,21 @@
<var name="workMinLayerVolume"
persistence="scratch"
type="real"
dimensions="nLayerVolWeightedAvgFields Time"
dimensions="nOceanRegionsTmp nLayerVolWeightedAvgFields Time"
Copy link
Contributor

@pwolfram pwolfram Apr 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the array-order in memory here consistent with the fastest array-access pattern? This is a key thing that could help performance that should be checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this needs to be fixed as mentioned above.

@@ -250,17 +336,17 @@ subroutine ocn_compute_layer_volume_weighted_averages(domain, timeLevel, err)!{{
! get pointers to scratch variables
call mpas_pool_get_subpool(block % structs, 'layerVolumeWeightedAverageAMScratch', scratchPool)
call mpas_pool_get_field(scratchPool, 'workArrayLayerVolume', workArrayField)
call mpas_pool_get_field(scratchPool, 'workMaskLayerVolume', workMaskField)
! call mpas_pool_get_field(scratchPool, 'workMaskLayerVolume', workMaskField)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not just removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistaken holdover. I have removed

@@ -312,18 +398,11 @@ subroutine ocn_compute_layer_volume_weighted_averages(domain, timeLevel, err)!{{
workBufferMin(:) = +1.0e20_RKIND
workBufferMax(:) = -1.0e20_RKIND

! loop over all ocean regions
do iRegion=1,nOceanRegionsTmp

! loop over the vertical
do iLevel=1,nVertLevels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we absolutely have to do this for each level? Could we get away with every other or every n levels, for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, but I don't think that the work required to not compute every level will pay off for the efficiency gained.

@pwolfram
Copy link
Contributor

One more performance thought that may hopefully be the most helpful-- communication is in the loop, e.g., at https://github.com/vanroekel/MPAS/blob/74f8f668ff9b97bc845567102dffb2772ef94af4/src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F#L449. It may be better to cache the results and communicate at the end in one call in order to avoid the latency of communication, which for iLevel over 100 levels may be causing the problem. I would try to aggregate the message passing and get the communicator out of the loop if possible.

@philipwjones
Copy link
Contributor

@vanroekel - I'll look at this today.

@vanroekel vanroekel force-pushed the ocean/Layer_volume_weighted_mask branch from 74f8f66 to a184f5a Compare April 17, 2017 16:19
@vanroekel
Copy link
Contributor Author

@pwolfram communication is not in the loop. Where you point to is post block loop. Communication happens after the nVertLevels loop. And the buffer size is now indeed nOceanRegionsTmp*nLayerVolWeightedAvgFields*nVertLevels so no change is needed on that front. I'll work on the array order.

@philipwjones
Copy link
Contributor

@vanroekel - hate to say this, but this really needs to be re-written. @pwolfram already pointed out the stride access issues - all the arrays should have k-index (ilevel) first and level loop innermost. In addition, the construct of having the minmaxsum (statistics) as a separate function is creating a lot of unnecessary storage and expensive allocations that would not be needed with an in-line calculation and an appropriate loop ordering. If you want, I can take a shot at the core routine(s) and could turn this around tomorrow.

@vanroekel
Copy link
Contributor Author

@philipwjones if you are willing to take a stab at the rewrite, I would be very happy. I was trying to not change too much, hence the separate stats call and the slightly odd ordered arrays, so I if you want to make any and all changes, I'd be very grateful. This routine is a big hit for the high-res runs, taking 5% of the total ocean time with a compute interval of 1-day.

For my own understanding can you explain why k first instead of nDataFields first, which is what I have now done here?

@mark-petersen
Copy link
Contributor

@vanroekel and @philipwjones There is a COMPASS test that will be very helpful here. I just added a little commit to add more variables to the comparison, hope you don't mind @vanroekel.

You can set up this test case as follows:

gr-fe1.lanl.gov> cd testing_and_setup/compass/
gr-fe1.lanl.gov> ./list_testcases.py |grep analysis
  14: -o ocean -c global_ocean -r QU240 -t analysis_test
gr-fe1.lanl.gov> setup_testcase.py -f general.config.ocean_turq --work_dir WORKDIR -n 14

on a login node:

cd WORKDIR
cd ocean/global_ocean/QU240/analysis_test
./run_test.py

@mark-petersen
Copy link
Contributor

I'm finding that the min and max are broken using the above test, for both Volume and Layer variables. The avg appears to be working.

cd /lustre/scratch3/turquoise/mpeterse/runs/t43q/ocean/global_ocean/QU240/analysis_test
gr0871.localdomain> ncdump -v maxVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -6598076933143.29, 0, 0, 0, 0, 0, -93754460884.3355 ;
}
gr0871.localdomain> ncdump -v minVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v avgVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
    3.69965924887792, 3.72517778620299, 3.59349709115481 ;
}
gr0871.localdomain> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
    2.12610713986572e+16 ;
}
gr0871.localdomain> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
    2.12610713986572e+16,
  215467097738560, 7.52538176863014e+15, 2.41496069145745e+15,
    1.76884274279661e+15, 1.56532233466065e+15, 792240960463543,
    2.12610713986572e+16 ;
}
gr0871.localdomain> ncdump -v avgLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  0, 0.971952995480642, -0.4037625291602, 0, 0.802349697800351,
    0.791204201314209, 0.977386912520514,
  0, 1.03008866096944, -0.696744842394911, 0, 0.802619202528348, 0,
    0.986543052197686 ;
}
gr0871.localdomain> ncdump -v avgLayerKineticEnergyCell forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
    5.4709712920508e-07, 4.71442787103023e-05,
  0, 2.80398517492325e-05, 8.74947379578159e-05, 0, 5.08843505897544e-07,
    2.5656437999868e-07, 3.99017393710811e-05,
  0, 2.1506749937419e-05, 0, 0, 2.95050907750405e-07, 0, 3.54668394169954e-05 ;
}
gr0871.localdomain> ncdump -v maxLayerKineticEnergyCell forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  28518885021.4354, 212119379173.995, 878633656653.876, 9356784606.48849,
    3972991322.19699, 5599138317.21231, 1564468481818.65,
  28518885021.4354, 212119379173.995, 878633656653.876, 9356784606.48849,
    3972991322.19699, 5599138317.21231, 1564468481818.65 ;
}

It was working on ocean/develop:

gr-fe3.lanl.gov> pwd
/lustre/scratch3/turquoise/mpeterse/runs/t41n/ocean/global_ocean/QU240/analysis_test
gr-fe3.lanl.gov> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 3
    -0.825268646333416, -0.762441875583128, -0.750172640579461,
    -0.696744842394911 ;
}
gr-fe3.lanl.gov> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 3
    4.5178530968127, 3.67229754056838, 3.66701465042708, 3.43060986528412,
    2.53173043332382, 2.06111757675413, 2.05747970347969, 1.89823820105345 ;
}

@vanroekel
Copy link
Contributor Author

@mark-petersen thanks for testing this. I have only checked the layer averages thus far and hadn't had a chance to look at max/min. I will try and look soon. From the look of the numbers it seems like I'm not dividing by volume of the cell perhaps.

@philipwjones
Copy link
Contributor

Actually, the regional max value is incorrect in both the original and @vanroekel versions. The routine has the statement:
maxValueWithinOceanVolumeRegion(iDataField, iRegion) = maxval(minValueWithinOceanLayerRegion(iDataField,:,iRegion))

So it's taking the max of minimum values instead of the true maximum.

Done with my rewrite and am now testing...

@mark-petersen
Copy link
Contributor

@philipwjones What is the status of this PR? @q-rai (Anne Berres) has volunteered to add regional mask computations (from a netcdf file) to this AM, but she is only at LANL for another week ( @milenaveneziani this will help us out). It is important that @q-rai starts with code that is close to the final version of this PR, and she has time to work on it now.

@philipwjones could you just push up whatever you have, and tell us the status (passes or fails tests...) @q-rai can start with that commit, and we can always rebase later. I bet any remaining changes for this PR will be small. Thanks!

@philipwjones
Copy link
Contributor

@mark-petersen Will push at least one version up tonight. However, note that the region mask should really be an MPAS-ocean-wide capability and not limited to this AM. Maybe @q-rai should design/build a standalone module for this?

@milenaveneziani
Copy link

@philipwjones: the way that the regional mask is implemented in this AM is similar to only two other AMs: the surfaceAvg and the waterMassCensus ams. I believe @q-rai already made changes to waterMassCensus (not sure about surfaceAvg), which should be similar to those to be made here.
Other AM's, like the MOC and MHT, handle the regional calculation differently (fairly separated from the global calculation), and the regional quantity is saved in a separate array, rather than being added to one variable with additional region dimension as in the 3 AMs mentioned above.

@q-rai
Copy link

q-rai commented May 2, 2017

@philipwjones @milenaveneziani
Yes, MHT and MOC are a bit different from the other three as they didn't have hard-wired regions in them.
I've finished Water Mass Census and am currently debugging Surface Area Weighted Averages, which both have the same hard-wired regions as the Layer Averages.
@mark-petersen said he'd rather have me do the ground work on adapting Layer Averages than finish the nitty gritty details of Surface Averages. I agree that this would be the best use of my remaining time, as well as his, just based on our skill sets.
The way region masks are implemented in all four is using netCDF-based region masks that are generated from geojson files. (All found in the geometric_features repository.) All four analysis members write out separate arrays for the original AM version (global for MOC/MHT, hard-wired regions for WMC/SAWA) and the region mask file version (slightly inconsistent naming; I'll create an issue for that before I leave).

On the topic of a standalone module: I won't be here long enough to design a new module, but I'll leave my two cents hoping they'll help someone else.
I think a stand-alone module would be rather difficult to realize for all AMs at once:
MOC/MHT type AMs should work out okay, they write out spatial arrays with all the information needed. One could apply the region masks to the saved-out data. It could even be done as a stand-alone script and applied to all kinds of spatial data.
WMC/SAWA/LVWM type AMs are more tricky. They only save out processed/summarized data (avg, min, max, ...) and it's not possible to change the underlying region in retrospect to get a new min/max/avg. I don't see a good solution to this that doesn't involve going into the code of each individual AM to address this.

@philipwjones
Copy link
Contributor

@q-rai and @milenaveneziani - Sorry if I was unclear. What I was suggesting is essentially what Anne is doing already - namely replacing these hard-wired region masks with the common region masks and her suggestions for standardizing some of the naming (eg of nOceanRegions). Was thinking more about the overall bookkeeping, grouping and avoiding duplication among the AMs which I believe is done now mostly through preprocessing of the mask file. Something for the future. BTW, I suspect we may want to change index ordering on the masks - the versions of layer avg code I sent earlier are meant to explore this to some extent.

@mark-petersen
Copy link
Contributor

@philipwjones I copied your three versions and the registry file into this repo, and made one commit for each for posterity, see above. All three compiled using gnu on grizzly.

My trouble is, @q-rai has time to add the regional netcdf masks, but I don't have time to test these at high rez because I'm committed to the ACME non-bfb block test until it's fixed. Anne is only here for one more week.

@philipwjones could you make a best guess on which version to proceed with? Anne can start with that commit and add the netcdf region masks.

@philipwjones
Copy link
Contributor

@mark-petersen @q-rai - the regional mask in all three is identical, so maybe start with the Fcolumn version which is the most similar to the current one. If I get a chance after the threading stuff, I can try to evaluate these in a high-res case.

@mark-petersen
Copy link
Contributor

Great, thanks. @q-rai please start with 1f66c6c for your region mask work.

@q-rai
Copy link

q-rai commented May 4, 2017

@philipwjones I noticed that you changed the hard-wired region dimensions down from 7 to 6.
That extra region is the global "region", which is why nothing is done by the mask routine; the array is already initialized with '1'. In this AM it's not explicitly stated, but you can find an equivalent in Water Mass Census.
I'll change it back in the version I'm working on and I'll put a comment in the mask routines to make that clear.

Completely unrelated: which version of the code were you basing your changes on? I'm asking because @vanroekel pulled the mask computation up to the init routine which seemed like a good idea to me (I actually made the same change over here after seeing his solution). Is there a performance-related reason against it or did you just start from a different codebase?

@philipwjones
Copy link
Contributor

@q-rai Ok - didn't catch the 7th, but probably would be better to make that explicit rather than implied? I didn't see it in the var array in registry (may have missed it), so just assumed there was an extra index.

On masking - I did move the region mask up into init, but left the vertical mask since future simulations with active ice sheets or inundation might have time-varying masks for vertical levels. So I computed the total mask as the product of an invariant region mask and time-varying vertical mask. Due to some vectorization-related optimization later, we may end up having an MPAS-wide vertical mask rather than varying loop limits, so this may get pushed elsewhere.

@q-rai
Copy link

q-rai commented May 4, 2017

@philipwjones It's easy to miss, that's why I figured a comment stating that will help.

I completely missed the mask part when skimming your code. I expected something of the form

initroutine
  contains
  maskroutine
  end maskroutine
end initroutine
computeroutine
end computeroutine

because that's what I'd seen so far (and that's one big obvious block to see). Mark just explained to me that it doesn't actually have to be that way because there's also a contains all the way at the top before initializing.
So -- great! I'll just work off of what you have. Thanks for explaining what you did there, it will come in handy!

@q-rai q-rai force-pushed the ocean/Layer_volume_weighted_mask branch from c7aabaf to 8cde6b3 Compare May 8, 2017 23:44
@q-rai
Copy link

q-rai commented May 8, 2017

I added in region mask support. At a first glance, I seem to get the same results as hard-wired regions (with equivalent region mask).

@mark-petersen
Copy link
Contributor

@q-rai thanks for your quick work on this!

@q-rai q-rai force-pushed the ocean/Layer_volume_weighted_mask branch from 8cde6b3 to 1f66c6c Compare May 11, 2017 19:21
Anne S Berres added 3 commits May 11, 2017 14:34
@q-rai
Copy link

q-rai commented May 11, 2017

Testing:
I wrote some code to compute some stats on the output of
2f1d07c (hard-wired regions only; ocean/develop version)
1f66c6c (hard-wired regions only; Phil’s version 1)
and
112bd73 (todo) (hard-wired regions and mask file-based regions; my version)

which can be found here:
https://gist.github.com/q-rai/78d82aaeca0b5e3d44ddf4b4e092ec0a

The output looks promising, but there are some small issues that need fixing. The current status is:

  • min and max Layer Averages don’t seem to work in any of the versions; they all look the same.
  • min and max Volume Averages aren’t written out for hard-wired regions in my version; not sure what I messed up but it should be easy to fix; region mask versions are doing what they should
  • min and max Volume Averages for region masks look good; the maxes look like Phil’s version (fixed max)
  • avg Layer and Volume Averages are looking good

Here are some example images; commit numbers are in the caption. From left to right it’s ocean/develop-hard-wired, Phil-hard-wired, mine-hard-wired, mine-regionmask.
Note that the second graph from the left is missing the rightmost region.
lvwa_avglayerbruntvaisalafreqtop
lvwa_avgvolumevelocityzonal
lvwa_maxlayertemperature
lvwa_maxvolumerelativevorticitycell
lvwa_minlayerpotentialdensity
lvwa_minvolumevertvelocitytop

Region numbers for reference:

  • 0 Arctic (lat > 60)
  • 1 Equatorial (lat in [-15,15])
  • 2 Southern Ocean (lat < -50)
  • 3 Nino 3 (lat in [-5,5], lon in [210,270])
  • 4 Nino 4 (lat in [-5,5], lon in [160,120])
  • 5 Nino 3.4(lat in [-5,5], lon in [190,240])
  • global

@mark-petersen
Copy link
Contributor

@vanroekel This had huge swaths of conflicts with the head. I'm postponing the merge, but we should do it next week. Best to rebase after ocean/develop is updated today.

@pwolfram
Copy link
Contributor

@vanroekel and @mark-petersen, is this something we need to migrate to MPAS-Model?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants