Ocean/layer volume weighted mask #1292

vanroekel · 2017-04-15T04:29:45Z

These changes are to partially address slow performance in the layer weighted volume average ocean AM. In the 18to6 case, this AM was nearly 60% of the time integration timer. With this PR, the time for this AM is reduced by 80%, it is now about 10% of the time integration timer.

vanroekel · 2017-04-15T04:31:57Z

@mark-petersen here are my proposed changes to the layer volume weighted AM. I have the don't merge tag for now until I can confirm that the output is consistent with what is produced by the head of ocean/develop. There may be other areas for performance enhancements as well, and I would greatly appreciate any suggestions. Also if @philipwjones or @pwolfram had additional thoughts on areas for improvement please let me know.

pwolfram

Hi Luke,

Great work! I've noted some places that may present additional performance gain opportunities in this file.

Specific ideas are highlighted but here are some high-level ideas:

If we could reduce
kBufferLength=nOceanRegionsTmp*nLayerVolWeightedAvgFields*nVertLevels it would be helpful because this must be communicated in parallel.
Could loop unrolling be used to speed up computation? Is it possible the compiler isn't doing this automatically, for example?

Lastly, and least importantly: do loops aren't indented properly in several places in the file, e.g., line 498.

pwolfram · 2017-04-15T05:07:18Z

src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F

-              workBufferSum(kBuffer) = workBufferSum(kBuffer) + workSum(iDataField)
-              workBufferMin(kBuffer) = min(workBufferMin(kBuffer), workMin(iDataField))
-              workBufferMax(kBuffer) = max(workBufferMax(kBuffer), workMax(iDataField))
+            do iRegion=1,nOceanRegionsTmp


@vanroekel, doesn't first index vary fastest in FORTRAN, e.g., http://jblevins.org/log/efficient-code? Unless I'm confused here the loop order may be switchable for some performance.

My mistake, you are certainly right. I will fix this.

pwolfram · 2017-04-15T05:08:52Z

src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F

-    workSum(3) = workSum(3) + cellVolume                                   ! volume sum
-    do iDataField=4,nDefinedDataFields
-      workSum(iDataField) = workSum(iDataField) + cellVolume*workArray(iDataField,iCell)  ! volume-weighted sum
+      do iRegion=1,nOceanRegionsTmp


@vanroekel, it looks like the indentation may be off here.

pwolfram · 2017-04-15T05:12:56Z

src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F

+         workSum(iRegion, 3) = workSum(iRegion, 3) + cellVolume                                   ! volume sum
+      do iDataField=4,nDefinedDataFields
+        workSum(iRegion, iDataField) = workSum(iRegion, iDataField) + cellVolume*workArray(iDataField,iCell)  ! volume-weighted sum
+      enddo


Is the code below right? iDatafield appears to be the last value of nDefinedDataFields. Should this code be moved up in the loop? This isn't 100% obvious at first glance. A comment here would be helpful.

Moved into what loop? It is in the iRegion loop (sorry for the poor indenting). I'm not sure why this would be moved into the cells loop. This loops over all the physical variables (1,2,3 are mask, area, and volume respectively).

I think I see what you were referencing here. It is the two lines below the iDataField loop and yes you are right these needed to be moved and have been.

pwolfram · 2017-04-15T05:14:19Z

src/core_ocean/analysis_members/Registry_layer_volume_weighted_averages.xml

@@ -333,21 +336,21 @@
 		<var name="workMinLayerVolume"
 			 persistence="scratch"
 			 type="real"
-			 dimensions="nLayerVolWeightedAvgFields Time"
+			 dimensions="nOceanRegionsTmp nLayerVolWeightedAvgFields Time"


Is the array-order in memory here consistent with the fastest array-access pattern? This is a key thing that could help performance that should be checked.

Nope, this needs to be fixed as mentioned above.

pwolfram · 2017-04-15T05:15:52Z

src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F

@@ -250,17 +336,17 @@ subroutine ocn_compute_layer_volume_weighted_averages(domain, timeLevel, err)!{{
         ! get pointers to scratch variables
         call mpas_pool_get_subpool(block % structs, 'layerVolumeWeightedAverageAMScratch', scratchPool)
         call mpas_pool_get_field(scratchPool, 'workArrayLayerVolume', workArrayField)
-         call mpas_pool_get_field(scratchPool, 'workMaskLayerVolume', workMaskField)
+!         call mpas_pool_get_field(scratchPool, 'workMaskLayerVolume', workMaskField)


Why is this not just removed?

This is a mistaken holdover. I have removed

pwolfram · 2017-04-15T05:19:55Z

src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F

@@ -312,18 +398,11 @@ subroutine ocn_compute_layer_volume_weighted_averages(domain, timeLevel, err)!{{
         workBufferMin(:) = +1.0e20_RKIND
         workBufferMax(:) = -1.0e20_RKIND

-         ! loop over all ocean regions
-         do iRegion=1,nOceanRegionsTmp
-
         ! loop over the vertical
         do iLevel=1,nVertLevels


Do we absolutely have to do this for each level? Could we get away with every other or every n levels, for example?

Probably not, but I don't think that the work required to not compute every level will pay off for the efficiency gained.

pwolfram · 2017-04-15T05:32:30Z

One more performance thought that may hopefully be the most helpful-- communication is in the loop, e.g., at https://github.com/vanroekel/MPAS/blob/74f8f668ff9b97bc845567102dffb2772ef94af4/src/core_ocean/analysis_members/mpas_ocn_layer_volume_weighted_averages.F#L449. It may be better to cache the results and communicate at the end in one call in order to avoid the latency of communication, which for iLevel over 100 levels may be causing the problem. I would try to aggregate the message passing and get the communicator out of the loop if possible.

philipwjones · 2017-04-17T16:04:09Z

@vanroekel - I'll look at this today.

vanroekel · 2017-04-17T17:27:41Z

@pwolfram communication is not in the loop. Where you point to is post block loop. Communication happens after the nVertLevels loop. And the buffer size is now indeed nOceanRegionsTmp*nLayerVolWeightedAvgFields*nVertLevels so no change is needed on that front. I'll work on the array order.

philipwjones · 2017-04-17T19:19:57Z

@vanroekel - hate to say this, but this really needs to be re-written. @pwolfram already pointed out the stride access issues - all the arrays should have k-index (ilevel) first and level loop innermost. In addition, the construct of having the minmaxsum (statistics) as a separate function is creating a lot of unnecessary storage and expensive allocations that would not be needed with an in-line calculation and an appropriate loop ordering. If you want, I can take a shot at the core routine(s) and could turn this around tomorrow.

vanroekel · 2017-04-17T20:02:27Z

@philipwjones if you are willing to take a stab at the rewrite, I would be very happy. I was trying to not change too much, hence the separate stats call and the slightly odd ordered arrays, so I if you want to make any and all changes, I'd be very grateful. This routine is a big hit for the high-res runs, taking 5% of the total ocean time with a compute interval of 1-day.

For my own understanding can you explain why k first instead of nDataFields first, which is what I have now done here?

mark-petersen · 2017-04-19T15:42:27Z

@vanroekel and @philipwjones There is a COMPASS test that will be very helpful here. I just added a little commit to add more variables to the comparison, hope you don't mind @vanroekel.

You can set up this test case as follows:

gr-fe1.lanl.gov> cd testing_and_setup/compass/
gr-fe1.lanl.gov> ./list_testcases.py |grep analysis
  14: -o ocean -c global_ocean -r QU240 -t analysis_test
gr-fe1.lanl.gov> setup_testcase.py -f general.config.ocean_turq --work_dir WORKDIR -n 14

on a login node:

cd WORKDIR
cd ocean/global_ocean/QU240/analysis_test
./run_test.py

mark-petersen · 2017-04-19T15:50:18Z

I'm finding that the min and max are broken using the above test, for both Volume and Layer variables. The avg appears to be working.

cd /lustre/scratch3/turquoise/mpeterse/runs/t43q/ocean/global_ocean/QU240/analysis_test
gr0871.localdomain> ncdump -v maxVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -6598076933143.29, 0, 0, 0, 0, 0, -93754460884.3355 ;
}
gr0871.localdomain> ncdump -v minVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v avgVolumeTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
    3.69965924887792, 3.72517778620299, 3.59349709115481 ;
}
gr0871.localdomain> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 2
    2.12610713986572e+16 ;
}
gr0871.localdomain> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01,
  -321751610102804, 0, -459547812071011, 0, 0, 0, -5388069369312.01 ;
}
gr0871.localdomain> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
    2.12610713986572e+16,
  215467097738560, 7.52538176863014e+15, 2.41496069145745e+15,
    1.76884274279661e+15, 1.56532233466065e+15, 792240960463543,
    2.12610713986572e+16 ;
}
gr0871.localdomain> ncdump -v avgLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  0, 0.971952995480642, -0.4037625291602, 0, 0.802349697800351,
    0.791204201314209, 0.977386912520514,
  0, 1.03008866096944, -0.696744842394911, 0, 0.802619202528348, 0,
    0.986543052197686 ;
}
gr0871.localdomain> ncdump -v avgLayerKineticEnergyCell forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
    5.4709712920508e-07, 4.71442787103023e-05,
  0, 2.80398517492325e-05, 8.74947379578159e-05, 0, 5.08843505897544e-07,
    2.5656437999868e-07, 3.99017393710811e-05,
  0, 2.1506749937419e-05, 0, 0, 2.95050907750405e-07, 0, 3.54668394169954e-05 ;
}
gr0871.localdomain> ncdump -v maxLayerKineticEnergyCell forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 5
  28518885021.4354, 212119379173.995, 878633656653.876, 9356784606.48849,
    3972991322.19699, 5599138317.21231, 1564468481818.65,
  28518885021.4354, 212119379173.995, 878633656653.876, 9356784606.48849,
    3972991322.19699, 5599138317.21231, 1564468481818.65 ;
}

It was working on ocean/develop:

gr-fe3.lanl.gov> pwd
/lustre/scratch3/turquoise/mpeterse/runs/t41n/ocean/global_ocean/QU240/analysis_test
gr-fe3.lanl.gov> ncdump -v minLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 3
    -0.825268646333416, -0.762441875583128, -0.750172640579461,
    -0.696744842394911 ;
}
gr-fe3.lanl.gov> ncdump -v maxLayerTemperature forward/analysis_members/layerVolumeWeightedAverage.0001-01-01_00.00.00.nc | tail -n 3
    4.5178530968127, 3.67229754056838, 3.66701465042708, 3.43060986528412,
    2.53173043332382, 2.06111757675413, 2.05747970347969, 1.89823820105345 ;
}

vanroekel · 2017-04-19T15:53:39Z

@mark-petersen thanks for testing this. I have only checked the layer averages thus far and hadn't had a chance to look at max/min. I will try and look soon. From the look of the numbers it seems like I'm not dividing by volume of the cell perhaps.

philipwjones · 2017-04-20T18:13:40Z

Actually, the regional max value is incorrect in both the original and @vanroekel versions. The routine has the statement:
maxValueWithinOceanVolumeRegion(iDataField, iRegion) = maxval(minValueWithinOceanLayerRegion(iDataField,:,iRegion))

So it's taking the max of minimum values instead of the true maximum.

Done with my rewrite and am now testing...

mark-petersen · 2017-05-01T22:01:08Z

@philipwjones What is the status of this PR? @q-rai (Anne Berres) has volunteered to add regional mask computations (from a netcdf file) to this AM, but she is only at LANL for another week ( @milenaveneziani this will help us out). It is important that @q-rai starts with code that is close to the final version of this PR, and she has time to work on it now.

@philipwjones could you just push up whatever you have, and tell us the status (passes or fails tests...) @q-rai can start with that commit, and we can always rebase later. I bet any remaining changes for this PR will be small. Thanks!

philipwjones · 2017-05-01T23:14:25Z

@mark-petersen Will push at least one version up tonight. However, note that the region mask should really be an MPAS-ocean-wide capability and not limited to this AM. Maybe @q-rai should design/build a standalone module for this?

milenaveneziani · 2017-05-02T01:46:04Z

@philipwjones: the way that the regional mask is implemented in this AM is similar to only two other AMs: the surfaceAvg and the waterMassCensus ams. I believe @q-rai already made changes to waterMassCensus (not sure about surfaceAvg), which should be similar to those to be made here.
Other AM's, like the MOC and MHT, handle the regional calculation differently (fairly separated from the global calculation), and the regional quantity is saved in a separate array, rather than being added to one variable with additional region dimension as in the 3 AMs mentioned above.

q-rai · 2017-05-02T16:37:36Z

@philipwjones @milenaveneziani
Yes, MHT and MOC are a bit different from the other three as they didn't have hard-wired regions in them.
I've finished Water Mass Census and am currently debugging Surface Area Weighted Averages, which both have the same hard-wired regions as the Layer Averages.
@mark-petersen said he'd rather have me do the ground work on adapting Layer Averages than finish the nitty gritty details of Surface Averages. I agree that this would be the best use of my remaining time, as well as his, just based on our skill sets.
The way region masks are implemented in all four is using netCDF-based region masks that are generated from geojson files. (All found in the geometric_features repository.) All four analysis members write out separate arrays for the original AM version (global for MOC/MHT, hard-wired regions for WMC/SAWA) and the region mask file version (slightly inconsistent naming; I'll create an issue for that before I leave).

On the topic of a standalone module: I won't be here long enough to design a new module, but I'll leave my two cents hoping they'll help someone else.
I think a stand-alone module would be rather difficult to realize for all AMs at once:
MOC/MHT type AMs should work out okay, they write out spatial arrays with all the information needed. One could apply the region masks to the saved-out data. It could even be done as a stand-alone script and applied to all kinds of spatial data.
WMC/SAWA/LVWM type AMs are more tricky. They only save out processed/summarized data (avg, min, max, ...) and it's not possible to change the underlying region in retrospect to get a new min/max/avg. I don't see a good solution to this that doesn't involve going into the code of each individual AM to address this.

philipwjones · 2017-05-02T20:41:14Z

@q-rai and @milenaveneziani - Sorry if I was unclear. What I was suggesting is essentially what Anne is doing already - namely replacing these hard-wired region masks with the common region masks and her suggestions for standardizing some of the naming (eg of nOceanRegions). Was thinking more about the overall bookkeeping, grouping and avoiding duplication among the AMs which I believe is done now mostly through preprocessing of the mask file. Something for the future. BTW, I suspect we may want to change index ordering on the masks - the versions of layer avg code I sent earlier are meant to explore this to some extent.

mark-petersen · 2017-05-02T22:18:51Z

@philipwjones I copied your three versions and the registry file into this repo, and made one commit for each for posterity, see above. All three compiled using gnu on grizzly.

My trouble is, @q-rai has time to add the regional netcdf masks, but I don't have time to test these at high rez because I'm committed to the ACME non-bfb block test until it's fixed. Anne is only here for one more week.

@philipwjones could you make a best guess on which version to proceed with? Anne can start with that commit and add the netcdf region masks.

philipwjones · 2017-05-02T23:43:21Z

@mark-petersen @q-rai - the regional mask in all three is identical, so maybe start with the Fcolumn version which is the most similar to the current one. If I get a chance after the threading stuff, I can try to evaluate these in a high-res case.

mark-petersen · 2017-05-02T23:48:52Z

Great, thanks. @q-rai please start with 1f66c6c for your region mask work.

q-rai · 2017-05-04T21:43:02Z

@philipwjones I noticed that you changed the hard-wired region dimensions down from 7 to 6.
That extra region is the global "region", which is why nothing is done by the mask routine; the array is already initialized with '1'. In this AM it's not explicitly stated, but you can find an equivalent in Water Mass Census.
I'll change it back in the version I'm working on and I'll put a comment in the mask routines to make that clear.

Completely unrelated: which version of the code were you basing your changes on? I'm asking because @vanroekel pulled the mask computation up to the init routine which seemed like a good idea to me (I actually made the same change over here after seeing his solution). Is there a performance-related reason against it or did you just start from a different codebase?

philipwjones · 2017-05-04T21:51:23Z

@q-rai Ok - didn't catch the 7th, but probably would be better to make that explicit rather than implied? I didn't see it in the var array in registry (may have missed it), so just assumed there was an extra index.

On masking - I did move the region mask up into init, but left the vertical mask since future simulations with active ice sheets or inundation might have time-varying masks for vertical levels. So I computed the total mask as the product of an invariant region mask and time-varying vertical mask. Due to some vectorization-related optimization later, we may end up having an MPAS-wide vertical mask rather than varying loop limits, so this may get pushed elsewhere.

q-rai · 2017-05-04T22:06:37Z

@philipwjones It's easy to miss, that's why I figured a comment stating that will help.

I completely missed the mask part when skimming your code. I expected something of the form

initroutine
  contains
  maskroutine
  end maskroutine
end initroutine
computeroutine
end computeroutine

because that's what I'd seen so far (and that's one big obvious block to see). Mark just explained to me that it doesn't actually have to be that way because there's also a contains all the way at the top before initializing.
So -- great! I'll just work off of what you have. Thanks for explaining what you did there, it will come in handy!

q-rai · 2017-05-08T23:54:05Z

I added in region mask support. At a first glance, I seem to get the same results as hard-wired regions (with equivalent region mask).

mark-petersen · 2017-05-09T02:33:41Z

@q-rai thanks for your quick work on this!

Hard-wired regions and region masks can be turned on/off individually in namelist.

q-rai · 2017-05-11T20:58:03Z

Testing:
I wrote some code to compute some stats on the output of
2f1d07c (hard-wired regions only; ocean/develop version)
1f66c6c (hard-wired regions only; Phil’s version 1)
and
112bd73 (todo) (hard-wired regions and mask file-based regions; my version)

which can be found here:
https://gist.github.com/q-rai/78d82aaeca0b5e3d44ddf4b4e092ec0a

The output looks promising, but there are some small issues that need fixing. The current status is:

min and max Layer Averages don’t seem to work in any of the versions; they all look the same.
min and max Volume Averages aren’t written out for hard-wired regions in my version; not sure what I messed up but it should be easy to fix; region mask versions are doing what they should
min and max Volume Averages for region masks look good; the maxes look like Phil’s version (fixed max)
avg Layer and Volume Averages are looking good

Here are some example images; commit numbers are in the caption. From left to right it’s ocean/develop-hard-wired, Phil-hard-wired, mine-hard-wired, mine-regionmask.
Note that the second graph from the left is missing the rightmost region.

Region numbers for reference:

0 Arctic (lat > 60)
1 Equatorial (lat in [-15,15])
2 Southern Ocean (lat < -50)
3 Nino 3 (lat in [-5,5], lon in [210,270])
4 Nino 4 (lat in [-5,5], lon in [160,120])
5 Nino 3.4(lat in [-5,5], lon in [190,240])
global

mark-petersen · 2017-10-06T16:43:29Z

@vanroekel This had huge swaths of conflicts with the head. I'm postponing the merge, but we should do it next week. Best to rebase after ocean/develop is updated today.

pwolfram · 2019-04-29T19:41:50Z

@vanroekel and @mark-petersen, is this something we need to migrate to MPAS-Model?

vanroekel added 2 commits April 14, 2017 22:25

layer volume average fixes

d4be042

Cleans up unneeded code

5686028

vanroekel added bug Don't Merge Ocean labels Apr 15, 2017

vanroekel requested a review from mark-petersen April 15, 2017 04:29

pwolfram reviewed Apr 15, 2017

View reviewed changes

Fixes array unpacking post send

a184f5a

vanroekel force-pushed the ocean/Layer_volume_weighted_mask branch from 74f8f66 to a184f5a Compare April 17, 2017 16:19

Switch index ordering

65839bc

Add layer and min/max/avg to test suite comparison

645984b

mark-petersen added 3 commits May 2, 2017 15:48

Phil performance version 1: column ordering

1f66c6c

Phil performance version 2: in place

a7d08af

Phil performance version 3: transpose

c7aabaf

q-rai force-pushed the ocean/Layer_volume_weighted_mask branch from c7aabaf to 8cde6b3 Compare May 8, 2017 23:44

mark-petersen force-pushed the ocean/develop branch from 0e08bf7 to 2f1d07c Compare May 9, 2017 16:41

q-rai force-pushed the ocean/Layer_volume_weighted_mask branch from 8cde6b3 to 1f66c6c Compare May 11, 2017 19:21

Anne S Berres added 3 commits May 11, 2017 14:34

Back to Phil performance version 1: column ordering

d97605e

Region mask support for Layer Volume Weighted Averages.

187850d

Hard-wired regions and region masks can be turned on/off individually in namelist.

Refactored to conform with issue #1305.

4a5ed7a

mark-petersen removed the Don't Merge label Oct 6, 2017

Ocean/layer volume weighted mask #1292

Are you sure you want to change the base?

Ocean/layer volume weighted mask #1292

Conversation

vanroekel commented Apr 15, 2017

vanroekel commented Apr 15, 2017

pwolfram left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pwolfram Apr 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pwolfram commented Apr 15, 2017

philipwjones commented Apr 17, 2017

vanroekel commented Apr 17, 2017

philipwjones commented Apr 17, 2017

vanroekel commented Apr 17, 2017

mark-petersen commented Apr 19, 2017

mark-petersen commented Apr 19, 2017

vanroekel commented Apr 19, 2017

philipwjones commented Apr 20, 2017

mark-petersen commented May 1, 2017

philipwjones commented May 1, 2017

milenaveneziani commented May 2, 2017

q-rai commented May 2, 2017

philipwjones commented May 2, 2017

mark-petersen commented May 2, 2017

philipwjones commented May 2, 2017

mark-petersen commented May 2, 2017

q-rai commented May 4, 2017

philipwjones commented May 4, 2017

q-rai commented May 4, 2017 • edited Loading

q-rai commented May 8, 2017

mark-petersen commented May 9, 2017

q-rai commented May 11, 2017 • edited Loading

mark-petersen commented Oct 6, 2017

pwolfram commented Apr 29, 2019

pwolfram Apr 15, 2017 •

edited

Loading

q-rai commented May 4, 2017 •

edited

Loading

q-rai commented May 11, 2017 •

edited

Loading