Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove excess initialization from BGC code #6058

Conversation

peterdschwartz
Copy link
Contributor

@peterdschwartz peterdschwartz commented Nov 6, 2023

Every model timestep, the vegetation and column fluxes are set to zero for the CNP variables.
This took up roughly 1/3rd of the time for the Ecosystem model time, and this PR reduces that by over half.

[BFB]

Added initializing into compute loops when needed, and switched some variables to
be only be initialized on the first time step.

the harvest variables shouldn't require initializing every timestep but extra effort
is needed to untangle the dependencies and order of operations.
…lity.

Cleaned up the comments in VegetationDataType
@@ -308,7 +308,9 @@ subroutine SoilLittDecompAlloc (bounds, num_soilc, filter_soilc, &
/ cp_decomp_pools_new(c,j,cascade_receiver_pool(k)) )

else ! 100% respiration
pmpf_decomp_cascade(c,j,k) = - p_decomp_cpool_loss(c,j,k) / cp_decomp_pools(c,j,cascade_donor_pool(k))
if(decomp_ppools_vr(c,j,cascade_donor_pool(k)) > 0._r8) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterdschwartz, is this future proofing or was this already causing div0s? If zero's were being passed in, then I would recommend flagging this as an issue so you can have someone put time into figuring out why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this only became an issue when I removed certain initializations from the SetValues routine. Likewise with other changes: some initializations were moved to loops that first use that variable. This is for variables that only have meaningful values for certain types of pfts or cols for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgknox Did I answer your question?

Copy link
Contributor

@rgknox rgknox Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry for the late reply! thanks for the bump

Copy link
Contributor

@rgknox rgknox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look good to me @peterdschwartz

@peterdschwartz
Copy link
Contributor Author

addressed comments

bishtgautam added a commit that referenced this pull request Jan 5, 2024
…#6058)

Every model timestep, the vegetation and column fluxes are set to zero for the CNP variables.
This took up roughly 1/3rd of the time for the Ecosystem model time, and this PR reduces that by over half.

[BFB]
@bishtgautam bishtgautam self-requested a review January 5, 2024 22:45
@peterdschwartz
Copy link
Contributor Author

I see some of the e3sm_integration BGC tests have diffs with intel compiler only. I'll try to have it sorted today

@rljacob
Copy link
Member

rljacob commented Jan 8, 2024

Looks like it was a small number of fields.

@rljacob rljacob added this to the v3.0.0-rc milestone Jan 8, 2024
@peterdschwartz
Copy link
Contributor Author

@bishtgautam Queue times have been slow, but I've isolated the issue to a specific subroutine. I'll be able to push a fix sometime tonight (depending on job turnover), but I wouldn't worry about being able to re-merge it tonight.

bishtgautam added a commit that referenced this pull request Jan 9, 2024
… next (PR #6058)"

This reverts commit 575f773, reversing
changes made to ab7c7e0.
@rljacob
Copy link
Member

rljacob commented Jan 10, 2024

Testing found some diffs with baselines. Seen on both chrysalis and pm-cpu (The only ones running integration with baselines)

The cases with 5 variables diffing.

SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPECACNT_1850.*_*.elm-bgcexp
SMS_Ld2.ne30pg2_r05_EC30to60E2r2.BGCEXP_CNTL_CNPRDCTC_1850.*_*.elm-bgcexp

The variables (from one of the cases, grepping for "RMS" in cprnc.out)

 RMS x2a_Fall_fco2_lnd                5.6027E+32            NORMALIZED  1.1852E+01
 RMS a2x_Sa_co2prog                   4.2255E+41            NORMALIZED  1.3157E+01
 RMS l2x_Fall_fco2_lnd                1.9829E+33            NORMALIZED  3.6984E+00
 RMS x2l_Sa_co2prog                   1.5536E+42            NORMALIZED  4.0495E+00
 RMS x2oacc_Sa_co2prog                6.1847E+40            NORMALIZED  1.9795E+01

This seems like its a simple fix.

But 2 other cases had 255 variables diffing (which implies a real diff introduced in to the state early since its only 5 steps).
The cases:

SMS_Ln5.ne30pg2_r05_EC30to60E2r2.BGCEXP_LNDATM_CNPRDCTC_1850.*_*
SMS_Ln5.ne30pg2_r05_EC30to60E2r2.BGCEXP_LNDATM_CNPRDCTC_20TR.*_*

@peterdschwartz
Copy link
Contributor Author

peterdschwartz commented Jan 10, 2024

@rljacob Yes I'm aware of the differences. I tried to focus on

SMS_Ln5.ne30pg2_r05_EC30to60E2r2.BGCEXP_LNDATM_CNPRDCTC_1850.*_*

as it is shorter and requires less resources, but I ran into the issue on chrysalis where this test DIFFs (in 255 variables) with master. I asked Gautam to do the same, and his case also gave DIFFs (same 255 variables) with master, so I think something is wrong with the default chrysalis environment (this isn't the only issue I've encountered when running test suites on chrysalis)

I don't have this issue with pm-cpu but the queue times are pretty long and that's the only thing holding this debugging effort back.

@rljacob
Copy link
Member

rljacob commented Jan 10, 2024

I was just noting for the record.

The jenkins testing on Chrysalis uses different node counts then regular runs for a given resolution. If you are comparing a run you do with the Jenkins-generated baseline then you might be picking up a pelayout diff that has crept in. Try making your own baseline with the same pelayout as your test case.

@peterdschwartz
Copy link
Contributor Author

@rljacob Right! the cases do also report NLDIFF for the pelayout so that could be it, thanks.

@rljacob
Copy link
Member

rljacob commented Jan 10, 2024

Good that should help find the issue with this PR. Unfortunately, there's now another issue because that case can't pass a PEM test. That may have existed for a while and can be looked at later.

@peterdschwartz
Copy link
Contributor Author

@bishtgautam Newest commit has fixed the 4 test diffs

bishtgautam added a commit that referenced this pull request Jan 16, 2024
bishtgautam added a commit that referenced this pull request Jan 16, 2024
…#6058)

Re-merging the PR #6058

Every model timestep, the vegetation and column fluxes are set to zero for the CNP variables.
This took up roughly 1/3rd of the time for the Ecosystem model time, and this PR reduces that by over half.

[BFB]
@rljacob
Copy link
Member

rljacob commented Jan 17, 2024

@bishtgautam you can merge this to master.

@bishtgautam bishtgautam merged commit 057f1f8 into E3SM-Project:master Jan 17, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants