Scale Microgrid Test Error #16

Closed
Paulm991 opened this issue Jul 10, 2024 · 5 comments · Fixed by #37

@Paulm991
Collaborator

Running the ScaleMicrogrid test on the 'develop' branch fails. The test output is as follows:

"ScaleMicrogrid" start time: Jul 10 14:10 EDT
Output:
Test the Relative Error
Test with Nsize = 2 passes!
Test the Relative Error
Test with Nsize = 4 passes!
Test the Relative Error
Test with Nsize = 8 fails!
Some tests fail!!

Test time = 3.82 sec
Test Failed.
"ScaleMicrogrid" end time: Jul 10 14:10 EDT
"ScaleMicrogrid" time elapsed: 00:00:03
This was performed with the coin-or/Ipopt#12 release of GridKit, Sundials 6.7.0, Ipopt 3.14.16, and SuiteSparse 5.10.1.

@Paulm991 Paulm991 changed the title from "Trying to run the ScaleMicrogrid test is throwing an error on the 'develop' branch. The error output is as follows:" to "Scale Microgrid Test Error" Jul 10, 2024
@Paulm991 Paulm991 assigned Paulm991 and reid-g and unassigned Paulm991 Jul 10, 2024
@superwhiskers superwhiskers assigned reid-g and unassigned reid-g Jul 10, 2024
@Paulm991 Paulm991 linked a pull request Jul 11, 2024 that will close this issue
@Paulm991 Paulm991 mentioned this issue Aug 6, 2024
@Paulm991 Paulm991 added the bug Something isn't working label Aug 8, 2024
@reid-g
Collaborator

reid-g commented Sep 16, 2024

Some updates on this. I have verified that all model parameters and RHS outputs are correct, so these are not the cause.

The "true" solution vectors I generate come from a MATLAB ODE form of the model, since no analytical solution to the model is available. No Jacobian is used. The solves use extreme tolerances: relative tolerance 1e-14 and absolute tolerance 1e-14. I have computed solutions with both ode15s and ode23tb, and also with the new MATLAB-integrated IDA solver. They all match, with only small numerical rounding differences between them.

The same error also appears in hardwired setups (no GridKit), in both the ODE and DAE forms.

I am still actively looking into what the issue is.

@pelesh
Collaborator

pelesh commented Dec 12, 2024

Unfortunately, #30 did not fix this issue. See this log.

@nkoukpaizan
Collaborator

The test failure seems to be non-deterministic. Different versions of SUNDIALS and/or different machines produce different error norms.

For example, from my runs on Frontier:

Test the Relative Error for N = 2
2-Norm Relative Error: 1.85946e-06
Test with Nsize = 2 passes!
Test the Relative Error for N = 4
2-Norm Relative Error: 7.05324e-06
Test with Nsize = 4 passes!
Test the Relative Error for N = 8
2-Norm Relative Error: 6.77802e-06
Test with Nsize = 8 passes!
All tests pass!!

But from GitHub Actions:

12: Test the Relative Error for N = 2
12: 2-Norm Relative Error: 2.80821e-06
12: Test with Nsize = 2 passes!
12: Test the Relative Error for N = 4
12: 2-Norm Relative Error: 2.25243e-06
12: Test with Nsize = 4 passes!
12: Test the Relative Error for N = 8
12: 2-Norm Relative Error: 0.000147299
12: Test with Nsize = 8 fails!

Both of these are with SUNDIALS v7.1.1.
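
For readers unfamiliar with the metric in the logs above, here is a minimal sketch of a 2-norm relative error check, assuming the test compares the computed solution against a stored reference vector. The function names and the `1e-5` threshold used in the note below are hypothetical and are not taken from the GridKit source.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not GridKit's API):
// returns ||computed - reference||_2 / ||reference||_2.
double relativeError2Norm(const std::vector<double>& computed,
                          const std::vector<double>& reference)
{
    if (computed.size() != reference.size())
        throw std::invalid_argument("vectors must have the same length");

    double diff_sq = 0.0; // accumulates squared differences
    double ref_sq  = 0.0; // accumulates squared reference entries
    for (std::size_t i = 0; i < reference.size(); ++i)
    {
        const double d = computed[i] - reference[i];
        diff_sq += d * d;
        ref_sq  += reference[i] * reference[i];
    }
    if (ref_sq == 0.0)
        throw std::invalid_argument("reference vector has zero norm");

    return std::sqrt(diff_sq) / std::sqrt(ref_sq);
}

// Hypothetical pass/fail rule: the threshold is an assumption for
// illustration; the actual value used by the test is not stated here.
bool withinTolerance(double rel_error, double threshold)
{
    return rel_error < threshold;
}
```

With an assumed threshold of `1e-5`, the Frontier value for N = 8 (6.77802e-06) would pass and the GitHub Actions value (0.000147299) would fail, which is consistent with the two logs.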

@reid-g
Collaborator

reid-g commented Dec 12, 2024

The differing results appear to be due to compiler optimizations and how rounding is handled. I can replicate the GitHub Actions results exactly (every digit) on my machine simply by changing the optimization flag from O0 to O1. I did not catch this earlier because I was working at O0. My results were the same from SUNDIALS v6.6.0 to v7.2.0 at optimization level O0.

The DG component uses trig functions, and I suspect compiler optimizations handle them differently. In addition, the time scales are quite small near the initial time. The errors are worse for N=8 than for N=4 and N=2.
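
As a generic illustration of why rounding can matter this much, here is a minimal sketch showing that floating-point addition is not associative: the same three doubles give different sums depending on where intermediate rounding happens. This is not a confirmed diagnosis of the GridKit failure, and the function names are hypothetical.

```cpp
// Each function stores its intermediate result in a volatile double so the
// intermediate is rounded to 64-bit precision regardless of optimization
// level or of any wider registers the hardware may offer.

// Computes (a + b) + c.
double sumLeftToRight(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

// Computes a + (b + c).
double sumRightToLeft(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}
```

With `a = 1e16`, `b = -1e16`, and `c = 1.0`, the first function returns 1 because `a + b` is exactly zero, while the second returns 0 because `-1e16 + 1.0` is not representable as a double and rounds back to `-1e16`. Any change in evaluation order or intermediate precision can shift results in a similar way, and a stiff integration with tight tolerances may amplify such differences.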

I have found two routes (at least for my machine) to handle the problem.

  • Reduce the SCALE_MICROGRID_REL_TOL and SCALE_MICROGRID_ABS_TOL to 1e-6. This gives consistent results between O0 and O1. I am in favor of this option as we are already "bottoming out".
  • Add the compiler flag -mfpmath=387 to increase temporary float precision. This prevents the large increase in error in N=8 at the current tolerances. However, this would only be useful for this problem and may slow performance.

@pelesh
Collaborator

pelesh commented Dec 13, 2024

I suggest you make a PR with your solution to the nicholson/buildsystem branch. That way we can test it right away. Please document your reasons for selecting the specific tolerances.

I suggest building the code as RelWithDebInfo in CMake. With GCC, this sets the flags to -O2 -g -DNDEBUG. I would also test with -O3.

@reid-g reid-g linked a pull request Dec 17, 2024 that will close this issue