Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not stop MockLogAppender instances #12134

Closed

Conversation

andrross
Copy link
Member

@andrross andrross commented Feb 1, 2024

When JUnit tests run concurrently they are mostly isolated but they do share static state. Tests generally don't rely on static state, with the notable exception of when they manipulate the logger to test the logging behavior of the code under test. This leads to problems if say test 1 stops a test log appender, then test 2 causes a logging message which attempts to append to that logging appender. The logging system will generate an error which leads to spurious failures in test 2. I previously attempted to avoid these problems by ensuring that test log appenders are removed before they are stopped. However, that doesn't completely eliminate race conditions as the list of appenders maintained by the logging system is essentially a copy-on-write array list. That means test 2 can get a reference of the appender list, then test 1 can remove and stop an appended, but test 2 still has a reference to a version of the list that contained the now stopped appender. I have now come to the conclusion that the easiest way to fix this problem (without mucking with the implementation details or concurrency properties of the logging system itself) is to not stop the test appenders. They maintain no external state so they can be safely removed without being stopped and they will just be garbage collected and everyone will be happy.

Related Issues

Resolves #10799

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

When JUnit tests run concurrently they are mostly isolated but they do
share static state. Tests generally don't rely on static state, with the
notable exception of when they manipulate the logger to test the logging
behavior of the code under test. This leads to problems if say test 1
stops a test log appender, then test 2 causes a logging message which
attempts to append to that logging appender. The logging system will
generate an error which leads to spurious failures in test 2. I
previously attempted to avoid these problems by ensuring that test log
appenders are removed before they are stopped. However, that doesn't
completely eliminate race conditions as the list of appenders maintained
by the logging system is essentially a copy-on-write array list. That
means test 2 can get a reference of the appender list, then test 1 can
remove and stop an appended, but test 2 still has a reference to a
version of the list that contained the now stopped appender. I have now
come to the conclusion that the easiest way to fix this problem (without
mucking with the implementation details or concurrency properties of the
logging system itself) is to not stop the test appenders. They maintain
no external state so they can be safely removed without being stopped
and they will just be garbage collected and everyone will be happy.

Signed-off-by: Andrew Ross <[email protected]>
Copy link
Contributor

github-actions bot commented Feb 1, 2024

Compatibility status:

Checks if related components are compatible with change 0f8047d

Incompatible components

Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/k-nn.git]

Copy link
Contributor

github-actions bot commented Feb 1, 2024

❕ Gradle check result for 0f8047d: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Feb 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (3c07461) 71.40% compared to head (0f8047d) 71.30%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12134      +/-   ##
============================================
- Coverage     71.40%   71.30%   -0.10%     
+ Complexity    59505    59431      -74     
============================================
  Files          4925     4925              
  Lines        279540   279540              
  Branches      40646    40646              
============================================
- Hits         199594   199328     -266     
- Misses        63314    63610     +296     
+ Partials      16632    16602      -30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reta
Copy link
Collaborator

reta commented Feb 1, 2024

Thanks @andrross for persisting to get this one fixed

When JUnit tests run concurrently they are mostly isolated but they do share static state.

So this is my understanding of how we do that:

  • we run test suites concurrently by forking JVMs
  • we never run test suites (or tests with same suite) concurrently within same JVM

The static state between tests should not be shared between JVMs but it is possible that we don't clean up properly after some test suites so this problem emerges.

@andrross
Copy link
Member Author

andrross commented Feb 1, 2024

@reta You're right. I didn't realize all test concurrency happened via forked JVMs, but some experimentation definitely shows that to be true.

I'll dig into this some more, but another way this could be happening even within the context of a single test is that SimpleNioTransportTests.testTracerLog is triggering a log message to happen on a different thread that executes concurrently with the log appender remove/stop, which in theory is possible because we don't stop the services or thread pool in testTracerLog() method itself and that happens in the tearDown() method. Regardless, I'll close this PR because there is likely a different solution here.

@andrross andrross closed this Feb 1, 2024
@andrross andrross deleted the mock-logger-stop-fix branch February 8, 2024 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working flaky-test Random test failure that succeeds on second run Other skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] MockLogAppender appended to after stopping
3 participants