Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unit test testFailsHealthOnHungIOBeyondHealthyTimeout() by incresing the max waiting time before assertion #1692

Merged
merged 2 commits into from
Dec 10, 2021

Conversation

tlfeng
Copy link
Collaborator

@tlfeng tlfeng commented Dec 9, 2021

Description

Try to fix the unit test testFailsHealthOnHungIOBeyondHealthyTimeout() in class FsHealthServiceTests, according to the idea in comment #1567 (comment)

  • Increase the max waiting time between cancelling the IO hanging and checking the File System health status. Increase the 2x multiplier to 3x that applied to refreshInterval.
  • Remove the duplicate assertion statements for getting better error message.

Note:
1
The test failure can not be reproduced easily locally, and only sometime shows up in the CI workflow. I guess the reason of the test failure is there are dozens of file directories have to be restoring from the IO hanging status in the below step,

logger.info("--> Fix file system disruption");
disruptFileSystemProvider.injectIODelay.set(false);

but the max waiting time in below code is not enough for all those file directories restored to a healthy state.

2
Before removing the duplicate assertion statement:

org.opensearch.monitor.fs.FsHealthServiceTests > testFailsHealthOnHungIOBeyondHealthyTimeout FAILED
    java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([20C97377870AC21B:A2BD2D7EF3F6B453]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout(FsHealthServiceTests.java:239)

After removing, the error message can be more intuitive:

org.opensearch.monitor.fs.FsHealthServiceTests > testFailsHealthOnHungIOBeyondHealthyTimeout FAILED
    java.lang.AssertionError: expected:<HEALTHY> but was:<UNHEALTHY>
        at __randomizedtesting.SeedInfo.seed([8E47558C149873A4:C330B85606405EC]:0)
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.opensearch.monitor.fs.FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout(FsHealthServiceTests.java:243)

Issues Resolved

#1567 - I wish it could be resolved after the PR.
#1307 and #1450 - These 2 issues reporting the same test failure as #1567.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Tianli Feng added 2 commits December 9, 2021 15:14
@tlfeng tlfeng requested a review from a team as a code owner December 9, 2021 23:29
@opensearch-ci-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Copy link
Collaborator

✅   Gradle Check success fbaea49
Log 1419

Reports 1419

@tlfeng
Copy link
Collaborator Author

tlfeng commented Dec 10, 2021

Hi @Bukhtawar, I would like to hear your opinion on this change. 😁 Thank you.

@dblock dblock added backport PRs or issues specific to backporting features or enhancments backport 1.x labels Dec 10, 2021
@dblock
Copy link
Member

dblock commented Dec 10, 2021

I'm going to merge it since it visibly reduces the amount of test failures.

@dblock dblock merged commit baa10b9 into opensearch-project:main Dec 10, 2021
@dblock dblock added pending backport Identifies an issue or PR that still needs to be backported and removed backport PRs or issues specific to backporting features or enhancments labels Dec 10, 2021
@dblock
Copy link
Member

dblock commented Dec 10, 2021

This failed to backport automatically in https://github.com/opensearch-project/OpenSearch/runs/4486636906?check_suite_focus=true and will need to be done by hand, please, cc: @tlfeng.

@tlfeng tlfeng deleted the fix-fshealth-timeout branch December 10, 2021 17:40
@tlfeng
Copy link
Collaborator Author

tlfeng commented Dec 10, 2021

Hi @dblock, I realized that the PR #1167 that introduced the test testFailsHealthOnHungIOBeyondHealthyTimeout() isn't merged to 1.x branch. (https://github.com/opensearch-project/OpenSearch/blob/1.x/server/src/test/java/org/opensearch/monitor/fs/FsHealthServiceTests.java)
Although the backport is not needed, I will keep an eye when the unit test being backported.

@tlfeng tlfeng added :test Adding or fixing a test v2.0.0 Version 2.0.0 and removed pending backport Identifies an issue or PR that still needs to be backported backport 1.x labels Dec 10, 2021
@tlfeng tlfeng added the pending backport Identifies an issue or PR that still needs to be backported label Dec 30, 2021
@tlfeng
Copy link
Collaborator Author

tlfeng commented Dec 30, 2021

Note: Pending backport along with c05e0d9, after the PR #1269 merged.

@Bukhtawar
Copy link
Collaborator

@tlfeng can you confirm if the test fixes have been merged to 1.x? I tried pulling in these changes as well

@tlfeng
Copy link
Collaborator Author

tlfeng commented Feb 7, 2022

@tlfeng can you confirm if the test fixes have been merged to 1.x? I tried pulling in these changes as well

Hi @Bukhtawar, thanks a lot for pulling the test fix into your PR #1269! The unit test changes in your commit 88d1d65 are totally correct with commit baa10b9 and c05e0d9.

@tlfeng tlfeng removed the pending backport Identifies an issue or PR that still needs to be backported label Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:test Adding or fixing a test v2.0.0 Version 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] FsHealthServiceTests.testFailsHealthOnHungIOBeyondHealthyTimeout (Random test Failure)
5 participants