Fix exceptions in IntervalCalculation and ResultIndexingHandler #1379

Merged: 2 commits merged on Dec 4, 2024

kaituo
Collaborator

@kaituo kaituo commented Dec 3, 2024

Description

  • IntervalCalculation: Prevent an ArrayIndexOutOfBoundsException by returning early when there are fewer than two timestamps. Previously, the code assumed at least two timestamps, causing an exception when only one was present.

  • ResultIndexingHandler: Handle exceptions from asynchronous calls by logging error messages instead of throwing exceptions. Since the caller does not wait for these asynchronous operations, throwing exceptions had no effect and could lead to unhandled exceptions. Logging provides visibility without disrupting the caller's flow.
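The two fixes above can be illustrated with a minimal, self-contained sketch. It is not the plugin's code: the class `FixSketch`, the methods `medianIntervalMillis` and `indexAsync`, the `NO_INTERVAL` sentinel, and the `ERRORS` list (a stand-in for a real logger) are all hypothetical names chosen for illustration, and the median-of-differences calculation is only an assumed example of logic that needs at least two timestamps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class FixSketch {
    // Hypothetical sentinel meaning "not enough data to derive an interval".
    static final long NO_INTERVAL = -1L;

    // Stand-in for a logger so the sketch has no external dependencies.
    static final List<String> ERRORS = new CopyOnWriteArrayList<>();

    // Returns the median gap between consecutive sorted timestamps,
    // or NO_INTERVAL when fewer than two timestamps are available.
    static long medianIntervalMillis(long[] sortedTimestamps) {
        if (sortedTimestamps == null || sortedTimestamps.length < 2) {
            return NO_INTERVAL; // early return instead of indexing past the array
        }
        long[] gaps = new long[sortedTimestamps.length - 1];
        for (int i = 1; i < sortedTimestamps.length; i++) {
            gaps[i - 1] = sortedTimestamps[i] - sortedTimestamps[i - 1];
        }
        Arrays.sort(gaps);
        return gaps[gaps.length / 2];
    }

    // Runs a write asynchronously. A failure is recorded rather than rethrown,
    // because no caller is waiting on the result to observe the exception.
    static CompletableFuture<Void> indexAsync(Runnable write) {
        return CompletableFuture.runAsync(write).exceptionally(e -> {
            ERRORS.add("Failed to index result: " + e);
            return null;
        });
    }

    public static void main(String[] args) {
        System.out.println(medianIntervalMillis(new long[] { 1_000L }));
        indexAsync(() -> { throw new IllegalStateException("index unavailable"); }).join();
        System.out.println(ERRORS);
    }
}
```

In this sketch the future returned by `indexAsync` always completes normally, so a fire-and-forget caller never leaves an unobserved exceptional future behind; the failure is visible only through the recorded message.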

Testing done:

  1. Added unit tests (UTs) and integration tests (ITs).

Signed-off-by: Kaituo Li [email protected]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This PR:
- Introduces an `AtomicInteger` called `pagesInFlight` to track the number of pages currently being processed.
- Increments `pagesInFlight` before processing each page and decrements it after processing is complete.
- Adjusts the condition in `scheduleImputeHCTask` to check both `pagesInFlight.get() == 0` (all pages have been processed) and `sentOutPages.get() == receivedPages.get()` (all responses have been received) before scheduling the `imputeHC` task.
- Removes the previous final check in `onResponse` that decided when to schedule `imputeHC`, relying instead on the updated counters for accurate synchronization.

These changes address the race condition where `sentOutPages` might not have been incremented in time before checking whether to schedule the `imputeHC` task. By accurately tracking the number of in-flight pages and sent pages, we ensure that `imputeHC` is executed only after all pages have been fully processed and all responses have been received.
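The counter scheme described above can be sketched as follows. This is a simplified illustration, not the `PageListener` implementation: the class `PageTrackerSketch`, the methods `beginPage`, `recordSent`, and `endPage`, the `imputeRuns` counter, and the one-shot `scheduled` flag are assumptions added so the example is self-contained. Only the names `pagesInFlight`, `sentOutPages`, `receivedPages`, `scheduleImputeHCTask`, and `onResponse` come from the description.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class PageTrackerSketch {
    final AtomicInteger pagesInFlight = new AtomicInteger();
    final AtomicInteger sentOutPages = new AtomicInteger();
    final AtomicInteger receivedPages = new AtomicInteger();
    // Counts how often the (simulated) imputeHC task was scheduled.
    final AtomicInteger imputeRuns = new AtomicInteger();
    // Hypothetical guard so the task is scheduled at most once in this sketch.
    private final AtomicBoolean scheduled = new AtomicBoolean();

    // Called before a page is processed.
    void beginPage() {
        pagesInFlight.incrementAndGet();
    }

    // Called for each request sent out while a page is being processed.
    void recordSent() {
        sentOutPages.incrementAndGet();
    }

    // Called after a page has been fully processed.
    void endPage() {
        pagesInFlight.decrementAndGet();
        scheduleImputeHCTask();
    }

    // Called when a response arrives, possibly while a page is still in flight.
    void onResponse() {
        receivedPages.incrementAndGet();
        scheduleImputeHCTask();
    }

    // Schedules only when no page is in flight AND every sent request was answered.
    void scheduleImputeHCTask() {
        if (pagesInFlight.get() == 0
            && sentOutPages.get() == receivedPages.get()
            && scheduled.compareAndSet(false, true)) {
            imputeRuns.incrementAndGet(); // stand-in for scheduling imputeHC
        }
    }

    public static void main(String[] args) {
        PageTrackerSketch tracker = new PageTrackerSketch();
        tracker.beginPage();
        tracker.recordSent();
        tracker.onResponse(); // counters match, but the page is still in flight
        System.out.println("runs while page in flight: " + tracker.imputeRuns.get());
        tracker.endPage();
        System.out.println("runs after page finished: " + tracker.imputeRuns.get());
    }
}
```

The point of the extra counter is the window in which `sentOutPages` and `receivedPages` happen to be equal while a page is only partly processed; checking `pagesInFlight` as well keeps the task from being scheduled in that window. A real listener that receives many pages over time would need more than the one-shot flag used here.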

Testing done:
1. Reproduced the race condition by starting two detectors with imputation, which causes an out-of-order illegal argument exception from RCF. Verified that the change fixes the problem.
2. Added an IT for the above scenario.

Signed-off-by: Kaituo Li <[email protected]>
@opensearch-trigger-bot added the documentation, infra, and backport 2.x labels Dec 3, 2024
@kaituo added the bug label and removed the documentation and infra labels Dec 3, 2024
codecov bot commented Dec 3, 2024

Codecov Report

Attention: Patch coverage is 97.14286% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.61%. Comparing base (1a3b8c9) to head (b85e825).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...eries/transport/handler/ResultIndexingHandler.java | 96.96% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1379      +/-   ##
============================================
+ Coverage     80.11%   81.61%   +1.50%     
- Complexity     5710     5826     +116     
============================================
  Files           533      533              
  Lines         23527    23522       -5     
  Branches       2367     2367              
============================================
+ Hits          18848    19197     +349     
+ Misses         3557     3167     -390     
- Partials       1122     1158      +36     
| Flag | Coverage Δ |
|---|---|
| plugin | 81.61% <97.14%> (+1.50%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| ...t/transport/ForecastResultBulkTransportAction.java | 48.14% <ø> (ø) |
| ...h/timeseries/rest/handler/IntervalCalculation.java | 93.20% <100.00%> (ø) |
| ...eries/transport/handler/ResultIndexingHandler.java | 86.25% <96.96%> (+21.54%) ⬆️ |

... and 28 files with indirect coverage changes

Member

@owaiskazi19 owaiskazi19 left a comment

Took an initial pass

@kaituo kaituo requested a review from owaiskazi19 December 3, 2024 20:32
Member

@owaiskazi19 owaiskazi19 left a comment

LG

@kaituo kaituo merged commit 0adf906 into opensearch-project:main Dec 4, 2024
25 checks passed
@opensearch-trigger-bot
The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/anomaly-detection/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/anomaly-detection/backport-2.x
# Create a new branch
git switch --create backport/backport-1379-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0adf906017229fc11498c3307237e4ee6bd6ca8c
# Push it to GitHub
git push --set-upstream origin backport/backport-1379-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/anomaly-detection/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1379-to-2.x.

kaituo added a commit to kaituo/anomaly-detection-1 that referenced this pull request Dec 5, 2024
…search-project#1379)

* Fix race condition in PageListener

This PR
- Introduced an `AtomicInteger` called `pagesInFlight` to track the number of pages currently being processed. 
- Incremented `pagesInFlight` before processing each page and decremented it after processing is complete
- Adjusted the condition in `scheduleImputeHCTask` to check both `pagesInFlight.get() == 0` (all pages have been processed) and `sentOutPages.get() == receivedPages.get()` (all responses have been received) before scheduling the `imputeHC` task. 
- Removed the previous final check in `onResponse` that decided when to schedule `imputeHC`, relying instead on the updated counters for accurate synchronization.

These changes address the race condition where `sentOutPages` might not have been incremented in time before checking whether to schedule the `imputeHC` task. By accurately tracking the number of in-flight pages and sent pages, we ensure that `imputeHC` is executed only after all pages have been fully processed and all responses have been received.

Testing done:
1. Reproduced the race condition by starting two detectors with imputation. This causes an out of order illegal argument exception from RCF due to this race condition. Also verified the change fixed the problem.
2. added an IT for the above scenario.

Signed-off-by: Kaituo Li <[email protected]>

* Fix exceptions in IntervalCalculation and ResultIndexingHandler

- **IntervalCalculation**: Prevent an `ArrayIndexOutOfBoundsException` by returning early when there are fewer than two timestamps. Previously, the code assumed at least two timestamps, causing an exception when only one was present.

- **ResultIndexingHandler**: Handle exceptions from asynchronous calls by logging error messages instead of throwing exceptions. Since the caller does not wait for these asynchronous operations, throwing exceptions had no effect and could lead to unhandled exceptions. Logging provides visibility without disrupting the caller's flow.

Testing done:
1. added UT and ITs.

Signed-off-by: Kaituo Li <[email protected]>

---------

Signed-off-by: Kaituo Li <[email protected]>
kaituo added a commit that referenced this pull request Dec 6, 2024
… (#1386)

* Fix race condition in PageListener

This PR
- Introduced an `AtomicInteger` called `pagesInFlight` to track the number of pages currently being processed. 
- Incremented `pagesInFlight` before processing each page and decremented it after processing is complete
- Adjusted the condition in `scheduleImputeHCTask` to check both `pagesInFlight.get() == 0` (all pages have been processed) and `sentOutPages.get() == receivedPages.get()` (all responses have been received) before scheduling the `imputeHC` task. 
- Removed the previous final check in `onResponse` that decided when to schedule `imputeHC`, relying instead on the updated counters for accurate synchronization.

These changes address the race condition where `sentOutPages` might not have been incremented in time before checking whether to schedule the `imputeHC` task. By accurately tracking the number of in-flight pages and sent pages, we ensure that `imputeHC` is executed only after all pages have been fully processed and all responses have been received.

Testing done:
1. Reproduced the race condition by starting two detectors with imputation. This causes an out of order illegal argument exception from RCF due to this race condition. Also verified the change fixed the problem.
2. added an IT for the above scenario.



* Fix exceptions in IntervalCalculation and ResultIndexingHandler

- **IntervalCalculation**: Prevent an `ArrayIndexOutOfBoundsException` by returning early when there are fewer than two timestamps. Previously, the code assumed at least two timestamps, causing an exception when only one was present.

- **ResultIndexingHandler**: Handle exceptions from asynchronous calls by logging error messages instead of throwing exceptions. Since the caller does not wait for these asynchronous operations, throwing exceptions had no effect and could lead to unhandled exceptions. Logging provides visibility without disrupting the caller's flow.

Testing done:
1. added UT and ITs.



---------

Signed-off-by: Kaituo Li <[email protected]>