[BUG] Data race due to mutable feature vectors being shared across thread boundaries when using concurrent segment search #52
Comments
Thanks @jhinch-at-atlassian-com. @msfroh, can you share any thoughts here?
Yeah... this comment looks like a bit of a smoking gun: opensearch-learning-to-rank-base/src/main/java/com/o19s/es/ltr/query/RankerQuery.java, lines 203 to 206 in d3e1dd1.
Now that we are using the concurrent
@jhinch-at-atlassian-com -- Would you be able to develop and test a fix for this? I think you're right that scoping the supplier to the given
I had a brief exploration into fixing this. It looks like using a thread local is the most convenient fix. The thread local is populated when a new feature vector is created and cleared after the ranker score is computed. I will raise a PR with the candidate fix shortly.
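A minimal sketch of the thread-local approach, assuming that every feature for a given document is computed and consumed on the same thread. ThreadLocalVectorSupplier and FeatureVector are hypothetical names for illustration, not the code in the eventual PR. A CyclicBarrier forces both worker threads to publish a vector before either reads, which is the interleaving that breaks a single shared reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a thread-local feature vector supplier. */
public class ThreadLocalSupplierSketch {

    /** Hypothetical stand-in for a per-document feature vector. */
    static final class FeatureVector {
        final int docId;
        FeatureVector(int docId) { this.docId = docId; }
    }

    /** One shared instance, but each thread only ever sees the vector it set itself. */
    static final class ThreadLocalVectorSupplier {
        private final ThreadLocal<FeatureVector> current = new ThreadLocal<>();
        void set(FeatureVector v) { current.set(v); }
        FeatureVector get() { return current.get(); }
        /** Called after the ranker score is computed so pooled threads keep no stale vector. */
        void clear() { current.remove(); }
    }

    /** Two threads share one supplier; returns the docId each thread read back, in submission order. */
    static List<Integer> runConcurrently() {
        ThreadLocalVectorSupplier supplier = new ThreadLocalVectorSupplier();
        CyclicBarrier bothPublished = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int doc = 1; doc <= 2; doc++) {
                int docId = doc;
                Callable<Integer> task = () -> {
                    try {
                        supplier.set(new FeatureVector(docId)); // new feature vector created
                        bothPublished.await();                  // the other thread has set its vector too
                        return supplier.get().docId;            // "score" using this thread's own vector
                    } finally {
                        supplier.clear();                       // cleared after the score is computed
                    }
                };
                results.add(pool.submit(task));
            }
            List<Integer> read = new ArrayList<>();
            for (Future<Integer> f : results) {
                read.add(f.get());
            }
            return read;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently()); // prints [1, 2]
    }
}
```

Clearing in a finally block matters because search threads are pooled: without remove(), a thread could carry a stale vector into the next document or query it handles.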
What is the bug?
OpenSearch 2.12 stabilized concurrent segment search, in which segments are searched in parallel on different threads. The LTR plugin uses a MutableSupplier to share feature vectors between different parts of the code, for example in DerivedExpressionQuery and ScriptFeature. Because the supplier is implemented with an atomic reference (e34d607) that is shared across threads, a thread can retrieve the feature vector of a different document, resulting in wrong feature scores as well as data races.
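To make the failure mode concrete, here is a minimal sketch assuming a supplier that keeps the current feature vector in one AtomicReference shared by all threads. SharedVectorSupplier and FeatureVector are hypothetical stand-ins, not the plugin's MutableSupplier. Instead of relying on timing, it replays one possible interleaving on a single thread: each individual set and get is atomic, but nothing ties a thread's read to its own earlier write.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a feature vector shared through one AtomicReference. */
public class SharedSupplierRace {

    /** Hypothetical stand-in for a per-document feature vector. */
    static final class FeatureVector {
        final int docId;
        FeatureVector(int docId) { this.docId = docId; }
    }

    /** A single reference shared by every thread that uses this supplier. */
    static final class SharedVectorSupplier {
        private final AtomicReference<FeatureVector> ref = new AtomicReference<>();
        void set(FeatureVector v) { ref.set(v); }
        FeatureVector get() { return ref.get(); }
    }

    /**
     * Replays one interleaving sequentially so the outcome is deterministic.
     * Returns the docId that "thread A" reads back after publishing doc 1.
     */
    static int replayInterleaving() {
        SharedVectorSupplier supplier = new SharedVectorSupplier();
        supplier.set(new FeatureVector(1)); // thread A, scoring doc 1 in one segment
        supplier.set(new FeatureVector(2)); // thread B, scoring doc 2 in another segment
        return supplier.get().docId;        // thread A reads back and sees doc 2's vector
    }

    public static void main(String[] args) {
        System.out.println("thread A expected doc 1, read doc " + replayInterleaving());
    }
}
```

Running main prints "thread A expected doc 1, read doc 2". The atomicity of AtomicReference only guarantees that no read sees a torn value; it does not make a set followed by a get behave as one unit per thread.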
Do you have any additional context?
The underlying issue seems to be that the vector is being shared across LeafReaderContext boundaries. If either a supplier were created once per LeafReaderContext, or the feature vector sharing were kept unique to each LeafReaderContext, then it should work as expected.
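The per-LeafReaderContext alternative can be sketched without thread locals: give every leaf its own supplier instance so nothing is shared across leaves. This assumes a leaf is scored by a single thread at a time, since concurrent segment search hands whole leaves (grouped into slices) to worker threads. LeafScorerSketch and LeafVectorSupplier are hypothetical; in the plugin, the equivalent would presumably be creating the supplier wherever the per-leaf scorer is built.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one supplier per leaf instead of one shared across leaves. */
public class PerLeafSupplierSketch {

    /** Hypothetical stand-in for a per-document feature vector. */
    static final class FeatureVector {
        final int docId;
        FeatureVector(int docId) { this.docId = docId; }
    }

    /** Plain field, no synchronization: the instance never leaves its leaf. */
    static final class LeafVectorSupplier {
        private FeatureVector current;
        void set(FeatureVector v) { current = v; }
        FeatureVector get() { return current; }
    }

    /** Hypothetical per-leaf scorer that owns its supplier. */
    static final class LeafScorerSketch {
        private final LeafVectorSupplier supplier = new LeafVectorSupplier();

        int score(int docId, CyclicBarrier allPublished) throws Exception {
            supplier.set(new FeatureVector(docId)); // this leaf's vector
            allPublished.await();                   // every other leaf has set its vector too
            return supplier.get().docId;            // still this leaf's vector
        }
    }

    /** Scores one document per leaf, each leaf on its own thread; returns the docIds read back. */
    static List<Integer> scoreLeavesConcurrently(int leaves) {
        CyclicBarrier allPublished = new CyclicBarrier(leaves);
        ExecutorService pool = Executors.newFixedThreadPool(leaves);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int leaf = 0; leaf < leaves; leaf++) {
                int docId = 100 + leaf;                           // one document per leaf
                LeafScorerSketch scorer = new LeafScorerSketch(); // fresh supplier per leaf
                Callable<Integer> task = () -> scorer.score(docId, allPublished);
                results.add(pool.submit(task));
            }
            List<Integer> read = new ArrayList<>();
            for (Future<Integer> f : results) {
                read.add(f.get());
            }
            return read;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(scoreLeavesConcurrently(2)); // prints [100, 101]
    }
}
```

Compared with a thread local, this scoping ties the vector's lifetime to the leaf rather than to the thread, so there is no explicit clear step to forget.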