-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools #40
Conversation
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java
Outdated
Show resolved
Hide resolved
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Welcome to Codecov 🎉Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment. Thanks for integrating Codecov - We've got you covered ☂️ |
Signed-off-by: zhichao-aws <[email protected]>
UT are finished, the integration test are still blocked by an exception during cluster bootstrap:
This issue depends on the ml-commons plugin. I think we can review and merge this PR first to unblock other developers. After ml-commons fix the issue, we can add integration test in follow up PRs. |
public static final String INDEX_FIELD = "index"; | ||
public static final String SOURCE_FIELD = "source_field"; | ||
public static final String DOC_SIZE_FIELD = "doc_size"; | ||
public static final int DEFAULT_DOC_SIZE = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to control the matching docs size? I remember it's 10 by default in search API. Are we giving 2 randomly or there's a reason behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number is inherited from origin VectorDBTool. I guess the reason maybe we want to limit the context size for LLM. I'm open for changing this number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's confirm and add a comment for this field. I'm also curious to know more about this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The origin VectorDBTool was introduced from the agent framework POC commit. opensearch-project/ml-commons@1b85cff#diff-b353ce1eda5b942809ea500dadcfda5769504edd8fa17fc174d498937097c69eR67 Hi @ylwu-amzn, do you know why the default doc size is 2? Do we set this for context length?
By the way I don't think this is a blocker issue for this PR. This parameter is configurable, and we can alter its value after our e2e test.
for (int i = 0; i < hits.length; i++) { | ||
SearchHit hit = hits[i]; | ||
Map<String, Object> docContent = new HashMap<>(); | ||
docContent.put("_index", hit.getIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any specific reason to extract these fields instead of deserialize the whole searchResponse as result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/opensearch-project/OpenSearch/blob/2c8ee1947b55a1cc5bc1a114b82e3a3b8a99851e/server/src/main/java/org/opensearch/search/SearchHit.java#L616
If we just deserialize the object, the unnecessary fields may increase the size of context. E.g. nestedIdentity, version, seqNo, primaryTerm etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result looks like in jsonl format, is this as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this result will be treated as a norm string and be sent to LLM.
src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java
Outdated
Show resolved
Hide resolved
Co-authored-by: zane-neo <[email protected]> Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhichao-aws overall it looks good to me even though the codecov is failing.
I wrote some tests to increase code coverage for AbstactRetrieverTool, we can merge this PR first and I will help contribute the codecov in a new PR.
build.gradle
Outdated
@@ -165,6 +170,8 @@ compileJava { | |||
options.compilerArgs.addAll(["-processor", 'lombok.launch.AnnotationProcessorHider$AnnotationProcessor']) | |||
} | |||
|
|||
forbiddenApisTest.ignoreFailures = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's inherited from ml-commons build script. I try to build without this line and get errors like
Forbidden annotation use: org.junit.Test [defaultMessage Just name your test method testFooBar]
in org.opensearch.agent.tools.NeuralSparseSearchToolTests (NeuralSparseSearchToolTests.java, annotation on method declaration of 'testCreateTool()')
I tried to do some research but can not get any useful information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if I removes the @test annotation, it just can not find the test case
@@ -82,7 +82,7 @@ configurations { | |||
zipArchive | |||
all { | |||
resolutionStrategy { | |||
force "org.mockito:mockito-core:5.8.0" | |||
force "org.mockito:mockito-core:${versions.mockito}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will be the current version in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build.gradle
Outdated
testLogging { | ||
exceptionFormat "full" | ||
events "skipped", "passed", "failed" // "started" | ||
showStandardStreams true | ||
} | ||
include '**/*Tests.class' | ||
systemProperty 'tests.security.manager', 'false' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns off the security manager to make it easy for writing test case. It is used in many build script of Opensearch plugins. However we don't have such test cases currently. So I just removed this line and we can add it back when we need it.
public static final String INDEX_FIELD = "index"; | ||
public static final String SOURCE_FIELD = "source_field"; | ||
public static final String DOC_SIZE_FIELD = "doc_size"; | ||
public static final int DEFAULT_DOC_SIZE = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's confirm and add a comment for this field. I'm also curious to know more about this field.
} | ||
listener.onResponse((T) contextBuilder.toString()); | ||
} else { | ||
listener.onResponse((T) "Can not get any match from search result."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a customer facing response? If yes, how about: "No matches found. Please refine your search terms."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my mental model the retrieval tools are used for RAG, so their response will be parsed by LLM instead of human. We don't have a tool-retry logic now, the "refine your search terms" may confuse LLM
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/skills/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/skills/backport-2.x
# Create a new branch
git switch --create backport-40-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c088f7793c3e93ba0488b0be8bacc49c7195682a
# Push it to GitHub
git push --set-upstream origin backport-40-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/skills/backport-2.x Then, create a pull request where the |
feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77)
…lSparseTools (#58) * Merge pull request #40 from zhichao-aws/SearchTools feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77) * fix commons-lang3 version (#45) (#59) Signed-off-by: zhichao-aws <[email protected]> --------- Signed-off-by: zhichao-aws <[email protected]> Co-authored-by: zane-neo <[email protected]>
…lSparseTools (opensearch-project#58) * Merge pull request opensearch-project#40 from zhichao-aws/SearchTools feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77) * fix commons-lang3 version (opensearch-project#45) (opensearch-project#59) Signed-off-by: zhichao-aws <[email protected]> --------- Signed-off-by: zhichao-aws <[email protected]> Co-authored-by: zane-neo <[email protected]> Signed-off-by: yuye-aws <[email protected]>
Description
Move the AbstractRetriverTool, VectorDBTool, NeuralSparseTools from ml-commons agent_framework_dev branch to skills.
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.