feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools #40

zhichao-aws · 2023-12-20T10:11:19Z

Description

Move AbstractRetrieverTool, VectorDBTool, and NeuralSparseTools from the ml-commons agent_framework_dev branch to skills.

Issues Resolved

[List any issues this PR will resolve]

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zhichao-aws <[email protected]>

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

Signed-off-by: zhichao-aws <[email protected]>

codecov · 2023-12-21T02:25:42Z

Welcome to Codecov 🎉

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered ☂️

Signed-off-by: zhichao-aws <[email protected]>

zhichao-aws · 2023-12-21T03:30:53Z

Unit tests are finished, but the integration tests are still blocked by an exception during cluster bootstrap:

org.opensearch.bootstrap.StartupException: java.lang.IllegalStateException: Plugin [skills] cannot extend non-extensible plugin [opensearch-ml]

Fixing this issue depends on the ml-commons plugin. I think we can review and merge this PR first to unblock other developers. Once ml-commons fixes the issue, we can add integration tests in follow-up PRs.

zane-neo · 2023-12-21T07:34:37Z

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

+    public static final String INDEX_FIELD = "index";
+    public static final String SOURCE_FIELD = "source_field";
+    public static final String DOC_SIZE_FIELD = "doc_size";
+    public static final int DEFAULT_DOC_SIZE = 2;


Is this to control the number of matching docs? I remember the search API returns 10 by default. Did we pick 2 arbitrarily, or is there a reason behind it?

This number is inherited from the original VectorDBTool. My guess is that it limits the context size for the LLM. I'm open to changing this number.

Let's confirm and add a comment for this field. I'm also curious to know more about this field.

The original VectorDBTool was introduced in the agent framework POC commit: opensearch-project/ml-commons@1b85cff#diff-b353ce1eda5b942809ea500dadcfda5769504edd8fa17fc174d498937097c69eR67 Hi @ylwu-amzn, do you know why the default doc size is 2? Did we set it to limit the context length?

By the way, I don't think this is a blocker for this PR. This parameter is configurable, and we can change its value after our e2e tests.
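To make "this parameter is configurable" concrete, here is a minimal sketch of how a `doc_size` parameter could fall back to `DEFAULT_DOC_SIZE`. The constant names and values come from the diff under review; the `resolveDocSize` helper and the string-valued parameter map are assumptions for illustration, not the actual AbstractRetrieverTool code.

```java
import java.util.Map;

public class DocSizeSketch {
    // Constant names and values mirror the AbstractRetrieverTool diff under review.
    public static final String DOC_SIZE_FIELD = "doc_size";
    public static final int DEFAULT_DOC_SIZE = 2;

    // Hypothetical helper: read doc_size from the tool parameters and fall back to the
    // default when it is absent or blank. A non-numeric value throws NumberFormatException.
    static int resolveDocSize(Map<String, String> params) {
        String raw = params.get(DOC_SIZE_FIELD);
        if (raw == null || raw.isBlank()) {
            return DEFAULT_DOC_SIZE;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolveDocSize(Map.of()));                     // 2
        System.out.println(resolveDocSize(Map.of(DOC_SIZE_FIELD, "10"))); // 10
    }
}
```

A small default keeps the amount of retrieved text that reaches the LLM prompt low, while a caller that needs more context can still override it per request.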

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

zane-neo · 2023-12-21T07:43:56Z

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

+                for (int i = 0; i < hits.length; i++) {
+                    SearchHit hit = hits[i];
+                    Map<String, Object> docContent = new HashMap<>();
+                    docContent.put("_index", hit.getIndex());


Is there a specific reason to extract these fields instead of serializing the whole searchResponse as the result?

https://github.com/opensearch-project/OpenSearch/blob/2c8ee1947b55a1cc5bc1a114b82e3a3b8a99851e/server/src/main/java/org/opensearch/search/SearchHit.java#L616
If we serialize the whole object, unnecessary fields such as nestedIdentity, version, seqNo, and primaryTerm would increase the context size.

The result looks like it's in JSONL format. Is this expected?

Yes, this result is treated as a plain string and sent to the LLM.
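As an illustration of the JSONL shape discussed here, the sketch below keeps only a handful of fields per hit and writes one JSON object per line. The `Hit` record is a stand-in for OpenSearch's `SearchHit`; only the `_index` key appears in the diff, so the `_id`, `_score`, and `_source` keys are assumptions, and the hand-rolled `toJson` helper is hypothetical (a real implementation would more likely use a JSON library).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonlContextSketch {
    // Stand-in for SearchHit, holding only the fields forwarded to the LLM.
    record Hit(String index, String id, float score, Map<String, Object> source) {}

    // One JSON object per hit, newline-separated (JSONL); metadata such as
    // seqNo or primaryTerm is deliberately left out to keep the context small.
    static String buildContext(List<Hit> hits) {
        StringBuilder contextBuilder = new StringBuilder();
        for (Hit hit : hits) {
            Map<String, Object> docContent = new LinkedHashMap<>(); // preserves key order
            docContent.put("_index", hit.index());
            docContent.put("_id", hit.id());
            docContent.put("_score", hit.score());
            docContent.put("_source", hit.source());
            contextBuilder.append(toJson(docContent)).append('\n');
        }
        return contextBuilder.toString();
    }

    // Hypothetical minimal encoder for maps, strings, numbers, booleans, and null.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String s) return quote(s);
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        if (value instanceof Map<?, ?> map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : map.entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append(quote(String.valueOf(e.getKey()))).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        return quote(value.toString());
    }

    // Escape quotes, backslashes, and control characters as JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

The resulting string is what the comment above calls a plain string: the LLM receives it as-is, one document per line.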

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

Co-authored-by: zane-neo <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>

Signed-off-by: zhichao-aws <[email protected]>

mingshl

@zhichao-aws Overall this looks good to me, even though the codecov check is failing.

I wrote some tests to increase code coverage for AbstractRetrieverTool. We can merge this PR first, and I will contribute the coverage improvements in a new PR.

mingshl/ml-commons@93c729a

dhrubo-os · 2023-12-21T20:49:17Z

build.gradle

@@ -165,6 +170,8 @@ compileJava {
    options.compilerArgs.addAll(["-processor", 'lombok.launch.AnnotationProcessorHider$AnnotationProcessor'])
 }

+forbiddenApisTest.ignoreFailures = true


Can we add a comment explaining why we need this?

It's inherited from the ml-commons build script. When I build without this line, I get errors like:

Forbidden annotation use: org.junit.Test [defaultMessage Just name your test method testFooBar] in org.opensearch.agent.tools.NeuralSparseSearchToolTests (NeuralSparseSearchToolTests.java, annotation on method declaration of 'testCreateTool()')

I tried to research it but couldn't find any useful information.

And if I remove the @Test annotation, the test case is no longer found.
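The forbidden-API message ("Just name your test method testFooBar") points at JUnit3-style naming. OpenSearch test classes typically extend `OpenSearchTestCase`, whose Lucene-derived runner can pick up public no-arg methods named `test*`; if a test class does not extend such a base class, dropping `@Test` may leave nothing discoverable, which could explain the behavior described above. The sketch below only mimics that name-based selection with plain reflection; it is not the runner's actual code, and `SampleToolTests` is a hypothetical class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class Junit3DiscoverySketch {
    // Hypothetical test class in the style the forbidden-API check asks for:
    // no @Test annotation, just method names that start with "test".
    public static class SampleToolTests {
        public void testCreateTool() {}
        public void testParse() {}
        public void helper() {} // not a test: wrong prefix
    }

    // getMethods() returns only public methods; keep non-static, no-arg,
    // void methods whose names start with "test" (the JUnit3 convention).
    static List<String> discoverTestMethods(Class<?> testClass) {
        List<String> names = new ArrayList<>();
        for (Method m : testClass.getMethods()) {
            if (m.getName().startsWith("test")
                && !Modifier.isStatic(m.getModifiers())
                && m.getParameterCount() == 0
                && m.getReturnType() == void.class) {
                names.add(m.getName());
            }
        }
        names.sort(null); // natural order for stable output
        return names;
    }

    public static void main(String[] args) {
        System.out.println(discoverTestMethods(SampleToolTests.class)); // [testCreateTool, testParse]
    }
}
```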

dhrubo-os · 2023-12-21T20:49:42Z

build.gradle

@@ -82,7 +82,7 @@ configurations {
    zipArchive
    all {
        resolutionStrategy {
-            force "org.mockito:mockito-core:5.8.0"
+            force "org.mockito:mockito-core:${versions.mockito}"


What will the version be in this case?

It's 5.5.0 in this case
https://github.com/opensearch-project/OpenSearch/blob/f92f846a1f9b30a055dde846fd12d987a511723a/buildSrc/version.properties#L58

dhrubo-os · 2023-12-21T20:50:34Z

build.gradle

    testLogging {
        exceptionFormat "full"
        events "skipped", "passed", "failed" // "started"
        showStandardStreams true
    }
+    include '**/*Tests.class'
+    systemProperty 'tests.security.manager', 'false'


Let's add a comment explaining what this is for.

It turns off the security manager to make writing test cases easier, and it's used in many OpenSearch plugin build scripts. However, we don't currently have test cases that need it, so I removed this line; we can add it back when we need it.

dhrubo-os · 2023-12-21T20:52:28Z

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

+    public static final String INDEX_FIELD = "index";
+    public static final String SOURCE_FIELD = "source_field";
+    public static final String DOC_SIZE_FIELD = "doc_size";
+    public static final int DEFAULT_DOC_SIZE = 2;


Let's confirm and add a comment for this field. I'm also curious to know more about this field.

dhrubo-os · 2023-12-21T20:56:12Z

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

+                }
+                listener.onResponse((T) contextBuilder.toString());
+            } else {
+                listener.onResponse((T) "Can not get any match from search result.");


Is this a customer-facing response? If so, how about: "No matches found. Please refine your search terms."

As I understand it, the retrieval tools are used for RAG, so their responses are parsed by an LLM rather than a human. We don't have tool-retry logic yet, so "refine your search terms" may confuse the LLM.
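The two branches in the diff above can be sketched as follows: non-empty results become newline-separated JSON lines, and an empty result yields a fixed string that the LLM reads. The message text is quoted from the diff; the `formatResult` helper and its pre-serialized `List<String>` input are assumptions. In the actual tool, the string is passed to `listener.onResponse` rather than returned.

```java
import java.util.List;

public class RetrievalResultSketch {
    // Fallback text quoted from the reviewed diff; it is addressed to an LLM, not an end user.
    static final String NO_MATCH_MESSAGE = "Can not get any match from search result.";

    // Hypothetical helper: join pre-serialized JSON documents one per line,
    // or return the fallback message when the search produced no hits.
    static String formatResult(List<String> jsonLines) {
        if (jsonLines == null || jsonLines.isEmpty()) {
            return NO_MATCH_MESSAGE;
        }
        StringBuilder contextBuilder = new StringBuilder();
        for (String line : jsonLines) {
            contextBuilder.append(line).append('\n');
        }
        return contextBuilder.toString();
    }
}
```

Keeping the fallback neutral, with no instruction to retry, avoids prompting the LLM toward an action the framework cannot carry out yet.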

build.gradle

Signed-off-by: zhichao-aws <[email protected]>

opensearch-trigger-bot · 2023-12-25T02:36:50Z

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/skills/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/skills/backport-2.x
# Create a new branch
git switch --create backport-40-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c088f7793c3e93ba0488b0be8bacc49c7195682a
# Push it to GitHub
git push --set-upstream origin backport-40-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/skills/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport-40-to-2.x.

feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77)

…lSparseTools (#58) * Merge pull request #40 from zhichao-aws/SearchTools feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77) * fix commons-lang3 version (#45) (#59) Signed-off-by: zhichao-aws <[email protected]> --------- Signed-off-by: zhichao-aws <[email protected]> Co-authored-by: zane-neo <[email protected]>

…lSparseTools (opensearch-project#58) * Merge pull request opensearch-project#40 from zhichao-aws/SearchTools feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77) * fix commons-lang3 version (opensearch-project#45) (opensearch-project#59) Signed-off-by: zhichao-aws <[email protected]> --------- Signed-off-by: zhichao-aws <[email protected]> Co-authored-by: zane-neo <[email protected]> Signed-off-by: yuye-aws <[email protected]>

add AbstractRetriverTool, VectorDBTool, NeuralSparseTool. add ut

7d1fb29

Signed-off-by: zhichao-aws <[email protected]>

zhichao-aws requested review from b4sjoo, dhrubo-os, jngz-es, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, xinyual, amitgalitz, jackiehanyang, owaiskazi19, ohltyler, joshpalis, dbwiddis, kaituo, lezzago, eirsep, sbcd90 and joshuali925 as code owners December 20, 2023 10:11

zhichao-aws added 3 commits December 20, 2023 18:14

neat

355bbe7

Signed-off-by: zhichao-aws <[email protected]>

fix forbiddenApisTest

340a6ed

Signed-off-by: zhichao-aws <[email protected]>

neat

1114732

Signed-off-by: zhichao-aws <[email protected]>

mingshl reviewed Dec 20, 2023

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java (outdated)

zhichao-aws added 2 commits December 21, 2023 10:06

fix: remove listener from buildSearchRequest

7fe33f6

Signed-off-by: zhichao-aws <[email protected]>

fix @test annotation

fd3cf33

Signed-off-by: zhichao-aws <[email protected]>

add more ut

4311116

Signed-off-by: zhichao-aws <[email protected]>

zane-neo reviewed Dec 21, 2023

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java

zane-neo reviewed Dec 21, 2023

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java (outdated)

zane-neo reviewed Dec 21, 2023

src/main/java/org/opensearch/agent/tools/AbstractRetrieverTool.java (outdated)

zhichao-aws and others added 4 commits December 21, 2023 17:27

neat

02f0d45

Co-authored-by: zane-neo <[email protected]>
Signed-off-by: zhichao-aws <[email protected]>

fix jarhell

43a228f

Signed-off-by: zhichao-aws <[email protected]>

address comments

97f92e9

Signed-off-by: zhichao-aws <[email protected]>

fix test

de436a4

Signed-off-by: zhichao-aws <[email protected]>

mingshl approved these changes Dec 21, 2023

dhrubo-os reviewed Dec 21, 2023

modify build script

6ec9b0c

Signed-off-by: zhichao-aws <[email protected]>

zhichao-aws requested review from zane-neo and dhrubo-os December 22, 2023 02:51

zhichao-aws added 2 commits December 22, 2023 14:31

merge main

ef03c34

Signed-off-by: zhichao-aws <[email protected]>

fix jar hell of gradlew run

b1dbe84

Signed-off-by: zhichao-aws <[email protected]>

zane-neo approved these changes Dec 22, 2023

reove redundant lines

b3a964b

Signed-off-by: zhichao-aws <[email protected]>

zane-neo merged commit c088f77 into opensearch-project:main Dec 22, 2023
3 of 11 checks passed

mingshl mentioned this pull request Dec 22, 2023

Increase AbstractRetrieverToolTests code coverage #53

Merged

5 tasks

zhichao-aws added the backport 2.x label Dec 25, 2023

opensearch-trigger-bot bot added the failed backport label Dec 25, 2023

zhichao-aws pushed a commit that referenced this pull request Dec 25, 2023

Merge pull request #40 from zhichao-aws/SearchTools

07ffd9c

feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools (cherry picked from commit c088f77)

This was referenced Dec 25, 2023

[Backport 2.x] feature: Add AbstractRetriverTool, VectorDBTool, NeuralSparseTools #58

Merged

[backport 2.x] fix commons-lang3 version (#45) #59

Merged
