[FEATURE] Nested field mapping support for text embedding processor #110

Closed
navneet1v opened this issue Jan 27, 2023 · 7 comments
Labels: Enhancements (Increases software capabilities beyond original client specifications), good first issue (Good for newcomers), v2.16.0

Comments

navneet1v (Collaborator) commented Jan 27, 2023

Is your feature request related to a problem?

This is related to the customer-created GitHub issue #109.

With the following configuration, which uses a nested source field, embeddings are not computed. This case should be supported:

PUT /_ingest/pipeline/neural_pipeline_nested
{
  "description": "Neural Search Pipeline for message content",
  "processors": [
    {
      "text_embedding": {
        "model_id": "SXXx8YUBR2ZWhVQIkghB",
        "field_map": {
          "message.text": "message_embedding"
        }
      }
    }
  ]
}

PUT /neural-test-index-nested
{
    "settings": {
        "index.knn": true,
        "default_pipeline": "neural_pipeline_nested"
    },
    "mappings": {
        "properties": {
            "message_embedding": {
                "type": "knn_vector",
                "dimension": 384,
                "method": {
                    "name": "hnsw",
                    "engine": "lucene"
                }
            },
            "message.text": { 
                "type": "text"            
            },
            "color": {
                "type": "text"
            }
        }
    }
}

POST /_bulk
{"create":{"_index":"neural-test-index-nested","_id":"0"}}
{"message":{"text":"Text 1"},"color":"red"}
{"create":{"_index":"neural-test-index-nested","_id":"1"}}
{"message":{"text":"Text 2"}, "color": "black"}

GET /neural-test-index-nested/_search

What solution would you like?

The field_map keys should support the . (dot) operator to reference nested fields, for example "message.text".
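To make the requested behavior concrete, here is a minimal Python sketch, assuming the processor sees the document _source as a plain dict. resolve_dotted_path is a hypothetical helper for illustration, not code from the neural-search plugin. It shows why a flat key lookup of "message.text" misses the value in the bulk documents above, while walking the path one segment at a time finds it.

```python
from typing import Any, Optional


def resolve_dotted_path(source: dict, path: str) -> Optional[Any]:
    """Walk a document's _source following a dot-separated key path.

    Hypothetical helper: returns None if any segment is missing or an
    intermediate value is not an object (dict).
    """
    current: Any = source
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Same shape as document "0" in the bulk request above.
doc = {"message": {"text": "Text 1"}, "color": "red"}

# A flat key lookup misses the nested value (the reported problem).
print(doc.get("message.text"))
# Walking the path segment by segment finds it.
print(resolve_dotted_path(doc, "message.text"))
```

OpenSearch also accepts source documents that use a literal dotted key such as {"message.text": "Text 1"}; this sketch only handles the object form, so a real implementation would have to decide how to treat both.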

What alternatives have you considered?

Customers can instead express the nested field in the field_map as a nested object:

PUT /_ingest/pipeline/neural_pipeline_nested
{
    "description": "Neural Search Pipeline for message content",
    "processors": [
        {
            "text_embedding": {
                "model_id": "SXXx8YUBR2ZWhVQIkghB",
                "field_map": {
                    "message": {
                        "text": "message_embedding"
                    }
                }
            }
        }
    ]
}
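The nested field_map above and the dotted key requested earlier name the same source path. As a hypothetical Python sketch (flatten_field_map is an illustrative helper, not the plugin's API), a nested field_map can be flattened into {dotted_source_path: target_field} pairs to show the two forms are equivalent on the source side. Where the resulting embedding field is written is not modeled here.

```python
from typing import Dict


def flatten_field_map(field_map: dict, prefix: str = "") -> Dict[str, str]:
    """Convert a nested field_map into {dotted_source_path: target_field}.

    A value is either a target field name (str) or another nested map.
    """
    flat: Dict[str, str] = {}
    for key, value in field_map.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested object mapping.
            flat.update(flatten_field_map(value, path))
        else:
            flat[path] = value
    return flat


nested_form = {"message": {"text": "message_embedding"}}
dotted_form = {"message.text": "message_embedding"}

print(flatten_field_map(nested_form))
print(flatten_field_map(nested_form) == flatten_field_map(dotted_form))
```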
navneet1v added the Enhancements label Jan 27, 2023
vamshin added the good first issue label Mar 28, 2023
asfoorial commented Oct 1, 2023

Will this also handle inner documents, i.e. the "nested" field type?

Sanjana679 commented

I'm going to tackle this issue!

sam-herman commented

@navneet1v, what is the expected behavior for the nested field type, as opposed to the object field example above?
How will flattening to an array be handled?
For context:
Nested field: https://opensearch.org/docs/latest/field-types/supported-field-types/nested/
object field: https://opensearch.org/docs/latest/field-types/supported-field-types/object/
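One possible reading of the flattening question, as a hypothetical Python sketch: if message is mapped as nested, its _source value is an array of objects, so resolving message.text yields one string per inner document. collect_texts is an illustrative helper, not plugin code, and whether the processor should then produce one embedding per inner document is exactly the open question here; the sketch makes no claim about the chosen behavior.

```python
from typing import Any, List


def collect_texts(value: Any, path: List[str]) -> List[str]:
    """Collect every string at `path`, descending into lists of objects.

    For a nested field, each inner document contributes its own entry,
    so the result is a flattened array with one text per inner document.
    """
    if not path:
        if isinstance(value, str):
            return [value]
        if isinstance(value, list):
            return [v for v in value if isinstance(v, str)]
        return []
    if isinstance(value, list):
        # Nested field: fan out over each inner document.
        out: List[str] = []
        for item in value:
            out.extend(collect_texts(item, path))
        return out
    if isinstance(value, dict) and path[0] in value:
        return collect_texts(value[path[0]], path[1:])
    return []


object_doc = {"message": {"text": "Text 1"}}
nested_doc = {"message": [{"text": "Text 1"}, {"text": "Text 2"}]}

print(collect_texts(object_doc, "message.text".split(".")))
print(collect_texts(nested_doc, "message.text".split(".")))
```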

sam-herman commented

FYI, there is also an issue about chunking:
#482

The question above might be related to it. If the scope of this ticket is only object fields (not nested fields), we can continue the discussion in #482.

ripineros commented

Is this still in progress?


Sanjana679 commented Jun 4, 2024 via email

navneet1v (Collaborator, Author) commented

@Sanjana679, are you still working on this? I don't see any updates on the PR.

vamshin changed the title from "[FEATURE] Treat . in the field name as a nested field in the fields map of text embedding processor" to "[FEATURE] Nested field mapping support for text embedding processor" Jul 15, 2024
vamshin moved this from Backlog to 2.16.0 in Vector Search RoadMap Jul 15, 2024
github-project-automation bot moved this from 2.16.0 to ✅ Done in Vector Search RoadMap Jul 22, 2024
github-project-automation bot moved this to 2.16 (First RC 07/23, Release 08/06) in OpenSearch Project Roadmap Aug 30, 2024
Projects
Status: 2.16 (First RC 07/23, Release 08/06)
Status: Done
Development

No branches or pull requests

7 participants