From b1e95542117e733c893c73c980421fd13b5f73d0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Mon, 14 Oct 2024 16:25:30 -0600 Subject: [PATCH] Add mapping parameters documentation (#7115) * Add mapping parameters documentation Signed-off-by: Melissa Vagi * Add mapping parameters documentation Signed-off-by: Melissa Vagi * Add mapping parameters documentation Signed-off-by: Melissa Vagi * Add mapping parameters documentation Signed-off-by: Melissa Vagi * Add mapping parameters documentation Signed-off-by: Melissa Vagi * Add files Signed-off-by: Melissa Vagi * Add files Signed-off-by: Melissa Vagi * Add files Signed-off-by: Melissa Vagi * Write first-pass draft Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/copy-to.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Signed-off-by: Melissa Vagi * Address final tech review comments Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/copy-to.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md 
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/copy-to.md Signed-off-by: Melissa Vagi * Update analyzer.md Signed-off-by: Melissa Vagi * Update boost.md Signed-off-by: Melissa Vagi * Update coerce.md Signed-off-by: Melissa Vagi * Update copy-to.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/copy-to.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * 
Update _field-types/mapping-parameters/copy-to.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/index.md Co-authored-by: Nathan Bower Signed-off-by: Melissa Vagi * Update front matter Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/analyzer.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/boost.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/coerce.md Signed-off-by: Melissa Vagi * Update _field-types/mapping-parameters/copy-to.md Signed-off-by: Melissa Vagi * Delete empty files Signed-off-by: Melissa Vagi * 
Delete empty files Signed-off-by: Melissa Vagi * Delete empty files Signed-off-by: Melissa Vagi * Delete empty files Signed-off-by: Melissa Vagi * Delete empty files Signed-off-by: Melissa Vagi --------- Signed-off-by: Melissa Vagi Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com> Co-authored-by: Nathan Bower --- _field-types/mapping-parameters/analyzer.md | 90 +++++++++++++++ _field-types/mapping-parameters/boost.md | 50 ++++++++ _field-types/mapping-parameters/coerce.md | 100 ++++++++++++++++ _field-types/mapping-parameters/copy-to.md | 109 ++++++++++++++++++ .../{ => mapping-parameters}/dynamic.md | 12 +- _field-types/mapping-parameters/index.md | 28 +++++ 6 files changed, 385 insertions(+), 4 deletions(-) create mode 100644 _field-types/mapping-parameters/analyzer.md create mode 100644 _field-types/mapping-parameters/boost.md create mode 100644 _field-types/mapping-parameters/coerce.md create mode 100644 _field-types/mapping-parameters/copy-to.md rename _field-types/{ => mapping-parameters}/dynamic.md (96%) create mode 100644 _field-types/mapping-parameters/index.md diff --git a/_field-types/mapping-parameters/analyzer.md b/_field-types/mapping-parameters/analyzer.md new file mode 100644 index 0000000000..32b26da1e0 --- /dev/null +++ b/_field-types/mapping-parameters/analyzer.md @@ -0,0 +1,90 @@ +--- +layout: default +title: Analyzer +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 5 +has_children: false +has_toc: false +--- + +# Analyzer + +The `analyzer` mapping parameter is used to define the text analysis process that applies to a text field during both index and search operations. + +The key functions of the `analyzer` mapping parameter are: + +1. **Tokenization:** The analyzer determines how the text is broken down into individual tokens (words, numbers) that can be indexed and searched. Each generated token must not exceed 32,766 bytes in order to avoid indexing failures. + +2. 
**Normalization:** The analyzer can apply various normalization techniques, such as converting text to lowercase, removing stop words, and stemming/lemmatizing words. + +3. **Consistency:** By defining the same analyzer for both index and search operations, you ensure that the text analysis process is consistent, which helps improve the relevance of search results. + +4. **Customization:** OpenSearch allows you to define custom analyzers by specifying the tokenizer, character filters, and token filters to be used. This gives you fine-grained control over the text analysis process. + +For information about specific analyzer parameters, such as `analyzer`, `search_analyzer`, or `search_quote_analyzer`, see [Search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/). +{: .note} + +------------ + +## Example + +The following example configuration defines a custom analyzer called `my_custom_analyzer`: + +```json +PUT my_index +{ + "settings": { + "analysis": { + "analyzer": { + "my_custom_analyzer": { + "type": "custom", + "tokenizer": "standard", + "filter": [ + "lowercase", + "my_stop_filter", + "my_stemmer" + ] + } + }, + "filter": { + "my_stop_filter": { + "type": "stop", + "stopwords": ["the", "a", "and", "or"] + }, + "my_stemmer": { + "type": "stemmer", + "language": "english" + } + } + } + }, + "mappings": { + "properties": { + "my_text_field": { + "type": "text", + "analyzer": "my_custom_analyzer", + "search_analyzer": "standard", + "search_quote_analyzer": "my_custom_analyzer" + } + } + } +} +``` +{% include copy-curl.html %} + +In this example, the `my_custom_analyzer` uses the standard tokenizer, converts all tokens to lowercase, applies a custom stop word filter, and applies an English stemmer. 
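To check what the custom analyzer produces, you can run sample text through the Analyze API. The following request is a minimal sketch: it assumes that the `my_index` index from the preceding example exists, and the sample text is made up for illustration:

```json
GET my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The Quick Foxes and the Lazy Dogs"
}
```
{% include copy-curl.html %}

If the analyzer behaves as described, the returned tokens should be lowercased and stemmed, and the stop words `the` and `and` should be absent.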
+ +You can then map a text field so that it uses this custom analyzer for both index and search operations: + +```json +"mappings": { + "properties": { + "my_text_field": { + "type": "text", + "analyzer": "my_custom_analyzer" + } + } +} +``` +{% include copy-curl.html %} diff --git a/_field-types/mapping-parameters/boost.md b/_field-types/mapping-parameters/boost.md new file mode 100644 index 0000000000..f1648a861d --- /dev/null +++ b/_field-types/mapping-parameters/boost.md @@ -0,0 +1,50 @@ +--- +layout: default +title: Boost +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 10 +has_children: false +has_toc: false +--- + +# Boost + +The `boost` mapping parameter is used to increase or decrease the relevance score of a field during search queries. It allows you to apply more or less weight to specific fields when calculating the overall relevance score of a document. + +The `boost` parameter is applied as a multiplier to the score of a field. For example, if a field has a `boost` value of `2`, then the score contribution of that field is doubled. Conversely, a `boost` value of `0.5` would halve the score contribution of that field. + +----------- + +## Example + +The following is an example of how you can use the `boost` parameter in an OpenSearch mapping: + +```json +PUT my-index1 +{ + "mappings": { + "properties": { + "title": { + "type": "text", + "boost": 2 + }, + "description": { + "type": "text", + "boost": 1 + }, + "tags": { + "type": "keyword", + "boost": 1.5 + } + } + } +} +``` +{% include copy-curl.html %} + +In this example, the `title` field has a boost of `2`, which means that it contributes twice as much to the overall relevance score as the `description` field (which has a boost of `1`). The `tags` field has a boost of `1.5`, so it contributes one and a half times as much as the `description` field. + +The `boost` parameter is particularly useful when you want to apply more weight to certain fields. 
For example, you might want to boost the `title` field more than the `description` field because the title may be a better indicator of the document's relevance. + +The `boost` parameter is a multiplicative factor---not an additive one. This means that a field with a higher boost value will have a disproportionately large effect on the overall relevance score as compared to fields with lower boost values. When using the `boost` parameter, it is recommended that you start with small values (1.5 or 2) and test the effect on your search results. Overly high boost values can skew the relevance scores and lead to unexpected or undesirable search results. diff --git a/_field-types/mapping-parameters/coerce.md b/_field-types/mapping-parameters/coerce.md new file mode 100644 index 0000000000..3cf844897a --- /dev/null +++ b/_field-types/mapping-parameters/coerce.md @@ -0,0 +1,100 @@ +--- +layout: default +title: Coerce +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 15 +has_children: false +has_toc: false +--- + +# Coerce + +The `coerce` mapping parameter controls how values are converted to the expected field data type during indexing. This parameter lets you verify that your data is formatted and indexed properly, following the expected field types. This improves the accuracy of your search results. + +--- + +## Examples + +The following examples demonstrate how to use the `coerce` mapping parameter. + +#### Indexing a document with `coerce` enabled + +```json +PUT products +{ + "mappings": { + "properties": { + "price": { + "type": "integer", + "coerce": true + } + } + } +} + +PUT products/_doc/1 +{ + "name": "Product A", + "price": "19.99" +} +``` +{% include copy-curl.html %} + +In this example, the `price` field is defined as an `integer` type with `coerce` set to `true`. When indexing the document, the string value `19.99` is coerced to the integer `19`. 
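One way to confirm the coercion is to search for the truncated value and then retrieve the document. The following requests are a sketch that assumes the `products` index and document `1` from the preceding example:

```json
GET products/_search
{
  "query": {
    "term": {
      "price": 19
    }
  }
}

GET products/_doc/1
```
{% include copy-curl.html %}

If coercion works as described, the search should match the document because the indexed value is `19`. The retrieved `_source` should still contain the original string `"19.99"` because coercion changes only the indexed value, not the stored source.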
+ +#### Indexing a document with `coerce` disabled + +```json +PUT orders +{ + "mappings": { + "properties": { + "quantity": { + "type": "integer", + "coerce": false + } + } + } +} + +PUT orders/_doc/1 +{ + "item": "Widget", + "quantity": "10" +} +``` +{% include copy-curl.html %} + +In this example, the `quantity` field is defined as an `integer` type with `coerce` set to `false`. When indexing the document, the string value `10` is not coerced, and the document is rejected because of the type mismatch. + +#### Setting the index-level coercion setting + +```json +PUT inventory +{ + "settings": { + "index.mapping.coerce": false + }, + "mappings": { + "properties": { + "stock_count": { + "type": "integer", + "coerce": true + }, + "sku": { + "type": "keyword" + } + } + } +} + +PUT inventory/_doc/1 +{ + "sku": "ABC123", + "stock_count": "50" +} +``` +{% include copy-curl.html %} + +In this example, the index-level `index.mapping.coerce` setting is set to `false`, which disables coercion for the index. However, the `stock_count` field overrides this setting and enables coercion for this specific field. diff --git a/_field-types/mapping-parameters/copy-to.md b/_field-types/mapping-parameters/copy-to.md new file mode 100644 index 0000000000..b029f814b5 --- /dev/null +++ b/_field-types/mapping-parameters/copy-to.md @@ -0,0 +1,109 @@ +--- +layout: default +title: Copy_to +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 20 +has_children: false +has_toc: false +--- + +# Copy_to + +The `copy_to` parameter allows you to copy the values of multiple fields into a single field. This parameter can be useful if you often search across multiple fields because it allows you to search the group field instead. + +Only the field value is copied and not the terms resulting from the analysis process. The original `_source` field remains unmodified, and the same value can be copied to multiple fields using the `copy_to` parameter. 
However, recursive copying through intermediary fields is not supported; instead, use `copy_to` directly from the originating field to multiple target fields. + +--- + +## Examples + +The following example uses the `copy_to` parameter to copy each product's name and description into a single field so that both can be searched at once: + +```json +PUT my-products-index +{ + "mappings": { + "properties": { + "name": { + "type": "text", + "copy_to": "product_info" + }, + "description": { + "type": "text", + "copy_to": "product_info" + }, + "product_info": { + "type": "text" + }, + "price": { + "type": "float" + } + } + } +} + +PUT my-products-index/_doc/1 +{ + "name": "Wireless Headphones", + "description": "High-quality wireless headphones with noise cancellation", + "price": 99.99 +} + +PUT my-products-index/_doc/2 +{ + "name": "Bluetooth Speaker", + "description": "Portable Bluetooth speaker with long battery life", + "price": 49.99 +} +``` +{% include copy-curl.html %} + +In this example, the values from the `name` and `description` fields are copied into the `product_info` field. 
You can now search for products by querying the `product_info` field, as follows: + +```json +GET my-products-index/_search +{ + "query": { + "match": { + "product_info": "wireless headphones" + } + } +} +``` +{% include copy-curl.html %} + +## Response + +```json +{ + "took": 20, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1.9061546, + "hits": [ + { + "_index": "my-products-index", + "_id": "1", + "_score": 1.9061546, + "_source": { + "name": "Wireless Headphones", + "description": "High-quality wireless headphones with noise cancellation", + "price": 99.99 + } + } + ] + } +} +``` + diff --git a/_field-types/dynamic.md b/_field-types/mapping-parameters/dynamic.md similarity index 96% rename from _field-types/dynamic.md rename to _field-types/mapping-parameters/dynamic.md index 59f59bfe3d..abb0a7cb6d 100644 --- a/_field-types/dynamic.md +++ b/_field-types/mapping-parameters/dynamic.md @@ -1,18 +1,22 @@ --- layout: default -title: Dynamic parameter -nav_order: 10 +title: Dynamic +parent: Mapping parameters +grand_parent: Mapping and field types +nav_order: 25 +has_children: false +has_toc: false redirect_from: - /opensearch/dynamic/ --- -# Dynamic parameter +# Dynamic The `dynamic` parameter specifies whether newly detected fields can be added dynamically to a mapping. It accepts the parameters listed in the following table. Parameter | Description :--- | :--- -`true` | Specfies that new fields can be added dynamically to the mapping. Default is `true`. +`true` | Specifies that new fields can be added dynamically to the mapping. Default is `true`. `false` | Specifies that new fields cannot be added dynamically to the mapping. If a new field is detected, then it is not indexed or searchable but can be retrieved from the `_source` field. `strict` | Throws an exception. The indexing operation fails when new fields are detected. 
`strict_allow_templates` | Adds new fields if they match predefined dynamic templates in the mapping. diff --git a/_field-types/mapping-parameters/index.md b/_field-types/mapping-parameters/index.md new file mode 100644 index 0000000000..ca5586bb8f --- /dev/null +++ b/_field-types/mapping-parameters/index.md @@ -0,0 +1,28 @@ +--- +layout: default +title: Mapping parameters +nav_order: 75 +has_children: true +has_toc: false +--- + +# Mapping parameters + +Mapping parameters are used to configure the behavior of index fields. For parameter use cases, see a mapping parameter's respective page. + +The following table lists OpenSearch mapping parameters. + +Parameter | Description +:--- | :--- +`analyzer` | Specifies the analyzer used to analyze string fields. Default is the `standard` analyzer, which is a general-purpose analyzer that splits text on word boundaries, removes most punctuation, and converts tokens to lowercase. It does not remove stop words by default. Allowed values are the names of built-in analyzers, such as `standard`, `simple`, and `whitespace`, or the name of a custom analyzer defined in the index settings. +`boost` | Specifies a field-level boost factor applied at query time. Allows you to increase or decrease the relevance score of a specific field during search queries. Default boost value is `1.0`, which means no boost is applied. Allowed values are any positive floating-point number. +`coerce` | Controls how values are converted to the expected field data type during indexing. Default value is `true`, which means that OpenSearch tries to coerce the value to the expected value type. Allowed values are `true` or `false`. +`copy_to` | Copies the value of a field to another field. There is no default value for this parameter. Optional. +`doc_values` | Specifies whether a field should be stored on disk to make sorting and aggregation faster. Default value is `true`, which means that the doc values are enabled. Allowed values are `true` or `false`. +`dynamic` | Determines whether new fields should be added dynamically. 
Default value is `true`, which means that new fields can be added dynamically. Allowed values are `true`, `false`, `strict`, or `strict_allow_templates`. +`enabled` | Specifies whether the field is enabled or disabled. Default value is `true`, which means that the field is enabled. Allowed values are `true` or `false`. +`format` | Specifies the date format for date fields. There is no default value for this parameter. Allowed values are any valid date format string, such as `yyyy-MM-dd` or `epoch_millis`. +`ignore_above` | Skips indexing values that exceed the specified length. Default value is `2147483647`, which means that there is no limit on the field value length. Allowed values are any positive integer. +`ignore_malformed` | Specifies whether malformed values should be ignored. Default value is `false`, which means that malformed values are not ignored. Allowed values are `true` or `false`. +`index` | Specifies whether a field should be indexed. Default value is `true`, which means that the field is indexed. Allowed values are `true` or `false`. +`index_options` | Specifies what information should be stored in an index for scoring purposes. Default value is `positions` for `text` fields and `docs` for all other field types. Allowed values are `docs`, `freqs`, `positions`, or `offsets`. \ No newline at end of file