diff --git a/docs/reference/data-streams/logs.asciidoc b/docs/reference/data-streams/logs.asciidoc index 6bb98684544a3..3af5e09889a89 100644 --- a/docs/reference/data-streams/logs.asciidoc +++ b/docs/reference/data-streams/logs.asciidoc @@ -1,18 +1,20 @@ [[logs-data-stream]] == Logs data stream -preview::[Logs data streams and the logsdb index mode are in tech preview and may be changed or removed in the future. Don't use logs data streams or logsdb index mode in production.] +IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted +and self-managed Elasticsearch as of version 8.17, and is enabled by default for +logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}]. A logs data stream is a data stream type that stores log data more efficiently. In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data -stream. The exact impact will vary depending on your data set. +stream. The exact impact varies by data set. [discrete] [[how-to-use-logsds]] === Create a logs data stream -To create a logs data stream, set your index template `index.mode` to `logsdb`: +To create a logs data stream, set your <> `index.mode` to `logsdb`: [source,console] ---- @@ -31,10 +33,12 @@ PUT _index_template/my-index-template // TEST <1> The index mode setting. -<2> The index template priority. By default, Elasticsearch ships with an index template with a `logs-*-*` pattern with a priority of 100. You need to define a priority higher than 100 to ensure that this index template gets selected over the default index template for the `logs-*-*` pattern. See the <> for more information. +<2> The index template priority. By default, Elasticsearch ships with a `logs-*-*` index template with a priority of 100. To make sure your index template takes priority over the default `logs-*-*` template, set its `priority` to a number higher than 100. For more information, see <>. 
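To show how the `priority` setting decides between overlapping `logs-*-*` templates, here is a minimal Python sketch. It assumes simple `*` wildcard matching through `fnmatch`, uses hypothetical template entries, and does not model ties. It is an illustration only, not how Elasticsearch resolves templates internally.

```python
from fnmatch import fnmatchcase

# Hypothetical templates: (name, index_patterns, priority)
TEMPLATES = [
    ("logs", ["logs-*-*"], 100),               # stands in for the built-in logs template
    ("my-index-template", ["logs-*-*"], 101),  # your template, priority higher than 100
]


def select_template(stream_name, templates):
    """Return the name of the highest-priority matching template, or None if nothing matches."""
    matching = [
        (priority, name)
        for name, patterns, priority in templates
        if any(fnmatchcase(stream_name, pattern) for pattern in patterns)
    ]
    if not matching:
        return None
    # The highest priority wins; ties are not modeled in this sketch.
    return max(matching)[1]


print(select_template("logs-myapp-default", TEMPLATES))  # my-index-template
print(select_template("logs-default", TEMPLATES))        # None (needs two dashes after "logs")
```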
After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and <>. +You can also set the index mode and adjust other template settings in <>. + //// [source,console] ---- @@ -46,154 +50,159 @@ DELETE _index_template/my-index-template [[logsdb-default-settings]] [discrete] -[[logsdb-synthtic-source]] +[[logsdb-synthetic-source]] === Synthetic source -By default, `logsdb` mode uses <>, which omits storing the original `_source` -field and synthesizes it from doc values or stored fields upon document retrieval. Synthetic source comes with a few -restrictions which you can read more about in the <> section dedicated to it. +If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <>, which omits storing the original `_source` +field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval. -NOTE: When dealing with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values -are preserved for <> reconstruction. In `logsdb`, the default value is `arrays`, -which retains both duplicate values and the order of entries but not necessarily the exact structure when it comes to -array elements or objects. Preserving duplicates and ordering could be critical for some log fields. This could be the -case, for instance, for DNS A records, HTTP headers, or log entries that represent sequential or repeated events. +If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field. -For more details on this setting and ways to refine or bypass it, check out <>. +Before using synthetic source, make sure to review the <>. + +When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values +are preserved for <> reconstruction. 
In `logsdb`, the default value is `arrays`, +which retains both duplicate values and the order of entries. However, the exact structure of +array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some +log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events. [discrete] [[logsdb-sort-settings]] === Index sort settings -The following settings are applied by default when using the `logsdb` mode for index sorting: +In `logsdb` index mode, the following sort settings are applied by default: -* `index.sort.field`: `["host.name", "@timestamp"]` - In `logsdb` mode, indices are sorted by `host.name` and `@timestamp` fields by default. For data streams, the - `@timestamp` field is automatically injected if it is not present. +`index.sort.field`: `["host.name", "@timestamp"]`:: +Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present. -* `index.sort.order`: `["desc", "desc"]` - The default sort order for both fields is descending (`desc`), prioritizing the latest data. +`index.sort.order`: `["desc", "desc"]`:: +Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, so the most recent documents for each host come first. -* `index.sort.mode`: `["min", "min"]` - The default sort mode is `min`, ensuring that indices are sorted by the minimum value of multi-value fields. +`index.sort.mode`: `["min", "min"]`:: +For multi-value fields, the `min` mode sorts documents by the field's minimum value. -* `index.sort.missing`: `["_first", "_first"]` - Missing values are sorted to appear first (`_first`) in `logsdb` index mode. +`index.sort.missing`: `["_first", "_first"]`:: +Documents with missing values are sorted first (`_first`). -`logsdb` index mode allows users to override the default sort settings. For instance, users can specify their own fields -and order for sorting by modifying the `index.sort.field` and `index.sort.order`.
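The default sort settings listed above can be modeled in plain Python to show the resulting document order. This is a simplified sketch: documents are flat dictionaries keyed by dotted field names (an assumption for illustration), and it ignores how Lucene actually applies index sorting within segments.

```python
from functools import cmp_to_key

SORT_FIELDS = ["host.name", "@timestamp"]  # index.sort.field
SORT_ORDER = ["desc", "desc"]              # index.sort.order
# index.sort.mode = min, index.sort.missing = _first (applied below)


def field_value(doc, field):
    """Apply sort mode `min`: use the smallest value of a multi-value field, or None if missing."""
    value = doc.get(field)
    if value is None or value == []:
        return None
    if isinstance(value, list):
        return min(value)
    return value


def compare(a, b):
    for field, order in zip(SORT_FIELDS, SORT_ORDER):
        va, vb = field_value(a, field), field_value(b, field)
        if va == vb:
            continue  # tie on this field: fall through to the next sort field
        # missing = _first: documents without a value sort before all others
        if va is None:
            return -1
        if vb is None:
            return 1
        result = -1 if va < vb else 1
        return -result if order == "desc" else result
    return 0


def sort_docs(docs):
    return sorted(docs, key=cmp_to_key(compare))


docs = [
    {"host.name": "a", "@timestamp": 1},
    {"host.name": "b", "@timestamp": 2},
    {"@timestamp": 5},                          # no host.name: sorted first
    {"host.name": ["c", "a"], "@timestamp": 3},  # min mode: sorted as "a"
    {"host.name": "b", "@timestamp": 4},
]
print([d["@timestamp"] for d in sort_docs(docs)])  # [5, 4, 2, 3, 1]
```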
+You can override these default sort settings. For example, to sort on different fields +and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see +<>. -When using default sort settings, the `host.name` field is automatically injected into the mappings of the -index as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and -retrieved based on the `host.name` and `@timestamp` fields. +When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields. -NOTE: If `subobjects` is set to `true` (which is the default), the `host.name` field will be mapped as an object field -named `host`, containing a `name` child field of type `keyword`. On the other hand, if `subobjects` is set to `false`, -a single `host.name` field will be mapped as a `keyword` field. +NOTE: If `subobjects` is set to `true` (default), `host.name` is mapped as an object field +named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`, +a single `host.name` field is mapped as a `keyword` field. -Once an index is created, the sort settings are immutable and cannot be modified. To apply different sort settings, -a new index must be created with the desired configuration. For data streams, this can be achieved by means of an index -rollover after updating relevant (component) templates. +Sort settings can't be changed after an index is created. To apply different sort settings to an existing data stream, update the data stream's component templates, and then +perform or wait for a <>. -If the default sort settings are not suitable for your use case, consider modifying them.
Keep in mind that sort -settings can influence indexing throughput, query latency, and may affect compression efficiency due to the way data -is organized after sorting. For more details, refer to our documentation on -<>. - -NOTE: For <>, the `@timestamp` field is automatically injected if not already present. -However, if custom sort settings are applied, the `@timestamp` field is injected into the mappings, but it is not +NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not automatically added to the list of sort fields. [discrete] -[[logsdb-specialized-codecs]] -=== Specialized codecs +[[logsdb-host-name]] +==== Existing data streams -`logsdb` index mode uses the `best_compression` <> by default, which applies {wikipedia}/Zstd[ZSTD] -compression to stored fields. Users are allowed to override it and switch to the `default` codec for faster compression -at the expense of slightly larger storage footprint. +If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied. -`logsdb` index mode also adopts specialized codecs for numeric doc values that are crafted to optimize storage usage. -Users can rely on these specialized codecs being applied by default when using `logsdb` index mode. +To avoid mapping conflicts, consider these options: -Doc values encoding for numeric fields in `logsdb` follows a static sequence of codecs, applying each one in the -following order: delta encoding, offset encoding, Greatest Common Divisor GCD encoding, and finally Frame Of Reference -(FOR) encoding. 
The decision to apply each encoding is based on heuristics determined by the data distribution. -For example, before applying delta encoding, the algorithm checks if the data is monotonically non-decreasing or -non-increasing. If the data fits this pattern, delta encoding is applied; otherwise, the next encoding is considered. +* **Adjust mappings:** Check your existing mappings to ensure that `host.name` is mapped as a keyword. -The encoding is specific to each Lucene segment and is also re-applied at segment merging time. The merged Lucene segment -may use a different encoding compared to the original Lucene segments, based on the characteristics of the merged data. +* **Change sorting:** If needed, you can remove `host.name` from the sort settings and use a different set of fields. Sorting by `@timestamp` can be a good fallback. + +* **Switch to a different <>:** If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode. + +IMPORTANT: On existing data streams, `logsdb` mode is applied on <> (automatic or manual). + +[discrete] +[[logsdb-specialized-codecs]] +=== Specialized codecs -The following methods are applied sequentially: +By default, `logsdb` index mode uses the `best_compression` <>, which applies {wikipedia}/Zstd[ZSTD] +compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint. + +The `logsdb` index mode also automatically applies specialized codecs for numeric doc values to optimize storage usage. Numeric fields are +encoded using the following sequence of codecs: * **Delta encoding**: - a compression method that stores the difference between consecutive values instead of the actual values. + Stores the difference between consecutive values instead of the actual values. * **Offset encoding**: - a compression method that stores the difference from a base value rather than between consecutive values.
+ Stores the difference from a base value rather than between consecutive values. * **Greatest Common Divisor (GCD) encoding**: - a compression method that finds the greatest common divisor of a set of values and stores the differences - as multiples of the GCD. + Finds the greatest common divisor of a set of values and stores each value divided by the GCD. * **Frame Of Reference (FOR) encoding**: - a compression method that determines the smallest number of bits required to encode a block of values and uses + Determines the smallest number of bits required to encode a block of values and uses bit-packing to fit such values into larger 64-bit blocks. +Each encoding is evaluated according to heuristics determined by the data distribution. +For example, the algorithm checks whether the data is monotonically non-decreasing or +non-increasing. If so, delta encoding is applied; otherwise, the process +continues with the next encoding method (offset). + +Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment +might use a different encoding than the original segments, depending on the characteristics of the merged data. + For keyword fields, **Run Length Encoding (RLE)** is applied to the ordinals, which represent positions in the Lucene segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword. [discrete] [[logsdb-ignored-settings]] -=== `ignore_malformed`, `ignore_above`, `ignore_dynamic_beyond_limit` +=== `ignore` settings + +The `logsdb` index mode uses the following `ignore` settings. You can override these settings as needed. + +[discrete] +[[logsdb-ignore-malformed]] +==== `ignore_malformed` -By default, `logsdb` index mode sets `ignore_malformed` to `true`.
This setting allows documents with malformed fields -to be indexed without causing indexing failures, ensuring that log data ingestion continues smoothly even when some -fields contain invalid or improperly formatted data. +By default, `logsdb` index mode sets `ignore_malformed` to `true`. With this setting, documents with malformed fields +can be indexed without causing ingestion failures. To reject documents with malformed fields instead, set `index.mapping.ignore_malformed` to `false`. -Users can override this setting by setting `index.mapping.ignore_malformed` to `false`. However, this is not recommended -as it might result in documents with malformed fields being rejected and not indexed at all. +[discrete] +[[logs-db-ignore-above]] +==== `ignore_above` In `logsdb` index mode, the `index.mapping.ignore_above` setting is applied by default at the index level to ensure -efficient storage and indexing of large keyword fields.The index-level default for `ignore_above` is set to 8191 -**characters**. If using UTF-8 encoding, this results in a limit of 32764 bytes, depending on character encoding. -The mapping-level `ignore_above` setting still takes precedence. If a specific field has an `ignore_above` value -defined in its mapping, that value will override the index-level `index.mapping.ignore_above` value. This default -behavior helps to optimize indexing performance by preventing excessively large string values from being indexed, while -still allowing users to customize the limit, overriding it at the mapping level or changing the index level default -setting. +efficient storage and indexing of large keyword fields. The index-level default for `ignore_above` is 8191 +_characters_. With UTF-8 encoding, where a single character can take up to 4 bytes, this corresponds to a limit of up to 32764 bytes. + +The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value +defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value.
This default +behavior helps to optimize indexing performance by preventing excessively large string values from being indexed. + +If you need to customize the limit, you can override it at the mapping level or change the index-level default. + +[discrete] +[[logs-db-ignore-limit]] +==== `ignore_dynamic_beyond_limit` In `logsdb` index mode, the setting `index.mapping.total_fields.ignore_dynamic_beyond_limit` is set to `true` by -default. This allows dynamically mapped fields to be added on top of statically defined fields without causing document -rejection, even after the total number of fields exceeds the limit defined by `index.mapping.total_fields.limit`. The -`index.mapping.total_fields.limit` setting specifies the maximum number of fields an index can have (static, dynamic -and runtime). When the limit is reached, new dynamically mapped fields will be ignored instead of failing the document -indexing, ensuring continued log ingestion without errors. +default. The `index.mapping.total_fields.limit` setting caps the number of fields an index can have, including static, dynamic, and runtime fields. With `ignore_dynamic_beyond_limit` enabled, documents are not rejected when new dynamically mapped fields would push the total past this limit. Instead of causing an indexing failure, the dynamically mapped fields beyond the limit are ignored so that ingestion can continue. -NOTE: When automatically injected, `host.name` and `@timestamp` contribute to the limit of mapped fields. When -`host.name` is mapped with `subobjects: true` it consists of two fields. When `host.name` is mapped with -`subobjects: false` it only consists of one field. +NOTE: When automatically injected, `host.name` and `@timestamp` count toward the limit of mapped fields. If `host.name` is mapped with `subobjects: true`, it counts as two fields: the `host` object and its `name` child. When mapped with `subobjects: false`, `host.name` counts as only one field.
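The following simplified Python sketch illustrates the field-counting note and the `ignore_dynamic_beyond_limit` behavior described above. It counts only entries under `properties` (it ignores runtime fields and multi-fields), and the helper names are hypothetical, not part of any Elasticsearch API.

```python
def count_mapped_fields(properties):
    """Count mapped fields, including object fields and their children (simplified)."""
    total = 0
    for definition in properties.values():
        total += 1  # the field itself, or the object that holds child fields
        children = definition.get("properties")
        if children:
            total += count_mapped_fields(children)
    return total


def add_dynamic_fields(current, new_fields, limit, ignore_beyond_limit=True):
    """Split new dynamic fields into (mapped, ignored), mimicking ignore_dynamic_beyond_limit."""
    mapped, ignored = [], []
    for field in new_fields:
        if current + len(mapped) < limit:
            mapped.append(field)
        elif ignore_beyond_limit:
            ignored.append(field)  # field is dropped, but the document is still indexed
        else:
            raise ValueError(f"total field limit of {limit} exceeded; document rejected")
    return mapped, ignored


# host.name with subobjects: true -> `host` object + `name` keyword, plus @timestamp
nested = {"host": {"properties": {"name": {"type": "keyword"}}}, "@timestamp": {"type": "date"}}
# host.name with subobjects: false -> a single dotted keyword field, plus @timestamp
flat = {"host.name": {"type": "keyword"}, "@timestamp": {"type": "date"}}

print(count_mapped_fields(nested))                      # 3
print(count_mapped_fields(flat))                        # 2
print(add_dynamic_fields(3, ["a", "b", "c"], limit=5))  # (['a', 'b'], ['c'])
```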
[discrete] [[logsdb-nodocvalue-fields]] -=== Fields without doc values +=== Fields without `doc_values` -When `logsdb` index mode uses synthetic `_source`, and `doc_values` are disabled for a field in the mapping, -Elasticsearch may set the `store` setting to `true` for that field as a last resort option to ensure that the field's -data is still available for reconstructing the document’s source when retrieving it via +When the `logsdb` index mode uses synthetic `_source` and `doc_values` are disabled for a field in the mapping, +{es} might set the `store` setting to `true` for that field. This ensures that the field's +data remains accessible for reconstructing the document's source when using <>. -For example, this happens with text fields when `store` is `false` and there is no suitable multi-field available to -reconstruct the original value in <>. - -This automatic adjustment allows synthetic source to work correctly, even when doc values are not enabled for certain -fields. +For example, this adjustment occurs with text fields when `store` is `false` and no suitable multi-field is available for +reconstructing the original value. [discrete] [[logsdb-settings-summary]] -=== LogsDB settings summary +=== Settings reference -The following is a summary of key settings that apply when using `logsdb` index mode in Elasticsearch: +The `logsdb` index mode uses the following settings: * **`index.mode`**: `"logsdb"` diff --git a/docs/reference/data-streams/tsds.asciidoc b/docs/reference/data-streams/tsds.asciidoc index d0d6d4a455c63..1e1d56e5b4d93 100644 --- a/docs/reference/data-streams/tsds.asciidoc +++ b/docs/reference/data-streams/tsds.asciidoc @@ -17,7 +17,7 @@ metrics data. Only use a TSDS if you typically add metrics data to {es} in near real-time and `@timestamp` order. A TSDS is only intended for metrics data. For other timestamped data, such as -logs or traces, use a regular data stream. +logs or traces, use a <> or regular data stream. 
[discrete] [[differences-from-regular-data-stream]] diff --git a/docs/reference/images/index-mgmt/management-data-stream-fields.png b/docs/reference/images/index-mgmt/management-data-stream-fields.png new file mode 100644 index 0000000000000..605d49b80ab1f Binary files /dev/null and b/docs/reference/images/index-mgmt/management-data-stream-fields.png differ diff --git a/docs/reference/images/index-mgmt/management-data-stream.png b/docs/reference/images/index-mgmt/management-data-stream.png deleted file mode 100644 index 01534fdec2a23..0000000000000 Binary files a/docs/reference/images/index-mgmt/management-data-stream.png and /dev/null differ diff --git a/docs/reference/images/index-mgmt/management-index-templates.png b/docs/reference/images/index-mgmt/management-index-templates.png index 9188aa85e68cd..1ed004e85e71d 100644 Binary files a/docs/reference/images/index-mgmt/management-index-templates.png and b/docs/reference/images/index-mgmt/management-index-templates.png differ diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc index 1c8f1db216b75..d9b8f8802a04b 100644 --- a/docs/reference/index-modules.asciidoc +++ b/docs/reference/index-modules.asciidoc @@ -113,10 +113,9 @@ Index mode supports the following values: `standard`::: Standard indexing with default settings. -`time_series`::: Index mode optimized for storage of metrics documented in <>. +`time_series`::: _(data streams only)_ Index mode optimized for storage of metrics. For more information, see <>. -`logsdb`::: Index mode optimized for storage of logs. It applies default sort settings on the `hostname` and `timestamp` fields and uses <>. <> on different fields is still allowed. -preview:[] +`logsdb`::: _(data streams only)_ Index mode optimized for <>.
[[routing-partition-size]] `index.routing_partition_size`:: diff --git a/docs/reference/indices/index-mgmt.asciidoc b/docs/reference/indices/index-mgmt.asciidoc index 7a78f9452b85e..73643dbfd4b3b 100644 --- a/docs/reference/indices/index-mgmt.asciidoc +++ b/docs/reference/indices/index-mgmt.asciidoc @@ -67,7 +67,7 @@ This value is the time period for which your data is guaranteed to be stored. Da Elasticsearch at a later time. [role="screenshot"] -image::images/index-mgmt/management-data-stream.png[Data stream details] +image::images/index-mgmt/management-data-stream-fields.png[Data stream details] * To view more information about a data stream, such as its generation or its current index lifecycle policy, click the stream's name. From this view, you can navigate to *Discover* to diff --git a/docs/reference/indices/put-index-template.asciidoc b/docs/reference/indices/put-index-template.asciidoc index 36fc66ecb90b8..9a31037546796 100644 --- a/docs/reference/indices/put-index-template.asciidoc +++ b/docs/reference/indices/put-index-template.asciidoc @@ -115,10 +115,10 @@ See <>. `index_mode`:: (Optional, string) Type of data stream to create. Valid values are `null` -(regular data stream) and `time_series` (<>). +(standard data stream), `time_series` (<>), and `logsdb` +(<>). + -If `time_series`, each backing index has an `index.mode` index setting of -`time_series`. +The template's `index_mode` sets the `index.mode` index setting of each backing index. ===== `index_patterns`:: diff --git a/docs/reference/mapping/fields/synthetic-source.asciidoc b/docs/reference/mapping/fields/synthetic-source.asciidoc index f8666e2993d6a..ddbefb73f4522 100644 --- a/docs/reference/mapping/fields/synthetic-source.asciidoc +++ b/docs/reference/mapping/fields/synthetic-source.asciidoc @@ -1,17 +1,10 @@ [[synthetic-source]] ==== Synthetic `_source` -IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices -(indices that have `index.mode` set to `time_series`).
For other indices, -synthetic `_source` is in technical preview. Features in technical preview may -be changed or removed in a future release. Elastic will work to fix -any issues, but features in technical preview are not subject to the support SLA -of official GA features. - Though very handy to have around, the source field takes up a significant amount of space on disk. Instead of storing source documents on disk exactly as you send them, Elasticsearch can reconstruct source content on the fly upon retrieval. -Enable this by using the value `synthetic` for the index setting `index.mapping.source.mode`: +To enable this https://www.elastic.co/subscriptions[subscription] feature, use the value `synthetic` for the index setting `index.mapping.source.mode`: [source,console,id=enable-synthetic-source-example] ---- @@ -30,7 +23,7 @@ PUT idx ---- // TESTSETUP -While this on the fly reconstruction is *generally* slower than saving the source +While this on-the-fly reconstruction is _generally_ slower than saving the source documents verbatim and loading them at query time, it saves a lot of storage space. Additional latency can be avoided by not loading `_source` field in queries when it is not needed.
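To illustrate why `index.mapping.synthetic_source_keep` matters for multi-value fields in `logsdb`, here is a simplified Python model of how a keyword array might come back from synthetic `_source`. It assumes that values rebuilt from doc values are returned sorted and deduplicated, and that the `arrays` setting keeps the original array as sent. The function name is hypothetical, and the model ignores other details, such as how single-element arrays are rendered.

```python
def synthesize_keyword_array(values, synthetic_source_keep="none"):
    """Return keyword values as synthetic _source might show them (simplified model)."""
    if synthetic_source_keep == "arrays":
        # The original array is kept alongside doc values: order and duplicates survive.
        return list(values)
    # Rebuilt from doc values only: values come back sorted and deduplicated.
    return sorted(set(values))


dns_a_records = ["10.0.0.2", "10.0.0.1", "10.0.0.2"]
print(synthesize_keyword_array(dns_a_records))            # ['10.0.0.1', '10.0.0.2']
print(synthesize_keyword_array(dns_a_records, "arrays"))  # ['10.0.0.2', '10.0.0.1', '10.0.0.2']
```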