From 5a496e715c04b96fddcf0f0ce3e9f20727dfcae5 Mon Sep 17 00:00:00 2001
From: Alan Wang
Date: Wed, 28 Aug 2024 06:13:58 +0800
Subject: [PATCH] Remove the description of esCleaner.py from plugin/storage/es/README.md (#5891)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

esCleaner.py no longer exists, but README.md still has the relevant content, so I've made some adjustments.

---------

Signed-off-by: 王然
Signed-off-by: Yuri Shkuro
Co-authored-by: Yuri Shkuro
Signed-off-by: Jared Tan
---
 cmd/es-index-cleaner/README.md | 17 +++++++++++++++++
 plugin/storage/es/README.md    | 25 ++++++++-----------------
 2 files changed, 25 insertions(+), 17 deletions(-)
 create mode 100644 cmd/es-index-cleaner/README.md

diff --git a/cmd/es-index-cleaner/README.md b/cmd/es-index-cleaner/README.md
new file mode 100644
index 000000000000..d6b063c65643
--- /dev/null
+++ b/cmd/es-index-cleaner/README.md
@@ -0,0 +1,17 @@
+# es-index-cleaner
+
+It is common to only keep observability data for a limited time.
+However, Elasticsearch does no support expiring of old data via TTL.
+To help with this task, `es-index-cleaner` can be used to purge
+old Jaeger indices. For example, to delete indixes older than 14 days:
+
+```
+docker run -it --rm --net=host -e ROLLOVER=true \
+  jaegertracing/jaeger-es-index-cleaner:latest \
+  14 \
+  http://localhost:9200
+```
+
+Another alternative is to use [Elasticsearch Curator][curator].
+
+[curator]: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html
diff --git a/plugin/storage/es/README.md b/plugin/storage/es/README.md
index 78eeaaf8f7b7..9f4f1cc317dc 100644
--- a/plugin/storage/es/README.md
+++ b/plugin/storage/es/README.md
@@ -5,19 +5,10 @@ This provides a storage backend for Jaeger using [Elasticsearch](https://www.ela
 ## Indices
 Indices will be created depending on the spans timestamp.
 i.e., a span with a timestamp on 2017/04/21 will be stored in an index named `jaeger-2017-04-21`.
-ElasticSearch also has no support for TTL, so there exists a script `./esCleaner.py`
-that deletes older indices automatically. The [Elastic Curator](https://www.elastic.co/guide/en/elasticsearch/client/curator/current/about.html)
-can also be used instead to do a similar job.
-### Using `./esCleaner.py`
-The script is using `python3`. All dependencies can be installed with: `python3 -m pip install elasticsearch elasticsearch-curator`.
-
-Parameters:
- * Environment variable TIMEOUT that sets the timeout in seconds for indices deletion (default: 120)
- * Optional environment variable ES_USERNAME and ES_PASSWORD
- * a number that will delete any indices older than that number in days
- * ElasticSearch hostnames
- * Example usage: `TIMEOUT=120 ./esCleaner.py 4 localhost:9200`
+It is common to only keep observability data for a limited time.
+However, Elasticsearch does no support expiring of old data via TTL.
+To purge old Jaeger indices, use [jaeger-es-index-cleaner](../../../cmd/es-index-cleaner/).
 
 ### Timestamps
 Because ElasticSearch's `Date` datatype has only millisecond granularity and Jaeger
@@ -25,13 +16,13 @@ requires microsecond granularity, Jaeger spans' `StartTime` is saved as a long t
 The conversion is done automatically.
 
 ### Nested fields (tags)
-`Tags` are [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) fields in the 
+`Tags` are [nested](https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) fields in the
 ElasticSearch schema used for Jaeger. This allows for better search capabilities and data retention.
 However, because ElasticSearch creates a new document for every nested field, there is currently a limit of 50 nested fields per document.
 
 ### Shards and Replicas
-Number of shards and replicas per index can be specified as parameters to the writer and/or through configs under 
-`./pkg/es/config/config.go`. If not specified, it defaults to ElasticSearch defaults: 5 shards and 1 replica. 
+Number of shards and replicas per index can be specified as parameters to the writer and/or through configs under
+`./pkg/es/config/config.go`. If not specified, it defaults to ElasticSearch defaults: 5 shards and 1 replica.
 [This article](https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index) goes into more information about choosing how many shards should be chosen for optimization.
 
@@ -42,7 +33,7 @@ This plugin queries against spans. This means that all tags in a query must lie
 query to successfully return a trace.
 
 ### Case-sensitivity
-Queries are case-sensitive. For example, if a document with service name `ABC` is searched using a query `abc`, 
+Queries are case-sensitive. For example, if a document with service name `ABC` is searched using a query `abc`,
 the document will not be retrieved.
 
 ## Testing
@@ -57,6 +48,6 @@ and that script be run from the top folder to integration test ElasticSearch as
 This script requires Docker to be running.
 
 ### Adding tests
-Integration test framework for storage lie under `../integration`. 
+Integration test framework for storage lie under `../integration`.
 Add to `../integration/fixtures/traces/*.json` and `../integration/fixtures/queries.json` to add more trace cases.