feat: Remove the need for external tools for managing ElasticSearch #6291

Open
wants to merge 10 commits into base: main


Rohanraj123
Contributor

Which problem is this PR solving?

Description of the changes

  • I just added the index-cleaner tool to the jaeger binary. In further commits I will add the other two tools as well. Documentation and tests will also be added in further commits.

How was this change tested?

  • Pending

Checklist

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 3.07692% with 63 lines in your changes missing coverage. Please review.

Project coverage is 95.92%. Comparing base (39e3bfb) to head (e40a5f7).
Report is 18 commits behind head on main.

Files with missing lines                     Patch %   Lines
plugin/storage/es/index_cleaner.go           0.00%     28 Missing ⚠️
plugin/storage/es/index_rollover.go          0.00%     19 Missing ⚠️
plugin/storage/es/es_mapping_generator.go    0.00%     16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6291      +/-   ##
==========================================
- Coverage   96.20%   95.92%   -0.29%     
==========================================
  Files         356      359       +3     
  Lines       20416    20519     +103     
==========================================
+ Hits        19642    19683      +41     
- Misses        585      648      +63     
+ Partials      189      188       -1     
Flag Coverage Δ
badger_v1 8.78% <0.00%> (-0.12%) ⬇️
badger_v2 1.61% <0.00%> (-0.03%) ⬇️
cassandra-4.x-v1-manual 14.65% <0.00%> (-0.20%) ⬇️
cassandra-4.x-v2-auto 1.55% <0.00%> (-0.02%) ⬇️
cassandra-4.x-v2-manual 1.55% <0.00%> (-0.02%) ⬇️
cassandra-5.x-v1-manual 14.65% <0.00%> (-0.20%) ⬇️
cassandra-5.x-v2-auto 1.55% <0.00%> (-0.02%) ⬇️
cassandra-5.x-v2-manual 1.55% <0.00%> (-0.02%) ⬇️
elasticsearch-6.x-v1 18.36% <0.00%> (-0.25%) ⬇️
elasticsearch-7.x-v1 18.43% <0.00%> (-0.25%) ⬇️
elasticsearch-8.x-v1 18.60% <0.00%> (-0.25%) ⬇️
elasticsearch-8.x-v2 1.61% <0.00%> (-0.03%) ⬇️
grpc_v1 10.22% <0.00%> (-0.15%) ⬇️
grpc_v2 7.80% <0.00%> (-0.10%) ⬇️
kafka-v1 8.47% <0.00%> (-0.12%) ⬇️
kafka-v2 1.61% <0.00%> (-0.03%) ⬇️
memory_v2 1.61% <0.00%> (-0.03%) ⬇️
opensearch-1.x-v1 18.48% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v1 18.48% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v2 1.61% <0.00%> (-0.03%) ⬇️
tailsampling-processor 0.45% <0.00%> (-0.01%) ⬇️
unittests 94.83% <3.07%> (-0.28%) ⬇️


// Validation function for Cleaner struct
func (c *Cleaner) Validate() error {
if c.Frequency <= 0 {
Member

validation can be declared in the field tags

type Cleaner struct {
Enabled bool `mapstructure:"enabled"`
Frequency time.Duration `mapstructure:"frequency"`
IncludeArchive bool `mapstructure:"include_archive"`
Member

in v2 archive storage is a separate instance, this property won't make sense


// ---- ES-specific config ----
// Cleaner holds the configuration for the Elasticsearch index Cleaner.
Cleaner Cleaner `mapstructure:"cleaner"`
Member

Suggested change
-Cleaner Cleaner `mapstructure:"cleaner"`
+IndexCleaner IndexCleaner `mapstructure:"index_cleaner"`

@@ -585,3 +608,198 @@ func (c *Configuration) Validate() error {
_, err := govalidator.ValidateStruct(c)
return err
}

// Run starts the cleaner that runs periodically based on the Frequency field.
func (cfg *Configuration) RunCleaner() {
Member

config package is for configuration, not for business logic. This should be somewhere in plugin/storage/es

}

// deleteOldIndices deletes indices older than the configured MaxSpanAge.
func (cfg *Configuration) deleteOldIndices() error {
Member

doesn't this logic already exist in cmd/es-index-cleaner? Why do we need to duplicate it?

@@ -24,6 +24,8 @@ func main() {
command,
)

internal.StartCleaner(v)
Member

it's enabled by config option, no need to "start" anything explicitly

Signed-off-by: Rohanraj123 <[email protected]>
// Rollover struct for configuring roll over
type Rollover struct {
Enabled bool `mapstructure:"enabled" valid:"required"`
IndexPattern string `mapstructure:"index_pattern" valid:"required"`
Member

Storage already knows how indices are named, why is this arg needed?

// IndexCleaner struct for configuring index cleaning
type IndexCleaner struct {
Enabled bool `mapstructure:"enabled"`
Frequency time.Duration `mapstructure:"frequency" validate:"gt=0"`
Member

Please add comment describing the arg

type Rollover struct {
Enabled bool `mapstructure:"enabled" valid:"required"`
IndexPattern string `mapstructure:"index_pattern" valid:"required"`
Frequency time.Duration `mapstructure:"frequency" validate:"gt=0"`
Member

Please add a comment describing this arg as well

@Rohanraj123 Rohanraj123 marked this pull request as ready for review December 6, 2024 05:56
@Rohanraj123 Rohanraj123 requested a review from a team as a code owner December 6, 2024 05:56
@Rohanraj123
Contributor Author

@yurishkuro Can you please review the PR once, so that I can see whether I am going in the right direction or not?

@@ -36,22 +36,22 @@ extensions:
index_prefix: "jaeger-main"
spans:
date_layout: "2006-01-02"
-rollover_frequency: "day"
+rollover_frequency: "1day"
Member

@yurishkuro yurishkuro Dec 8, 2024

can you create another PR (to merge before this one) that changes the RolloverFrequency field type to time.Duration ?

Contributor Author

Okay!

}

// Run the esmapping-generator command
execCmd := exec.Command("go", "run", "./cmd/esmapping-generator")
Member

How do you imagine this working in production if someone runs Jaeger as a container image? Where would they get go toolchain and Jaeger source code?
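
As a rough illustration of the alternative, the mapping can be rendered in-process with `text/template`, which needs neither the Go toolchain nor the source tree at runtime. Everything named here (`MappingOptions`, `renderMapping`, and the tiny template) is made up for the sketch and is not Jaeger's actual mapping or generator API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"text/template"
)

// MappingOptions is a hypothetical parameter set for this sketch.
type MappingOptions struct {
	Shards   int
	Replicas int
}

// mappingTemplate is a made-up, minimal template, not a real Jaeger mapping.
const mappingTemplate = `{"settings": {"index": {"number_of_shards": {{.Shards}}, "number_of_replicas": {{.Replicas}}}}}`

// renderMapping fills the template in-process and checks that the result is
// valid JSON, instead of shelling out to `go run`.
func renderMapping(opts MappingOptions) (string, error) {
	tmpl, err := template.New("mapping").Parse(mappingTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, opts); err != nil {
		return "", err
	}
	if !json.Valid(buf.Bytes()) {
		return "", errors.New("rendered mapping is not valid JSON")
	}
	return buf.String(), nil
}

func main() {
	out, err := renderMapping(MappingOptions{Shards: 5, Replicas: 1})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

In a real binary the template would more likely be compiled in (for example with `go:embed`) than written as a string constant; the constant just keeps the sketch self-contained.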

Contributor Author

Can you guide me a little here?

Member

first, I suggest you focus on one tool at a time, start with the cleaner only. The logic of the cleaner is already implemented in Go, we can call it directly as a library from the ES implementation. You can reuse all the sub-commands defined in cmd/es-index-cleaner, but you may have to move them out of main() into a reusable package (that would be a standalone refactoring and should be its own PR).

Member

one more thing - I think the easiest tool to start with is es-mapping, because it's purely an on-demand command run with no need to set up periodic jobs.

I was wrong in the comment above - I don't think sub-commands will be needed once the cleaner/rollover are integrated into the main binary, because the only reason those sub-commands exist today is to allow the script to be run as externally managed cron job, so it needs to do different things. But if it's run as internal periodic job, it's one combined functionality.

2 participants