-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[storage] Remove dependency on archive flag in ES reader #6490
base: main
Are you sure you want to change the base?
[storage] Remove dependency on archive flag in ES reader #6490
Conversation
@@ -124,6 +123,7 @@ func NewSpanReader(p SpanReaderParams) *SpanReader { | |||
if p.UseReadWriteAliases { | |||
maxSpanAge = rolloverMaxSpanAge | |||
} | |||
archive := maxSpanAge == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro do we want to remove the concept of archive altogether? should we call this variable something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in the reader / writer (only leave it in Factory for now)
@yurishkuro How should we handle the archive namespacing in the reader and factory as they are currently hardcoded. |
20e7ccc
to
944f4e0
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #6490 +/- ##
===========================================
+ Coverage 50.22% 96.25% +46.03%
===========================================
Files 188 372 +184
Lines 11403 21283 +9880
===========================================
+ Hits 5727 20486 +14759
+ Misses 5218 609 -4609
+ Partials 458 188 -270
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
712c974
to
9adf03e
Compare
plugin/storage/es/factory.go
Outdated
@@ -218,7 +218,8 @@ func (f *Factory) CreateArchiveSpanReader() (spanstore.Reader, error) { | |||
if !f.archiveConfig.Enabled { | |||
return nil, nil | |||
} | |||
sr, err := createSpanReader(f.getArchiveClient, f.archiveConfig, true, f.logger, f.tracer) | |||
// TODO: should use_aliases be always set to true here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro thoughts on this? is it okay to just respect the user input here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think you need to override use_aliases=true
here, because it may not be set via CLI flags. jaeger-v2 won't go through this code path at all, so for v2 user input would be respected.
if p.Archive { | ||
return func(_ time.Time) (string, string) { | ||
if p.UseReadWriteAliases { | ||
return archiveIndex(spanIndexPrefix, archiveWriteIndexSuffix), "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to test that it would still go to the same index names by default (may need to set some params in CreateAechiveWriter
67145ac
to
2f7a9be
Compare
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
2f7a9be
to
0ae679d
Compare
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
plugin/storage/es/factory.go
Outdated
suffix := "archive" | ||
if f.archiveConfig.UseReadWriteAliases { | ||
suffix += "-read" | ||
} | ||
sr, err := createSpanReader(f.getArchiveClient, f.archiveConfig, f.logger, f.tracer, suffix, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro Not ideal to have to do this but this is roughly what we need to maintain backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we want to expose a configuration option to override the default read
and write
alias but that could be one way to streamline this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- who adds read/write suffix to the index name?
- how does that work for non-archive, does it not use
-read
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who adds read/write suffix to the index name?
Its added by the reader and the writer but we need to handle the special case where the archive storage also has use_aliases set to true
how does that work for non-archive, does it not use -read suffix?
Yeah it does - its done by the reader and the writer. I made some changes to make it an else-if change. The current behavior is that if a suffix is passed in, we use that. If a suffix isn't passed in but use_aliases is set, then we use the default read/write aliases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's why I don't follow - if reader/writer automatically add -read/-write
suffix, why do you need to add it manually above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add it above just to address the case where use_aliases
wasn't initially set for the archive storage. Since we hardcode it to true for the archive storage now, we'll always set the index to be archive-read
and archive-write
in the reader/writer. But that would break backwards compatibility because previously it would just be archive
for both the read and write alias for the archive storage.
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
return func(query elastic.Query, nextTime uint64) *elastic.SearchSource { | ||
s := elastic.NewSearchSource(). | ||
Query(query). | ||
Size(maxDocCount) | ||
if !archive { | ||
if !useReadWriteAliases { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a bit suspicious. If you have a query that specifies the time range, e.g. (t-1d, t)
, and the index rotation frequency is 1 day with 7 days retention, then the read alias will cover all 7 days. It's more efficient to give the time range to the query because it could (theoretically) perform filtering of the indices based on the time range first and not search all 7 of them.
So question: do we have to exclude these two clauses? What's the upstream call path to this function, does it include both FindTraces and GetTrace? Where does the GetTrace get the time frame (since we didn't have it as part of GetTrace API until very recently)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just reading about ES aliases and I may be wrong, it doesn't look like it actually optimizes which indices to search. But since each index will have an inverted index no the time, ES would be able to quickly discard those indices if the query itself contains a date_range
. So my question still stands - should we be really dropping the nextTime
filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro Here is where the GetTrace
function gets the time frame. It subtracts the max span age from the current time to get the start time and then use the current time as the end time (and then looks in 1 hour past in each direction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed up a change that keeps the filter to see how the CI reacts. I think I tried this before and the integration tests complained but that may have been due to a different reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro It looks good!
plugin/storage/es/factory.go
Outdated
if f.archiveConfig.UseReadWriteAliases { | ||
writeAlias += "-write" | ||
} | ||
return createSpanWriter(f.getArchiveClient, f.archiveConfig, f.archiveMetricsFactory, f.logger, writeAlias, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro With the latest change, we may actually not even need to hardcode use_aliases
to true anymore since the only differentiator that remains is the actual alias that is used. If we want, we can change the ES reader to just take an optional suffix and then do something like this in getTimeRangeIndexFn
where suffix directly replaces the archive
bool
if (suffix != "") {
if (use_aliases) {
return suffix + read
}
return suffix
}
if (use_aliases) {
return read
}
return time ranged indices
Do you have a preference between the approach we currently have or the one that I just described?
var archiveSuffix string | ||
func getTimeRangeIndexFn(readAlias string, useReadWriteAliases bool, remoteReadClusters []string) timeRangeIndexFn { | ||
if readAlias != "" { | ||
var suffix string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can simplify
suffix := readAlias
if useReadWriteAliases {
suffix += "-read"
}
} | ||
return archiveIndex(spanIndexPrefix, archiveIndexSuffix), "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in the archive mode we never needed to write into service index since archive storage is only used to retrieve traces by ID, not to do any searches. Not sure if we can infer the same from some other signal / parameter.
plugin/storage/es/factory.go
Outdated
@@ -218,7 +218,7 @@ func (f *Factory) CreateArchiveSpanReader() (spanstore.Reader, error) { | |||
if !f.archiveConfig.Enabled { | |||
return nil, nil | |||
} | |||
sr, err := createSpanReader(f.getArchiveClient, f.archiveConfig, true, f.logger, f.tracer) | |||
sr, err := createSpanReader(f.getArchiveClient, f.archiveConfig, f.logger, f.tracer, "archive") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to make this new parameter part of the config and pass it that way, otherwise how would v2 be able to do it since we don't want it to call CreateArchiveSpanReader in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro +1 - would you prefer it done in this PR? I was initially going to create a follow-up for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@yurishkuro A bit of a summary on how things are working in v1 today: For the archive reader, there can be two suffixes for the alias today:
Likewise, for
This PR makes a change to remove the
However, this shouldn't be necessary because if each storage has a unique indexPrefix, then needing another alias shouldn't be required. So, let's assume we decide not to expose the func getSourceFn(readAlias string, maxDocCount int) sourceFn {
return func(query elastic.Query, nextTime uint64) *elastic.SearchSource {
s := elastic.NewSearchSource().
Query(query).
Size(maxDocCount)
if readAlias == "" {
s.Sort("startTime", true).
SearchAfter(nextTime)
}
return s
}
} Consider the following block in the if s.useReadWriteAliases {
startTimeRangeQuery := s.buildStartTimeQuery(startTime.Add(-time.Hour*24), endTime.Add(time.Hour*24))
query = query.Must(startTimeRangeQuery)
}
if val, ok := searchAfterTime[traceID]; ok {
nextTime = val
}
s := s.sourceFn(query, nextTime) One other solution that we could think about is changing the |
a392964
to
00e90a9
Compare
This condition if readAlias == "" {
s.Sort("startTime", true).
SearchAfter(nextTime)
} looks very fishy to me. It's not whether we actually use aliases (the boolean), but it's checking if the suffix of the alias is blank - that kind of difference cannot be driving different behavior of the search. 3 months later I would never be able to explain why this is happening based on Your analysis is incomplete, so it makes it difficult to understand. It's not just the suffixes that are changing, the way indices are organized is also changing, so the logic should be based on the latter, not on the index naming. Why is it that Sort/SearchAfter need to be skipped in some cases? |
@yurishkuro The test span in the func (s *SpanReader) GetTrace(ctx context.Context, query spanstore.GetTraceParameters) (*model.Trace, error) {
ctx, span := s.tracer.Start(ctx, "GetTrace")
defer span.End()
currentTime := time.Now()
// TODO: use start time & end time in "query" struct
traces, err := s.multiRead(ctx, []model.TraceID{query.TraceID}, currentTime.Add(-s.maxSpanAge), currentTime) The startTime is set to 72 hours in the past. If we look inside multiRead, we have nextTime := model.TimeAsEpochMicroseconds(startTime.Add(-time.Hour)) which is used in s := s.sourceFn(query, nextTime) The distinction is needed because without it, we would only query data that is newer than 73 hours in the case of the integration test, but we don't want that limit when looking for traces in the archive storage. Should we introduce a new configuration that does that? |
good find, so this is what I meant by repurposing maxSpanAge - if it' set large enough for the archive storage (to reflect the actual retention), does this this bifurcation goes away? |
Yeah - in that case it should. We could try it out by changing the startTime in the integration test to be within the maxSpanAge. Although, this would result in a breaking change, right? |
What specifically would be the breaking change? From your explanation it sounds like maxSpanAge is not used for archive queries to day at all, so in create-archive we can set it to a large number like 10yrs, this will be backwards compatible. |
Ah okay got it - makes sense. It looks like we already do this for maxSpanAge := p.MaxSpanAge
// Setting the maxSpanAge to a large duration will ensure all spans in the "read" alias are accessible by queries (query window = [now - maxSpanAge, now]).
// When read/write aliases are enabled, which are required for index rollovers, only the "read" alias is queried and therefore should not affect performance.
if p.UseReadWriteAliases {
maxSpanAge = rolloverMaxSpanAge
} It would be nice to leverage that flag but doesn't work because of the |
00e90a9
to
6137308
Compare
what specifically is the issue? |
Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
07c5976
to
6cf08e8
Compare
Signed-off-by: Mahad Zaryab <[email protected]>
@yurishkuro We need to be able to respect the case where archive storage has |
Signed-off-by: Mahad Zaryab <[email protected]>
Oh - just saw #6519 (review). Looks like we're on the same page then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in good shape, with some minor tweaks for readability.
@@ -136,9 +139,9 @@ func NewSpanReader(p SpanReaderParams) *SpanReader { | |||
spanConverter: dbmodel.NewToDomain(p.TagDotReplacement), | |||
timeRangeIndices: getLoggingTimeRangeIndexFn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: getLoggingTimeRangeIndexFn
is a decorator here. Below we use another decorator addRemoteReadClusters
, which is also always applied. I recommend moving addRemoteReadClusters
to be attached here, instead of inside getTimeRangeIndexFn
, to make getTimeRangeIndexFn
easier to read.
@@ -119,10 +116,16 @@ type SpanReaderParams struct { | |||
// NewSpanReader returns a new SpanReader with a metrics. | |||
func NewSpanReader(p SpanReaderParams) *SpanReader { | |||
maxSpanAge := p.MaxSpanAge | |||
readAlias := "" | |||
// Setting the maxSpanAge to a large duration will ensure all spans in the "read" alias are accessible by queries (query window = [now - maxSpanAge, now]). | |||
// When read/write aliases are enabled, which are required for index rollovers, only the "read" alias is queried and therefore should not affect performance. | |||
if p.UseReadWriteAliases { | |||
maxSpanAge = rolloverMaxSpanAge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should give rolloverMaxSpanAge
a better name, like dawnOfTimeSpanAge
.
archiveSuffix = archiveIndexSuffix | ||
} | ||
func getTimeRangeIndexFn(readAlias string, remoteReadClusters []string) timeRangeIndexFn { | ||
if readAlias != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like it broke the original logic. When aliases are used (whatever the prefix is) we always want to return a single index name, never fall back to timeRangeIndices
. But in the current form that only happens if readAlias
is not blank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see why it works - because readAlias is never empty if p.UseReadWriteAliases == true
. In this case I would recommend changing the condition to:
if readAlias != "" { | |
if useReadWriteAliases { |
It requires an extra arg to the function, but it's much better representation of the reasons for the code branching.
}) | ||
} | ||
|
||
func TestSpanReader_ArchiveTraces_ReadAlias(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove this test? It seems it was counting on different index names, so useful to keep as backwards compatibility guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro Its because we don't concatenate the two inside the reader anymore. That logic is handled by the factory. If use_aliases is set and the read alias suffix is set, then we just use that suffix instead of concatenating the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we model that by altering withArchiveSpanReader
constructor? Specifically to test the condition where the indices become ***archive
vs ***archive-read
(which I understand happens depending on use_aliases
being set or not). You don't have any significant changes to the tests for the factory, but that's where the distinction is being captured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro That would just be a difference of changing ArchiveIndexSuffix
to be archive
vs archive-read
because the span reader just respects what is passed in today. This test would need to go into the factory but there isn't a nice way to do that right now because the reader is a concrete type so we can't use a mock to see what is passed into the createSpanReader function. We could make some changes to make the create functions parameters so they can be injected for testing.
} | ||
} | ||
|
||
if p.UseReadWriteAliases { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to reader, please continue using this boolean condition instead of alias!=""
@@ -109,7 +106,7 @@ type SpanReaderParams struct { | |||
SpanIndex cfg.IndexOptions | |||
ServiceIndex cfg.IndexOptions | |||
TagDotReplacement string | |||
Archive bool | |||
ReadAlias string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadAlias string | |
ReadAliasSuffix string |
let's be more precise, this code already has unnecessarily high complexity to complicate it further with inaccurate names
Signed-off-by: Mahad Zaryab <[email protected]>
Which problem is this PR solving?
Description of the changes
use_aliases
configuration so thearchive
bool in the archive reader was mostly replaced by theuse_aliases
configuration (more details on this are in the description of Phase out the distinction between primary and archive storage #6065).CreateArchiveSpanReader
andCreateArchiveSpanWriter
How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:npm run lint
andnpm run test