-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SpanKind support for badger #6376
base: main
Are you sure you want to change the base?
Conversation
I have changed the structure of cache which is leading to these concerns:
Once the correct approach is discussed I will handle some more edge cases and make the e2e tests pass (making |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6376 +/- ##
==========================================
- Coverage 96.24% 96.15% -0.10%
==========================================
Files 373 374 +1
Lines 21406 21588 +182
==========================================
+ Hits 20602 20757 +155
- Misses 612 632 +20
- Partials 192 199 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@yurishkuro Please review the approach and problems! |
@yurishkuro I have added more changes which reduces the iterations in prefill to 1 but it limits the |
I have an idea for old data without using the migration script! We can store the old data in two other data structures in cache (without kind). But then the only question which rises then: What to return when no span kind is given by user? Operations of new data of all kind or operations of old data (kind marked as unspecified) or an addition of both? |
then we should return all operations regardless of the span kind |
That means including all spans of old data also (Whose kind is not there in cache)? |
My current approach is leading to errors in unit test of
This is probably because
The only problem is that, during prefilling 6*NumberOfOperations Get Queries will be called. Please review this approach @yurishkuro and I think we need to discuss about autoCreation of new index or should we skip the creation of any new index and use the function given above? |
Signed-off-by: Manik2708 <[email protected]>
@yurishkuro I finally got rid of migration and now I think its ready for review! Please ignore my previous comments. The current commit has no linkage them! |
Signed-off-by: Manik2708 <[email protected]>
I have fixed all the tests except that of Update and Prefill, even they are also not manipulating the data structure, they are used just to check whether cache is storing by using the update or prefill |
@yurishkuro Can you please review? |
Signed-off-by: Manik2708 <[email protected]>
Signed-off-by: Manik2708 <[email protected]>
Q: do we have to maintain two indices forever, or is this only a side-effect of having to be backwards compatible with the existing data? For example, one way I could see this working is:
|
The key :
Then finding the trace ids would also work fine. So either we have to create an extra index or do this extra scanning! |
This index doesn't make sense to me. It cannot effectively support a query that only includes service+operation, you must always know the Wouldn't it make more sense to append the
|
We can try this but then we need to remember that it will break these conventions:
|
Why is it "breaking" if kind introduced after Time but not "breaking" when it's before Time? Whatever we do the changes must be backwards compatible. |
Please see this: jaeger/plugin/storage/badger/spanstore/writer.go Lines 117 to 128 in 1ae9c1a
This is how we are creating a key, when |
why does it matter? We're creating an index with a different layout, we don't have to be restricted by how that specific function is implemented, especially since we are introducing a different look up process (it seems all other indices are doing direct lookup by the prefix up to the timestamp and then scan / parse). |
Ok, will give it a try and get back to you! Thanks for your time! |
Signed-off-by: Manik2708 <[email protected]>
@yurishkuro I have tried to take care of all the edge cases, please review! |
Signed-off-by: Manik2708 <[email protected]>
@yurishkuro This PR is ready to review, I have added dual lookups and backward compatibility tests in this PR. |
@@ -42,6 +42,9 @@ type Config struct { | |||
// ReadOnly opens the data store in read-only mode. Multiple instances can open the same | |||
// store in read-only mode. Values still in the write-ahead-log must be replayed before opening. | |||
ReadOnly bool `mapstructure:"read_only"` | |||
// DualLookUp enables the look-up in old indexes of jaeger which are not deprecated. | |||
// By-default it is enabled and it is suggested not to disable it | |||
DualLookUp bool `mapstructure:"dual_look_up"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not expose this in the config. We should use a feature gate instead - see #6568
@@ -72,6 +75,7 @@ func DefaultConfig() *Config { | |||
}, | |||
MaintenanceInterval: defaultMaintenanceInterval, | |||
MetricsUpdateInterval: defaultMetricsUpdateInterval, | |||
DualLookUp: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read default value from feature gate
} | ||
err := writer.writeSpanWithOldIndex(&oldSpan) | ||
require.NoError(t, err) | ||
traces, err := reader.FindTraces(context.Background(), &spanstore.TraceQueryParameters{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I follow this test. What does FindTraces have to do with span kind in the operations retrieval? Also, backwards compatibility test only makes sense when it is executed against old and new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have changed the key but we need to make sure that traces are also fetched from old key when dual lookup is turned on. Please stress on a fact that operation key is used in getting traces also along with filling in cache, If you will look at this code, we are first writing span with old key and then testing whether it is able to fetch traces associated with that key (please see L42)
} | ||
*/ | ||
// The uint64 value is the expiry time of operation | ||
operations map[string]map[model.SpanKind]map[string]uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify, CacheStore is used to avoid expensive scans when loading services and operations, correct? In other words, it's all in-memory structure. In this case, why can we not change just the value of the map to be a combo {kind, expiration}
instead of changing the structure? When loading, scanning everything for a give service is still going to be negligible amount of data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't understand this! Are you saying to keep these structures?
services map[string]uint64 // Already in the cache
operations map[string][string]kind
type kind struct {
kind SpanKind
expiry uint64
}
If yes, then how to handle when query is to fetch all operations for a service and kind? Should we iterate all operations and skip those operations which are not of the required kind? (We are using a similar approach currently, i.e iteralting for all kinds and skipping unrequired kinds but this was justified because max kinds can be 6 but number of operations aren't defined, so will this option viable?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So iterating all operations and skipping not required kinds will be right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
@@ -77,6 +93,7 @@ func (c *CacheStore) loadServices() { | |||
func (c *CacheStore) loadOperations(service string) { | |||
c.store.View(func(txn *badger.Txn) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a side note, but I find it very bizarre that Cache struct has the db loading logic, such logic should only be in the reader, and either Cache delegates to the reader or the reader delegates to the cache. Maybe a separate PR to refactor that first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But have a question over this: Currently Reader requires cache to serve operations and services and if we want cache to read from Reader wouldn't it create circular dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end user is only exposed to the Reader API, using cache internally is reader's business, it can orchestrate interaction with the cache as it wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I mean to say badger is the first object to get instantiated. Cache depends on store which is then instantiated and then Reader and Writer needs to instantiate both of which depends on store and cache. So when cache is as elemental as badger store, how it can depend on reader? And shouldn't cache be directly contacting the db instead of getting dependent on some other service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you heard of "clean architecture"? I'm not a big fan of it overall but the fundamental principles are sound. So how would you describe the responsibilities of each component here? What is the onion structure? Each component should perform well encapsulated function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion, I was wondering this without looking at the code! Thanks for your time and reply!
type badgerSpanKind int | ||
|
||
const ( | ||
badgerSpanKindUnspecified badgerSpanKind = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are using this as database values, please don't use implicit assignment via iota, give the concrete values that will never change. iota-based enum can change if the order is changed.
entriesToStore, | ||
w.createBadgerEntry( | ||
createOperationWithKindIndexKey( | ||
[]byte(span.Process.ServiceName+span.OperationName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please pass these separately. Is there any other way to call createOperationWithKindIndexKey?
@@ -128,6 +142,22 @@ func createIndexKey(indexPrefixKey byte, value []byte, startTime uint64, traceID | |||
return key | |||
} | |||
|
|||
func createOperationWithKindIndexKey(value []byte, startTime uint64, traceID model.TraceID, kind model.SpanKind) []byte { | |||
// KEY: indexKey<indexValue><startTime><spanKind><traceId> (traceId is last 16 bytes of the key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spell out indexValue
copy(key[1:pos], value) | ||
binary.BigEndian.PutUint64(key[pos:], startTime) | ||
pos += 8 | ||
key[pos] = getBadgerSpanKind(kind).String()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you're reading the first character of the string - this is a bad pattern. Please define the enum as alias to byte
type and explicitly assign letters.
@@ -179,3 +209,51 @@ func createTraceKV(span *model.Span, encodingType byte, startTime uint64) ([]byt | |||
|
|||
return key, bb, err | |||
} | |||
|
|||
// This method is only for testing purpose to test backward compatibility, once dual-lookup is removed, this method can be removed | |||
func (w *SpanWriter) writeSpanWithOldIndex(span *model.Span) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this supposed to be the old way of writing? Why create a new method then instead of having a new method only for NEW way. That way the diff will be smaller and more readable.
Which problem is this PR solving?
Description of the changes
How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:npm run lint
andnpm run test