Once #1805 is fixed, schema metadata won't be duplicated across multiple cells of the same table anymore, i.e. all the cells in an incoming batch will share the same schema.
That's a great start but it's far from enough: schema metadata will still be duplicated across tables/batches.
We need a way of deduplicating schema information on the server side (and why not on the clients too while we're at it), e.g. by hashing `DataType`s and making them available through a global registry, so that the deserializer can deduplicate the data on entry.
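A minimal sketch of the registry idea. It assumes a heavily simplified stand-in for `DataType` (the real Arrow type has many more variants), and the names `SchemaRegistry`, `intern` and `global_registry` are hypothetical. Schemas are interned in a `HashSet<Arc<DataType>>`: the set hashes each incoming `DataType`, so structurally equal schemas decoded from different batches come back as the same `Arc`.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical, heavily simplified stand-in for an Arrow `DataType`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DataType {
    Float64,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Hypothetical schema registry: interns `DataType`s so that equal schemas
/// share a single heap allocation.
#[derive(Default)]
pub struct SchemaRegistry {
    // The set hashes each `DataType` and compares on collision, so two
    // structurally equal schemas always resolve to the same `Arc`.
    interned: Mutex<HashSet<Arc<DataType>>>,
}

impl SchemaRegistry {
    /// Returns the shared copy of `datatype`, inserting it on first sight.
    pub fn intern(&self, datatype: DataType) -> Arc<DataType> {
        let mut interned = self.interned.lock().unwrap();
        if let Some(existing) = interned.get(&datatype) {
            return Arc::clone(existing);
        }
        let shared = Arc::new(datatype);
        interned.insert(Arc::clone(&shared));
        shared
    }

    /// Number of distinct schemas seen so far.
    pub fn len(&self) -> usize {
        self.interned.lock().unwrap().len()
    }
}

/// Hypothetical process-wide registry the deserializer could call on entry.
pub fn global_registry() -> &'static SchemaRegistry {
    static REGISTRY: OnceLock<SchemaRegistry> = OnceLock::new();
    REGISTRY.get_or_init(SchemaRegistry::default)
}

fn main() {
    let registry = global_registry();
    let schema = || {
        DataType::Struct(vec![
            ("x".into(), DataType::Float64),
            ("label".into(), DataType::Utf8),
        ])
    };
    // Two batches decoded independently carry equal but separately allocated schemas...
    let a = registry.intern(schema());
    let b = registry.intern(schema());
    // ...which the registry collapses into a single allocation.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(registry.len(), 1);
}
```

The registry only ever grows in this sketch; whether and when entries should be evicted is left open.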
Of course another solution is to have the clients directly access the central schema registry on the server so they can only send hashes to begin with; but that's future work for when we need it.
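For the hash-only variant, a rough sketch of what the exchange could look like. Everything here is hypothetical (`SchemaRef`, `Client`, `ServerRegistry`, `schema_hash`): the client sends the full schema the first time it sees it and only its hash afterwards, and the server resolves hashes against the schemas it has already stored. `DefaultHasher` is used for brevity only; its algorithm is not guaranteed to stay the same across Rust releases, so a real protocol would want a stable, wider digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified stand-in for an Arrow `DataType`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum DataType {
    Float64,
    Utf8,
    List(Box<DataType>),
}

/// What a client puts on the wire in place of a schema (hypothetical).
pub enum SchemaRef {
    Full(DataType),
    Hash(u64),
}

/// Hashes a schema; not stable across Rust releases (see above).
pub fn schema_hash(datatype: &DataType) -> u64 {
    let mut hasher = DefaultHasher::new();
    datatype.hash(&mut hasher);
    hasher.finish()
}

/// Client side: remembers which schemas it has already sent in full.
#[derive(Default)]
pub struct Client {
    sent: HashSet<u64>,
}

impl Client {
    pub fn encode(&mut self, datatype: &DataType) -> SchemaRef {
        let hash = schema_hash(datatype);
        // `insert` returns true only the first time this hash is seen.
        if self.sent.insert(hash) {
            SchemaRef::Full(datatype.clone())
        } else {
            SchemaRef::Hash(hash)
        }
    }
}

/// Server side: the central schema registry, keyed by hash.
#[derive(Default)]
pub struct ServerRegistry {
    by_hash: HashMap<u64, DataType>,
}

impl ServerRegistry {
    /// Resolves an incoming reference; `None` means the hash is unknown.
    /// Note: a 64-bit collision would silently alias two schemas here.
    pub fn resolve(&mut self, schema: SchemaRef) -> Option<&DataType> {
        match schema {
            SchemaRef::Full(datatype) => {
                let hash = schema_hash(&datatype);
                Some(&*self.by_hash.entry(hash).or_insert(datatype))
            }
            SchemaRef::Hash(hash) => self.by_hash.get(&hash),
        }
    }
}

fn main() {
    let mut client = Client::default();
    let mut server = ServerRegistry::default();
    let dt = DataType::List(Box::new(DataType::Float64));

    let first = client.encode(&dt);
    assert!(matches!(first, SchemaRef::Full(_)));
    assert_eq!(server.resolve(first), Some(&dt));

    let second = client.encode(&dt);
    assert!(matches!(second, SchemaRef::Hash(_)));
    assert_eq!(server.resolve(second), Some(&dt));

    // A server that never saw the full schema (e.g. after a restart) cannot resolve it.
    let mut fresh = ServerRegistry::default();
    assert!(fresh.resolve(SchemaRef::Hash(schema_hash(&dt))).is_none());
}
```

When the server answers `None`, the client would have to fall back to resending the full schema.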
With the work done in #4883, this is now irrelevant in the context of scalar time series.
The only case where this would have a noticeable impact is logging scalars with the batcher completely disabled, so that each scalar gets its own `DataType`... except that it wouldn't: the datatype in that case is flat (it owns nothing on the heap), so heap deduplication has nothing to share. To save anything, we would have to fundamentally change the definition of `DataType` so that it takes less stack space to begin with (see also #4883 (comment)).
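To illustrate why interning doesn't help in the flat case, a sketch with a hypothetical, simplified `DataType` (not the actual arrow2 definition). A `Float64` value owns nothing on the heap, so there is nothing to share, yet every cell that stores a `DataType` inline still pays for the enum's largest variant. `Arc<DataType>` appears only as one example of a smaller, pointer-sized representation, not as the planned change.

```rust
use std::mem::size_of;
use std::sync::Arc;

#[allow(dead_code)]
pub enum TimeUnit {
    Second,
    Nanosecond,
}

/// Hypothetical simplified datatype: some variants are flat, others own heap data.
#[allow(dead_code)]
pub enum DataType {
    // Flat: nothing on the heap, so interning has nothing to deduplicate.
    Float64,
    // Optional timezone string lives on the heap.
    Timestamp(TimeUnit, Option<Arc<str>>),
    // Field list lives on the heap.
    Struct(Arc<[(Arc<str>, DataType)]>),
}

fn main() {
    // Every inline `DataType`, including a flat `Float64`, occupies this many bytes.
    let inline = size_of::<DataType>();
    // A shared handle is a single (thin) pointer.
    let handle = size_of::<Arc<DataType>>();
    println!("inline DataType: {inline} bytes, Arc<DataType>: {handle} bytes");

    // `Option<Arc<str>>` is a fat pointer, so the enum is at least two words wide...
    assert!(inline >= 2 * size_of::<usize>());
    // ...while the handle is exactly one word.
    assert_eq!(handle, size_of::<usize>());
}
```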
Where this could potentially have an impact is when logging enum values whose schemas contain a lot of strings... but even then, those strings are stored in the heap part of the datatype, which is already deduplicated.
So you would have to specifically log small enum values with big schemas and do so with the batcher disabled.
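A sketch of the enum case, assuming for illustration that the variant names live in a reference-counted heap part; `EnumType` is hypothetical and not the actual arrow2 layout. Cloning the datatype for each cell copies only the inline part and bumps a refcount, so the strings themselves are never duplicated.

```rust
use std::sync::Arc;

/// Hypothetical enum datatype whose variant names live on the heap.
#[derive(Debug, Clone, PartialEq)]
pub struct EnumType {
    pub variants: Arc<[Arc<str>]>,
}

fn main() {
    // A "big" schema: the strings are allocated once, here.
    let names: Vec<Arc<str>> = ["Idle", "Running", "Stopped"]
        .iter()
        .map(|name| Arc::<str>::from(*name))
        .collect();
    let schema = EnumType { variants: Arc::from(names) };

    // Cloning the datatype for another cell copies one pointer and bumps a refcount...
    let per_cell = schema.clone();

    // ...the variant names themselves are shared, not copied.
    assert!(Arc::ptr_eq(&schema.variants, &per_cell.variants));
    assert_eq!(Arc::strong_count(&schema.variants), 2);
}
```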
That's pretty niche and outside the scope of this cycle.