This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

stats: data sent to exporters keeps growing over time #968

Closed
Thooms opened this issue Nov 5, 2018 · 4 comments

Thooms commented Nov 5, 2018

Bug description

When using opencensus-go to record metrics, the aggregated values for every tag combination ever recorded are sent to all the exporters on every reporting cycle.

To Reproduce

Record points over time on a view that depends on a high-cardinality tag, and watch the quantity of data sent to the exporters grow over time.
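A rough sketch of a program that reproduces it (the measure, view, and tag names are made up for the example; only the public stats, view, and tag packages are used):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// printExporter just reports how many rows each export carries.
type printExporter struct{}

func (printExporter) ExportView(vd *view.Data) {
	fmt.Printf("view %q: %d rows\n", vd.View.Name, len(vd.Rows))
}

func main() {
	// A tag with high cardinality, e.g. a request ID.
	keyID, err := tag.NewKey("request_id")
	if err != nil {
		log.Fatal(err)
	}
	latency := stats.Float64("latency", "request latency", stats.UnitMilliseconds)

	if err := view.Register(&view.View{
		Name:        "latency_by_request",
		Measure:     latency,
		TagKeys:     []tag.Key{keyID},
		Aggregation: view.Sum(),
	}); err != nil {
		log.Fatal(err)
	}

	view.RegisterExporter(printExporter{})
	view.SetReportingPeriod(5 * time.Second)

	// Record a point with a new tag value every second. The row count
	// reported by the exporter keeps growing, even though each tag value
	// is only ever recorded once.
	for i := 0; ; i++ {
		ctx, _ := tag.New(context.Background(), tag.Insert(keyID, fmt.Sprintf("req-%d", i)))
		stats.Record(ctx, latency.M(rand.Float64()*100))
		time.Sleep(time.Second)
	}
}
```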

Expected behavior

Only the data that has changed since the last export is sent.

Additional notes

An easy way to solve this would simply be to keep track of which data has changed in the collector object: https://github.com/census-instrumentation/opencensus-go/blob/master/stats/view/collector.go. What do you think? We're ready to provide a pull request if needed!

PS: we checked whether an issue was already open on this subject, and it doesn't seem so. Do not hesitate to tell me if this issue is a duplicate, or if we missed something in the documentation and this behavior is expected.

@semistrict (Contributor)

In general, with time-series databases (like most of the backends we support through the stats interface), high-cardinality labels cause storage problems, so it's best not to use high-cardinality tags in views. Which backend are you targeting?

That said, I agree that it's not ideal for memory to grow unbounded. But I'm not sure it would be enough to just drop the data after exporting: the recommended export interval for Stackdriver is 60s, for example, and with a high-cardinality label you could easily accumulate too many values within those 60s. I would propose looking into a solution that puts an absolute limit on tag cardinality per view (for example, 10k distinct tag combinations) and then starts dropping that view and logs an error once the limit is hit.
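To make that concrete, a hypothetical sketch of such a per-view guard (none of these names exist in opencensus-go today; this only illustrates the proposed policy):

```go
package cardinality

import "log"

// maxRowsPerView is a hypothetical absolute limit on distinct tag combinations.
const maxRowsPerView = 10000

// viewGuard tracks the tag signatures seen for one view and trips once the
// limit is exceeded. Purely illustrative, not existing opencensus-go code.
type viewGuard struct {
	seen    map[string]bool // tag signatures already accepted for this view
	dropped bool            // once tripped, the whole view is dropped
}

// allow reports whether a record with the given tag signature should be kept.
// After the limit is hit it logs an error once and rejects everything else.
func (g *viewGuard) allow(viewName, signature string) bool {
	if g.dropped {
		return false
	}
	if g.seen == nil {
		g.seen = make(map[string]bool)
	}
	if !g.seen[signature] {
		if len(g.seen) >= maxRowsPerView {
			g.dropped = true
			log.Printf("view %q exceeded %d tag combinations; dropping it", viewName, maxRowsPerView)
			return false
		}
		g.seen[signature] = true
	}
	return true
}
```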

Whichever way we go, I think this is going to be an issue for other OpenCensus libraries as well, so we probably need a general policy on how to handle it. @bogdandrutu, if you agree, would you mind transferring this to opencensus-specs? (The new "transfer this issue" feature requires admin on both projects.)


Thooms commented Nov 6, 2018

Thanks for your quick answer!

I agree that, as a rule, one should avoid high-cardinality labels (that said, it's what allowed us to spot the issue). We mitigated on our side by removing the problematic tag. As a suggestion on this point, I think it should be stressed a bit more in the documentation :)

Apart from the cardinality issue, I still think that never garbage-collecting the data is a problem: even with a low-cardinality tag (for instance, routes in a web server), one will keep sending data for tag combinations that haven't been measured in a while. The potential implications I see are unnecessary network usage as well as unnecessary spending, depending on the backend (we use Stackdriver, for instance).
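To illustrate the behaviour we'd expect, here is a sketch of a wrapper around an existing exporter that forwards only the rows whose aggregated data changed since the previous export (the deltaExporter type and its fingerprinting are invented for this sketch; only the view.Exporter interface and the view.Data/view.Row fields come from the library):

```go
package deltaexport

import (
	"fmt"

	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

// deltaExporter wraps another exporter and forwards only the rows whose
// aggregated value changed since the previous export. Illustrative sketch,
// not part of opencensus-go.
type deltaExporter struct {
	next view.Exporter
	last map[string]string // row key -> fingerprint of its last exported data
}

func newDeltaExporter(next view.Exporter) *deltaExporter {
	return &deltaExporter{next: next, last: make(map[string]string)}
}

func (d *deltaExporter) ExportView(vd *view.Data) {
	changed := make([]*view.Row, 0, len(vd.Rows))
	for _, row := range vd.Rows {
		key := vd.View.Name + rowKey(row.Tags)
		// Crude fingerprint of the aggregated data; good enough for a sketch.
		fp := fmt.Sprintf("%v", row.Data)
		if d.last[key] != fp {
			d.last[key] = fp
			changed = append(changed, row)
		}
	}
	if len(changed) == 0 {
		return
	}
	filtered := *vd
	filtered.Rows = changed
	d.next.ExportView(&filtered)
}

func rowKey(tags []tag.Tag) string {
	key := ""
	for _, t := range tags {
		key += "|" + t.Key.Name() + "=" + t.Value
	}
	return key
}
```

Note that the bookkeeping map in this sketch still grows with tag cardinality, so it only reduces what is sent over the network; the underlying memory growth would still need a cardinality limit or some expiry in the collector.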

What do you think?

@semistrict (Contributor)

I definitely agree that it's an important issue that we need to discuss. I cannot move issues to opencensus-specs because I am not an admin on that repo, but I think that is the best place to have this discussion since it affects more than just the Go library.


rghetia commented Mar 27, 2019

Opened a new issue in the spec repo.

rghetia closed this as completed Mar 27, 2019