Provide Memory Calculator parameters at runtime #425

Open
michael-arndt-gcx opened this issue Sep 25, 2024 · 8 comments

@michael-arndt-gcx

The different parameters of the Memory Calculator, like the thread count, should be available at runtime.

Describe the Enhancement

All parameters, or at least LoadedClassCount and ThreadCount, are (optionally) made available at runtime.

Possible Solution

Each parameter has a corresponding environment variable, e.g. MEMORY_CALCULATOR_THREAD_COUNT.

Motivation

We now have multiple cases of out-of-memory exceptions, usually due to more loaded classes than anticipated by the Memory Calculator, and we'd like to create alerts for each of the parameters so we can avoid that. Some of the values the Memory Calculator uses are neither available at runtime nor derivable. Monitoring each of the parameters makes the alert rules simpler, and when a threshold is hit, the resulting alert is clearer.

@michael-arndt-gcx
Author

After sleeping on it, I realized that BPL_JVM_THREAD_COUNT and BPI_JVM_CLASS_COUNT, as well as JAVA_TOOL_OPTIONS, are available as environment variables at runtime.

@dmikusa
Contributor

dmikusa commented Sep 26, 2024

If that works, then fantastic; otherwise feel free to reach back out. I'm not opposed to exposing more information. I think I'd want to get some more context about the information you want to gather and how you want to gather it (mostly before the app starts or while it is running, and whether it's in the JVM process or external).

Thanks

@michael-arndt-gcx
Author

michael-arndt-gcx commented Sep 26, 2024

Thanks for your answer! We're using it with Spring Boot and Micrometer, so the values effectively become Prometheus metrics. I think the ideal solution depends on the stack here. In our case the metric must be reported repeatedly, because the Prometheus alerts work on a time window – if no metric has been reported for too long, the alert would trigger, so reporting only at startup is not enough. We've added the metric by means of a custom Spring Boot autoconfiguration (a simplified sketch is at the end of this comment), but I've asked if they'd like this added generally, as I believe we're not the only ones who wish to have these alerts.

But again, if your stack is not Spring Boot + Micrometer + Prometheus + Prometheus Alerts, the best approach will probably look very different, so I don't think it can be solved further at the buildpack level, at least not in a general-purpose JVM buildpack.

spring-projects/spring-boot#42458
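
For illustration, here is a simplified sketch of that kind of configuration (the class name, metric names, and the use of a plain @Configuration instead of a full auto-configuration are choices made for this example, not anything prescribed by the buildpack). It reads the BPL_* variables at startup and exposes them as Micrometer gauges:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.MeterBinder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MemoryCalculatorMetricsConfiguration {

    @Bean
    public MeterBinder memoryCalculatorGauges() {
        return registry -> {
            // BPL_JVM_THREAD_COUNT is always set by the launcher; the loaded class
            // count is only present if BPL_JVM_LOADED_CLASS_COUNT was set explicitly.
            registerEnvGauge(registry, "BPL_JVM_THREAD_COUNT", "memory.calculator.thread.count");
            registerEnvGauge(registry, "BPL_JVM_LOADED_CLASS_COUNT", "memory.calculator.loaded.class.count");
        };
    }

    private void registerEnvGauge(MeterRegistry registry, String envVar, String metricName) {
        String raw = System.getenv(envVar);
        if (raw == null || raw.isBlank()) {
            return; // variable not present in this container, skip the gauge
        }
        double value = Double.parseDouble(raw.trim());
        Gauge.builder(metricName, () -> value)
                .description("Value of " + envVar + " as seen by the application at startup")
                .strongReference(true) // keep the supplier from being garbage collected
                .register(registry);
    }
}
```

With micrometer-registry-prometheus on the classpath, these gauges show up on the /actuator/prometheus endpoint and are scraped continuously, which keeps the metric fresh for the alert time window.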

@michael-arndt-gcx
Author

After working a bit more on this, I realized that my assumption was not completely correct.
I started with alerting on the thread count, which is always available as the environment variable BPL_JVM_THREAD_COUNT; however, the number of classes isn't available. There is BPI_JVM_CLASS_COUNT, which I assumed to be the number of classes used for the calculation, but it's only the number of classes from the JVM itself (if I understand the code correctly).
To get the final value, the formula is

```go
totalClasses := float64(jvmClassCount+appClassCount+agentClassCount+staticAdjustment) * (float64(adjustmentFactor) / 100.0)
```

where only jvmClassCount is known through BPI_JVM_CLASS_COUNT.
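
To make this concrete with purely hypothetical numbers: with jvmClassCount = 30000, appClassCount = 12000, agentClassCount = 0, staticAdjustment = 0 and adjustmentFactor = 100, the calculator would use totalClasses = (30000 + 12000 + 0 + 0) * (100 / 100.0) = 42000, while BPI_JVM_CLASS_COUNT only exposes the 30000 part.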

Since all of this could be overridden by setting BPL_JVM_LOADED_CLASS_COUNT, I suggest providing the final result totalClasses as layer.LaunchEnvironment["BPL_JVM_LOADED_CLASS_COUNT.default"].

As a consumer of the buildpack, this would in my opinion mirror the behavior of BPL_JVM_THREAD_COUNT: the value that was actually used is always available through the environment variable.

@dmikusa
Contributor

dmikusa commented Oct 8, 2024

> Since all of this could be overridden by setting BPL_JVM_LOADED_CLASS_COUNT, I suggest providing the final result totalClasses as layer.LaunchEnvironment["BPL_JVM_LOADED_CLASS_COUNT.default"].

We can totally add this.

My only reservation is in how we expose this information. We have a convention that BPL_* env variables are input to the launch process. I'd rather not have output from buildpacks/launch process be put into env variables that follow that same convention.

I hate to make this a bigger deal than it is, but I also think we should pause to consider how this might scale out and be used by all of our buildpacks too.

Ideas that come to mind:

  1. New prefix, like BPO_ (i.e. buildpack output) or BPM_ (i.e. buildpack metrics). This is simple, but requires one to be inside the container to see the values (either the app that's running reads them, or you exec into the container and invoke the launcher).

  2. Write a bunch of metrics to a file, probably under /tmp since that is the only place we are guaranteed to be able to write to, and print them in some easily consumable or standard format (e.g. custom JSON/TOML or OpenMetrics). This could potentially scale better than env variables and provide a format that's easy to consume, but would still require access to the container file system, or that we write to a volume mount (not always possible).

  3. Write in some structured format to STDOUT when the buildpack/launcher runs. This would be the easiest to consume, as it's just part of the existing logging. We'd have to structure the output in a specific way so that it can be easily picked out of the other output, maybe with a common prefix like METRIC: (a hypothetical example is sketched below), and we'd need a way to turn this on/off, since not everyone might want to see this output.

I'm open to other ideas as well.
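
To make option 3 concrete, the launcher's log output could contain lines like the following (metric names and values are purely hypothetical):

```
METRIC: jvm.loaded.class.count=42000
METRIC: jvm.thread.count=250
```

Option 2 would write similar information to a file under /tmp instead, just in JSON/TOML or OpenMetrics rather than log lines.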

@dmikusa
Contributor

dmikusa commented Oct 8, 2024

I see there's also an upstream RFC that aims to standardize how buildpacks do some metrics collection, although it seems focused on build-time only, not runtime. https://github.com/buildpacks/rfcs/blob/main/text/0131-build-observability.md

@michael-arndt-gcx
Author

For our use case the options are already ordered by preference. While 3 isn't completely unfeasible, it creates complications. It was actually our first attempt: we use Loki for our logging and Prometheus for alerting, and we used Loki to parse the current output and turn it into a metric. However, the problem is that this metric would only be refreshed when the application starts, but the Prometheus alerts work on a time frame – once the metric is too old, it's effectively absent. Also, parsing GBs of our logs to get this metric is quite a few orders of magnitude more resource-consuming than the other options.

2 is interesting, but has the downsides you mentioned. It is a better match for the referenced Buildpack Observability RFC, though. However, the number of classes used by the calculator is a static value, not a trace.

I like the BPO_ prefix. I don't think the value itself is a metric. It may be turned into a metric down the line, but that's up to the consumers, so I wouldn't go for BPM_.

PS: I was not aware that the I in BPI_ stands for "internal only" and that these variables shouldn't be used by consumers. I found this discussion https://github.com/orgs/paketo-buildpacks/discussions/38, and I agree that the specific values should not be documented, but I think mentioning the existence and semantics of this prefix is a good idea. If nothing else, to avoid confusion and collisions.

@michael-arndt-gcx
Author

In case anyone comes across this issue, our workaround for now is to set BPL_JVM_LOADED_CLASS_COUNT for each deployment and then publish that environment variable's value as a metric.
