Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record metrics for invocations when they start #1411

Merged
merged 3 commits into from
Mar 12, 2020

Conversation

alexellis
Copy link
Member

Signed-off-by: Alex Ellis (OpenFaaS Ltd) [email protected]

Description

Record metrics for invocations when they start

Motivation and Context

  • This experimental patch records metrics as invocations start so that the metrics can be used to make better scale to zero decisions in faas-idler.

How Has This Been Tested?

Tested with Kubernetes on a single-node cluster, metrics
reported as expected. Existing metrics still report.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

gateway/metrics/metrics.go Show resolved Hide resolved
Inc()
} else if event == "started" {
serviceName := getServiceName(originalURL)
p.Metrics.StartedCounter.WithLabelValues(serviceName).Inc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if the metric name was symmetric with the metric that capture the end of the request GatewayFunctionInvocation and GatewayFunctionInvocationStarted or using FunctionInvocationCompleted and FunctionInvocationStarted to be even more precise

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already capture the end of the request using the existing metric.

What do you suggest? Renaming would be a breaking change, recording double the information would be hideous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am specifically talking about the variable name, Metrics.StartedCounter isn't very descriptive. It is entirely possible we will want other started counters later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give me a suggestion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gateway_function_invocation_started?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the name of the Go variable and my suggestion was/is GatewayFunctionInvocationStarted or FunctionInvocationStarted, so that this code would look like

p.Metrics.FunctionInvocationStarted.WithLabelValues(serviceName).Inc()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated as per that suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff still looks like this

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see what you mean now. I haven't changed that internal variable, I'll do it next.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and retrofitted into the other counter

func (p PrometheusFunctionNotifier) Notify(method string, URL string, originalURL string, statusCode int, duration time.Duration) {
seconds := duration.Seconds()
serviceName := getServiceName(originalURL)
func (p PrometheusFunctionNotifier) Notify(method string, URL string, originalURL string, statusCode int, event string, duration time.Duration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use an enum for the event instead of an arbitrary string? it will be more type safe, easier to read in the docs and IDEs, and we can avoid bugs to do typos. Especially if we pair it with switch below on the enum constants.

type FunctionEvent int
const (
   FunctionStarted = iota
   FunctionEnded
)
func (d FunctionEvent) String() string {
    return [...]string{"started", "ended"}[d]
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree. This is an experimental PR to see if we can solve at least one of the issues for the user who is in production experiencing this problem today

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

¯_(ツ)_/¯ if you are happy with it, then i guess we can leave it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable enhancement

@alexellis
Copy link
Member Author

alexellis commented Dec 19, 2019

Thanks for looking at the metrics. Happy to do polishing after we get to something that works.

This goes with https://github.com/openfaas-incubator/faas-idler/pull/35

Copy link
Member Author

@alexellis alexellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments

@alexellis
Copy link
Member Author

@stefanprodan @LucasRoesler I've added comments for this, also there was interest from a large enterprise user in this feature to know when long-running requests are started.

@StorkZ
Copy link

StorkZ commented Mar 9, 2020

@alexellis @stefanprodan @LucasRoesler
Hi guys! This fix would help me a lot. I can see that the progress has stalled for 2 weeks now.
Let me know if there is something I can do to help.

@alexellis
Copy link
Member Author

Hi @StorkZ,

Can you introduce who you are, what company you would be using this for and why you feel it's important?

Why do you think that "progress has stalled"? Perhaps this isn't a priority for our roadmap?

Alex

@StorkZ
Copy link

StorkZ commented Mar 9, 2020

I work at a saas startup called Kadromierz. We use this project to run exports for clients, that often are required only once a month. We use faas-idler to salce them down to 0 instances. The only problem is that there is a chance, that instance will be scaled to 0 before the function even ran. Currently we handle this by retrying given function call, but it is not a final solution.

I didn't mean it to sound like I'm hastening you. I'm very grateful for this project and if there is something I can do to push this forward, then just let me know.

@alexellis
Copy link
Member Author

I'm glad that OpenFaaS is useful to your business, thank you for giving some context. Perhaps you could start off with sending your use-case to the ADOPTERS file? #776

OpenFaaS is provided for free, features can be prioritized for clients who have a support relationship, otherwise it depends on the roadmap and what little time I can donate to the project.

I'd suggest reading up on how the project works here: Setting expectations, support and SLAs

In the meantime, I'll note that you are interested in this feature. Would you be willing to help with testing?

@StorkZ
Copy link

StorkZ commented Mar 9, 2020

Thanks for being patient with me.
If there is any testing needed just let me know.

@alexellis
Copy link
Member Author

Can you outline a scenario that you are running / running into that we can use to reproduce the behaviour and then test a fix, if we proceed with this patch?

@alexellis
Copy link
Member Author

@StorkZ

@alexellis alexellis force-pushed the openfaasltd/metrics-started-calls branch from c0d9233 to 6f2edb5 Compare March 12, 2020 10:48
* This experimental patch records metrics as invocations start
so that the metrics can be used to make better scale to zero
decisions in faas-idler.

Tested with Kubernetes on a single-node cluster, metrics
reported as expected. Existing metrics still report.

Signed-off-by: Alex Ellis (OpenFaaS Ltd) <[email protected]>
@alexellis alexellis force-pushed the openfaasltd/metrics-started-calls branch from 6f2edb5 to b7f09c3 Compare March 12, 2020 10:49
@alexellis
Copy link
Member Author

Squashing changes from review into a single commit.

This should be a "no-harm" change to use the namespace/sub-
system definition for CounterOpts.

Signed-off-by: Alex Ellis (OpenFaaS Ltd) <[email protected]>
@alexellis alexellis merged commit 37bf15e into master Mar 12, 2020
@alexellis alexellis deleted the openfaasltd/metrics-started-calls branch March 12, 2020 13:58
@alexellis
Copy link
Member Author

@StorkZ there are testing instructions and images for this and faasIdler shared in #contributors on Slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants