Function invocation incorrect for multiple namespaces #1413

Closed
3 of 5 tasks
karuppiah7890 opened this issue Dec 21, 2019 · 4 comments · Fixed by #1590

Comments


karuppiah7890 commented Dec 21, 2019

My actions before raising this issue

Expected Behaviour

I have wordcount function in openfaas-fn and in another-ns namespaces

I invoke the wordcount function multiple times in both namespaces from the web UI portal. The web UI shows the correct invocation count for each namespace, and the CLI shows the same correct counts as the web UI portal. When the CLI invokes the function using $ faas invoke wordcount or $ faas invoke wordcount -n <namespace>, the count is shown correctly in both the CLI and the web UI portal.

Current Behaviour

I invoked the wordcount function multiple times in both namespaces from the web UI portal, but the invocation count in the web UI doesn't increase; it shows the same count in both namespaces. The CLI shows the same count as the web UI portal after invoking from the web UI portal. But when the CLI invokes the function using $ faas invoke wordcount, the count increases in both the CLI and the web UI portal.

Steps to Reproduce (for bugs)

  1. Install OpenFaaS in k8s with k3sup. This installs with openfaas-fn as the default namespace for functions. Log in to the gateway in the CLI.

  2. deploy functions in openfaas-fn using this

$ faas deploy -f https://raw.githubusercontent.com/openfaas/faas/master/stack.yml
  3. add another namespace in k8s
$ kubectl create ns another-ns
$ kubectl annotate namespace/another-ns openfaas="1"
$ # check namespaces list
$ faas namespaces
  4. deploy wordcount function in the new namespace using the below yaml file
# stack.yml
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080  # can be a remote server

functions:
  wordcount:
    lang: dockerfile
    image: functions/alpine:latest
    fprocess: "wc"
    skip_build: true
    namespace: another-ns
$ faas deploy -f stack.yml
  5. Go to the web UI portal, invoke the function in openfaas-fn namespace and in another-ns namespace

You will notice that the count doesn't increase in the UI, and it doesn't increase in the CLI too, when doing $ faas list or $ faas list -n openfaas-fn or $ faas list -n another-ns

  6. Invoke the function using the CLI
$ faas invoke wordcount
...
$ faas list
...

Also check the web UI portal. The count now shows up and increases, but it is only the number of CLI invocations.

  7. Invoke the function using the CLI but with namespace flag. You will notice count doesn't increase and is wrong
$ faas invoke wordcount -n openfaas-fn
...
$ faas invoke wordcount -n another-ns
...
$ faas list
$ faas list -n openfaas-fn
$ faas list -n another-ns
...

Context

I was just trying out openfaas. Noticed that my invocation count shows up wrong

Your Environment

  • FaaS-CLI version ( Full output from: faas-cli version ):
CLI:
 commit:  73004c23e5a4d3fdb7352f953247473477477a64
 version: 0.11.3

Gateway
 uri:     http://127.0.0.1:8080
 version: 0.18.7
 sha:     59b7839236098820e73ed25301258b722c3d33e4
 commit:  Change how and when we fetch and parse namespace info


Provider
 name:          faas-netes
 orchestration: kubernetes
 version:       0.9.15
 sha:           41c33f9f7c29e8276bd01387f78d6f0cff847890
  • Docker version docker version (e.g. Docker 17.0.05 ):
# minikube vm docker
Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea838
 Built:             Wed Nov 13 07:22:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.12
  Git commit:       633a0ea838
  Built:            Wed Nov 13 07:28:45 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          commit: d736ef14f0288d6993a1845745d6756cfc9ddd5a
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Kubernetes

  • Operating System and version (e.g. Linux, Windows, MacOS):
    MacOS Catalina. v 10.15.2

  • Code example or link to GitHub repo or gist to reproduce problem:
    Provided it all above 😄

  • Other diagnostic information / logs from troubleshooting guide

Initially I assumed that the invocation count comes from Prometheus, and that later turned out to be true. So I looked at the invocation count metrics in Grafana and Prometheus. This is what Prometheus shows when I try what I have described above:

[Screenshot: prometheus metrics]

You can see below how the CLI shows the number as 6 for wordcount function, in both namespaces. My default is openfaas-fn when nothing is provided

[Screenshot: Screen Shot 2019-12-21 at 9 28 46 PM]

In prometheus, the metric which has the value 6 is this:

gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}	6

If you notice the function_name label value, it's wordcount. This is the count for the number of invocations from the CLI with $ faas invoke wordcount

And there are two other metrics with different values, which relate to wordcount function

gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount.another-ns",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}	11
gateway_function_invocation_total{app="gateway",code="200",function_name="wordcount.openfaas-fn",instance="172.17.0.17:8082",job="kubernetes-pods",kubernetes_namespace="openfaas",kubernetes_pod_name="gateway-6c94b87f84-xhqzb",pod_template_hash="6c94b87f84"}	22

Notice the function_name label values: wordcount.another-ns and wordcount.openfaas-fn. These are the counts of invocations that happened when I invoked from the web UI.

But the web UI and CLI don't show those values; they show 6

[Screenshot: Screen Shot 2019-12-21 at 9 34 11 PM]

[Screenshot: Screen Shot 2019-12-21 at 9 34 00 PM]

After reading some of the code for how the metrics are produced, and connecting the dots between inputs and outputs (with some assumptions and intuition), this is what I can say:

The key difference is how the request goes to the gateway. When I do a CLI invocation

$ faas invoke wordcount

gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:08:10 Forwarded [POST] to /function/wordcount - [200] - 0.019437 seconds

and for following invocations

$ faas invoke wordcount -n openfaas-fn

or invoking in web UI portal in openfaas-fn namespace

the gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:07:04 Forwarded [POST] to /function/wordcount.openfaas-fn - [200] - 0.033784 seconds

for another-ns

$ faas invoke wordcount -n another-ns

or invoking in web UI portal in another-ns namespace

the gateway log is like

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:10:22 Forwarded [POST] to /function/wordcount.another-ns - [200] - 0.017615 seconds

The invocation count is shown using the response from the gateway for list functions API. Checking the code, gateway uses the following code to find the invocation count using the prometheus metrics data

faasHandlers.ListFunctions = metrics.AddMetricsHandler(faasHandlers.ListFunctions, prometheusQuery)

expr := url.QueryEscape(`sum(gateway_function_invocation_total{function_name=~".*", code=~".*"}) by (function_name, code)`)
// expr := "sum(gateway_function_invocation_total%7Bfunction_name%3D~%22.*%22%2C+code%3D~%22.*%22%7D)+by+(function_name%2C+code)"
results, fetchErr := prometheusQuery.Fetch(expr)

mixIn(&functions, results)

if v.Metric.FunctionName == function.Name {

You can see how the metric label's function name and the function name from the provider (? not sure about the term 😅) are matched, without considering namespace. So, seeing the above prometheus metrics data, naturally the value 6 will come, no matter what namespace the user is looking at in CLI or web ui portal.

So, that's one issue, reading of the invocations count data. I think to fix it - just adding the namespace along with name like <name>.<namespace> should work. And a test for it too!
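A minimal sketch of that read-side fix, using the counts from the Prometheus output above. The `FunctionStatus` and `Sample` types and the `mixInByNamespace` function are simplified, hypothetical stand-ins and not the gateway's actual types or its `mixIn` implementation.

```go
package main

import "fmt"

// FunctionStatus is a simplified, hypothetical stand-in for the gateway's type.
type FunctionStatus struct {
	Name            string
	Namespace       string
	InvocationCount float64
}

// Sample is a hypothetical flattened Prometheus result:
// one function_name label value and its summed count.
type Sample struct {
	FunctionName string
	Value        float64
}

// mixInByNamespace adds each sample to the function whose
// "<name>.<namespace>" equals the sample's function_name label.
func mixInByNamespace(functions []FunctionStatus, samples []Sample) {
	for i := range functions {
		qualified := functions[i].Name + "." + functions[i].Namespace
		for _, s := range samples {
			if s.FunctionName == qualified {
				functions[i].InvocationCount += s.Value
			}
		}
	}
}

func main() {
	functions := []FunctionStatus{
		{Name: "wordcount", Namespace: "openfaas-fn"},
		{Name: "wordcount", Namespace: "another-ns"},
	}
	// Label values and counts taken from the metrics shown in this issue.
	samples := []Sample{
		{FunctionName: "wordcount", Value: 6},
		{FunctionName: "wordcount.another-ns", Value: 11},
		{FunctionName: "wordcount.openfaas-fn", Value: 22},
	}
	mixInByNamespace(functions, samples)
	for _, f := range functions {
		fmt.Printf("%s.%s: %v\n", f.Name, f.Namespace, f.InvocationCount)
	}
}
```

With this matching, the bare `wordcount` sample (value 6) is attributed to neither namespace, which is why the write side also needs attention.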

Next issue is, how did the wrong data even get into prometheus in the first place? There are three sets of invocation counts, but only two namespaces. The metric with label function_name=wordcount is not a correct one; there should always be a namespace to be specific about which namespace the count refers to, even if there's just one. I guess I'm right about this: since every function must be in a namespace and multiple namespaces are clearly supported, the namespace must be part of the label value. Do correct me if I'm missing something 😅

And checking the code, the gateway is what exposes the metrics at port 8082. And prometheus scrapes these metrics.

Looking at the metrics data, the metrics seem to be right for web UI portal invocations and for CLI invocations with namespace flag, except for the one with function_name=wordcount label, which got created from CLI invocations without namespace flag. This is how the gateway logs look for such a case

gateway-6c94b87f84-xhqzb gateway 2019/12/21 16:08:10 Forwarded [POST] to /function/wordcount - [200] - 0.019437 seconds

Now why does this log matter? The url path in this matters, which is /function/wordcount

My guess based on the code - when requests are made, they go through these parts of the code

faas/gateway/server.go

Lines 204 to 206 in 03dc882

r.HandleFunc("/function/{name:["+NameExpression+"]+}", functionProxy)
r.HandleFunc("/function/{name:["+NameExpression+"]+}/", functionProxy)
r.HandleFunc("/function/{name:["+NameExpression+"]+}/{params:.*}", functionProxy)

faasHandlers.Proxy = handlers.MakeForwardingProxyHandler(reverseProxy, functionNotifiers, functionURLResolver, functionURLTransformer, nil)

functionNotifiers is here and has prometheus in it

functionNotifiers := []handlers.HTTPNotifier{loggingNotifier, prometheusNotifier}

And the notify call is made here

notifier.Notify(r.Method, requestURL, originalURL, statusCode, seconds)

And for prometheus, the implementation is here

func (p PrometheusFunctionNotifier) Notify(method string, URL string, originalURL string, statusCode int, duration time.Duration) {

and this is where the service name is obtained

serviceName := getServiceName(originalURL)

serviceName = matches[nameIndex]

And here is where the metric is created for prometheus to scrape

p.Metrics.GatewayFunctionInvocation.
With(prometheus.Labels{"function_name": serviceName, "code": code}).
Inc()

This is all good, but I think the service name will be wordcount when url is /function/wordcount, but it will be wordcount.openfaas-fn when url is /function/wordcount.openfaas-fn, and so metric also will be wrong.
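To illustrate the point, here is a small sketch of extracting the service name from the request path. The `serviceNameFromPath` function and its regular expression are assumptions made for this example; they approximate what `getServiceName` appears to do but are not copied from the gateway.

```go
package main

import (
	"fmt"
	"regexp"
)

// functionPath is a hypothetical pattern: it captures the path segment
// that follows "/function/", stopping at the next "/" or "?".
var functionPath = regexp.MustCompile(`^/function/([^/?]+)`)

// serviceNameFromPath returns the captured segment, or "" when the
// path is not a function invocation path.
func serviceNameFromPath(path string) string {
	matches := functionPath.FindStringSubmatch(path)
	if len(matches) != 2 {
		return ""
	}
	return matches[1]
}

func main() {
	for _, p := range []string{
		"/function/wordcount",
		"/function/wordcount.openfaas-fn",
		"/function/wordcount.another-ns",
	} {
		fmt.Printf("%s -> %q\n", p, serviceNameFromPath(p))
	}
}
```

Whatever follows `/function/` becomes the label value unchanged, so the same function ends up under two different `function_name` labels depending on how the caller wrote the URL.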

The following is speculation - Have to check CLI code for this.

To fix this, I think the CLI has to make calls with the namespace in the URL path, like /function/wordcount.openfaas-fn if openfaas-fn is the default namespace. I think this is not happening today. It still works because, behind the scenes, a request to /function/wordcount is somehow resolved to the function in the default namespace. To make the invocation count work, though, we might have to pull in the namespaces, use the first one as the default according to this idea, and then use that for the request
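A rough sketch of that idea on the client side. `invokePath` is a hypothetical helper written for this comment, not the faas-cli implementation, and it assumes the caller already knows the default namespace.

```go
package main

import "fmt"

// invokePath is a hypothetical helper: it always builds a
// namespace-qualified invocation path, falling back to the
// default namespace when none was requested.
func invokePath(name, namespace, defaultNamespace string) string {
	if namespace == "" {
		namespace = defaultNamespace
	}
	return "/function/" + name + "." + namespace
}

func main() {
	// Without -n: the default namespace is filled in.
	fmt.Println(invokePath("wordcount", "", "openfaas-fn"))
	// With -n another-ns: the requested namespace is used.
	fmt.Println(invokePath("wordcount", "another-ns", "openfaas-fn"))
}
```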

I'll check CLI code next to understand better 😄 and also check web UI portal code, and then post more here about my findings

@alexellis
Member

@LucasRoesler @Waterdrips PTAL

@alexellis
Member

/set title: Function invocation incorrect for multiple namespaces

@derek derek bot changed the title from "Function invocation count not correct" to "Function invocation incorrect for multiple namespaces" on Dec 22, 2019
@LucasRoesler
Member

Hi @karuppiah7890, I also noticed this the other day and was thinking about the correct fix. I think the most direct way to fix this can be found in the scaling handler

functionName, namespace := getNamespace(defaultNamespace, getServiceName(r.URL.String()))

We should pass the default namespace to the PrometheusNotifier struct

faas/gateway/server.go

Lines 75 to 77 in 238ce1b

prometheusNotifier := handlers.PrometheusFunctionNotifier{
Metrics: &metricsOptions,
}
so that it knows what the default namespace is. Similar to how we do it here
FunctionNamespace: config.Namespace,

And then using the same methods as in the scaling handler, we can then set the metric correctly.
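As a sketch of what that normalization could look like: `splitFunctionName` and `metricLabel` below are hypothetical names for this example and are written from scratch, so they should be read as an approximation of the scaling handler's approach rather than its code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFunctionName splits "<name>.<namespace>" on the last dot.
// When there is no dot, the default namespace is used.
func splitFunctionName(fullName, defaultNamespace string) (name, namespace string) {
	if i := strings.LastIndex(fullName, "."); i > -1 {
		return fullName[:i], fullName[i+1:]
	}
	return fullName, defaultNamespace
}

// metricLabel returns the function_name label value that the notifier
// would record: always "<name>.<namespace>".
func metricLabel(serviceName, defaultNamespace string) string {
	name, namespace := splitFunctionName(serviceName, defaultNamespace)
	return name + "." + namespace
}

func main() {
	fmt.Println(metricLabel("wordcount", "openfaas-fn"))
	fmt.Println(metricLabel("wordcount.openfaas-fn", "openfaas-fn"))
	fmt.Println(metricLabel("wordcount.another-ns", "openfaas-fn"))
}
```

With this in place, `/function/wordcount` and `/function/wordcount.openfaas-fn` would both increment the same series.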

Additionally, the AddMetricsHandler will need to be updated, once we are consistently setting the namespace. We either need to add the namespace to prometheus query

expr := url.QueryEscape(`sum(gateway_function_invocation_total{function_name=~".*", code=~".*"}) by (function_name, code)`)
// expr := "sum(gateway_function_invocation_total%7Bfunction_name%3D~%22.*%22%2C+code%3D~%22.*%22%7D)+by+(function_name%2C+code)"
results, fetchErr := prometheusQuery.Fetch(expr)
or we need to pass the namespace to the mixIn function
func mixIn(functions *[]types.FunctionStatus, metrics *VectorQueryResponse) {
so it can filter the metrics for the requested namespace.

The namespace would need to be selected from the GET parameters and looks like this https://github.com/openfaas/faas-netes/blob/7a645a75749a130da130fc8fc77884e712fbac5e/handlers/reader.go#L25-L31 This of course requires constructing the handler so that it knows the default namespace, passing it here

func AddMetricsHandler(handler http.HandlerFunc, prometheusQuery PrometheusQueryFetcher) http.HandlerFunc {

The two changes should fix the issue of saving and selecting the correct metric per namespace, including the default namespace. I think this is the shortest and easiest way to fix the behavior. Long term, we might also want to consider adding the namespace to the prometheus labels.

In summary:

  1. pass the default namespace to PrometheusNotifier
  2. pass the default namespace to AddMetricsHandler
  3. During notify, ensure that the function name is formatted as <function name>.<namespace>, inserting the default namespace as needed
  4. During the metrics mixin, make sure that we filter the metrics correctly by reading the requested namespace from the GET parameters and then filtering the query/response from prometheus

@alexellis
Member

alexellis commented Jul 21, 2020

A fix is being worked on by @viveksyngh in #1488 - I'm not sure if it follows the above 1:1, but we are reviewing it.

Waterdrips added a commit to Waterdrips/faas that referenced this issue Nov 11, 2020
Fixes openfaas#1413
Fixes openfaas/faas-netes#707

This adds both namespace and non-namespace scoped counts of invocation
to metric aggregation and fixes the Gateway UI not updating the
invocation count automatically without a page reload.

Signed-off-by: Alistair Hey <[email protected]>
Waterdrips added a commit to Waterdrips/faas that referenced this issue Nov 11, 2020
Fixes openfaas#1413
Fixes openfaas/faas-netes#707

This fixes the Gateway UI not updating the
invocation count automatically without a page reload.

Tested by deploying on a local cluster and making sure invocations go up
with and without namespace suffix

Signed-off-by: Alistair Hey <[email protected]>
alexellis pushed a commit that referenced this issue Nov 17, 2020
Fixes #1413
Fixes openfaas/faas-netes#707

This fixes the Gateway UI not updating the
invocation count automatically without a page reload.

Tested by deploying on a local cluster and making sure invocations go up
with and without namespace suffix

Signed-off-by: Alistair Hey <[email protected]>