fix: (Bug) correct resultType when storing instant query results in cache #12312

Merged
kavirajk merged 4 commits into main from kavirajk/instant-query-cache-bug-fix on Mar 31, 2024

@kavirajk kavirajk commented Mar 22, 2024

What this PR does / why we need it:
tl;dr: Previously, results for instant metric queries were stored in the results cache with the wrong type, TypeMatrix, so some of the results were ignored during the final computation. This PR stores them with the correct type, TypeVector.

Before

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[1h]))' -q
[
  {
    "metric": {},
    "value": [
      1711117293.063,
      "0.5336111111111111"
    ]
  }
]
-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[1h]))' -q
[
  {
    "metric": {},
    "value": [
      1711117295.905,
      "0.19944444444444445"
    ]
  }
]-bash-5.2$

The first query misses the cache, and its result is stored in the cache after the downstream requests complete. The second query fetches from the cache. NOTE: the two results should be the same, but they differ (bug).

After

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[1h]))' -q
[
  {
    "metric": {},
    "value": [
      1711117478.337,
      "0.5011111111111111"
    ]
  }
]-bash-5.2$
-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[1h]))' -q
[
  {
    "metric": {},
    "value": [
      1711117484.699,
      "0.5011111111111111"
    ]
  }
]-bash-5.2$

NOTE: both the first query (without cache) and the second query (with cache) return the same result.

Detailed explanation

The bug is an assumption in the existing ResultsCache that it always works with range queries whenever it has to store a LokiPromResponse in the cache. In reality, the same response type is shared by instant and range queries, since support for instant queries was added recently.

However, I failed to update one case: whenever ResultsCache merges multiple cache entries (which we call Extents), it creates a new, bigger response (LokiPromResponse) with resultType set to matrix, assuming that only range queries use it.

So when the cache is hit and these values are returned to compute the merged response, the computation considers only samples with resultType vector (because it is an instant query). That is why, when the original query is split into multiple subqueries and some of those subqueries hit the cache, their values are ignored in the final computation, which produces wrong results.
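The mechanism described above can be illustrated with a minimal Go sketch. This is not the Loki code: the types `response` and `sample` and the functions `mergeExtents` and `sumInstant` are hypothetical, simplified stand-ins, and the final computation is reduced to a plain sum for illustration. It only shows how labelling a merged cache entry as matrix causes a vector-only final step to drop it.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real types (illustrative names only).
type resultType string

const (
	typeMatrix resultType = "matrix"
	typeVector resultType = "vector"
)

// sample is one value of an instant (vector) result.
type sample struct {
	value float64
}

// response loosely models a cached response: a result type plus its samples.
type response struct {
	resultType resultType
	samples    []sample
}

// mergeExtents combines several cached responses into one bigger response.
// The caller decides which result type the merged response is labelled with.
func mergeExtents(rt resultType, extents ...response) response {
	merged := response{resultType: rt}
	for _, e := range extents {
		merged.samples = append(merged.samples, e.samples...)
	}
	return merged
}

// sumInstant stands in for the final computation of an instant query.
// It only considers responses labelled as vector, so anything labelled
// matrix is silently skipped.
func sumInstant(responses ...response) float64 {
	var total float64
	for _, r := range responses {
		if r.resultType != typeVector {
			continue
		}
		for _, s := range r.samples {
			total += s.value
		}
	}
	return total
}

func main() {
	cachedA := response{typeVector, []sample{{1}}}
	cachedB := response{typeVector, []sample{{2}}}
	fresh := response{typeVector, []sample{{4}}} // subquery that missed the cache

	buggy := mergeExtents(typeMatrix, cachedA, cachedB) // type hard-coded as matrix
	fixed := mergeExtents(typeVector, cachedA, cachedB) // type matches the instant query

	fmt.Println(sumInstant(buggy, fresh)) // 4: the cached part is dropped
	fmt.Println(sumInstant(fixed, fresh)) // 7: the cached part is included
}
```

In this sketch the only difference between the two runs is the result type given to the merged entry, which mirrors why a cache hit could change the answer before the fix.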

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

…cache.

Previously, results for instant metric queries were stored in the results cache with the wrong type `TypeMatrix`, so
some of the results were ignored during the final computation. This PR stores them with the correct type `TypeVector`.

Signed-off-by: Kaviraj <[email protected]>
@kavirajk kavirajk requested a review from a team as a code owner March 22, 2024 09:22
@kavirajk kavirajk marked this pull request as draft March 22, 2024 09:22
@kavirajk kavirajk changed the title [WIP] bugfix: Fix correct resultType when storing instant query results in cache fix: correct resultType when storing instant query results in cache Mar 22, 2024
@kavirajk kavirajk changed the title fix: correct resultType when storing instant query results in cache fix: (Bug) correct resultType when storing instant query results in cache Mar 22, 2024
@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 26, 2024
@kavirajk kavirajk marked this pull request as ready for review March 26, 2024 10:00
@sandeepsukhani sandeepsukhani left a comment

LGTM

@kavirajk kavirajk merged commit 7480468 into main Mar 31, 2024
10 checks passed
@kavirajk kavirajk deleted the kavirajk/instant-query-cache-bug-fix branch March 31, 2024 16:00
trevorwhitney added a commit that referenced this pull request Apr 1, 2024
commit 018856c
Author: Callum Styan <[email protected]>
Date:   Mon Apr 1 06:40:16 2024 -0700

    fix: fix span logging based on changes to request types timestamps (#12393)

    Signed-off-by: Callum Styan <[email protected]>

commit 5190dda
Author: Shantanu Alshi <[email protected]>
Date:   Mon Apr 1 18:30:21 2024 +0530

    feat(detected_labels): Initial skeleton for the API (#12390)

    Co-authored-by: Cyril Tovena <[email protected]>

commit 0b7ff48
Author: Sandeep Sukhani <[email protected]>
Date:   Mon Apr 1 14:21:50 2024 +0530

    chore: delete request processing improvements (#12259)

commit a509871
Author: Ed Welch <[email protected]>
Date:   Sun Mar 31 22:14:21 2024 -0400

    chore: remove experimental flags for l2 cache and memcached "addresses" config (#12410)

commit 7480468
Author: Kaviraj Kanagaraj <[email protected]>
Date:   Sun Mar 31 18:00:53 2024 +0200

    fix: (Bug) correct resultType when storing instant query results in cache (#12312)

    Signed-off-by: Kaviraj <[email protected]>

commit 246623f
Author: Trevor Whitney <[email protected]>
Date:   Fri Mar 29 17:05:36 2024 -0600

    fix(detected_fields): fix issues with frontend integration (#12406)

    This PR fixes issues we found when integrating with the frontend:
    * the `/experimental` api made it difficult to interact with using the existing datasource, so move to `v1/detected_fields`
    * the config flag was considered cumbersome as the only potential negative impact of the endpoint is when it is used, and nothing is currently using it
    * the use of an enum in the protobuf produced unexpected results in the json, so type was converted to string
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024