feat: Support split align and caching for instant metric query results #11814

kavirajk · 2024-01-29T08:34:31Z

What this PR does / why we need it:
As a follow-up to metadata results caching, this PR adds results caching for instant metric queries. It also adds an option to align the split subqueries so that their cached results are more likely to be reused.

Config changes:

cache_instant_metric_results - Boolean; enables or disables the cache (default: disabled)
instant_metric_results_cache - CacheConfig for tuning the cache (usually not needed; it has sane defaults)
instant_metric_query_split_align - Boolean; enables or disables split alignment (default: disabled)

How it works (without split align)

Consider the following query:
Query: sum(rate({foo="bar"}[3h])) @ 12:34:00
SplitInterval: 1h

So we need results from 09:34:00 to 12:34:00. (3h total)

Currently, after the range mapper, it splits into:

sum(rate({foo="bar"}[1h])) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 1h)) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 2h)) @ 12:34:00

Even if we remove the offset, it turns into:

sum(rate({foo="bar"}[1h])) @ 12:34:00
sum(rate({foo="bar"}[1h])) @ 11:34:00
sum(rate({foo="bar"}[1h])) @ 10:34:00

The problem is that the eval times are not aligned to the split interval, so these subqueries are very unlikely to be reused.
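
The unaligned split described above can be sketched as follows. This is a minimal illustration in Python, not Loki's Go range mapper; the function name `split_unaligned`, the `(range, eval time)` tuple representation, and the calendar date are assumptions made for the example.

```python
from datetime import datetime, timedelta

def split_unaligned(eval_time, query_range, split_interval):
    """Split a range ending at eval_time into split_interval-sized pieces.

    Returns (range, eval_time) pairs with the offsets already removed.
    Hypothetical helper for illustration only.
    """
    subqueries = []
    offset = timedelta(0)
    while offset < query_range:
        piece = min(split_interval, query_range - offset)
        # Dropping "offset X" is equivalent to evaluating X earlier.
        subqueries.append((piece, eval_time - offset))
        offset += piece
    return subqueries

# The date is arbitrary; only the time of day matters here.
parts = split_unaligned(
    datetime(2024, 1, 29, 12, 34, 0), timedelta(hours=3), timedelta(hours=1)
)
for rng, at in parts:
    print(f"[{rng}] @ {at:%H:%M:%S}")
```

Every eval time inherits the `:34:00` of the original request, so a request made a minute later would produce three entirely different subqueries.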

How it works (with split align)

Now consider the exact same query:
Query: sum(rate({foo="bar"}[3h])) @ 12:34:00
SplitInterval: 1h

After the range mapper, it splits into:

sum(rate({foo="bar"}[34m])) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 34m)) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 1h 34m)) @ 12:34:00
sum(rate({foo="bar"}[26m] offset 2h 34m)) @ 12:34:00

And after removing the offset, it turns into (with eval times properly aligned):

sum(rate({foo="bar"}[34m])) @ 12:34:00
sum(rate({foo="bar"}[1h])) @ 12:00:00
sum(rate({foo="bar"}[1h])) @ 11:00:00
sum(rate({foo="bar"}[26m])) @ 10:00:00

Now subqueries (2) and (3) are properly aligned and highly likely to be reused.
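
The aligned split can be sketched in the same way. Again, this is a minimal Python illustration under stated assumptions rather than the code in this PR: `split_aligned` and `EPOCH` are hypothetical names, and alignment is taken to mean "eval time is a multiple of the split interval since the Unix epoch".

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def split_aligned(eval_time, query_range, split_interval):
    """Split a range ending at eval_time so that inner pieces end on
    multiples of split_interval. Returns (range, eval_time) pairs with
    offsets already removed. Hypothetical helper for illustration only.
    """
    start = eval_time - query_range
    subqueries = []
    end = eval_time
    while end > start:
        # Largest aligned boundary strictly before `end`.
        boundary = end - ((end - EPOCH) % split_interval)
        if boundary == end:
            boundary = end - split_interval
        piece_start = max(boundary, start)
        subqueries.append((end - piece_start, end))
        end = piece_start
    return subqueries

hour = timedelta(hours=1)
first = split_aligned(datetime(2024, 1, 29, 12, 34), 3 * hour, hour)
second = split_aligned(datetime(2024, 1, 29, 12, 50), 3 * hour, hour)

for rng, at in first:
    print(f"[{rng}] @ {at:%H:%M:%S}")

# Subqueries that are identical across both requests could share a cache entry.
print(sorted(set(first) & set(second)))
```

Two requests made 16 minutes apart share the two full `[1h]` subqueries evaluated at 12:00:00 and 11:00:00; only the partial pieces at either edge differ.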

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
There will be some follow-up PRs (I tried doing everything in a single PR, but it turned out to be a big change that was really hard to review) to:

Refactor configs to unify and simplify all the results cache configs
Refactor results cache metrics, to avoid lots of duplicates
Simplify some protobuf definitions (particularly stats.proto)

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
CHANGELOG.md updated
- If the change is worth mentioning in the release notes, add add-to-release-notes label
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

dannykopping

Love the detailed description and lots of tests!
Added a few comments, mostly nits. Good job Kavi!

ashwanthgoli · 2024-02-16T14:57:31Z

I need to do another pass to review the tests, rest lgtm. Nice one @kavirajk ❤️

dannykopping

LGTM!

kavirajk · 2024-02-20T07:34:33Z

Caching and split align in action

Run an instant query over a 3h range (first time)

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))'
2024/02/20 08:18:37 http://localhost:3100/loki/api/v1/query?direction=BACKWARD&limit=30&query=sum%28rate%28%7Bjob%3D%22varlogs%22%7D%5B3h%5D%29%29&time=1708413517893995000
[
  {
    "metric": {},
    "value": [
      1708413517.893,
      "0.21814814814814815"
    ]
  }
]

How the query is split, with cache requests and cache hits (these log lines are from metrics.go and engine.go on the query-frontend and queriers)

Query-frontend (showing that 4 requests were made to the cache, and none of them hit)

latency=fast query="sum(rate({job=\"varlogs\"}[3h]))" query_hash=1737987035 cache_result_req=4 cache_result_hit=0

How the above query got split and run

msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[1h]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[1h]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[41m22s106ms]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[18m37s893ms]))"

Run the same instant query over a 3h range (second time)

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))'
2024/02/20 08:19:11 http://localhost:3100/loki/api/v1/query?direction=BACKWARD&limit=30&query=sum%28rate%28%7Bjob%3D%22varlogs%22%7D%5B3h%5D%29%29&time=1708413551869792000
[
  {
    "metric": {},
    "value": [
      1708413551.869,
      "0.02935185185185185"
    ]
  }
]

How the query is split, with cache requests and cache hits (these log lines are from metrics.go and engine.go on the query-frontend and queriers)

Query-frontend (showing that 4 requests were made to the cache, and 2 of them hit)

latency=fast query="sum(rate({job=\"varlogs\"}[3h]))" query_hash=1737987035 cache_result_req=4 cache_result_hit=2

How the above query got split and run (only the two queries that missed the cache were run this time)

msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[19m11s869ms]))" 
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[40m48s130ms]))"
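
The partial ranges in both runs can be checked with a little arithmetic on the `time=` nanosecond timestamps from the request URLs above. A minimal sketch in Python, assuming a 1h split interval (which the log lines are consistent with); `fmt` and `edge_ranges` are hypothetical helpers, and durations are truncated to milliseconds as in the logs.

```python
HOUR_NS = 3600 * 10**9

def fmt(ns):
    """Format nanoseconds as e.g. 18m37s893ms, truncated to milliseconds."""
    ms = ns // 10**6
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{minutes}m{seconds}s{millis}ms"

def edge_ranges(eval_ns):
    """Return the two partial ranges of a 3h query split on hour boundaries."""
    head = eval_ns % HOUR_NS   # last hour boundary up to the eval time
    tail = HOUR_NS - head      # what is left of the 3h window before the full hours
    return fmt(head), fmt(tail)

print(edge_ranges(1708413517893995000))  # first run
print(edge_ranges(1708413551869792000))  # second run
```

The first call yields `18m37s893ms` and `41m22s106ms`, and the second yields `19m11s869ms` and `40m48s130ms`, matching the executed subqueries. The two full `[1h]` subqueries between these edges end on the same hour boundaries in both runs, which is consistent with `cache_result_hit=2` on the second run.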

kavirajk added 4 commits January 3, 2024 14:30

feat(caching): Support caching on instant metric queries results

5ba4fa4

Signed-off-by: Kaviraj <[email protected]>

Merge branch 'main' into kavirajk/cache-instant-queries2

4d64df4

Signed-off-by: Kaviraj <[email protected]>

integrate the basic middleware

5ea7700

Signed-off-by: Kaviraj <[email protected]>

fixing overrides

c39b68d

Signed-off-by: Kaviraj <[email protected]>

pull-request-size bot added the size/L label Jan 29, 2024

kavirajk added 3 commits January 30, 2024 08:25

idk

e4fbe8f

Signed-off-by: Kaviraj <[email protected]>

Tweak sub queries without offset before caching

27dcfe6

Signed-off-by: Kaviraj <[email protected]>

test to assert offset removal

57d77a9

Signed-off-by: Kaviraj <[email protected]>

kavirajk force-pushed the kavirajk/cache-instant-queries2 branch from bb83a3a to 57d77a9 Compare January 30, 2024 10:55

kavirajk added 4 commits January 31, 2024 20:10

Fix timestamp adjustments

08c5b4b

Signed-off-by: Kaviraj <[email protected]>

missed error handling

bd558a1

Signed-off-by: Kaviraj <[email protected]>

fix failing TestMetricsTripperware_SplitShardStats test

566bc4f

Signed-off-by: Kaviraj <[email protected]>

tweak downstreamer test

e2e91b0

Signed-off-by: Kaviraj <[email protected]>

pull-request-size bot added size/XL and removed size/L labels Feb 2, 2024

kavirajk added 5 commits February 7, 2024 09:03

Fix split_by_range test cases for sub queries

d8ff56f

Signed-off-by: Kaviraj <[email protected]>

fix Downstream with offset removed test case

84bb4d4

Signed-off-by: Kaviraj <[email protected]>

update stats

634d7f8

Signed-off-by: Kaviraj <[email protected]>

support split and align of instant subquery for cache reuse

d538f67

Fix test cases that failed with this changes

Signed-off-by: Kaviraj <[email protected]>

Fix some bugs on split align and add more tests

62cd346

Signed-off-by: Kaviraj <[email protected]>

pull-request-size bot added size/XXL and removed size/XL labels Feb 13, 2024

kavirajk changed the title ~~feat(caching): Support caching for instant metric query results~~ feat(caching): Support split align and caching for instant metric query results Feb 13, 2024

kavirajk added 2 commits February 13, 2024 09:33

Merge branch 'main' into kavirajk/cache-instant-queries2

ae2e565

fix some build failures from merge with main

597d40f

Signed-off-by: Kaviraj <[email protected]>

kavirajk marked this pull request as ready for review February 13, 2024 08:48

kavirajk requested a review from a team as a code owner February 13, 2024 08:48

kavirajk changed the title ~~feat(caching): Support split align and caching for instant metric query results~~ feat: Support split align and caching for instant metric query results Feb 13, 2024

ashwanthgoli reviewed Feb 13, 2024

pkg/querier/queryrange/downstreamer.go (outdated, resolved)

pkg/logql/rangemapper.go (outdated, resolved)

PR remarks

fce06dc

1. Update both start and end when removing offset
2. Unify subqueries generation in splitalign method

Signed-off-by: Kaviraj <[email protected]>

kavirajk added 3 commits February 16, 2024 11:07

Merge branch 'main' into kavirajk/cache-instant-queries2

a6fe289

Signed-off-by: Kaviraj <[email protected]>

fix additional arguments to results cache related to extent

998051a

Signed-off-by: Kaviraj <[email protected]>

make doc

4db5398

Signed-off-by: Kaviraj <[email protected]>

github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 16, 2024

make format

c307592

Signed-off-by: Kaviraj <[email protected]>

kavirajk requested a review from ashwanthgoli February 16, 2024 10:42

ashwanthgoli reviewed Feb 16, 2024

kavirajk added 3 commits February 16, 2024 14:33

PR remarks

292f13a

Signed-off-by: Kaviraj <[email protected]>

add changelog entry

5cb63d4

Signed-off-by: Kaviraj <[email protected]>

remove unused ingester query options

e373341

Signed-off-by: Kaviraj <[email protected]>

dannykopping reviewed Feb 16, 2024

ashwanthgoli reviewed Feb 16, 2024

pkg/logql/rangemapper.go (outdated, resolved)

pkg/logql/rangemapper.go (resolved)

kavirajk added 2 commits February 19, 2024 09:09

PR remarks

655844f

Signed-off-by: Kaviraj <[email protected]>

PR remarks and TODO to handle edge case

ef9afeb

Signed-off-by: Kaviraj <[email protected]>

kavirajk requested review from dannykopping and ashwanthgoli February 19, 2024 08:25

dannykopping approved these changes Feb 19, 2024

kavirajk added 2 commits February 19, 2024 10:33

PR remarks

a5ad611

Signed-off-by: Kaviraj <[email protected]>

Merge branch 'main' into kavirajk/cache-instant-queries2

bbe5605

kavirajk added 3 commits February 20, 2024 08:38

Add cache hit log lines for instant metric query

38e71d6

Signed-off-by: Kaviraj <[email protected]>

Merge branch 'main' into kavirajk/cache-instant-queries2

35f2c53

fix breaking test cases.

3ff5150

Use right split duration (new InstantSplitDuration) for instant queries

Signed-off-by: Kaviraj <[email protected]>

kavirajk merged commit fac5997 into main Feb 20, 2024
9 checks passed

kavirajk deleted the kavirajk/cache-instant-queries2 branch February 20, 2024 10:09

kavirajk mentioned this pull request Mar 22, 2024

fix: (Bug) correct resultType when storing instant query results in cache #12312

Merged

8 tasks

loki-gh-app bot mentioned this pull request Mar 27, 2024

chore(add-major-release-workflow): release 3.0.0-rc.1 #12380

Closed

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024

feat: Support split align and caching for instant metric query results (grafana#11814)

b2e4905

Signed-off-by: Kaviraj <[email protected]>
