Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(query): speed up fetching redis data from dictionaries via mget. #16766

Merged

Conversation

Dragonliu2018
Copy link
Contributor

@Dragonliu2018 Dragonliu2018 commented Nov 5, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Previous Impl: For the Column type, loop through the Column and get the values from Redis one by one via OpenDAL.

Current Impl: OpenDAL only supports fetching one value at a time, so we use redis client to fetch the data in batches (via mget).

We also try to add a cache for performance optimization. The LRU cache is used first, and the performance is shown in the table below. However, the performance is not stable. Performance is good if the required data is cached, but poor if it is not or if the cache is full. In addition, we need to handle stale data in the cache to avoid wasting memory. So the TTL cache is tried, but the performance is worse than the LRU cache. So we temporarily give up on using cache for optimization. Related code: #16882

strategy \ key count \ elapsed time(seconds) 1w 10w 100w 1kw
Previous Impl (one by one) 7.338 73.229 594.109 386.483
Current Impl (in batches) 0.051 0.145 0.66 5.194
in batches + LRU cache (data not cached) 0.074 0.197 1.049 6.841
in batches + LRU cache (data cached) 0.034 0.079 0.505 3.228

Reproduce:

# Create Redis DICTIONARY
CREATE OR REPLACE DICTIONARY d1 ( key VARCHAR NULL, value VARCHAR NOT NULL) PRIMARY KEY key
SOURCE(redis(host='100.73.238.81' port='6379'));

# Create test table and insert 1kw keys
CREATE OR REPLACE TABLE red_test_10000000 AS
SELECT
    number % 100000000 AS id,
    concat('key', id::string) as key
FROM numbers(10000000);

# test SQL
select key, dict_get(d1, 'value', key) from red_test_10000000;

part of #16550

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Nov 5, 2024
@Dragonliu2018 Dragonliu2018 force-pushed the dragonliu/dictionary_redis_opt branch 4 times, most recently from c0b6010 to c0d02dd Compare November 11, 2024 15:48
@Dragonliu2018 Dragonliu2018 force-pushed the dragonliu/dictionary_redis_opt branch 6 times, most recently from 8e46370 to a95bc22 Compare November 20, 2024 03:37
@Dragonliu2018 Dragonliu2018 changed the title feat(query): Support access Redis data from dictionaries via the dict_get_batch function. feat(query): speed up fetching redis data from dictionaries via mget. Nov 20, 2024
@Dragonliu2018 Dragonliu2018 marked this pull request as ready for review November 20, 2024 08:02
@Xuanwo
Copy link
Member

Xuanwo commented Nov 20, 2024

This PR reminds me the possibility of adding mget support in opendal.

@b41sh b41sh self-requested a review November 20, 2024 10:18
@b41sh
Copy link
Member

b41sh commented Nov 21, 2024

This PR reminds me the possibility of adding mget support in opendal.

@Xuanwo Can OpenDAL consider adding mget? With OpenDAL, we can simplify the code.

@Xuanwo
Copy link
Member

Xuanwo commented Nov 21, 2024

@Xuanwo Can OpenDAL consider adding mget? With OpenDAL, we can simplify the code.

Not in the near future. We can revisit this part later ❤️

@b41sh b41sh added this pull request to the merge queue Nov 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 21, 2024
@sundy-li sundy-li added this pull request to the merge queue Nov 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 22, 2024
@b41sh b41sh added this pull request to the merge queue Nov 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 22, 2024
@sundy-li sundy-li added this pull request to the merge queue Nov 22, 2024
Merged via the queue into databendlabs:main with commit 7d09e1e Nov 22, 2024
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants