Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core]Support async lookup #4341

Closed
wants to merge 2 commits into from

Conversation

neuyilan
Copy link
Member

@neuyilan neuyilan commented Oct 16, 2024

Purpose

Currently, lookup join does not really support asynchronous, and the purpose of this PR is to support asynchronous lookup join.

Tests

IT in LookupJoinITCase.java

API and Format

no

Documentation

no doc

@neuyilan neuyilan changed the title support async lookup [Core]Support async lookup Oct 16, 2024
@neuyilan neuyilan marked this pull request as draft October 16, 2024 12:48
@neuyilan neuyilan marked this pull request as ready for review October 18, 2024 08:18
@neuyilan
Copy link
Member Author

@JingsongLi Could you please riview this PR? Thanks

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @neuyilan , thanks for your contribution!

But I think there's a very big risk here, can we change it a little bit one by one to make everything in there thread-safe?

ReentrantLock lock = locks[Math.abs(file.hashCode()) % lookupAsyncThreadNumber];
try {
lock.lock();
LookupFile lookupFile = lookupFileCache.getIfPresent(file.fileName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just create a lock in LookupFile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just create a lock in LookupFile?

I think is ok, create one lock in LookupFile, the same DataFileMeta file use the same lock to ensure thread safe, is that what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@neuyilan
Copy link
Member Author

But I think there's a very big risk here, can we change it a little bit one by one to make everything in there thread-safe?
Hi, @JingsongLi What do you mean change it a little bit one by one?

In the current design, FileStoreLookupFunction is non thread safe. This class is the entry point for lookup, so it is necessary to ensure the thread safety of FileStoreLookupFunction.
The current design mainly focuses on the following points:

  1. Ensure thread safety when initializing FileStoreLookupFunction;
  2. The implementation of two lookup caches, namely in LookupLevels and RocksDBState, ensures thread safety.
  3. Since both lookup serialization and deserialization use the same object, this PR also ensures thread safety when serializing and deserializing lookup keys and values.

@JingsongLi
Copy link
Contributor

But I think there's a very big risk here, can we change it a little bit one by one to make everything in there thread-safe?
Hi, @JingsongLi What do you mean change it a little bit one by one?

In the current design, FileStoreLookupFunction is non thread safe. This class is the entry point for lookup, so it is necessary to ensure the thread safety of FileStoreLookupFunction. The current design mainly focuses on the following points:

  1. Ensure thread safety when initializing FileStoreLookupFunction;
  2. The implementation of two lookup caches, namely in LookupLevels and RocksDBState, ensures thread safety.
  3. Since both lookup serialization and deserialization use the same object, this PR also ensures thread safety when serializing and deserializing lookup keys and values.

Thanks for your reply, can you split the PR to multiple PRs? Ensure that every change is thoroughly discussed, otherwise there are too many things to discuss in a large PR, making it difficult to review.

@neuyilan
Copy link
Member Author

But I think there's a very big risk here, can we change it a little bit one by one to make everything in there thread-safe?
Hi, @JingsongLi What do you mean change it a little bit one by one?

In the current design, FileStoreLookupFunction is non thread safe. This class is the entry point for lookup, so it is necessary to ensure the thread safety of FileStoreLookupFunction. The current design mainly focuses on the following points:

  1. Ensure thread safety when initializing FileStoreLookupFunction;
  2. The implementation of two lookup caches, namely in LookupLevels and RocksDBState, ensures thread safety.
  3. Since both lookup serialization and deserialization use the same object, this PR also ensures thread safety when serializing and deserializing lookup keys and values.

Thanks for your reply, can you split the PR to multiple PRs? Ensure that every change is thoroughly discussed, otherwise there are too many things to discuss in a large PR, making it difficult to review.

Got it, I will do this.

@neuyilan
Copy link
Member Author

neuyilan commented Nov 1, 2024

I will close this PR, and submit new PRs
step1 PR: #4423 support async lookup in hash cache store.

@neuyilan neuyilan closed this Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants