Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use as of system time when listing tables to improve performance #565

Merged

Conversation

stevie9868
Copy link
Contributor

Context:

In the context of Polaris, some database may have large number of tables, and when listing them, crdb will hold the lock and prevents other db write operations to be performed, leading to high latency.

Blocker
Initially, I thought we can simply include the AS OF SYSTEM in @query for JPA, which turns out to be not the case.
See this commit for reference. Basically it will throw Caused by: org.postgresql.util.PSQLException: ERROR: AS OF SYSTEM TIME: only constant expressions or follower_read_timestamp are allowed, and looks like jpa treat the passed in param as dynamic expressions.

Alternative
Use entityManager to execute the native query instead.

What are some major changes

  1. Since AS OF SYSTEM only is supported in crdb, some of the unit tests, which use h2db will not work so I move those tests to functional tests which use crdb.
  2. Need to create a customRepo, which hosts queries that are not feasible using JPA. In this case, the AS OF SYSTEM TIME syntax.
  3. PolarisTableRepository extends the customRepo

@SpringBootTest(classes = {PolarisPersistenceConfig.class})
@ActiveProfiles(profiles = {"polaris_functional_test"})
@AutoConfigureDataJpa
public class PolarisConnectorTableServiceFunctionalTest extends PolarisConnectorTableServiceTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for changing from h2db to postgresql and adding these functional tests for both the table service and store service


try {
// pause execution for 10000 milliseconds (10 seconds)
Thread.sleep(10000);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of sleep to wait for the create to complete, the create call should be synchronous and return the created object after it is done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are now only reading the snapshot that is follower_reader_timestamp (4.8s) ago, we need to sleep for sometime so the following list operation can read the 3 tables created.

public List<PolarisTableEntity> findAllTablesByDbNameAndTablePrefix(
final String dbName, final String tableNamePrefix, final int pageFetchSize) {
Pageable page = PageRequest.of(0, pageFetchSize, Sort.by("tbl_name").ascending());
entityManager.createNativeQuery("SET TRANSACTION AS OF SYSTEM TIME follower_read_timestamp()")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entityManager is an instance variable in this class so if multiple threads are all running for this api at the same time, would there be situations where they overwrite the timestamp set by each other, since follower_read_timestamp() is a crdb function that is evaluated to a value upon execution

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the entityManager is annotated with @PersistenceContext it's scoped within a transaction.

https://www.baeldung.com/jpa-hibernate-persistence-context

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info and reference, upon closer look seems like spring will inject a special proxy for the entityManager in this setup so indeed should be safe
https://github.com/spring-projects/spring-framework/blob/main/spring-orm/src/main/java/org/springframework/orm/jpa/SharedEntityManagerCreator.java

// pause execution for 10000 milliseconds (10 seconds)
Thread.sleep(10000);
} catch (InterruptedException e) {
System.out.println("Sleep was interrupted");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: use logger.

public List<PolarisTableEntity> findAllTablesByDbNameAndTablePrefix(
final String dbName, final String tableNamePrefix, final int pageFetchSize) {
Pageable page = PageRequest.of(0, pageFetchSize, Sort.by("tbl_name").ascending());
entityManager.createNativeQuery("SET TRANSACTION AS OF SYSTEM TIME follower_read_timestamp()")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the entityManager is annotated with @PersistenceContext it's scoped within a transaction.

https://www.baeldung.com/jpa-hibernate-persistence-context

insyncoss
insyncoss previously approved these changes Jan 5, 2024
@stevie9868 stevie9868 merged commit b4f73aa into master Jan 6, 2024
3 checks passed
@insyncoss insyncoss deleted the use_as_of_system_time_when_listing_tables_working branch February 2, 2024 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants