[Core] Support jdbc catalog #2866

sunxiaojian · 2024-02-07T10:42:05Z

Support jdbc catalog

Linked issue: #841

sunxiaojian · 2024-02-08T15:59:42Z

@JingsongLi PTAL

sunxiaojian · 2024-02-27T10:02:01Z

@FangYongs @yuzelin @JingsongLi If you have time, could you please help review this? Thanks.

JingsongLi · 2024-02-27T10:54:28Z

Thanks @sunxiaojian for your contribution.

This PR looks good! I will take a look this week~

JingsongLi · 2024-02-28T05:16:00Z

docs/content/how-to/creating-catalogs.md

@@ -175,3 +176,185 @@ Using the table option facilitates the convenient definition of Hive table param
 Parameters prefixed with `hive.` will be automatically defined in the `TBLPROPERTIES` of the Hive table. 
 For instance, using the option `hive.table.owner=Jon` will automatically add the parameter `table.owner=Jon` to the table properties during the creation process.

+
+
+## Creating a Catalog with Filesystem Metastore


Why copy this?

JingsongLi · 2024-02-28T05:36:08Z

docs/content/how-to/creating-catalogs.md

+    -- 'jdbc.user' = '...', 
+    -- 'jdbc.password' = '...', 
+    -- 'initialize-catalog-tables'='true'
+    -- 'warehouse' = 'hdfs:///path/to/warehouse',


warehouse should be required.

You should remove the leading -- from these lines.

JingsongLi · 2024-02-28T05:38:48Z

docs/content/how-to/creating-catalogs.md

+    -- 'uri' = 'jdbc:mysql://<host>:<port>/<databaseName>',
+    -- 'jdbc.user' = '...', 
+    -- 'jdbc.password' = '...', 
+    -- 'initialize-catalog-tables'='true'


Do we need this one? What is the use case?

@JingsongLi When a JdbcCatalog is created, this parameter determines whether the required tables are created automatically.

When would the tables not need to be initialized? You added a public API. Please explain why it is needed and how to use it.

Or you can remove this option.

JingsongLi · 2024-02-28T05:39:48Z

docs/content/how-to/creating-catalogs.md

+
+You can define any default table options with the prefix `table-default.` for tables created in the catalog.
+
+Also, you can create [FlinkGenericCatalog]({{< ref "engines/flink" >}}).


No, FlinkGenericCatalog must use HiveCatalog.

JingsongLi · 2024-02-28T05:45:22Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcUtils.java

+                    + ")"
+                    + ")";
+
+    static final String DISTRIBUTED_LOCK_ACQUIRE_SQL =


Can you explain this? How does the locking work?

@JingsongLi First, a table for distributed locks is created. To acquire a lock, a row is written to that table using database.table as the lockId. If the write succeeds, the lock is considered acquired. If it fails with a primary key conflict, the acquisition is considered a failure.
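The insert-or-conflict idea described above can be sketched without a database. This is only an illustration, not the code from the PR: the class `LockTableSketch` and its methods are hypothetical, and a `ConcurrentHashMap` stands in for a lock table whose primary key is the lockId.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for a lock table keyed by lockId. */
class LockTableSketch {
    // The map key plays the role of the primary key column;
    // the value is the (client-side) acquisition time in milliseconds.
    private final ConcurrentMap<String, Long> rows = new ConcurrentHashMap<>();

    /** Mimics the INSERT: succeeds only if no row with this lockId exists yet. */
    boolean tryAcquire(String lockId) {
        return rows.putIfAbsent(lockId, System.currentTimeMillis()) == null;
    }

    /** Mimics deleting the lock row on release. */
    void release(String lockId) {
        rows.remove(lockId);
    }

    public static void main(String[] args) {
        LockTableSketch table = new LockTableSketch();
        String lockId = "my_db.my_table"; // database.table, as in the comment above
        System.out.println(table.tryAcquire(lockId)); // first insert succeeds
        System.out.println(table.tryAcquire(lockId)); // "primary key conflict"
        table.release(lockId);
        System.out.println(table.tryAcquire(lockId)); // free again after release
    }
}
```

In a real metastore, the atomicity that `putIfAbsent` provides here would come from the database's primary key constraint.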

coolderli · 2024-02-28T07:29:46Z

@sunxiaojian Thanks for your work. This is really what we want!

zhongyujiang · 2024-02-28T08:10:31Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcUtils.java

+/** Util for jdbc catalog. */
+public class JdbcUtils {
+    private static final Logger LOG = LoggerFactory.getLogger(JdbcUtils.class);
+    public static final String METADATA_LOCATION_PROP = "metadata_location";


If I understand correctly, Paimon considers the commit successful once the snapshot file is successfully committed to the file system. Storing the current metadata location in the catalog metastore will cause consistency problems because the two operations, committing the snapshot file to the file system and committing it to the metastore, are not atomic. So there seems to be no point in storing this metadata location? @JingsongLi Can you please review this? Thanks.

@zhongyujiang Your understanding is correct! Yes, Paimon considers only snapshot files as the source of truth.
metadata_location comes from Iceberg; we don't need it.

zhongyujiang · 2024-02-28T08:17:38Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcUtils.java

+
+    private static int tryClearExpireLock(JdbcClientPool connections, String lockName, long timeout)
+            throws SQLException, InterruptedException {
+        long expirationTimeMillis = System.currentTimeMillis() - timeout * 1000;


I'm not sure I understand this, could you please explain?

zhongyujiang · 2024-02-28T08:36:27Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcUtils.java

+                            connection.prepareStatement(DISTRIBUTED_LOCK_ACQUIRE_SQL)) {
+                        preparedStatement.setString(1, lockName);
+                        preparedStatement.setTimestamp(
+                                2, new Timestamp(System.currentTimeMillis()));


So we are using the clients' timestamps as acquired_at? I think this makes no sense, since clients can be distributed across different machines that have no consistent system time. Since acquired_at is the timestamp of acquiring the lock, I think we should use the system time of the metastore.

@zhongyujiang The issue you mention can indeed occur, but different databases can back the metastore, and each uses a different function to generate the system time, so this may require separate compatibility handling.

The initial implementation used the metastore's time, but it was not compatible with all databases, so it was changed to the current version. In fact, a database-specific time function could be configured for each database.
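To make the clock-skew concern concrete, here is a small sketch. It assumes the expiry rule is "acquired_at earlier than now minus timeout", mirroring the `System.currentTimeMillis() - timeout * 1000` cutoff in the diff; the class and method names are hypothetical, and the clock values are made up for illustration.

```java
/** Hypothetical sketch: how client clock skew can expire a fresh lock. */
class ClockSkewSketch {
    /** Assumed rule: a lock is expired if it was acquired before (now - timeout). */
    static boolean isExpired(long acquiredAtMillis, long nowMillis, long timeoutSeconds) {
        long cutoff = nowMillis - timeoutSeconds * 1000;
        return acquiredAtMillis < cutoff;
    }

    public static void main(String[] args) {
        long trueNow = 1_000_000L; // the "real" time, in milliseconds
        long timeoutSeconds = 60;

        // Client A's clock runs 90 seconds slow, so the lock it takes right now
        // is stamped 90 seconds in the past.
        long acquiredAt = trueNow - 90_000L;

        // Client B has an accurate clock and checks the lock immediately.
        // The lock is brand new, yet it already looks expired.
        System.out.println(isExpired(acquiredAt, trueNow, timeoutSeconds));

        // With a single clock (for example, the metastore's), the same lock is not expired.
        System.out.println(isExpired(trueNow, trueNow, timeoutSeconds));
    }
}
```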

zhongyujiang · 2024-02-28T08:46:47Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcUtils.java

+    public static boolean acquire(JdbcClientPool connections, String lockName, long timeout)
+            throws SQLException, InterruptedException {
+        // Check and clear expire lock
+        int affectedRows = tryClearExpireLock(connections, lockName, timeout);


I think the timeout is a property that should be set when the lock is acquired, not when the lock expires, because different clients may configure different timeouts.

In fact, neither approach is ideal, since both rely on a timeout set by the client, but your solution should be better.
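One way to read this suggestion is that the lock row records its own expiry when it is acquired, so that whoever checks it later uses the owner's timeout rather than their own. The sketch below illustrates that reading only; `LockWithStoredTimeoutSketch` is a hypothetical name, a plain map stands in for the lock table, and this is not the implementation that was merged.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: the expiry is fixed by the acquiring client and stored with the lock. */
class LockWithStoredTimeoutSketch {
    // lock id -> instant (in milliseconds) at which the lock expires
    private final Map<String, Long> expiresAt = new HashMap<>();

    /** Acquires the lock unless an unexpired row already exists. */
    boolean tryAcquire(String lockId, long nowMillis, long timeoutSeconds) {
        Long expiry = expiresAt.get(lockId);
        if (expiry != null && nowMillis < expiry) {
            return false; // still held, judged by the owner's timeout
        }
        expiresAt.put(lockId, nowMillis + timeoutSeconds * 1000);
        return true;
    }

    public static void main(String[] args) {
        LockWithStoredTimeoutSketch locks = new LockWithStoredTimeoutSketch();
        // Client 1 acquires at t=0 with a 10-second timeout.
        System.out.println(locks.tryAcquire("db.tbl", 0L, 10));
        // Client 2 is configured with a 1-second timeout, but that does not
        // shorten client 1's lock: at t=5s the lock is still held.
        System.out.println(locks.tryAcquire("db.tbl", 5_000L, 1));
        // At t=10s the stored expiry has passed, so client 2 can acquire.
        System.out.println(locks.tryAcquire("db.tbl", 10_000L, 1));
    }
}
```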

sunxiaojian · 2024-02-29T13:04:11Z

@JingsongLi @zhongyujiang Thank you for the review, I have fixed the issues raised above.

JingsongLi · 2024-03-03T12:19:36Z

Hi @sunxiaojian, you should make sure that every comment is resolved.

sunxiaojian · 2024-03-03T13:53:01Z

@JingsongLi All have been resolved

JingsongLi · 2024-03-04T02:04:19Z

docs/content/how-to/creating-catalogs.md

+```
+You can define any connection parameters for a database with the prefix "jdbc.".
+
+You can define the "initialize-catalog-tables" configuration to automatically create tables required for the initial JdbcCatalog. If it is true, tables will be automatically created when initializing JdbcCatalog. If it is false, tables will not be automatically created and you must manually create them.


If it is false, tables will not be automatically created and you must manually create them.

Why is there such a need? I don't understand.

Some online business databases do not allow table creation through JDBC; table creation must be submitted for approval in advance.

The final decision is not to keep this configuration. It is better to recommend that users grant automatic table creation permission on a separate database.

You can document this.

Most users may not need it, so I will remove this parameter and add it back if there is a strong requirement.

JingsongLi · 2024-03-04T02:13:29Z

docs/content/how-to/creating-catalogs.md

+    'uri' = 'jdbc:mysql://<host>:<port>/<databaseName>',
+    'jdbc.user' = '...', 
+    'jdbc.password' = '...', 
+    'catalog-name'='jdbc'


Here we already have the concept of a catalog name, which is my_jdbc.

Maybe this option can be renamed to store-key or something else?

At the least, use jdbc.catalog-name. You need documentation to explain that it is designed to store multiple catalogs, and that it is different from the catalog name you use in Flink SQL or Spark SQL.

JingsongLi · 2024-03-04T02:13:42Z

docs/content/how-to/creating-catalogs.md

+    'jdbc.user' = '...', 
+    'jdbc.password' = '...', 
+    'catalog-name'='jdbc'
+    'initialize-catalog-tables'='true'


You forgot the comma after 'catalog-name'='jdbc'.

JingsongLi · 2024-03-04T02:15:33Z

paimon-core/src/main/java/org/apache/paimon/ClientPoolImpl.java

+import static org.apache.paimon.utils.Preconditions.checkState;
+
+/** Source: [core/src/main/java/org/apache/iceberg/ClientPoolImpl.java]. */
+public abstract class ClientPoolImpl<C, E extends Exception>


Move them to paimon-common.

sunxiaojian · 2024-03-04T09:51:47Z

@JingsongLi All have been resolved

zhongyujiang · 2024-03-05T03:46:00Z

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcCatalogLock.java

+        String lockUniqueName = String.format("%s.%s.%s", catalogName, database, table);
+        lock(lockUniqueName);
+        try {
+            return callable.call();


What happens if the lock expires before the commit is completed? Will there be conflicts? Does this break the atomicity of commits?

For example, consider the following timeline, where the first column is a logical timestamp and the second column is the event that occurred at that time:
1 client-1 successfully acquires a lock on table a
2 client-1 starts committing but gets stuck for some reason
5 the lock on table a times out
6 client-2 clears the expired lock on table a and acquires a new lock on table a
7 client-2 starts committing
8 client-2 commits successfully
9 client-1 resumes committing

In this case, client-1's commit should fail, but here client-1 will continue to commit. If the underlying file system allows overwriting of files with the same name, then client-1's commit will overwrite client-2's commit.

You could start a new thread that renews the timeout to solve the problem, or you could simply set a reasonable timeout.

With the current strategy, this situation is unlikely to occur, because the lock is currently held only while renaming the file, and that operation should complete quickly.
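The timeline above can be replayed in a few lines. This is a hypothetical simulation, not code from the PR: it uses logical timestamps, assumes a timeout of 4 ticks so that a lock taken at time 1 expires at time 5, and borrows the `catalogName.database.table` lock name format from the diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replay of the expired-lock timeline using logical timestamps. */
class ExpiredLockTimelineSketch {
    /** One row of the simulated lock table. */
    static final class Row {
        final String owner;
        final long acquiredAt;

        Row(String owner, long acquiredAt) {
            this.owner = owner;
            this.acquiredAt = acquiredAt;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();
    private final long timeout;

    ExpiredLockTimelineSketch(long timeout) {
        this.timeout = timeout;
    }

    /** Clears an expired row if present, then inserts a row for the new owner. */
    boolean acquire(String lockName, String owner, long now) {
        Row row = rows.get(lockName);
        if (row != null && now - row.acquiredAt < timeout) {
            return false; // an unexpired lock exists
        }
        rows.put(lockName, new Row(owner, now));
        return true;
    }

    /** Returns the current owner of the lock, or null if it is free. */
    String holder(String lockName) {
        Row row = rows.get(lockName);
        return row == null ? null : row.owner;
    }

    public static void main(String[] args) {
        ExpiredLockTimelineSketch locks = new ExpiredLockTimelineSketch(4);
        String lockName = String.format("%s.%s.%s", "my_catalog", "my_db", "a");

        System.out.println(locks.acquire(lockName, "client-1", 1)); // t=1: client-1 gets the lock
        System.out.println(locks.acquire(lockName, "client-2", 3)); // t=3: still held by client-1
        System.out.println(locks.acquire(lockName, "client-2", 6)); // t=6: expired, client-2 takes over
        // t=9: client-1 resumes its commit, but the lock now belongs to client-2.
        System.out.println(locks.holder(lockName));
    }
}
```

Nothing in this simulation stops client-1 from proceeding at time 9, which is the gap the comment points out.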

JingsongLi · 2024-03-05T10:03:07Z

@zhongyujiang Hi, do you have other comments or concerns?

zhongyujiang

Left two minor comments. I have no other concerns except the one above. Thanks @sunxiaojian

paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcClientPool.java

zhongyujiang · 2024-03-05T11:11:46Z

docs/content/how-to/creating-catalogs.md

@@ -175,3 +176,40 @@ Using the table option facilitates the convenient definition of Hive table param
 Parameters prefixed with `hive.` will be automatically defined in the `TBLPROPERTIES` of the Hive table. 
 For instance, using the option `hive.table.owner=Jon` will automatically add the parameter `table.owner=Jon` to the table properties during the creation process.

+## Creating a Catalog with JDBC Metastore
+
+By using the Paimon JDBC catalog, changes to the catalog will be directly stored in relational databases such as MySQL, postgres, etc.


postgres is not supported for now, should we remove it?

It's just that locking is not supported.

Can you document this? Only SQLite and MySQL support the catalog lock.

JingsongLi · 2024-03-05T14:00:59Z

docs/content/how-to/creating-catalogs.md

+    'uri' = 'jdbc:mysql://<host>:<port>/<databaseName>',
+    'jdbc.user' = '...', 
+    'jdbc.password' = '...', 
+    'store-key'='jdbc',


Can you modify this to jdbc.catalog-key?

The prefix "jdbc." is reserved for database connection configuration: options with that prefix are parsed and placed directly into the connection configuration. If a rename is necessary, could it be "catalog-key" instead?
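A minimal sketch of why the "jdbc." prefix is a poor fit for a catalog-level key, assuming prefixed options are stripped of the prefix and forwarded to the connection as described above. The class `JdbcPrefixSketch` and the sample option values are hypothetical, not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch: collect "jdbc."-prefixed catalog options as connection properties. */
class JdbcPrefixSketch {
    static final String PREFIX = "jdbc.";

    /** Copies every option starting with "jdbc." into Properties, with the prefix removed. */
    static Properties connectionProperties(Map<String, String> catalogOptions) {
        Properties props = new Properties();
        for (Map.Entry<String, String> option : catalogOptions.entrySet()) {
            if (option.getKey().startsWith(PREFIX)) {
                props.setProperty(option.getKey().substring(PREFIX.length()), option.getValue());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("uri", "jdbc:mysql://<host>:<port>/<databaseName>");
        options.put("jdbc.user", "alice");       // made-up value
        options.put("jdbc.password", "secret");  // made-up value
        options.put("catalog-key", "jdbc");      // unprefixed, so it is not forwarded

        Properties props = connectionProperties(options);
        System.out.println(props.getProperty("user"));
        System.out.println(props.containsKey("catalog-key"));
    }
}
```

Under this scheme, an option named `jdbc.catalog-key` would also be forwarded to the driver as `catalog-key`, which is the conflict the reply describes.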

sunxiaojian · 2024-03-06T04:20:24Z

@JingsongLi The above issues have been resolved. Please review again.

JingsongLi

+1 Thanks @sunxiaojian and @zhongyujiang

sunxiaojian force-pushed the support-jdbc-catalog branch 3 times, most recently from 7cc4f4c to 442d343 Compare February 8, 2024 15:59

sunxiaojian force-pushed the support-jdbc-catalog branch 6 times, most recently from 4fd21ac to 443dd36 Compare February 22, 2024 06:18

Support jdbc catalog

5c1fa42

sunxiaojian force-pushed the support-jdbc-catalog branch from 4ab8d2c to 5c1fa42 Compare February 22, 2024 06:27

JingsongLi reviewed Feb 28, 2024


zhongyujiang reviewed Feb 28, 2024


sunxiaojian force-pushed the support-jdbc-catalog branch 7 times, most recently from a4fd960 to 516b5cf Compare February 29, 2024 12:40

sunxiaojian force-pushed the support-jdbc-catalog branch 3 times, most recently from 558025f to 9d41884 Compare March 1, 2024 04:08

rewrite distributed lock & fixed doc

6a70411

sunxiaojian force-pushed the support-jdbc-catalog branch from 9d41884 to 6a70411 Compare March 1, 2024 04:09

Improve doc

3df2995

JingsongLi reviewed Mar 4, 2024


fixed doc

8c2e10d

sunxiaojian force-pushed the support-jdbc-catalog branch from 332fea0 to 8c2e10d Compare March 4, 2024 06:06

zhongyujiang reviewed Mar 5, 2024


fixed exception info

0b7b739

sunxiaojian force-pushed the support-jdbc-catalog branch from 736b0e2 to 0b7b739 Compare March 5, 2024 11:57

JingsongLi reviewed Mar 5, 2024


sunxiaojian force-pushed the support-jdbc-catalog branch 2 times, most recently from 8809531 to 41fb584 Compare March 6, 2024 02:33

supplement the doc and rename the store-key

bd81c01

sunxiaojian force-pushed the support-jdbc-catalog branch from d6b8d88 to bd81c01 Compare March 6, 2024 02:53

JingsongLi approved these changes Mar 6, 2024


JingsongLi merged commit 9d29737 into apache:master Mar 6, 2024
10 checks passed

