Add co-author: @XuQianJin-Stars
zhuangchong committed Mar 18, 2024
2 parents 291a9bc + 2df0c1e commit 217ad91
Showing 166 changed files with 2,993 additions and 1,247 deletions.
1 change: 1 addition & 0 deletions LICENSE
@@ -242,6 +242,7 @@ paimon-core/src/main/java/org/apache/paimon/jdbc/JdbcCatalog.java
paimon-core/src/main/java/org/apache/paimon/utils/ZOrderByteUtils.java
paimon-core/src/test/java/org/apache/paimon/utils/TestZOrderByteUtil.java
paimon-hive/paimon-hive-common/src/test/java/org/apache/paimon/hive/TestHiveMetastore.java
paimon-hive/paimon-hive-connector-common/src/main/java/org/apache/paimon/hive/mapred/TezUtil.java
paimon-spark/paimon-spark-common/src/main/antlr4/org.apache.spark.sql.catalyst.parser.extensions/PaimonSqlExtensions.g4
paimon-spark/paimon-spark-common/src/main/java/org/apache/paimon/spark/SparkGenericCatalog.java
paimon-spark/paimon-spark-common/src/main/scala/org/apache/spark/sql/catalyst/analysis/CoerceArguments.scala
7 changes: 4 additions & 3 deletions docs/content/engines/overview.md
@@ -38,10 +38,11 @@ Apache Spark and Apache Hive.
| Spark | 3.1 - 3.5 |||||| ✅(3.3+) ||
| Hive | 2.1 - 3.1 ||||||||
| Spark | 2.4 ||||||||
| Trino | 358 - 422 ||||||||
| Trino | 422 - 426 ||||||||
| Trino | 427 - 439 ||||||||
| Presto | 0.236 - 0.280 ||||||||
| [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/) | 3.1+ ||||||||
| [Doris](https://doris.apache.org/docs/lakehouse/multi-catalog/paimon/) | 2.0+ ||||||||
| [StarRocks](https://docs.starrocks.io/docs/data_source/catalog/paimon_catalog/) | 3.1+ ||||||||
| [Doris](https://doris.apache.org/docs/lakehouse/multi-catalog/paimon/) | 2.0+ ||||||||

Recommended versions are Flink 1.17.2, Spark 3.5.0, Hive 2.3.9

6 changes: 6 additions & 0 deletions docs/content/engines/spark.md
@@ -86,6 +86,12 @@ Append path to paimon jar file to the `--jars` argument when starting `spark-sql
spark-sql ... --jars /path/to/paimon-spark-3.3-{{< version >}}.jar
```

Or use the `--packages` option.

```bash
spark-sql ... --packages org.apache.paimon:paimon-spark-3.3:{{< version >}}
```

Alternatively, you can copy `paimon-spark-3.3-{{< version >}}.jar` under `spark/jars` in your Spark installation directory.

**Step 2: Specify Paimon Catalog**
61 changes: 23 additions & 38 deletions docs/content/engines/trino.md
@@ -30,7 +30,13 @@ This documentation is a guide for using Paimon in Trino.

## Version

Paimon currently supports Trino 358 and above.
Paimon currently supports Trino 422 and above.

## Filesystem

Starting with version 0.8, Paimon uses Trino's filesystem for all file operations, which means you must
configure the Trino filesystem before using trino-paimon. See the official Trino documentation for
how to configure filesystems.

## Preparing Paimon Jar File

@@ -43,38 +49,17 @@ https://paimon.apache.org/docs/master/project/download/

{{< unstable >}}

| Version | Package |
|------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| [358, 368) | [paimon-trino-358-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-358/{{< version >}}/) |
| [368, 369) | [paimon-trino-368-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-368/{{< version >}}/) |
| [369, 370) | [paimon-trino-369-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-369/{{< version >}}/) |
| [370, 388) | [paimon-trino-370-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-370/{{< version >}}/) |
| [388, 393) | [paimon-trino-388-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-388/{{< version >}}/) |
| [393, 422] | [paimon-trino-393-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-393/{{< version >}}/) |
| [422, latest] | [paimon-trino-422-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-422/{{< version >}}/) |
| Version | Package |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| [422, 427)    | [paimon-trino-422-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-422/{{< version >}}/) |
| [427, latest] | [paimon-trino-427-{{< version >}}-plugin.tar.gz](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-427/{{< version >}}/) |

{{< /unstable >}}
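
The version-to-package mapping in the table above can be sketched as a small shell helper. This is an illustrative sketch only: `paimon_trino_module` is a hypothetical name, not something shipped with Paimon, and it assumes that Trino 427 itself belongs to the `427` module, in line with the "422 - 426" and "427 - 439" ranges in the engines overview.

```bash
# Hypothetical helper (not part of Paimon): map a Trino release number to the
# plugin module name used in the package table.
# Assumption: the boundary release 427 is served by the 427 module.
paimon_trino_module() {
  local trino_version="$1"
  # Reject anything that is not a plain non-negative integer.
  case "$trino_version" in
    ''|*[!0-9]*)
      echo "not a Trino release number: $trino_version" >&2
      return 1
      ;;
  esac
  if [ "$trino_version" -ge 427 ]; then
    echo 427
  elif [ "$trino_version" -ge 422 ]; then
    echo 422
  else
    # Releases below 422 are no longer supported after this change.
    echo "unsupported Trino version: $trino_version" >&2
    return 1
  fi
}
```

For example, `paimon_trino_module 425` prints `422`, while a release below 422 returns a non-zero status.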

You can also manually build a bundled jar from the source code. However, there are a few preliminary steps that need to be taken before compiling:

- To build from the source code, [clone the git repository]({{< trino_github_repo >}}).
- Install JDK11 and JDK17 locally, and configure JDK11 as a global environment variable;
- Configure the toolchains.xml file in ${{ MAVEN_HOME }}, the content is as follows.

```
<toolchains>
<toolchain>
<type>jdk</type>
<provides>
<version>17</version>
<vendor>adopt</vendor>
</provides>
<configuration>
<jdkHome>${{ JAVA_HOME }}</jdkHome>
</configuration>
</toolchain>
</toolchains>
```
- Install JDK17 locally, and configure JDK17 as a global environment variable;

Then, you can build the bundled jar with the following command:

@@ -111,7 +96,7 @@ Let Paimon use a secure temporary directory.
```bash
tar -zxf paimon-trino-<trino-version>-{{< version >}}-plugin.tar.gz -C ${TRINO_HOME}/plugin
```
the variable `trino-version` is module name, must be one of 358, 368, 369, 370, 388, 393, 422.
The variable `trino-version` is the module name and must be one of 422, 427.
> NOTE: When deploying Trino on JDK 17, add the JVM options: `--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED`
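
One way to apply the JDK 17 options from the note is to append them to Trino's `jvm.config`, one option per line. The sketch below is a minimal, idempotent example; the function name `add_jvm_opens` is hypothetical, and the path `${TRINO_HOME}/etc/jvm.config` assumes a standard Trino installation layout.

```bash
# Hypothetical helper: append the two --add-opens options from the note to a
# jvm.config file, skipping any option that is already present.
add_jvm_opens() {
  local jvm_config="$1"
  local opt
  for opt in \
    '--add-opens=java.base/sun.nio.ch=ALL-UNNAMED' \
    '--add-opens=java.base/java.nio=ALL-UNNAMED'; do
    # -x matches whole lines, -F treats the option as a fixed string,
    # -e keeps grep from parsing the leading "--" as its own flag.
    grep -qxF -e "$opt" "$jvm_config" 2>/dev/null \
      || printf '%s\n' "$opt" >> "$jvm_config"
  done
}
```

A typical call would be `add_jvm_opens "${TRINO_HOME}/etc/jvm.config"`, followed by a Trino restart; running it twice leaves the file unchanged.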
### Configure
@@ -128,6 +113,15 @@ If you are using HDFS, choose one of the following ways to configure your HDFS:
- set environment variable HADOOP_CONF_DIR.
- configure `hadoop-conf-dir` in the properties.

If you are using a Hadoop filesystem, you can still configure it through trino-hdfs and trino-hive.
For example, if you use OSS as storage, add the following to `paimon.properties`, following the [Trino Reference](https://trino.io/docs/current/connector/hive.html#hdfs-configuration):

```
hive.config.resources=/path/to/core-site.xml
```

Then, configure `core-site.xml` according to the [Jindo Reference](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/4.x/4.6.x/4.6.12/oss/presto/jindosdk_on_presto.md).

## Kerberos

You can configure kerberos keytab file when using KERBEROS authentication in the properties.
@@ -194,7 +188,7 @@ SELECT * FROM paimon.test_db.orders
## Query with Time Traveling
{{< tabs "time-travel-example" >}}

{{< tab "version >=368" >}}
{{< tab "version >=422" >}}

```sql
-- read the snapshot from specified timestamp
@@ -206,15 +200,6 @@ SELECT * FROM t FOR VERSION AS OF 1;

{{< /tab >}}

{{< tab "version < 368" >}}

```sql
-- read the snapshot from specified timestamp with a long value in unix milliseconds
SET SESSION paimon.scan_timestamp_millis=1679486589444;
SELECT * FROM t;
```

{{< /tab >}}

{{< /tabs >}}

14 changes: 7 additions & 7 deletions docs/content/filesystems/oss.md
@@ -106,14 +106,14 @@ SELECT COUNT(1) FROM test_table;

{{< tab "Trino" >}}

Place `paimon-oss-{{< version >}}.jar` together with `paimon-trino-{{< version >}}.jar` under `plugin/paimon` directory.
Starting with version 0.8, paimon-trino uses the Trino filesystem for reading and writing files. We strongly recommend using jindo-sdk in Trino.

See [How to config jindo sdk on trino](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/4.x/4.6.x/4.6.12/oss/presto/jindosdk_on_presto.md) for setup instructions.
Please note that:
* When you decompress the plugin jar and choose where to place it, use `paimon` in place of `hive-hadoop2`.
* You can specify the `core-site.xml` in `paimon.properties` through the [hive.config.resources](https://trino.io/docs/current/connector/hive.html#hdfs-configuration) property.
* Presto and Jindo use the same configuration method.

Add options in `etc/catalog/paimon.properties`.
```shell
fs.oss.endpoint=oss-cn-hangzhou.aliyuncs.com
fs.oss.accessKeyId=xxx
fs.oss.accessKeySecret=yyy
```

{{< /tab >}}

9 changes: 2 additions & 7 deletions docs/content/filesystems/s3.md
@@ -106,14 +106,9 @@ SELECT COUNT(1) FROM test_table;

{{< tab "Trino" >}}

Place `paimon-s3-{{< version >}}.jar` together with `paimon-trino-{{< version >}}.jar` under `plugin/paimon` directory.
Paimon uses the shared Trino filesystem for reading and writing files.

Add options in `etc/catalog/paimon.properties`.
```shell
s3.endpoint=your-endpoint-hostname
s3.access-key=xxx
s3.secret-key=yyy
```
Please refer to [Trino S3](https://trino.io/docs/current/object-storage/file-system-s3.html) to configure the S3 filesystem in Trino.

{{< /tab >}}

2 changes: 1 addition & 1 deletion docs/content/how-to/altering-tables.md
@@ -58,7 +58,7 @@ ALTER TABLE my_table SET TBLPROPERTIES (
ALTER TABLE my_table SET PROPERTIES write_buffer_size = '256 MB';
```

> NOTE: Versions below Trino 368 do not support changing/adding table properties.
> NOTE: Versions below Trino 427 do not support changing/adding table properties.
{{< /tab >}}

12 changes: 1 addition & 11 deletions docs/content/how-to/querying-tables.md
@@ -123,17 +123,7 @@ spark.read

{{< /tab >}}

{{< tab "Trino" >}}

```sql
-- read the snapshot from specified timestamp with a long value in unix milliseconds
SET SESSION paimon.scan_timestamp_millis=1679486589444;
SELECT * FROM t;
```

{{< /tab >}}

{{< tab "Trino 368+" >}}
{{< tab "Trino 422+" >}}

```sql
-- read the snapshot from specified timestamp
7 changes: 1 addition & 6 deletions docs/content/project/download.md
@@ -53,13 +53,8 @@ This documentation is a guide for downloading Paimon Jars.
| Presto 0.268 | [paimon-presto-0.268-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-presto-0.268/{{< version >}}/) |
| Presto 0.273 | [paimon-presto-0.273-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-presto-0.273/{{< version >}}/) |
| Presto SQL 332 | [paimon-prestosql-332-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-prestosql-332/{{< version >}}/) |
| Trino 358 | [paimon-trino-358-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-358/{{< version >}}/) |
| Trino 368 | [paimon-trino-368-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-368/{{< version >}}/) |
| Trino 369 | [paimon-trino-369-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-369/{{< version >}}/) |
| Trino 370 | [paimon-trino-370-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-370/{{< version >}}/) |
| Trino 388 | [paimon-trino-388-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-388/{{< version >}}/) |
| Trino 393 | [paimon-trino-393-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-393/{{< version >}}/) |
| Trino 422 | [paimon-trino-422-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-422/{{< version >}}/) |
| Trino 427 | [paimon-trino-427-{{< version >}}.jar](https://repository.apache.org/snapshots/org/apache/paimon/paimon-trino-427/{{< version >}}/) |

{{< /unstable >}}

8 changes: 7 additions & 1 deletion docs/layouts/shortcodes/generated/catalog_configuration.html
@@ -62,11 +62,17 @@
<td>Boolean</td>
<td>Enable Catalog Lock.</td>
</tr>
<tr>
<td><h5>lock.type</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>The Lock Type for Catalog, such as 'hive', 'zookeeper'.</td>
</tr>
<tr>
<td><h5>metastore</h5></td>
<td style="word-wrap: break-word;">"filesystem"</td>
<td>String</td>
<td>Metastore of paimon catalog, supports filesystemhive and jdbc.</td>
<td>Metastore of paimon catalog, supports filesystem, hive and jdbc.</td>
</tr>
<tr>
<td><h5>table.type</h5></td>
7 changes: 7 additions & 0 deletions docs/layouts/shortcodes/generated/core_configuration.html
@@ -593,6 +593,13 @@
<td>Duration</td>
<td>In watermarking, if a source remains idle beyond the specified timeout duration, it triggers snapshot advancement and facilitates tag creation.</td>
</tr>
<tr>
<td><h5>sort-compaction.range-strategy</h5></td>
<td style="word-wrap: break-word;">QUANTITY</td>
<td><p>Enum</p></td>
<td>The range strategy of sort compaction; the default value is quantity.
If the data size allocated to the sorting tasks is uneven, which may lead to performance bottlenecks, set this option to size.<br /><br />Possible values:<ul><li>"SIZE"</li><li>"QUANTITY"</li></ul></td>
</tr>
<tr>
<td><h5>sort-engine</h5></td>
<td style="word-wrap: break-word;">loser-tree</td>