[doc] Add read performance doc. #4336

Closed

@wwj6591812 wwj6591812 commented Oct 16, 2024

Purpose

Recently, when reading Paimon data with a Flink batch job, I found that the job often ran for a very long time and frequently hit OOM errors.
Fortunately, I solved this by adjusting some Paimon parameters.
In the process, I found that Paimon lacks a doc describing how to speed up reads.
So I drafted a doc on read performance, hoping to help more Paimon users. Others in the Paimon community are welcome to add to it.

Linked issue: close #xxx

Tests

API and Format

Documentation

@wwj6591812 wwj6591812 force-pushed the add_read_performance_doc_1016 branch from 8e5e877 to d353377 Compare October 25, 2024 00:54
There are many ways to improve read performance when reading Paimon data with Flink.

## Acceleration In Flink client
When a Flink batch job scans a Paimon table without the `scan.parallelism` option, Paimon infers the parallelism by reading manifest files, which takes lots of time. To skip this inference, set the `scan.parallelism` table property when reading the table.
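
As a minimal sketch of the above, the option can be passed per query with a Flink SQL dynamic table option hint. The table name `my_table` and the value `8` are placeholders, not values from this PR; choose a parallelism that fits your cluster.

```sql
-- Run the query as a batch job (Flink SQL client).
SET 'execution.runtime-mode' = 'batch';

-- Set the source parallelism explicitly so Paimon does not have to infer it.
-- `my_table` and '8' are hypothetical placeholders.
SELECT * FROM my_table /*+ OPTIONS('scan.parallelism' = '8') */;
```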

I don't think this will "take lots of time" for most tables.

## Read Compact Snapshot
You can read a snapshot whose commitKind is Compact. Compared with snapshots of other commit kinds, reading this snapshot requires fewer merge operations, which results in higher performance.
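
One possible way to do this is sketched below. It assumes the `$snapshots` system table (with its `commit_kind` column) and the `scan.snapshot-id` option are available in your Paimon version. The table name `my_table` and the snapshot id `42` are placeholders.

```sql
-- List the snapshots that were produced by compaction.
-- `my_table` is a hypothetical table name.
SELECT snapshot_id, commit_kind, commit_time
FROM my_table$snapshots
WHERE commit_kind = 'COMPACT';

-- Batch-read one of those snapshots by id.
-- '42' is a placeholder; substitute a snapshot_id returned by the query above.
SELECT * FROM my_table /*+ OPTIONS('scan.snapshot-id' = '42') */;
```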

It may only be faster if full compaction is configured.

@wwj6591812

Close this PR.

@wwj6591812 wwj6591812 closed this Oct 31, 2024