[core] Introduce AsyncPositionOutputStream #3875

Merged (10 commits) on Aug 6, 2024

Contributor

@JingsongLi JingsongLi commented Aug 2, 2024

Purpose

This PR introduces asynchronous file writing to increase data throughput.

Callers may reuse the byte array they pass to a write call, so every written array is copied once before it is handed to the asynchronous writer. This copy is the overhead this PR introduces. However, compared to writing files, copying a byte array seems to be less expensive.
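The copy described above can be illustrated with a minimal sketch. This is not the PR's code: the class `CopyOnEnqueueSketch` and its methods are hypothetical names, and the queue stands in for whatever hand-off structure the real stream uses. It only shows that a queued chunk keeps its contents when the caller later overwrites its own buffer.

```java
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch (not the PR's code): a queued write must own its bytes. */
class CopyOnEnqueueSketch {
    // Chunks waiting to be written by some background consumer.
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

    /** Enqueues a private copy of data[offset, offset + length). */
    void write(byte[] data, int offset, int length) {
        // The copy is the one-time overhead: after this call returns,
        // the caller is free to reuse or overwrite 'data'.
        queue.add(Arrays.copyOfRange(data, offset, offset + length));
    }

    /** Returns the next queued chunk, or null if nothing is pending. */
    byte[] poll() {
        return queue.poll();
    }
}
```

For instance, after `write(buf, 1, 2)` on a buffer holding `{1, 2, 3, 4}`, overwriting `buf[1]` does not change the queued chunk, which still holds `{2, 3}`. Without the copy, the background writer could observe the overwritten bytes.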

Tests

AsyncPositionOutputStreamTest

API and Format

Documentation

@JingsongLi JingsongLi changed the title [WIP][core] Introduce AsyncPositionOutputStream [core] Introduce AsyncPositionOutputStream Aug 5, 2024
} catch (ExecutionException e) {
    throw new RuntimeException(e);
} finally {
    isClosed = true;
Contributor

What is isClosed used for? The thread needs either an EndEvent or isClosed == true to exit. However, you call future.get() to wait for the thread and only then set isClosed = true, which means that only an EndEvent can stop the thread.
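The ordering the reviewer describes can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `AsyncWriterSketch`, the `END` sentinel (standing in for EndEvent), and the polling loop are all hypothetical. The point it shows is that when `isClosed` is set only after `future.get()` returns, the flag cannot be what ends the loop during a normal close; the sentinel is.

```java
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch (not the PR's code) of a background writer and its shutdown. */
class AsyncWriterSketch {
    // Sentinel compared by identity; stands in for the PR's EndEvent.
    private static final byte[] END = new byte[0];

    private final OutputStream out;
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Future<Void> future;
    private volatile boolean isClosed = false;

    AsyncWriterSketch(OutputStream out) {
        this.out = out;
        Callable<Void> task = this::drain;
        this.future = executor.submit(task);
    }

    /** Worker loop: exits on the END sentinel or once isClosed is observed. */
    private Void drain() throws Exception {
        while (!isClosed) {
            byte[] chunk = queue.poll(100, TimeUnit.MILLISECONDS);
            if (chunk == null) {
                continue; // nothing queued yet, re-check isClosed
            }
            if (chunk == END) {
                return null; // the exit path actually taken on a normal close
            }
            out.write(chunk);
        }
        return null;
    }

    /** Queues a copy of the caller's bytes for the worker to write. */
    void write(byte[] data) {
        queue.add(data.clone());
    }

    void close() {
        queue.add(END);
        try {
            // Blocks until the worker has returned, i.e. until it has seen END.
            future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            // Set only after the worker has already finished, so in this
            // ordering the flag never terminates the loop by itself.
            isClosed = true;
            executor.shutdown();
        }
    }
}
```

In this sketch the flag still guards against one case: if the wait in `close()` is interrupted before the worker reaches the sentinel, setting `isClosed` lets the loop end on its next poll. Whether the PR relies on the flag for a similar reason is not stated in the discussion.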

@Override
public void close() throws IOException {
    checkException();
    flushBuffer();
Contributor

Suggested change
- flushBuffer();
+ flush();

Contributor Author

flushBuffer is intended here; close will also flush the inner stream.

private final int offset;
private final int length;

public DataEvent(byte[] data, int offset, int length) {
Contributor

offset is always 0, so this field is unnecessary.

import java.io.ByteArrayOutputStream;

/** A {@link ByteArrayOutputStream} which can reuse byte array. */
public class ReuseByteArrayOutputStream extends ByteArrayOutputStream {
Contributor

What is this class used for? I don't see it used anywhere.

Contributor Author

It is just a general-purpose utility.

@JingsongLi JingsongLi merged commit 84732cd into apache:master Aug 6, 2024
11 checks passed
wxplovecc pushed a commit to tongcheng-elong/incubator-paimon that referenced this pull request Aug 6, 2024