Use a new custom struct for efficient encoding in the write procedure #950

Open
ShiKaiWi opened this issue May 31, 2023 · 0 comments
Labels
feature New feature or request


Describe This Problem

In the current write procedure:

  • The write method of the Table trait takes a RowGroup;
  • Above Table.write, two sources are converted to RowGroup:
    • RemoteEngineService.write_batch receives the raw bytes of arrow record batches and converts the record batches to a RowGroup;
    • StorageService.write receives the raw bytes of a custom protobuf struct and converts the protobuf struct to a RowGroup;
  • Under Table.write, the RowGroup is encoded into raw bytes for the wal logs and for the memtable rows. The wal log payload has no special requirement on the encoding method, while the memtable requires the RowGroup to be encoded row by row so that all rows are kept in primary key order.
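To make the conversion chain above concrete, here is a minimal sketch of it in plain Rust. None of these types are the real ones: `ColumnBatch`, `Row`, `to_row_group`, `encode_key` and `insert_rows` are hypothetical stand-ins, the "primary key" is assumed to be a single `i64` timestamp, and the order-preserving key encoding is just one possible way to keep rows sorted, not necessarily what the memtable does today.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded columnar batch (e.g. an arrow record batch).
pub struct ColumnBatch {
    pub timestamps: Vec<i64>,
    pub values: Vec<f64>,
}

/// Hypothetical stand-in for one row of a RowGroup.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub timestamp: i64,
    pub value: f64,
}

/// Conversion 1: columnar data is pivoted into a row-oriented group.
pub fn to_row_group(batch: &ColumnBatch) -> Vec<Row> {
    batch
        .timestamps
        .iter()
        .zip(batch.values.iter())
        .map(|(&timestamp, &value)| Row { timestamp, value })
        .collect()
}

/// Encode the (assumed) primary key so that byte order equals numeric order:
/// flip the sign bit, then write big-endian.
pub fn encode_key(timestamp: i64) -> [u8; 8] {
    ((timestamp as u64) ^ (1 << 63)).to_be_bytes()
}

/// Conversion 2: every row is encoded again into bytes for the sorted memtable.
/// A BTreeMap over byte keys plays the role of the memtable here.
pub fn insert_rows(memtable: &mut BTreeMap<[u8; 8], [u8; 8]>, rows: &[Row]) {
    for row in rows {
        memtable.insert(encode_key(row.timestamp), row.value.to_le_bytes());
    }
}
```

In this toy version each write already pays for one pivot and one per-row encode; the real path additionally decodes the incoming bytes and encodes a separate wal payload.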

Proposal

From the description above, the write procedure involves too many conversions, which leads to high CPU utilization; this has been confirmed in the production environment.

Maybe we can use a single struct for the whole write procedure to avoid the extra conversions. For the wal and the memtable, I guess we can let the wal log payload share the same encoded bytes used by the memtable. Such a struct must be designed for writing only, that is to say, it does not need to carry complex schema information.
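A rough sketch of what such a struct could look like, only to illustrate the idea of encoding once and sharing the bytes. Everything here is hypothetical: the names `EncodedRows` and `EncodedRowsBuilder`, the fixed two-column row layout (an `i64` timestamp plus an `f64` value), and the use of `Arc<[u8]>` for sharing are assumptions, not an existing API.

```rust
use std::sync::Arc;

/// Hypothetical write-only batch: rows are encoded exactly once into one buffer.
pub struct EncodedRows {
    buf: Arc<[u8]>,
    /// Start offset of every row, followed by the end offset of the last row.
    offsets: Vec<usize>,
}

/// Hypothetical builder that appends rows in an assumed fixed layout.
pub struct EncodedRowsBuilder {
    buf: Vec<u8>,
    offsets: Vec<usize>,
}

impl EncodedRowsBuilder {
    pub fn new() -> Self {
        Self { buf: Vec::new(), offsets: vec![0] }
    }

    /// Append one row; no schema information is stored alongside it.
    pub fn push_row(&mut self, timestamp: i64, value: f64) {
        self.buf.extend_from_slice(&timestamp.to_be_bytes());
        self.buf.extend_from_slice(&value.to_le_bytes());
        self.offsets.push(self.buf.len());
    }

    pub fn finish(self) -> EncodedRows {
        EncodedRows { buf: Arc::from(self.buf), offsets: self.offsets }
    }
}

impl EncodedRows {
    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Bytes of the i-th row, borrowed from the shared buffer.
    pub fn row(&self, i: usize) -> &[u8] {
        &self.buf[self.offsets[i]..self.offsets[i + 1]]
    }

    /// The wal payload is the same buffer: a reference-count bump, no re-encoding.
    pub fn wal_payload(&self) -> Arc<[u8]> {
        Arc::clone(&self.buf)
    }

    /// The memtable side walks the same buffer row by row.
    pub fn memtable_rows(&self) -> impl Iterator<Item = &[u8]> + '_ {
        (0..self.num_rows()).map(move |i| self.row(i))
    }
}
```

With this shape, the wal writer would take `wal_payload()` and the memtable would insert each slice from `memtable_rows()`, so both consumers read the bytes produced by a single encode pass. How the receiver of `RemoteEngineService.write_batch` or `StorageService.write` would build this struct directly from the incoming bytes is left open here.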

Additional Context

The encoding and decoding of arrow ipc perform very well, and I guess they should serve as a benchmark for the new struct designed for the write procedure.
