Use a new custom struct for efficient encoding in the write procedure #950

ShiKaiWi · 2023-05-31T02:02:48Z

Describe This Problem

In the current write procedure:

- RowGroup is the input type of the write method of the Table trait.
- Above Table.write, two sources are converted to RowGroup:
  - RemoteEngineService.write_batch receives the raw bytes of Arrow record batches and converts the record batches to RowGroup;
  - StorageService.write receives the raw bytes of a custom protobuf struct and converts the protobuf struct to RowGroup.
- Under Table.write, the RowGroup is encoded into raw bytes for the WAL logs and for the memtable rows. The WAL log payload has no special requirement on the encoding method, while the memtable requires the RowGroup to be encoded row by row so that all rows are kept in primary key order.
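
To illustrate why the memtable needs a row-wise, order-preserving encoding, here is a minimal sketch of a byte encoding whose lexicographic order matches primary key order. The key layout `(tsid, timestamp)` and the function name `encode_key` are assumptions made for this example only; they are not taken from the existing code.

```rust
/// Hypothetical primary key layout: (tsid: u64, timestamp: i64).
/// The encoded bytes compare lexicographically in the same order as the
/// (tsid, timestamp) tuple compares numerically.
pub fn encode_key(tsid: u64, timestamp: i64) -> [u8; 16] {
    let mut out = [0u8; 16];
    // Big-endian keeps the most significant byte first, so byte order
    // matches numeric order for unsigned integers.
    out[..8].copy_from_slice(&tsid.to_be_bytes());
    // Flip the sign bit so that negative timestamps sort before positive ones.
    let ts = (timestamp as u64) ^ (1u64 << 63);
    out[8..].copy_from_slice(&ts.to_be_bytes());
    out
}
```

With such an encoding, a sorted structure keyed by the raw bytes keeps rows in primary key order without decoding them.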

Proposal

As the description above shows, the write procedure involves too many conversions, which leads to high CPU utilization; this has been observed in the production environment.

Maybe we can use a single struct for the whole write procedure to avoid the extra conversions. For the WAL and the memtable, I guess we can let the WAL log payload share the same encoded bytes used by the memtable. Such a struct must be designed for writing only, that is to say, it does not need to carry complex schema information.
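
A rough sketch of what "sharing the same encoded bytes" could look like, under several assumptions: `EncodedRow`, `Wal`, `Memtable`, and `write` are hypothetical names for this example, the key is assumed to be already encoded in an order-preserving form, and the in-memory `Vec` and `BTreeSet` only stand in for the real WAL and memtable.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::sync::Arc;

/// Hypothetical write-only row, encoded once as
/// [key_len: u32 big-endian][key bytes][value bytes].
#[derive(Clone, Debug)]
pub struct EncodedRow {
    pub bytes: Arc<[u8]>,
    pub key_len: usize,
}

impl EncodedRow {
    pub fn new(key: &[u8], value: &[u8]) -> Self {
        let mut buf = Vec::with_capacity(4 + key.len() + value.len());
        buf.extend_from_slice(&(key.len() as u32).to_be_bytes());
        buf.extend_from_slice(key);
        buf.extend_from_slice(value);
        Self { bytes: Arc::from(buf), key_len: key.len() }
    }

    pub fn key(&self) -> &[u8] {
        &self.bytes[4..4 + self.key_len]
    }

    pub fn value(&self) -> &[u8] {
        &self.bytes[4 + self.key_len..]
    }
}

// Rows are ordered and compared by their encoded key only.
impl PartialEq for EncodedRow {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}
impl Eq for EncodedRow {}
impl PartialOrd for EncodedRow {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for EncodedRow {
    fn cmp(&self, other: &Self) -> Ordering {
        self.key().cmp(other.key())
    }
}

/// Stand-in for the WAL: it only keeps references to the encoded payloads.
#[derive(Default)]
pub struct Wal {
    pub entries: Vec<Arc<[u8]>>,
}

/// Stand-in for the memtable: rows are kept sorted by encoded key.
#[derive(Default)]
pub struct Memtable {
    pub rows: BTreeSet<EncodedRow>,
}

/// Appends each row to the WAL and inserts it into the memtable without
/// encoding it a second time; both sides point at the same buffer.
pub fn write(wal: &mut Wal, memtable: &mut Memtable, rows: Vec<EncodedRow>) {
    for row in rows {
        wal.entries.push(Arc::clone(&row.bytes));
        // `replace` overwrites an existing row that has the same key.
        memtable.rows.replace(row);
    }
}
```

In this sketch each row is encoded exactly once in `EncodedRow::new`; the WAL entry is a reference-counted pointer to the same buffer the memtable holds, so no schema information or second encoding pass is involved.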

Additional Context

The encoding and decoding of Arrow IPC perform very well, and I guess they should serve as a benchmark for the new struct designed for the write procedure.

ShiKaiWi added the feature New feature or request label May 31, 2023

chunshao90 self-assigned this May 31, 2023

chunshao90 mentioned this issue Jul 3, 2023

Columnar memtable & Organize data in a columnar way in the write process #1044

Closed
