Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Add support of sequence field in first row #3264

Closed
wants to merge 1 commit into from

Conversation

Aitozi
Copy link
Contributor

@Aitozi Aitozi commented Apr 26, 2024

Purpose

This PR is meant to support set sequence.field for first row. For example, user may want to keep the first row in Eventtime semantics (Otherwise, the result may not correct). In this mode, we will generate UB message to retract old value.

Also support generating deletion vector for first row with sequence field.

Tests

API and Format

Documentation

@Aitozi Aitozi marked this pull request as draft April 26, 2024 10:29
@Aitozi Aitozi force-pushed the first-row-seq branch 4 times, most recently from 2e6d78d to 3a65c7a Compare April 30, 2024 11:59
@Aitozi Aitozi marked this pull request as ready for review April 30, 2024 12:00
@Aitozi
Copy link
Contributor Author

Aitozi commented May 6, 2024

CC @JingsongLi

@JingsongLi
Copy link
Contributor

Hi @Aitozi , thanks for your contribution!

Semantically speaking, supporting sequence in FIRST-ROW is very good, and it is this semantics.

However, there are two issues:

  1. Users will experience a significant performance gap, with significant differences in streaming read, streaming write, and batch reading, which can greatly affect tuning. Therefore, from this perspective, it is better to let them clearly use another approach.
  2. There are many additional judgments in the code, and the previous assumptions about FISRT-ROW have been broken one by one, which will increase the complexity of the code.

Here, I recommend another implementation method by introducing an option: sequence.field.reverse, flipping the comparison of sequences. The advantage of this is that it is relatively easy to modify. We can remind users of exception when using FIRST-ROW with sequence.

If there are not many scenes, it is still recommended to use this low-cost approach, but if you think there are many scenes, we can tolerate these changes to FIRST-ROW.

@Aitozi
Copy link
Contributor Author

Aitozi commented May 8, 2024

Hi @Aitozi , thanks for your contribution!

Semantically speaking, supporting sequence in FIRST-ROW is very good, and it is this semantics.

However, there are two issues:

  1. Users will experience a significant performance gap, with significant differences in streaming read, streaming write, and batch reading, which can greatly affect tuning. Therefore, from this perspective, it is better to let them clearly use another approach.
  2. There are many additional judgments in the code, and the previous assumptions about FISRT-ROW have been broken one by one, which will increase the complexity of the code.

Here, I recommend another implementation method by introducing an option: sequence.field.reverse, flipping the comparison of sequences. The advantage of this is that it is relatively easy to modify. We can remind users of exception when using FIRST-ROW with sequence.

If there are not many scenes, it is still recommended to use this low-cost approach, but if you think there are many scenes, we can tolerate these changes to FIRST-ROW.

Hi @JingsongLi , thanks for your valuable inputs. sequence.field.reverse looks good to me, will try this solution

@Aitozi Aitozi closed this May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants