-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Support statistic with time travel #4251
Conversation
(Long) | ||
snapshotIdPredicate | ||
.visit(LeafPredicateExtractor.INSTANCE) | ||
.get("snapshot_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of "snapshot_id" in this class, could you add:
private static final String SNAPSHOT_ID = "snapshot_id";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And TABLE_TYPE in this class should be private.
public Optional<Statistics> statistics(Long snapshotId) { | ||
if (!snapshotManager().snapshotExists(snapshotId)) { | ||
throw new SnapshotNotExistException( | ||
String.format("snapshot id: %s is not exisit", snapshotId.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change "snapshotId.toString()" to "snapshotId".
String.format("snapshot id: %s is not exisit", snapshotId.toString())); | ||
} | ||
|
||
Long latestSnapshotId = snapshotManager().latestSnapshotId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latestSnapshotId may be null, you should add a check.
public Optional<Statistics> statistics(Long snapshotId) { | ||
if (!snapshotManager().snapshotExists(snapshotId)) { | ||
throw new SnapshotNotExistException( | ||
String.format("snapshot id: %s is not exisit", snapshotId.toString())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change "is not exisit" to "is not existed";
private StatisticSplit(Path location) { | ||
private final @Nullable LeafPredicate snapshotIdPredicate; | ||
|
||
private StatisticSplit(Path location, LeafPredicate snapshotIdPredicate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In line 159, add @nullable before "LeafPredicate snapshotIdPredicate".
public static final RowType TABLE_TYPE = | ||
private static final String SNAPSHOT_ID = "snapshot_id"; | ||
|
||
private static final RowType TABLE_TYPE = | ||
new RowType( | ||
Arrays.asList( | ||
new DataField(0, "snapshot_id", new BigIntType(false)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SNAPSHOT_ID
|
||
Map<String, LeafPredicate> leafPredicates = | ||
predicate.visit(LeafPredicateExtractor.INSTANCE); | ||
snapshotIdPredicate = leafPredicates.get("snapshot_id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SNAPSHOT_ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also in line 110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TABLE_TYPE maybe not need set with constant value,keep the style with other table seems better,other position would change to SNAPSHOT_ID.
Good Job! |
Addressed, Thanks for @wwj6591812 reivew, I would add a doc for it. |
+1 |
@@ -74,6 +74,9 @@ default String fullName() { | |||
@Experimental | |||
Optional<Statistics> statistics(); | |||
|
|||
@Experimental | |||
Optional<Statistics> statistics(Long snapshotId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a long
instead of Long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Snapshot latestSnapshot = snapshotManager().latestSnapshot(); | ||
if (latestSnapshot != null) { | ||
return store().newStatsFileHandler().readStats(latestSnapshot); | ||
} | ||
return Optional.empty(); | ||
} | ||
|
||
@Override | ||
public Optional<Statistics> statistics(Long snapshotId) { | ||
if (!snapshotManager().snapshotExists(snapshotId)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just return store().newStatsFileHandler().readStats(latestSnapshot);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should return store().newStatsFileHandler().readStats(latestSnapshot) better. Thanks~ @JingsongLi
@@ -166,14 +166,43 @@ public Identifier identifier() { | |||
|
|||
@Override | |||
public Optional<Statistics> statistics() { | |||
// todo: support time travel | |||
Snapshot latestSnapshot = snapshotManager().latestSnapshot(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just here to respect time travel options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -74,6 +74,9 @@ default String fullName() { | |||
@Experimental | |||
Optional<Statistics> statistics(); | |||
|
|||
@Experimental | |||
Optional<Statistics> statistics(long snapshotId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid introducing this method?
Maybe we can Table.copy('scan.snapshot-id', 'xxx').statistics
to get stats for snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice reminder,from this way can implement time travel for statistic. Had change it.
return statistics; | ||
} | ||
} | ||
latestSnapshotId--; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to find the snapshot with ANALYZE commit. The snapshot will inherit its parent snapshot.
So I said, just return the stats of this snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According the logic maybe we need traversal. Because StatsFileHandler##readStats from a snapshotid,can only get the latest analyzed snapshot,but from the logic statistic file snapshot_id may less than the anaylzed snapshot about it. I added a case to the ut at end which may refer it.
traversal logic do a optimize, before: need from latest snapshot to the target analyzed snapshot, after: change from the nearest snapshot to the target analyzed snapshot. @JingsongLi |
After discuss, change to query by real snapshot id. @JingsongLi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
Scenario:user query statistic table by snapshot-id and analyzed table multiple times. Such as user can do the analyzed at regular time and collect the statistic metric to monitor.
Currently paimon query statistic system table with snapshot_id would error out due to PaimonAnalyzeTableColumnCommand##commitStatistics not keep the real snapshot id the same with statistic snapshot_id,the pr is aimed to fix it.
before fix:
after fix:
Linked issue: close #xxx
Tests
AnalyzeTableTestBase "Paimon analyze: test statistic system table with predicate"
API and Format
Documentation