[core] Introduce PredicateUtils to refactor Predicate handle #4469

Closed
wants to merge 10 commits into from
Changes from 4 commits
@@ -0,0 +1,61 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.paimon.predicate;

import org.apache.paimon.utils.Preconditions;

import java.util.List;
import java.util.function.Consumer;

/** A utils to handle {@link Predicate}. */
public class PredicateUtils {
Contributor

InPredicateVisitor?


/**
* Method for handling with CompoundPredicate.
*
* @param predicate CompoundPredicate to traverse handle
* @param leafName LeafPredicate name
* @param matchConsumer leafName matched handle
* @param unMatchConsumer leafName unmatched handle
*/
public static void traverseCompoundPredicate(
Contributor

extractInElements

Contributor

Just return Optional<List<Object>>, i.e. the elements of the IN predicate.

Member Author

Got it, I'll change it as suggested. Thanks @JingsongLi

Member Author

I'm traveling at the moment; I'll update it within the next two days.

Member Author

@xuzifu666, Nov 9, 2024

I moved this change to a new PR, #4486, so I'm closing this one.

Predicate predicate,
String leafName,
Consumer<Predicate> matchConsumer,
Consumer<Predicate> unMatchConsumer) {
Preconditions.checkState(
Contributor

Would Preconditions.checkArgument be a better fit here?

predicate instanceof CompoundPredicate,
"PredicateUtils##handleCompoundPredicate should handle with a CompoundPredicate.");

Contributor

PredicateUtils.traverseCompoundPredicate only supports processing Predicates of CompoundPredicate type

Member Author

Addressed.
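
On the checkState versus checkArgument question above: in the Guava-style convention, which Paimon's Preconditions appears to follow, a failed argument check raises IllegalArgumentException and a failed state check raises IllegalStateException. The sketch below uses a hypothetical stand-in class, PreconditionsSketch, rather than Paimon's own Preconditions, purely to show that difference.

```java
/** Hypothetical stand-in for a Guava-style Preconditions helper; not Paimon's class. */
final class PreconditionsSketch {
    private PreconditionsSketch() {}

    /** Use when the caller passed a value the method cannot accept. */
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Use when the receiving object is in a state where the call is invalid. */
    static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    /** Returns the simple name of the exception thrown by the action, or "none". */
    static String thrownBy(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // prints "IllegalArgumentException"
        System.out.println(thrownBy(() -> checkArgument(false, "not a CompoundPredicate")));
        // prints "IllegalStateException"
        System.out.println(thrownBy(() -> checkState(false, "not a CompoundPredicate")));
    }
}
```

Because the failing condition here depends only on the predicate parameter, the argument flavour describes the failure more precisely to the caller.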

CompoundPredicate compoundPredicate = (CompoundPredicate) predicate;
List<Predicate> children = compoundPredicate.children();
for (Predicate leaf : children) {
if (leaf instanceof LeafPredicate
&& (((LeafPredicate) leaf).function() instanceof Equal)
&& leaf.visit(LeafPredicateExtractor.INSTANCE).get(leafName) != null
&& matchConsumer != null) {
matchConsumer.accept(leaf);
} else {
if (unMatchConsumer != null) {
unMatchConsumer.accept(leaf);
}
}
}
}
}
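
The review suggestion above to return Optional<List<Object>> could look roughly like the following. This is only a sketch under stated assumptions: SketchPredicate, SketchLeaf, SketchCompound and InElementsSketch are hypothetical simplified stand-ins, not Paimon's Predicate API, and the function names "OR" and "EQUAL" are plain strings chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified stand-in for a predicate tree node. */
interface SketchPredicate {}

/** A leaf such as `fieldName = literal`; the function is a plain string for brevity. */
final class SketchLeaf implements SketchPredicate {
    final String function;
    final String fieldName;
    final Object literal;

    SketchLeaf(String function, String fieldName, Object literal) {
        this.function = function;
        this.fieldName = fieldName;
        this.literal = literal;
    }
}

/** A compound node such as OR(children) or AND(children). */
final class SketchCompound implements SketchPredicate {
    final String function;
    final List<SketchPredicate> children;

    SketchCompound(String function, List<? extends SketchPredicate> children) {
        this.function = function;
        this.children = new ArrayList<SketchPredicate>(children);
    }
}

final class InElementsSketch {
    private InElementsSketch() {}

    /**
     * Returns the literals of an IN filter expressed as OR(field = v1, field = v2, ...),
     * or Optional.empty() if any child is not an equality on the given field.
     */
    static Optional<List<Object>> extractInElements(SketchPredicate predicate, String leafName) {
        if (!(predicate instanceof SketchCompound)) {
            return Optional.empty();
        }
        SketchCompound compound = (SketchCompound) predicate;
        if (!"OR".equals(compound.function)) {
            return Optional.empty();
        }
        List<Object> elements = new ArrayList<>();
        for (SketchPredicate child : compound.children) {
            if (!(child instanceof SketchLeaf)) {
                return Optional.empty();
            }
            SketchLeaf leaf = (SketchLeaf) child;
            if (!"EQUAL".equals(leaf.function) || !leafName.equals(leaf.fieldName)) {
                return Optional.empty();
            }
            elements.add(leaf.literal);
        }
        return Optional.of(elements);
    }

    public static void main(String[] args) {
        SketchPredicate in = new SketchCompound("OR", Arrays.asList(
                new SketchLeaf("EQUAL", "schema_id", 1L),
                new SketchLeaf("EQUAL", "schema_id", 3L)));
        // prints "Optional[[1, 3]]"
        System.out.println(extractInElements(in, "schema_id"));
    }
}
```

With this shape the caller receives either the complete element list or nothing, so it needs neither a match callback nor an unmatched callback.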
@@ -37,6 +37,7 @@
import org.apache.paimon.predicate.LessThan;
import org.apache.paimon.predicate.Or;
import org.apache.paimon.predicate.Predicate;
+ import org.apache.paimon.predicate.PredicateUtils;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.schema.SchemaManager;
import org.apache.paimon.schema.TableSchema;
@@ -205,7 +206,7 @@ private class SchemasRead implements InnerTableRead {

private Optional<Long> optionalFilterSchemaIdMax = Optional.empty();
private Optional<Long> optionalFilterSchemaIdMin = Optional.empty();
- private final List<Long> schemaIds = new ArrayList<>();
+ private List<Long> schemaIds = new ArrayList<>();

public SchemasRead(FileIO fileIO) {
this.fileIO = fileIO;
@@ -221,26 +222,28 @@ public InnerTableRead withFilter(Predicate predicate) {
if (predicate instanceof CompoundPredicate) {
CompoundPredicate compoundPredicate = (CompoundPredicate) predicate;
if ((compoundPredicate.function()) instanceof And) {
- List<Predicate> children = compoundPredicate.children();
- for (Predicate leaf : children) {
- handleLeafPredicate(leaf, leafName);
- }
+ PredicateUtils.traverseCompoundPredicate(
+ predicate,
+ leafName,
+ (Predicate p) -> {
Contributor

Would an expression lambda be better here?

(Predicate p) -> handleLeafPredicate(p, leafName)

+ handleLeafPredicate(p, leafName);
+ },
Contributor

Can the handleLeafPredicate method also be extracted into the util class?

Member Author

I considered that, but the method shares little common logic, and extracting it might take more code, so keeping it as is may be better. Thanks~ @LinMingQiang

+ null);
}

// optimize for IN filter
if ((compoundPredicate.function()) instanceof Or) {
- List<Predicate> children = compoundPredicate.children();
- for (Predicate leaf : children) {
- if (leaf instanceof LeafPredicate
- && (((LeafPredicate) leaf).function() instanceof Equal)
- && leaf.visit(LeafPredicateExtractor.INSTANCE).get(leafName)
- != null) {
- schemaIds.add((Long) ((LeafPredicate) leaf).literals().get(0));
- } else {
- schemaIds.clear();
- break;
- }
- }
+ PredicateUtils.traverseCompoundPredicate(
+ predicate,
+ leafName,
+ (Predicate p) -> {
+ if (schemaIds != null) {
+ schemaIds.add((Long) ((LeafPredicate) p).literals().get(0));
+ }
+ },
+ (Predicate p) -> {
Contributor

(Predicate p) -> schemaIds = null

+ schemaIds = null;
Contributor

Why not schemaIds.clear()?

+ });
}
} else {
handleLeafPredicate(predicate, leafName);
@@ -299,7 +302,7 @@ public RecordReader<InternalRow> createReader(Split split) {
SchemaManager manager = new SchemaManager(fileIO, location, branch);

Collection<TableSchema> tableSchemas;
- if (!schemaIds.isEmpty()) {
+ if (schemaIds != null && !schemaIds.isEmpty()) {
tableSchemas = manager.schemasWithId(schemaIds);
} else {
tableSchemas =
@@ -38,6 +38,7 @@
import org.apache.paimon.predicate.LessThan;
import org.apache.paimon.predicate.Or;
import org.apache.paimon.predicate.Predicate;
+ import org.apache.paimon.predicate.PredicateUtils;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.table.FileStoreTable;
import org.apache.paimon.table.ReadonlyTable;
@@ -208,7 +209,7 @@ private class SnapshotsRead implements InnerTableRead {
private RowType readType;
private Optional<Long> optionalFilterSnapshotIdMax = Optional.empty();
private Optional<Long> optionalFilterSnapshotIdMin = Optional.empty();
- private final List<Long> snapshotIds = new ArrayList<>();
+ private List<Long> snapshotIds = new ArrayList<>();

public SnapshotsRead(FileIO fileIO) {
this.fileIO = fileIO;
@@ -223,26 +224,27 @@ public InnerTableRead withFilter(Predicate predicate) {
String leafName = "snapshot_id";
if (predicate instanceof CompoundPredicate) {
CompoundPredicate compoundPredicate = (CompoundPredicate) predicate;
- List<Predicate> children = compoundPredicate.children();
if ((compoundPredicate.function()) instanceof And) {
- for (Predicate leaf : children) {
- handleLeafPredicate(leaf, leafName);
- }
+ PredicateUtils.traverseCompoundPredicate(
+ predicate,
+ leafName,
+ (Predicate p) -> handleLeafPredicate(p, leafName),
+ null);
}

// optimize for IN filter
if ((compoundPredicate.function()) instanceof Or) {
- for (Predicate leaf : children) {
- if (leaf instanceof LeafPredicate
- && (((LeafPredicate) leaf).function() instanceof Equal)
- && leaf.visit(LeafPredicateExtractor.INSTANCE).get(leafName)
- != null) {
- snapshotIds.add((Long) ((LeafPredicate) leaf).literals().get(0));
- } else {
- snapshotIds.clear();
- break;
- }
- }
+ PredicateUtils.traverseCompoundPredicate(
+ predicate,
+ leafName,
+ (Predicate p) -> {
+ if (snapshotIds != null) {
+ snapshotIds.add((Long) ((LeafPredicate) p).literals().get(0));
+ }
+ },
+ (Predicate p) -> {
Contributor

(Predicate p) -> snapshotIds = null

+ snapshotIds = null;
Contributor

Why not snapshotIds.clear()?

Member Author

@xuzifu666, Nov 7, 2024

I just set it to null, which is more efficient.

+ });
}
} else {
handleLeafPredicate(predicate, leafName);
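
One point behind the clear() versus null exchange above: the removed loop called clear() and then break, but a consumer passed to a traversal helper cannot break out of the helper's loop. With clear() alone, any matching leaf visited after a mismatch would be added again, whereas the null assignment also acts as a "give up" marker. The sketch below shows that effect and one possible alternative that keeps the list final by using a boolean flag. InFilterCollector and its members are hypothetical names, and a Long in the input list stands in for a matching leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; names are illustrative and not taken from Paimon. */
final class InFilterCollector {
    private final List<Long> ids = new ArrayList<>();
    private boolean sawUnsupported = false;

    /** Callback-style traversal: a consumer has no way to stop the loop early. */
    static void traverse(List<Object> leaves, Consumer<Long> onMatch, Consumer<Object> onUnmatch) {
        for (Object leaf : leaves) {
            if (leaf instanceof Long) {
                onMatch.accept((Long) leaf);
            } else {
                onUnmatch.accept(leaf);
            }
        }
    }

    /** clear() on mismatch only: matches that come after the mismatch are still kept. */
    List<Long> collectWithClearOnly(List<Object> leaves) {
        ids.clear();
        traverse(leaves, ids::add, other -> ids.clear());
        return new ArrayList<>(ids);
    }

    /** Flag variant: after the first mismatch nothing more is added and the result is empty. */
    List<Long> collectWithFlag(List<Object> leaves) {
        ids.clear();
        sawUnsupported = false;
        traverse(
                leaves,
                id -> {
                    if (!sawUnsupported) {
                        ids.add(id);
                    }
                },
                other -> sawUnsupported = true);
        if (sawUnsupported) {
            ids.clear();
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InFilterCollector collector = new InFilterCollector();
        List<Object> leaves = Arrays.<Object>asList(1L, "not an equality leaf", 3L);
        System.out.println(collector.collectWithClearOnly(leaves)); // prints "[3]"
        System.out.println(collector.collectWithFlag(leaves));      // prints "[]"
    }
}
```

An empty result here means "fall back to the unfiltered read", which matches how the isEmpty() checks in createReader treat the list.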
@@ -304,7 +306,7 @@ public RecordReader<InternalRow> createReader(Split split) throws IOException {
new SnapshotManager(fileIO, ((SnapshotsSplit) split).location, branch);

Iterator<Snapshot> snapshots;
- if (!snapshotIds.isEmpty()) {
+ if (snapshotIds != null && !snapshotIds.isEmpty()) {
snapshots = snapshotManager.snapshotsWithId(snapshotIds);
} else {
snapshots =
@@ -32,6 +32,7 @@
import org.apache.paimon.predicate.LeafPredicateExtractor;
import org.apache.paimon.predicate.Or;
import org.apache.paimon.predicate.Predicate;
+ import org.apache.paimon.predicate.PredicateUtils;
import org.apache.paimon.reader.RecordReader;
import org.apache.paimon.table.FileStoreTable;
import org.apache.paimon.table.ReadonlyTable;
@@ -191,6 +192,7 @@ private class TagsRead implements InnerTableRead {

private final FileIO fileIO;
private RowType readType;
+ private Map<String, Tag> predicateMap = new TreeMap<>();

public TagsRead(FileIO fileIO) {
this.fileIO = fileIO;
@@ -223,7 +225,6 @@ public RecordReader<InternalRow> createReader(Split split) {
TagManager tagManager = new TagManager(fileIO, location, branch);

Map<String, Tag> nameToSnapshot = new TreeMap<>();
- Map<String, Tag> predicateMap = new TreeMap<>();
if (predicate != null) {
if (predicate instanceof LeafPredicate
&& ((LeafPredicate) predicate).function() instanceof Equal
@@ -239,31 +240,24 @@ public RecordReader<InternalRow> createReader(Split split) {
CompoundPredicate compoundPredicate = (CompoundPredicate) predicate;
// optimize for IN filter
if ((compoundPredicate.function()) instanceof Or) {
- List<Predicate> children = compoundPredicate.children();
- for (Predicate leaf : children) {
- if (leaf instanceof LeafPredicate
- && (((LeafPredicate) leaf).function() instanceof Equal
- && ((LeafPredicate) leaf).literals().get(0)
- instanceof BinaryString)
- && predicate
- .visit(LeafPredicateExtractor.INSTANCE)
- .get(TAG_NAME)
- != null) {
- String equalValue =
- ((LeafPredicate) leaf).literals().get(0).toString();
- if (tagManager.tagExists(equalValue)) {
- predicateMap.put(equalValue, tagManager.tag(equalValue));
- }
- } else {
- predicateMap.clear();
- break;
- }
- }
+ PredicateUtils.traverseCompoundPredicate(
+ predicate,
+ TAG_NAME,
+ (Predicate p) -> {
+ String equalValue =
+ ((LeafPredicate) p).literals().get(0).toString();
+ if (predicateMap != null && tagManager.tagExists(equalValue)) {
+ predicateMap.put(equalValue, tagManager.tag(equalValue));
+ }
+ },
+ (Predicate p) -> {
+ predicateMap = null;
Contributor

Why not predicateMap.clear()?

Member Author

I just set it to null, which is more efficient.

+ });
}
}
}

- if (!predicateMap.isEmpty()) {
+ if (predicateMap != null && !predicateMap.isEmpty()) {
nameToSnapshot.putAll(predicateMap);
} else {
for (Pair<Tag, String> tag : tagManager.tagObjects()) {
@@ -298,7 +298,7 @@ public void testSchemasTable() {
result =
sql(
"SELECT schema_id, fields, partition_keys, "
- + "primary_keys, options, `comment` FROM T$schemas where schema_id>0 and schema_id<3");
+ + "primary_keys, options, `comment` FROM T$schemas where schema_id>0 and schema_id<3 order by schema_id");
assertThat(result.toString())
.isEqualTo(
"[+I[1, [{\"id\":0,\"name\":\"a\",\"type\":\"INT NOT NULL\"},"
@@ -312,7 +312,7 @@ public void testSchemasTable() {
result =
sql(
"SELECT schema_id, fields, partition_keys, "
- + "primary_keys, options, `comment` FROM T$schemas where schema_id in (1, 3)");
+ + "primary_keys, options, `comment` FROM T$schemas where schema_id in (1, 3) order by schema_id");
assertThat(result.toString())
.isEqualTo(
"[+I[1, [{\"id\":0,\"name\":\"a\",\"type\":\"INT NOT NULL\"},"