[parquet] Support using file index result to filter row ranges #4780

Tan-JiaLiang · 2024-12-26T06:17:09Z

Purpose

Support using file index result to filter row ranges.

Tests

org.apache.paimon.table.AppendOnlyFileStoreTableTest#testBitmapIndexResultFilterParquetRowRanges

API and Format

No.

Documentation

No.

JingsongLi · 2024-12-26T14:53:55Z

Hi @Tan-JiaLiang , can you show some test results after index pushing down?

Tan-JiaLiang · 2024-12-27T06:54:46Z

I test in a 20 fields table, generate 1000000 rows, using the BSI index to test the EQ predicate.

generate random values bound in [0, 1000) for BSI index field, and test predicate EQ with a random value（index result was no filter any ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
without-bsi-index-1000-744	3387 / 3565	885.7	1129.1	1.0X
without-bsi-index-1000-441	3521 / 3599	852.1	1173.5	1.0X
without-bsi-index-1000-778	3389 / 3485	885.3	1129.6	1.0X
with-bsi-index-1000-744	3425 / 3604	875.9	1141.7	1.0X
with-bsi-index-1000-441	3305 / 3449	907.7	1101.7	1.0X
with-bsi-index-1000-778	3330 / 3459	901.0	1109.9	1.0X

generate random values bound in [0, 10000) for BSI index field, and test predicate EQ with a random value（index result maybe more filter 4-5 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
without-bsi-index-10000-7069	3471 / 3608	864.4	1156.8	1.0X
without-bsi-index-10000-9186	3329 / 3467	901.2	1109.6	1.0X
without-bsi-index-10000-9679	3369 / 3452	890.5	1122.9	1.0X
with-bsi-index-10000-7069	2992 / 3159	1002.8	997.2	1.2X
with-bsi-index-10000-9186	3074 / 3174	976.1	1024.5	1.1X
with-bsi-index-10000-9679	3123 / 3262	960.5	1041.1	1.1X

generate random values bound in [0, 20000) for BSI index field, and test predicate EQ with a random value（index result maybe more filter 10-20 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
read-without-bsi-index-20000-18921	3212 / 3421	933.9	1070.8	1.0X
read-without-bsi-index-20000-4309	3246 / 3403	924.3	1081.9	1.0X
read-without-bsi-index-20000-14889	3421 / 3439	877.1	1140.2	0.9X
read-with-bsi-index-20000-18921	2054 / 2129	1460.8	684.6	1.6X
read-with-bsi-index-20000-4309	2310 / 2494	1298.7	770.0	1.4X
read-with-bsi-index-20000-14889	2920 / 3072	1027.5	973.2	1.1X

values bound in [0, 50000), and test predicate EQ with a random value（index result maybe more filter 30-35 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
read-without-bsi-index-50000-469	3324 / 3398	902.4	1108.1	1.0X
read-without-bsi-index-50000-7187	3437 / 3493	872.9	1145.7	1.0X
read-without-bsi-index-50000-1208	3404 / 3470	881.3	1134.6	1.0X
read-with-bsi-index-50000-469	1451 / 1518	2068.2	483.5	2.3X
read-with-bsi-index-50000-7187	1206 / 1259	2487.3	402.0	2.8X
read-with-bsi-index-50000-1208	917 / 1044	3270.8	305.7	3.6X

generate random values bound in [0, 100000) for BSI index field, and test predicate EQ with a random value（index result maybe more filter 35-45 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
read-without-bsi-index-100000-99543	3414 / 3497	878.7	1138.0	1.0X
read-without-bsi-index-100000-78972	3390 / 3420	885.0	1130.0	1.0X
read-without-bsi-index-100000-82689	3316 / 3406	904.6	1105.5	1.0X
read-with-bsi-index-100000-99543	797 / 867	3763.9	265.7	4.3X
read-with-bsi-index-100000-78972	544 / 613	5516.6	181.3	6.3X
read-with-bsi-index-100000-82689	733 / 822	4091.1	244.4	4.7X

generate random values bound in [0, 200000) for BSI index field, and test predicate EQ with a random value（index result maybe more filter 35-45 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
read-without-bsi-index-200000-113323	3393 / 3419	884.2	1130.9	1.0X
read-without-bsi-index-200000-124857	3409 / 3440	880.1	1136.2	1.0X
read-without-bsi-index-200000-13588	3341 / 3405	898.0	1113.5	1.0X
read-with-bsi-index-200000-113323	164 / 173	18273.7	54.7	20.7X
read-with-bsi-index-200000-124857	264 / 295	11347.3	88.1	12.8X
read-with-bsi-index-200000-13588	260 / 279	11540.2	86.7	13.1X

generate random values bound in [0, 500000) for BSI index field, and test predicate EQ with a random value（index result maybe more filter 35-45 ranges）

read	Best/Avg Time(ms)	Row Rate(K/s)	Per Row(ns)	Relative
read-without-bsi-index-500000-222300	3260 / 3507	920.2	1086.8	1.0X
read-without-bsi-index-500000-211083	3427 / 3640	875.3	1142.5	1.0X
read-without-bsi-index-500000-172343	3530 / 3545	850.0	1176.5	0.9X
read-with-bsi-index-500000-222300	188 / 208	15969.4	62.6	17.4X
read-with-bsi-index-500000-211083	226 / 244	13271.6	75.3	14.4X
read-with-bsi-index-500000-172343	356 / 379	8426.1	118.7	9.2X

Tan-JiaLiang · 2024-12-27T06:58:01Z

This result was testing after #4765 fixed.

Tan-JiaLiang · 2024-12-27T07:47:22Z

It occurred to me that we could use the bitmap result to narrow down the row ranges, which might skip some deserialisation overhead. Be aware that this may cause fragmentation of the row ranges, but I think the row ranges themselves are limited, it doesn't take up much memory.

Tan-JiaLiang · 2024-12-27T09:37:31Z

@JingsongLi Can you help to trigger the build again? The Container was startup timeout in E2E test.

JingsongLi · 2024-12-28T13:57:40Z

Hi @Tan-JiaLiang thanks for your benchmark, what did you compare in the benchmark? Before and after this PR? Is that so, with or without an index?

JingsongLi

+1

Tan-JiaLiang · 2024-12-30T02:22:52Z

Yes, I compare with the EQ predicate in enable or disable the option file-index.read.enabled.

Tan-JiaLiang · 2024-12-30T02:26:25Z

This is the Benchmark code

/** Benchmark for table read. */
public class BitmapIndexPushDownBenchmark {

    private static final int VALUE_COUNT = 20;

    private final int rowCount = 1000000;
    @TempDir java.nio.file.Path tempFile;

    private final RandomDataGenerator random = new RandomDataGenerator();

    @Test
    public void testParquetRead() throws Exception {
        System.out.println(tempFile);
        int[] bounds = new int[] {1000, 10000, 50000, 100000};
        for (int bound : bounds) {
            Table table = prepareData(bound, parquet(), "parquet_" + bound);
            Map<String, Table> tables = new LinkedHashMap<>();
            tables.put("without-bsi-index", table.copy(Collections.singletonMap("file-index.read.enabled", "false")));
            tables.put("with-bsi-index", table.copy(Collections.singletonMap("file-index.read.enabled", "true")));

            int[] values = new int[5];
            for (int i = 0; i < values.length; i++) {
                values[i] = random.nextInt(0, bound);
            }
            innerTest(tables, bound, values);
        }
    }

    private Options parquet() {
        Options options = new Options();
        options.set(CoreOptions.FILE_FORMAT, CoreOptions.FILE_FORMAT_PARQUET);
        options.set("file-index.bsi.columns", "k");
        return options;
    }

    private void innerTest(Map<String, Table> tables, int bound, int[] values) {
        int readTime = 3;
        Benchmark benchmark =
                new Benchmark("read", readTime * rowCount)
                        .setNumWarmupIters(1)
                        .setOutputPerIteration(true);

        for (String name : tables.keySet()) {
            for (int value : values) {
                benchmark.addCase(
                        "read-" + name + "-" + bound + "-" + value,
                        3,
                        () -> {
                            Table table = tables.get(name);
                            Predicate predicate = new PredicateBuilder(table.rowType()).equal(0, value);
                            for (int i = 0; i < readTime; i++) {
                                List<Split> splits = table.newReadBuilder().newScan().plan().splits();
                                AtomicLong readCount = new AtomicLong(0);
                                try {
                                    for (Split split : splits) {
                                        RecordReader<InternalRow> reader =
                                                table.newReadBuilder()
                                                        .withFilter(predicate)
                                                        .newRead()
                                                        .createReader(split);
                                        reader.forEachRemaining(row -> readCount.incrementAndGet());
                                    }
                                    System.out.printf("Finish read %d rows.\n", readCount.get());
                                } catch (Exception e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        });
            }
        }
        benchmark.run();
    }

    private Table prepareData(int bound, Options options, String tableName) throws Exception {
        Table table = createTable(options, tableName);
        StreamWriteBuilder writeBuilder = table.newStreamWriteBuilder();
        StreamTableWrite write = writeBuilder.newWrite();
        StreamTableCommit commit = writeBuilder.newCommit();
        AtomicInteger writeCount = new AtomicInteger(0);
        for (int i = 0; i < rowCount; i++) {
            try {
                write.write(newRandomRow(bound));
                writeCount.incrementAndGet();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        List<CommitMessage> commitMessages = write.prepareCommit(true, 1);
        commit.commit(1, commitMessages);

        write.close();
        return table;
    }

    protected Table createTable(Options tableOptions, String tableName)
            throws Exception {
        Options catalogOptions = new Options();
        catalogOptions.set(CatalogOptions.WAREHOUSE, tempFile.toUri().toString());
        Catalog catalog = CatalogFactory.createCatalog(CatalogContext.create(catalogOptions));
        String database = "default";
        catalog.createDatabase(database, true);

        List<DataField> fields = new ArrayList<>();
        fields.add(new DataField(0, "k", new IntType()));
        for (int i = 1; i <= VALUE_COUNT; i++) {
            fields.add(new DataField(i, "f" + i, DataTypes.STRING()));
        }
        Schema schema =
                new Schema(fields, Collections.emptyList(), Collections.emptyList(), tableOptions.toMap(), "");
        Identifier identifier = Identifier.create(database, tableName);
        catalog.createTable(identifier, schema, false);
        return catalog.getTable(identifier);
    }

    protected InternalRow newRandomRow(int bound) {
        GenericRow row = new GenericRow(1 + VALUE_COUNT);
        row.setField(0, random.nextInt(0, bound));
        for (int i = 1; i <= VALUE_COUNT; i++) {
            row.setField(i, BinaryString.fromString(random.nextHexString(32)));
        }
        return row;
    }
}

Tan-JiaLiang added 3 commits December 26, 2024 14:07

[parquet] Support using file index result to filter row ranges

111f5b5

mvn spotless:apply

3ebff7c

fix java docs

ec88dd1

Tan-JiaLiang added 2 commits December 27, 2024 15:35

using bitmap index result to narrow the row ranges

a13ee8e

mvn spotless:apply

032f10f

JingsongLi approved these changes Dec 30, 2024

View reviewed changes

JingsongLi merged commit 43dff3a into apache:master Dec 30, 2024
11 of 12 checks passed

JingsongLi pushed a commit that referenced this pull request Dec 30, 2024

[parquet] Support using file index result to filter row ranges (#4780)

049a5ae

Tan-JiaLiang mentioned this pull request Dec 31, 2024

[parquet] Fix file index result filter the row ranges missing rowgroup offset problem #4806

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[parquet] Support using file index result to filter row ranges #4780

[parquet] Support using file index result to filter row ranges #4780

Tan-JiaLiang commented Dec 26, 2024

JingsongLi commented Dec 26, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

JingsongLi commented Dec 28, 2024

JingsongLi left a comment

Tan-JiaLiang commented Dec 30, 2024

Tan-JiaLiang commented Dec 30, 2024

[parquet] Support using file index result to filter row ranges #4780

[parquet] Support using file index result to filter row ranges #4780

Conversation

Tan-JiaLiang commented Dec 26, 2024

Purpose

Tests

API and Format

Documentation

JingsongLi commented Dec 26, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

Tan-JiaLiang commented Dec 27, 2024

JingsongLi commented Dec 28, 2024

JingsongLi left a comment

Choose a reason for hiding this comment

Tan-JiaLiang commented Dec 30, 2024

Tan-JiaLiang commented Dec 30, 2024