[Bug] Failed to write parquet format #4020

Closed
1 of 2 tasks
liuyehcf opened this issue Aug 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@liuyehcf

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.8.2

Compute Engine

SDK itself

Minimal reproduce step

```java
package org.byconity.paimon;

import org.apache.commons.io.FileUtils;
import org.apache.paimon.catalog.Catalog;
import org.apache.paimon.catalog.CatalogContext;
import org.apache.paimon.catalog.CatalogFactory;
import org.apache.paimon.catalog.Identifier;
import org.apache.paimon.data.BinaryArray;
import org.apache.paimon.data.BinaryArrayWriter;
import org.apache.paimon.data.BinaryWriter;
import org.apache.paimon.data.GenericRow;
import org.apache.paimon.fs.Path;
import org.apache.paimon.options.CatalogOptions;
import org.apache.paimon.options.Options;
import org.apache.paimon.schema.Schema;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.sink.BatchTableCommit;
import org.apache.paimon.table.sink.BatchTableWrite;
import org.apache.paimon.table.sink.BatchWriteBuilder;
import org.apache.paimon.table.sink.CommitMessage;
import org.apache.paimon.types.ArrayType;
import org.apache.paimon.types.DataType;
import org.apache.paimon.types.IntType;
import org.junit.Test;

import java.io.File;
import java.util.List;

public class ParquetWriteTest {

    // Column type: ARRAY<ARRAY<INT NOT NULL> NOT NULL> NOT NULL
    private final DataType intType = new IntType(false);
    private final DataType innerArrayType = new ArrayType(false, intType);
    private final DataType outerArrayType = new ArrayType(false, innerArrayType);

    @Test
    public void test() throws Exception {
        // Start from an empty local filesystem warehouse
        String localPath = "/tmp/paimon_warehouse";
        FileUtils.deleteDirectory(new File(localPath));

        Options options = new Options();
        options.set(CatalogOptions.METASTORE, "filesystem");
        options.set(CatalogOptions.WAREHOUSE, new Path(localPath).toUri().toString());
        CatalogContext context = CatalogContext.create(options);
        Catalog catalog = CatalogFactory.createCatalog(context);

        String dbName = "testDb";
        String tblName = "testTbl";

        catalog.createDatabase(dbName, false);

        Schema.Builder schemaBuilder = Schema.newBuilder();
        schemaBuilder.column("col_nestedarray", outerArrayType);
        // "orc" and "avro" succeed; set this to "parquet" to reproduce the failure
        schemaBuilder.option("file.format", "orc");
        Schema schema = schemaBuilder.build();
        Identifier tableId = Identifier.create(dbName, tblName);
        catalog.createTable(tableId, schema, false);
        Table table = catalog.getTable(tableId);

        // Write a single row containing [[1]] and commit it
        BatchWriteBuilder writeBuilder = table.newBatchWriteBuilder().withOverwrite();
        try (BatchTableWrite write = writeBuilder.newWrite()) {
            GenericRow record = GenericRow.of(generateOuterArray());
            write.write(record);
            List<CommitMessage> messages = write.prepareCommit();
            try (BatchTableCommit commit = writeBuilder.newCommit()) {
                commit.commit(messages);
            }
        }
    }

    // Builds the outer array [[1]]: a single element, which is the inner array
    private Object generateOuterArray() {
        BinaryArray binaryArray = new BinaryArray();
        BinaryArrayWriter writer = new BinaryArrayWriter(binaryArray, 1,
                BinaryArray.calculateFixLengthPartSize(innerArrayType));
        BinaryWriter.ValueSetter valueSetter = BinaryWriter.createValueSetter(innerArrayType);
        valueSetter.setValue(writer, 0, generateInnerArray());
        writer.complete();
        return binaryArray;
    }

    // Builds the inner array [1]
    private Object generateInnerArray() {
        BinaryArray binaryArray = new BinaryArray();
        BinaryArrayWriter writer = new BinaryArrayWriter(binaryArray, 1,
                BinaryArray.calculateFixLengthPartSize(intType));
        BinaryWriter.ValueSetter valueSetter = BinaryWriter.createValueSetter(intType);
        valueSetter.setValue(writer, 0, 1);
        writer.complete();
        return binaryArray;
    }
}
```

The same code works with ORC and Avro but fails with Parquet.

  • To switch formats, change the value in schemaBuilder.option("file.format", "orc");

What doesn't meet your expectations?

Writing a nested complex type (here, the ARRAY<ARRAY<INT>> column col_nestedarray) fails with the Parquet format.
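For context, a nested array is harder for a Parquet writer than a flat column because every leaf value must carry a repetition level and a definition level. The plain-Java sketch below (not Paimon code; `NestedListLevels`, `Entry`, and `levels` are hypothetical names) computes those levels for a value of the reproducer's column type, assuming the three-level LIST layout described in the Parquet format specification. It does not reproduce or diagnose the failure; it only shows what the writer has to produce. For the reproducer's value [[1]], the single leaf gets repetition level 0 and definition level 2.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: repetition/definition levels for ARRAY<ARRAY<INT NOT NULL> NOT NULL> NOT NULL,
 * assuming the standard three-level Parquet LIST layout:
 *
 *   required group col_nestedarray (LIST) {
 *     repeated group list {
 *       required group element (LIST) {
 *         repeated group list {
 *           required int32 element;
 *         }
 *       }
 *     }
 *   }
 *
 * Only the two repeated groups contribute to the levels, so maxRep = maxDef = 2.
 */
public class NestedListLevels {

    /** One leaf slot; value is null when an empty list is recorded instead of an int. */
    record Entry(Integer value, int rep, int def) {}

    static List<Entry> levels(List<List<Integer>> outer) {
        List<Entry> out = new ArrayList<>();
        if (outer.isEmpty()) {
            // Empty outer list: no repeated group below the column is defined.
            out.add(new Entry(null, 0, 0));
            return out;
        }
        for (int i = 0; i < outer.size(); i++) {
            // The first inner list starts the record (rep 0); later ones repeat at level 1.
            int outerRep = (i == 0) ? 0 : 1;
            List<Integer> inner = outer.get(i);
            if (inner.isEmpty()) {
                // The outer repeated group is defined, the inner one is not.
                out.add(new Entry(null, outerRep, 1));
                continue;
            }
            for (int j = 0; j < inner.size(); j++) {
                // The first value of an inner list inherits the outer repetition level;
                // subsequent values repeat at level 2 (the inner list).
                int rep = (j == 0) ? outerRep : 2;
                out.add(new Entry(inner.get(j), rep, 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The value written by the reproducer: [[1]]
        System.out.println(levels(List.of(List.of(1))));
        // A larger value, including an empty inner list: [[1, 2], [3], []]
        System.out.println(levels(List.of(List.of(1, 2), List.of(3), List.of())));
    }
}
```

In the larger example, the definition level is what lets a reader tell the trailing empty inner list (def 1) apart from a real value (def 2).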

Anything else?

Nothing

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@liuyehcf liuyehcf added the bug Something isn't working label Aug 21, 2024
@JingsongLi
Contributor

Could you try master?

@davedwwang

[WeCom screenshot] It works on master.

@JingsongLi
Contributor

@davedwwang 0.9 supports writing multi-level nested types to Parquet.
