Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[format] Format prefix in option keys are processed in each format #4245

Merged
merged 3 commits into from
Sep 25, 2024

Conversation

tsreaper
Copy link
Contributor

Purpose

Currently, format prefix in option keys are removed before sending into the constructor of each format. However, most formats (orc, parquet) still recover these prefixes in their code, so it would be better to let the format itself deal with these prefixes.

Also, we're going to introduce a special compacted changelog format, which wraps other file formats. However with the current code, each format can only see the options related to itself. If option keys are processed in each format, we can see options for other formats and keep what we need.

This PR process option keys in each format.

Tests

Existing tests should cover this change.

API and Format

No format changes.

Documentation

No new feature.


public AvroFileFormat(FormatContext context) {
super(IDENTIFIER);
this.context = context;

this.options = getIdentifierPrefixOptions(context.options(), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also modify AVRO_OUTPUT_CODEC and AVRO_ROW_NAME_MAPPING to add prefix?

String prefix = formatIdentifier.toLowerCase() + ".";
for (String key : options.keySet()) {
if (key.toLowerCase().startsWith(prefix)) {
String substr = key.substring(prefix.length());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change substr to another better name?

@@ -62,11 +62,14 @@ public class AvroFileFormat extends FileFormat {
public static final ConfigOption<Map<String, String>> AVRO_ROW_NAME_MAPPING =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In AvroFileFormat, IDENTIFIER && AVRO_OUTPUT_CODEC && AVRO_ROW_NAME_MAPPING should be private.

public static FileFormat fromIdentifier(String identifier, Options options) {
return fromIdentifier(identifier, new FormatContext(options, 1024, 1024));
return fromIdentifier(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This called method fromIdentifier should be private.

@@ -47,8 +47,8 @@ public void testAbsent() {
@Test
public void testPresent() {
Options options = new Options();
options.setString("haha", "1");
options.setString("compress", "zlib");
options.setString("orc.haha", "1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change haha to other normal orc option?

@wwj6591812
Copy link
Contributor

wwj6591812 commented Sep 24, 2024

Good work!
After download your code in my computer, I have found no big problems.
Already add some small suggests.

@JingsongLi
Copy link
Contributor

+1

@JingsongLi JingsongLi merged commit 9e16ae6 into apache:master Sep 25, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants