Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

By default match fields by position #80

Merged
merged 2 commits into from
Nov 28, 2024
Merged

Conversation

aykut-bozkurt
Copy link
Collaborator

@aykut-bozkurt aykut-bozkurt commented Nov 26, 2024

We add an option for COPY FROM called match_by [position(default) | name] which determines the match method for the Postgres table columns and Parquet file fields. By default, the match method is position. Match by name is useful when field order differs between the Parquet file and the table, but their names match.

!!IMPORTANT!!: This is a breaking change. Before the PR, we match always by name. This is a bit strict and not common way to match schemas. (e.g. COPY FROM csv at postgres or COPY FROM of duckdb match by field position by default) This is why we match by position by default now and have a COPY FROM option match_by that can be set to name for the old behavior.

Closes #39.

Copy link

codecov bot commented Nov 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.13%. Comparing base (fbaeadb) to head (9541643).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #80      +/-   ##
==========================================
+ Coverage   92.01%   92.13%   +0.11%     
==========================================
  Files          70       71       +1     
  Lines        8879     9009     +130     
==========================================
+ Hits         8170     8300     +130     
  Misses        709      709              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/match_by_position branch 2 times, most recently from ed4907a to e6b10bb Compare November 27, 2024 15:59
We add an option for `COPY FROM` called `match_by_name` which matches Parquet file fields to PostgreSQL table columns
`by their names` rather than `by their order` in the schema. By default, the option is `false`. The option is useful
when field order differs between the Parquet file and the table, but their names match.

**!!IMPORTANT!!**: This is a breaking change. Before the PR, we match always by name. This is a bit strict and not common
way to match schemas. (e.g. COPY FROM csv at postgres or COPY FROM of duckdb match by field position by default)
This is why we match by position by default and have a COPY FROM option `match_by_name` that can be set to true
for the old behaviour.

Closes #39.
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/match_by_position branch from e6b10bb to 4241cae Compare November 27, 2024 16:09
@aykut-bozkurt aykut-bozkurt added the api-change includes breaking changes label Nov 27, 2024
@aykut-bozkurt aykut-bozkurt changed the title Match fields by position via option Match fields by position by default Nov 27, 2024
@aykut-bozkurt aykut-bozkurt changed the title Match fields by position by default By default match fields by position Nov 27, 2024
Copy link
Collaborator

@marcoslot marcoslot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

  • could consider a GUC with the default match_by method
  • maybe add a table to the readme with important changes

@aykut-bozkurt
Copy link
Collaborator Author

Looks good

  • could consider a GUC with the default match_by method
  • maybe add a table to the readme with important changes

lets move it to breaking changes section when we release.

@aykut-bozkurt aykut-bozkurt merged commit fb34b7d into main Nov 28, 2024
6 checks passed
@aykut-bozkurt aykut-bozkurt deleted the aykut/match_by_position branch November 28, 2024 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change includes breaking changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Column name checks are too strict
2 participants