fix escape character #31

Open
wants to merge 1 commit into
base: master

@rabidaudio rabidaudio commented Aug 13, 2021

We got an error like this:

target-snowflake | snowflake.connector.errors.ProgrammingError: 100038 (22018): Numeric value '' is not recognized
target-snowflake |   File '__tmp/warehouseTMP_5418BACF_B9BC_4D7D_B782_494D1B1517D9__efe6f22145d64b83918a750d4ea26768', line 23006, character 2017
target-snowflake |   Row 22932, column "TMP_5418BACF_B9BC_4D7D_B782_494D1B1517D9"["ORDERTOTAL":38]
target-snowflake |   If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

The column and character references in the error turned out to be misleading. What actually happened was that an earlier string column in the same row held the literal value \. This caused the row to serialize as:

123,\,\\N,whatever,123,,\\N

Snowflake interprets the \ in the second column as escaping the comma that follows it, which shifts every later column by one.
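To make the offset concrete, here is a rough sketch that parses the row above twice: once by splitting on commas (what the writer intended) and once with a backslash-aware parser. Python's own csv.reader with escapechar='\\' is only a stand-in for the loader here; I'm assuming it behaves similarly for this row, not claiming it matches Snowflake's parser exactly.

```python
import csv
import io

# The serialized row from the description, kept verbatim as a raw string.
raw_line = r"123,\,\\N,whatever,123,,\\N"

# What the writer intended: seven comma-separated fields.
intended = raw_line.split(",")

# What a backslash-aware parser sees. csv.reader is a stand-in for the
# loader (assumption): the lone backslash escapes the following comma.
parsed = next(csv.reader(io.StringIO(raw_line), escapechar="\\"))

print(len(intended), intended)
print(len(parsed), parsed)
```

The second parse comes back one field short because the second and third columns merge, so a later numeric column ends up receiving a neighbouring column's value.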

Python's csv module didn't escape or quote the value because it uses the default excel dialect, which has no escape character.
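A minimal reproduction of that default behavior, using only the standard library (the row values are made up for illustration):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # no dialect given, so the "excel" dialect is used

# The second field is a single literal backslash.
writer.writerow([123, "\\", "whatever"])
line = buf.getvalue()

print(csv.excel.escapechar)  # the excel dialect defines no escape character
print(repr(line))            # the backslash is written as-is, unquoted
```

Since a backslash is not special to the excel dialect, the writer has no reason to quote or escape it.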

The ideal solution would be to tell Python csv that \ is an escape character (csv.DictWriter(out, csv_headers, escapechar='\\')). This causes the field to be quoted. However, it also causes \\N to be quoted, breaking the null columns.

The best solution I could come up with was to tell it to use \ as an escape character and to escape rather than quote (quoting=csv.QUOTE_NONE). I don't know whether this will have any unintended side effects. It does mean FIELD_OPTIONALLY_ENCLOSED_BY = '"' is no longer necessary, since nothing will ever be quoted; fields with commas or newlines will now be escaped instead. I'm open to other solutions here.
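Here is a small sketch of what the writer produces with those two options. The header names and row values are hypothetical, and the read-back uses Python's csv.reader with matching settings, which only shows that the encoding is self-consistent; it says nothing about how Snowflake loads it.

```python
import csv
import io

csv_headers = ["id", "note", "name"]  # hypothetical column names
row = {"id": 123, "note": "\\", "name": "a,b"}  # lone backslash, embedded comma

out = io.StringIO()
writer = csv.DictWriter(
    out, csv_headers, escapechar="\\", quoting=csv.QUOTE_NONE
)
writer.writerow(row)
escaped_line = out.getvalue()
print(escaped_line)  # backslash and comma are each prefixed with a backslash

# Read the line back with the same escape settings.
reader = csv.reader(
    io.StringIO(escaped_line), escapechar="\\", quoting=csv.QUOTE_NONE
)
round_trip = next(reader)
print(round_trip)
```

Nothing is quoted in the output, which is why the enclosing-quote option stops mattering once this is in place.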
