Split does not do proper thing on lines with escaped newlines #228

Open
cliu587 opened this issue Mar 12, 2016 · 0 comments

Comments

@cliu587
Contributor

cliu587 commented Mar 12, 2016

Tracking an issue introduced in https://github.com/coursera/dataduct/pull/227/files
If you set the split property of an extract-rds step to anything other than the default value of 1, the output is split incorrectly for rows whose columns contain strings with newlines.

This is because we use the Unix split command, which cannot handle escaped newlines: it treats every newline as a row boundary. I think it might be possible to fix this by replacing escaped newlines with a token character before splitting and restoring them afterwards.
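A rough sketch of the token idea in Python, not the actual dataduct code. It assumes an escaped newline is a backslash immediately followed by a literal newline, and that the placeholder byte never occurs in the data. `TOKEN`, `protect_newlines`, `restore_newlines`, and `split_rows` are hypothetical names made up for illustration. A field that ends in an escaped backslash right before a real row boundary would still be misread, so a real fix would need to handle that case.

```python
# Hypothetical placeholder: ASCII unit separator, assumed absent from the data.
TOKEN = "\x1f"


def protect_newlines(text, token=TOKEN):
    """Replace each backslash-escaped newline with the placeholder token."""
    if token in text:
        raise ValueError("placeholder token already occurs in the data")
    return text.replace("\\\n", token)


def restore_newlines(text, token=TOKEN):
    """Turn placeholder tokens back into backslash-escaped newlines."""
    return text.replace(token, "\\\n")


def split_rows(text, parts):
    """Split text into at most `parts` chunks without cutting a row in half."""
    rows = protect_newlines(text).split("\n")
    if rows and rows[-1] == "":
        rows.pop()  # a trailing newline leaves one empty element at the end
    size = max(1, -(-len(rows) // parts))  # ceiling division
    chunks = []
    for start in range(0, len(rows), size):
        chunk = "\n".join(rows[start:start + size]) + "\n"
        chunks.append(restore_newlines(chunk))
    return chunks


if __name__ == "__main__":
    # Row 2 contains an escaped newline inside its string column.
    data = "1\tfoo\n2\tline one\\\nline two\n3\tbar\n"
    for i, chunk in enumerate(split_rows(data, 3)):
        print(i, repr(chunk))
```

The same substitution could in principle be done around the existing split command (for example with a stream filter before and after it); the sketch does the splitting in Python only to stay self-contained.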
