Include encoding information in the header #163

Open
dalonsoa opened this issue Dec 13, 2024 · 1 comment
Labels
question Further information is requested

Comments

dalonsoa (Collaborator) commented Dec 13, 2024

I think encoding is the sort of thing that you need to know upfront to read the file properly. Reading it from the header might be an option, but that already requires assuming things about the header, and I think the premise of PyCSVY is not to make any assumptions. As Alex suggests, if we want to consider it, it will be a separate PR for sure.

I do agree with you, but just to add that people do put the encoding in actual documents sometimes (e.g. you see it in HTML etc.). It's kinda weird, but it works fine if a) the encoding is ASCII-compatible (like cp1252 or UTF-8) and b) you don't use any non-ASCII chars before the bit where you say what encoding the document uses. Definitely a job for another day though, if indeed we bother at all. (Who wouldn't want to use UTF-8 these days anyway?)

Originally posted by @alexdewar in #124 (comment)
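
To make the two conditions in the quoted comment concrete, here is a minimal sketch of reading an encoding declaration from the raw bytes of a file. It is not part of PyCSVY: the `encoding:` header key, the function name `sniff_declared_encoding`, and the 4096-byte limit are all assumptions made for illustration.

```python
import codecs
import re
from typing import Optional

# Hypothetical header key: the thread does not say how the encoding would be declared.
ENCODING_LINE = re.compile(
    rb"^\s*encoding\s*:\s*([A-Za-z0-9._-]+)\s*$", re.MULTILINE
)


def sniff_declared_encoding(raw: bytes, limit: int = 4096) -> Optional[str]:
    """Return the encoding named in the first `limit` bytes, or None.

    Works only under the two conditions from the comment above:
    (a) the real encoding is ASCII-compatible, so the declaration itself
        is stored as plain ASCII bytes, and
    (b) no non-ASCII byte appears before the declaration.
    """
    head = raw[:limit]
    match = ENCODING_LINE.search(head)
    if match is None:
        # No declaration found (this is also the outcome for UTF-16 and
        # other encodings that are not ASCII-compatible).
        return None
    if not head[: match.start()].isascii():
        # Condition (b) is violated, so the declaration cannot be trusted.
        return None
    name = match.group(1).decode("ascii")
    try:
        # Normalise aliases such as "UTF8" to Python's canonical codec name.
        return codecs.lookup(name).name
    except LookupError:
        return None


raw = b"---\nauthor: someone\nencoding: cp1252\n---\nname,city\nJos\xe9,M\xe1laga\n"
declared = sniff_declared_encoding(raw)
text = raw.decode(declared or "utf-8")
```

Searching the bytes directly, rather than decoding first, avoids having to guess an encoding just to find the declaration.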

@dalonsoa (Collaborator, Author)

Maybe as a validation option: if the user asks for UTF-8 and the header declares a different encoding, raise an error. But I would not use the header as the main source of information about how to load the file.
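
A rough sketch of what such a validation check could look like is below. Nothing here is PyCSVY API: `EncodingMismatchError` and `validate_encoding` are hypothetical names, and the declared encoding is assumed to have already been extracted from the header by some other means.

```python
import codecs
from typing import Optional


class EncodingMismatchError(ValueError):
    """Hypothetical error type, not part of PyCSVY."""


def validate_encoding(requested: str, declared: Optional[str]) -> None:
    """Raise if the header declares an encoding other than the requested one.

    The requested encoding stays the source of truth for reading the file;
    the declared value is used only as a consistency check.
    """
    if declared is None:
        # Nothing declared in the header, so there is nothing to validate.
        return
    # codecs.lookup normalises aliases ("UTF8", "utf_8", "utf-8") to one name.
    # An unknown codec name raises LookupError, which is left to propagate.
    requested_name = codecs.lookup(requested).name
    declared_name = codecs.lookup(declared).name
    if requested_name != declared_name:
        raise EncodingMismatchError(
            f"requested encoding {requested_name!r} does not match "
            f"the encoding declared in the header, {declared_name!r}"
        )


validate_encoding("utf-8", "UTF8")  # same codec under a different alias: passes
validate_encoding("utf-8", None)    # no declaration: passes
```

Comparing normalised codec names rather than raw strings keeps a harmless spelling difference from being reported as a mismatch.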

@dalonsoa dalonsoa added the question Further information is requested label Dec 20, 2024