I think encoding is the sort of thing that you need to know upfront to read the file properly. Reading it from the header might be an option, but that already requires assuming things about the header, and I think the premise of PyCSVY is not to make any such assumptions. As Alex suggests, if we want to consider it, it will be a separate PR for sure.
Maybe as a validation option: if the user asks for UTF-8 and the header declares a different encoding, raise an error. But I wouldn't use the header as the main source of information about how to load the file.
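A rough sketch of that validation option, in which the encoding the user asks for stays authoritative and the header is only checked against it. It assumes, hypothetically, that the CSVY header is YAML between `---` lines and names its encoding in a top-level `encoding: <name>` entry; `declared_encoding` and `check_encoding` are illustrative names, not part of PyCSVY's API.

```python
import codecs
import re
from typing import Optional

# Hypothetical convention: the CSVY header is YAML between '---' lines and
# names its encoding in a top-level "encoding: <name>" entry.
_ENCODING_LINE = re.compile(rb"^encoding:\s*[\"']?([A-Za-z0-9_.-]+)[\"']?$")


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the YAML header of `data`, or None."""
    lines = data.splitlines()
    if not lines or lines[0].strip() != b"---":
        return None  # the file has no header at all
    for line in lines[1:]:
        line = line.strip()
        if line == b"---":
            break  # end of header reached without a declaration
        match = _ENCODING_LINE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def check_encoding(data: bytes, requested: str) -> None:
    """Raise ValueError if the header declares a different encoding than `requested`.

    The user's choice is what gets used to load the file; the header only
    validates it. An unknown codec name raises LookupError from codecs.lookup.
    """
    declared = declared_encoding(data)
    if declared is None:
        return  # nothing to check against, so trust the caller
    # Compare canonical codec names so that "UTF-8", "utf8" and "utf_8" agree.
    if codecs.lookup(declared).name != codecs.lookup(requested).name:
        raise ValueError(
            f"requested encoding {requested!r}, but the header declares {declared!r}"
        )
```

For a file whose header says `encoding: cp1252`, `check_encoding(raw, "windows-1252")` passes (same codec under another name), while `check_encoding(raw, "utf-8")` raises `ValueError`. Comparing canonical codec names rather than raw strings avoids false mismatches caused purely by spelling.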
I do agree with you, but just to add that people do put the encoding in actual documents sometimes (e.g. you see it in HTML etc.). It's kinda weird, but it works fine if a) the encoding is ASCII-compatible (like `cp1252` or UTF-8) and b) you don't use any non-ASCII chars before the bit where you say what encoding the document uses. Definitely a job for another day though, if indeed we bother at all. (Who wouldn't want to use UTF-8 these days anyway?)

Originally posted by @alexdewar in #124 (comment)