Include encoding information in the header #163

dalonsoa · 2024-12-13T06:35:17Z

I think encoding is the sort of thing that you need to know upfront to read the file properly. Reading it from the header might be an option, but it requires already assuming things about the header, and I think the premise of PyCSVY is not to make any assumptions. As Alex suggests, if we want to consider it, it will be a separate PR for sure.

I do agree with you, but just to add that people do sometimes put the encoding in the actual document (e.g. you see it in HTML). It's kinda weird, but it works fine if a) the encoding is ASCII-compatible (like cp1252 or UTF-8) and b) you don't use any non-ASCII chars before the bit where you say what encoding the document uses. Definitely a job for another day though, if indeed we bother at all. (Who wouldn't want to use UTF-8 these days anyway?)

Originally posted by @alexdewar in #124 (comment)
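To make the HTML-style approach a bit more concrete, here is a minimal sketch of sniffing a declared encoding from a CSVY header before decoding the rest of the file. It assumes a hypothetical `encoding:` key in the YAML header (CSVY itself doesn't define one) and the usual `---` header markers. `sniff_declared_encoding` is an illustrative name, not part of PyCSVY. Because it scans raw bytes, it only works when the file's encoding is ASCII-compatible, which is condition a) above.

```python
import codecs
import re
from typing import Optional

# Hypothetical header key: the CSVY format does not define an "encoding" field.
_ENCODING_RE = re.compile(rb"^encoding:\s*['\"]?([A-Za-z0-9_.:-]+)['\"]?\s*$")


def sniff_declared_encoding(raw: bytes, marker: bytes = b"---") -> Optional[str]:
    """Return the encoding declared in the YAML header, or None if there is none.

    The scan works on raw bytes, so it assumes the file's encoding is
    ASCII-compatible (e.g. UTF-8 or cp1252): the markers and the key must
    appear as plain ASCII bytes.
    """
    lines = raw.splitlines()
    if not lines or lines[0].strip() != marker:
        return None  # no header at all
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == marker:
            break  # reached the end of the header without a declaration
        match = _ENCODING_RE.match(stripped)
        if match:
            name = match.group(1).decode("ascii")
            # Normalise aliases such as "utf8" -> "utf-8"; raises LookupError
            # if Python doesn't know the codec.
            return codecs.lookup(name).name
    return None


raw = b"---\nencoding: utf8\nname: demo\n---\na,b\n1,2\n"
declared = sniff_declared_encoding(raw)  # "utf-8"
text = raw.decode(declared or "utf-8")  # fall back to UTF-8 if nothing is declared
```

If nothing is declared, a reader along these lines would simply fall back to whatever encoding the user passed (e.g. UTF-8, the default set in #124).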


dalonsoa · 2024-12-13T06:53:13Z

Maybe it could be a validation option: if the user asks for UTF-8 and the header declares a different encoding, raise an error. But I'd not use it as the main source of information about how to load the file.
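A rough sketch of that validation-only idea: the user's encoding is still what's used to read the file, and the header value is only compared against it. `check_declared_encoding` and `EncodingMismatchError` are hypothetical names, and `declared` is assumed to come from a hypothetical `encoding` key in the already-parsed header.

```python
import codecs
from typing import Optional


class EncodingMismatchError(ValueError):
    """Hypothetical error: the header declares a different encoding from the one requested."""


def check_declared_encoding(requested: str, declared: Optional[str]) -> None:
    """Compare the encoding the user asked for with the one the header declares.

    The header is never used to choose how to read the file; it is only
    consulted to catch a mismatch.
    """
    if declared is None:
        return  # nothing declared, nothing to validate
    # Compare canonical codec names so aliases like "UTF-8" and "utf8" agree.
    if codecs.lookup(requested).name != codecs.lookup(declared).name:
        raise EncodingMismatchError(
            f"file was read as {requested!r} but the header declares {declared!r}"
        )
```

Normalising both names with `codecs.lookup` means aliases like `UTF-8` and `utf8` don't trigger a spurious mismatch.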

dalonsoa mentioned this issue Dec 13, 2024

Set UTF-8 as default encoding when reading and writing #124

Merged

dalonsoa added the `question` (Further information is requested) label Dec 20, 2024
