Problems parsing CSV with æ, ø and å characters #65
Comments
This seems to work for me (although I admit I have not pushed all of my changes yet). Can you provide an example of how this fails for you?
Hi @brwnx, I have a unit test in place to test for this, but it appears to be passing. Can you provide more information about how you're seeing this fail?
I think I had the same problem. When a name contains special characters, the parser stops at that character for the line. My CSV file came from an Excel export. However, I believe it is Excel that fails to export UTF-8 characters correctly. eg: should have been: So, the fault was with the file Excel created when I used Save As ... CSV.
@skyvalleystudio both of those strings parse correctly with the latest release of the parser.
I tried with the July version and still had the problem (first on the line with bib 148). My test file is here: https://drive.google.com/file/d/0B7DnwOciz86uWWk0UDNXV1IteXM/edit?usp=sharing Download with: I still think Excel is not really saving in Unicode.
Thanks @skyvalleystudio, I'll start working on it. Is this CSV file something that I could check into the repository as part of the unit tests?
Feel free to use the file. I wish I understood character sets better right about now... I work around the problem by exporting to UTF-16 .txt in Excel, then replacing Tab with Comma and renaming the file. The result imports fine with your parser.
It's a file encoding problem. It's coming across the You could work around this by explicitly specifying a different encoding for the file, but I'll try to figure out what the parser is supposed to do.
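For illustration, specifying a different encoding could look roughly like the sketch below. It assumes the file is not valid UTF-8 and that the caller knows (or can guess) which legacy encoding the exporter used. `DecodeCSVData` and `fallbackEncoding` are hypothetical names, not part of the parser; only standard Foundation calls are used. Whether this helps for a particular file depends on how it was actually encoded.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the parser: turn raw file bytes into an
// NSString, preferring UTF-8 and falling back to a caller-chosen encoding.
static NSString *DecodeCSVData(NSData *data, NSStringEncoding fallbackEncoding) {
    // UTF-8 decoding is strict: it returns nil if the bytes are not valid
    // UTF-8, which is what typically happens with a legacy-encoded "æ".
    NSString *text = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (text != nil) {
        return text;
    }
    // Assumption: the caller supplies the encoding the exporter really used,
    // e.g. NSMacOSRomanStringEncoding or NSWindowsCP1252StringEncoding.
    // Returns nil if the bytes are not valid in that encoding either.
    return [[NSString alloc] initWithData:data encoding:fallbackEncoding];
}
```

The decoded string can then be handed to `[csvString CSVComponents]` as in the original report. Note that single-byte encodings accept almost any byte sequence, so a wrong guess tends to produce wrong characters instead of an error.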
Any progress with this? I have the same problem, and just realised I created a duplicate issue report :/ I tried forcing different encodings on the parser; none helped. I have no control over the actual file and have to use it as given. I don't care how long parsing takes, so I would be happy to modify each row in my own code before the parser sees it.
[csvString CSVComponents]; fails when values contain special characters, like æ, ø and å
Thanks