UTF-8 file names decoding/encoding issue #7

Open
edouardhue opened this issue Oct 1, 2014 · 4 comments

@edouardhue
Member

With @symac's patched CommonsDownloader version of October 1st (see mail discussions), the resulting file names on the filesystem are not properly encoded: two-byte UTF-8 characters (like é), which are properly encoded in the file list (Abbaye Saint-Pierre de Marcilhac-sur-Célé - Eglise.JPG,99999), get translated to two one-byte characters, as in Abbaye_Saint-Pierre_de_Marcilhac-sur-Célé_-_Eglise. On the console, the output is misencoded too, but not in the same way: Downloading Abbaye_Saint-Pierre_de_Marcilhac-sur-C├®l├®_-_Eglise.JPG.

Running with Python 2.7.8 in PowerShell under Windows 8.1 Pro N.
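To illustrate what "two one-byte chars" means here, the following is a minimal sketch, not taken from CommonsDownloader. It encodes the name in UTF-8 and decodes the bytes with two single-byte code pages. The thread does not say which code pages were involved: cp850 is an assumption that happens to reproduce the ├® seen on the console, and cp1252 is only a guess for the file-name case.

```python
# -*- coding: utf-8 -*-
# The name is written with escapes so the sketch does not depend on the
# encoding of this source file.
name = u"C\u00e9l\u00e9"               # "Célé"
utf8_bytes = name.encode("utf-8")      # each é becomes the two bytes C3 A9

# Assumption: the console decodes output with code page 850.
as_cp850 = utf8_bytes.decode("cp850")

# Assumption: the bytes are interpreted in the ANSI code page 1252.
as_cp1252 = utf8_bytes.decode("cp1252")

# In both cases every é has turned into two characters.
print(len(name), len(as_cp850), len(as_cp1252))
```

Under the cp850 assumption, the decoded text matches the C├®l├® reported on the console, which suggests the program is emitting UTF-8 bytes and the mismatch happens when they are interpreted.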

@JeanFred
Member

JeanFred commented Oct 3, 2014

Quick answer on one point:

> With @symac's patched CommonsDownloader version of October 1st (see mail discussions)

@symac's edits were merged in #5 and #6.

@JeanFred
Member

JeanFred commented Oct 3, 2014

> file names on the filesystem are not properly encoded: two bytes UTF-8 chars (like é), that are properly encoded in the file list (Abbaye Saint-Pierre de Marcilhac-sur-Célé - Eglise.JPG,99999), get translated to two one byte chars, like in Abbaye_Saint-Pierre_de_Marcilhac-sur-Célé_-_Eglise

Can’t reproduce with Python 2.7.3 under Ubuntu 12.04. Needs further investigation on Windows.
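One thing that might be worth checking on Windows: under Python 2, file functions given a byte string go through the ANSI code page, whereas a unicode string goes through the wide-character API. The sketch below assumes the file list is UTF-8 and that spaces become underscores, as in the names quoted above. The helper `local_filename` is hypothetical and not part of CommonsDownloader, and whether the tool already does something similar is not established in this thread.

```python
def local_filename(raw_name, encoding="utf-8"):
    """Return a text (unicode) file name for a file-list entry.

    Hypothetical helper: decodes byte strings so that the name can be
    passed to open() as text rather than as raw UTF-8 bytes.
    """
    if isinstance(raw_name, bytes):      # `bytes` is `str` under Python 2
        raw_name = raw_name.decode(encoding)
    return raw_name.replace(u" ", u"_")


# Usage: a UTF-8 encoded entry, as it might be read from the file list.
entry = b"Marcilhac-sur-C\xc3\xa9l\xc3\xa9 - Eglise.JPG"
target = local_filename(entry)
# `target` is now text and could be handed to open(target, "wb").
```

Keeping names as text until the last moment leaves the choice of on-disk encoding to the operating system instead of to whatever bytes the file list contained.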

@JeanFred
Member

JeanFred commented Oct 3, 2014

> On the console, output is misencoded too, but not in the same way : Downloading Abbaye_Saint-Pierre_de_Marcilhac-sur-C├®l├®_-_Eglise.JPG

Can’t reproduce either. Wouldn't that rather be an issue with your terminal not properly supporting Unicode, @edouardhue?
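To help tell a terminal problem from a program problem, a quick diagnostic like the one below could be run on both machines. It is only a sketch: `encoding_report` is a made-up name, and the three values it collects are the standard-library views of the console, filesystem and locale encodings, which may differ between PowerShell on Windows and a terminal on Ubuntu.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python believes are in use (illustrative helper)."""
    return {
        "stdout": getattr(sys.stdout, "encoding", None),  # console output
        "filesystem": sys.getfilesystemencoding(),        # file names
        "preferred": locale.getpreferredencoding(),       # locale default
    }


for key, value in sorted(encoding_report().items()):
    print("%s: %s" % (key, value))
```

If the "stdout" value differs from the encoding of the bytes the program writes, garbled console output would be expected even when the files themselves are fine.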

@edouardhue
Member Author

Garbage on the console is probably a terminal issue: it is the same with http://github.com/edouardhue/commons-downloader. But there I don't have any issue with file names on the filesystem.
