PySvn decodes unicode in log messages incorrectly on Windows #154

Open
GardenTools opened this issue May 1, 2020 · 0 comments

PySvn incorrectly decodes the text of log messages on Windows, resulting in either junk characters or an exception.

For example, create an svn commit whose log message is “some words”, where the quotes are U+201C (LEFT DOUBLE QUOTATION MARK) and U+201D (RIGHT DOUBLE QUOTATION MARK).

The result is an exception:

Traceback (most recent call last):
File "F:\Program Files\Python37\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "F:\Program Files\Python37\lib\subprocess.py", line 1238, in _readerthread
buffer.append(fh.read())
File "F:\Program Files\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 161: character maps to <undefined>
Exception in thread Thread-13:
Traceback (most recent call last):
File "F:\Program Files\Python37\lib\threading.py", line 926, in _bootstrap_inner
self.run()
File "F:\Program Files\Python37\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "F:\Program Files\Python37\lib\subprocess.py", line 1238, in _readerthread
buffer.append(fh.read())
File "F:\Program Files\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 161: character maps to <undefined>

Traceback (most recent call last):
File "F:\Program Files\Python37\lib\subprocess.py", line 939, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "F:\Program Files\Python37\lib\subprocess.py", line 1288, in _communicate
stdout = stdout[0]
IndexError: list index out of range
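
The UnicodeDecodeError in the traceback can be reproduced without svn or PySvn. The following is a minimal sketch, assuming the log message is the “some words” example and that svn emits it as UTF-8; the helper name `try_decode` is made up for illustration.

```python
# The example log message with curly quotes, as UTF-8 bytes
# (assumption: this is what svn writes to its stdout pipe).
message = "\u201csome words\u201d"
raw = message.encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# The opening quote (e2 80 9c) happens to decode under cp1252, but to junk.
junk = raw[:3].decode("cp1252")

# The closing quote (e2 80 9d) contains 0x9d, which cp1252 leaves unassigned.
cp1252_result = try_decode(raw, "cp1252")
utf8_result = try_decode(raw, "utf-8")

print(repr(junk))
print(repr(cp1252_result))
print(repr(utf8_result))
```

This covers both symptoms from the report: the opening quote turns into junk characters, and the closing quote raises the exception.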

In the call to subprocess.Popen() there is the following:

    self.stdout = io.open(c2pread, 'rb', bufsize)
    if self.text_mode:
        self.stdout = io.TextIOWrapper(self.stdout, encoding=encoding, errors=errors)

For PySvn, text_mode is normally True and encoding is None here, so TextIOWrapper falls back to locale.getpreferredencoding(False), which returns the system encoding (for me this is 'cp1252'). Note that this is not the same as sys.getdefaultencoding(), which is "utf-8". communicate() therefore returns the output of svn decoded as cp1252 rather than utf-8. The byte sequence for U+201D is b'\xe2\x80\x9d', and \x9d has no character assigned in cp1252.
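
One possible way to avoid the locale-dependent decode is to pass an explicit encoding to subprocess.Popen(). The sketch below is not PySvn's code: a child Python process stands in for the svn client, `run_child` and `child_code` are hypothetical names, and it assumes the real svn output is UTF-8 as described above.

```python
import locale
import subprocess
import sys

# Stand-in for the svn client (assumption): a child process that writes
# the UTF-8 bytes of the example log message to its stdout.
child_code = (
    "import sys; "
    "sys.stdout.buffer.write('\\u201csome words\\u201d'.encode('utf-8'))"
)


def run_child(encoding=None):
    """Run the stand-in child in text mode and return its stdout.

    With encoding=None, Popen decodes using the locale's preferred
    encoding, which is the situation described in this issue.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode, as in the PySvn case
        encoding=encoding,
    )
    stdout, _ = proc.communicate()
    return stdout


# The encoding that would be used if none were passed ('cp1252' for the reporter).
print(locale.getpreferredencoding(False))

# Decoding explicitly as UTF-8 does not depend on the system locale.
decoded = run_child(encoding="utf-8")
print(decoded == "\u201csome words\u201d")
```

Calling `run_child()` without an encoding on a cp1252 system should hit the same decode error as in the traceback, whereas the explicit `encoding="utf-8"` call returns the message intact.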
