I followed the example from your documentation to convert my .docx file to HTML:
from pydocx import PyDocX
# Pass in a path
html = PyDocX.to_html('file.docx')
# Pass in a file object
html = PyDocX.to_html(open('file.docx', 'rb'))
# Pass in a file-like object
from cStringIO import StringIO
buf = StringIO()
with open('file.docx') as f:
    buf.write(f.read())
html = PyDocX.to_html(buf)
Because I am using Python 3.6, I changed cStringIO to io. However, I always get the same error with my .docx file at the line buf.write(f.read()):
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-24-598c617210d8> in <module>()
10 buf = StringIO()
11 with open('file.docx') as f:
---> 12 buf.write(f.read())
13 html = PyDocX.to_html(buf)
~/anaconda3/lib/python3.6/codecs.py in decode(self, input, final)
319 # decode input (taking the buffer into account)
320 data = self.buffer + input
--> 321 (result, consumed) = self._buffer_decode(data, self.errors, final)
322 # keep undecoded input until the next call
323 self.buffer = data[consumed:]
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 14: invalid start byte
This happens with every .docx file I have tried. Can anybody suggest what is wrong?
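A likely cause, offered as an assumption rather than a confirmed diagnosis: a .docx file is a ZIP archive, i.e. binary data, but open('file.docx') opens it in text mode, so Python 3 tries to decode its bytes with the default encoding (UTF-8, judging from the traceback). io.StringIO also only accepts str, not bytes. The sketch below uses only the standard library and a hypothetical stand-in file (make_sample_docx is an illustrative helper, not part of PyDocX) to reproduce the decode error and to show the binary-mode alternative: open with 'rb' and copy into io.BytesIO.

```python
import io
import os
import tempfile
import zipfile


def make_sample_docx(path):
    """Write a hypothetical stand-in for a .docx file.

    A real .docx is a ZIP archive; this file only mimics that container, with one
    XML part and one binary part (bytes 0..255, which as a whole are not valid UTF-8).
    """
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("word/document.xml", "<w:document/>")
        zf.writestr("word/media/image1.bin", bytes(range(256)))


def text_mode_fails(path):
    """Return True if reading the file as UTF-8 text raises UnicodeDecodeError.

    This mirrors open('file.docx') on a system whose default encoding is UTF-8.
    """
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError:
        return True
    return False


def read_into_buffer(path):
    """Copy the file's raw bytes into an in-memory binary buffer.

    io.BytesIO plays the role that cStringIO.StringIO played for bytes in Python 2.
    """
    buf = io.BytesIO()
    with open(path, "rb") as f:  # "rb": the bytes are read as-is, no decoding
        buf.write(f.read())
    buf.seek(0)  # rewind so the next reader starts at the first byte
    return buf


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.docx")
        make_sample_docx(path)
        print(text_mode_fails(path))  # True
        buf = read_into_buffer(path)
        print(zipfile.ZipFile(buf).namelist())
        # ['word/document.xml', 'word/media/image1.bin']
```

Note that the second documentation example already opens the file with 'rb'. Assuming PyDocX accepts any binary file-like object, as the documentation example suggests, the rewound buffer is what would then be passed on as PyDocX.to_html(buf); that call is not exercised in the sketch.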