Unicode Error #34

sjb554 · 2016-11-14T17:40:36Z

I am only getting this error once in a while, but it looks like this:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 7: invalid continuation byte

Can this be solved by changing the requirements.txt file? Or, is some other solution appropriate here?

Thanks,
SJB

evidens · 2016-11-16T16:02:38Z

It sounds like the file in question might not be UTF-8. You say it only happens once in a while; do the failing files come from different sources?
When a file is not encoded as UTF-8, many text editors can still detect the encoding and open it. Some, like TextMate, let you 'Save As' UTF-8.
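
If it helps, here is a minimal sketch of how to check whether some bytes are valid UTF-8. The sample word and the Latin-1 guess are assumptions for illustration only; nothing in this thread confirms which encoding the files actually use.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: the text "ARTÍCULO" saved as Latin-1 instead of UTF-8.
# In Latin-1, 'Í' is the single byte 0xcd, the same byte named in the error.
raw = u"ARTÍCULO".encode("latin-1")


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(raw))                           # False
print(decodes_as_utf8(u"ARTÍCULO".encode("utf-8")))   # True

# Assuming the source really is Latin-1, decoding with that codec recovers the text.
text = raw.decode("latin-1")
```

Running a check like this over the downloaded files would show which ones need re-encoding.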

sjb554 · 2016-11-17T07:17:23Z

I discovered it was only letters such as 'Í' and 'Ó'. My files were large, but none so large that I couldn't manually go in and replace them (largest being a little over 1 GB).

I am thinking that if it comes up more frequently, I would need to make some kind of scrubbing program to change 'Ó' to 'O' and so on.
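
A rough sketch of what such a scrubbing step might look like, using only the standard library. The function name `strip_accents` is made up for illustration, and it assumes the text has already been decoded to Unicode. Note that it throws the accents away rather than preserving them.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Replace accented letters with their unaccented base letters ('Ó' -> 'O')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u"Í and Ó"))  # I and O
```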

Thanks for all the great answers,
SJB

evidens · 2016-11-17T15:51:42Z

I would check how you're saving your files. If they're properly encoded as UTF-8, extended character sets should be supported (I'm pretty sure I've tested it with French input in the past).

sjb554 · 2016-11-17T16:06:27Z

That makes sense. My download and save code is not very robust:

```python
import os
import urllib2  # Python 2 module


def save_json(url):
    # Build a file name from the URL: strip punctuation but keep the .json extension.
    filename = (url.replace('/', '').replace(':', '')
                .replace('.', '|').replace('|json', '.json')
                .replace('|JSON', '.json').replace('Json', '.json')
                .replace('|', '').replace('?', '').replace('=', '')
                .replace('&', '').replace('_', '').replace('-', ''))
    path = "C:/xxx/json"
    fullpath = os.path.join(path, filename)
    response = urllib2.urlopen(url)
    webContent = response.read()
    # The downloaded bytes are written out unchanged, whatever their encoding.
    f = open(fullpath, 'w')
    f.write(webContent)
    f.close()


f = open('U:/xxx/url_list.txt')
p = f.read()
url_list = p.split('\n')  # '\n' is the line-break delimiter; change it if needed
for url in url_list:
    save_json(url)
```

evidens · 2016-11-17T16:29:37Z

Use `io.open` as in this example: http://stackoverflow.com/a/14870531. The files are then written out as UTF-8.
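
Roughly, the idea could be sketched like this. The helper name `save_text_as_utf8` and the `source_encoding` default of Latin-1 are assumptions for illustration; the real encoding of the downloaded data would need to be confirmed, for example from the response headers.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def save_text_as_utf8(raw_bytes, fullpath, source_encoding="latin-1"):
    """Decode downloaded bytes and write them back out as UTF-8."""
    # source_encoding is a guess; use whatever the server actually sends.
    text = raw_bytes.decode(source_encoding)
    with io.open(fullpath, "w", encoding="utf-8") as f:
        f.write(text)


# Usage with a hypothetical Latin-1 payload and a temporary file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
save_text_as_utf8(u'{"name": "Ó"}'.encode("latin-1"), sample_path)
with io.open(sample_path, "rb") as f:
    print(f.read() == u'{"name": "Ó"}'.encode("utf-8"))  # True
```

In `save_json`, a call like `save_text_as_utf8(webContent, fullpath)` would take the place of the `open` / `write` / `close` lines.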
