Unicode Error #34
Comments
It sounds like the file in question might not be UTF-8. You say it only happens once in a while: do the files come from different sources?
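To illustrate the diagnosis above, here is a minimal Python 3 sketch (the thread itself uses Python 2). It assumes the offending file is Latin-1 encoded, which the thread does not confirm; byte 0xcd is 'Í' in Latin-1, while in UTF-8 it is the first byte of a two-byte sequence and must be followed by a continuation byte. The helper name `try_decode` is made up for this example.

```python
# 'Í' encoded as Latin-1 is the single byte 0xcd.
raw = "Í".encode("latin-1") + b"ndice"  # b'\xcdndice'

def try_decode(data):
    """Decode as UTF-8 first; fall back to Latin-1 (hypothetical helper)."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but it is only a guess about the file's real encoding.
        return data.decode("latin-1"), "latin-1"

text, encoding = try_decode(raw)
print(text, encoding)
```

If the fallback branch is taken only for some files, that would suggest the sources do differ in encoding.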
I discovered it was only letters such as 'Í' and 'Ó'. My files were large, but none so large that I couldn't manually go in and replace them (largest being a little over 1 GB). I am thinking that if it comes up more frequently, I would need to make some kind of scrubbing program to change 'Ó' to 'O' and so on. Thanks for all the great answers, SJB
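A scrubbing step like the one described could be sketched as follows in Python 3. This is one possible approach, not something proposed in the thread: it decomposes each character with `unicodedata.normalize` and drops the combining accent marks. The function name `strip_accents` is hypothetical, and the text must already be decoded correctly for this to work.

```python
import unicodedata

def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    # NFD splits 'Ó' into 'O' plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except the combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Índice Óptico"))
```

Note that this discards information, so fixing the encoding is usually preferable to scrubbing when the accents matter.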
I would look at how you're saving your files. If they're properly encoded in UTF-8, it should support extended character sets (I'm pretty sure I've tested it with French input in the past).
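If the files turn out to be saved in another encoding, one option is to rewrite them as UTF-8 instead of editing them by hand. The Python 3 sketch below assumes the source encoding is Latin-1, which is a guess and not confirmed in the thread; `transcode_to_utf8` and its parameters are names invented for this example. It streams line by line, so a file of about 1 GB does not need to fit in memory.

```python
import io

def transcode_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Copy a text file, re-encoding it from src_encoding to UTF-8."""
    # newline="" disables newline translation so line endings are preserved.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src, \
         io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Writing to a separate destination path keeps the original file intact in case the assumed source encoding is wrong.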
That makes sense. My download and save code is not very robust:

    def save_json(url):
        import os
        filename = url.replace('/','').replace(':','').replace('.','|').replace('|json','.json').replace('|JSON','.json').replace('Json','.json').replace('|','').replace('?','').replace('=','').replace('&','').replace('_','').replace('-','')
        path = "C:/xxx/json"
        fullpath = os.path.join(path, filename)
        import urllib2
        response = urllib2.urlopen(url)
        webContent = response.read()
        f = open(fullpath, 'w')
        f.write(webContent)
        f.close()

    f = open('U:/xxx/url_list.txt')
    p = f.read()
    url_list = p.split('\n')  # separator was lost in the post; '\n' is assumed
    for url in url_list:
        save_json(url)

Use
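Independently of whatever the truncated reply went on to recommend, a more encoding-aware save step might look like the Python 3 sketch below. It is an assumption-laden illustration: it trusts the charset declared in the HTTP response headers when there is one, then tries UTF-8, and finally falls back to Latin-1 as a guess. `to_utf8_bytes` is a hypothetical helper, and `save_json` here takes the destination path as a second argument, unlike the posted version.

```python
from urllib.request import urlopen

def to_utf8_bytes(raw, declared_charset=None):
    """Decode raw bytes and return them re-encoded as UTF-8."""
    for enc in (declared_charset, "utf-8"):
        if enc:
            try:
                return raw.decode(enc).encode("utf-8")
            except (UnicodeDecodeError, LookupError):
                pass  # wrong or unknown charset; try the next candidate
    # Last resort: Latin-1 never fails, but it is only a guess.
    return raw.decode("latin-1").encode("utf-8")

def save_json(url, fullpath):
    """Download url and store the body at fullpath as UTF-8."""
    with urlopen(url) as response:
        raw = response.read()
        charset = response.headers.get_content_charset()
    # Binary mode avoids any platform-dependent newline translation.
    with open(fullpath, "wb") as f:
        f.write(to_utf8_bytes(raw, charset))
```

Saving every download as UTF-8 at this point means the later reading step only ever has to handle one encoding.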
I am only getting this error once in a while, but it looks like this:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcd in position 7: invalid continuation byte
Can this be solved by changing the requirements.txt file? Or, is some other solution appropriate here?
Thanks,
SJB