Keep running into utf-8 errors in Python 3.x #175

Closed
amelio-vazquez-reina opened this issue Sep 28, 2016 · 4 comments

@amelio-vazquez-reina

qds.py --token=$QUBOLE_KEY_AVAZQUEZ hivecmd getresult "37309793" > my_output

Fails with:

Traceback (most recent call last):
  File "/Users/amelio/anaconda/envs/py35/bin/qds.py", line 604, in <module>
    sys.exit(main())
  File "/Users/amelio/anaconda/envs/py35/bin/qds.py", line 557, in main
    return cmdmain(a0, args)
  File "/Users/amelio/anaconda/envs/py35/bin/qds.py", line 195, in cmdmain
    return globals()[action + "action"](cmdclass, args)
  File "/Users/amelio/anaconda/envs/py35/bin/qds.py", line 161, in getresultaction
    return _getresult(cmdclass, cmd)
  File "/Users/amelio/anaconda/envs/py35/bin/qds.py", line 119, in _getresult
    cmd.get_results(sys.stdout, delim='\t')
  File "/Users/amelio/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 245, in get_results
    skip_data_avail_check=isinstance(self, PrestoCommand))
  File "/Users/amelio/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 1345, in _download_to_local
    _read_iteratively(one_path, fp, delim=delim)
  File "/Users/amelio/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 1236, in _read_iteratively
    fp.buffer.write(data.decode('utf-8').replace(chr(1), delim).encode('utf8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 8191: unexpected end of data
$ pip freeze | grep qds
qds-sdk==1.9.4
$ python --version
Python 3.5.2 :: Continuum Analytics, Inc.
@amelio-vazquez-reina
Author

The same command works without problems in Python 2.7.12.

mcarlsen pushed a commit to mcarlsen/qds-sdk-py that referenced this issue Mar 28, 2017
@mcarlsen
Contributor

mcarlsen commented Mar 28, 2017

Because the results are read in blocks of 8192 bytes, a multibyte UTF-8 character can be cut in two at a block boundary, which leaves the block ending in an incomplete sequence that is not valid UTF-8.
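A minimal sketch that reproduces the failure outside of qds-sdk. The payload is made up for illustration: 8191 ASCII bytes followed by one two-byte character (U+00A0, encoded as 0xc2 0xa0), so an 8192-byte block ends right after the 0xc2 lead byte.

```python
# Hypothetical payload, not real query output: 8191 ASCII bytes, then a
# two-byte UTF-8 character (U+00A0 -> b'\xc2\xa0'), then some trailing text.
payload = ("a" * 8191 + "\u00a0" + "tail").encode("utf-8")

BLOCK_SIZE = 8192  # block size mentioned in this thread

# The first block ends with the lead byte 0xc2 but not its continuation byte.
first_block = payload[:BLOCK_SIZE]

error = None
try:
    first_block.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The whole payload decodes fine; only the truncated block fails, at position 8191, which matches the traceback above.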

This can be fixed by not decoding the block at all: encode the delimiter instead, and do the replace operation on bytes rather than str.
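A rough sketch of that idea, not the code from the commit. The function name `copy_replacing_delim` and its parameters are hypothetical; it only assumes a binary source and a binary destination, such as `sys.stdout.buffer`.

```python
import io


def copy_replacing_delim(src, dst, delim="\t", block_size=8192):
    """Copy src to dst in blocks, replacing byte 0x01 with the encoded delimiter.

    Hypothetical helper for illustration. Nothing is decoded, so a multibyte
    character split across two blocks is passed through untouched.
    """
    old = b"\x01"               # chr(1) as a single byte
    new = delim.encode("utf-8")  # encode the delimiter once, up front
    while True:
        data = src.read(block_size)
        if not data:
            break
        # Safe on raw bytes: in UTF-8, bytes below 0x80 never occur inside
        # a multibyte sequence, so 0x01 can only be a real Ctrl-A.
        dst.write(data.replace(old, new))


# Made-up input: a two-byte character straddles the 8192-byte boundary.
src = io.BytesIO(("a" * 8191 + "\u00a0" + "\x01" + "b").encode("utf-8"))
dst = io.BytesIO()
copy_replacing_delim(src, dst)
result = dst.getvalue().decode("utf-8")
```

Because the search pattern is a single byte, it cannot itself be split across blocks, so no state has to be carried from one block to the next.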

See commit above.

@amelio-vazquez-reina
Author

@mcarlsen did you try opening a PR?

@mcarlsen
Contributor

mcarlsen commented May 24, 2017

Sorry, never got that far. I have now started #208

amelio-vazquez-reina changed the title from "Keep running into utf-8 errors in Python 3.5" to "Keep running into utf-8 errors in Python 3.x" May 24, 2017
@msumit msumit closed this as completed in 3648855 Mar 19, 2019
chattarajoy pushed a commit that referenced this issue May 14, 2019
Caused by block reads chopping multibyte utf-8 sequences in half.