
Unicode decoding errors with qds-sdk #125

Closed
amelio-vazquez-reina opened this issue Oct 15, 2015 · 6 comments

@amelio-vazquez-reina

Hi, I have started running into Unicode decoding errors recently. When running:

    from qds_sdk.commands import HiveCommand

    delimiter = chr(9)  # tab
    hc_params = ['--query', query]
    hc_params += ['--tags', 'Team=opt']

    hive_args = HiveCommand.parse(hc_params)
    cmd = HiveCommand.run(**hive_args)
    if HiveCommand.is_success(cmd.status):
        with open(out_file, 'wt') as writer:
            cmd.get_results(writer, delim=delimiter, inline=False)

I ended up with:

  File "/some_path/log-index/logindex/ qubole_query.py", line 54, in run_query
    cmd.get_results(writer, delim=delimiter, inline=False)
  File "/some_path/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 206, in get_results
    _download_to_local(boto_conn, s3_path, fp, num_result_dir, delim=delim)
  File "/some_path/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 1179, in _download_to_local
    _read_iteratively(one_path, fp, delim=delim)
  File "/some_path/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 1071, in _read_iteratively
    fp.buffer.write(data.decode('utf-8').replace(chr(1), delim).encode('utf8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 8191: unexpected end of data

Here is the Job ID for the query mentioned above.

@amelio-vazquez-reina
Author

This is all with Python 3.5.

@amelio-vazquez-reina
Author

We ran into the exact same error twice more last night on our team.

  • Here is my pip freeze

I have also verified that I have the latest qds_sdk:

$ pip install qds_sdk --upgrade
Requirement already up-to-date: qds-sdk in /some_path/anaconda/envs/py35/lib/python3.5/site-packages
Requirement already up-to-date: requests>=1.0.3 in /some_path/anaconda/envs/py35/lib/python3.5/site-packages (from qds-sdk)
Requirement already up-to-date: boto>=2.1.1 in /some_path/anaconda/envs/py35/lib/python3.5/site-packages (from qds-sdk)
Requirement already up-to-date: six>=1.2.0 in /some_path/anaconda/envs/py35/lib/python3.5/site-packages (from qds-sdk)
Requirement already up-to-date: urllib3>=1.0.2 in /some_path/anaconda/envs/py35/lib/python3.5/site-packages (from qds-sdk)

The two jobs from last night that ran into this problem:

@rohitagarwal003
Contributor

Hi @amelio-vazquez-reina,

When you specify a delimiter, we need to decode the bytes fetched from S3 and replace the existing delimiter in there with the one you want. We decode the bytes assuming 'utf-8' encoding.
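
For illustration, that replacement step is roughly the following (a minimal sketch based on the traceback line above, not the SDK's actual code; the sample bytes are made up):

    # Rough sketch of the decode-and-replace step from the traceback above.
    # Hive separates columns with the control character chr(1) (b'\x01');
    # each chunk fetched from S3 is decoded as UTF-8 before that separator
    # is swapped for the delimiter you asked for.
    chunk = b'col1\x01col2\x01col3\n'          # made-up example bytes from S3
    decoded = chunk.decode('utf-8')            # non-UTF-8 data raises UnicodeDecodeError here
    rewritten = decoded.replace(chr(1), '\t')  # chr(9), i.e. a tab, in your case
    print(repr(rewritten))                     # 'col1\tcol2\tcol3\n'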

From the error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 8191: unexpected end of data

It looks like the result is not encoded in 'utf-8'. Did anything change recently at your end that switched the encoding from utf-8 to something else? Or are these new types of queries whose results are encoded in something other than utf-8?

One workaround I can think of is to not specify the delimiter when calling get_results, and then read the file and replace the delimiter yourself, using whatever encoding the results are actually in.
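
A minimal sketch of that post-processing (assuming, purely for illustration, that the results turn out to be latin-1 encoded; the file names are placeholders, and you should substitute the encoding your data is actually in):

    # Sketch only: post-process a result file that was downloaded without
    # specifying delim. 'latin-1' and the file names are placeholders.
    with open('raw_results', 'rb') as raw:
        text = raw.read().decode('latin-1')    # use the actual encoding of your results
    with open('results.tsv', 'w', encoding='utf-8') as out:
        out.write(text.replace(chr(1), '\t'))  # chr(1) is Hive's default column separator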

@amelio-vazquez-reina
Author

Thanks @mindprince. I tried not using the delimiter option, but I am now running into a different problem.

That is, using:

    if HiveCommand.is_success(cmd.status):
        with open(out_file, 'wt') as writer:
            cmd.get_results(writer, inline=False)

I then ran into the following problem:

Traceback (most recent call last):
  File "/avazquez/code/repositories/optimization-sandbox/utils/py_packages/qubole/query.py", line 54, in run_query
    cmd.get_results(writer, inline=False)
  File "/avazquez/code/bin/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 208, in get_results
    _download_to_local(boto_conn, s3_path, fp, num_result_dir, delim=delim)
  File "/avazquez/code/bin/anaconda/envs/py35/lib/python3.5/site-packages/qds_sdk/commands.py", line 1157, in _download_to_local
    key_instance.get_contents_to_file(fp)  # cb=_callback
  File "/avazquez/code/bin/anaconda/envs/py35/lib/python3.5/site-packages/boto/s3/key.py", line 1650, in get_contents_to_file
    response_headers=response_headers)
  File "/avazquez/code/bin/anaconda/envs/py35/lib/python3.5/site-packages/boto/s3/key.py", line 1482, in get_file
    query_args=None)
  File "/avazquez/code/bin/anaconda/envs/py35/lib/python3.5/site-packages/boto/s3/key.py", line 1536, in _get_file_internal
    fp.write(bytes)
TypeError: write() argument must be str, not bytes

Some example queries that produced this error:

They all completed correctly; the problem was in downloading the data.

@rohitagarwal003
Contributor

@amelio-vazquez-reina

Try opening the output file in binary mode, since get_results writes the raw bytes fetched from S3:

with open(out_file, 'wb') as writer:
    cmd.get_results(writer, inline=False)

https://github.com/qubole/qds-sdk-py/blob/v1.7.0/qds_sdk/commands.py#L201-L205

@rohitagarwal003
Contributor

Hi @amelio-vazquez-reina,

Please close this issue if the above solution works.
