When returning an empty JSON, '{}' turns into a Chinese character #111
Comments
I am not familiar with 'pack', but if the problem seems specific to that tool, perhaps that tool is misunderstanding the textual encoding of the response? If that were the case though, it seems like it ought to misunderstand the encoding regardless of response content. It wouldn't be specific to "empty" responses. |
@chisholm by |
I'm not sure what you mean by taxii-client "returning" something. It's a library with classes and methods, and some methods do return things. It's not clear where the line you quoted came from (looks like a line of logging?). Can you provide a small code sample to reproduce the error? I tried my own experiment which would produce an empty result, where I enabled a simple logging config to see what logging would get printed out, to compare to your output. It was run against the Medallion server:

    import logging
    import taxii2client

    logging.basicConfig(level="DEBUG")

    coll = taxii2client.Collection(
        "http://127.0.0.1:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/",
        user="(user)", password="(password)"
    )
    envelope = coll.get_objects(
        type="foo"
    )
    print(envelope)

Notice I had to add my own I got:
The first four lines are logging output; the last line is my print statement. It shows the use of the taxii-client API to send a request to a TAXII server. There is no Chinese in the output. |
Thank you for the reply. I'll try to recreate it and update (I can't use the same server where we first saw it, as we don't have creds for it). Maybe it's something with the CISA server that causes the weird character. |
Hi @chisholm, I am working with @Ni-Knight and wanted to share what we did.
When using code that works the same as the code you added, we are getting an

    def v21_get_objects(self, accept="application/taxii+json;version=2.1", **filter_kwargs):
        collection = self.collection_to_fetch
        collection._verify_can_read()
        query_params = _filter_kwargs_to_query_params(filter_kwargs)
        merged_headers = collection._conn._merge_headers({"Accept": accept, "Content-Type": "application/taxii+json"})
        resp = collection._conn.session.get(collection.objects_url, headers=merged_headers, params=query_params)
        print(f'GOT RESPONSE {resp.content=} {resp.text=} {resp.status_code=} {resp.headers=}')
        if len(resp.text) <= len('{}'):  # in case it is not a json that has to have {}
            return {}
        return _to_json(resp)

We tried to reproduce it on another server, but it returns |
Using your code, slightly modified as:

    def v21_get_objects(collection, accept="application/taxii+json;version=2.1", **filter_kwargs):
        collection._verify_can_read()
        query_params = _filter_kwargs_to_query_params(filter_kwargs)
        merged_headers = collection._conn._merge_headers({"Accept": accept, "Content-Type": "application/taxii+json"})
        resp = collection._conn.session.get(collection.objects_url, headers=merged_headers, params=query_params)
        print(f'GOT RESPONSE {resp.content=} {resp.text=} {resp.status_code=} {resp.headers=} {resp.encoding=}')
        if len(resp.text) <= len('{}'):  # in case it is not a json that has to have {}
            return {}
        return _to_json(resp)

    coll = taxii2client.Collection(
        "http://127.0.0.1:5000/trustgroup1/collections/91a7b528-80eb-42ed-a74d-c6fbd5a26116/",
        user="(user)", password="(password)"
    )
    v21_get_objects(coll, type="foo")

Run against the Medallion server, I get as output:
Again, you can see there is no Chinese. The (Chinese) text you see comes from the The TAXII 2.1 spec looks to require implementers to use UTF-8. |
We added the encoding, and it also shows
Weird that in your case it guesses right, and in ours it guesses Chinese. |
Checking the requests implementation, looks like if
And that shows "ascii" for me. Maybe that will show a Chinese encoding for you. |
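For context on why requests has to guess at all: the response headers quoted in this issue carry `Content-Type: application/taxii+json;version=2.1`, with no `charset` parameter. The following is a minimal stdlib sketch, not requests' own code, and `charset_from_content_type` is a hypothetical helper introduced only for illustration. It shows that such a header declares no charset, which is the situation in which `resp.text` falls back to detection.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Header value taken from the log in this issue: no charset is declared.
print(charset_from_content_type("application/taxii+json;version=2.1"))

# Hypothetical header a server could send to remove the ambiguity.
print(charset_from_content_type("application/taxii+json;version=2.1; charset=UTF-8"))
```

With a declared charset, a client would not need to infer the encoding from a two-byte body.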
When we add printing of |
The bug is in the TAXII server, if it is not setting the response encoding to UTF-8. |
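Until the server declares its encoding, one possible client-side workaround is to skip encoding detection and decode the raw bytes as UTF-8 directly. This is only a sketch under the assumption that the server does send UTF-8, as the TAXII 2.1 spec appears to require. `parse_taxii_body` is a hypothetical helper, not part of taxii2client.

```python
import json


def parse_taxii_body(content: bytes) -> dict:
    """Decode raw response bytes as UTF-8 and parse them as JSON."""
    # Assumption: the server follows TAXII 2.1 and sends UTF-8, even if it
    # does not declare a charset in the Content-Type header.
    return json.loads(content.decode("utf-8"))


# With requests, `content` would be `resp.content` rather than `resp.text`.
print(parse_taxii_body(b"{}"))
```

Compared with checking `len(resp.text)`, this avoids relying on the guessed text at all.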
Well, it looks like by default, requests uses charset_normalizer to detect encodings. It calls a detect() method, but that is a legacy wrapper around from_bytes(). The latter has an interesting
|
What an odd bug :) I think we can try to ask them which server they spun up. However, you are right, this is definitely not an issue with the client itself. It also seems like |
When an empty response is received from a server, the string '{}' is somewhere converted to its Unicode code points:
"{" = U+007B
"}" = U+007D
These are then somewhere concatenated into a single character:
"筽" = U+7B7D
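The effect can be reproduced without any server. One reading consistent with the code points above is that the two bytes were decoded as a single big-endian UTF-16 unit. Which encoding the client actually picked is not confirmed here, so treat the choice of `utf-16-be` in this sketch as an assumption.

```python
raw = b"{}"  # the two bytes the server sent: 0x7B 0x7D

# Read as one big-endian 16-bit unit, the two bytes form a single code point.
code_point = (raw[0] << 8) | raw[1]
print(hex(code_point))

# Decoding as UTF-16 (big-endian) yields that same single character,
# while UTF-8 yields the expected two-character string.
as_utf16 = raw.decode("utf-16-be")
as_utf8 = raw.decode("utf-8")
print(repr(as_utf16), repr(as_utf8))
```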
To reproduce, send a query that returns an empty response from a TAXII server. curl and Postman return '{}', but taxii-client returns '筽'.
For example:
2022-08-15T11:21:48.17891632Z info: (TAXII 2 Feed test_instance_1_TAXII 2 Feed test_taxii2-get-indicators) python logging: DEBUG [urllib3.connectionpool] - https://ais2.cisa.dhs.gov:443 "GET /public/collections/---/objects/?limit=25&match%5Btype%5D=campaign HTTP/1.1" 200 2
2022-08-15T11:21:48.180585842Z debug: (TAXII 2 Feed test_instance_1_TAXII 2 Feed test_taxii2-get-indicators) GOT RESPONSE resp.content=b'{}' resp.text='筽' resp.status_code=200 resp.headers={'x-transaction-id': '124a663c-e7c5-48c0-a4ba-6fff95cab122', 'Strict-Transport-Security': 'max-age=31536000 ; includeSubDomains', 'Date': 'Mon, 15 Aug 2022 11:21:47 GMT', 'Keep-Alive': 'timeout=60', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Pragma': 'no-cache', 'Expires': '0', 'X-Frame-Options': 'DENY', 'Content-Type': 'application/taxii+json;version=2.1', 'Content-Length': '2', 'Connection': 'keep-alive'}