This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

processes continuous spaces in metadata #314

Merged: 1 commit merged on Sep 3, 2022
pypicloud/util.py: 2 changes (1 addition, 1 deletion)
@@ -80,7 +80,7 @@ def normalize_metadata_value(value: Union[str, bytes]) -> str:
         value = value.decode("utf-8")
     if isinstance(value, str):
         value = "".join(c for c in unicodedata.normalize("NFKD", value) if ord(c) < 128)
-    return value
+    return re.sub(r"\s+", " ", value)
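For reference, a minimal self-contained sketch of the function as patched. Only the lines in the hunk come from the PR; the `isinstance(value, bytes)` guard around the decode is an assumption inferred from the hunk's first context line.

```python
import re
import unicodedata
from typing import Union


def normalize_metadata_value(value: Union[str, bytes]) -> str:
    """Sketch of the patched function: ASCII-fold, then collapse whitespace."""
    # Assumption: the bytes branch just above the hunk looks like this.
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if isinstance(value, str):
        # NFKD splits accented letters into base + combining mark;
        # dropping ord >= 128 then keeps only the ASCII base letter.
        value = "".join(c for c in unicodedata.normalize("NFKD", value) if ord(c) < 128)
    # The change in this PR: every run of whitespace (space, \t, \n, \r, ...)
    # becomes a single plain space.
    return re.sub(r"\s+", " ", value)


print(normalize_metadata_value(b"caf\xc3\xa9\t  latte"))  # cafe latte
print(normalize_metadata_value("line one\r\nline two"))  # line one line two
```

Note that leading and trailing whitespace is collapsed to one space but not stripped.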
Owner

This will replace all whitespace characters with " ". Could we instead do

re.sub(r"  +", " ", value)

to only replace instances of two-or-more spaces with a single space?
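A quick sketch contrasting the two patterns on a made-up sample string (not taken from the PR): `\s+` also collapses tabs and newlines, while the suggested `"  +"` only touches runs of two or more literal spaces.

```python
import re

sample = "a  b\tc\n\nd"  # hypothetical input with spaces, a tab and newlines

# Patch as submitted: any run of whitespace -> one space.
collapsed_all = re.sub(r"\s+", " ", sample)
# Suggested alternative: only runs of 2+ literal spaces -> one space.
collapsed_spaces = re.sub(r"  +", " ", sample)

print(repr(collapsed_all))     # 'a b c d'
print(repr(collapsed_spaces))  # 'a b\tc\n\nd'
```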

Contributor Author

@QSummerY, Aug 29, 2022

I tested other whitespace characters as well. A \t causes the same problem; with \n or \r the request succeeds, but the characters after the whitespace are removed.
Only x-amz-meta-summary affects the request. The metadata values being processed are:

{
    'x-amz-meta-hash-sha256': '71eccb33ac8b2584c86f36f8ebfbe72c0a98022b576f411ab05773343e4e2cac',
    'x-amz-meta-hash-md5': 'f562366338d015df143c7afacabdee44',
    'x-amz-meta-summary': ' Python3\tweb , WSGI, Web ',
    'x-amz-meta-name': 'frontgate',
    'x-amz-meta-version': '0.6.39'
}
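As a sanity check, a sketch applying the patch's `\s+` substitution to the values above. A plain dict comprehension stands in for pypicloud's `normalize_metadata`, whose body isn't shown here; only the summary should change.

```python
import re

# Values copied from the request above.
metadata = {
    "x-amz-meta-hash-sha256": "71eccb33ac8b2584c86f36f8ebfbe72c0a98022b576f411ab05773343e4e2cac",
    "x-amz-meta-hash-md5": "f562366338d015df143c7afacabdee44",
    "x-amz-meta-summary": " Python3\tweb , WSGI, Web ",
    "x-amz-meta-name": "frontgate",
    "x-amz-meta-version": "0.6.39",
}

# Collapse whitespace in every value, as the patched normalizer does.
cleaned = {key: re.sub(r"\s+", " ", value) for key, value in metadata.items()}

# Only the summary actually changes: its tab becomes a plain space.
changed = {key for key in metadata if metadata[key] != cleaned[key]}
print(changed)                               # {'x-amz-meta-summary'}
print(repr(cleaned["x-amz-meta-summary"]))   # ' Python3 web , WSGI, Web '
```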

So I think we should keep the implementation as originally proposed.

Contributor Author

It's Tencent Cloud's COS (Cloud Object Storage).

Owner

Okay, last request:
This will replace every kind of whitespace with a plain space. It would be better to preserve the original whitespace characters where they aren't causing issues. I believe this should do it:

re.sub(r'(\s)\s+', r'\1', value)
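A small sketch of what this backreference pattern does; `collapse_keep_first` is a hypothetical name used only for illustration. Each whitespace run keeps its first character, so a lone `\t` survives untouched.

```python
import re


def collapse_keep_first(value: str) -> str:
    # (\s) captures the first whitespace char of a run, \s+ consumes the rest,
    # and \1 writes back only the captured character.
    return re.sub(r"(\s)\s+", r"\1", value)


print(repr(collapse_keep_first("a\t\t b  c")))  # 'a\tb c'
# A single tab is not a run of two or more, so it is left as-is:
print(repr(collapse_keep_first(" Python3\tweb , WSGI, Web ")))  # ' Python3\tweb , WSGI, Web '
```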

Contributor Author

@QSummerY, Sep 1, 2022

I tried a value containing only a single \t, and the problem still occurs: if the original whitespace is kept, the request fails the same way. So I think it should look like this:

return re.sub(r"\s+", " ", value)

Owner

Wow, COS really doesn't like whitespace, huh? I guess this is fine. The worst we should expect is some mangled formatting in the summary.



def normalize_metadata(metadata: Dict[str, Union[str, bytes]]) -> Dict[str, str]: