Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check encode type for doanwload task #269

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

DisaPadla
Copy link

Fixes #268

core/download.ts Outdated Show resolved Hide resolved
Copy link
Contributor

Pull request auto-reviewer

  • Think about code testing.

  • Think about making types more precise. Can you better explain data relations by type?

@github-actions github-actions bot temporarily deployed to preview-269 October 20, 2024 12:55 Destroyed
@github-actions github-actions bot temporarily deployed to preview-269 October 20, 2024 13:31 Destroyed
@github-actions github-actions bot temporarily deployed to preview-269 October 20, 2024 14:34 Destroyed

expectRequest('https://example.com').andRespond(
200,
'Hi',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will pass even if we haven’t encoding feature.

What do you think if we will use CP1251 and put that broken (when CP1251 was parsed as UTF-8) encoding text here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is too long review, I can do it myself

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will check later

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, if i just cope paste broken text (like �), it doesnt work correctly. Continue research

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try binary format for symbol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is to get JS code for UTF-8→CP1251 convertation and put:

to1251('тест')

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, what do you mean "Try binary format for symbol"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, what do you mean "Try binary format for symbol"?

String.fromCodePoint

core/download.ts Outdated Show resolved Hide resolved
core/download.ts Outdated
function detectEncodeType(response: Partial<Response>): string {
let headers = response.headers ?? new Headers()
let contentType = headers.get('content-type')?.toLowerCase() ?? ''
return contentType.match(/charset=([a-zA-Z0-9-]+)/)?.[1] ?? 'utf-8'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return contentType.match(/charset=([a-zA-Z0-9-]+)/)?.[1] ?? 'utf-8'
return contentType.match(/charset=(\w+)/)?.[1] ?? 'utf-8'

Can we do like this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think, no. in this case we get utf instead of utf-8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right. What about [\w-]+?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need _

Here is the full list of variants
https://www.iana.org/assignments/character-sets/character-sets.xhtml

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better include all symbols except space? something like that?
.match(/charset=([^\s]+)/)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.match(/;\s*charset=([^\s;]+)/)

Technically it could be like Content-Type: text/html;charset=utf-8;boundary=ExampleBoundaryString

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you are right. thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unpopular encoding
2 participants