Check encode type for download task #269
expectRequest('https://example.com').andRespond(
  200,
  'Hi',
This test would pass even if we had no encoding feature at all.
What do you think about using CP1251 and putting the broken text here (what you get when CP1251 is parsed as UTF-8)?
If this is too much work for a review, I can do it myself.
I will check later.
Hm, if I just copy-paste the broken text (like �), it doesn't work correctly. I'll continue researching.
Try binary format for symbol
Another way is to take some JS code for UTF-8→CP1251 conversion and put:
to1251('тест')
Sorry, what do you mean by "Try binary format for symbol"?
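One possible reading of the suggestions in this thread, as a minimal sketch: build the test body from raw CP1251 bytes instead of pasted text. Pasting � stores the replacement character U+FFFD itself, so the original bytes are already lost by the time the text reaches the test file. The byte values below are the windows-1251 codes for 'тест'. The variable names are hypothetical, and the sketch assumes a runtime whose TextDecoder supports the windows-1251 label (browsers and Node.js builds with full ICU do).

```typescript
// CP1251 (windows-1251) bytes for the Cyrillic word 'тест':
// т = 0xF2, е = 0xE5, с = 0xF1, т = 0xF2.
const cp1251Bytes = new Uint8Array([0xf2, 0xe5, 0xf1, 0xf2])

// Decoding with the matching charset restores the original text.
const decodedAs1251 = new TextDecoder('windows-1251').decode(cp1251Bytes)

// Decoding the same bytes as UTF-8 fails on every byte, so the result
// contains U+FFFD replacement characters instead of Cyrillic letters.
const decodedAsUtf8 = new TextDecoder('utf-8').decode(cp1251Bytes)

console.log(decodedAs1251 === 'тест')
console.log(decodedAsUtf8.includes('\uFFFD'))
```

A test along these lines would serve the bytes with a Content-Type of text/html; charset=windows-1251 and expect 'тест' back, which fails if the charset is ignored. How the mock in this repository accepts a binary body is not shown in the thread, so that part is left out.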
core/download.ts
Outdated
function detectEncodeType(response: Partial<Response>): string {
  let headers = response.headers ?? new Headers()
  let contentType = headers.get('content-type')?.toLowerCase() ?? ''
  return contentType.match(/charset=([a-zA-Z0-9-]+)/)?.[1] ?? 'utf-8'
- return contentType.match(/charset=([a-zA-Z0-9-]+)/)?.[1] ?? 'utf-8'
+ return contentType.match(/charset=(\w+)/)?.[1] ?? 'utf-8'
Can we do it like this?
I don't think so. In that case we get utf instead of utf-8, because \w does not match the hyphen.
You're right. What about [\w-]+?
We also need _.
Here is the full list of variants:
https://www.iana.org/assignments/character-sets/character-sets.xhtml
Maybe it's better to accept every character except whitespace? Something like this:
.match(/charset=([^\s]+)/)
.match(/;\s*charset=([^\s;]+)/)
Technically, the header could look like Content-Type: text/html;charset=utf-8;boundary=ExampleBoundaryString, so the match also has to stop at the next ;
Oh, you are right. Thanks.
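To make the outcome of this thread concrete, here is a small sketch comparing the regex variants discussed above on the header from the last comment. The helper name charsetFromContentType is hypothetical and takes the raw header string; it is not the detectEncodeType function from the diff, only an illustration of the final pattern with the same lowercase and 'utf-8' fallback behavior.

```typescript
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value, falling back to 'utf-8' when none is present.
function charsetFromContentType(contentType: string | null): string {
  let value = (contentType ?? '').toLowerCase()
  // Requires a preceding ';' and stops at whitespace or the next ';'.
  return value.match(/;\s*charset=([^\s;]+)/)?.[1] ?? 'utf-8'
}

const header = 'text/html;charset=utf-8;boundary=ExampleBoundaryString'

// \w does not match '-', so the value is cut short.
const withWord = header.match(/charset=(\w+)/)?.[1]
// [^\s] does not stop at ';', so the next parameter leaks into the value.
const withNonSpace = header.match(/charset=([^\s]+)/)?.[1]

console.log(withWord) // 'utf'
console.log(withNonSpace) // 'utf-8;boundary=ExampleBoundaryString'
console.log(charsetFromContentType(header)) // 'utf-8'
console.log(charsetFromContentType('text/html; charset=Windows-1251')) // 'windows-1251'
console.log(charsetFromContentType('text/plain')) // 'utf-8'
```

One limitation of this pattern: a quoted value such as charset="utf-8" would be captured with its quotes, so a caller may want to strip them before passing the label to a decoder.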
Fixes #268