Url: optimized unescape() performance #32
Conversation
(branch force-pushed from 5692048 to 60209fc)
Could you also put those cases from the description in the test file? |
You mean the ugly long ones? Another thing to consider:

for ($i = 1; $i < count($parts); $i += 2) {
    $parts[$i] = strtoupper($parts[$i]); // normalize reserved escapes, e.g. '%2f' -> '%2F'
}

The performance impact if we choose to do this:
|
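A small sketch (not from the thread) of what that normalization does, assuming $parts comes from a PREG_SPLIT_DELIM_CAPTURE split, so odd indexes hold the captured reserved escapes:

$parts = ['a', '%2f', 'b'];                // pieces at even indexes, reserved escapes at odd ones
for ($i = 1; $i < count($parts); $i += 2) {
    $parts[$i] = strtoupper($parts[$i]);   // '%2f' -> '%2F'
}
echo implode('', $parts);                  // prints: a%2Fb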
Yeah, I think the extreme cases should also be there. It's still a win :) Much better than before. |
Btw. I still don't understand the time complexity of those implementations. If I change in the

It also means that if your web server does not refuse a 600 KB long URL, you are currently pretty much screwed.

Edit: 1e5 * 2 takes over 120 s and a lot of memory as well (I had to increase the limit from 128M to 512M).

Note that my implementation (split) is still vulnerable to

I thought about it a bit more and the memory could be optimized by using

if ($reserved === '') {
    $s = rawurldecode($s);
} else {
    // match runs of escape sequences for reserved characters, e.g. '%2F%2F'
    $pattern = '#((?:%(?:' . implode('|', str_split(bin2hex($reserved), 2)) . '))++)#i';
    $res = '';
    do {
        // split off at most 1024 pieces per pass to bound memory usage;
        // with PREG_SPLIT_DELIM_CAPTURE the captured delimiters are interleaved,
        // so the array has at most 2047 elements and index 2046 holds the unsplit remainder
        $parts = preg_split($pattern, $s, 1024, PREG_SPLIT_DELIM_CAPTURE);
        $s = isset($parts[2046]) ? $parts[2046] : '';
        unset($parts[2046]);
        for ($j = 0; $j < count($parts); $j += 2) {
            $parts[$j] = rawurldecode($parts[$j]); // decode only the non-reserved pieces
        }
        $res .= implode('', $parts);
    } while ($s !== '');
    $s = $res;
}

This makes it for the 2 MB worst-case input (

I'll continue talking to myself. I don't think that we should ever exceed 1 second runtime and 10 MB memory usage on this. Therefore the hard limit on URL length should be 256 KiB (which takes 71 ms and 10 MB of memory). |
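For context, a minimal sketch of how such a worst-case input could be generated for these measurements; the exact benchmark input is not shown in the thread, so the generator and sizes below are assumptions:

// Hypothetical worst case: one long run of reserved escape sequences,
// assuming '/' is in $reserved so every '%2F' is a captured delimiter
$s = str_repeat('%2F', (int) (2 * 1024 * 1024 / 3)); // ~2 MB input

$start = microtime(true);
// ... run the implementation under test on $s ...
printf("%.3f s, %.1f MB peak\n",
    microtime(true) - $start,
    memory_get_peak_usage(true) / 1e6);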
(branch force-pushed from 9426286 to cab8785)
What I love about improving performance is that after you finish one fast implementation, you get an idea how to make it even faster:

for ($i = 0; $i < $testCount; $i++) { // benchmark loop
    if ($reserved === '') {
        $s = rawurldecode($s);
    } else {
        // match runs of escapes that are NOT reserved, using a negative lookahead,
        // and decode each whole run with a single rawurldecode() call
        $pattern = '#(?:%(?!' . implode('|', str_split(bin2hex($reserved), 2)) . ')[0-9a-f][0-9a-f])++#i';
        $s = preg_replace_callback(
            $pattern,
            function ($m) { return rawurldecode($m[0]); },
            $s
        );
    }
} |
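To make the lookahead concrete, a sketch (not from the thread) assuming $reserved = '/', whose hex code is 2f:

$reserved = '/';
$pattern = '#(?:%(?!' . implode('|', str_split(bin2hex($reserved), 2)) . ')[0-9a-f][0-9a-f])++#i';
echo $pattern, "\n"; // prints: #(?:%(?!2f)[0-9a-f][0-9a-f])++#i

// the whole run '%20' is decoded in one rawurldecode() call,
// while the reserved '%2F' is left untouched
echo preg_replace_callback(
    $pattern,
    function ($m) { return rawurldecode($m[0]); },
    'a%20b%2Fc'
); // prints: a b%2Fc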
I don't think it should be Nette's responsibility to limit the URL length. All of the major web servers (Apache, nginx, IIS) have a default limit on URL length, all in the range of a couple of kilobytes, so your suggested hard limit of 256 KiB is much higher than the defaults. And I assume that if someone changes these defaults, he knows what he's doing. |
(branch force-pushed from cab8785 to f4c06de)
That is intentional; this limit should never be reached. The question is whether we should rely on server configuration or whether Nette should enforce its own limit (large enough not to affect any non-DoS requests) as a security fallback. |
(branch force-pushed from f4c06de to 8805236)
(branch force-pushed from 8805236 to b05aa50)
If I'm not mistaken, making it 256 KiB means having the limit ~128x larger than what a browser can handle. I have no problem with such a limit. |
@fprochazka See http://technomanor.wordpress.com/2012/04/03/maximum-url-size/; every browser except IE can handle a URL of 100 KiB in size. |
@JanTvrdik Good to know. In that case, making it 256 KiB is still a good limit. I would make it a static property (I can't believe I'm suggesting this) or an argument of that function, so it can be changed if necessary. |
@fprochazka If we choose not to trust the web servers and enforce the limit, it should certainly be in

Therefore this can be merged regardless of whether we choose to impose a limit on the URL length. cc @dg |
In that case, this optimization should be resolved on its own and merged, and a new issue for the limit should be created. And after the issues we're discussing in #30 are resolved, we can add the limit. |
Maybe a micro-optimization for empty |
It's not so much a micro-optimization as that the code cannot handle empty

Regarding the framework currently not using empty |
I didn't realize that it will not work with empty

What about unifying (It can maybe be done by replacing all reserved chars |
Regarding unification – that could be easily done with the

The only way I currently see with

It could be done with a single call but it is still slow:

$xx = implode('|', str_split(bin2hex($reserved), 2));
$actual = preg_replace_callback(
    // first alternative: runs of non-reserved escapes (decode them);
    // second alternative, captured as $m[1]: runs of reserved escapes (uppercase them)
    '#(?:(?:%(?!' . $xx . ')[0-9a-f]{2})++)|((?:%(?:' . $xx . '))++)#i',
    function ($m) {
        return !empty($m[1]) ? strtoupper($m[0]) : rawurldecode($m[0]);
    },
    $s
); |
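A quick trace of that combined call, as a sketch assuming $reserved = '/' and the input 'a%20b%2fc':

$reserved = '/';
$xx = implode('|', str_split(bin2hex($reserved), 2)); // '2f'
echo preg_replace_callback(
    '#(?:(?:%(?!' . $xx . ')[0-9a-f]{2})++)|((?:%(?:' . $xx . '))++)#i',
    function ($m) {
        return !empty($m[1]) ? strtoupper($m[0]) : rawurldecode($m[0]);
    },
    'a%20b%2fc'
); // prints: a b%2Fc  ('%20' decoded, reserved '%2f' normalized to '%2F')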
@JanTvrdik this way (#32 (comment)):

if ($reserved !== '') {
    // double-escape the '%' of every reserved escape sequence,
    // e.g. '%2F' becomes '%252F', so rawurldecode() below restores it as literal '%2F'
    $s = preg_replace_callback(
        '#%(' . substr(chunk_split(bin2hex($reserved), 2, '|'), 0, -1) . ')#i',
        function ($m) { return '%25' . strtoupper($m[1]); },
        $s
    );
}
$s = rawurldecode($s); |
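Traced on a small input, as a sketch assuming $reserved = '/':

$reserved = '/';
$s = 'a%2Fb%20c';
$s = preg_replace_callback(
    '#%(' . substr(chunk_split(bin2hex($reserved), 2, '|'), 0, -1) . ')#i',
    function ($m) { return '%25' . strtoupper($m[1]); },
    $s
);
// $s is now 'a%252Fb%20c'
echo rawurldecode($s); // prints: a%2Fb c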
Good job! I hadn't thought about double escaping. |
Currently unescape() can be used to perform a DoS attack on a Nette application if the HTTP server does not restrict URL length.
Benchmark
Results:
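The benchmark script and the result figures did not survive extraction. A minimal harness of the kind discussed above might look like this; the inputs, sizes, and the Nette\Http\Url::unescape() call are assumptions:

// hypothetical harness; assumes Url::unescape($s, $reserved) is the method under test
$inputs = [
    'typical' => str_repeat('abc%20', 1000), // ~6 KB, mostly plain text
    'worst'   => str_repeat('%2F', 100000),  // ~300 KB of reserved escapes
];
foreach ($inputs as $name => $s) {
    $start = microtime(true);
    Nette\Http\Url::unescape($s, '%/?#');
    printf("%-8s %.1f ms\n", $name, (microtime(true) - $start) * 1e3);
}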