Performance is now extremely slow on modern browsers #359
Comments
I'll take a closer look at the JSPerf later |
I added a new test case where I'm caching glMatrix's Float32Arrays, which greatly increases the performance; however, vanilla JS objects are still faster: https://jsperf.com/webgl-math-library-comparisson/21 However, when using those rather limited test cases, glMatrix seems to outperform vanilla JS objects: https://jsperf.com/glmatrix-getclosesttonline/ Feel free to expand them with more libraries. I'd love it if you could prove me wrong. A WebAssembly port of glMatrix is planned; however, WebAssembly still has a few limitations. |
This is weird. You have any ideas on why the results in the first comparison are so different from the second? |
@stefnotch OK, I witnessed the previous performance of your sample with literal objects being slower. I updated the sample to use a small Vector class and the performance is back to what I witnessed in the other samples. Maybe Chrome/Firefox optimize more aggressively when they can see functions being called with instances sharing a consistent prototype. (I'm guessing that those object literal samples end up using a slow lookup dictionary internally in Chrome, while the newer class instances are turned into efficient hidden classes and the function is optimized with them in mind.) Vector classes are doing 235 K ops/sec vs. 78 K for the optimized gl-matrix sample. That's even when noting that the objects are purely functional and are doing 2 new allocations within that function (in the .subtract implementation). On Firefox the difference is even more significant: 1.2 million ops per second (which makes me think that it may be optimizing the whole function away; that's why I added some side effect to the other samples). https://jsperf.com/glmatrix-getclosesttonline/12 Please check it out and see if you can replicate my findings. |
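For illustration, a minimal sketch of the two styles being compared in those samples (illustrative names, not the actual jsperf code):

    // Instances of a small class share one prototype and a consistent shape;
    // the comment above speculates this is easier for the JIT to optimize
    // than passing ad-hoc object literals.
    class Vector {
        constructor(x, y, z) { this.x = x; this.y = y; this.z = z; }
        subtract(o) { return new Vector(this.x - o.x, this.y - o.y, this.z - o.z); }
    }

    function lengthSquared(v) {
        return v.x * v.x + v.y * v.y + v.z * v.z;
    }

    // class-instance style
    lengthSquared(new Vector(1, 2, 3).subtract(new Vector(4, 5, 6)));

    // object-literal style
    lengthSquared({ x: -3, y: -3, z: -3 });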
I added back your older object test so we can compare the difference between using literal objects and using class instances, and added Array as well, which is also very fast. |
Thank you very much for taking your time to write those tests. I can replicate your findings. Apparently objects are now significantly faster than Float32Arrays. I guess as it stands, glMatrix will have to improve, either by using objects or by using WebAssembly. Since I currently don't have time to rewrite this library with objects and other libraries that use objects already exist, I suppose I'll wait for WebAssembly to become better. |
@stefnotch I actually wrote a C implementation AND a WebAssembly implementation of the add/cross/dot sample, both with and without linear memory access (to make it more realistic), and my findings indicate that going to WebAssembly will NOT gain you any significant performance benefits (except in Safari, which is an outlier with excellent Wasm but bad everything else). Here is the document. I can send you my source code for the C and the WebAssembly and any of the other node versions if you like. https://docs.google.com/spreadsheets/d/1SQPU4OwV5QeA8_peuTk49PDvT_djVVrq1TGd-OvWtzA/edit?usp=sharing |
A quick improvement you could do or recommend to others is just to set the array type to use normal arrays and not Float32Arrays, only converting to Float32Arrays before assigning to a WebGL API that expects them. Other than that, not much can be done without it being an entirely new API. If you do write one, I recommend treating all objects as immutable (while not actually calling Object.freeze, because that's super slow). Or don't convert to Float32Array at all, since setUniform seems fine with normal arrays. |
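A sketch of that boundary conversion, assuming an existing WebGL context gl and a uniform location loc (both placeholders here):

    // do the math with plain arrays...
    const position = [1.0, 2.0, 3.0];

    // ...and only convert at the WebGL boundary if a typed array is really needed
    gl.uniform3fv(loc, new Float32Array(position));

    // as noted above, the uniform setters also accept plain number arrays directly
    gl.uniform3fv(loc, position);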
I recently did a similar evaluation, but approaching this from the memory consumption side rather than the performance perspective. My findings also point in the direction of using normal arrays instead of Float32Arrays. In short, a small Float32Array carries a lot of per-instance overhead; in contrast I measured ~96 bytes for a normal array. |
Small update to the above: a class with x, y & z fields uses ~56 bytes, which also matches other sources. |
@karhu There is a lot of overhead for allocating a typed array. You would not want to create them for small objects, as happens currently in gl-matrix. You STILL would want to use them, though, when you are allocating a very large array of data, as it keeps the data much more tightly packed than when using JavaScript objects. So, use idiomatic JavaScript classes for your vectors and matrices, but then store the results in a large typed buffer (which you are probably going to use as a WebGL vertex buffer). |
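A rough sketch of that split, assuming a WebGL context gl (the class and function names are made up for illustration):

    // idiomatic class instances for the math...
    class Vec3 {
        constructor(x, y, z) { this.x = x; this.y = y; this.z = z; }
    }

    // ...and one large, tightly packed typed buffer for the bulk data
    function packPositions(vectors) {
        const data = new Float32Array(vectors.length * 3);
        for (let i = 0; i < vectors.length; i++) {
            const v = vectors[i];
            data[i * 3 + 0] = v.x;
            data[i * 3 + 1] = v.y;
            data[i * 3 + 2] = v.z;
        }
        return data;
    }

    // e.g. gl.bufferData(gl.ARRAY_BUFFER, packPositions(vectors), gl.STATIC_DRAW);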
@krisnye Yes of course, that's how we mainly use them in our code base. Unfortunately they tend to lead to less idiomatic and maintainable code, so I started to investigate alternatives ;) Was just a bit surprised that the typed arrays had that much overhead and figured it might be useful information in the discussion here. |
Do you have any information about garbage collection? Creating many objects led to animation hiccups when the garbage collector stepped in, in older browser versions. Of course we could also reuse objects, but I would still like to know if the garbage collection of objects is as free as their allocation in modern browsers. |
I didn't do any testing of garbage collection in older browsers. As the title says, "performance is now extremely slow on modern browsers". I think most of us writing high performance graphics or games are primarily concerned about modern browsers. |
Agreed, but I was asking if "garbage collection of objects is as free as the allocation of those in modern browsers". The fact that allocation is faster does not imply that garbage collection is faster too. I am not asking about Typed Arrays vs. Objects allocation performance, but about your statement "all functions can return new instances instead of using awkward out parameters". The question is: if you do not reuse your Objects or Arrays or Typed Arrays, the garbage collector has to clean them up. So even if allocation is super fast, in an asynchronous game loop this could create hiccups every few seconds when the garbage collector kicks in - theoretically - at least it was like that in older browsers. Do you know if this is no longer a problem in new browsers if you create e.g. thousands of objects each frame for particle effects? |
The times I was seeing led me to suspect that they either optimized away the allocations entirely or else they were allocating the objects as if they were structs right on the stack. It's a good question though. When I get a chance I will re-run some of the tests over a longer period of time and inspect the memory usage. |
Just wanna throw in a small test regarding memory allocation: https://jsperf.com/typed-array-allocation-pool The fastest seems to be to pre-allocate a chunk of memory, then generate Float32Array sub-views on it and cache them. One problem here is that once the pool runs out of memory, all sub-views etc. have to be refreshed. I'm not sure if moving to normal arrays is good, as in order to upload them it would duplicate memory usage. AFAIK V8 can take the pointer of the raw memory an ArrayBuffer holds and send it directly to OpenGL. Edit: Using a pool allocator would allow using a class-based interface, e.g.:
class Vec3 {
    constructor() {
        this._memory = MemoryPool.allocatePortion(Vec3.byteLength);
    }
    get raw() {
        return this._memory;
    }
    get x() {
        return this._memory[0];
    }
    set x(v) {
        this._memory[0] = v;
    }
    // ... accessors for y and z elided ...
}
Vec3.byteLength = 3 * Float32Array.BYTES_PER_ELEMENT;
Vec3.prototype.add = function(b) {
    let memA = this._memory;
    let memB = b._memory;
    memA[0] += memB[0];
    memA[1] += memB[1];
    memA[2] += memB[2];
};
let a = new Vec3();
let b = new Vec3();
a.add(b);
console.log(a.x);
console.log(a.raw); |
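For context, a minimal sketch of what the MemoryPool helper assumed above could look like: a hypothetical bump allocator handing out Float32Array sub-views of one pre-allocated ArrayBuffer (not an existing library).

    const MemoryPool = {
        _buffer: new ArrayBuffer(1024 * 1024), // pre-allocated 1 MiB backing store
        _offset: 0,
        allocatePortion(byteLength) {
            if (this._offset + byteLength > this._buffer.byteLength) {
                // this is where the "refresh all sub-views" problem mentioned above starts
                throw new Error('MemoryPool exhausted');
            }
            const view = new Float32Array(this._buffer, this._offset, byteLength / Float32Array.BYTES_PER_ELEMENT);
            this._offset += byteLength;
            return view;
        }
    };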
Another way to avoid the overhead of creating an ArrayBuffer every time is to write functions like the one at line 716 in fbaca33.
I wonder how that solution fares performance-wise. |
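Presumably this refers to the closure-over-a-scratch-vector pattern; a rough sketch of it (illustrative, not the exact code at that line):

    import { vec3 } from 'gl-matrix';

    // the scratch vector is allocated once when the module is evaluated
    // and reused by every call, so no per-call Float32Array is created
    const midpoint = (function () {
        const scratch = vec3.create();
        return function (out, a, b) {
            vec3.add(scratch, a, b);
            return vec3.scale(out, scratch, 0.5);
        };
    })();

    // usage: the caller supplies the output vector
    const m = midpoint(vec3.create(), vec3.fromValues(0, 0, 0), vec3.fromValues(2, 2, 2));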
@stefnotch that is the pattern I generally use when working with gl-matrix. The out arguments are intended to be pooled and reused to restrict allocations, which will minimize the garbage collection hits that would disrupt the frame rate.
This thread is confusing, as it argues that gl-matrix performance is slow while actually pointing at allocation performance. The gl-matrix interface, though awkward, discourages rapid and redundant allocations by using the out-parameter convention. |
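For illustration, the out-parameter convention with a reused result vector (a minimal example using gl-matrix's vec3 module):

    import { vec3 } from 'gl-matrix';

    // allocate once, outside the hot loop
    const sum = vec3.create();
    const a = vec3.fromValues(1, 2, 3);
    const b = vec3.fromValues(4, 5, 6);

    for (let i = 0; i < 1000; i++) {
        // writes into `sum` instead of allocating a new vector per iteration
        vec3.add(sum, a, b);
    }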
@kevzettler is exactly right. A high-frequency game/simulation loop that allocates a lot of memory absolutely thrashes the garbage collector, whether you're talking about today's v8 or the one from 4 years ago. Ditto for objects and proxy getters/setters. These are not "free" syntaxes. They carry very real performance costs. The web is filled with a lot of toy math libraries that don't understand this, and that's why a lot of games built on them suffer horrendous GC pauses. It's definitely a good question as to whether plain vs. typed javascript arrays offer better performance, but if we're really concerned about perf, making everything expensive objects is going to hurt, not help. I don't want to simply trash the concept; I would definitely appreciate another performance pass. Ideally we would have some kind of vanilla js syntax for static allocations:
function doStuff () {
static const mv = vec2.create() // non-existent feature in today's javascript
// ...
}
doStuff(); doStuff(); doStuff(); // despite being called thrice, mv would only allocate once
But since this doesn't exist, some kind of built-in, optional pooling module would be nice (right now I have my own very naive but effective implementation here for vec2s: https://github.com/mreinstein/vec2-gap/blob/main/pool.js) |
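As a generic illustration of such a pooling module (a frame-scoped sketch, not the actual vec2-gap API):

    const pool = [];
    let poolIndex = 0;

    function allocVec2() {
        // reuse a previously created vector when available, otherwise grow the pool
        if (poolIndex === pool.length) pool.push(new Float32Array(2));
        return pool[poolIndex++];
    }

    function resetPool() {
        // call once per frame; everything handed out becomes reusable again
        poolIndex = 0;
    }

    function doStuffPooled() {
        const mv = allocVec2(); // no new allocation once the pool has warmed up
        mv[0] = 1; mv[1] = 2;
        // ...
    }

    resetPool(); doStuffPooled(); doStuffPooled(); doStuffPooled();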
Can you confirm that this is the case when allocating short-lived objects? These Vectors etc. aren't designed to be retained; they are just allocated for calculations and then discarded. I've run more extensive tests, including examining the memory allocation and garbage collection, and this pattern doesn't appear to put any significant stress on the garbage collector. The objects don't seem to make it past the first stage of garbage collection. I think it would be helpful if you could prove your assertions more definitively. I won't argue with good evidence, and I would like to know either way. |
I made a very simple benchmark that shows 2 things:
Here it is: https://jsben.ch/FgKVi This is where my research has brought me. Did I miss something? Note: The biggest speed advantages do not help if your game stutters every 5 seconds when garbage collection kicks in. Also, vector math is probably not the bottleneck of most games. Anyway, I think a class is the cleanest and most readable solution. If this really has the best performance nowadays, should it not be the way to go? |
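For reference, roughly the kind of variants such a benchmark compares (a sketch, not the exact jsben.ch code):

    class Vec3 {
        constructor(x = 0, y = 0, z = 0) { this.x = x; this.y = y; this.z = z; }
        addNew(o) { return new Vec3(this.x + o.x, this.y + o.y, this.z + o.z); } // allocates per call
        addInto(o, out) { out.x = this.x + o.x; out.y = this.y + o.y; out.z = this.z + o.z; return out; } // reuses `out`
    }

    const a = new Vec3(1, 2, 3), b = new Vec3(4, 5, 6);
    const out = new Vec3(); // the (reuse) variant writes into this single instance

    let sink = 0;
    for (let i = 0; i < 1e6; i++) {
        sink += a.addInto(b, out).x; // keep a side effect so the JIT can't drop the work
    }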
I also want to point out that there is a similar discussion for three.js's vector math. It explains, for example, why their math classes are designed the way they are. |
@tobx the (reuse) benchmarks are definitely interesting and unexpected. A few notes on the way the benchmark is set up:
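// note: randomFloat() and the Vec3 class are assumed to be defined elsewhere in the benchmark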
const count = 10000;
const inputsArr = new Array(count);
const inputsObj = new Array(count);
// realistic test data setup
for (let i=0; i < count; i++) {
const x = randomFloat();
const y = randomFloat();
const z = randomFloat();
inputsArr[i] = [ x, y, z ];
inputsObj[i] = new Vec3(x, y, z);
}
// tests here...
Anyway, none of these points "refute" your benchmark, just additional points of interest maybe. Would welcome feedback on these. @kevzettler would def be curious to see what you think of both the benchmark and these follow up points. |
I'm confused about what's unexpected here. The (reuse) benchmarks performed better? That's what the conversation has been about until this point; why is that unexpected? You've even shared this opinion in a previous post. Unless you're talking about array vs. object reuse. That's why I think it is important to compare the two add functions; that comparison indicates there is some overhead in one of the two. Separately, I agree the benchmark is not realistic with the consistent adding of the same values. In fact I believe the V8 JIT compiler may be able to detect that and optimize it away into a cached operation. The benchmark you proposed with the randomFloats would be much more beneficial. Additionally, the benchmark doesn't even use gl-matrix methods, so we can't use it as an example with 100% certainty. |
No, that part is expected. Obviously not allocating in a tight loop is going to be faster. Ignoring the non-reuse variants, what's unexpected is the size of the gap between the object (reuse) and array (reuse) results.
I agree with that, but I would be surprised if that difference were largely responsible for the significant perf gap shown in the benchmark. |
Ok, I agree with that. I was also surprised to see that perf gap between the object (reuse) and array (reuse) cases. |
Good points, I will give that a try soon.
Hm, it would be rather simple to replace all summations with multiplications, but I don't think this should affect array vs. object performance at all, do you? But yes, one should never assume the results but measure them. Maybe give it a try.
Good point, I expected some different results and had them in the first run, but after a few more runs it seems this has no performance impact at all, see: |
I made the benchmark as proposed by @mreinstein. On my system, reusing Objects is still the fastest after multiple runs, but the difference is smaller. In this version every test has more array accesses anyway, so I guess the difference between Object and Array becomes less relevant. EDIT: @kevzettler glMatrix's own add function is now included as well. |
Array reading is slightly faster in Chrome, but slower in Firefox (macOS). More interestingly (for me), Object allocation is slightly faster than Array allocation (Chrome & Firefox). EDIT: I did not find enough information about this, was interested in it, and thought I'd share my results. I would be very happy, though, if the people with more experience in this topic would adapt the benchmarks to more realistic versions. |
I'm starting to think this discussion is a bit of a moot point. The fact that this library is specifically designed to interact with WebGL means that, regardless of how fast plain objects or arrays are, the data ultimately has to end up in typed arrays for the GL calls. @stefnotch I know you've mentioned hesitation in offloading caching/pooling to the end user. I'd be curious to see what you think of this very simple pool implementation I've got here: https://github.com/mreinstein/vec2-gap#pool I'm using this in some of my own stuff; it's got a few tests and supports both single and bulk allocation. |
It is true that you have to eventually write your results to a Float32Array, but the final conversion to that is trivial. Math libraries are designed as an intermediate representation for doing calculations. The intermediate format needs to be fast and easy to use. All of the tests I have done recently show that immutable classes are very fast, and mutable classes are only about 10% faster. Because of this, I use an immutable math library because it's semantically so clean, and I haven't yet seen GC pressure due to quick calculations with them. I don't know enough about the Chrome runtime, but it seems that it might as well be allocating these on the stack in some cases, or else realizing it can completely eliminate them from some functions. |
The things that'd currently be interesting to benchmark would probably be
Sadly, I currently don't have a whole lot of time to look into any of those things. |
I have been having issues with my setup, and I still haven't gotten around to using as-bind to build an encapsulated interface that is fully compatible with gl-matrix.js in the WebAssembly port.
This library used to be fast, but modern browsers are now allocating simple objects so fast that they are effectively free. Array allocation is still much slower, and ArrayBuffer allocation is extremely slow.
A new math library is needed to take advantage of this. It should use simple classes; all functions can return new instances instead of using awkward out parameters, and conversion to ArrayBuffers should only happen as a final step when assigning to WebGL. (Technically those ArrayBuffers can be reused, as WebGL doesn't retain references to them after you assign.)
Check the performance of different approaches here:
https://jsperf.com/webgl-math-library-comparisson/20
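A hedged sketch of the proposed style (illustrative names, not an actual library API): a simple class whose methods return new instances, plus one reusable Float32Array at the WebGL boundary.

    class Vec3 {
        constructor(x = 0, y = 0, z = 0) { this.x = x; this.y = y; this.z = z; }
        add(o)   { return new Vec3(this.x + o.x, this.y + o.y, this.z + o.z); }
        scale(s) { return new Vec3(this.x * s, this.y * s, this.z * s); }
    }

    // one scratch buffer is enough, since WebGL copies the data when it is assigned
    const scratch = new Float32Array(3);

    function setVec3Uniform(gl, location, v) {
        scratch[0] = v.x; scratch[1] = v.y; scratch[2] = v.z;
        gl.uniform3fv(location, scratch);
    }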