-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compiled wasm crashes on Rust 1.78.0, but not on 1.77.2 #443
Comments
Thanks for the report! I can reproduce this myself but I've moved this now from the wit-bindgen repository to the jco repository. I believe that this is a bug in jco's string encoding implementation currently. The allocation at the end needs to precisely fit the length of the string so the over-allocated size is what's causing the issue here. This is basically the exact same bug as what came up in wasm-bindgen, just in a different context. The Canonical ABI is pretty strict about the allowable behaviors here in terms of how strings are lowered. @guybedford I think you'll want to transcode the That being said I know in the past in my work on wasm-bindgen it's much more beneficial to assume ascii most of the time and then pessimize later with another allocation. In that sense the best algorithm to implement I don't think is allowed by the component model right now. See Also cc @lukewagner on this since this is an interesting case coming up spec-wise. The summary is that Rust's debug assertions are catching a case where jco didn't shrink the size of a string allocation during lowering to the exact size of a string. That's easily fixable but the point I'd moreso want to raise here is that it's not clear to me how to implement "the most efficient way of transferring a string to wasm from js" given the current component model spec. For example in wasm-bindgen the current implementation looks like this (originally from rustwasm/wasm-bindgen#1736 I believe) and looks like (a) assume ascii and copy byte-by-byte and failing that (b) use |
Thanks for pointing this out. So the optimistic initial allocation does assume that each code point is ASCII and fits in a byte (hence allocating |
Sort of yeah but not precisely. It's mostly that the rigidity of writing down the exact method of how string translation should happen can be limiting in some contexts. JS here is an example where, according to wasm-bindgen, the fastest way to copy a string is to (a) copy manually in JS without This specific algorithm for JS can't be implemented for components as it's not specified in the canonical ABI. That's not a break-the-bank situation per se but I mostly wanted to highlight this as a potential weakness. |
If we were to change the Canonical ABI to match what JS wants to do, it seems like it would be no worse, and possibly better, for all the other host embeddings, so we could just do that (and probably this is what I should've started with). If we found a fundamental, unresolvable tension between two observably different string-copying algorithms, we could add more non-determinism to the spec to give hosts more wiggle room, but given the general portability benefits of deterministic semantics, I think we should build up that motivation with some concrete examples first. |
That's true yeah, and I think concretely this would change the utf16 -> utf8 path to assume ascii first and then only start allocating one non-ascii was hit? |
Yep! It'd be a pretty small tweak to the existing Python code, I think; I can work up a patch in a week. |
Ok, filed component-model/#366 |
To reproduce:
Clone this repository https://github.com/lucasavila00/bindgencrash
It contains a
wit
world that has a function that receives a string and returnsi32
The function just returns 1 and never reads the string.
When I call the compiled wasm with some specific string, the wasm code crashes.
I wonder if this is related to the dlmalloc change alexcrichton/dlmalloc-rs#32
This change broke wasm-bindgen (see rust-lang/rust#124671, rustwasm/wasm-bindgen#3801, rustwasm/wasm-bindgen#3808)
Is wit-bindgen also affected by this?
It looks similar to the related issues, as this crash happens if a specific input String is used (eg: https://github.com/lucasavila00/bindgencrash/blob/main/tests/test-e2e.js#L14-L17) and the issue would happen when deallocating such strings.
I know if happens with wit-bindgen 0.16.0 and 0.26.0
The text was updated successfully, but these errors were encountered: