Add docs for passing complex data types #475

Merged
merged 5 commits into from
Sep 11, 2023
202 changes: 202 additions & 0 deletions docs/complex-data-types-in-wasm-functions.md
@@ -0,0 +1,202 @@
# Using complex data types in Wasm functions

Core WebAssembly currently only supports numbers as arguments and return values for exported and imported functions. This presents a problem when you want to pass strings, byte arrays, or structured data to and from hostcalls. This document provides an overview of one approach to consider.

At a high level, byte arrays can be passed using a pair of integers with the first integer representing the address of the start of the byte array in the instance's linear memory and the second integer representing the length of the byte array. Strings can be passed by encoding the string into a UTF-8 byte array and using the previous solution to pass the byte array. Structured data can be encoded to a JSON string and that string can be passed by encoding it into a UTF-8 byte array and using the previous solution. Other serialization formats can also be used to encode the structured data to a byte array.
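
As a rough illustration of the pointer-and-length convention, the sketch below models linear memory as a plain byte slice instead of a real Wasm instance. The helper names `write_bytes`, `read_bytes`, and `round_trip`, the 64-byte memory size, and the offset 16 are all hypothetical and chosen only for this example.

```rust
use std::convert::TryFrom;

/// Copies `bytes` into the simulated linear memory at `ptr` and returns the
/// `(ptr, len)` pair that would be passed across the Wasm boundary.
fn write_bytes(memory: &mut [u8], ptr: u32, bytes: &[u8]) -> Option<(u32, u32)> {
    let start = ptr as usize;
    let end = start.checked_add(bytes.len())?;
    memory.get_mut(start..end)?.copy_from_slice(bytes);
    Some((ptr, u32::try_from(bytes.len()).ok()?))
}

/// Reads `len` bytes starting at `ptr`, or `None` if the range is out of bounds.
fn read_bytes(memory: &[u8], ptr: u32, len: u32) -> Option<&[u8]> {
    let start = ptr as usize;
    let end = start.checked_add(len as usize)?;
    memory.get(start..end)
}

/// Round-trips a string (for example, a JSON document) as UTF-8 bytes.
fn round_trip(text: &str) -> Option<String> {
    let mut memory = vec![0u8; 64]; // stand-in for the instance's linear memory
    let (ptr, len) = write_bytes(&mut memory, 16, text.as_bytes())?;
    let bytes = read_bytes(&memory, ptr, len)?;
    String::from_utf8(bytes.to_vec()).ok()
}
```

Calling `round_trip("{\"x\":1}")` shows the structured-data case: the JSON text is just another UTF-8 byte array identified by a pointer and a length.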

The examples below use Rust and Wasmtime on the host; however, any programming language and WebAssembly runtime should support the same approach.

## For exported functions

Passing a byte array to an exported function from the WebAssembly host:

```rust
use anyhow::{anyhow, Result};
use wasmtime_wasi::WasiCtx;

fn call_the_export(
    bytes: &[u8],
    instance: wasmtime::Instance,
    store: &mut wasmtime::Store<WasiCtx>,
) -> Result<()> {
    // `get_memory` returns an `Option`, so turn a missing export into an error.
    let memory = instance
        .get_memory(&mut *store, "memory")
        .ok_or_else(|| anyhow!("missing `memory` export"))?;
    let realloc_fn = instance
        .get_typed_func::<(u32, u32, u32, u32), u32>(&mut *store, "canonical_abi_realloc")?;
    let len: u32 = bytes.len().try_into()?;

    // Allocate space for the byte array inside the instance's linear memory.
    let original_ptr = 0;
    let original_size = 0;
    let alignment = 1;
    let ptr = realloc_fn.call(&mut *store, (original_ptr, original_size, alignment, len))?;

    // Copy the bytes into the newly allocated region.
    memory.write(&mut *store, ptr.try_into()?, bytes)?;

    let your_fn = instance.get_typed_func::<(u32, u32), ()>(&mut *store, "your_fn")?;
    your_fn.call(&mut *store, (ptr, len))?;

    Ok(())
}
```
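
The host code above looks up an export named `canonical_abi_realloc`, so the guest module has to provide one. Below is a minimal sketch of such a function, assuming the four-argument shape implied by the host call (original pointer, original size, alignment, new size). The body is an illustration built on `std::alloc`, not necessarily what any particular toolchain generates.

```rust
use std::alloc::{alloc, realloc, Layout};

// In a guest crate, this would be exported with
// `#[export_name = "canonical_abi_realloc"]` so the host can look it up.
// On wasm32, `usize` and pointers are 32 bits wide, matching the host's `u32`s.
pub unsafe extern "C" fn canonical_abi_realloc(
    original_ptr: *mut u8,
    original_size: usize,
    alignment: usize,
    new_size: usize,
) -> *mut u8 {
    // A zero-sized request needs no allocation; return an aligned dangling pointer.
    if new_size == 0 {
        return alignment as *mut u8;
    }
    let ptr = if original_size == 0 {
        // Nothing to resize, so make a fresh allocation.
        let layout = Layout::from_size_align(new_size, alignment).unwrap();
        unsafe { alloc(layout) }
    } else {
        // Resize the existing allocation, preserving its contents.
        let layout = Layout::from_size_align(original_size, alignment).unwrap();
        unsafe { realloc(original_ptr, layout, new_size) }
    };
    assert!(!ptr.is_null(), "allocation failed");
    ptr
}
```

The host's call with `original_ptr = 0` and `original_size = 0` takes the fresh-allocation branch in this sketch.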

In the WebAssembly instance, when an exported function receives a byte array, you can use the `std::slice::from_raw_parts` function to get the slice.

```rust
#[export_name = "your_fn"]
pub unsafe extern "C" fn your_fn(ptr: *const u8, len: usize) {
    let bytes = std::slice::from_raw_parts(ptr, len);
    todo!(); // use `bytes` for something
}
```
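
Because strings travel as UTF-8 byte arrays, the same pointer and length can be turned into a `&str` inside the instance. The helper below is a hypothetical sketch (`str_from_raw` is not part of any library); it validates the bytes instead of assuming the host sent well-formed UTF-8.

```rust
/// Hypothetical helper: interprets `len` bytes at `ptr` as UTF-8 text.
///
/// Safety: `ptr` must point to `len` readable bytes that stay valid for `'a`,
/// as required by `std::slice::from_raw_parts`.
pub unsafe fn str_from_raw<'a>(
    ptr: *const u8,
    len: usize,
) -> Result<&'a str, std::str::Utf8Error> {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // The caller may pass arbitrary bytes, so check that they are valid UTF-8.
    std::str::from_utf8(bytes)
}
```

If the text is JSON, the resulting `&str` can then be handed to whichever JSON parser the guest already uses.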

To return a byte array from an exported function in a WebAssembly instance, you need to leak the byte array so that it is not deallocated when the function returns. We recommend storing the pointer and length in a static wide pointer and returning the address of that wide pointer.

```rust
static mut BYTES_RET_AREA: [u32; 2] = [0; 2];

#[export_name = "your_fn"]
pub unsafe extern "C" fn your_fn() -> *const u32 {
    let bytes: Vec<u8> = todo!(); // fill in your own logic
    let len = bytes.len();
    // Leak the allocation so it is still valid after this function returns.
    let ptr = Box::leak(bytes.into_boxed_slice()).as_ptr();
    BYTES_RET_AREA[0] = ptr as u32;
    BYTES_RET_AREA[1] = len.try_into().unwrap();
    BYTES_RET_AREA.as_ptr()
}
```

On the host, you can use `memory.read` to populate a vector with the byte array. WebAssembly uses little-endian integers so we read 32-bit integers using `from_le_bytes`.

```rust
use anyhow::{anyhow, Result};
use wasmtime_wasi::WasiCtx;

fn get_slice(
    instance: wasmtime::Instance,
    store: &mut wasmtime::Store<WasiCtx>,
) -> Result<Vec<u8>> {
    // The export takes no arguments and returns the address of the wide pointer.
    let your_fn = instance.get_typed_func::<(), u32>(&mut *store, "your_fn")?;
    let ret_ptr = your_fn.call(&mut *store, ())?;

    let memory = instance
        .get_memory(&mut *store, "memory")
        .ok_or_else(|| anyhow!("missing `memory` export"))?;

    // Read the 8-byte wide pointer: a 4-byte address followed by a 4-byte length.
    let mut ret_buffer = [0; 8];
    memory.read(&mut *store, ret_ptr.try_into()?, &mut ret_buffer)?;

    let bytecode_ptr = u32::from_le_bytes(ret_buffer[0..4].try_into()?);
    let bytecode_len = u32::from_le_bytes(ret_buffer[4..8].try_into()?);

    let mut bytecode = vec![0; bytecode_len.try_into()?];
    memory.read(&mut *store, bytecode_ptr.try_into()?, &mut bytecode)?;

    Ok(bytecode)
}
```
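
The wide pointer decoding step can be checked in isolation, without a Wasm runtime. The sketch below assumes the layout used in this document (a little-endian `u32` address followed by a little-endian `u32` length); `decode_wide_ptr` is a hypothetical helper name.

```rust
/// Hypothetical helper: splits an 8-byte wide pointer into `(ptr, len)`.
fn decode_wide_ptr(buffer: [u8; 8]) -> (u32, u32) {
    let [p0, p1, p2, p3, l0, l1, l2, l3] = buffer;
    (
        // Bytes 0..4 hold the address, least significant byte first.
        u32::from_le_bytes([p0, p1, p2, p3]),
        // Bytes 4..8 hold the length, least significant byte first.
        u32::from_le_bytes([l0, l1, l2, l3]),
    )
}
```

For example, the bytes `[0x10, 0, 0, 0, 0x05, 0, 0, 0]` decode to an address of 16 and a length of 5.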

## For imported functions

When passing a byte array to the host from the WebAssembly instance, we pass the pointer and length to the imported function:

```rust
use anyhow::Result;

// `wasm_import_module` names the module the host registers the import under.
#[link(wasm_import_module = "host")]
extern "C" {
    fn my_import(ptr: *const u8, len: u32);
}

fn call_the_import(bytes: &[u8]) -> Result<()> {
    unsafe { my_import(bytes.as_ptr(), bytes.len().try_into()?) };
    Ok(())
}
```

When receiving a byte array from the WebAssembly instance on the host, we use `memory.read` along with the pointer and length to get the byte array:

```rust
use anyhow::{anyhow, Result};

struct StoreContext {
    bytes: Vec<u8>,
    wasi: wasmtime_wasi::WasiCtx,
}

fn setup(linker: &mut wasmtime::Linker<StoreContext>) -> Result<()> {
    wasmtime_wasi::sync::add_to_linker(linker, |ctx: &mut StoreContext| &mut ctx.wasi)?;

    linker.func_wrap(
        "host",
        "my_import",
        |mut caller: wasmtime::Caller<'_, StoreContext>, ptr: u32, len: u32| -> Result<()> {
            // Size the buffer up front: `read` fills exactly `bytes.len()` bytes.
            let mut bytes = vec![0u8; len.try_into()?];
            caller
                .get_export("memory")
                .and_then(|export| export.into_memory())
                .ok_or_else(|| anyhow!("missing `memory` export"))?
                .read(&caller, ptr.try_into()?, &mut bytes)?;
            caller.data_mut().bytes = bytes;
            Ok(())
        },
    )?;

    Ok(())
}
```

When returning a byte array from the host, things get a little more complicated. Below we use a wide pointer to return the byte array. This requires two memory allocations in the instance, one for the byte array and one for the wide pointer, and using `memory.write` to place the array and wide pointer into the allocated memory. Since the byte array is copied into the instance's memory, there is no need to leak the original byte array.

```rust
use anyhow::Result;
use wasmtime_wasi::WasiCtx;

fn setup(linker: &mut wasmtime::Linker<WasiCtx>) -> Result<()> {
    wasmtime_wasi::sync::add_to_linker(linker, |ctx: &mut WasiCtx| ctx)?;

    linker.func_wrap(
        "host",
        "my_import",
        |mut caller: wasmtime::Caller<'_, WasiCtx>| -> Result<u32> {
            let memory = caller.get_export("memory").unwrap().into_memory().unwrap();
            let realloc = caller
                .get_export("canonical_abi_realloc")
                .unwrap()
                .into_func()
                .unwrap()
                .typed::<(u32, u32, u32, u32), u32>(&caller)?;

            let bytes: Vec<u8> = todo!(); // fill in your own logic
            let len: u32 = bytes.len().try_into()?;
            let original_ptr = 0;
            let original_size = 0;
            let alignment = 1;

            // First allocation: space for the byte array itself.
            let ptr = realloc.call(
                &mut caller,
                (original_ptr, original_size, alignment, len),
            )?;
            memory.write(&mut caller, ptr.try_into()?, &bytes)?;

            // Build the wide pointer as two little-endian `u32`s.
            const LEN: usize = 8;
            let mut wide_ptr_buffer = [0u8; LEN];
            wide_ptr_buffer[0..4].copy_from_slice(&ptr.to_le_bytes());
            wide_ptr_buffer[4..8].copy_from_slice(&len.to_le_bytes());

            // Second allocation: the wide pointer. The guest reads it as `u32`s,
            // so request 4-byte alignment.
            let wide_ptr_alignment = 4;
            let wide_ptr = realloc.call(
                &mut caller,
                (original_ptr, original_size, wide_ptr_alignment, LEN.try_into()?),
            )?;
            memory.write(&mut caller, wide_ptr.try_into()?, &wide_ptr_buffer)?;
            Ok(wide_ptr)
        },
    )?;

    Ok(())
}
```
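
The wide pointer buffer built in the host function can be factored into a small pure function, which makes the byte layout easy to test on its own. This is only a sketch: `encode_wide_ptr` is a hypothetical helper name, and it assumes the same layout as above (address first, then length, both little-endian `u32`s).

```rust
/// Hypothetical helper: packs an address and a length into an 8-byte wide pointer.
fn encode_wide_ptr(ptr: u32, len: u32) -> [u8; 8] {
    let mut buffer = [0u8; 8];
    // Bytes 0..4: address of the byte array in linear memory.
    buffer[0..4].copy_from_slice(&ptr.to_le_bytes());
    // Bytes 4..8: length of the byte array.
    buffer[4..8].copy_from_slice(&len.to_le_bytes());
    buffer
}
```

The returned array is what would be written into the instance's memory with `memory.write`.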

When reading a returned byte array from the host, we extract the pointer and length from the wide pointer and then use the pointer and length to read a slice from memory:

```rust
#[link(wasm_import_module = "host")]
extern "C" {
    fn my_import() -> *const u32;
}

fn main() {
    let bytes = unsafe {
        let wide_ptr = my_import();
        let [ptr, len] = std::slice::from_raw_parts(wide_ptr, 2) else {
            unreachable!()
        };
        std::slice::from_raw_parts(*ptr as *const u8, (*len).try_into().unwrap())
    };
    todo!();
}
```
2 changes: 2 additions & 0 deletions docs/extending.md
@@ -1,3 +1,5 @@
# Extending Javy

If you want to use Javy for your own project, you may find that the existing code is not sufficient since you may want to offer custom APIs or use different branding for the CLI. The approach we'd recommend taking is to fork and create your own version of the `javy-cli` and `javy-core` crates and depend on the upstream version of the `javy` and `javy-apis` crates. You can add your own implementations of custom JS APIs in your fork of the `javy-core` crate or in a different crate that you depend on in your `javy-core` fork. If you find that something is missing in the `javy` crate that you require to implement something in your fork, we would appreciate it if you would open a GitHub issue and consider making the change upstream instead of in your fork so all users of the `javy` crate can benefit.

See our documentation on [using complex data types in Wasm functions](complex-data-types-in-wasm-functions.md) for how to support Wasm functions that need to use byte arrays, strings, or structured data.