When a sandboxed Linux process runs, all of its inputs are written to disk first so that they can be bind-mounted into the directory used as the rootfs for the process. Even though we use hardlinks as much as possible, the process of creating these local directory structures can take a significant amount of time.
Building this project locally takes about 20 seconds for me when uncached (this assumes `std.toolchain()` already exists locally but `large_input` does not). The majority of that time is spent just creating the directory structure for `large_input`, even though all files within that directory are hardlinks to existing blobs.
We definitely need to work on optimizing this, and there is probably a lot of low-hanging fruit in how we create these directory structures.
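To make the cost concrete, here is a minimal sketch of hardlink-based materialization. It is not Brioche's actual implementation: the `materialize` function and the manifest shape (relative path mapped to blob hash) are hypothetical, chosen only to show that every file still needs its own `link` call, plus directory creation, even though no file data is copied.

```python
import os


def materialize(manifest, blobs_dir, dest_dir):
    """Hardlink every blob into dest_dir at its logical path.

    manifest is a hypothetical mapping of relative path -> blob hash.
    Returns the number of hardlinks created.
    """
    created = 0
    for rel_path, blob_hash in manifest.items():
        target = os.path.join(dest_dir, rel_path)
        # Each parent directory has to exist on disk before linking.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # One link() call per file: cheap individually, slow in bulk.
        os.link(os.path.join(blobs_dir, blob_hash), target)
        created += 1
    return created
```

The work grows with the number of files and directories rather than with the total size of the data, which fits the observation that a large input is slow to set up even when every file is a hardlink.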
But, one future optimization we can do is to use a FUSE filesystem to avoid needing to create this directory structure entirely. That is, we have a small FUSE filesystem (likely a custom implementation) where, when reading a path like `${large_input}/a/bin/gcc`, it forwards that request to `~/.local/share/brioche/blobs/SOME_HASH`. This has several advantages:
- The directory structure never needs to get written to disk
- Avoid creating executable/non-executable copies of blobs (e.g. identical blobs that differ only by the `.x` prefix and the executable permission)
- As a future improvement, we can even map other paths not in the `blobs` directory into the filesystem, e.g. mapping `Brioche.includeFile` files directly without copying into a blob
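The forwarding idea can be sketched without any FUSE bindings. The block below is only the in-memory lookup table that a FUSE handler might consult, not a filesystem itself. The `BlobIndex` class and its methods are hypothetical names for illustration. The executable bit is stored per path, so a single blob can back both an executable and a non-executable entry.

```python
import os
import stat


class BlobIndex:
    """Hypothetical table mapping virtual paths to blobs on disk."""

    def __init__(self, blobs_dir):
        self.blobs_dir = blobs_dir
        self.entries = {}  # virtual path -> (blob hash, executable flag)

    def add(self, virtual_path, blob_hash, executable=False):
        self.entries[virtual_path.strip("/")] = (blob_hash, executable)

    def resolve(self, virtual_path):
        """Return the real blob path that a read would be forwarded to."""
        blob_hash, _ = self.entries[virtual_path.strip("/")]
        return os.path.join(self.blobs_dir, blob_hash)

    def mode(self, virtual_path):
        """Report permissions from the table, not from the blob file."""
        _, executable = self.entries[virtual_path.strip("/")]
        return stat.S_IFREG | (0o555 if executable else 0o444)

    def listdir(self, virtual_dir):
        """List the immediate children of a directory that exists only here."""
        prefix = virtual_dir.strip("/")
        prefix = prefix + "/" if prefix else ""
        names = set()
        for path in self.entries:
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)
```

In a real FUSE implementation, `resolve` would sit behind the open/read handlers, while `mode` and `listdir` would back the attribute and directory-listing handlers. Mapping a non-blob file into the tree would need `resolve` to return an arbitrary path instead of one under `blobs_dir`.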
FUSE has some issues, namely that I/O performance is worse than a normal filesystem, it's Linux-only (effectively), and it's not enabled by default in all Linux distros. For these reasons, if/when we implement FUSE, we'll still need to fall back to the current "create then bind-mount a local directory" implementation that we use today.
Update: I recently learned that `overlayfs` is enabled by default on Linux 5.11+. This is another option: it would save a lot of the legwork needed for a custom FUSE implementation, and I think it should work out of the box in more distros. Also, FUSE is relatively slow IIRC, so using something provided by the kernel might be faster than anything we could do ourselves.
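The issue does not say how overlayfs would be used. One possibility, assumed here purely for illustration, is to stack several already-existing input directories as read-only lower layers instead of merging them into one directory on disk. The sketch below only builds the argument list for a `mount -t overlay` command; the helper name `overlay_mount_args` is hypothetical, and actually running the command needs suitable privileges (or a user namespace on kernels that permit it).

```python
def overlay_mount_args(lower_dirs, target, upper_dir=None, work_dir=None):
    """Build the argv for an overlayfs mount (hypothetical helper).

    lower_dirs are read-only layers; overlayfs gives the leftmost
    entry the highest priority. upper_dir and work_dir make the mount
    writable and must be given together.
    """
    if not lower_dirs:
        raise ValueError("at least one lower directory is required")
    if (upper_dir is None) != (work_dir is None):
        raise ValueError("upper_dir and work_dir must be given together")
    for path in list(lower_dirs) + [p for p in (upper_dir, work_dir) if p]:
        # ':' and ',' are separators in the option string, so this
        # sketch rejects them rather than attempting to escape them.
        if ":" in path or "," in path:
            raise ValueError("unsupported character in path: " + path)
    options = "lowerdir=" + ":".join(lower_dirs)
    if upper_dir is not None:
        options += ",upperdir=" + upper_dir + ",workdir=" + work_dir
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]
```

For example, `overlay_mount_args(["/inputs/a", "/inputs/b"], "/rootfs")` describes a read-only view in which files from `/inputs/a` shadow same-named files from `/inputs/b`. Unlike the FUSE approach, each layer still has to exist as a real directory tree, so this would remove the merging step rather than the per-input materialization.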
kylewlacy changed the title from "Use FUSE filesystem to speed up sandboxed Linux builds" to "Use overlayfs or FUSE filesystem to speed up sandboxed Linux builds" on Oct 19, 2024