Lots of time spent in memory subsystem #153

Open
jonashaag opened this issue Sep 4, 2023 · 4 comments

Comments

@jonashaag

Not sure if this is expected, and if anyone is interested in optimizing this.

I have a real world workload that spends a lot of time in the memory subsystem. macOS Instruments profile:
[Screenshot: macOS Instruments profile, 2023-09-04]

Unfortunately I can't share the workload itself, but I can do more profiling or try patching things. I've already found that caching some of the machine-related checks (if (m->foobar ...)) speeds things up by 10%.

@jart
Owner

jart commented Sep 4, 2023

Are you using the linear memory optimization? It should be enabled by default on most platforms, unless you're disabling it by passing the -m flag. If I'm running on Linux x86-64, I'm almost certain to get fast memory. Profiling will look like this:

[image: profile with the linear memory optimization enabled]

But if I pass the -m flag to blink to disable the linear memory optimization, then profiling will look the way yours does:

[image: profile with -m, linear memory optimization disabled]

@jonashaag
Author

Sorry, I should have specified the exact command. Yes, I'm using -m because otherwise the program won't work.

@jart
Owner

jart commented Sep 4, 2023

Have you read these sections of the readme?

The reason -m is costly is that it does full memory virtualization. Every memory access has to be indirected through a translation lookaside buffer and a four-level radix trie. It's about as optimized as it can be.
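To make the cost concrete, here is a minimal sketch of what a TLB in front of a four-level radix trie can look like. It is not blink's actual code: the names (`struct Mmu`, `MapPage`, `Translate`), the 16-entry direct-mapped TLB, and the 4 KiB page with 9 index bits per level (the x86-64 layout) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS 12                  /* 4 KiB pages (assumed) */
#define LEVEL_BITS 9                  /* 512 slots per trie node (assumed) */
#define LEVELS 4
#define FANOUT (1u << LEVEL_BITS)
#define TLB_SIZE 16                   /* hypothetical direct-mapped TLB */

/* Interior slots point at child nodes; level-0 slots point at host pages. */
struct Node { void *slot[FANOUT]; };

struct TlbEntry { uint64_t page; uint8_t *host; int valid; };

struct Mmu {
  struct Node *root;
  struct TlbEntry tlb[TLB_SIZE];
  unsigned long walks;                /* counts slow-path trie walks */
};

/* Extract the 9-bit trie index for the given level (3 = root, 0 = leaf). */
static unsigned Index(uint64_t vaddr, int level) {
  assert(level >= 0 && level < LEVELS);
  return (unsigned)((vaddr >> (PAGE_BITS + LEVEL_BITS * level)) & (FANOUT - 1));
}

/* Back one guest page with a zeroed host page. Returns 0 on success. */
int MapPage(struct Mmu *m, uint64_t vaddr) {
  if (!m->root && !(m->root = calloc(1, sizeof(struct Node)))) return -1;
  struct Node *n = m->root;
  for (int level = LEVELS - 1; level > 0; --level) {
    void **slot = &n->slot[Index(vaddr, level)];
    if (!*slot && !(*slot = calloc(1, sizeof(struct Node)))) return -1;
    n = *slot;
  }
  void **leaf = &n->slot[Index(vaddr, 0)];
  if (!*leaf && !(*leaf = calloc(1, (size_t)1 << PAGE_BITS))) return -1;
  return 0;
}

/* Translate a guest address to a host pointer, or NULL if unmapped. */
uint8_t *Translate(struct Mmu *m, uint64_t vaddr) {
  uint64_t page = vaddr >> PAGE_BITS;
  uint64_t off = vaddr & (((uint64_t)1 << PAGE_BITS) - 1);
  struct TlbEntry *e = &m->tlb[page % TLB_SIZE];
  if (e->valid && e->page == page) return e->host + off;  /* fast path */
  ++m->walks;                                             /* slow path */
  struct Node *n = m->root;
  for (int level = LEVELS - 1; level > 0 && n; --level) {
    n = n->slot[Index(vaddr, level)];
  }
  if (!n) return NULL;
  uint8_t *host = n->slot[Index(vaddr, 0)];
  if (!host) return NULL;
  e->page = page;                                         /* refill the TLB */
  e->host = host;
  e->valid = 1;
  return host + off;
}
```

Even on a TLB hit, each guest access pays for a shift, a mask, a compare, and a dependent load before it touches the data, and a miss adds four more dependent loads. That overhead is what shows up as time in the memory subsystem in a profile.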

The best bet for you would probably be to find some way to get the linear memory optimization working for you. For example, we could find some other formula for mapping guest addresses onto host addresses.

  1. Are you using Apple Silicon? If so, Apple doesn't let us mmap() addresses beneath 4 GB.
  2. Is your program a static binary? Which fixed addresses was it compiled to use?
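For illustration only, one possible shape for such a formula is a constant offset between guest and host addresses. This is a sketch, not blink's actual mapping: `GUEST_SKEW`, `GuestToHost`, and `HostToGuest` are hypothetical names, and the 4 GiB value is picked only because it lifts low guest addresses above the range that cannot be mapped on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant offset between guest and host address spaces.
   4 GiB is an assumption for illustration, not a value taken from blink. */
#define GUEST_SKEW 0x100000000ull

/* With a linear mapping, translation is a single add: no TLB, no trie. */
static inline uint64_t GuestToHost(uint64_t guest) {
  return guest + GUEST_SKEW;
}

/* The inverse mapping; host addresses below the skew have no guest address. */
static inline uint64_t HostToGuest(uint64_t host) {
  assert(host >= GUEST_SKEW);
  return host - GUEST_SKEW;
}
```

Under these assumptions, a static binary linked at guest address 0x400000 would live at host address 0x100400000. Whether such a scheme works depends on which fixed addresses the program was compiled to use, which is why the second question matters.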

@jart
Owner

jart commented Sep 4, 2023

You're also invited to join our Discord https://discord.gg/Hb4QHYj2
