Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use code_server path cache in code:where_is_file/1 #8078

Merged
merged 2 commits into from
Aug 20, 2024

Conversation

the-mikedavis
Copy link
Contributor

@the-mikedavis the-mikedavis commented Jan 31, 2024

application:load/1 slows down as the number of directories in the code path increases because the call to code:where_is_file/1 for the '.app' file must scan each directory for the app. The code_server maintains a cache of the contents of directories in the path depending on their code:cache() setting since #6729 / #8049. Re-using and updating that cache when searching for '.app' files in application:load/1 can improve its runtime, especially when loading multiple applications.

Internally to make this possible I've added a code_server:where_is_file/1 function that the application_controller uses instead of code:where_is_file/1. This re-uses the code_server's existing cache helper functions to find the file and update the caches along the way, respecting the code:cache() option for each dir.

For a reasonably large project (RabbitMQ) with a high number of code path dirs (111 at time of writing), loading an application might take upwards of 20ms. With the code_server's cache, loading each application takes at most around half of a millisecond instead. application:load/1 is used for loading plugins in Rabbit's CLI for example, and this change can save as much as 750ms for a run of rabbitmqctl --help (30% of the total run time). It also improves boot time for the broker since application:load/1 is used by application:start/2.

Copy link
Contributor

github-actions bot commented Jan 31, 2024

CT Test Results

    2 files     70 suites   1h 2m 59s ⏱️
1 544 tests 1 299 ✅ 241 💤 4 ❌
1 716 runs  1 438 ✅ 274 💤 4 ❌

For more details on these failures, see this check.

Results for commit c8687e3.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Feb 5, 2024
@josevalim
Copy link
Contributor

@the-mikedavis perhaps it would make sense to change code:where_is_file/1 itself to go through the code server and use the cache? This would optimize both which and where_is_file as well.

@the-mikedavis the-mikedavis changed the title Use code_server path cache for finding .app files Use code_server path cache in code:where_is_file/1 Feb 13, 2024
@kikofernandez
Copy link
Contributor

Does it make sense if which/1:

which(Module) when is_atom(Module) ->
    case is_loaded(Module) of
	false ->
            which(Module, get_path());
	{file, File} ->
	    File
    end.

Gets updated to:

which(Module) when is_atom(Module) ->
    case is_loaded(Module) of
      false ->
        File = atom_to_list(Module) ++ objfile_extension(),
        where_is_file(File);
      {file, File} ->
        File
    end.

The get_path() function in the first snippet of code calls back to #state.path in code_server, which is the same behaviour as caling where_is_file/1. The where_is_file/1 performs a check to see if something is already in the cache (and updates it if needed). I think this makes sense? Any reason for not doing this?

@josevalim
Copy link
Contributor

@kikofernandez I agree it makes sense. Perhaps we should get rid of where_is_file/2. It seems to be private and it seems we no longer use it anywhere?

@kikofernandez
Copy link
Contributor

kikofernandez commented Aug 7, 2024

I initially thought so, but where_is_file/2 seems to be used by which/2, and which/2 is used by module_status/2 (copied below), and the comment seems to imply that we cannot go via which/1.

I do not fully understand the "why" certain things are as they are. It could be that we simply move the implementation of where_is_file/2 elsewhere

%% Note that we don't want to go via which/1, since it doesn't look at the
%% disk contents at all if the module is already loaded.
module_status(Module, PathFiles) ->
    case is_loaded(Module) of
        false -> not_loaded;
        {file, preloaded} -> loaded;
        {file, cover_compiled} ->
            %% Cover compilation loads directly to memory and does not
            %% create a beam file, so report 'modified' if a file exists.
            case which(Module, PathFiles) of
                non_existing -> removed;
                _File -> modified
            end;

Copy link
Contributor

@kikofernandez kikofernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Unless someone objects, I will merge in a few days.

Thanks!

@kikofernandez kikofernandez added the testing currently being tested, tag is used by OTP internal CI label Aug 12, 2024
the-mikedavis and others added 2 commits August 16, 2024 12:46
`application:load/1` slows down as the number of directories in the code
path increases because the call to `code:where_is_file/1` for the '.app'
file must scan each directory for the app. The code_server maintains
a cache of the contents of directories in the path depending on their
`code:cache()` setting. Re-using and updating that cache when searching
for '.app' files in `application:load/1` can improve its runtime,
especially when loading multiple applications.

For a reasonably large project (RabbitMQ) with a high number of code
path dirs (111 at time of writing), loading an application might take
upwards of 20ms. With the code_server's cache, loading each application
takes at most around half of a millisecond instead. `application:load/1`
is used for loading plugins in Rabbit's CLI for example, and this change
can save as much as 750ms for a run of `rabbitmqctl --help` (30% of the
total run time). It also improves the boot time for the broker since
`application:load/1` is used by `application:start/2`.
@kikofernandez kikofernandez added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Aug 16, 2024
@kikofernandez kikofernandez merged commit 44a56d6 into erlang:master Aug 20, 2024
14 of 16 checks passed
@the-mikedavis the-mikedavis deleted the code-server-where-is-file branch August 20, 2024 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants