Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format very slow when --path not used; confusing UX for formatting multiple files #3593

Closed
fzakaria opened this issue Jan 15, 2025 · 4 comments
Labels
Bug Something isn't working

Comments

@fzakaria
Copy link

fzakaria commented Jan 15, 2025

GitHub repository with your minimal reproducible example (do not leave this field blank or fill out this field with "github.com/bufbuild/buf" or we will automatically close your issue, see the instructions above!)

Please let me know if it's needed and I can provide one; right now I'm trying to understand the UX of --path vs. providing positional argument <source> and the discrepancy on time.

Commands

buf seems to take ~1-3s to startup and process 1 file.

The time seems to have dropped dramatically as of 1.41.0 to ~1s. When we were on 1.34.0 it was closer to ~3s

> time /private/var/tmp/_bazel_fzakaria/1e65f71ad5377a6871390147418ed439/external/rules_buf~~buf~rules_buf_toolchains/buf format sdk/src/main/protobuf/sdk.proto -d
 format sdk/src/main/protobuf/sdk.proto -d  0.94s user 2.27s system 99% cpu 3.233 total

Consequently, if you run it on the whole directory (including bazel-out symlinks which has all the external protocol files)... it also takes ~3s total

> time /private/var/tmp/_bazel_fzakaria/1e65f71ad5377a6871390147418ed439/external/rules_buf~~buf~rules_buf_toolchains/buf format -d 1>/dev/null
 format -d > /dev/null  0.98s user 2.31s system 105% cpu 3.105 total

We are using an implementation of a linter that iterates over each file since having looked at the --help buf seems to have only accepted a single <source>.

For us since it's running over each protobuf; it takes >1minute to format all the files (3s * N)

> time bazel run //tools/format:format_Protocol_Buffer_with_buf
INFO: Invocation ID: e405bda0-efa5-447c-b99a-b732d0ddde46
INFO: Analyzed target //tools/format:format_Protocol_Buffer_with_buf (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //tools/format:format_Protocol_Buffer_with_buf up-to-date:
  bazel-bin/tools/format/format_Protocol_Buffer_with_buf.bash
INFO: Elapsed time: 0.161s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/tools/format/format_Protocol_Buffer_with_buf.bash

Formatted Protocol Buffer in 1m7.848s

I did notice however that using --path seems to be a way to provide multiple sources; interestingly running format with --path is also way way faster.

> time /private/var/tmp/_bazel_fzakaria/1e65f71ad5377a6871390147418ed439/external/rules_buf~~buf~rules_buf_toolchains/buf format --path apps/cts/gc/src/main/protobuf/tasks.proto --path apps/cts/materializer/src/main/protobuf/materializer.proto -d
 format --path apps/cts/gc/src/main/protobuf/tasks.proto --path  -d  0.01s user 0.02s system 11% cpu 0.285 total

Here is a comparison of the two to show the delta (0.025s vs 1s):

# using --path
> time /private/var/tmp/_bazel_fzakaria/1e65f71ad5377a6871390147418ed439/external/rules_buf~~buf~rules_buf_toolchains/buf format --path apps/cts/gc/src/main/protobuf/tasks.proto -d
 format --path apps/cts/gc/src/main/protobuf/tasks.proto -d  0.01s user 0.01s system 54% cpu 0.025 total

# using positional argument
> time /private/var/tmp/_bazel_fzakaria/1e65f71ad5377a6871390147418ed439/external/rules_buf~~buf~rules_buf_toolchains/buf format apps/cts/gc/src/main/protobuf/tasks.proto -d
 format apps/cts/gc/src/main/protobuf/tasks.proto -d  0.30s user 0.77s system 104% cpu 1.017 total

I guess my questions are:

  • Why does --path run way faster?
  • Can you help explain --path vs. the <source> positional argument?
  • What's the recommended way to run over multiple sources in a repository?
@fzakaria fzakaria added the Bug Something isn't working label Jan 15, 2025
@doriable
Copy link
Member

doriable commented Jan 15, 2025

Provided some answers to the questions below, let me know if there are additional questions/concerns, thanks!

Why does --path run way faster?

So buf format, as with other buf commands, compiles modules/a workspace under the hood. This is the case even when given a single source file (e.g. format apps/cts/gc/src/main/protobuf/tasks.proto -d). So that overhead stays the same. When providing --path filters, this compilation process is filtered down.

Can you help explain --path vs. the positional argument?

The <source> argument is meant to take an input. There are several examples in the buf format help docs as well. By default, this <source> is the current directory. The --path flag is meant to filter down the given input to the path(s) provided.

What's the recommended way to run over multiple sources in a repository?

So the recommended way is that if your sources are organised in a single input (e.g. a module/workspace ideally, but the protobuf root directory, etc.), you would run a single invocation of format against the input that encapsulates all your proto sources.

@fzakaria
Copy link
Author

@doriable thank you for the response;

I will read more about modules/workspace;
Does the compilation of a module/workspace matter at all ? It seems to add serious overhead for the format command which shouldn't need it at all?
(I guess since the input might not be a directory/file maybe it does?)

This UX inquiry came about from looking into the integration of buf with https://github.com/aspect-build/rules_lint which I guess can't specify a single directory for buf as there are tons of unnecessary symlinks that might be followed.
The tool had taken the approach to iterate the files one-at-a-time which led me to observe the slowness.
(We've circumvented it for now with a PR to migrate it to use --path).

Not sure where this leaves the issue if there's any improvement to the code, documentation or neither.
🙇

@doriable
Copy link
Member

Does the compilation of a module/workspace matter at all ? It seems to add serious overhead for the format command which shouldn't need it at all?

The short answer is yes. The current implementation relies on a compiled input, we are working on some improvements to formatting, but this is something that will come further down the road.

I'm glad that there is a workaround, if it helps, buf commands do follow symlinks (there is actually a --disable-symlinks flag, but by default, symlinks are respected and followed for local filesystem sources).

I'll chat with the team to see if there are improvements we can make to documentation in the mean time. I am going to resolve this issue since I think the main questions are addressed, please feel free to re-open/respond if there is anything else I can help with.

@fzakaria
Copy link
Author

Thank you for the response on the issue even though I didn't put a reproducer GitHub repo :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants