[torch.compile] fast inductor #11108
Conversation
Signed-off-by: youkaichao <[email protected]>
vllm/config.py (outdated)

@@ -2212,6 +2215,53 @@ class CompilationLevel:
    PIECEWISE = 3


class InductorHashCache:
I tried to place this class into vllm.compilation.backends, but then it needs to be lazily imported, and pydantic will complain.
Why not put it into a separate file?
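For context, a minimal sketch of what such a hash cache could look like, assuming it simply persists a mapping from (runtime shape, graph index) to an Inductor hash string in the cache directory. The class and file names here are illustrative, not the exact vLLM implementation:

import ast
import os


class InductorHashCacheSketch:
    # Illustrative stand-in: stores the mapping as the repr of a Python dict,
    # so the on-disk file stays human-readable.

    def __init__(self, cache_dir, disabled=False):
        self.disabled = disabled
        self.cache_file_path = os.path.join(cache_dir, "inductor_hash_cache.py")
        self.cache = {}  # (runtime_shape, graph_index) -> inductor hash string
        if not disabled and os.path.exists(self.cache_file_path):
            with open(self.cache_file_path) as f:
                self.cache = ast.literal_eval(f.read())

    def save_to_file(self):
        if self.disabled:
            return
        with open(self.cache_file_path, "w") as f:
            f.write(repr(self.cache))

    def __contains__(self, key):
        return not self.disabled and key in self.cache

    def __getitem__(self, key):
        return self.cache[key]

    def __setitem__(self, key, value):
        if not self.disabled:
            self.cache[key] = value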
Before this PR (main branch):

$ vllm serve meta-llama/Meta-Llama-3-8B --disable-log-requests -O "{'level': 3, 'candidate_compile_sizes': [1, 2]}"
Dynamo bytecode transform time: 4.62 s
Compiling a graph for general shape takes 14.72 s
Compiling a graph for shape 2 takes 14.63 s
Compiling a graph for shape 1 takes 11.01 s
torch.compile takes 44.98 s in total

With this PR:

$ vllm serve meta-llama/Meta-Llama-3-8B --disable-log-requests -O "{'level': 3, 'candidate_compile_sizes': [1, 2]}"
Dynamo bytecode transform time: 4.58 s
Compiling a graph for general shape takes 2.76 s
Compiling a graph for shape 2 takes 0.44 s
Compiling a graph for shape 1 takes 1.50 s
torch.compile takes 9.29 s in total

This should be close to optimal now.
Signed-off-by: youkaichao <[email protected]>
Now, even if we compile for all sizes, the compilation time becomes negligible:

$ vllm serve meta-llama/Meta-Llama-3-8B --disable-log-requests -O "{'level': 3, 'candidate_compile_sizes': [$(seq -s, 1 1 256)]}"
INFO 12-11 14:04:25 backends.py:363] Dynamo bytecode transform time: 4.59 s
INFO 12-11 14:04:28 backends.py:155] Compiling a graph for general shape takes 2.77 s
INFO 12-11 14:04:32 backends.py:158] Compiling a graph for shape 256 takes 0.48 s
INFO 12-11 14:04:33 backends.py:158] Compiling a graph for shape 248 takes 0.48 s
INFO 12-11 14:04:34 backends.py:158] Compiling a graph for shape 240 takes 0.49 s
INFO 12-11 14:04:35 backends.py:158] Compiling a graph for shape 232 takes 0.37 s
INFO 12-11 14:04:36 backends.py:158] Compiling a graph for shape 224 takes 0.38 s
INFO 12-11 14:04:37 backends.py:158] Compiling a graph for shape 216 takes 0.47 s
INFO 12-11 14:04:37 backends.py:158] Compiling a graph for shape 208 takes 0.32 s
INFO 12-11 14:04:38 backends.py:158] Compiling a graph for shape 200 takes 0.47 s
INFO 12-11 14:04:39 backends.py:158] Compiling a graph for shape 192 takes 0.39 s
INFO 12-11 14:04:39 backends.py:158] Compiling a graph for shape 184 takes 0.34 s
INFO 12-11 14:04:40 backends.py:158] Compiling a graph for shape 176 takes 0.36 s
INFO 12-11 14:04:41 backends.py:158] Compiling a graph for shape 168 takes 0.49 s
INFO 12-11 14:04:42 backends.py:158] Compiling a graph for shape 160 takes 0.52 s
INFO 12-11 14:04:43 backends.py:158] Compiling a graph for shape 152 takes 0.47 s
INFO 12-11 14:04:44 backends.py:158] Compiling a graph for shape 144 takes 0.41 s
INFO 12-11 14:04:44 backends.py:158] Compiling a graph for shape 136 takes 0.33 s
INFO 12-11 14:04:45 backends.py:158] Compiling a graph for shape 128 takes 0.53 s
INFO 12-11 14:04:46 backends.py:158] Compiling a graph for shape 120 takes 0.33 s
INFO 12-11 14:04:47 backends.py:158] Compiling a graph for shape 112 takes 0.48 s
INFO 12-11 14:04:48 backends.py:158] Compiling a graph for shape 104 takes 0.51 s
INFO 12-11 14:04:48 backends.py:158] Compiling a graph for shape 96 takes 0.53 s
INFO 12-11 14:04:49 backends.py:158] Compiling a graph for shape 88 takes 0.54 s
INFO 12-11 14:04:50 backends.py:158] Compiling a graph for shape 80 takes 0.52 s
INFO 12-11 14:04:51 backends.py:158] Compiling a graph for shape 72 takes 0.59 s
INFO 12-11 14:04:52 backends.py:158] Compiling a graph for shape 64 takes 0.57 s
INFO 12-11 14:04:53 backends.py:158] Compiling a graph for shape 56 takes 0.51 s
INFO 12-11 14:04:54 backends.py:158] Compiling a graph for shape 48 takes 0.42 s
INFO 12-11 14:04:55 backends.py:158] Compiling a graph for shape 40 takes 0.56 s
INFO 12-11 14:04:56 backends.py:158] Compiling a graph for shape 32 takes 0.44 s
INFO 12-11 14:04:57 backends.py:158] Compiling a graph for shape 24 takes 0.47 s
INFO 12-11 14:04:57 backends.py:158] Compiling a graph for shape 16 takes 0.47 s
INFO 12-11 14:04:58 backends.py:158] Compiling a graph for shape 8 takes 0.47 s
INFO 12-11 14:04:59 backends.py:158] Compiling a graph for shape 4 takes 0.52 s
INFO 12-11 14:05:00 backends.py:158] Compiling a graph for shape 2 takes 0.51 s
INFO 12-11 14:05:02 backends.py:158] Compiling a graph for shape 1 takes 1.43 s
INFO 12-11 14:05:02 monitor.py:31] torch.compile takes 24.54 s in total

Now that we can directly cache Inductor compilation, we don't need to use piecewise compilation anymore:

$ vllm serve meta-llama/Meta-Llama-3-8B --disable-log-requests -O "{'level': 3, 'candidate_compile_sizes': [$(seq -s, 1 1 256)], 'splitting_ops': []}"
INFO 12-11 22:13:19 backends.py:371] Dynamo bytecode transform time: 4.81 s
INFO 12-11 22:13:22 backends.py:163] Compiling a graph for general shape takes 0.68 s
INFO 12-11 22:13:26 backends.py:166] Compiling a graph for shape 256 takes 0.41 s
INFO 12-11 22:13:27 backends.py:166] Compiling a graph for shape 248 takes 0.41 s
INFO 12-11 22:13:28 backends.py:166] Compiling a graph for shape 240 takes 0.48 s
INFO 12-11 22:13:29 backends.py:166] Compiling a graph for shape 232 takes 0.46 s
INFO 12-11 22:13:30 backends.py:166] Compiling a graph for shape 224 takes 0.47 s
INFO 12-11 22:13:31 backends.py:166] Compiling a graph for shape 216 takes 0.39 s
INFO 12-11 22:13:32 backends.py:166] Compiling a graph for shape 208 takes 0.44 s
INFO 12-11 22:13:32 backends.py:166] Compiling a graph for shape 200 takes 0.36 s
INFO 12-11 22:13:33 backends.py:166] Compiling a graph for shape 192 takes 0.41 s
INFO 12-11 22:13:34 backends.py:166] Compiling a graph for shape 184 takes 0.40 s
INFO 12-11 22:13:35 backends.py:166] Compiling a graph for shape 176 takes 0.46 s
INFO 12-11 22:13:36 backends.py:166] Compiling a graph for shape 168 takes 0.36 s
INFO 12-11 22:13:36 backends.py:166] Compiling a graph for shape 160 takes 0.30 s
INFO 12-11 22:13:37 backends.py:166] Compiling a graph for shape 152 takes 0.41 s
INFO 12-11 22:13:38 backends.py:166] Compiling a graph for shape 144 takes 0.50 s
INFO 12-11 22:13:39 backends.py:166] Compiling a graph for shape 136 takes 0.45 s
INFO 12-11 22:13:40 backends.py:166] Compiling a graph for shape 128 takes 0.47 s
INFO 12-11 22:13:41 backends.py:166] Compiling a graph for shape 120 takes 0.47 s
INFO 12-11 22:13:41 backends.py:166] Compiling a graph for shape 112 takes 0.43 s
INFO 12-11 22:13:42 backends.py:166] Compiling a graph for shape 104 takes 0.51 s
INFO 12-11 22:13:43 backends.py:166] Compiling a graph for shape 96 takes 0.57 s
INFO 12-11 22:13:44 backends.py:166] Compiling a graph for shape 88 takes 0.42 s
INFO 12-11 22:13:45 backends.py:166] Compiling a graph for shape 80 takes 0.44 s
INFO 12-11 22:13:46 backends.py:166] Compiling a graph for shape 72 takes 0.56 s
INFO 12-11 22:13:47 backends.py:166] Compiling a graph for shape 64 takes 0.39 s
INFO 12-11 22:13:48 backends.py:166] Compiling a graph for shape 56 takes 0.41 s
INFO 12-11 22:13:49 backends.py:166] Compiling a graph for shape 48 takes 0.39 s
INFO 12-11 22:13:50 backends.py:166] Compiling a graph for shape 40 takes 0.50 s
INFO 12-11 22:13:50 backends.py:166] Compiling a graph for shape 32 takes 0.43 s
INFO 12-11 22:13:51 backends.py:166] Compiling a graph for shape 24 takes 0.53 s
INFO 12-11 22:13:52 backends.py:166] Compiling a graph for shape 16 takes 0.34 s
INFO 12-11 22:13:53 backends.py:166] Compiling a graph for shape 8 takes 0.52 s
INFO 12-11 22:13:54 backends.py:166] Compiling a graph for shape 4 takes 0.52 s
INFO 12-11 22:13:55 backends.py:166] Compiling a graph for shape 2 takes 0.45 s
INFO 12-11 22:13:56 backends.py:166] Compiling a graph for shape 1 takes 0.68 s
INFO 12-11 22:13:56 monitor.py:31] torch.compile takes 21.25 s in total

Therefore, I decided to remove piecewise compile for v0, while still keeping piecewise compile for v1. cc @bnellnm
Throughput benchmark:

# run baseline first, to find out the number of scheduler steps to keep gpus busy
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B
Throughput: 30.52 requests/s, 12620.07 total tokens/s, 6052.88 output tokens/s
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 8
Throughput: 43.57 requests/s, 18018.38 total tokens/s, 8642.04 output tokens/s
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 10
Throughput: 44.04 requests/s, 18212.79 total tokens/s, 8735.28 output tokens/s
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 12
Throughput: 44.65 requests/s, 18464.38 total tokens/s, 8855.95 output tokens/s
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 16
Throughput: 45.03 requests/s, 18622.54 total tokens/s, 8931.81 output tokens/s
$ python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 20
Throughput: 44.18 requests/s, 18269.12 total tokens/s, 8762.30 output tokens/s
# the best number of scheduler steps is 16; run this setting with `torch.compile`
python benchmarks/benchmark_throughput.py --dataset ShareGPT_V3_unfiltered_cleaned_split.json --model meta-llama/Meta-Llama-3-8B --num-scheduler-steps 16 -O "{'level': 3, 'candidate_compile_sizes': [$(seq -s, 1 1 256)]}"
Throughput: 46.93 requests/s, 19405.53 total tokens/s, 9307.35 output tokens/s

Now we don't need to profile the batchsize distribution anymore. We can just compile for all sizes.
Signed-off-by: youkaichao <[email protected]>
I think there might be something wrong inside Inductor when I try the Llama 3 70B model (with TP 4). Even when I directly cache the graph hash, every shape takes 90 seconds to load the graph. I will stop here, as the results on Llama 3 8B are quite good and the compilation time is greatly reduced. The Llama 3 70B case should be a bug to fix in the future.
vllm/config.py (outdated)

if self.model_config is not None and \
        not self.compilation_config.cache_dir:
    # generate a cache directory based on the model information
    # TODO: consider more factors that will affect model forward,
I think I missed some quantization args that can affect model execution, but I don't know how to pull out all factors that affect quantization.
vLLM version? We can add the git SHA to the key
This is going to be a large source of potential bugs, so we should definitely be careful here. Most quantization-related stuff from NM goes in the model_config, but there are a lot of arguments to LLM that can affect things, like dtype and quantization. Are these in the key already?
Not yet; that's why I want to ask for reviews.
One direction is to consider all factors affecting compilation, so that we can use the compilation cache by default.
Another approach is not to cache by default, but to tell the user the cache directory; users can then specify the cache directory if they know nothing has changed.
Which one would you prefer?
I think we should always check the known factors when we cache, and expose an accessible switch for enabling/disabling caching. And then it's less important whether it's on by default or not. And for that decision @robertgshaw2-neuralmagic should chime in.
I added more factors to consider in 2a7f729. Let me know if I missed anything.
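As a rough sketch of the direction discussed here (the factor list below is illustrative, not the exact set added in 2a7f729), the cache key can be derived by hashing everything that influences the compiled graph:

import hashlib


def compute_cache_key_sketch(model_config, parallel_config, compilation_config,
                             vllm_version):
    # collect every factor that can change the compiled artifact; anything
    # missing here is a potential source of stale-cache bugs
    factors = [
        vllm_version,                            # or a git SHA, as suggested above
        model_config.model,                      # model name / path
        str(model_config.dtype),                 # dtype affects kernel selection
        model_config.quantization,               # quantization method, if any
        parallel_config.tensor_parallel_size,
        compilation_config.level,
        compilation_config.splitting_ops,
    ]
    return hashlib.md5(str(factors).encode()).hexdigest()[:10]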
vllm/config.py (outdated)

self.cache_dir = os.path.join(
    self.cache_dir, f"rank_{vllm_config.parallel_config.rank}")
os.makedirs(self.cache_dir, exist_ok=True)
self.inductor_hash_cache_path = os.path.join(self.cache_dir,
It would be better if we also saved a serialized form of the config, but we need to design the serialization format.
Which config is not serializable? Isn't CompilationConfig serializable?
It is serializable, but I want a human-readable form, so that we can also manually check the config.
Two main notes:
- Figuring out what the "key" is for the cache should be a method on the config, likely on each sub-config as well. That way it's clear that developers who modify a config may need to modify the key as well.
- I think InductorHashCache is a nice abstraction, but it shouldn't live inside config (both file- and structure-wise). It should also take on more responsibilities. I think we can make it cleaner, so that wrap_inductor doesn't contain caching logic - it just calls the appropriate methods on the cache (or cache manager) object.
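A rough sketch of the shape being suggested here, where all caching decisions live on a cache (or cache manager) object and the compile wrapper only calls into it; the function and method names below are hypothetical, not the current code:

def wrap_inductor_sketch(graph, example_inputs, runtime_shape, graph_index,
                         hash_cache, compile_fn):
    # `hash_cache` is any object exposing load/store keyed on
    # (runtime_shape, graph_index); `compile_fn` does the actual Inductor
    # compilation and returns a callable.
    key = (runtime_shape, graph_index)
    cached = hash_cache.load(key)        # returns a compiled callable, or None on miss
    if cached is not None:
        return cached
    compiled = compile_fn(graph, example_inputs)
    hash_cache.store(key, compiled)      # persist the entry for the next run
    return compiled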
Signed-off-by: youkaichao <[email protected]>
@@ -71,6 +71,7 @@
VLLM_USE_V1: bool = False
VLLM_ENABLE_V1_MULTIPROCESSING: bool = False
VLLM_LOG_BATCHSIZE_INTERVAL: float = -1
VLLM_DISABLE_COMPILE_CACHE: bool = False
Added the flag to disable the compile cache. The compile cache is used by default.
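A sketch of how the flag could gate caching; only the VLLM_DISABLE_COMPILE_CACHE environment variable comes from this PR, the helper itself is hypothetical:

import vllm.envs as envs


def cache_enabled():
    # caching is on by default; set VLLM_DISABLE_COMPILE_CACHE=1 to opt out,
    # e.g. when debugging the compilation itself
    return not envs.VLLM_DISABLE_COMPILE_CACHE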
# set flags so that Inductor and Triton store their cache
# in the cache_dir, then users only need to copy the cache_dir
# to another machine to reuse the cache.
inductor_cache = os.path.join(cache_dir, "inductor_cache")
os.makedirs(inductor_cache, exist_ok=True)
os.environ["TORCHINDUCTOR_CACHE_DIR"] = inductor_cache
triton_cache = os.path.join(cache_dir, "triton_cache")
os.makedirs(triton_cache, exist_ok=True)
os.environ["TRITON_CACHE_DIR"] = triton_cache
Redirect the Inductor/Triton cache to the vLLM cache location.

In the future, I plan to dump more information in the cache directory, including:

the goal is to make
Agree with Tyler's comments about config caching, otherwise LGTM!
Signed-off-by: youkaichao <[email protected]>
factors.append(self.inductor_compile_config)
factors.append(self.inductor_passes)
What about candidate_compile_sizes?
They do not affect the computation graph. Say we compile for candidate_compile_sizes = [1, 2, 4], and then run again with candidate_compile_sizes = [1, 2, 4, 8]; we want both runs to share the same cache directory, so that we can directly load the cache for [1, 2, 4] and only compile for the new shape 8.
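The reuse pattern described above, in sketch form (the cache object and compile function are stand-ins, not vLLM APIs):

def compile_candidate_sizes_sketch(candidate_compile_sizes, hash_cache,
                                   compile_for_size):
    # the sizes are not part of the cache-key hash, so a later run with a
    # superset of sizes reuses every previously compiled entry and only
    # compiles the new ones
    for size in candidate_compile_sizes:
        if size in hash_cache:                      # e.g. [1, 2, 4] hit on the second run
            continue
        hash_cache[size] = compile_for_size(size)   # only the new shape (e.g. 8) compiles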
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: youkaichao <[email protected]>
Works for me now! Some suggestions on comments and logging, but LGTM otherwise
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
directly bypass aot-autograd and inductor, and load from cache