Add DDPWrapper #2479
Commits on Sep 26, 2024
Revert "Trace enter/exit of TorchFunctionModes (#135422)" (#136590)
Summary: This reverts commit 7743149b2be4a9eba7e0997ccdc6abe552bec266. Reverts:
* pytorch/pytorch#135503
* pytorch/pytorch#135502
* pytorch/pytorch#135422

This makes the following test pass. Earlier, the getitem would stay a getitem in the FX graph, but now fake tensor propagation fails, saying that .item() is called. It seems that the torch function is not getting triggered during fake tensor propagation.

```
import torch
from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
from torch._inductor.lowering import make_pointwise, register_lowering
from torch._inductor.virtualized import ops
from torch.nn.attention.flex_attention import create_block_mask

torch.set_default_device('cuda')

flex_attention = torch.compile(flex_attention, dynamic=False)

prefix_lengths = torch.arange(8)
def prefix_lm(b, h, q, kv):
    return prefix_lengths[b] >= kv

mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
```

X-link: pytorch/pytorch#136590 Approved by: https://github.com/Chillee Reviewed By: atalman Differential Revision: D63431470 Pulled By: anijain2305 fbshipit-source-id: 60915b30336121b845af71f423582c22a6c65c3f
Summary: Add new metric `--metric nsys` to collect nsys trace. Reviewed By: htyu Differential Revision: D63274918 fbshipit-source-id: 0536310df6290ea5f5a02d85cc0ad6d342d45dbd
Commits on Sep 28, 2024
Fix bug pytorch#2458 (pytorch#2459)
Summary: pytorch#2458 Pull Request resolved: pytorch#2459 Reviewed By: xuzhao9 Differential Revision: D63476542 Pulled By: kit1980 fbshipit-source-id: 01e9db9cb03d34e82a773897417df2ccda410634
Restore FlexAttention and FlashV3 backward (pytorch#2473)
Summary: Pull Request resolved: pytorch#2473 Reviewed By: xuzhao9 Differential Revision: D63543625 Pulled By: bertmaher fbshipit-source-id: 1693e15875544bda0f5f6c69daa5597fffd80509
Commits on Oct 1, 2024
Fix hardcoded shape in low_mem_dropout benchmark (pytorch#2475)
Summary: Pull Request resolved: pytorch#2475 Reviewed By: htyu Differential Revision: D63653081 Pulled By: xuzhao9 fbshipit-source-id: 8d840986779b6124cbccc2425c24e2b892d55ce4
Summary: We had the imports wrong for the internal port. Reviewed By: xuzhao9, adamomainz Differential Revision: D63643617 fbshipit-source-id: 04a49d419fede71d2681dedbfb55112a67cb4d55
Skip loading triton.nvidia.cublas if not found
Summary: We have an old triton internally that doesn't have the cublasLt bindings Reviewed By: adamomainz Differential Revision: D63643619 fbshipit-source-id: 39aece74b52f7747fe2100d7bb905bad49ba1fa0
Print TMA benchmark info to stderr
Summary: X-link: facebookresearch/FBGEMM#301 X-link: pytorch/FBGEMM#3202 Printing warnings to stdout mucks up the output of various tools/benchmarks Reviewed By: xuzhao9, htyu Differential Revision: D63643615 fbshipit-source-id: 1f34508a7fd36f5aa421e11bddd5ce77fc13038a
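For illustration, the pattern applied here is simply routing diagnostics to stderr so that stdout stays machine-parseable; a minimal sketch (the message text is made up):

```python
import sys

# Benchmark results go to stdout; warnings and informational notices go to stderr,
# so tools that parse stdout are not polluted.
print("TMA benchmarks enabled", file=sys.stderr)
```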
Modernize cutlass call for fp8 blockwise
Summary: FBGEMM has changed how it declares its Cutlass-based blockwise gemm. Reviewed By: htyu, sijiac, adamomainz Differential Revision: D63643618 fbshipit-source-id: e46e3bbd2e07be0653f7c7fa6bd080b6c8db171e
CSV of extra shapes for gemm benchmarks
Summary: We have a big list of interesting shapes for blockwise/rowwise scaled gemm. A lot of these are variants of llama. We might want to use them for gemm and fp8_gemm (unscaled) as well, but for now just do blockwise/rowwise Reviewed By: xuzhao9, adamomainz Differential Revision: D63643616 fbshipit-source-id: 328961fe8c91e66428fcd1e5b72c89813f58a5a3
Summary: We were only benchmarking `row-major x row-major` gemms (also called `TT` or `transpose-transpose`, because FORTRAN), which is actually not the common case; `nn.Linear` will use column-major layouts for weights, which means `TN` is actually much more common. Reviewed By: adamomainz Differential Revision: D63714661 fbshipit-source-id: 735c25c59ddeb6596afd9b19f463af92036a830b
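A small PyTorch sketch (not from the commit) of why `nn.Linear` yields the `TN` case: the weight is stored as an `(out_features, in_features)` row-major tensor, so the forward pass multiplies by its transpose.

```python
import torch

M, K, N = 64, 128, 32
x = torch.randn(M, K)                  # activations, row-major

lin = torch.nn.Linear(K, N, bias=False)
w = lin.weight                         # shape (N, K), row-major storage

y_tn = x @ w.t()                       # what nn.Linear computes: the common "TN" layout
y_tt = x @ torch.randn(K, N)           # plain row-major x row-major ("TT"), the less common case
print(y_tn.shape, y_tt.shape)          # both torch.Size([64, 32])
```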
Commits on Oct 2, 2024
Enable fp8 rowwise on AMDGPU (pytorch#2483)
Summary: Pull Request resolved: pytorch#2483 Reviewed By: karthik-man Differential Revision: D63726031 fbshipit-source-id: dc410e503f918d83362fb38005ac4a6db5dc1e68
Ignore Torchbench CI on Tritonbench paths (pytorch#2481)
Summary: Right now, Tritonbench is still sharing codebase with Torchbench. Skip the Torchbench tests when the PR is on Tritonbench paths. Pull Request resolved: pytorch#2481 Reviewed By: kit1980 Differential Revision: D63695702 Pulled By: xuzhao9 fbshipit-source-id: cc88e0a987ecca1daf09d35ddeca18f07bef9077
Commits on Oct 3, 2024
Add _dynamo.config inline_inbuilt_nn_modules and specialize_float logging (#137139)
Summary: X-link: pytorch/pytorch#137139 Approved by: https://github.com/ezyang Reviewed By: PaliC Differential Revision: D63783497 Pulled By: jovianjaison fbshipit-source-id: 5abe70d558917a9807e33be8181d42ef240c5a95
Add non-persistent fp8 triton_rowwise kernel (pytorch#2484)
Summary: Pull Request resolved: pytorch#2484 X-link: pytorch/FBGEMM#3212 X-link: facebookresearch/FBGEMM#308 triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel, when both are run with exhaustive AMD-specific tuning. Reviewed By: htyu Differential Revision: D63741099 fbshipit-source-id: c276415ddf8f5d24ffeba70b8ee6493011b393e1
Commits on Oct 4, 2024
Bump transformer version (pytorch#2488)
Summary: Bump transformer version to enable liger-kernels Pull Request resolved: pytorch#2488 Reviewed By: FindHao Differential Revision: D63860019 Pulled By: xuzhao9 fbshipit-source-id: f607c5553169c61270e4f5271d8375d7f227bd82
Add multiple ops support for --op argument (pytorch#2490)
Summary: Allow users to benchmark multiple ops in a single run. The ops are separated by commas, e.g. `--op fp8_gemm,addmm`. Example output:
```
% python run_benchmark.py triton --op fp8_gemm,addmm --num-inputs 1
100%|██████████| 1/1 [00:03<00:00, 3.12s/it]
x_val               torch_fp8_gemm-gbps  torch_fp8_gemm-gbps  torch_fp8_gemm-latency  torch_fp8_gemm-tflops  triton_fp8_gemm-gbps  triton_fp8_gemm-gbps  triton_fp8_gemm-latency  triton_fp8_gemm-tflops
------------------  -------------------  -------------------  ----------------------  ---------------------  --------------------  --------------------  -----------------------  ----------------------
(1024, 1024, 1024)  462.202              462.202              0.00907462              236.647                630.43                630.43                0.00665309               322.78
100%|██████████| 1/1 [00:05<00:00, 5.90s/it]
(M, N, K)           aten_addmm-best_config  aten_addmm-gbps  aten_addmm-tflops  triton_addmm-best_config  triton_addmm-gbps  triton_addmm-tflops  pt2_triton_matmul-best_config  pt2_triton_matmul-gbps  pt2_triton_matmul-tflops
------------------  ----------------------  ---------------  -----------------  ------------------------  -----------------  -------------------  -----------------------------  ----------------------  ------------------------
(20120, 512, 1536)  818.112  247.544  {'BLOCK_M': 128, 'BLOCK_N': 256, 'BLOCK_K': 64, 'GROUP_M': 8, 'num_warps': 8, 'num_ctas': 1, 'num_stages': 3}  911.569  275.823  889.125  269.031
```
Pull Request resolved: pytorch#2490 Reviewed By: xuzhao9 Differential Revision: D63862548 Pulled By: FindHao fbshipit-source-id: 9d4afa6051d4191bc2e3288f59e2820627647b91
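A hypothetical sketch of how a comma-separated `--op` flag could be expanded into per-operator runs; the argument and function names here are illustrative, not the benchmark's actual API.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--op", type=str, help="comma-separated operator names, e.g. fp8_gemm,addmm")
args, _ = parser.parse_known_args(["--op", "fp8_gemm,addmm"])

ops = [name.strip() for name in args.op.split(",") if name.strip()]
for op_name in ops:
    # placeholder for whatever runs a single operator benchmark and prints its table
    print(f"would benchmark {op_name}")
```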
Add FusedLinearCrossEntropy (pytorch#2485)
Summary: As discussed in pytorch/pytorch#136168, I'm going to migrate implementations of operator benchmarking. This PR adds different implementations for FusedLinearCrossEntropy as a starting example. Execution command:
```
python run_benchmark.py triton --op FusedLinearCrossEntropy
```
Example output:
```
x_val    LMHeadCE-latency    LigerLMHeadCE-latency    inductor_fused_linear_cross_entropy-latency
-------  ------------------  -----------------------  ---------------------------------------------
      0             98.0041                   389.87                                        95.0412
      1             196.12                    652.619                                       193.219
      2             417.242                  1248.75                                        416.725
      3             824.906                  2356.25                                        809.56
```
Pull Request resolved: pytorch#2485 Reviewed By: xuzhao9 Differential Revision: D63859871 Pulled By: FindHao fbshipit-source-id: 4b73a2144702c1f8f3ae5ed15e76112d03f12b87
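For orientation, a minimal unfused reference of what this operator computes (a linear projection into the vocabulary followed by cross entropy); this is only a sketch with made-up shapes, not any of the benchmarked implementations.

```python
import torch
import torch.nn.functional as F

batch, hidden, vocab = 8, 512, 32000
x = torch.randn(batch, hidden)
weight = torch.randn(vocab, hidden)
targets = torch.randint(0, vocab, (batch,))

# Unfused baseline: materializes the full (batch, vocab) logits tensor before the loss.
logits = F.linear(x, weight)
loss = F.cross_entropy(logits, targets)

# A fused kernel avoids materializing the logits, which is what the latency comparison probes.
print(loss.item())
```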
Commits on Oct 5, 2024
Add user release benchmark so that we can run it on pull request (pytorch#2489)
Summary: Pull Request resolved: pytorch#2489 Reviewed By: xuzhao9 Differential Revision: D63898689 Pulled By: atalman fbshipit-source-id: 3cd430911aadd5972f1393e3548ef7d52b93b661
Commits on Oct 7, 2024
Summary: Remove nvidia-cuda-nvcc-cu12, as it is not required, to reduce install time. Pull Request resolved: pytorch#2493 Reviewed By: xuzhao9 Differential Revision: D63987509 Pulled By: atalman fbshipit-source-id: 07298ddb569da7f7c3fe22d73da72a4ceab256f5
Commits on Oct 8, 2024
Add Tritonbench CI (pytorch#2494)
Summary: Add a PR CI on Tritonbench that installs the latest Triton nightly package Pull Request resolved: pytorch#2494 Reviewed By: chenyang78 Differential Revision: D63998525 Pulled By: xuzhao9 fbshipit-source-id: a26633de040bdf324e9ae5c9b130ec1a58dfd409
Log compile ids to pt2_remote_cache and pt2_compile_events
Summary: X-link: pytorch/pytorch#137431 Log the current compilation id for all relevant samples for these two tables, so we can have a 1:1 analog with dynamo_compile. ghstack-source-id: 246618821 exported-using-ghexport Reviewed By: oulgen Differential Revision: D63900826 fbshipit-source-id: 3f2896287777c94344960e7cad131f71aaf0210f
Commits on Oct 9, 2024
Trace enter/exit of TorchFunctionModes (#135422) (#137114)
Summary: This PR implements tracing of with contexts with TorchFunction modes which have the default enter/exit behavior (i.e. pushing/popping the mode).

Typically the bytecode for a context manager looks like this during a graph break:
1. graph call
2. enter context
3. unsupported code
4. exit context
5. resume call

resume fn structure:
1. enter context
2. jump ...
3. exit context

The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).

So for torch function modes the structure of our output code is this:
1. graph call
2. mutate tf mode stack to replay mutations
4. unsupported code
5. on exception restore stack
6. resume function

Then our resume fn looks like this:
1. no-op enter torch function mode
2. jump
3. exit tf mode

To implement the no-op enter of the torch function mode, I added a torch function mode in polyfill which no-op enters, but normally exits. This is needed because we still want to trace the with context in the resume function, and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).

Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally at a graph break, we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack, and not update the true python torch function mode stack with the suffix bytecode.

All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.

Approved by: https://github.com/williamwen42 ghstack dependencies: #134732, #133137, #135443, #135444 X-link: pytorch/pytorch#137114 Approved by: https://github.com/yanboliang Reviewed By: jovianjaison Differential Revision: D64088005 Pulled By: mlazos fbshipit-source-id: 156b9bf38a535933f8dd966ee96ed3099d7b4be2
Remove ignored modes workaround (#135502) (#137115)
Summary: Approved by: https://github.com/anijain2305 ghstack dependencies: #134732, #133137, #135443, #135444, #135422 X-link: pytorch/pytorch#137115 Approved by: https://github.com/yanboliang ghstack dependencies: #137114 Reviewed By: jovianjaison Differential Revision: D64088016 Pulled By: mlazos fbshipit-source-id: 53efb5a6e689d4fb6112a6462851ee7e81b28c24
Handle torch function subclass/mode dispatch on generic tensor methods (#137119)
Summary: X-link: pytorch/pytorch#137119 Approved by: https://github.com/williamwen42, https://github.com/anijain2305 ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227 Reviewed By: jovianjaison Differential Revision: D64088048 Pulled By: mlazos fbshipit-source-id: 34fe09f7fa6292d89a438b780852f00e042ec950
adding new configs for servicelab
Summary: Adding new configs for servicelab and logging to scuba. A follow-up diff is coming to add aggregates into the logging (i.e. harmonic mean). Reviewed By: xuzhao9 Differential Revision: D64126688 fbshipit-source-id: 0c3705e82071f1399cfc53ff496d130adf237b73
Improve release benchmark suites with a lower value of epoch (pytorch#2482)
Summary: pytorch#2468 Pull Request resolved: pytorch#2482 Reviewed By: xuzhao9 Differential Revision: D64139543 Pulled By: atalman fbshipit-source-id: 2d030c66d856387b6a2451b26c89fd40e79e0e53
Commits on Oct 10, 2024
Check dyno and dcgm existence before disable them (pytorch#2496)
Summary: On systems without dyno or dcgm installed, running without sudo, the `ncu_rep` metric will get stuck asking for a sudo password. This PR checks whether the command or service exists before disabling it, to avoid getting stuck. Pull Request resolved: pytorch#2496 Reviewed By: xuzhao9 Differential Revision: D64141793 Pulled By: FindHao fbshipit-source-id: 8d52468f04e7e5a0e8d23f3562a14c83d4a5934c
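A sketch of the kind of guard described above, assuming a PATH lookup is enough to decide whether the tool is installed; the command line is illustrative, not the benchmark's actual invocation.

```python
import shutil
import subprocess

def try_disable(cmd: list[str]) -> None:
    # Only run the tool if its binary exists on PATH, so a missing dyno/dcgm
    # install never falls through to an interactive sudo password prompt.
    if shutil.which(cmd[0]) is None:
        return
    subprocess.run(cmd, check=False)

try_disable(["dcgmi", "profile", "--pause"])  # illustrative command only
```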
combining CI and servicelab configs
Summary: Instead of having one list of configs for OSS CI and another set for servicelab, we combine both here into one common dictionary. Reviewed By: danzimm Differential Revision: D64183688 fbshipit-source-id: fa47780a3bf3ba8669c6e8fd406cff5542fd06e6
Summary: An extra `m` was put by mistake in one of the configs and it is breaking OSS. Reviewed By: plotfi Differential Revision: D64208508 fbshipit-source-id: f1461da0a5e883ffd4266206f5e3b737f468c3b2
Commits on Oct 11, 2024
use 3.13 multiline traceback in get_instruction_source_311 (#137617)
Summary: X-link: pytorch/pytorch#137617 Approved by: https://github.com/jansel Reviewed By: jovianjaison Differential Revision: D64202324 Pulled By: williamwen42 fbshipit-source-id: 526f32cabeb891c8c9481799f45436cfd19e7dc2
differentiating between some Fbsource only targets and OSS for CI
Summary: TSIA Reviewed By: danzimm, aakhundov Differential Revision: D64268555 fbshipit-source-id: e380f9401b08c2b7d9a48bedc6d791b9b39cd533
Commits on Oct 12, 2024
Format `.ci/`, `.github/`, `benchmarks/`, `functorch/`, `tools/`, `torchgen/` with `ruff format` (#132577)
Summary: X-link: pytorch/pytorch#132577 Approved by: https://github.com/malfet Reviewed By: jovianjaison Differential Revision: D64256966 fbshipit-source-id: e9725ccc5a814ef3b30e244e988ed9b7238b6ccb
Add AtenOp Benchmarking (pytorch#2495)
Summary: As described in pytorch/pytorch#136168, I'm trying to migrate the native PyTorch implementation comparison ([the original operatorbench](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/microbenchmarks/operatorbench.py)) to TritonBench. This PR adds an Operator Loader which can load aten ops used in TorchBench, HuggingFace, and TIMM models. The benchmark classes are created dynamically and then benchmarked between the aten and inductor implementations. Files `torchbenchmark/operator_loader/operator_inp_utils.py`, `torchbenchmark/operator_loader/operatorbench.py`, and all config files in `torchbenchmark/operator_loader/operator_inp_logs/` are copied from the original operatorbench. Example command:
```bash
python run_benchmark.py triton --op aten._softmax.default --num-inputs 1 --operator-loader --precision fp16
```
Example output:
```
Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_nearest2d_backward' has no overload name 'vec'
Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'im2col_backward'
Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'col2im_backward'
Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'im2col_backward'
Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_bilinear2d_backward' has no overload name 'vec'
Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_nearest2d_backward' has no overload name 'vec'
100%|██████████| 2/2 [00:02<00:00, 1.20s/it]
x_val    eager-latency    inductor-latency
-------  ---------------  ------------------
      0         0.090592            0.089632
      1         0.055808            0.038112
```
Pull Request resolved: pytorch#2495 Reviewed By: xuzhao9 Differential Revision: D64200358 Pulled By: FindHao fbshipit-source-id: f0121168b33247224bc905a1a88af69e4b13def6
change GPT2ForSequenceClassification inference accuracy tolerance (#136749)
Summary: Fixes pytorch/pytorch#123503. pytorch/pytorch#121866 makes GPT2ForSequenceClassification hit the SDPA pattern 18 and then encounter the accuracy issue. The issue only happens with BF16 inference single thread. This PR tends to increase the model tolerance from 4e-3 to 5e-3 and make the check pass. Note that the issue is due to some small implementation diff. For example, the sdpa math backend scales q, k before matmul for stability; the flash attention backend has more diffs as a new algorithm. X-link: pytorch/pytorch#136749 Approved by: https://github.com/jgong5, https://github.com/jansel Reviewed By: jovianjaison Differential Revision: D64290722 fbshipit-source-id: a3e7248f57a97cd767257354d410b3508b5e0325
Commits on Oct 14, 2024
making CI more flexible for extra data in tritonbench
Summary: TSIA Reviewed By: danzimm Differential Revision: D64334048 fbshipit-source-id: d01b20161407400d0afd28460bce8095c91d9056
Add entire _dynamo.config as a json for logging (#137216)
Summary: X-link: pytorch/pytorch#137216 Approved by: https://github.com/ezyang Reviewed By: clee2000 Differential Revision: D64290696 Pulled By: jovianjaison fbshipit-source-id: 06886bfb7e3f37895e3a8bf567366e4c4cc1d248 Co-authored-by: Aaron Gokaslan <[email protected]>
Skipping null values in scribe message
Summary: Since we have added flexibility for different sets of metrics per operator, we want to skip messages for empty metrics. Reviewed By: nmacchioni Differential Revision: D64345289 fbshipit-source-id: d5b1fff90c6acd530867d0b6ef3ea97bc6f41cf5
Commits on Oct 16, 2024
Add fbscribelogger to Dynamo benchmark runner (#137867)
Summary: Signed-off-by: Edward Z. Yang <[email protected]> X-link: pytorch/pytorch#137867 Approved by: https://github.com/bobrenjc93 Reviewed By: clee2000 Differential Revision: D64418349 Pulled By: ezyang fbshipit-source-id: 265e07753a3549e6866d45fbdb8a435b6e7dc787
Update the flash-attention submodule (pytorch#2500)
Summary: We need https://github.com/Dao-AILab/flash-attention/pull/1053/files to externally import `flash_attn_interface` for FA3. Pull Request resolved: pytorch#2500 Reviewed By: bertmaher Differential Revision: D64190441 Pulled By: xuzhao9 fbshipit-source-id: ff20f0a28514b645c828853e7f15808ed1597ae6
Add host-side Triton TMA support to Dynamo (#137677)
Summary: This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes:
- Here we assume the availability of the host-side TMA API added to upstream Triton in triton-lang/triton#4498. As of time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). The OSS Triton pin update should be done in December 2024.
- To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), and `TMADescriptorVariable` (for the TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created.
- The newly introduced variables have `reconstruct` methods used in case of graph breaks.
- The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for what the captured HOP arguments look like.
- In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors.
- In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors, so that downstream AOTAutograd can perform functionalizations as required.
- JIT Inductor and AOT Inductor support will be implemented in follow-up PRs.

X-link: pytorch/pytorch#137677 Approved by: https://github.com/zou3519 Reviewed By: clee2000 Differential Revision: D64404928 Pulled By: aakhundov fbshipit-source-id: c812cea3867c55800d5fe213bf07bf21292345e3
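The user-visible chain being traced looks roughly like the sketch below; the module path and argument order of the experimental host-side helpers are assumptions on my part, so treat this as illustrative rather than the exact API.

```python
import torch
import triton.tools.experimental_descriptor as tma  # assumed location of the host-side TMA helpers

M, N, BLOCK_M, BLOCK_N = 1024, 1024, 128, 128
t = torch.randn(M, N, device="cuda", dtype=torch.float16)

# t.data_ptr() --> create_2d_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)
desc = tma.create_2d_tma_descriptor(t.data_ptr(), M, N, BLOCK_M, BLOCK_N, t.element_size())
# some_kernel[(grid,)](desc, ...)  # the kernel receives the descriptor, not the tensor itself
```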
Add ncu report analyzer (pytorch#2497)
Summary: This PR adds an ncu report analyzer to analyze the profiled ncu report. It also adds two metrics, `memory_traffic` and `arithmetic_intensity`. To avoid excessive profiling overhead, we only profile with the necessary ncu metrics. This PR is a part of the [operator benchmarking plan](pytorch/pytorch#136168). Example command:
```
python run_benchmark.py triton --op gather_gemv --num-inputs 1 --metrics memory_traffic,arithmetic_intensity --csv
```
Example output:
```
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 508958 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "index_elementwise_kernel" - 0: 0%....50%....100% - 3 passes
==PROF== Profiling "unrolled_elementwise_kernel" - 1: 0%....50%....100% - 3 passes
==PROF== Profiling "gemv2T_kernel_val" - 2: 0%....50%....100% - 3 passes
100%|██████████| 1/1 [00:03<00:00, 3.89s/it]
x_val;test_eager-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 508958
==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_eager_0/ncu_output.ncu-rep
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509121 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
100%|██████████| 1/1 [00:03<00:00, 3.79s/it]
x_val;test_0-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 509121
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_0_0/ncu_output.ncu-rep
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509285 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
==PROF== Connected to process 509433 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
100%|██████████| 1/1 [00:04<00:00, 4.07s/it]
x_val;test_inductor-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 509285
==PROF== Disconnected from process 509433
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_inductor_0/ncu_output.ncu-rep
100%|██████████| 1/1 [00:23<00:00, 23.99s/it]
x_val;test_eager-arithmetic_intensity;test_eager-memory_traffic;test_eager-weighted_fp32_arithmetic_intensity;test_0-arithmetic_intensity;test_0-memory_traffic;test_0-weighted_fp32_arithmetic_intensity;test_inductor-arithmetic_intensity;test_inductor-memory_traffic;test_inductor-weighted_fp32_arithmetic_intensity
2048;(0.14937214493924472, 0.0);(29467392.0, 505856.0);0.14937214493924472;(4.364079147640791, 0.0);(4204544.0, 256.0);4.364079147640791;(9.97989888530182, 0.0);(4202752.0, 0.0);9.97989888530182
```
According to ncu, there can be multiple roofline charts at different granularities, such as single precision, double precision, tensorcore, and half precision. Pull Request resolved: pytorch#2497 Reviewed By: xuzhao9 Differential Revision: D64359055 Pulled By: FindHao fbshipit-source-id: a02a4ebfcac5c5209f4196aac5a8eb4f91b3de87
Change default gpu metric backend (pytorch#2501)
Summary: The current GPU memory metric backends include dcgm and nvml. They report from hardware and should be accurate. This PR adds a native torch way to collect GPU memory usage, using `torch.cuda.max_memory_allocated()`. The benefit is that it has lower overhead and is accurate on a shared GPU server where there are multiple GPU processes from other users, because we don't implement a process filter for the other two backends. Use `--metrics-gpu-backend torch` to set the backend. Pull Request resolved: pytorch#2501 Reviewed By: xuzhao9 Differential Revision: D64253410 Pulled By: FindHao fbshipit-source-id: 09b0579846a6830e0e9735e8daeba4abd88bab17
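A minimal sketch of the torch-native measurement; unlike dcgm/nvml, it only counts allocations made by the current process.

```python
import torch

def peak_gpu_mem_gb(fn) -> float:
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    # Peak memory allocated by this process only, so other users' GPU
    # processes on a shared server do not affect the number.
    return torch.cuda.max_memory_allocated() / 1e9

peak = peak_gpu_mem_gb(lambda: torch.randn(4096, 4096, device="cuda") @ torch.randn(4096, 4096, device="cuda"))
print(f"{peak:.3f} GB")
```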
Update 2.5.0.yaml (pytorch#2498)
Summary: Pull Request resolved: pytorch#2498 Reviewed By: kit1980 Differential Revision: D64407151 Pulled By: atalman fbshipit-source-id: 0637d812144f13dad41b640e70fd65619a183c67
Commits on Oct 17, 2024
Add --op-collection option (pytorch#2503)
Summary: This PR adds `--op-collection` to tritonbench. It can run multiple ops in defined operator collections. The default collection includes all ops not included in other collections. Operator collections are defined in `torchbenchmark/operators_collection/`. For each collection, you should define a `get_operators` function that returns the operators included in that collection (see the sketch below). Pull Request resolved: pytorch#2503 Reviewed By: xuzhao9 Differential Revision: D64359380 Pulled By: FindHao fbshipit-source-id: c66dd254a3c8b70c112d9b7774482813e0236789
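A hypothetical sketch of what one collection module might look like under that directory; the collection name and operator names are placeholders, not the repository's actual contents.

```python
# torchbenchmark/operators_collection/my_collection/__init__.py  (illustrative path)

def get_operators():
    # Return the operator names that belong to this collection; the harness
    # unions these lists to resolve --op-collection into concrete ops.
    return [
        "FusedLinearCrossEntropy",
        "fp8_gemm",
    ]
```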
Summary: Update imports for latest updates + silu_mul interface change Reviewed By: jianyuh Differential Revision: D64516452 fbshipit-source-id: b9b98a6eda45a093661e8b23f6b8ec300b559960
Add doc for adding custom ops (pytorch#2509)
Summary: Add documentation for adding custom ops. Pull Request resolved: pytorch#2509 Reviewed By: xuzhao9 Differential Revision: D64497281 Pulled By: FindHao fbshipit-source-id: 20f4096ebbce53c7d9a713cacbde016c521aa7c3
Summary: As the title goes. Reviewed By: bertmaher Differential Revision: D64480822 fbshipit-source-id: ec1d17be0619fb35d4d8f774eab2858e75afe2e3
Test backward pass in unit test.
Summary: In unit test, run both forward and backward pass. If the backward pass throws `NotImplementedError`, skip the test since the operator does not support backward pass. Reviewed By: int3 Differential Revision: D64471087 fbshipit-source-id: c9d0c43544314fc11305f271e8e80f7ba07b2675
Make sure all ci-enabled impls are in the output
Summary: In the CI, we will check that all registered impls are available in the output, unless they are specified as `ci=False`. We add the `ci=` flag because right now we don't have lazy imports for optional backend modules, and we want different behavior between the `enabled` and `ci` flags. For the `enabled` flag, we want "best-effort": if a module is not available (e.g. flash attention 3 is not available on A100), we should detect that and skip it automatically instead of erroring out, for the best user experience. For the `ci` flag, we want to make sure that things are already set up in fbcode CI; if flash attention 3 is not available there, it is a red flag and we have to report it in the unit test. Reviewed By: bertmaher Differential Revision: D64473609 fbshipit-source-id: 320255f73942705038d50aac1f14d318b62a4765
Commits on Oct 18, 2024
Update AOTEagerandRecordGraphs backend (#138231)
Summary: X-link: pytorch/pytorch#138231 Approved by: https://github.com/StrongerXi, https://github.com/mlazos, https://github.com/aakhundov Reviewed By: clee2000 Differential Revision: D64581452 Pulled By: anijain2305 fbshipit-source-id: 3b9ff53abf2c4e1c525d7e62a52285279d2d4109
Log is_forward field to dynamo_compile scuba table (pytorch#2511)
Summary: Pull Request resolved: pytorch#2511 X-link: pytorch/pytorch#138097 ^^ Reviewed By: ezyang Differential Revision: D64438144 fbshipit-source-id: 87a5518d4d9318132d269302c93a285bf86f3a46
Revamp PT2 Compile/chromium event logging [1/?]
Summary: X-link: pytorch/pytorch#138093 This diff is the starting steps of https://docs.google.com/document/u/2/d/1kAEBt4AyW7HTAhXHbjoz8FBFHNyyEA2Qo2mPn7v3WUQ/edit?usp=drive_web&ouid=113555078003219714709 It implements the following changes:
- Only log spans to scuba, so no start events are ever logged
- Log events as the full event name, without "START" or "END"
- Only log to scuba major phases from chromium events. These are:
  - entire_frame_compile (dynamo)
  - backend_compile (aotdispatch)
  - inductor_compile (inductor)
  - codegen (inductor codegen)

Tlparse chromium events stay basically the same, but I implemented a few changes to clean that up as well:
- When there's a phase name available, log the phase name instead of the function name as the event name. This simplifies the trace to not have two identical rows. The fn_name is available as metadata on the chromium event, if interested.
- Log new events for pre and post grad passes. These do *not* log to scuba.

By making the phases much simpler in Scuba, with only categories for major phases of PT2 Compilation, we pave the way to add **much** more metadata and information to each individual event type. Diffs for that will come later.

**IMPLEMENTATION NOTES:**
- The logic for `log_chromium_event_internal` (which is the function that logs to Scuba) lives in chromium_events for now, but in the future as we add more metadata, it may belong independently in dynamo_timed or even outside of dynamo_timed. I haven't explored in detail what the refactor will look like. Once we start logging metadata for dynamo, aotdispatch, inductor, I suspect we will call log_pt2_compile_event directly, instead of making the chromium event logger handle the pt2_compile_event logic. But that refactor is left for another PR on top of this one.
- There's an interesting space after pre grad passes within the AOT autograd logic, that is, between create_aot_dispatcher_function and pre grad passes. I'm not sure what we're spending time doing there, but I'll find out with a profile later.

ghstack-source-id: 248790387 Reviewed By: oulgen Differential Revision: D64479033 fbshipit-source-id: 1f30e734160bfed2f664063b5b2f4df1b661dfa4
Revert D64438144: Log is_forward field to dynamo_compile scuba table
Differential Revision: D64438144 Original commit changeset: 87a5518d4d93 Original Phabricator Diff: D64438144 fbshipit-source-id: 3acb559a632ce345a1c3c88edc9007c0a9e5d40c
adding aggregates to servicelab
Summary: The current aggregation does not seem to be working as expected. Adding another aggregation field before changing the previous one over. Reviewed By: xuzhao9 Differential Revision: D64616616 fbshipit-source-id: 676f09035e0d4427e9b60e9ed8f8c790782f0aec
specifying logged benchmark name for tritonBench servicelab logging
Summary: more specific logging in our logging table based on servicelab benchmark names Reviewed By: nmacchioni Differential Revision: D64627855 fbshipit-source-id: 47e250c5d8a34a912e7885e1f997a90a9dd8bc10
Commits on Oct 19, 2024
replace uses of np.ndarray with npt.NDArray
Summary: X-link: pytorch/opacus#681 X-link: pytorch/captum#1389 X-link: pytorch/botorch#2586 X-link: pytorch/audio#3846 This replaces uses of `numpy.ndarray` in type annotations with `numpy.typing.NDArray`. In Numpy-1.24.0+ `numpy.ndarray` is annotated as generic type. Without template parameters it triggers static analysis errors: ```counterexample Generic type `ndarray` expects 2 type parameters. ``` `numpy.typing.NDArray` is an alias that provides default template parameters. Reviewed By: ryanthomasjohnson Differential Revision: D64619891 fbshipit-source-id: dffc096b1ce90d11e73d475f0bbcb8867ed9ef01
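The change is mechanical; for example (the function below is illustrative, not from the diff):

```python
import numpy as np
import numpy.typing as npt

# Before (flagged under numpy>=1.24: "Generic type `ndarray` expects 2 type parameters"):
# def normalize(x: np.ndarray) -> np.ndarray: ...

# After: NDArray is an alias that supplies default template parameters.
def normalize(x: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    return (x - x.mean()) / x.std()
```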
Commits on Oct 21, 2024
Disable torch function compilation during guard execution and in compiled bytecode (#137669)
Summary: Fixes pytorch/pytorch#114369 X-link: pytorch/pytorch#137669 Approved by: https://github.com/anijain2305 Reviewed By: wdvr Differential Revision: D64675139 Pulled By: mlazos fbshipit-source-id: a5e4501eaa781fcbd9423c99c555949182bd9f24
fixing key error in aggregate data
Summary: For some reason OSS isn't happy with dict.get, so I'm moving to this slightly less Pythonic but more exact approach. Reviewed By: bertmaher, sfzhu93 Differential Revision: D64698791 fbshipit-source-id: 48cc4b6f7df61287efdc71c30176c2830dfde110
Commits on Oct 22, 2024
Replace __str__ with __repr__ in some places (#136316)
Summary:
## The problem
In a typical debugger, `repr()` is used to display variables and not `str()`. Several classes in Dynamo have a `__str__()` method that returns useful information and a `__repr__()` that does not. Having to call `str(x)` or `[str(i) for i in x]` in the debugger all the time is a chore. `str()` should be ["informal, nicely printable"](https://docs.python.org/3/library/stdtypes.html#str) and `repr()` should ["attempt to return a string that would yield an object with the same value when passed to eval()"](https://docs.python.org/3/library/functions.html#repr).
## The solution
In the Python object model, if there is no `__str__` method, `__repr__` is used instead (but not the other way around). So renaming `__str__` to `__repr__` in a few cases where no `__repr__` method exists now should not change observable behavior, and should make debugging easier. The specific classes changed were all in `torch._dynamo.variables`:
* `builtin.BuiltinVariable`
* `constant.ConstantVariable`
* `constant.EnumVariable`
* `functions.UserMethodVariable`
* `lazy.LazyVariableTracker`
* `lazy.LazySymNodeFormatString`
* `misc.GetAttrVariable`
* `misc.NullVariable`
* `user_defined.UserDefinedObjectVariable`

X-link: pytorch/pytorch#136316 Approved by: https://github.com/XuehaiPan, https://github.com/jansel Reviewed By: wdvr Differential Revision: D64714511 fbshipit-source-id: 322f2f0110e5b45afe6a27c52a0bcc91d91d1d6a
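For reference, the object-model fallback the summary relies on can be checked directly (toy class, not one of the Dynamo variables):

```python
class Variable:
    def __repr__(self) -> str:
        return "Variable(debug-friendly info)"

v = Variable()
print(repr(v))  # Variable(debug-friendly info)  <- what debuggers display
print(str(v))   # Variable(debug-friendly info)  <- str() falls back to __repr__
# Defining only __str__ would not help repr(): it would stay the default
# "<__main__.Variable object at 0x...>" form.
```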
Update requirements.txt (pytorch#2523)
Summary: attempt to fix dependencies - this is no longer compatible with the latest huggingface_hub, see failing test at https://github.com/pytorch/pytorch/actions/runs/11445304501/job/31843081598 Pull Request resolved: pytorch#2523 Reviewed By: huydhn Differential Revision: D64711662 Pulled By: wdvr fbshipit-source-id: eed9143e6e0531840a53ba5ab3fad04894727272
Fixes to prep for weights_only default flip (pytorch#2514)
Summary: Some fixes for pytorch/pytorch#137602 Pull Request resolved: pytorch#2514 Reviewed By: xuzhao9 Differential Revision: D64628614 Pulled By: mikaylagawarecki fbshipit-source-id: edebf25cc6648919d5673a3baeaffdac26e5b91f
typing compile_fx.py (#138033)
Summary: Type annotations for compile_fx. - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed. - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py. X-link: pytorch/pytorch#138033 Approved by: https://github.com/Skylion007 Reviewed By: wdvr Differential Revision: D64714765 Pulled By: aorenste fbshipit-source-id: 262f5cb9b2171e96ce9f895772bd5778ddb4ebe0
Add metadata to events in progress, new `dynamo` event
Summary: X-link: pytorch/pytorch#138477 This diff does a few things:

## Add metadata to events in progress
Adds the ability to add extra metadata to Chromium Events via `add_event_data`. Metadata can only be added to chromium events that have started, but not ended (so, in progress events).
- When you add the data, the metadata is appended to the metadata when you call log_event_end().
- The metadata appears in chromium events in tlparse. It also gets logged to scuba.

## New `dynamo` chromium event
We add a new `dynamo` chromium event to the top of the stack, where we collect various metadata found in dynamo_compile. So the new order of events goes:
```
__start__ -> dynamo (dynamo compile metrics) -> entire_frame_compile (compile.inner) -> backend_compile (i.e. aotdispatch) -> create_aot_dispatch_function -> inductor_compile -> ...
```
BackwardCompilationMetrics doesn't have any dynamo specific information (as it's mostly inductor timings). So we don't include that here.

*FAQ: Why can't we use `entire_frame_compile` as the event?* This is mostly due to backward compatibility with `dynamo_compile`. `dynamo_compile` collects CompilationMetrics outside of `compile.compile_inner`, and uses `dynamo_timed` to grab timings from phases of the compiler, including `entire_frame_compile`. So we don't have a CompilationMetric object until after an `entire_frame_compile` event ends! Separately, `dynamo` as a name for all of dynamo compile is more descriptive than `entire_frame_compile`, imo.

## Log metadata as separate columns (Meta only)
Separately, this also changes the `metadata` column in PT2 Compile Events. Instead of logging a single metadata column in JSON, it separates the JSON into separate columns. This is much better for data analysis. Now that this table is more mature, I think logging keys to separate columns is a better system.

ghstack-source-id: 249373269 Reviewed By: aorenste Differential Revision: D64696287 fbshipit-source-id: 441f57e2d1c0210e81c06eb86d4482e95bed4971
Commits on Oct 23, 2024
Log is_forward field to dynamo_compile scuba table (#138505)
Summary: X-link: pytorch/pytorch#138505 Approved by: https://github.com/oulgen Reviewed By: oulgen Differential Revision: D64711721 Pulled By: masnesral fbshipit-source-id: 488dd527d0b9179644ae5d6d45d88bdab0224032
Compiled autograd configs in TLS (#137821)
Summary: Multithreaded doesn't work yet, this adds python side TLS only for the python side state X-link: pytorch/pytorch#137821 Approved by: https://github.com/jansel, https://github.com/yf225 ghstack dependencies: #137953 Reviewed By: wdvr Differential Revision: D64796212 Pulled By: xmfan fbshipit-source-id: aa1d9ef8f6e61207dfb352866e37d5e7cc98df42
Summary: X-link: pytorch/pytorch#138061 Approved by: https://github.com/yf225 ghstack dependencies: #137953, #137821 Reviewed By: wdvr Differential Revision: D64796226 Pulled By: xmfan fbshipit-source-id: 9bf80c1492d7a800a308cb1e99fac63c4752fc52
adding fp32 strict and tf32x3 benchmarks for gemm
Summary: TSIA draft diff while I move this to its own op Reviewed By: danzimm Differential Revision: D64781204 fbshipit-source-id: c3ddd956230c1e4c8166867f03b5a28e8d6586e9
Commits on Oct 24, 2024
Support range_iterator as a function input (#138657)
Summary: Fixes pytorch/pytorch#138654 X-link: pytorch/pytorch#138657 Approved by: https://github.com/williamwen42, https://github.com/jansel Reviewed By: wdvr Differential Revision: D64881833 Pulled By: anijain2305 fbshipit-source-id: 46bcffa12ef2bec0ff47a1b60323aacbb3a90872
Support overridden __call__ on nn modules (#138619)
Summary: X-link: pytorch/pytorch#138619 Approved by: https://github.com/williamwen42 ghstack dependencies: #138657 Reviewed By: wdvr Differential Revision: D64881836 Pulled By: anijain2305 fbshipit-source-id: 1974dbc228618e8597eb6ab293272ee985964f52
updating hardware and device columns
Summary: currently device and hardware are flipped in logging table due to args mismatch Reviewed By: xuzhao9 Differential Revision: D64911847 fbshipit-source-id: 2d75b17046eae2eed0d83f86140ad88dae26de29
Release 2.5.1.yaml perf test (pytorch#2525)
Summary: Pull Request resolved: pytorch#2525 Reviewed By: kit1980 Differential Revision: D64912654 Pulled By: atalman fbshipit-source-id: 74cf57574c7ed5e1b6a4fee4b9c2de745deb21c0
Commits on Oct 25, 2024
Account for older numpy versions in pytorch#2514 (pytorch#2524)
Summary: Pull Request resolved: pytorch#2524 Reviewed By: kit1980 Differential Revision: D64771621 Pulled By: mikaylagawarecki fbshipit-source-id: 545f3d528cfbe2668c8d37e98e99423cd77a8e8e
Summary: getting gemm operator to work for amd Reviewed By: danzimm, xuzhao9 Differential Revision: D64976612 fbshipit-source-id: 20aaf30732211848996a3575ca7356f514ed912c
Add logger logging for remote fx graph cache get + put (pytorch#2512)
Summary: Pull Request resolved: pytorch#2512 X-link: pytorch/pytorch#138164 Capture the timing for the remote fx graph cache get and put operations and add them to the logger logging. Reviewed By: ezyang, oulgen Differential Revision: D64484025 fbshipit-source-id: 3ac8dad8f7083d7eefaa6f092d7703488a8bc41f
Commits on Oct 26, 2024
Reviewed By: xuzhao9 Differential Revision: D64683154 fbshipit-source-id: 70d359538572947c15184255fe5b2e69f61ab04a
Reviewed By: xuzhao9 Differential Revision: D64683332 fbshipit-source-id: f132eda07a1cde19116ce18f5b400d896df53612
Commits on Oct 27, 2024
Update Typeguard to TypeIs for better type inference (#133814)
Summary: Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ X-link: pytorch/pytorch#133814 Approved by: https://github.com/ezyang Reviewed By: wdvr Differential Revision: D65030974 fbshipit-source-id: 6e04f555c9ac4a60d7f53ab23ad3b60b82de5d48
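The practical difference is that `TypeIs` also narrows the negative branch; a minimal example (using `typing_extensions`, which backports it before Python 3.13):

```python
from typing_extensions import TypeIs

def is_str(x: str | int) -> TypeIs[str]:
    return isinstance(x, str)

def handle(x: str | int) -> None:
    if is_str(x):
        print(x.upper())   # narrowed to str
    else:
        print(x + 1)       # narrowed to int; with TypeGuard this branch stays str | int
```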
Use guard_manager consistently instead of check_fn (#138896)
Summary: X-link: pytorch/pytorch#138896 Approved by: https://github.com/williamwen42, https://github.com/jansel ghstack dependencies: #138512 Reviewed By: wdvr Differential Revision: D65030963 Pulled By: anijain2305 fbshipit-source-id: 7423473e4c3613aea42e13a64eae9c417c876964
Commits on Oct 28, 2024
Fix naming for AMD in fp8 rowwise fbgemm
Summary: Select CK or Cutlass based on the arch. Reviewed By: xuzhao9 Differential Revision: D65060122 fbshipit-source-id: 3406e4852efe30883474d4bbb2315ffe4c54e211
Back out "tls access helpers (#138061)" and Back out "[compiled autog…
…rad] Compiled autograd configs in TLS (#137821)" Summary: X-link: pytorch/pytorch#139086 Original commit changeset: 9bf80c1492d7 Original Phabricator Diff: D64796226 Original commit changeset: aa1d9ef8f6e6 Original Phabricator Diff: D64796212 Reviewed By: malfet, kflu Differential Revision: D65072644 fbshipit-source-id: 50ad138fc216653987a80ea6ae3efeaf5c04f949
Commits on Oct 29, 2024
-
Switch times to us in CompilationMetrics and improvements (#138975)
Summary: Companion logger diff: https://www.internalfb.com/diff/D65012523
* Using float seconds for timestamps is bad because our internal system defaults to float32 precision, and you don't even get second precision for timestamps in float32 (see the numeric sketch after this list).
* We use microseconds instead of milliseconds because with millisecond granularity you can end up with identical timestamps when compilation happens very quickly; it's much better to force non-overlapping spans.
* Because there are so many new fields and I don't feel like reimplementing each on BwdCompilationMetrics, BwdCompilationMetrics is no more; everything in CompilationMetrics is now optional instead.
* The actual frame compile time collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default) you get about microsecond precision on timestamps, so this shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html).
* I rename some entries for clarity. In particular, whenever a timing contains all of its lower phases (e.g., how Inductor also contains Triton compilation), we put "cumulative" in its name; if something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name.
X-link: pytorch/pytorch#138975 Approved by: https://github.com/masnesral Reviewed By: huydhn Differential Revision: D65088198 Pulled By: ezyang fbshipit-source-id: 0b901357ab649f052a3553fe8d0cc37fba80e197
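A quick numeric check of the float32 point above, purely as an illustration (not code from this change):

```python
import time
import numpy as np

now_s = time.time()                  # float64 seconds since the epoch, ~1.7e9
gap = np.spacing(np.float32(now_s))  # ~128.0: adjacent float32 values are ~2 minutes apart
now_us = int(now_s * 1e6)            # integer microseconds keep full precision
print(gap, now_us)
```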
Commit: 4ad2712
-
add some cpython debugging methods (#138030)
Summary: This PR enables you to inspect PyObjects in C using `INSPECT(...)` without requiring https://docs.python.org/3/howto/gdb_helpers.html. `torch._dynamo.eval_frame.raise_sigtrap` can also be used to set gdb breakpoints while running Python code, e.g.

```python
x = x + 1
torch._dynamo.eval_frame.raise_sigtrap()  # can breakpoint on ceval.c:CALL to breakpoint the `sin` call in C
x = torch.sin(x)
```

X-link: pytorch/pytorch#138030 Approved by: https://github.com/jansel Reviewed By: huydhn Differential Revision: D65104659 Pulled By: williamwen42 fbshipit-source-id: aa2f3f9c34a1ee15160ccc82bf61c740b3ac6d20
Commit: 438f82b
-
Set use_cuda_graphs in fp8_gemm_rowwise
Summary: The default value for use_cuda_graphs was changed to False in D64471087 and this caused slowdowns in triton/ck kernels for fp8_gemm_rowwise. Reviewed By: danzimm Differential Revision: D65140285 fbshipit-source-id: 4ab77537afeb9108dab7cdef6cac34aaa39d7d73
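For context on why the flag matters, here is a minimal capture-and-replay sketch using PyTorch's public CUDA graphs API (assumes a CUDA device; this is not the benchmark's actual harness). Graph replay removes most per-launch CPU overhead, which is exactly what small fp8 rowwise kernels are sensitive to.

```python
import torch

x = torch.randn(1024, device="cuda")
g = torch.cuda.CUDAGraph()

# Warm up on a side stream before capture, as the CUDA graphs API requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    y = x * 2.0
torch.cuda.current_stream().wait_stream(s)

with torch.cuda.graph(g):
    y = x * 2.0

g.replay()  # re-launches the captured kernel with minimal CPU overhead
```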
Commit: 870be9b
-
Remove hammer/generative_recommenders (pytorch#2526)
Summary: Pull Request resolved: pytorch#2526 X-link: pytorch-labs/tritonbench#19 As title Reviewed By: xuzhao9, LinjianMa Differential Revision: D65069124 fbshipit-source-id: 1ee736396fecc76d606e637fee7a8127603d9d7e
Commit: 4d6e0fa
Commits on Oct 31, 2024
-
Fix type for "--iter" flag (pytorch#2528)
Summary: Pull Request resolved: pytorch#2528 Reviewed By: xuzhao9 Differential Revision: D64935089 fbshipit-source-id: 8b0aa81513a3c6a58e8876475ec63041d362d42a
Commit: a0890b0
-
Add start event metadata to collected metadata for PT2 Compile Events
Summary: X-link: pytorch/pytorch#139289 We should be logging metadata from event starts to PT2 Compile Events too. ghstack-source-id: 250444771 Reviewed By: oulgen Differential Revision: D65070086 fbshipit-source-id: 63b934bff4254871e15a615e5aa47112b032b143
Commit: 0c8a0f6
Commits on Nov 1, 2024
-
Optimize PT2 Compile Events ingestion and column formats
Summary: X-link: pytorch/pytorch#139309 Per discussion from https://fb.workplace.com/groups/1286739428954016/posts/1360522894909002

This diff considerably changes the column format of PT2 Compile Events. We only log to scuba the set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba.

Logging all metadata into a single metadata column saves space and ingestion, because new rows for different events don't each add N empty column markers. It comes at the cost of having to create new derived columns in the Scuba UI to use the extra metadata we care about, but that's a tradeoff we're willing to make here, considering that other tables like dynamo_compile exist.

ghstack-source-id: 251214365 exported-using-ghexport Reviewed By: oulgen Differential Revision: D65225598 fbshipit-source-id: 01569a79174ed3699063dbd8bb26b883c6a7b0c4
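A toy sketch of the resulting row shape (names invented; the real pipeline writes to Scuba): event-specific fields ride in one JSON metadata column instead of one column per key, so a new key doesn't add an empty column to every unrelated row.

```python
import json
import time

def make_pt2_compile_event_row(event_name: str, metadata: dict) -> dict:
    # Hypothetical row builder: a small fixed set of columns plus a single
    # free-form "metadata" column holding everything event-specific.
    return {
        "event": event_name,
        "time_us": int(time.time() * 1e6),
        "metadata": json.dumps(metadata),
    }

row = make_pt2_compile_event_row("inductor_compile", {"cache_hit": True, "graph_id": 3})
```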
Commit: a66ce04
-
Summary: When benchmarking across multiple operators, we can optionally isolate each operator run in a child process. Reviewed By: FindHao Differential Revision: D65154665 fbshipit-source-id: 9c9a21a76897084b061374cb3f7d8524a4aaac9b
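A rough sketch of that isolation using only the standard library (the per-operator body is a stand-in for the real benchmark): each operator runs in its own spawned child process, so allocator state, CUDA context, and crashes cannot leak across operators.

```python
import multiprocessing as mp

def _run_operator(op_name, out_queue):
    # Stand-in for the real per-operator benchmark body.
    out_queue.put((op_name, f"ran {op_name}"))

def run_isolated(op_names):
    ctx = mp.get_context("spawn")  # spawn avoids inheriting a live CUDA context
    results = {}
    for name in op_names:
        q = ctx.Queue()
        p = ctx.Process(target=_run_operator, args=(name, q))
        p.start()
        p.join()
        if not q.empty():
            op, res = q.get()
            results[op] = res
    return results

if __name__ == "__main__":
    print(run_isolated(["gemm", "addmm"]))
```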
Commit: cc094df
Commits on Nov 4, 2024
-
Classify miss-inplaced tensors in logs.
Summary: X-link: pytorch/pytorch#139240 Use signpost logs; a follow-up is to remove the field possibly_missed_reinplacing_opportunities from the dynamo compile table. Reviewed By: zou3519 Differential Revision: D65180194 fbshipit-source-id: 20fe80f209a15573b2184e4cf7ed2be3c2a4ab94
Commit: 86a366e
Commits on Nov 5, 2024
-
Switch OSS dashboard to use aoti_compile_and_package (#139597)
Summary: Reland pytorch/pytorch#139154 X-link: pytorch/pytorch#139597 Approved by: https://github.com/angelayi Reviewed By: ZainRizvi Differential Revision: D65455707 Pulled By: desertfire fbshipit-source-id: 691882e606754fc04cb826a14bdfe94cb465ece8
Commit: 4a42e06
Commits on Nov 6, 2024
-
Specialize symfloats that flow through is_integer (#139572)
Summary: Fixes `python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_number_method_method_is_integer_num_type6_dynamic_shapes` when specialize_float = False X-link: pytorch/pytorch#139572 Approved by: https://github.com/ezyang ghstack dependencies: #139569, #139457, #139568 Reviewed By: ZainRizvi Differential Revision: D65492888 Pulled By: bobrenjc93 fbshipit-source-id: 9a9881caa5905686c44d8508ce5edab46ab03f28
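A repro-style sketch of the code shape involved, under the assumption that specialize_float=False traces Python floats symbolically (the actual test exercises a different function):

```python
import torch

@torch.compile(dynamic=True)
def scale(x, factor: float):
    # Calling a float method such as is_integer() on a traced float makes
    # dynamo specialize it to the concrete value seen at compile time.
    if factor.is_integer():
        return x * int(factor)
    return x * factor

scale(torch.randn(4), 2.0)
```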
Commit: 3d3b7bb
-
Commit: 5790e68
-
Commit: 645c043
-
Commit: a78431f
Commits on Nov 11, 2024
-
Commit: 79bc6af
-
Commit: efb4b07
-
Commit: 4b5c733
-
Commit: 2ccc92a