Add DDPWrapper #2479

Closed
wants to merge 99 commits

Commits on Sep 26, 2024

  1. Revert "Trace enter/exit of TorchFunctionModes (#135422)" (#136590)

    Summary:
    This reverts commit 7743149b2be4a9eba7e0997ccdc6abe552bec266.
    
    Reverts
    * pytorch/pytorch#135503
    * pytorch/pytorch#135502
    * pytorch/pytorch#135422
    
    With this revert, the test below passes. Earlier, the getitem would stay as a getitem node in the FX graph, but now fake tensor propagation fails, saying that .item() is called. It seems that the torch function is not getting triggered during fake tensor propagation.
    
    ```
    import torch
    from torch.nn.attention.flex_attention import BlockMask, _mask_mod_signature, _score_mod_signature, flex_attention
    from torch._inductor.lowering import make_pointwise, register_lowering
    from torch._inductor.virtualized import ops
    from torch.nn.attention.flex_attention import create_block_mask
    
    torch.set_default_device('cuda')
    
    flex_attention = torch.compile(flex_attention, dynamic=False)
    
    prefix_lengths = torch.arange(8)
    def prefix_lm(b, h, q, kv):
        return prefix_lengths[b] >= kv
    
    mask = create_block_mask(prefix_lm, 8, None, 512, 512, _compile=True)
    ```
    
    X-link: pytorch/pytorch#136590
    Approved by: https://github.com/Chillee
    
    Reviewed By: atalman
    
    Differential Revision: D63431470
    
    Pulled By: anijain2305
    
    fbshipit-source-id: 60915b30336121b845af71f423582c22a6c65c3f
    anijain2305 authored and facebook-github-bot committed Sep 26, 2024
    Commit: a31c3fe
  2. Add nsys integration

    Summary: Add new metric `--metric nsys` to collect nsys trace.
    
    Reviewed By: htyu
    
    Differential Revision: D63274918
    
    fbshipit-source-id: 0536310df6290ea5f5a02d85cc0ad6d342d45dbd
    xuzhao9 authored and facebook-github-bot committed Sep 26, 2024
    Commit: 2edf80c

Commits on Sep 28, 2024

  1. Fix bug pytorch#2458 (pytorch#2459)

    Summary:
    pytorch#2458
    
    Pull Request resolved: pytorch#2459
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63476542
    
    Pulled By: kit1980
    
    fbshipit-source-id: 01e9db9cb03d34e82a773897417df2ccda410634
    ostrowskimarcin authored and facebook-github-bot committed Sep 28, 2024
    Commit: 0f05015
  2. Restore FlexAttention and FlashV3 backward (pytorch#2473)

    Summary: Pull Request resolved: pytorch#2473
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63543625
    
    Pulled By: bertmaher
    
    fbshipit-source-id: 1693e15875544bda0f5f6c69daa5597fffd80509
    bertmaher authored and facebook-github-bot committed Sep 28, 2024
    Commit: 611bf70

Commits on Oct 1, 2024

  1. Fix hardcoded shape in low_mem_dropout benchmark (pytorch#2475)

    Summary: Pull Request resolved: pytorch#2475
    
    Reviewed By: htyu
    
    Differential Revision: D63653081
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: 8d840986779b6124cbccc2425c24e2b892d55ce4
    mark14wu authored and facebook-github-bot committed Oct 1, 2024
    Commit: 252a3b1
  2. Make FA3 work in fbcode

    Summary: We had the imports wrong for the internal port.
    
    Reviewed By: xuzhao9, adamomainz
    
    Differential Revision: D63643617
    
    fbshipit-source-id: 04a49d419fede71d2681dedbfb55112a67cb4d55
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: b6b67a4
  3. Skip loading triton.nvidia.cublas if not found

    Summary:
    We have an old triton internally that doesn't have the cublasLt
    bindings
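
    For illustration, a minimal sketch of the guard this describes; the exact import path is an assumption based on the summary, not the actual tritonbench code.

    ```
    # Hedged sketch: probe for Triton's cuBLAS bindings and fall back gracefully
    # when running an older Triton that does not ship them.
    try:
        from triton._C.libtriton import nvidia  # import path assumed for illustration
        cublas = nvidia.cublas
    except (ImportError, AttributeError):
        cublas = None  # old Triton without the cublasLt bindings
    ```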
    
    Reviewed By: adamomainz
    
    Differential Revision: D63643619
    
    fbshipit-source-id: 39aece74b52f7747fe2100d7bb905bad49ba1fa0
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: 0611c41
  4. Print TMA benchmark info to stderr

    Summary:
    X-link: facebookresearch/FBGEMM#301
    
    X-link: pytorch/FBGEMM#3202
    
    Printing warnings to stdout mucks up the output of various tools/benchmarks
    
    Reviewed By: xuzhao9, htyu
    
    Differential Revision: D63643615
    
    fbshipit-source-id: 1f34508a7fd36f5aa421e11bddd5ce77fc13038a
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: 0cb1e96
  5. Modernize cutlass call for fp8 blockwise

    Summary: FBGEMM has changed how it declares its Cutlass-based blockwise gemm.
    
    Reviewed By: htyu, sijiac, adamomainz
    
    Differential Revision: D63643618
    
    fbshipit-source-id: e46e3bbd2e07be0653f7c7fa6bd080b6c8db171e
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: 2d9ab0b
  6. CSV of extra shapes for gemm benchmarks

    Summary:
    We have a big list of interesting shapes for blockwise/rowwise scaled
    gemm.  A lot of these are variants of llama.  We might want to use them for
    gemm and fp8_gemm (unscaled) as well, but for now just do blockwise/rowwise
    
    Reviewed By: xuzhao9, adamomainz
    
    Differential Revision: D63643616
    
    fbshipit-source-id: 328961fe8c91e66428fcd1e5b72c89813f58a5a3
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: d512e67
  7. Add layout options to gemm

    Summary:
    We were only benchmarking `row-major x row-major` gemms (also called
    `TT` or `transpose-transpose`, because of FORTRAN's column-major convention), which is actually not the
    common case; `nn.Linear` uses column-major layouts for weights, which means
    `TN` is actually much more common.
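
    A minimal illustration (mine, not from the commit) of why `nn.Linear` leads to the `TN` case: the weight is stored as `(out_features, in_features)`, so the forward pass multiplies a row-major activation by a transposed row-major weight.

    ```
    import torch

    x = torch.randn(128, 512)            # activations, row-major
    linear = torch.nn.Linear(512, 1024)  # weight stored as (out_features, in_features)

    y_module = linear(x)
    y_manual = x @ linear.weight.T + linear.bias  # activation times transposed weight, the "TN" case
    torch.testing.assert_close(y_module, y_manual)
    ```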
    
    Reviewed By: adamomainz
    
    Differential Revision: D63714661
    
    fbshipit-source-id: 735c25c59ddeb6596afd9b19f463af92036a830b
    bertmaher authored and facebook-github-bot committed Oct 1, 2024
    Commit: 4445aa2

Commits on Oct 2, 2024

  1. Enable fp8 rowwise on AMDGPU (pytorch#2483)

    Summary: Pull Request resolved: pytorch#2483
    
    Reviewed By: karthik-man
    
    Differential Revision: D63726031
    
    fbshipit-source-id: dc410e503f918d83362fb38005ac4a6db5dc1e68
    htyu authored and facebook-github-bot committed Oct 2, 2024
    Commit: f2932b7
  2. Ignore Torchbench CI on Tritonbench paths (pytorch#2481)

    Summary:
    Right now, Tritonbench still shares a codebase with Torchbench.
    Skip the Torchbench tests when the PR only touches Tritonbench paths.
    
    Pull Request resolved: pytorch#2481
    
    Reviewed By: kit1980
    
    Differential Revision: D63695702
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: cc88e0a987ecca1daf09d35ddeca18f07bef9077
    xuzhao9 authored and facebook-github-bot committed Oct 2, 2024
    Commit: a8ce4b5

Commits on Oct 3, 2024

  1. Add _dynamo.config inline_inbuilt_nn_modules and specialize_float logging (#137139)
    
    Summary:
    X-link: pytorch/pytorch#137139
    Approved by: https://github.com/ezyang
    
    Reviewed By: PaliC
    
    Differential Revision: D63783497
    
    Pulled By: jovianjaison
    
    fbshipit-source-id: 5abe70d558917a9807e33be8181d42ef240c5a95
    jovianjaison authored and facebook-github-bot committed Oct 3, 2024
    Commit: 737084e
  2. Add non-persistent fp8 triton_rowwise kernel (pytorch#2484)

    Summary:
    Pull Request resolved: pytorch#2484
    
    X-link: pytorch/FBGEMM#3212
    
    X-link: facebookresearch/FBGEMM#308
    
    The triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel when both are run with exhaustive AMD-specific tuning.
    
    Reviewed By: htyu
    
    Differential Revision: D63741099
    
    fbshipit-source-id: c276415ddf8f5d24ffeba70b8ee6493011b393e1
    karthik-man authored and facebook-github-bot committed Oct 3, 2024
    Commit: 6b4f339

Commits on Oct 4, 2024

  1. Bump transformer version (pytorch#2488)

    Summary:
    Bump the transformers version to enable Liger kernels.
    
    Pull Request resolved: pytorch#2488
    
    Reviewed By: FindHao
    
    Differential Revision: D63860019
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: f607c5553169c61270e4f5271d8375d7f227bd82
    xuzhao9 authored and facebook-github-bot committed Oct 4, 2024
    Commit: 12820bc
  2. Add multiple ops support for --op argument (pytorch#2490)

    Summary:
    Allow users to benchmark multiple ops in a single run. The ops are separated by commas, e.g. `--op fp8_gemm,addmm`.
    
    Example output:
    ```
    % python run_benchmark.py triton --op fp8_gemm,addmm --num-inputs 1
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.12s/it]
                 x_val    torch_fp8_gemm-gbps    torch_fp8_gemm-gbps    torch_fp8_gemm-latency    torch_fp8_gemm-tflops    triton_fp8_gemm-gbps    triton_fp8_gemm-gbps    triton_fp8_gemm-latency    triton_fp8_gemm-tflops
    ------------------  ---------------------  ---------------------  ------------------------  -----------------------  ----------------------  ----------------------  -------------------------  ------------------------
    (1024, 1024, 1024)                462.202                462.202                0.00907462                  236.647                  630.43                  630.43                 0.00665309                    322.78
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.90s/it]
             (M, N, K)    aten_addmm-best_config    aten_addmm-gbps    aten_addmm-tflops                                                                                       triton_addmm-best_config    triton_addmm-gbps    triton_addmm-tflops    pt2_triton_matmul-best_config    pt2_triton_matmul-gbps    pt2_triton_matmul-tflops
    ------------------  ------------------------  -----------------  -------------------  -------------------------------------------------------------------------------------------------------------  -------------------  ---------------------  -------------------------------  ------------------------  --------------------------
    (20120, 512, 1536)                                      818.112              247.544  {'BLOCK_M': 128, 'BLOCK_N': 256, 'BLOCK_K': 64, 'GROUP_M': 8, 'num_warps': 8, 'num_ctas': 1, 'num_stages': 3}              911.569                275.823                                                    889.125                     269.031
    ```
    
    Pull Request resolved: pytorch#2490
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63862548
    
    Pulled By: FindHao
    
    fbshipit-source-id: 9d4afa6051d4191bc2e3288f59e2820627647b91
    FindHao authored and facebook-github-bot committed Oct 4, 2024
    Commit: a1f4b2e
  3. Add FusedLinearCrossEntropy (pytorch#2485)

    Summary:
    As discussed in pytorch/pytorch#136168, I'm going to migrate implementations of operator benchmarking. This PR adds different implementations for FusedLinearCrossEntropy as a starting example.
    
    Execution command:
    ```
    python run_benchmark.py triton --op FusedLinearCrossEntropy
    ```
    Example output:
    ```
    x_val    LMHeadCE-latency    LigerLMHeadCE-latency    inductor_fused_linear_cross_entropy-latency
    -------  ------------------  -----------------------  ---------------------------------------------
          0             98.0041                  389.87                                         95.0412
          1            196.12                    652.619                                       193.219
          2            417.242                  1248.75                                        416.725
          3            824.906                  2356.25                                        809.56
    ```
    
    Pull Request resolved: pytorch#2485
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63859871
    
    Pulled By: FindHao
    
    fbshipit-source-id: 4b73a2144702c1f8f3ae5ed15e76112d03f12b87
    FindHao authored and facebook-github-bot committed Oct 4, 2024
    Commit: dde8528

Commits on Oct 5, 2024

  1. Add user release benchmark so that we can run it on pull request (pytorch#2489)
    
    Summary: Pull Request resolved: pytorch#2489
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63898689
    
    Pulled By: atalman
    
    fbshipit-source-id: 3cd430911aadd5972f1393e3548ef7d52b93b661
    atalman authored and facebook-github-bot committed Oct 5, 2024
    Commit: bde2401

Commits on Oct 7, 2024

  1. Install time (pytorch#2493)

    Summary:
    Remove nvidia-cuda-nvcc-cu12 as not required. Install time.
    
    Pull Request resolved: pytorch#2493
    
    Reviewed By: xuzhao9
    
    Differential Revision: D63987509
    
    Pulled By: atalman
    
    fbshipit-source-id: 07298ddb569da7f7c3fe22d73da72a4ceab256f5
    atalman authored and facebook-github-bot committed Oct 7, 2024
    Commit: eae9e50

Commits on Oct 8, 2024

  1. Add Tritonbench CI (pytorch#2494)

    Summary:
    Add a PR CI on Tritonbench that installs the latest Triton nightly package
    
    Pull Request resolved: pytorch#2494
    
    Reviewed By: chenyang78
    
    Differential Revision: D63998525
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: a26633de040bdf324e9ae5c9b130ec1a58dfd409
    xuzhao9 authored and facebook-github-bot committed Oct 8, 2024
    Commit: 1ac701f
  2. Log compile ids to pt2_remote_cache and pt2_compile_events

    Summary:
    X-link: pytorch/pytorch#137431
    
    Log the current compilation id for all relevant samples for these two tables, so we can have a 1:1 analog with dynamo_compile.
    ghstack-source-id: 246618821
    exported-using-ghexport
    
    Reviewed By: oulgen
    
    Differential Revision: D63900826
    
    fbshipit-source-id: 3f2896287777c94344960e7cad131f71aaf0210f
    jamesjwu authored and facebook-github-bot committed Oct 8, 2024
    Commit: 85c33e5

Commits on Oct 9, 2024

  1. Trace enter/exit of TorchFunctionModes (#135422) (#137114)

    Summary:
    This PR implements tracing of `with` contexts for TorchFunction modes that have the default enter/exit behavior (i.e., pushing/popping the mode).
    
    Typically the bytecode for a context manager looks like this during a graph break:
    1. graph call
    2. enter context
    3. unsupported code
    4. exit context
    5. resume call
    
    resume fn structure:
    1. enter context
    2. jump
    ...
    3. exit context
    
    The issue with torch function modes is that side effects will replay any mutations to the torch function stack performed during tracing. So, we do not need to enter and exit around the unsupported code in the original function (doing so would result in a duplicate torch function mode entry during execution of the unsupported code), and we don't need to enter again in the resume function (the mode that was pushed from the side effects bytecode would still be on the stack).
    
    So for torch function modes the structure of our output code is this:
    
    1. graph call
    2. mutate tf mode stack to replay mutations
    3. unsupported code
    4. on exception restore stack
    5. resume function
    
    Then our resume fn looks like this:
    
    1. no-op enter torch function mode
    2. jump
    3.  exit tf mode
    
    To implement the no-op enter of the torch function mode, I added a torch function mode in polyfill which no-op enters but exits normally. This is needed because we still want to trace the `with` context in the resume function and exit properly (the exit instructions will still be in the function, so we need to generate instructions to set up the context).
    
    Separately from the bytecode, dynamo also tracks contexts on the block stack, which is how the SETUP_* instructions are implemented. Naturally, at a graph break we exit these block stacks to properly reset the contexts entirely, so that we can re-enter around the unsupported code soundly. However, once again, in the torch function mode case, in the event of a graph break we do not want to perform any exit side effects, because we want to preserve the state of the mode stack as is, so that we will properly update the stack with the bytecode mentioned in the first section. If we exited here, dynamo would pop the mode off of the symbolic stack and not update the true Python torch function mode stack with the suffix bytecode.
    
    All in all, for torch function modes we enter exactly once, update the global torch function mode stack with side effects bytecode, re-read this stack when compiling the resume function, and exit exactly once in the resume function. This matches the semantics of eager exactly.
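
    For context, a small repro-style sketch (mine, not from the PR) of the situation being traced: a `TorchFunctionMode` used as a `with` context across a graph break, which exercises the resume-function handling described above.

    ```
    import torch
    from torch.overrides import TorchFunctionMode

    class PassthroughMode(TorchFunctionMode):
        # Default enter/exit behavior: entering pushes the mode, exiting pops it.
        def __torch_function__(self, func, types, args=(), kwargs=None):
            return func(*args, **(kwargs or {}))

    def fn(x):
        with PassthroughMode():
            y = x + 1
            torch._dynamo.graph_break()  # forces the resume-function path
            return y * 2

    compiled = torch.compile(fn, backend="eager")
    print(compiled(torch.ones(3)))
    ```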
    Approved by: https://github.com/williamwen42
    ghstack dependencies: #134732, #133137, #135443, #135444
    
    X-link: pytorch/pytorch#137114
    Approved by: https://github.com/yanboliang
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64088005
    
    Pulled By: mlazos
    
    fbshipit-source-id: 156b9bf38a535933f8dd966ee96ed3099d7b4be2
    mlazos authored and facebook-github-bot committed Oct 9, 2024
    Commit: 79043be
  2. Remove ignored modes workaround (#135502) (#137115)

    Summary:
    Approved by: https://github.com/anijain2305
    ghstack dependencies: #134732, #133137, #135443, #135444, #135422
    
    X-link: pytorch/pytorch#137115
    Approved by: https://github.com/yanboliang
    ghstack dependencies: #137114
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64088016
    
    Pulled By: mlazos
    
    fbshipit-source-id: 53efb5a6e689d4fb6112a6462851ee7e81b28c24
    mlazos authored and facebook-github-bot committed Oct 9, 2024
    Commit: 39d65a4
  3. Handle torch function subclass/mode dispatch on generic tensor methods (#137119)
    
    Summary:
    X-link: pytorch/pytorch#137119
    Approved by: https://github.com/williamwen42, https://github.com/anijain2305
    ghstack dependencies: #137114, #137115, #137116, #137117, #137120, #137227
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64088048
    
    Pulled By: mlazos
    
    fbshipit-source-id: 34fe09f7fa6292d89a438b780852f00e042ec950
    mlazos authored and facebook-github-bot committed Oct 9, 2024
    Commit: 4fd7c74
  4. adding new configs for servicelab

    Summary:
    Adding new configs for servicelab, plus logging to Scuba.
    
    A follow-up diff is coming to add aggregates to the logging (i.e. harmonic mean).
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64126688
    
    fbshipit-source-id: 0c3705e82071f1399cfc53ff496d130adf237b73
    adamomainz authored and facebook-github-bot committed Oct 9, 2024
    Commit: 533d258
  5. Improve release benchmark suites with a lower value of epoch (pytorch#2482)
    
    Summary:
    pytorch#2468
    
    Pull Request resolved: pytorch#2482
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64139543
    
    Pulled By: atalman
    
    fbshipit-source-id: 2d030c66d856387b6a2451b26c89fd40e79e0e53
    juliagmt-google authored and facebook-github-bot committed Oct 9, 2024
    Commit: 7742ef2

Commits on Oct 10, 2024

  1. Check dyno and dcgm existence before disable them (pytorch#2496)

    Summary:
    For systems without dyno or dcgm installed and running without sudo, the `ncu_rep` metric will get stuck asking for a sudo password.
    
    This PR checks for the command's or service's existence before disabling them, to avoid getting stuck.
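
    A hedged sketch of the existence check (the command shown is illustrative; the actual implementation may differ):

    ```
    import shutil
    import subprocess

    def run_if_available(cmd):
        """Run a disable/pause command only when its binary is actually installed."""
        if shutil.which(cmd[0]) is None:
            return False  # tool missing: skip instead of triggering a sudo prompt
        subprocess.run(cmd, check=False)
        return True

    # Pause DCGM profiling so ncu can attach; skipped cleanly if dcgmi is absent.
    run_if_available(["dcgmi", "profile", "--pause"])
    ```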
    
    Pull Request resolved: pytorch#2496
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64141793
    
    Pulled By: FindHao
    
    fbshipit-source-id: 8d52468f04e7e5a0e8d23f3562a14c83d4a5934c
    FindHao authored and facebook-github-bot committed Oct 10, 2024
    Commit: dcd3d31
  2. combining CI and servicelab configs

    Summary: Instead of having one list of configs for OSS CI and another set for servicelab, we combine both here into one common dictionary.
    
    Reviewed By: danzimm
    
    Differential Revision: D64183688
    
    fbshipit-source-id: fa47780a3bf3ba8669c6e8fd406cff5542fd06e6
    adamomainz authored and facebook-github-bot committed Oct 10, 2024
    Commit: 3a7a4fe
  3. fixing typo in fp8_gemm

    Summary: I put an extra `m` in one of the configs by mistake, and it is breaking in OSS.
    
    Reviewed By: plotfi
    
    Differential Revision: D64208508
    
    fbshipit-source-id: f1461da0a5e883ffd4266206f5e3b737f468c3b2
    adamomainz authored and facebook-github-bot committed Oct 10, 2024
    Commit: b56e2ee

Commits on Oct 11, 2024

  1. use 3.13 multiline traceback in get_instruction_source_311 (#137617)

    Summary:
    X-link: pytorch/pytorch#137617
    Approved by: https://github.com/jansel
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64202324
    
    Pulled By: williamwen42
    
    fbshipit-source-id: 526f32cabeb891c8c9481799f45436cfd19e7dc2
    williamwen42 authored and facebook-github-bot committed Oct 11, 2024
    Commit: f3921ca
  2. differentiating between some Fbsource only targets and OSS for CI

    Summary: TSIA
    
    Reviewed By: danzimm, aakhundov
    
    Differential Revision: D64268555
    
    fbshipit-source-id: e380f9401b08c2b7d9a48bedc6d791b9b39cd533
    adamomainz authored and facebook-github-bot committed Oct 11, 2024
    Commit: 3900904

Commits on Oct 12, 2024

  1. Format .ci/ / .github/ / benchmarks/ / functorch/ / tools/ / `torchgen/` with `ruff format` (#132577)
    
    Summary:
    X-link: pytorch/pytorch#132577
    Approved by: https://github.com/malfet
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64256966
    
    fbshipit-source-id: e9725ccc5a814ef3b30e244e988ed9b7238b6ccb
    XuehaiPan authored and facebook-github-bot committed Oct 12, 2024
    Commit: f9f52f6
  2. Add AtenOp Benchmarking (pytorch#2495)

    Summary:
    As described in pytorch/pytorch#136168, I'm trying to migrate the native PyTorch implementation comparison ([the original operatorbench](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/microbenchmarks/operatorbench.py)) to TritonBench.
    
    This PR adds an Operator Loader which can load aten ops used in TorchBench, HuggingFace, and TIMM models. The benchmark classes are created dynamically, and the ops are then benchmarked across the aten and inductor implementations.
    
    Files `torchbenchmark/operator_loader/operator_inp_utils.py`, `torchbenchmark/operator_loader/operatorbench.py`, and all configs files in `torchbenchmark/operator_loader/operator_inp_logs/` are copied from original operatorbench.
    
    Example commands:
    ```bash
    python run_benchmark.py triton --op aten._softmax.default --num-inputs 1 --operator-loader --precision fp16
    ```
    Example output:
    ```
    Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_nearest2d_backward' has no overload name 'vec'
    Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'im2col_backward'
    Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'col2im_backward'
    Evaluating an op name into an OpOverload: '_OpNamespace' 'aten' object has no attribute 'im2col_backward'
    Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_bilinear2d_backward' has no overload name 'vec'
    Evaluating an op name into an OpOverload: The underlying op of 'aten.upsample_nearest2d_backward' has no overload name 'vec'
    100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.20s/it]
      x_val    eager-latency    inductor-latency
    -------  ---------------  ------------------
          0         0.090592            0.089632
          1         0.055808            0.038112
    ```
    
    Pull Request resolved: pytorch#2495
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64200358
    
    Pulled By: FindHao
    
    fbshipit-source-id: f0121168b33247224bc905a1a88af69e4b13def6
    FindHao authored and facebook-github-bot committed Oct 12, 2024
    Commit: 34d4f94
  3. change GPT2ForSequenceClassification inference accuracy tolerance (#136749)
    
    Summary:
    Fixes pytorch/pytorch#123503.
    
    pytorch/pytorch#121866 makes GPT2ForSequenceClassification hit SDPA pattern 18 and then encounter the accuracy issue. The issue only happens with BF16 inference on a single thread. This PR increases the model tolerance from 4e-3 to 5e-3 to make the check pass. Note that the issue is due to small implementation differences. For example, the SDPA math backend scales q and k before matmul for stability; the flash attention backend has more differences, as it is a different algorithm.
    
    X-link: pytorch/pytorch#136749
    Approved by: https://github.com/jgong5, https://github.com/jansel
    
    Reviewed By: jovianjaison
    
    Differential Revision: D64290722
    
    fbshipit-source-id: a3e7248f57a97cd767257354d410b3508b5e0325
    Valentine233 authored and facebook-github-bot committed Oct 12, 2024
    Commit: 680d64e

Commits on Oct 14, 2024

  1. making CI more flexible for extra data in tritonbench

    Summary: TSIA
    
    Reviewed By: danzimm
    
    Differential Revision: D64334048
    
    fbshipit-source-id: d01b20161407400d0afd28460bce8095c91d9056
    adamomainz authored and facebook-github-bot committed Oct 14, 2024
    Commit: 28d301a
  2. Add entire _dynamo.config as a json for logging (#137216)

    Summary:
    X-link: pytorch/pytorch#137216
    Approved by: https://github.com/ezyang
    
    Reviewed By: clee2000
    
    Differential Revision: D64290696
    
    Pulled By: jovianjaison
    
    fbshipit-source-id: 06886bfb7e3f37895e3a8bf567366e4c4cc1d248
    
    Co-authored-by: Aaron Gokaslan <[email protected]>
    2 people authored and facebook-github-bot committed Oct 14, 2024
    Commit: 7cb1c0a
  3. skipping null values in scribe message

    Summary: Since we have added flexibility for different sets of metrics per operator, we want to skip messages for empty metrics.
    
    Reviewed By: nmacchioni
    
    Differential Revision: D64345289
    
    fbshipit-source-id: d5b1fff90c6acd530867d0b6ef3ea97bc6f41cf5
    adamomainz authored and facebook-github-bot committed Oct 14, 2024
    Commit: 509e94f

Commits on Oct 16, 2024

  1. Add fbscribelogger to Dynamo benchmark runner (#137867)

    Summary:
    Signed-off-by: Edward Z. Yang <[email protected]>
    
    X-link: pytorch/pytorch#137867
    Approved by: https://github.com/bobrenjc93
    
    Reviewed By: clee2000
    
    Differential Revision: D64418349
    
    Pulled By: ezyang
    
    fbshipit-source-id: 265e07753a3549e6866d45fbdb8a435b6e7dc787
    ezyang authored and facebook-github-bot committed Oct 16, 2024
    Commit: 12e1d26
  2. Update the flash-attention submodule (pytorch#2500)

    Summary:
    We need https://github.com/Dao-AILab/flash-attention/pull/1053/files to externally import `flash_attn_interface` for FA3.
    
    Pull Request resolved: pytorch#2500
    
    Reviewed By: bertmaher
    
    Differential Revision: D64190441
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: ff20f0a28514b645c828853e7f15808ed1597ae6
    xuzhao9 authored and facebook-github-bot committed Oct 16, 2024
    Commit: ea4433f
  3. Add host-side Triton TMA support to Dynamo (#137677)

    Summary:
    This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes:
    
    - Here we assume the availability of the host-side TMA API added to upstream Triton in triton-lang/triton#4498. As of time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024.
    - To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created.
    - The newly introduced variables have `reconstruct` methods used in case of graph breaks.
    - The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for what the captured HOP arguments look like.
    - In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors.
    - In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors. So that downstream AOTAutograd can perform functionalizations as required.
    - JIT Inductor and AOT Inductor support will be implemented in follow-up PRs.
    
    X-link: pytorch/pytorch#137677
    Approved by: https://github.com/zou3519
    
    Reviewed By: clee2000
    
    Differential Revision: D64404928
    
    Pulled By: aakhundov
    
    fbshipit-source-id: c812cea3867c55800d5fe213bf07bf21292345e3
    aakhundov authored and facebook-github-bot committed Oct 16, 2024
    Commit: db41e77
  4. Add ncu report analyzer (pytorch#2497)

    Summary:
    This PR adds an ncu report analyzer to analyze the profiled ncu report. It also adds two metrics, `memory_traffic` and `arithmetic_intensity`. To avoid excessive profiling overhead, we only profile the necessary ncu metrics.
    
    This PR is a part of [operator benchmarking plan](pytorch/pytorch#136168)
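
    For reference, illustrative bookkeeping behind the two metrics (numbers and code are mine, not tritonbench's):

    ```
    # Arithmetic intensity = FLOPs performed / bytes moved to and from device memory.
    M, N = 2048, 2048                      # a gather_gemv-like problem size (illustrative)
    flops = 2 * M * N                      # one multiply-add per matrix element
    bytes_moved = (M * N + N) * 2 + M * 2  # fp16 matrix + input vector read, output vector written
    arithmetic_intensity = flops / bytes_moved
    print(f"memory_traffic={bytes_moved} B, arithmetic_intensity={arithmetic_intensity:.3f} FLOP/B")
    ```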
    
    Example commands:
    ```
    python run_benchmark.py triton --op gather_gemv --num-inputs 1  --metrics memory_traffic,arithmetic_intensity --csv
    ```
    Example output:
    ```
      0%|                                                                                                                                                                                                                                                              | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 508958 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
    ==PROF== Profiling "index_elementwise_kernel" - 0: 0%....50%....100% - 3 passes
    ==PROF== Profiling "unrolled_elementwise_kernel" - 1: 0%....50%....100% - 3 passes
    ==PROF== Profiling "gemv2T_kernel_val" - 2: 0%....50%....100% - 3 passes
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.89s/it]
    x_val;test_eager-_ncu_trace_in_task
    2048;success
    ==PROF== Disconnected from process 508958
    ==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
    ==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_eager_0/ncu_output.ncu-rep
      0%|                                                                                                                                                                                                                                                              | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509121 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
    ==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.79s/it]
    x_val;test_0-_ncu_trace_in_task
    2048;success
    ==PROF== Disconnected from process 509121
    ==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_0_0/ncu_output.ncu-rep
      0%|                                                                                                                                                                                                                                                              | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509285 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
    ==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
    ==PROF== Connected to process 509433 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00,  4.07s/it]
    x_val;test_inductor-_ncu_trace_in_task
    2048;success
    ==PROF== Disconnected from process 509285
    ==PROF== Disconnected from process 509433
    ==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_inductor_0/ncu_output.ncu-rep
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.99s/it]
    x_val;test_eager-arithmetic_intensity;test_eager-memory_traffic;test_eager-weighted_fp32_arithmetic_intensity;test_0-arithmetic_intensity;test_0-memory_traffic;test_0-weighted_fp32_arithmetic_intensity;test_inductor-arithmetic_intensity;test_inductor-memory_traffic;test_inductor-weighted_fp32_arithmetic_intensity
    2048;(0.14937214493924472, 0.0);(29467392.0, 505856.0);0.14937214493924472;(4.364079147640791, 0.0);(4204544.0, 256.0);4.364079147640791;(9.97989888530182, 0.0);(4202752.0, 0.0);9.97989888530182
    ```
    
    According to ncu, there can be multiple roofline charts at different granularities, such as single precision, double precision, tensor core, and half precision.
    
    Pull Request resolved: pytorch#2497
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64359055
    
    Pulled By: FindHao
    
    fbshipit-source-id: a02a4ebfcac5c5209f4196aac5a8eb4f91b3de87
    FindHao authored and facebook-github-bot committed Oct 16, 2024
    Commit: 21cc30d
  5. Change default gpu metric backend (pytorch#2501)

    Summary:
    The current GPU memory metric backends include dcgm and nvml. They report from hardware and should be accurate. This PR adds a native torch way to collect GPU memory usage, using `torch.cuda.max_memory_allocated()`. The benefit is that it has lower overhead and is accurate on a shared GPU server when there are multiple GPU processes from other users, because we do not implement a process filter for the other two backends.
    
    Use `--metrics-gpu-backend torch` to set the backend.
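
    For illustration (not the tritonbench implementation), the torch backend boils down to the per-process peak-allocation counters:

    ```
    import torch

    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(4096, 4096, device="cuda")
    y = x @ x
    torch.cuda.synchronize()
    peak_mib = torch.cuda.max_memory_allocated() / (1024 * 1024)
    print(f"peak allocated: {peak_mib:.1f} MiB")  # counts only this process's allocations
    ```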
    
    Pull Request resolved: pytorch#2501
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64253410
    
    Pulled By: FindHao
    
    fbshipit-source-id: 09b0579846a6830e0e9735e8daeba4abd88bab17
    FindHao authored and facebook-github-bot committed Oct 16, 2024
    Commit: c396191
  6. Update 2.5.0.yaml (pytorch#2498)

    Summary: Pull Request resolved: pytorch#2498
    
    Reviewed By: kit1980
    
    Differential Revision: D64407151
    
    Pulled By: atalman
    
    fbshipit-source-id: 0637d812144f13dad41b640e70fd65619a183c67
    juliagmt-google authored and facebook-github-bot committed Oct 16, 2024
    Commit: 9e670cd

Commits on Oct 17, 2024

  1. Add --op-collection option (pytorch#2503)

    Summary:
    This PR adds `--op-collection` to tritonbench. It can run multiple ops from defined operator collections. The default collection includes all ops not included in other collections.
    
    Operator collections are defined in `torchbenchmark/operators_collection/`. For each collection, you should define a `get_operators` function to return operators included in this collection.
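
    A minimal sketch of such a collection module (the collection name and its contents are hypothetical):

    ```
    # torchbenchmark/operators_collection/liger/__init__.py (hypothetical example)
    def get_operators():
        """Return the operator names included in this collection."""
        return ["FusedLinearCrossEntropy", "fp8_gemm"]
    ```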
    
    Pull Request resolved: pytorch#2503
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64359380
    
    Pulled By: FindHao
    
    fbshipit-source-id: c66dd254a3c8b70c112d9b7774482813e0236789
    FindHao authored and facebook-github-bot committed Oct 17, 2024
    Commit: 58f3b1f
  2. Fix imports

    Summary: Update imports for latest updates + silu_mul interface change
    
    Reviewed By: jianyuh
    
    Differential Revision: D64516452
    
    fbshipit-source-id: b9b98a6eda45a093661e8b23f6b8ec300b559960
    jasonjk-park authored and facebook-github-bot committed Oct 17, 2024
    Commit: 2feadb6
  3. Add doc for adding custom ops (pytorch#2509)

    Summary:
    Add documentation for adding custom ops.
    
    Pull Request resolved: pytorch#2509
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64497281
    
    Pulled By: FindHao
    
    fbshipit-source-id: 20f4096ebbce53c7d9a713cacbde016c521aa7c3
    FindHao authored and facebook-github-bot committed Oct 17, 2024
    Commit: d933ced
  4. Fix the broken gemm test

    Summary: As the title goes.
    
    Reviewed By: bertmaher
    
    Differential Revision: D64480822
    
    fbshipit-source-id: ec1d17be0619fb35d4d8f774eab2858e75afe2e3
    xuzhao9 authored and facebook-github-bot committed Oct 17, 2024
    Commit: 384a43d
  5. Test backward pass in unit test.

    Summary:
    In the unit test, run both the forward and the backward pass.
    
    If the backward pass throws `NotImplementedError`, skip the test, since the operator does not support the backward pass.
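
    The skip-on-`NotImplementedError` pattern, as a hedged unittest-style sketch (the stand-in operator is mine, not the actual test code):

    ```
    import unittest

    class _Op:
        """Stand-in for a benchmarked operator; the real harness builds these dynamically."""
        def run_forward(self):
            return 1.0
        def run_backward(self, out):
            raise NotImplementedError  # e.g. a forward-only operator

    class TestOperator(unittest.TestCase):
        def test_fwd_bwd(self):
            op = _Op()
            out = op.run_forward()
            try:
                op.run_backward(out)  # backward is attempted for every operator
            except NotImplementedError:
                self.skipTest("operator does not support the backward pass")

    if __name__ == "__main__":
        unittest.main()
    ```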
    
    Reviewed By: int3
    
    Differential Revision: D64471087
    
    fbshipit-source-id: c9d0c43544314fc11305f271e8e80f7ba07b2675
    xuzhao9 authored and facebook-github-bot committed Oct 17, 2024
    Commit: eec8612
  6. Make sure all ci-enabled impls are in the output

    Summary:
    In the CI, we will check that all registered impls are available in the output, unless they are specified with `ci=False`.
    
    We add the `ci=` flag because right now we don't have lazy imports for optional backend modules, while we want different behavior for the `enabled` and `ci` flags.
    
    For the `enabled` flag, we want "best effort": if a module is not available (e.g., flash attention 3 is not available on A100), we should detect that and skip it automatically instead of erroring out, for the best user experience.
    
    For the `ci` flag, we want to make sure that things are already set up in the fbcode CI; if flash attention 3 is not available there, it is a red flag and we have to report it in the unit test.
    
    Reviewed By: bertmaher
    
    Differential Revision: D64473609
    
    fbshipit-source-id: 320255f73942705038d50aac1f14d318b62a4765
    xuzhao9 authored and facebook-github-bot committed Oct 17, 2024
    Commit: 00c9b9e

Commits on Oct 18, 2024

  1. Update AOTEagerandRecordGraphs backend (#138231)

    Summary:
    X-link: pytorch/pytorch#138231
    Approved by: https://github.com/StrongerXi, https://github.com/mlazos, https://github.com/aakhundov
    
    Reviewed By: clee2000
    
    Differential Revision: D64581452
    
    Pulled By: anijain2305
    
    fbshipit-source-id: 3b9ff53abf2c4e1c525d7e62a52285279d2d4109
    anijain2305 authored and facebook-github-bot committed Oct 18, 2024
    Commit: 04f0e6c
  2. Log is_forward field to dynamo_compile scuba table (pytorch#2511)

    Summary:
    Pull Request resolved: pytorch#2511
    
    X-link: pytorch/pytorch#138097
    
    ^^
    
    Reviewed By: ezyang
    
    Differential Revision: D64438144
    
    fbshipit-source-id: 87a5518d4d9318132d269302c93a285bf86f3a46
    masnesral authored and facebook-github-bot committed Oct 18, 2024
    Commit: e89c1b3
  3. Revamp PT2 Compile/chromium event logging [1/?]

    Summary:
    X-link: pytorch/pytorch#138093
    
    This diff is the starting steps of https://docs.google.com/document/u/2/d/1kAEBt4AyW7HTAhXHbjoz8FBFHNyyEA2Qo2mPn7v3WUQ/edit?usp=drive_web&ouid=113555078003219714709
    
    It implements the following changes:
    
    - Only log spans to scuba, so no start events are ever logged
    - Log events as the full event name, without "START" or "END"
    - Only log to scuba major phases from chromium events. These are:
      - entire_frame_compile (dynamo)
      - backend_compile (aotdispatch)
      - inductor_compile (inductor)
      - codegen (inductor codegen)
    
    Tlparse chromium events stay basically the same. But I implemented a few changes to clean that up as well:
    - When there's a phase name available, log the phase name instead of the function name as the event name. This simplifies the trace so it doesn't have two identical rows. The fn_name is available as metadata on the chromium event, if you're interested.
    - Log new events for pre and post grad passes. These do *not* log to scuba.
    
    By making the phases much simpler in Scuba, with only categories for major phases of PT2 Compilation, we pave the way to add **much** more metadata and information to each individual event type. Diffs for that will come later.
    
    **IMPLEMENTATION NOTES:**
    - The logic for `log_chromium_event_internal` (which is the function that logs to Scuba) lives in chromium_events for now, but in the future as we add more metadata, it may belong independently in dynamo_timed or even outside of dynamo_timed. I haven't explored in detail what the refactor will look like. Once we start logging metadata for dynamo, aotdispatch, inductor, I suspect we will call log_pt2_compile_event directly, instead of making chromium event logger handle the pt2_compile_event logic. But that refactor is left for another PR on top of this one.
    
    - There's an interesting space after pre grad passes within AOT autograd logic, that's between create_aot_dispatcher_function and pre grad passes. I'm not sure what we're spending time doing in that time, but I'll find out with a profile later.
    ghstack-source-id: 248790387
    
    Reviewed By: oulgen
    
    Differential Revision: D64479033
    
    fbshipit-source-id: 1f30e734160bfed2f664063b5b2f4df1b661dfa4
    jamesjwu authored and facebook-github-bot committed Oct 18, 2024
    Commit: 8358f92
  4. Revert D64438144: Log is_forward field to dynamo_compile scuba table

    Differential Revision: D64438144
    
    Original commit changeset: 87a5518d4d93
    
    Original Phabricator Diff: D64438144
    
    fbshipit-source-id: 3acb559a632ce345a1c3c88edc9007c0a9e5d40c
    huydhn authored and facebook-github-bot committed Oct 18, 2024
    Commit: f7dc0c7
  5. adding aggregates to servicelab

    Summary: The current aggregation does not seem to be working as expected. Adding another aggregation field before switching the previous one over.
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64616616
    
    fbshipit-source-id: 676f09035e0d4427e9b60e9ed8f8c790782f0aec
    adamomainz authored and facebook-github-bot committed Oct 18, 2024
    Commit: 0a9cd8f
  6. specifying logged benchmark name for tritonBench servicelab logging

    Summary: more specific logging in our logging table based on servicelab benchmark names
    
    Reviewed By: nmacchioni
    
    Differential Revision: D64627855
    
    fbshipit-source-id: 47e250c5d8a34a912e7885e1f997a90a9dd8bc10
    adamomainz authored and facebook-github-bot committed Oct 18, 2024
    Commit: e737b8f

Commits on Oct 19, 2024

  1. replace uses of np.ndarray with npt.NDArray

    Summary:
    X-link: pytorch/opacus#681
    
    X-link: pytorch/captum#1389
    
    X-link: pytorch/botorch#2586
    
    X-link: pytorch/audio#3846
    
    This replaces uses of `numpy.ndarray` in type annotations with `numpy.typing.NDArray`. In NumPy 1.24.0+, `numpy.ndarray` is annotated as a generic type. Without template parameters, it triggers static analysis errors:
    ```counterexample
    Generic type `ndarray` expects 2 type parameters.
    ```
    `numpy.typing.NDArray` is an alias that provides default template parameters.
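
    The change, shown on a throwaway function (the function itself is illustrative, not from the diff):

    ```
    import numpy as np
    import numpy.typing as npt

    # Before: bare np.ndarray is a generic type, so strict checkers ask for its parameters.
    def normalize_old(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x)

    # After: npt.NDArray supplies default type parameters (dtype defaults to Any).
    def normalize_new(x: npt.NDArray) -> npt.NDArray:
        return x / np.linalg.norm(x)
    ```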
    
    Reviewed By: ryanthomasjohnson
    
    Differential Revision: D64619891
    
    fbshipit-source-id: dffc096b1ce90d11e73d475f0bbcb8867ed9ef01
    igorsugak authored and facebook-github-bot committed Oct 19, 2024
    Commit: 06e35fc

Commits on Oct 21, 2024

  1. Disable torch function compilation during guard execution and in compiled bytecode (#137669)
    
    Summary:
    Fixes pytorch/pytorch#114369
    
    X-link: pytorch/pytorch#137669
    Approved by: https://github.com/anijain2305
    
    Reviewed By: wdvr
    
    Differential Revision: D64675139
    
    Pulled By: mlazos
    
    fbshipit-source-id: a5e4501eaa781fcbd9423c99c555949182bd9f24
    mlazos authored and facebook-github-bot committed Oct 21, 2024
    Commit: 0562040
  2. fixing key error in aggregate data

    Summary: For some reason OSS isn't happy with dict.get, so I'm moving to this slightly less Pythonic but more explicit approach.
    
    Reviewed By: bertmaher, sfzhu93
    
    Differential Revision: D64698791
    
    fbshipit-source-id: 48cc4b6f7df61287efdc71c30176c2830dfde110
    adamomainz authored and facebook-github-bot committed Oct 21, 2024
    Commit: a21b30e

Commits on Oct 22, 2024

  1. Replace __str__ with __repr__ in some places (#136316)

    Summary:
    ## The problem
    
    In a typical debugger, `repr()` is used to display variables and not `str()`.
    
    Several classes in Dynamo have a `__str__()` method that returns useful information and a  `__repr__()` that does not. Having to call `str(x)` or `[str(i) for i in x]` in the debugger all the time is a chore.
    
    `str()` should be ["informal, nicely printable"](https://docs.python.org/3/library/stdtypes.html#str) and `repr()` should ["attempt to return a string that would yield an object with the same value when passed to eval()](https://docs.python.org/3/library/functions.html#repr)".
    
    ## The solution
    
    In the Python object model, if there is no `__str__` method, `__repr__`  is used instead (but not the other way around).
    
    So renaming `__str__` to `__repr__` in a few cases where no `__repr__` method exists now should not change observable behavior, and should make debugging easier.
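
    A toy illustration (mine, not from the PR) of the fallback being relied on:

    ```
    class GetAttrDemo:
        """Stand-in for the renamed classes: only __repr__ is defined."""
        def __init__(self, obj, name):
            self.obj, self.name = obj, name

        def __repr__(self):
            return f"GetAttrDemo({self.obj!r}, {self.name!r})"

    v = GetAttrDemo("base", "attr")
    assert str(v) == repr(v)  # with no __str__, str() falls back to __repr__
    print(v)                  # debuggers and print() now show the informative form
    ```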
    
    The specific classes changed were all in `torch._dynamo.variables`:
    
    * `builtin.BuiltinVariable`
    * `constant.ConstantVariable`
    * `constant.EnumVariable`
    * `functions.UserMethodVariable`
    * `lazy.LazyVariableTracker`
    * `lazy.LazySymNodeFormatString`
    * `misc.GetAttrVariable`
    * `misc.NullVariable`
    * `user_defined.UserDefinedObjectVariable`
    
    X-link: pytorch/pytorch#136316
    Approved by: https://github.com/XuehaiPan, https://github.com/jansel
    
    Reviewed By: wdvr
    
    Differential Revision: D64714511
    
    fbshipit-source-id: 322f2f0110e5b45afe6a27c52a0bcc91d91d1d6a
    rec authored and facebook-github-bot committed Oct 22, 2024
    Commit: 173774d
  2. Update requirements.txt (pytorch#2523)

    Summary:
    attempt to fix dependencies - this is no longer compatible with the latest huggingface_hub, see failing test at  https://github.com/pytorch/pytorch/actions/runs/11445304501/job/31843081598
    
    Pull Request resolved: pytorch#2523
    
    Reviewed By: huydhn
    
    Differential Revision: D64711662
    
    Pulled By: wdvr
    
    fbshipit-source-id: eed9143e6e0531840a53ba5ab3fad04894727272
    wdvr authored and facebook-github-bot committed Oct 22, 2024
    Commit: a45e0db
  3. Fixes to prep for weights_only default flip (pytorch#2514)

    Summary:
    Some fixes for pytorch/pytorch#137602
    
    Pull Request resolved: pytorch#2514
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64628614
    
    Pulled By: mikaylagawarecki
    
    fbshipit-source-id: edebf25cc6648919d5673a3baeaffdac26e5b91f
    mikaylagawarecki authored and facebook-github-bot committed Oct 22, 2024
    Commit: fb590d9
  4. typing compile_fx.py (#138033)

    Summary:
    Type annotations for compile_fx.
    - Some of the stuff here is pretty complicated (functions which return functions that take functions) so I bailed on those and used `Any` just to get the rest landed.
    - There are also changes to type signatures in other files which I did just to let mypy know more about the types in compile_fx.py.
    
    X-link: pytorch/pytorch#138033
    Approved by: https://github.com/Skylion007
    
    Reviewed By: wdvr
    
    Differential Revision: D64714765
    
    Pulled By: aorenste
    
    fbshipit-source-id: 262f5cb9b2171e96ce9f895772bd5778ddb4ebe0
    aorenste authored and facebook-github-bot committed Oct 22, 2024
    Commit: 1154318
  5. Add metadata to events in progress, new dynamo event

    Summary:
    X-link: pytorch/pytorch#138477
    
    This diff does a few things:
    
    ## Add metadata to events in progress
    Adds the ability to add extra metadata to Chromium Events via `add_event_data`.
    Metadata can only be added to chromium events that have started but not yet ended (i.e., in-progress events).
    - When you add data, it is appended to the event's metadata when you call log_event_end() (a minimal sketch of this pattern follows below).
    - The metadata appears in chromium events in tlparse. It also gets logged to scuba.
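
    To make the "attach metadata to an in-progress event" flow concrete, here is a rough, hypothetical sketch; the class and method names below are stand-ins, not the actual PyTorch chromium event logger API:
    
    ```python
    import time
    from typing import Any, Dict
    
    class EventLogger:
        """Toy logger: events accumulate metadata until they end."""
    
        def __init__(self) -> None:
            self._in_progress: Dict[str, Dict[str, Any]] = {}
    
        def log_event_start(self, name: str) -> None:
            self._in_progress[name] = {"start_us": time.time_ns() // 1000}
    
        def add_event_data(self, name: str, **metadata: Any) -> None:
            # Only events that have started but not yet ended may take extra data.
            if name not in self._in_progress:
                raise RuntimeError(f"event {name!r} is not in progress")
            self._in_progress[name].update(metadata)
    
        def log_event_end(self, name: str) -> None:
            data = self._in_progress.pop(name)
            data["end_us"] = time.time_ns() // 1000
            print(f"emit event {name}: {data}")  # stand-in for the tlparse/scuba sink
    
    logger = EventLogger()
    logger.log_event_start("dynamo")
    logger.add_event_data("dynamo", frame_key="0/0", graph_break=False)
    logger.log_event_end("dynamo")
    ```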
    
    ## New `dynamo` chromium event
    We add a new `dynamo` chromium event to the top of the stack, where we collect various metadata found in dynamo_compile. So the new order of events goes:
    
    ```
    __start__
    -> dynamo (dynamo compile metrics)
    -> entire_frame_compile (compile.inner)
    -> backend_compile (i.e. aotdispatch)
    -> create_aot_dispatch_function
    -> inductor_compile
    -> ...
    ```
    
    BackwardCompilationMetrics doesn't have any dynamo-specific information (it's mostly inductor timings), so we don't include it here.
    
    *FAQ: Why can't we use `entire_frame_compile` as the event?*
    This is mostly due to backward compatibility with `dynamo_compile`. `dynamo_compile` collects CompilationMetrics outside of `compile.compile_inner`, and uses `dynamo_timed` to grab timings from phases of the compiler, including `entire_frame_compile`. So we don't have a CompilationMetric object until after an `entire_frame_compile` event ends! Separately, `dynamo` as a name for all of dynamo compile is more descriptive than `entire_frame_compile`, imo.
    
    ## Log metadata as separate columns
    (Meta only): Separately, this also changes the `metadata` column in PT2 Compile Events. Instead of logging a single metadata column in JSON, it separates the JSON into separate columns. This is much better for data analysis. Now that this table is more mature, I think logging keys to separate columns is a better system.
    ghstack-source-id: 249373269
    
    Reviewed By: aorenste
    
    Differential Revision: D64696287
    
    fbshipit-source-id: 441f57e2d1c0210e81c06eb86d4482e95bed4971
    jamesjwu authored and facebook-github-bot committed Oct 22, 2024
    Configuration menu
    Copy the full SHA
    8fce9c1 View commit details
    Browse the repository at this point in the history

Commits on Oct 23, 2024

  1. Log is_forward field to dynamo_compile scuba table (#138505)

    Summary:
    X-link: pytorch/pytorch#138505
    Approved by: https://github.com/oulgen
    
    Reviewed By: oulgen
    
    Differential Revision: D64711721
    
    Pulled By: masnesral
    
    fbshipit-source-id: 488dd527d0b9179644ae5d6d45d88bdab0224032
    masnesral authored and facebook-github-bot committed Oct 23, 2024
    Configuration menu
    Copy the full SHA
    e57bbe2 View commit details
    Browse the repository at this point in the history
  2. Compiled autograd configs in TLS (#137821)

    Summary:
    Multithreading doesn't work yet; this adds Python-side TLS only for the Python-side state.
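
    As a rough sketch of what Python-side TLS for these configs could look like (threading.local is only a stand-in here, and the field names are assumptions, not the actual implementation):
    
    ```python
    import threading
    
    class _CompiledAutogradTLS(threading.local):
        # Each thread that writes these attributes gets its own copy;
        # the class-level values act as per-thread defaults.
        compiler = None
        in_compiled_autograd_region = False
    
    tls = _CompiledAutogradTLS()
    
    def enter_region(compiler):
        # Only the calling thread's state changes, so concurrent threads don't clobber each other.
        tls.compiler = compiler
        tls.in_compiled_autograd_region = True
    ```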
    
    X-link: pytorch/pytorch#137821
    Approved by: https://github.com/jansel, https://github.com/yf225
    ghstack dependencies: #137953
    
    Reviewed By: wdvr
    
    Differential Revision: D64796212
    
    Pulled By: xmfan
    
    fbshipit-source-id: aa1d9ef8f6e61207dfb352866e37d5e7cc98df42
    xmfan authored and facebook-github-bot committed Oct 23, 2024
    Configuration menu
    Copy the full SHA
    0e03831 View commit details
    Browse the repository at this point in the history
  3. tls access helpers (#138061)

    Summary:
    X-link: pytorch/pytorch#138061
    Approved by: https://github.com/yf225
    ghstack dependencies: #137953, #137821
    
    Reviewed By: wdvr
    
    Differential Revision: D64796226
    
    Pulled By: xmfan
    
    fbshipit-source-id: 9bf80c1492d7a800a308cb1e99fac63c4752fc52
    xmfan authored and facebook-github-bot committed Oct 23, 2024
    Configuration menu
    Copy the full SHA
    405ba75 View commit details
    Browse the repository at this point in the history
  4. adding fp32 strict and tf32x3 benchmarks for gemm

    Summary:
    TSIA
    
    draft diff while I move this to its own op
    
    Reviewed By: danzimm
    
    Differential Revision: D64781204
    
    fbshipit-source-id: c3ddd956230c1e4c8166867f03b5a28e8d6586e9
    adamomainz authored and facebook-github-bot committed Oct 23, 2024
    Configuration menu
    Copy the full SHA
    036012f View commit details
    Browse the repository at this point in the history

Commits on Oct 24, 2024

  1. Support range_iterator as a function input (#138657)

    Summary:
    Fixes pytorch/pytorch#138654
    
    X-link: pytorch/pytorch#138657
    Approved by: https://github.com/williamwen42, https://github.com/jansel
    
    Reviewed By: wdvr
    
    Differential Revision: D64881833
    
    Pulled By: anijain2305
    
    fbshipit-source-id: 46bcffa12ef2bec0ff47a1b60323aacbb3a90872
    anijain2305 authored and facebook-github-bot committed Oct 24, 2024
    Configuration menu
    Copy the full SHA
    367b6ef View commit details
    Browse the repository at this point in the history
  2. Support overridden __call__ on nn modules (#138619)

    Summary:
    X-link: pytorch/pytorch#138619
    Approved by: https://github.com/williamwen42
    ghstack dependencies: #138657
    
    Reviewed By: wdvr
    
    Differential Revision: D64881836
    
    Pulled By: anijain2305
    
    fbshipit-source-id: 1974dbc228618e8597eb6ab293272ee985964f52
    anijain2305 authored and facebook-github-bot committed Oct 24, 2024
    Configuration menu
    Copy the full SHA
    b5b342b View commit details
    Browse the repository at this point in the history
  3. updating hardware and device columns

    Summary: Currently the device and hardware columns are flipped in the logging table due to an argument-order mismatch.
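
    Illustrative only (hypothetical helper, not the actual logging code): the fix for this kind of positional-argument swap is to pass the values by keyword.
    
    ```python
    def log_row(device, hardware):
        print({"device": device, "hardware": hardware})
    
    # Buggy call site: the two values are passed in the wrong positional order.
    log_row("NVIDIA H100", "cuda")
    # Safer: pass by keyword so the columns cannot flip.
    log_row(device="cuda", hardware="NVIDIA H100")
    ```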
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64911847
    
    fbshipit-source-id: 2d75b17046eae2eed0d83f86140ad88dae26de29
    adamomainz authored and facebook-github-bot committed Oct 24, 2024
    Configuration menu
    Copy the full SHA
    3245fde View commit details
    Browse the repository at this point in the history
  4. Release 2.5.1.yaml perf test (pytorch#2525)

    Summary: Pull Request resolved: pytorch#2525
    
    Reviewed By: kit1980
    
    Differential Revision: D64912654
    
    Pulled By: atalman
    
    fbshipit-source-id: 74cf57574c7ed5e1b6a4fee4b9c2de745deb21c0
    atalman authored and facebook-github-bot committed Oct 24, 2024
    Configuration menu
    Copy the full SHA
    47ba1ed View commit details
    Browse the repository at this point in the history

Commits on Oct 25, 2024

  1. Account for older numpy versions in pytorch#2514 (pytorch#2524)

    Summary: Pull Request resolved: pytorch#2524
    
    Reviewed By: kit1980
    
    Differential Revision: D64771621
    
    Pulled By: mikaylagawarecki
    
    fbshipit-source-id: 545f3d528cfbe2668c8d37e98e99423cd77a8e8e
    mikaylagawarecki authored and facebook-github-bot committed Oct 25, 2024
    Configuration menu
    Copy the full SHA
    4f30c49 View commit details
    Browse the repository at this point in the history
  2. fixing gemm for amd

    Summary: Getting the gemm operator to work on AMD.
    
    Reviewed By: danzimm, xuzhao9
    
    Differential Revision: D64976612
    
    fbshipit-source-id: 20aaf30732211848996a3575ca7356f514ed912c
    adamomainz authored and facebook-github-bot committed Oct 25, 2024
    Configuration menu
    Copy the full SHA
    65e5f68 View commit details
    Browse the repository at this point in the history
  3. Add logger logging for remote fx graph cache get + put (pytorch#2512)

    Summary:
    Pull Request resolved: pytorch#2512
    
    X-link: pytorch/pytorch#138164
    
    Capture the timing for the remote fx graph cache get and put operations and add them to the logger logging.
    
    Reviewed By: ezyang, oulgen
    
    Differential Revision: D64484025
    
    fbshipit-source-id: 3ac8dad8f7083d7eefaa6f092d7703488a8bc41f
    masnesral authored and facebook-github-bot committed Oct 25, 2024
    Configuration menu
    Copy the full SHA
    2614ca9 View commit details
    Browse the repository at this point in the history

Commits on Oct 26, 2024

  1. pytorch/benchmark:bisection

    Reviewed By: xuzhao9
    
    Differential Revision: D64683154
    
    fbshipit-source-id: 70d359538572947c15184255fe5b2e69f61ab04a
    generatedunixname89002005287564 authored and facebook-github-bot committed Oct 26, 2024
    Configuration menu
    Copy the full SHA
    f6f1249 View commit details
    Browse the repository at this point in the history
  2. pytorch/benchmark:utils

    Reviewed By: xuzhao9
    
    Differential Revision: D64683332
    
    fbshipit-source-id: f132eda07a1cde19116ce18f5b400d896df53612
    generatedunixname89002005287564 authored and facebook-github-bot committed Oct 26, 2024
    Configuration menu
    Copy the full SHA
    f8a4e51 View commit details
    Browse the repository at this point in the history

Commits on Oct 27, 2024

  1. Update Typeguard to TypeIs for better type inference (#133814)

    Summary:
    Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/
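
    A small sketch of the difference (assuming `typing_extensions` >= 4.10 for `TypeIs`; the comments describe what a static type checker infers):
    
    ```python
    from __future__ import annotations
    
    from typing_extensions import TypeGuard, TypeIs
    
    def is_str_guard(x: object) -> TypeGuard[str]:
        return isinstance(x, str)
    
    def is_str_is(x: object) -> TypeIs[str]:
        return isinstance(x, str)
    
    def f(x: int | str) -> None:
        if is_str_guard(x):
            pass  # x narrowed to str here...
        else:
            pass  # ...but still int | str in this branch
    
        if is_str_is(x):
            pass  # x is str
        else:
            pass  # x is narrowed to int in the negative branch too
    ```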
    
    X-link: pytorch/pytorch#133814
    Approved by: https://github.com/ezyang
    
    Reviewed By: wdvr
    
    Differential Revision: D65030974
    
    fbshipit-source-id: 6e04f555c9ac4a60d7f53ab23ad3b60b82de5d48
    Skylion007 authored and facebook-github-bot committed Oct 27, 2024
    Configuration menu
    Copy the full SHA
    bd23811 View commit details
    Browse the repository at this point in the history
  2. Use guard_manager consistently instead of check_fn (#138896)

    Summary:
    X-link: pytorch/pytorch#138896
    Approved by: https://github.com/williamwen42, https://github.com/jansel
    ghstack dependencies: #138512
    
    Reviewed By: wdvr
    
    Differential Revision: D65030963
    
    Pulled By: anijain2305
    
    fbshipit-source-id: 7423473e4c3613aea42e13a64eae9c417c876964
    anijain2305 authored and facebook-github-bot committed Oct 27, 2024
    Configuration menu
    Copy the full SHA
    34ea1a1 View commit details
    Browse the repository at this point in the history

Commits on Oct 28, 2024

  1. Fix naming for AMD in fp8 rowwise fbgemm

    Summary: Select CK or Cutlass based on the arch.
    
    Reviewed By: xuzhao9
    
    Differential Revision: D65060122
    
    fbshipit-source-id: 3406e4852efe30883474d4bbb2315ffe4c54e211
    karthik-man authored and facebook-github-bot committed Oct 28, 2024
    Configuration menu
    Copy the full SHA
    713f800 View commit details
    Browse the repository at this point in the history
  2. Back out "tls access helpers (#138061)" and Back out "[compiled autog…

    …rad] Compiled autograd configs in TLS (#137821)"
    
    Summary:
    X-link: pytorch/pytorch#139086
    
    Original commit changeset: 9bf80c1492d7
    
    Original Phabricator Diff: D64796226
    
    Original commit changeset: aa1d9ef8f6e6
    
    Original Phabricator Diff: D64796212
    
    Reviewed By: malfet, kflu
    
    Differential Revision: D65072644
    
    fbshipit-source-id: 50ad138fc216653987a80ea6ae3efeaf5c04f949
    xmfan authored and facebook-github-bot committed Oct 28, 2024
    Configuration menu
    Copy the full SHA
    47e3138 View commit details
    Browse the repository at this point in the history

Commits on Oct 29, 2024

  1. Switch times to us in CompilationMetrics and improvements (#138975)

    Summary:
    Companion logger diff: https://www.internalfb.com/diff/D65012523
    
    * Using float seconds for timestamps is bad because our internal system defaults to float32 precision, and at float32 you don't even get second precision for timestamps (see the numeric sketch after this list).
    * We decided to use microseconds instead of milliseconds because, at millisecond granularity, you can end up with identical timestamps when compilation happens very quickly; it's much better to force non-overlapping spans.
    * Because there are so many new fields and I don't feel like reimplementing each one on BwdCompilationMetrics, BwdCompilationMetrics is no more; everything in CompilationMetrics is now simply optional.
    * The actual frame compile time collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default), you get about microsecond precision on timestamps, so this shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html).
    * I rename some entries for clarity. In particular, whenever a timing contains all of its lower phases (e.g., how Inductor also contains Triton compilation) we put "cumulative" in its name. If something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name.
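
    For the precision claims above, a quick numeric check (NumPy is used here only for illustration; this is not code from the PR):
    
    ```python
    import numpy as np
    
    t32 = np.float32(1_730_000_000.0)                  # a 2024-era unix timestamp
    print(t32 == np.float32(1_730_000_000.0 + 50.0))   # True: adjacent float32 values are 128 s apart here
    print(np.spacing(t32))                             # 128.0
    print(np.spacing(np.float64(1_730_000_000.0)))     # ~2.4e-07, i.e. sub-microsecond precision
    ```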
    
    X-link: pytorch/pytorch#138975
    Approved by: https://github.com/masnesral
    
    Reviewed By: huydhn
    
    Differential Revision: D65088198
    
    Pulled By: ezyang
    
    fbshipit-source-id: 0b901357ab649f052a3553fe8d0cc37fba80e197
    ezyang authored and facebook-github-bot committed Oct 29, 2024
    Configuration menu
    Copy the full SHA
    4ad2712 View commit details
    Browse the repository at this point in the history
  2. add some cpython debugging methods (#138030)

    Summary:
    This PR enables you to inspect PyObjects in C using `INSPECT(...)` without requiring https://docs.python.org/3/howto/gdb_helpers.html. `torch._dynamo.eval_frame.raise_sigtrap` can also be used to set gdb breakpoints while running Python code, e.g.
    
    ```python
    x = x + 1
    torch._dynamo.eval_frame.raise_sigtrap();
    # setting a breakpoint on ceval.c:CALL in gdb will now stop at the `sin` call in C.
    x = torch.sin(x)
    ```
    
    X-link: pytorch/pytorch#138030
    Approved by: https://github.com/jansel
    
    Reviewed By: huydhn
    
    Differential Revision: D65104659
    
    Pulled By: williamwen42
    
    fbshipit-source-id: aa2f3f9c34a1ee15160ccc82bf61c740b3ac6d20
    williamwen42 authored and facebook-github-bot committed Oct 29, 2024
    Configuration menu
    Copy the full SHA
    438f82b View commit details
    Browse the repository at this point in the history
  3. Set use_cuda_graphs in fp8_gemm_rowwise

    Summary: The default value of use_cuda_graphs was changed to False in D64471087, which caused slowdowns in the triton/CK kernels for fp8_gemm_rowwise.
    
    Reviewed By: danzimm
    
    Differential Revision: D65140285
    
    fbshipit-source-id: 4ab77537afeb9108dab7cdef6cac34aaa39d7d73
    karthik-man authored and facebook-github-bot committed Oct 29, 2024
    Configuration menu
    Copy the full SHA
    870be9b View commit details
    Browse the repository at this point in the history
  4. Remove hammer/generative_recommenders (pytorch#2526)

    Summary:
    Pull Request resolved: pytorch#2526
    
    X-link: pytorch-labs/tritonbench#19
    
    As title
    
    Reviewed By: xuzhao9, LinjianMa
    
    Differential Revision: D65069124
    
    fbshipit-source-id: 1ee736396fecc76d606e637fee7a8127603d9d7e
    xing-liu authored and facebook-github-bot committed Oct 29, 2024
    Configuration menu
    Copy the full SHA
    4d6e0fa View commit details
    Browse the repository at this point in the history

Commits on Oct 31, 2024

  1. Fix type for "--iter" flag (pytorch#2528)

    Summary: Pull Request resolved: pytorch#2528
    
    Reviewed By: xuzhao9
    
    Differential Revision: D64935089
    
    fbshipit-source-id: 8b0aa81513a3c6a58e8876475ec63041d362d42a
    Cao Gao authored and facebook-github-bot committed Oct 31, 2024
    Configuration menu
    Copy the full SHA
    a0890b0 View commit details
    Browse the repository at this point in the history
  2. Add start event metadata to collected metadata for PT2 Compile Events

    Summary:
    X-link: pytorch/pytorch#139289
    
    We should be logging metadata from event starts to PT2 Compile Events too.
    ghstack-source-id: 250444771
    
    Reviewed By: oulgen
    
    Differential Revision: D65070086
    
    fbshipit-source-id: 63b934bff4254871e15a615e5aa47112b032b143
    jamesjwu authored and facebook-github-bot committed Oct 31, 2024
    Configuration menu
    Copy the full SHA
    0c8a0f6 View commit details
    Browse the repository at this point in the history

Commits on Nov 1, 2024

  1. Optimize PT2 Compile Events ingestion and column formats

    Summary:
    X-link: pytorch/pytorch#139309
    
    Per discussion from https://fb.workplace.com/groups/1286739428954016/posts/1360522894909002
    
    This diff considerably changes the column format of PT2 Compile Events. We only log to scuba for a set of dynamo_timed() events that we actually care about aggregating. To do so, we add a boolean to dynamo_timed() that decides whether or not to log a pt2_compile_event. We'll always log a chromium_event for every dynamo_timed(), but only log a subset of those to scuba.
    
    Logging all metadata into a single metadata column saves space and ingestion cost because new rows for different events don't add N new empty column markers. It comes at the cost of having to create derived columns in the Scuba UI to use all the extra metadata we care about, but that's a tradeoff we're willing to make here, considering that other tables like dynamo_compile exist.
    
    ghstack-source-id: 251214365
    exported-using-ghexport
    
    Reviewed By: oulgen
    
    Differential Revision: D65225598
    
    fbshipit-source-id: 01569a79174ed3699063dbd8bb26b883c6a7b0c4
    jamesjwu authored and facebook-github-bot committed Nov 1, 2024
    Configuration menu
    Copy the full SHA
    a66ce04 View commit details
    Browse the repository at this point in the history
  2. Add isolate mode

    Summary: When benchmarking across multiple operators, we can optionally isolate each operator run in a child process.
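
    A minimal sketch of the idea (the helper below is hypothetical, not the actual benchmark code): each operator runs in a fresh child process, so a crash or bad GPU state in one run cannot affect the others.
    
    ```python
    import multiprocessing as mp
    
    def _run_one(op_name, out_queue):
        # The real benchmark would import and run the operator here; this is a stub.
        out_queue.put((op_name, "ok"))
    
    def run_isolated(op_names):
        results = []
        for name in op_names:
            q = mp.Queue()
            p = mp.Process(target=_run_one, args=(name, q))
            p.start()
            p.join()
            # A crashing child only fails its own entry instead of taking down the parent.
            results.append(q.get() if p.exitcode == 0 else (name, "failed"))
        return results
    
    if __name__ == "__main__":
        print(run_isolated(["op_a", "op_b"]))
    ```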
    
    Reviewed By: FindHao
    
    Differential Revision: D65154665
    
    fbshipit-source-id: 9c9a21a76897084b061374cb3f7d8524a4aaac9b
    xuzhao9 authored and facebook-github-bot committed Nov 1, 2024
    Configuration menu
    Copy the full SHA
    cc094df View commit details
    Browse the repository at this point in the history

Commits on Nov 4, 2024

  1. Classify miss-inplaced tensors in logs.

    Summary:
    X-link: pytorch/pytorch#139240
    
    Use signpost logs; a follow-up is to remove the field possibly_missed_reinplacing_opportunities from the dynamo compile table.
    
    Reviewed By: zou3519
    
    Differential Revision: D65180194
    
    fbshipit-source-id: 20fe80f209a15573b2184e4cf7ed2be3c2a4ab94
    laithsakka authored and facebook-github-bot committed Nov 4, 2024
    Configuration menu
    Copy the full SHA
    86a366e View commit details
    Browse the repository at this point in the history

Commits on Nov 5, 2024

  1. Switch OSS dashboard to use aoti_compile_and_package (#139597)

    Summary:
    Reland pytorch/pytorch#139154
    
    X-link: pytorch/pytorch#139597
    Approved by: https://github.com/angelayi
    
    Reviewed By: ZainRizvi
    
    Differential Revision: D65455707
    
    Pulled By: desertfire
    
    fbshipit-source-id: 691882e606754fc04cb826a14bdfe94cb465ece8
    desertfire authored and facebook-github-bot committed Nov 5, 2024
    Configuration menu
    Copy the full SHA
    4a42e06 View commit details
    Browse the repository at this point in the history

Commits on Nov 6, 2024

  1. Specialize symfloats that flow through is_integer (#139572)

    Summary:
    Fixes `python test/dynamo/test_dynamic_shapes.py DynamicShapesFunctionTests.test_number_method_method_is_integer_num_type6_dynamic_shapes` when specialize_float = False
    
    X-link: pytorch/pytorch#139572
    Approved by: https://github.com/ezyang
    ghstack dependencies: #139569, #139457, #139568
    
    Reviewed By: ZainRizvi
    
    Differential Revision: D65492888
    
    Pulled By: bobrenjc93
    
    fbshipit-source-id: 9a9881caa5905686c44d8508ce5edab46ab03f28
    bobrenjc93 authored and facebook-github-bot committed Nov 6, 2024
    Configuration menu
    Copy the full SHA
    3d3b7bb View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    5790e68 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    645c043 View commit details
    Browse the repository at this point in the history
  4. Update run.py

    juliagmt-google authored Nov 6, 2024
    Configuration menu
    Copy the full SHA
    a78431f View commit details
    Browse the repository at this point in the history

Commits on Nov 11, 2024

  1. Update run.py

    juliagmt-google authored Nov 11, 2024
    Configuration menu
    Copy the full SHA
    79bc6af View commit details
    Browse the repository at this point in the history
  2. Update run.py

    juliagmt-google authored Nov 11, 2024
    Configuration menu
    Copy the full SHA
    efb4b07 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    4b5c733 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    2ccc92a View commit details
    Browse the repository at this point in the history