xccl-p2p #3

Open
wants to merge 2,367 commits into
base: xccl-group
2367 commits
c1c94cb
Build magma binary tarballs for various cuda (#139888)
afrittoli Nov 8, 2024
a33fa37
[ROCm] Support new AMD triton stream pipeliner (#139881)
jataylo Nov 8, 2024
838958d
[inductor] Support autotune restore_value for user-defined Triton ker…
aakhundov Nov 7, 2024
7167323
Fix type description of torch.chunk (#140089)
zeshengzong Nov 8, 2024
ee7c3db
Enable inductor-rocm workflow for all trunk commits AND inductor-rela…
jithunnair-amd Nov 8, 2024
1d2d9f0
Give the magma build job id-token write permissions (#140141)
afrittoli Nov 8, 2024
dd79d2f
Removing warning for Windows Arm64 (#139746)
iremyux Nov 8, 2024
22cd1ee
[CD] Enable 3.13 triton build (#140137)
atalman Nov 8, 2024
9d99dce
Fix split decomp returning self (#140065)
eellison Nov 7, 2024
fc6496c
Revert "Enable inductor-rocm workflow for all trunk commits AND induc…
pytorchmergebot Nov 8, 2024
1868fc6
[AOTI] Update C++ runner API to take a const vector (#139955)
desertfire Nov 7, 2024
8690f60
Extract value_type-generic NEON Vectorized<Half> functions to CRTP ba…
swolchok Nov 4, 2024
8d61add
Add Vectorized<c10::BFloat16> specialization for ARM (#139090)
swolchok Nov 4, 2024
dfcf740
Fix traceback.format_exception(...) positional arguments error. (#140…
etaf Nov 8, 2024
7fa94f0
Fix typo in associative_scan tests (#139929)
bohnstingl Nov 8, 2024
bbd427f
[dynamo] switch to get_framelocals_mapping for 3.11 (#139950)
williamwen42 Nov 7, 2024
d18bca4
[dynamo] switch to get_framelocals_mapping for 3.10 and below (#140037)
williamwen42 Nov 7, 2024
a140e65
[dynamo] Support method with different __self__ on user defined objec…
anijain2305 Nov 7, 2024
e6c5a77
[dynamo][guards] Profile guard manager in C++ (#140110)
anijain2305 Nov 8, 2024
07ad746
Revert "[Reland] Use static_assert to detect get_type_index used in d…
pytorchmergebot Nov 8, 2024
63a0d65
[AOTI] Update the OSS tutorial (#139956)
desertfire Nov 7, 2024
119e069
[ez] Add .lintrunner.private.toml to .gitignore (#140166)
clee2000 Nov 8, 2024
411203e
Revert D65490202 (#140142)
ZainRizvi Nov 8, 2024
3483f78
Revert "Fix typo in associative_scan tests (#139929)"
pytorchmergebot Nov 8, 2024
80d0356
Revert "Make Context to be Device-agnostic Step by Step (2/N) (#136526)"
pytorchmergebot Nov 8, 2024
a772451
Revert "[Inductor][CPU] Fuse SmoothQuant int8 linear pattern (#139595)"
pytorchmergebot Nov 8, 2024
347f960
Revert "[cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136…
pytorchmergebot Nov 8, 2024
95198f8
Remove uses of deleted operations (#139447)
exclamaforte Nov 8, 2024
f3cbf67
[CD] Build aarch64 wheels without conda (#140093)
malfet Nov 8, 2024
1cdaf1d
correctly keep track of processed tensors for foreach reductions (#14…
ngimel Nov 8, 2024
ac6b6c6
[BE][CI] Use `pip3` instead of `pip` (#140185)
malfet Nov 8, 2024
44f6d14
Unbreak vec128_half_neon comparison without FP16 hardware support (#1…
swolchok Nov 4, 2024
7f0bf9f
Move bf16_gemv_trans to ReducedPrecisionFloatGemvFastPathKernel (#139…
swolchok Nov 4, 2024
25c469b
Build bf16 gemv fast path & entry points for non-ARM architectures to…
swolchok Nov 4, 2024
cc44b55
Hook up bf16_gemv_trans to x86 bf16 GEMM (#139220)
swolchok Nov 4, 2024
e3b2f04
[c10d][Logging] Remove args and kwargs from c10d logging (#140169)
fduwjj Nov 8, 2024
1659e24
[experimental] async-tp impl with cutlass-based, progress aware kerne…
yifuwang Nov 6, 2024
2af5172
fix dynamo tracking numpy 2 ops (#138686)
haifeng-jin Nov 8, 2024
beae772
Revert "Tighten type hints for tensor arithmetic (#135392)"
pytorchmergebot Nov 8, 2024
ea0f60e
[Dynamo] allow dynamic callables on tensor variables (#137940)
mlazos Nov 8, 2024
1400fed
Revert "add supports_coalescing property in c10d::Backend to determi…
pytorchmergebot Nov 8, 2024
a02e88d
[miniz] Bump miniz version to 3.0.2 and add patch for zip64 (#140041)
larryliu0820 Nov 9, 2024
7eb6617
Revert "Fix split decomp returning self (#140065)"
pytorchmergebot Nov 9, 2024
090b778
Clarify meaning of rate parameter in Gamma distribution (#134847)
psteinb Nov 9, 2024
58b661c
Revert "[c10d][Logging] Remove args and kwargs from c10d logging (#14…
pytorchmergebot Nov 9, 2024
263d8f7
[8/N] Don't skip ASAN on some tests (#140081)
cyyever Nov 9, 2024
0b8652a
[dynamo] Remove `NestedUserFunctionVariable.closure_scope` (#140033)
StrongerXi Nov 8, 2024
de40a23
[dynamo] Remove dead code path for capturing `__class__` in `UserFunc…
StrongerXi Nov 8, 2024
be172d2
[pt2, docs] Add new PT2 troubleshooting doc (#138620)
williamwen42 Nov 5, 2024
9c678af
Misc. non-contig NJT fixes (#140160)
jbschlosser Nov 8, 2024
2ede4c9
[Partitioner] Enumerate partitions by iterating partition ids (#136598)
lingzhiz1998 Nov 9, 2024
72976b2
Use manylinux-builder images with main tag (#140158)
afrittoli Nov 9, 2024
4893e24
[DTensor][Test] Remove safe global context for weights_only torch.loa…
wz337 Nov 8, 2024
4488e23
Fix another item memo loss location + bool specialization bug (#139587)
bobrenjc93 Nov 8, 2024
f915409
FlopCounterMode: Decompose ops for inference mode (#138508)
Feuermagier Nov 9, 2024
3b8470c
add special case for __round__ constant variables (#139583)
bobrenjc93 Nov 8, 2024
032135f
[2/N] Turn inline static functions into static (#140068)
cyyever Nov 9, 2024
8b2e385
Make size a property with an assertion (#139794)
mlazos Nov 9, 2024
d2d1258
Speed up AMD AOT Inductor lowering by memoizing hipify trie to regex …
kflu Nov 9, 2024
ab55a99
Use TORCH_DECLARE_XXX (#139952)
cyyever Nov 9, 2024
e2e425b
[CUDAGraph] Add dynamo timer to checkpoint, warmup, and record (#139818)
BoyuanFeng Nov 9, 2024
0b650c3
Build magma for windows (#139924)
afrittoli Nov 9, 2024
929a647
[Intel GPU] Support RegisterXPU.cpp codegen and compile for the in-tr…
etaf Nov 9, 2024
191971e
[AOTI] Introduce an extensibility mechanism for the c shim codegen to…
etaf Nov 9, 2024
8051ee8
Add XPU compiler version control in cmake to keep BC (#139258)
guangyey Nov 9, 2024
052b67e
Add torch.version.xpu (#139466)
guangyey Nov 9, 2024
5107d24
[c10d][Logging] Remove args and kwargs from c10d logging (#140169)
fduwjj Nov 9, 2024
a2ac96c
[BE] Rectify some references to caffe2 (#140204)
malfet Nov 9, 2024
7d4f5f7
[Environment Variable][6/N] Use thread-safe getenv functions (#140200)
cyyever Nov 9, 2024
f89b2b9
Refactor conda-builder -> almalinux-builder (#140157)
atalman Nov 9, 2024
5ef33e4
Add size param check of unfold (#139965)
zeshengzong Nov 9, 2024
4f6b30b
Add testing for the utils surrounding dynamo_timed (#140094)
masnesral Nov 9, 2024
94c9bb7
[Inductor] [CPP] Update BRGEMM parameters for Half cpp gemm template …
CaoE Nov 10, 2024
c3087ac
Update torch-xpu-ops commit pin (#139986)
xytintel Nov 10, 2024
d90c25e
OpenReg: Support event (#140111)
Zhenbin-8 Nov 10, 2024
ffb9790
[7/N] Fix Wextra-semi warning (#140225)
cyyever Nov 10, 2024
2f3a5a1
[SymmetricMemory] improve the API for stream_write_value32 (#139934)
yifuwang Nov 7, 2024
565a794
Recover non-standard bool test for msort (#139870)
xw285cornell Nov 11, 2024
20b60b1
correct get kvs
Chao1Han Nov 11, 2024
cb15c15
[logging] Overhaul dynamo_timed and CompilationMetrics logging. (#139…
masnesral Nov 9, 2024
ceb44b2
[FR] Enable best effort parital analysis and verbose mode for trace p…
fduwjj Nov 7, 2024
d4b8857
[codecache][triton 3.2] hash -> base64 conversion for triton 3.2 (#14…
davidberard98 Nov 11, 2024
04b5b4a
Add base class for single-subgraph inductor HOPs (#139898)
zou3519 Nov 8, 2024
63715f6
S390x update builder image (#132983)
AlekseiNikiforovIBM Nov 11, 2024
f5ffd55
[MPS] Add `torch.special.i1` op (#140196)
malfet Nov 11, 2024
2fe110f
[BE][MPS] Standardize indexing shader compilation (#140271)
malfet Nov 11, 2024
5f4a21d
Revert "[SymmetricMemory] improve the API for stream_write_value32 (#…
pytorchmergebot Nov 11, 2024
96b6418
Delete Buck1 as it is no longer supported (#140067)
mcr229 Nov 11, 2024
0a0915f
[SymmetricMemory] improve the API for stream_write_value32 (#139934)
yifuwang Nov 7, 2024
e7ec294
NJT OpInfo tests v2 (#138370)
jbschlosser Nov 11, 2024
7f1e248
[Dynamo] Replace `torch._dynamo.optimize()` with `torch.compile()` [1…
shink Nov 11, 2024
10e40dd
[aoti][tooling] Add support to debug printing for all AOTI model run …
YUNQIUGUO Nov 11, 2024
001f736
[ROCm] Correct numerical issues in layer norm backwards kernel (#140259)
jataylo Nov 11, 2024
780b28f
[ONNX] Update docstring typo in building (#140281)
justinchuby Nov 11, 2024
115c58c
Update ET pin for #6744 (#140199)
huydhn Nov 11, 2024
2c77352
[AOTI][refactor] Clean up call chain in wrapper codegen (#136531)
desertfire Nov 8, 2024
d4cdc09
ILP for auto FSDP wrapping (#140298)
xuanzhang816 Nov 11, 2024
5f7ea7c
[invoke_subgraph] Support symint/int as inputs (#140058)
anijain2305 Nov 8, 2024
b742d11
[TD] Filepath heuristic also looks at file name (#140170)
clee2000 Nov 11, 2024
a290c1d
Fix building with system GLOO (#140275)
nlbrown2 Nov 11, 2024
5eb1cca
[dynamo][user-defined] Walk __mro__ to get the member descriptor sour…
anijain2305 Nov 11, 2024
2817fe8
Add unaligned attributes to q8gemm/4x4c2-sse2.c (#140188)
k-sheridan Nov 11, 2024
412df50
Revert "[dynamo] Remove dead code path for capturing `__class__` in `…
pytorchmergebot Nov 11, 2024
222175b
Revert "[Partitioner] Enumerate partitions by iterating partition ids…
pytorchmergebot Nov 11, 2024
a96aadf
fix specialization logic in Scalar.h (#140280)
bobrenjc93 Nov 11, 2024
c223e06
Tighten type hints for tensor arithmetic (#135392)
fzimmermann89 Nov 11, 2024
0af38b1
Remove temp table to post autograd IR (#140085)
tugsbayasgalan Nov 8, 2024
dbb55b4
Revert "[7/N] Fix Wextra-semi warning (#140225)"
pytorchmergebot Nov 12, 2024
e76f57d
add missing bracket in error message (#140307)
wdvr Nov 12, 2024
6438c86
[inductor] Refactor reduction type choices into V.choices (#139585)
jansel Nov 10, 2024
29114e4
[inductor] Support fixed triton configs defined at compile time (#140…
jansel Nov 10, 2024
263a5bf
[cpu] Modify inductor opt flag --- ftree-loop-vectorize (#136827)
Valentine233 Nov 12, 2024
4e487ed
Add linters for C10_UNUSED and C10_NODISCARD (#140302)
ezyang Nov 11, 2024
e21ee63
[Intel GPU] format XPU oneDNN integration codes (#139721)
ZhiweiYan-96 Nov 11, 2024
19eff28
[Intel GPU] Extract common utils for conv&qconv (#139580)
ZhiweiYan-96 Nov 11, 2024
455dc4c
Allow NJT by default for weights_only torch.load (#140304)
mikaylagawarecki Nov 11, 2024
b442419
update kvs key
Chao1Han Nov 12, 2024
23db92b
[FR] refactor build collective and return more info to db (#140082) (…
fduwjj Nov 12, 2024
469eae2
[inductor][invoke_subgraph] Fix SDPA seed/offset issue (#140070)
anijain2305 Nov 11, 2024
09bab75
Revert "Allow NJT by default for weights_only torch.load (#140304)"
pytorchmergebot Nov 12, 2024
965555d
[dynamo] Remove dead code path for capturing `__class__` in `UserFunc…
StrongerXi Nov 12, 2024
330c957
[Inductor] make decompose_mm_pass support cpu case (#139696)
hl475 Nov 12, 2024
a104b56
fix trace nn.parameters() (#138149)
majian4work Nov 12, 2024
9a5175e
fix shared submodule module call signature (#139438)
avikchaudhuri Nov 11, 2024
7691064
dispatcher module for multiple graphs (#139439)
avikchaudhuri Nov 11, 2024
f77eb07
Split int4wo weight packing (#139611)
yanbing-j Nov 12, 2024
ff91fcc
Refactor device index bound check for xpu code (#120768)
guangyey Nov 12, 2024
51e8a13
CD Enable Python 3.13 on windows (#138095)
atalman Nov 12, 2024
78a8f7f
[FSDP2] Fix CUDA sync for bf16 HSDP AR, fp32 params (#140044)
awgu Nov 7, 2024
057f0dc
Don't use sudo to checkout sources (#140263)
AlekseiNikiforovIBM Nov 12, 2024
761b42b
cpp_wrapper_cpu: Ensure reinterpret_view results in RAIIAtenTensorHan…
benjaminglass1 Nov 11, 2024
fef16fe
Enable all fixed cpp_wrapper tests (#139412)
benjaminglass1 Nov 11, 2024
8cb0b93
Fix broken AOTInductor node and kernel counts (#139435)
benjaminglass1 Nov 11, 2024
71d8bb7
implement `torch._foreach_rsqrt` (#134574)
crcrpar Nov 12, 2024
92fb1f7
[BE] Test interspersed empty tensors for _foreach_norm test parity (#…
janeyx99 Nov 11, 2024
213b8ef
[BE] add empty tensor testing for _foreach_addcmul/div (#140276)
janeyx99 Nov 11, 2024
faef151
Add batch rule for `native_dropout_backward` (#140140)
guilhermeleobas Nov 11, 2024
d723abf
[CI]Move CPU inductor test runners and cases to save cost (#136313)
zxd1997066 Nov 12, 2024
60db702
Noop m.set_python_module on C10_MOBILE builds (#140273)
zou3519 Nov 12, 2024
7a02457
[BE] Fix error message in torch._scaled_mm (#140343)
malfet Nov 12, 2024
5aadaaf
[Dynamo] Allow `filter()` to handle infinite iterator (#138305)
shink Nov 12, 2024
928b8ec
[BE]: Add pointwise tag to isfinite (#140291)
Skylion007 Nov 12, 2024
a3cff4b
[Environment Variable][7/N] Use thread-safe getenv functions (#140211)
cyyever Nov 12, 2024
e4195f8
Revert "[logging][ez] Add timer logging for pickling and unpickle for…
pytorchmergebot Nov 12, 2024
7624d62
[Reland][7/N] Fix Wextra-semi warning (#140342)
cyyever Nov 12, 2024
6a368b3
Add ScalarList overload to `_foreach_lerp` (#134482)
crcrpar Nov 12, 2024
c182c7c
Fix `triangular_solve` meta function out parameter names. (#140186)
ysiraichi Nov 12, 2024
726424f
Use base32 triton cache function if base64 is not found (#140297)
fulvius31 Nov 12, 2024
cc8e832
[AMD] use DC method for linalg.eigh (#140327)
Mellonta Nov 12, 2024
8304a1f
OpenReg: Fix issue when casting tensor on the executor size (#140255)
Zhenbin-8 Nov 12, 2024
c0ddd10
Revert "[inductor] Support fixed triton configs defined at compile ti…
pytorchmergebot Nov 12, 2024
069a710
Revert "[inductor] Refactor reduction type choices into V.choices (#1…
pytorchmergebot Nov 12, 2024
034b105
[BE][Ez]: Add NT unary op macro (#140213)
Skylion007 Nov 12, 2024
14bb49f
Add CUDA 12.6 Linux Builds to Binaries Matrix (#138899)
tinglvv Nov 12, 2024
1172a10
[Build] Do not regenerate code endlessly without XPU (#140438)
malfet Nov 12, 2024
4675875
Fix lint after #138899 (#140446)
atalman Nov 12, 2024
0db21a6
Remove most rockset references (#139922)
clee2000 Nov 12, 2024
8c6abe5
[aoti] Remove dir after packaging (#140022)
angelayi Nov 12, 2024
1f590fe
[AOTI][refactor] Update codegen_int_array_var API (#140299)
desertfire Nov 11, 2024
2ac71a5
[pipelining] add type checking to _backward functions (#140019)
H-Huang Nov 12, 2024
7578a0b
[pipelining] clean up stage functions (#140418)
H-Huang Nov 12, 2024
267641f
[Profiler] Add More Logging for Dynamic Collection API (#140285)
sraikund16 Nov 12, 2024
70a223c
[aotinductor] fix a few issues in bandwidth profiler (#139607)
frank-wei Nov 12, 2024
096929c
Add safe.directory to Almalinux docker image (#140454)
atalman Nov 12, 2024
1f28235
Allow NJT by default for weights_only torch.load (#140304)
mikaylagawarecki Nov 12, 2024
d48ea29
Revert "[aoti] Remove dir after packaging (#140022)"
pytorchmergebot Nov 12, 2024
3e82b1f
Build magma tarball for cuda 126 (#140143)
afrittoli Nov 12, 2024
3d2dd14
[BE][Bugfix]: Add rad2deg to pointwise ops (#140290)
Skylion007 Nov 13, 2024
891ba2e
Fix xpu cmake typo (#140374)
guangyey Nov 12, 2024
4906413
[Intel GPU] Support RegisterSparseXPU.cpp codegen. (#139267)
xiaowangintel Nov 13, 2024
fb7148d
Fix split decomp returning self (#140065)
eellison Nov 12, 2024
40fb738
Use Wextra-semi (#140236)
cyyever Nov 13, 2024
953286b
[DTensorTestbase] Fix `@with_comms` inactive problem (#139637)
tsunghsienlee Nov 13, 2024
d3da6d4
Add `cmake` to requirements.txt (#140491)
malfet Nov 13, 2024
8dc3cb0
[dynamo] Put cells into `closure_cells` and document relevant parts (…
StrongerXi Nov 13, 2024
698ff07
[dynamo] Fix name collision bug for captured cells and locals (#140036)
StrongerXi Nov 13, 2024
6a821c9
[dynamo] Remove cell unboxing/restart optimization (#140149)
StrongerXi Nov 13, 2024
d34d5cc
[dynamo] Fix some corner cases for modeling pre-existing cells (#140150)
StrongerXi Nov 13, 2024
3a622c5
[dynamo] Refine `LocalSource.cell_or_freevar` to `LocalSource.is_inpu…
StrongerXi Nov 13, 2024
6561591
[dynamo] Fix bugs in side-effect pruning and codegen (#140201)
StrongerXi Nov 13, 2024
39d1c91
[dynamo] Restrict support for `out=` variants of torch operators (#14…
StrongerXi Nov 13, 2024
659d213
Add architecture to XPU device property (#138186)
guangyey Nov 12, 2024
e9fb2c6
Add some error messages for flexattention (#138891)
Chillee Nov 12, 2024
4bbd6da
Enable XPUEvent elapsed_time function (#134666)
guangyey Nov 12, 2024
3e277eb
[pytorch/profiler] Profiler NCCL metadata can now contain collective …
sanrise Nov 12, 2024
42ad54c
[Intel GPU] Allow XPU device in LSTMCell operators (#140246)
yucai-intel Nov 13, 2024
d6b3ad4
[Dynamo] Replace `torch._dynamo.optimize()` with `torch.compile()` [2…
shink Nov 13, 2024
4c6eebf
[doc] improve code in fake tensor doc (#140329)
ssnl Nov 13, 2024
1886e33
Use device-agnostic runtime API in distributed DDP/FSDP instead of `c…
zhangxiaoli73 Nov 13, 2024
7b0d199
[doc] fix grammar in "Extending Torch" (#140209)
ssnl Nov 13, 2024
79fb741
[Intel GPU] Add device guard for XPU structured operator in torchgen …
xytintel Nov 13, 2024
5b1c67c
[Intel GPU] Avoid atomic add for XPU device in satter_add by determin…
PenghuiCheng Nov 13, 2024
8a80cee
[Dynamo] Replace `torch._dynamo.optimize()` with `torch.compile()` [3…
shink Nov 13, 2024
f06ee3e
[pt2] Add meta for _add_relu (#140009)
pralay-das Nov 13, 2024
cb71bcc
Replace clone.detach with detach.clone (#140264)
zeshengzong Nov 13, 2024
c61ccaf
[FR] Polish the log message for dtype mismatch and don't exit when to…
fduwjj Nov 13, 2024
e754611
[aoti] Add error msg if we can't find a proxy executor (#140308)
angelayi Nov 13, 2024
ba136a7
[aoti] Remove dir after packaging (#140022)
angelayi Nov 13, 2024
97d995a
Revert "[pytorch/profiler] Profiler NCCL metadata can now contain col…
pytorchmergebot Nov 13, 2024
34743d8
Support dlpack for privateuse1 (#135331)
hipudding Nov 13, 2024
4a18e26
Revert "[Environment Variable][7/N] Use thread-safe getenv functions …
pytorchmergebot Nov 13, 2024
c6a29fc
Revert "[Environment Variable][4/N] Use thread-safe getenv functions …
pytorchmergebot Nov 13, 2024
a8a1e58
[inductor] Log how compile_threads is set (#139771)
masnesral Nov 5, 2024
b4cc5d3
Revert "[aoti] Remove dir after packaging (#140022)"
pytorchmergebot Nov 13, 2024
5dc6b8c
Revert "Allow NJT by default for weights_only torch.load (#140304)"
pytorchmergebot Nov 13, 2024
a58a565
Revert "[Environment Variable][6/N] Use thread-safe getenv functions …
pytorchmergebot Nov 13, 2024
3d61801
Fix RMSNorm Notation: Parentheses, Indices, Comma (#140215)
d-kleine Nov 13, 2024
2675ef8
Revert " [Environment Variable][5/N] Use thread-safe getenv functions…
pytorchmergebot Nov 13, 2024
1a8752b
[TorchScript] bindings for torch._C.ClassType.method_names() (#140444)
davidberard98 Nov 12, 2024
03cccaa
Doc: Rewrite the storage.rst file to emphasize untyped storages (#140…
vmoens Nov 13, 2024
22dfb5b
[dynamo, 3.13] replace deprecated PyWeakref_GetObject (#140187)
williamwen42 Nov 12, 2024
c98ef02
[dynamo] add SymNode bitwise and/or (#138777)
williamwen42 Nov 12, 2024
42622cf
enable concat linear with mkldnn linear by flag (#139048)
zhuhaozhe Nov 13, 2024
d63eb3c
Revert "[logging] Overhaul dynamo_timed and CompilationMetrics loggin…
pytorchmergebot Nov 13, 2024
51e0996
Add missing pytorch-linux-jammy-py3.12-triton-cpu Docker image (#140571)
huydhn Nov 13, 2024
f3a6832
[inductor] Skip autotuning config on ptxas error (#140495)
aakhundov Nov 13, 2024
0f739b8
[Codemod] `skipIfMps`->`skipIfMPS` (#140562)
malfet Nov 13, 2024
c25999b
Revert "Add missing pytorch-linux-jammy-py3.12-triton-cpu Docker imag…
pytorchmergebot Nov 13, 2024
82597d0
type annotations for meta_utils (#140203)
aorenste Nov 13, 2024
c8be6f1
[codemod] Remove unused-variable in pytorch (#140569)
r-barnes Nov 13, 2024
49c124f
dynamo: guard on FSDP module parameters (#138819)
bdhirsh Nov 1, 2024
ba8568f
[c10d][logging] Add wait counter for time spent in object to tensor a…
fduwjj Nov 12, 2024
c1bf714
[Profiler] Fix ASAN Overflow Issues (#140441)
sraikund16 Nov 13, 2024
a8de849
OpenReg: Export the number of devices (#140492)
Zhenbin-8 Nov 13, 2024
26fde11
Refactor user-defined triton kernel source code collection (#140577)
oulgen Nov 13, 2024
274f4cf
[3/x][fx minimizer] Support all_outputs in minimizer (#139774)
zejunh Nov 13, 2024
08acfcd
[ez] Fix check labels error when deleting comment (#140578)
clee2000 Nov 13, 2024
9d93c27
Implement unfold_backward on MPS (#135411)
malfet Nov 13, 2024
70060b0
Add proper parse_tensor_constants support (#140558)
antoniojkim Nov 13, 2024
b34bb1f
Add support for parsing torch.Generator in JIT (#140489)
antoniojkim Nov 13, 2024
2f1dbfe
Logging Refactor - Remove Print Statements (#139782)
basilwong Nov 13, 2024
f1e045e
Update torch-xpu-ops commit pin (#140277)
xytintel Nov 13, 2024
9c75475
Add missing pytorch-linux-jammy-py3.12-triton-cpu Docker image (#140571)
huydhn Nov 13, 2024
f85e433
[ONNX] Remove the contiguous patch (#140428)
justinchuby Nov 14, 2024
70acf02
Use Manylinux2_28 for wheel builds (#138732)
atalman Nov 14, 2024
e2b7f0b
clarifies the wording in the main README to make it clearer that visu…
fmgblackwolf Nov 14, 2024
85deef9
[AOTI][refactor] Rename generate_extern_kernel_alloc_and_find_schema_…
desertfire Nov 12, 2024
80870f6
[AOTI][refactor] Switch remaining aoti_torch_get_data_ptr (#140448)
desertfire Nov 12, 2024
c6c0554
[EZ] Delete `linux-focal-cuda12_1-py3_10-gcc9-bazel-test` (#140659)
malfet Nov 14, 2024
77da050
[executorch hash update] update the pinned executorch hash (#139588)
pytorchupdatebot Nov 14, 2024
b1d6250
[ONNX] Use TracedONNXFunction op signature to promote inputs to tenso…
titaiwangms Nov 14, 2024
8d3a07e
[Inductor UT] Skip test_decompose_mem_bound_mm.py for XPU since we ha…
etaf Nov 13, 2024
3ce75e7
[Inductor UT] Fix duplicate registration of custom ops amount test ca…
etaf Nov 13, 2024
d32eac8
Put a compile lock around backward compile (#140626)
oulgen Nov 14, 2024
e608301
fix test_float_to_int_conversion_nonfinite for NumPy 2 (#138131)
haifeng-jin Nov 14, 2024
99c8d5a
Don't pass credentials explicitly to sccache (#140611)
malfet Nov 14, 2024
62eea62
[Quant][Onednn] add linear_dynamic_fp16 ops (#140376)
Xia-Weiwen Nov 13, 2024
0aedc00
Merge remote-tracking branch 'origin/main' into xccl-p2p
Chao1Han Nov 14, 2024
65e0d9d
WA AVG reduction
Chao1Han Nov 14, 2024
3e97e67
update test case
Chao1Han Nov 15, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
26 changes: 0 additions & 26 deletions .buckconfig.oss

This file was deleted.

19 changes: 19 additions & 0 deletions .ci/aarch64_linux/README.md
@@ -0,0 +1,19 @@
# Aarch64 (ARM/Graviton) Support Scripts
Scripts for building aarch64 PyTorch PIP Wheels. These scripts build the following wheels:
* torch
* torchvision
* torchaudio
* torchtext
* torchdata
## Aarch64_ci_build.sh
This script is designed to support CD operations within the PyPI manylinux aarch64 container, and to be executed inside that container. It prepares the container and then executes __aarch64_wheel_ci_build.py__ to build the wheels. The script assumes the PyTorch repo is located at ```/pytorch``` and puts the wheels into ```/artifacts```.
### Usage
```DESIRED_PYTHON=<PythonVersion> aarch64_ci_build.sh```

__NOTE:__ CI build is currently __EXPERIMENTAL__

## Build_aarch64_wheel.py
This app allows a person to build using AWS EC2 resources. It requires AWS-CLI and Boto3 with AWS credentials to support building EC2 instances for the wheel builds. It can be used in a CodeBuild CD or from a local system.

### Usage
```build_aarch64_wheel.py --key-name <YourPemKey> --use-docker --python 3.8 --branch <RCtag>```
39 changes: 39 additions & 0 deletions .ci/aarch64_linux/aarch64_ci_build.sh
@@ -0,0 +1,39 @@
#!/bin/bash
set -eux -o pipefail

GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}

SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
source $SCRIPTPATH/aarch64_ci_setup.sh

tagged_version() {
  GIT_DESCRIBE="git --git-dir /pytorch/.git describe --tags --match v[0-9]*.[0-9]*.[0-9]*"
  if ${GIT_DESCRIBE} --exact >/dev/null; then
    ${GIT_DESCRIBE}
  else
    return 1
  fi
}

if tagged_version >/dev/null; then
  export OVERRIDE_PACKAGE_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
fi

###############################################################################
# Run aarch64 builder python
###############################################################################
cd /
# adding safe directory for git as the permissions will be
# on the mounted pytorch repo
git config --global --add safe.directory /pytorch
pip install -r /pytorch/requirements.txt
pip install auditwheel
if [ "$DESIRED_CUDA" = "cpu" ]; then
  echo "DESIRED_CUDA is set to cpu. Building cpu wheel."
  # USE_PRIORITIZED_TEXT_FOR_LD enables linker script optimization: https://github.com/pytorch/pytorch/pull/121975/files
  USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
else
  echo "DESIRED_CUDA is set to: $DESIRED_CUDA. Building cuda wheel."
  # USE_PRIORITIZED_TEXT_FOR_LD enables linker script optimization: https://github.com/pytorch/pytorch/pull/121975/files
  USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
fi
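The `sed` pipeline in this script turns an exact release tag into the value exported as `OVERRIDE_PACKAGE_VERSION`: it drops a leading `v` and everything from the first hyphen onward. A minimal Python sketch of that transformation, for illustration only (the function name `package_version_from_tag` and the sample tags are hypothetical and not part of this PR):

```python
import re


def package_version_from_tag(tag: str) -> str:
    """Mirror `sed -e 's/^v//' -e 's/-.*$//'` on a single tag string."""
    # s/^v//  : remove one leading "v", if present.
    without_prefix = re.sub(r"^v", "", tag)
    # s/-.*$// : remove the first hyphen and everything after it.
    return re.sub(r"-.*$", "", without_prefix)


if __name__ == "__main__":
    # Sample tags are made up for the demonstration.
    for tag in ("v2.5.1", "v2.6.0-rc1"):
        print(tag, "->", package_version_from_tag(tag))
```

Under these assumptions a release-candidate tag and its final tag map to the same package version, which appears to be the intent of the second `sed` expression.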
23 changes: 23 additions & 0 deletions .ci/aarch64_linux/aarch64_ci_setup.sh
@@ -0,0 +1,23 @@
#!/bin/bash
set -eux -o pipefail

# This script prepares the Docker container for the aarch64_wheel_ci_build.py Python script
# by creating symlinks from the desired /opt/python installation into /usr/local/bin/

NUMPY_VERSION=2.0.2
PYGIT2_VERSION=1.15.1
if [[ "$DESIRED_PYTHON" == "3.13" ]]; then
  NUMPY_VERSION=2.1.2
  PYGIT2_VERSION=1.16.0
fi

SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P )"
source $SCRIPTPATH/../manywheel/set_desired_python.sh

pip install -q numpy==${NUMPY_VERSION} pyyaml==6.0.2 scons==4.7.0 ninja==1.11.1 patchelf==0.17.2 pygit2==${PYGIT2_VERSION}

for tool in python python3 pip pip3 ninja scons patchelf; do
  ln -sf ${DESIRED_PYTHON_BIN_DIR}/${tool} /usr/local/bin;
done

python --version
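The symlink loop in this setup script exposes the selected interpreter's tools under `/usr/local/bin`. The sketch below shows the same idea in Python, assuming the destination entries are plain files or links rather than directories. The helper name `link_tools` is hypothetical, and temporary directories stand in for `DESIRED_PYTHON_BIN_DIR` and `/usr/local/bin`:

```python
import os
import tempfile
from typing import Iterable, List


def link_tools(src_bin_dir: str, dest_dir: str, tools: Iterable[str]) -> List[str]:
    """Create dest_dir/<tool> -> src_bin_dir/<tool> symlinks, replacing old ones."""
    created = []
    for tool in tools:
        target = os.path.join(src_bin_dir, tool)
        link = os.path.join(dest_dir, tool)
        # Like `ln -sf`, replace an existing file or link at the destination.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)
        created.append(link)
    return created


if __name__ == "__main__":
    # Temporary directories stand in for the real bin directories.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for path in link_tools(src, dst, ["python", "pip"]):
            print(path, "->", os.readlink(path))
```

As with `ln -sf`, the link is created even if the target does not exist yet, so a missing tool only shows up when it is first invoked.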
230 changes: 230 additions & 0 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -0,0 +1,230 @@
#!/usr/bin/env python3
# encoding: UTF-8

import os
import shutil
from subprocess import check_call, check_output
from typing import List

from pygit2 import Repository


def list_dir(path: str) -> List[str]:
    """
    Helper for getting paths for Python
    """
    return check_output(["ls", "-1", path]).decode().split("\n")


def build_ArmComputeLibrary() -> None:
    """
    Using ArmComputeLibrary for aarch64 PyTorch
    """
    print("Building Arm Compute Library")
    acl_build_flags = [
        "debug=0",
        "neon=1",
        "opencl=0",
        "os=linux",
        "openmp=1",
        "cppthreads=0",
        "arch=armv8a",
        "multi_isa=1",
        "fixed_format_kernels=1",
        "build=native",
    ]
    acl_install_dir = "/acl"
    acl_checkout_dir = "ComputeLibrary"
    os.makedirs(acl_install_dir)
    check_call(
        [
            "git",
            "clone",
            "https://github.com/ARM-software/ComputeLibrary.git",
            "-b",
            "v24.09",
            "--depth",
            "1",
            "--shallow-submodules",
        ]
    )

    check_call(
        ["scons", "Werror=1", "-j8", f"build_dir=/{acl_install_dir}/build"]
        + acl_build_flags,
        cwd=acl_checkout_dir,
    )
    for d in ["arm_compute", "include", "utils", "support", "src"]:
        shutil.copytree(f"{acl_checkout_dir}/{d}", f"{acl_install_dir}/{d}")


def update_wheel(wheel_path) -> None:
    """
    Update the cuda wheel libraries
    """
    folder = os.path.dirname(wheel_path)
    wheelname = os.path.basename(wheel_path)
    os.mkdir(f"{folder}/tmp")
    os.system(f"unzip {wheel_path} -d {folder}/tmp")
    libs_to_copy = [
        "/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
        "/usr/local/cuda/lib64/libcudnn.so.9",
        "/usr/local/cuda/lib64/libcublas.so.12",
        "/usr/local/cuda/lib64/libcublasLt.so.12",
        "/usr/local/cuda/lib64/libcudart.so.12",
        "/usr/local/cuda/lib64/libcufft.so.11",
        "/usr/local/cuda/lib64/libcusparse.so.12",
        "/usr/local/cuda/lib64/libcusparseLt.so.0",
        "/usr/local/cuda/lib64/libcusolver.so.11",
        "/usr/local/cuda/lib64/libcurand.so.10",
        "/usr/local/cuda/lib64/libnvToolsExt.so.1",
        "/usr/local/cuda/lib64/libnvJitLink.so.12",
        "/usr/local/cuda/lib64/libnvrtc.so.12",
        "/usr/local/cuda/lib64/libnvrtc-builtins.so.12.4",
        "/usr/local/cuda/lib64/libcudnn_adv.so.9",
        "/usr/local/cuda/lib64/libcudnn_cnn.so.9",
        "/usr/local/cuda/lib64/libcudnn_graph.so.9",
        "/usr/local/cuda/lib64/libcudnn_ops.so.9",
        "/usr/local/cuda/lib64/libcudnn_engines_runtime_compiled.so.9",
        "/usr/local/cuda/lib64/libcudnn_engines_precompiled.so.9",
        "/usr/local/cuda/lib64/libcudnn_heuristic.so.9",
        "/lib64/libgomp.so.1",
        "/usr/lib64/libgfortran.so.5",
        "/acl/build/libarm_compute.so",
        "/acl/build/libarm_compute_graph.so",
    ]
    if enable_cuda:
        libs_to_copy += [
            "/usr/local/lib/libnvpl_lapack_lp64_gomp.so.0",
            "/usr/local/lib/libnvpl_blas_lp64_gomp.so.0",
            "/usr/local/lib/libnvpl_lapack_core.so.0",
            "/usr/local/lib/libnvpl_blas_core.so.0",
        ]
    else:
        libs_to_copy += [
            "/opt/OpenBLAS/lib/libopenblas.so.0",
        ]
    # Copy libraries into the unzipped wheel's torch/lib directory
    for lib_path in libs_to_copy:
        lib_name = os.path.basename(lib_path)
        shutil.copy2(lib_path, f"{folder}/tmp/torch/lib/{lib_name}")
        os.system(
            f"cd {folder}/tmp/torch/lib/; "
            f"patchelf --set-rpath '$ORIGIN' --force-rpath {folder}/tmp/torch/lib/{lib_name}"
        )
    os.mkdir(f"{folder}/cuda_wheel")
    os.system(f"cd {folder}/tmp/; zip -r {folder}/cuda_wheel/{wheelname} *")
    shutil.move(
        f"{folder}/cuda_wheel/{wheelname}",
        f"{folder}/{wheelname}",
        copy_function=shutil.copy2,
    )
    os.system(f"rm -rf {folder}/tmp/ {folder}/cuda_wheel/")


def complete_wheel(folder: str) -> str:
    """
    Complete wheel build and put in artifact location
    """
    wheel_name = list_dir(f"/{folder}/dist")[0]

    if "pytorch" in folder and not enable_cuda:
        print("Repairing Wheel with AuditWheel")
        check_call(["auditwheel", "repair", f"dist/{wheel_name}"], cwd=folder)
        repaired_wheel_name = list_dir(f"/{folder}/wheelhouse")[0]

        print(f"Moving {repaired_wheel_name} wheel to /{folder}/dist")
        os.rename(
            f"/{folder}/wheelhouse/{repaired_wheel_name}",
            f"/{folder}/dist/{repaired_wheel_name}",
        )
    else:
        repaired_wheel_name = wheel_name

    print(f"Copying {repaired_wheel_name} to artifacts")
    shutil.copy2(
        f"/{folder}/dist/{repaired_wheel_name}", f"/artifacts/{repaired_wheel_name}"
    )

    return repaired_wheel_name


def parse_arguments():
    """
    Parse inline arguments
    """
    from argparse import ArgumentParser

    parser = ArgumentParser("AARCH64 wheels python CD")
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--build-only", action="store_true")
    parser.add_argument("--test-only", type=str)
    parser.add_argument("--enable-mkldnn", action="store_true")
    parser.add_argument("--enable-cuda", action="store_true")
    return parser.parse_args()


if __name__ == "__main__":
    """
    Entry Point
    """
    args = parse_arguments()
    enable_mkldnn = args.enable_mkldnn
    enable_cuda = args.enable_cuda
    repo = Repository("/pytorch")
    branch = repo.head.name
    if branch == "HEAD":
        branch = "master"

    print("Building PyTorch wheel")
    build_vars = "MAX_JOBS=5 CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
    os.system("cd /pytorch; python setup.py clean")

    override_package_version = os.getenv("OVERRIDE_PACKAGE_VERSION")
    if override_package_version is not None:
        version = override_package_version
        build_vars += (
            f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={version} PYTORCH_BUILD_NUMBER=1 "
        )
    elif branch in ["nightly", "master"]:
        build_date = (
            check_output(["git", "log", "--pretty=format:%cs", "-1"], cwd="/pytorch")
            .decode()
            .replace("-", "")
        )
        version = (
            check_output(["cat", "version.txt"], cwd="/pytorch").decode().strip()[:-2]
        )
        if enable_cuda:
            desired_cuda = os.getenv("DESIRED_CUDA")
            build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={version}.dev{build_date}+{desired_cuda} PYTORCH_BUILD_NUMBER=1 "
        else:
            build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={version}.dev{build_date} PYTORCH_BUILD_NUMBER=1 "
    elif branch.startswith(("v1.", "v2.")):
        build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1:branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "

    if enable_mkldnn:
        build_ArmComputeLibrary()
        print("build pytorch with mkldnn+acl backend")
        build_vars += (
            "USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
            "ACL_ROOT_DIR=/acl "
            "LD_LIBRARY_PATH=/pytorch/build/lib:/acl/build:$LD_LIBRARY_PATH "
            "ACL_INCLUDE_DIR=/acl/build "
            "ACL_LIBRARY=/acl/build "
        )
        if enable_cuda:
            build_vars += "BLAS=NVPL "
        else:
            build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/OpenBLAS "
    else:
        print("build pytorch without mkldnn backend")

    os.system(f"cd /pytorch; {build_vars} python3 setup.py bdist_wheel")
    if enable_cuda:
        print("Updating Cuda Dependency")
        filename = os.listdir("/pytorch/dist/")
        wheel_path = f"/pytorch/dist/{filename[0]}"
        update_wheel(wheel_path)
    pytorch_wheel_name = complete_wheel("/pytorch/")
    print(f"Build Complete. Created {pytorch_wheel_name}..")
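The entry point builds `PYTORCH_BUILD_VERSION` from string slices, which is easy to misread. The sketch below isolates that logic so it can be checked without a checkout. Both function names are hypothetical. The nightly helper assumes `version.txt` ends in a two-character suffix such as `a0`, and it keys the `+<cuda>` suffix on whether a CUDA string is passed rather than on `enable_cuda`. The release helper is a suggested variant, not the script's behavior: the original slice `branch[1:branch.find('-')]` drops the last character when the name contains no hyphen, because `str.find` returns -1.

```python
from typing import Optional


def nightly_build_version(
    version_txt: str, commit_date: str, desired_cuda: Optional[str] = None
) -> str:
    """Compose a nightly version string the way the script appears to.

    version_txt: contents of version.txt, assumed to end in a two-character
        pre-release suffix (that is what the `[:-2]` slice removes).
    commit_date: output of `git log --pretty=format:%cs -1`, i.e. YYYY-MM-DD.
    """
    version = version_txt.strip()[:-2]
    build_date = commit_date.replace("-", "")
    if desired_cuda:
        return f"{version}.dev{build_date}+{desired_cuda}"
    return f"{version}.dev{build_date}"


def release_version_from_branch(branch: str) -> str:
    """Hypothetical, hyphen-safe variant of `branch[1:branch.find('-')]`."""
    end = branch.find("-")
    # Keep the whole name (minus the leading "v") when there is no hyphen.
    return branch[1:] if end == -1 else branch[1:end]


if __name__ == "__main__":
    # Sample inputs are made up for the demonstration.
    print(nightly_build_version("2.6.0a0\n", "2024-11-14"))
    print(nightly_build_version("2.6.0a0\n", "2024-11-14", "cu124"))
    print(release_version_from_branch("v2.5.1-rc1"))
    print(release_version_from_branch("v2.5.1"))
```

Passing the inputs as arguments, instead of reading `enable_cuda` and environment variables inside the function, keeps the version logic testable in isolation.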