All notable changes to this project will be documented in this file. The format is based on Keep a Changelog.
- Added PyTorch 2.4 support (#338)
- Added PyTorch 2.3 support (#322)
- Added Windows support (#315)
- Added macOS Apple Silicon support (#310)
- Added PyTorch 2.2 support (#294)
- Added `softmax_csr` implementation (#264, #282)
- Added support for edge-level sampling (#280)
- Added support for `bfloat16` data type in `segment_matmul` and `grouped_matmul` (CPU only) (#272)
- Dropped the MKL code path when sampling neighbors with `replace=False` since it does not correctly prevent duplicates (#275)
- Added `--biased` parameter to run benchmarks for biased sampling (#267)
- Improved speed of biased sampling (#270)
- Fixed `grouped_matmul` when tensors are not contiguous (#290)
- Added PyTorch 2.1 support (#256)
- Added low-level support for distributed neighborhood sampling (#246, #252, #253, #254)
- Added support for homogeneous and heterogeneous biased neighborhood sampling (#247, #251)
- Added dispatch for XPU device in `index_sort` (#243)
- Added `metis` partitioning (#229)
- Enable `hetero_neighbor_sample` to work in parallel (#211)
- Fixed vector-based mapping issue in `Mapping` (#244)
- Fixed performance issues reported by Coverity Tool (#240)
- Updated `cutlass` version for speed boosts in `segment_matmul` and `grouped_matmul` (#235)
- Drop nested tensor wrapper for `grouped_matmul` implementation (#226)
- Added `generate_range_of_ints` function (it uses MKL library in order to generate ints) to RandintEngine class (#222)
- Fixed TorchScript support in `grouped_matmul` (#220)
- Added PyTorch 2.0 support (#214)
- `neighbor_sample` routines now also return information about the number of sampled nodes/edges per layer (#197)
- Added `index_sort` implementation (#181, #192)
- Added `triton>=2.0` support (#171)
- Added `bias` term to `grouped_matmul` and `segment_matmul` (#161)
- Added `sampled_op` implementation (#156, #159, #160)
- Improved `[segment|grouped]_matmul` GPU implementation by reducing launch overheads (#213)
- Sample the nodes with the same timestamp as seed nodes (#187)
- Added `write-csv` (saves benchmark results as csv file) and `libraries` (determines which libraries will be used in benchmark) parameters (#167)
- Enable benchmarking of neighbor sampler on temporal graphs (#165)
- Improved `[segment|grouped]_matmul` CPU implementation via `at::matmul_out` and MKL BLAS `gemm_batch` (#146, #172)
- Added PyTorch 1.13 support (#145)
- Added native PyTorch support for `grouped_matmul` (#137)
- Added `fused_scatter_reduce` operation for multiple reductions (#141, #142)
- Added `triton` dependency (#133, #134)
- Enable `pytest` testing (#132)
- Added C++-based autograd and TorchScript support for `segment_matmul` (#120, #122)
- Allow overriding `time` for seed nodes via `seed_time` in `neighbor_sample` (#118)
- Added `[segment|grouped]_matmul` CPU implementation (#111)
- Added `temporal_strategy` option to `neighbor_sample` (#114)
- Added benchmarking tool (Google Benchmark) along with `pyg::sampler::Mapper` benchmark example (#101)
- Added CSC mode to `pyg::sampler::neighbor_sample` and `pyg::sampler::hetero_neighbor_sample` (#95, #96)
- Speed up `pyg::sampler::neighbor_sample` via `IndexTracker` implementation (#84)
- Added `pyg::sampler::hetero_neighbor_sample` implementation (#90, #92, #94, #97, #98, #99, #102, #110)
- Added `pyg::utils::to_vector` implementation (#88)
- Added support for PyTorch 1.12 (#57, #58)
- Added `grouped_matmul` and `segment_matmul` CUDA implementations via `cutlass` (#51, #56, #61, #64, #69, #73, #123)
- Added `pyg::sampler::neighbor_sample` implementation (#54, #76, #77, #78, #80, #81, #85, #86, #87, #89)
- Added `pyg::sampler::Mapper` utility for mapping global to local node indices (#45, #83)
- Added benchmark script (#45, #79, #82, #91, #93, #106)
- Added download script for benchmark data (#44)
- Added `biased sampling` utils (#38)
- Added `CHANGELOG.md` (#39)
- Added `pyg.subgraph()` (#31)
- Added nightly builds (#28, #36)
- Added `rand` CPU engine (#26, #29, #32, #33)
- Added `pyg.random_walk()` (#21, #24, #25)
- Added documentation via `readthedocs` (#19, #20)
- Added code coverage report (#15, #16, #17, #18)
- Added `CMakeExtension` support (#14)
- Added test suite via `gtest` (#13)
- Added `clang-format` linting via `pre-commit` (#12)
- Added `CMake` support (#5)
- Added `pyg.cuda_version()` (#4)
- Allow different types for graph and timestamp data (#143)
- Fixed dispatcher in `hetero_neighbor_sample` (#125)
- Require sorted neighborhoods according to time in temporal sampling (#108)
- Only sample neighbors with a strictly earlier timestamp than the seed node (#104)
- Prevent absolute paths in wheel (#75)
- Improved installation instructions (#68)
- Replaced std::unordered_map with a faster phmap::flat_hash_map (#65)
- Fixed versions of `checkout` and `setup-python` in CI (#52)
- Make use of the `pyg_sphinx_theme` documentation template (#47)
- Auto-compute number of threads and blocks in CUDA kernels (#41)
- Optional return types in `pyg.subgraph()` (#40)
- Absolute headers (#30)
- Use `at::equal` rather than `at::all` in tests (#37)
- Build `*.so` extension on Mac instead of `*.dylib` (#107)