
Update hetero dist relabel #284

Merged · 1 commit · Dec 4, 2023
Conversation

@kgajdamo (Contributor) commented Dec 1, 2023

Since the implementation of distributed training for hetero graphs has changed, the distributed hetero relabel neighborhood function needs to be updated as well.

Related pytorch_geometric PR: #8503

Changes made:
- The `num_sampled_neighbors_per_node` dictionary now stores the number of sampled neighbors for each layer separately:

  `const c10::Dict<rel_type, std::vector<int64_t>>& num_sampled_neighbors_per_node_dict` -> `const c10::Dict<rel_type, std::vector<std::vector<int64_t>>>& num_sampled_neighbors_per_node_dict`
- The method of mapping nodes has also changed: it is now done layer by layer.
- After each layer, the range of src nodes for each edge type in the next layer is computed; the offsets of edge types sharing the same src node type must be equal.
- The src node range of each edge type in a given layer is defined by the `srcs_slice_dict` dictionary. Local src nodes (`sampled_rows`) are created on its basis, and the start value for the next layer is the end value of the previous layer.
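As a toy illustration of the new per-layer layout, the sketch below uses plain Python lists to stand in for the C++ `c10::Dict`/`std::vector` containers; the relation name and counts are made up:

```python
# Illustrative only: Python lists stand in for the C++
# c10::Dict<rel_type, std::vector<...>> containers in the real signature.
rel = "author__writes__paper"

# Old layout: one flat vector of per-node neighbor counts across all layers.
old_layout = {rel: [2, 1, 3, 1]}

# New layout: one inner vector per layer, so relabeling can proceed
# layer by layer without re-deriving the layer boundaries.
new_layout = {rel: [[2, 1], [3, 1]]}

# Flattening the new layout recovers the old one.
flat = [c for layer in new_layout[rel] for c in layer]
assert flat == old_layout[rel]
```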


codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (a5fcc87) 86.47% vs. head (c65a353) 86.47%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #284   +/-   ##
=======================================
  Coverage   86.47%   86.47%           
=======================================
  Files          35       35           
  Lines        1213     1213           
=======================================
  Hits         1049     1049           
  Misses        164      164           


@rusty1s rusty1s merged commit d2370c2 into pyg-team:master Dec 4, 2023
15 of 16 checks passed
rusty1s added a commit to pyg-team/pytorch_geometric that referenced this pull request Dec 14, 2023
The purpose of this PR is to improve the distributed hetero sampling
algorithm.
**IMPORTANT INFO**: This PR is complementary to
[#284](pyg-team/pyg-lib#284) from pyg-lib. The
pyg-lib PR needs to be merged for this one to work properly.


**Description:** (sorry if too long)
Distributed hetero neighbor sampling is analogous to homo
sampling, but more complicated due to the presence of different types of
nodes and edges.
Sampling in distributed training imitates the `hetero_neighbor_sample()`
function in pyg-lib, so the mechanism of action and the
nomenclature of the variables are similar.
Because in distributed training the results must be synchronized between
machines after each layer is sampled, the loop iterating over the layers
is implemented in Python.

The two main loops iterate sequentially: over layers and edge types.
Inside the loop, the `sample_one_hop()` function is called, which
performs sampling for one layer.
The input to the `sample_one_hop()` function is data of a single type,
so its execution is almost identical to the homo case.
Depending on whether the input nodes are located on the local partition
or a remote one, `sample_one_hop()` either performs the sampling itself
or sends an RPC request to the remote machine to do so. The
`dist_neighbor_sample()` -> `neighbor_sample()` path is used for
sampling. Nodes are sampled with duplicates so that they can later be
used to construct local-to-global node mappings.
When all machines have finished sampling, their outputs are merged and
synchronized in the same way as for homo.
The results then return to the `node_sample()` function, where they are
written to the output dictionaries and the src nodes for the next layer
are computed.
After going through all the layers, the global node indices are finally
mapped to local ones in the `hetero_dist_relabel()` function.
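A heavily simplified, self-contained sketch of the loop structure described above; every function, name, and data shape here is a stand-in for illustration, not the actual `torch_geometric.distributed` API, and synchronization/RPC are omitted entirely:

```python
# Hypothetical skeleton of the per-layer sampling loop; not the real API.

def sample_one_hop(edge_index, srcs, num_neighbors):
    """Toy one-hop sampler: take up to num_neighbors targets per src node
    (duplicates are kept, as in the real algorithm)."""
    out = []
    for s in srcs:
        nbrs = [dst for (src, dst) in edge_index if src == s]
        out.extend(nbrs[:num_neighbors])
    return out

def node_sample(edge_index_dict, seed_dict, num_neighbors_per_layer):
    # src nodes per node type for the current layer
    srcs_dict = dict(seed_dict)
    sampled_dict = {nt: list(seeds) for nt, seeds in seed_dict.items()}
    for num_neighbors in num_neighbors_per_layer:       # loop over layers
        next_srcs = {nt: [] for nt in srcs_dict}
        for et, edge_index in edge_index_dict.items():  # loop over edge types
            src_t, _, dst_t = et
            sampled = sample_one_hop(edge_index,
                                     srcs_dict.get(src_t, []),
                                     num_neighbors)
            sampled_dict.setdefault(dst_t, []).extend(sampled)
            next_srcs.setdefault(dst_t, []).extend(sampled)
        srcs_dict = next_srcs  # next layer starts from this layer's output
    return sampled_dict
```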

Information about some of the variables used in the `node_sample()`
function:
- `node_dict`: a class storing information about nodes. It has three fields, `src`, `with_dupl`, and `out`, which are described in more detail in the `distributed/utils.py` file.
- `batch_dict`: a class used when sampling with the disjoint option. It stores information about the assignment of nodes to subgraphs. Like `node_dict`, it has the three fields `src`, `with_dupl`, and `out`.
- `sampled_nbrs_per_node_dict`: a dictionary storing the number of neighbors sampled by each src node. To facilitate subsequent operations, the counts for each edge type are additionally divided into layers.
- `num_sampled_nodes_dict`, `num_sampled_edges_dict`: needed for HGAM to work.
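To make the final mapping step concrete, here is a toy global-to-local relabeling in the spirit of `hetero_dist_relabel()`; the function name, signature, and data shapes are invented for illustration, not the actual implementation. Per node type, each global index gets a local index in order of first occurrence (this is why sampling with duplicates suffices to build the mapping), and edge endpoints are then rewritten per edge type:

```python
# Toy illustration of global-to-local relabeling per node type;
# names and shapes are hypothetical, not the hetero_dist_relabel() API.
def relabel(global_nodes_dict, row_dict, col_dict):
    # Build a global -> local map per node type, keeping first occurrence.
    maps = {}
    for nt, nodes in global_nodes_dict.items():
        m = {}
        for g in nodes:
            if g not in m:
                m[g] = len(m)
        maps[nt] = m
    # Rewrite edge endpoints into local indices per edge type.
    local_rows, local_cols = {}, {}
    for et in row_dict:
        src_t, _, dst_t = et
        local_rows[et] = [maps[src_t][g] for g in row_dict[et]]
        local_cols[et] = [maps[dst_t][g] for g in col_dict[et]]
    return local_rows, local_cols
```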

---------

Co-authored-by: rusty1s <[email protected]>