Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding task-based simulator for PCGs #1365

Open
wants to merge 59 commits into
base: master
Choose a base branch
from

Conversation

Marsella8
Copy link
Contributor

@Marsella8 Marsella8 commented Apr 12, 2024

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:


This change is Reviewable

@Marsella8 Marsella8 requested review from lockshaw and wmdi April 12, 2024 20:18
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Stylistically it matches the existing code, which is much appreciated 🙂

Obviously tests need to get finished off and there are probably some minor edge cases to be handled, but as an initial draft this looks really good

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/src/machine_mapping.cc line 123 at r1 (raw file):

  std::unordered_set<UpwardOpenMultiDiEdge> incoming_edges =
      get_incoming_edges(g, node);
  std::vector<ParallelTensorShape> inputs = transform(

If the ordering here doesn't matter why is this a vector?


lib/compiler/src/machine_mapping.cc line 124 at r1 (raw file):

      get_incoming_edges(g, node);
  std::vector<ParallelTensorShape> inputs = transform(
      as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {

Suggestion:

  as_vector(get_incoming_edges(g, node)), [&](UpwardOpenMultiDiEdge const &input_edge) {

lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):

  // Filling the frontier
  for (auto &[edge, _] : frontier_machine_views) {

Suggestion:

  for (auto const &[edge, _] : frontier_machine_views) {

lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):

  // Filling the frontier
  for (auto &[edge, _] : frontier_machine_views) {

Maybe cleaner to just get the set of nodes with no incoming closed edges?


lib/compiler/src/machine_mapping.cc line 176 at r1 (raw file):

      if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) {
            return occupied[d] == false
          })) {

See all_of in containers.h

Code quote:

      if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) {
            return occupied[d] == false
          })) {

lib/compiler/test/src/test_parallel_cost_estimator.cpp line 10 at r1 (raw file):

  TEST_CASE("parallel_estimate_cost") {
    // Test graph structure
    //     /-- 1 --\

Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against


lib/compiler/test/src/test_parallel_cost_estimator.cpp line 11 at r1 (raw file):

    // Test graph structure
    //     /-- 1 --\
        //  0 -         - 3

Suggestion:

  //  0 -         - 3

lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):

std::unordered_map<Node, std::unordered_set<Node>>
    get_successors(DiGraphView const &g,

Alternatively, implement this as get_predecessors on the flipped graph

Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 4 of 5 files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)

Copy link
Contributor Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 7 files reviewed, 8 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/machine_mapping.cc line 124 at r1 (raw file):

      get_incoming_edges(g, node);
  std::vector<ParallelTensorShape> inputs = transform(
      as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {

Done.


lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):

  // Filling the frontier
  for (auto &[edge, _] : frontier_machine_views) {

Done.


lib/compiler/src/machine_mapping.cc line 161 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Maybe cleaner to just get the set of nodes with no incoming closed edges?

Done.


lib/compiler/src/machine_mapping.cc line 176 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

See all_of in containers.h

Done.


lib/compiler/test/src/test_parallel_cost_estimator.cpp line 10 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against

Done.


lib/compiler/test/src/test_parallel_cost_estimator.cpp line 11 at r1 (raw file):

    // Test graph structure
    //     /-- 1 --\
        //  0 -         - 3

Done.

Copy link
Collaborator

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 4 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)

Copy link
Contributor Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 13 files reviewed, 8 unresolved discussions (waiting on @lockshaw, @Marsella8, and @wmdi)


lib/compiler/src/machine_mapping.cc line 123 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

If the ordering here doesn't matter why is this a vector?

Done.


lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Alternatively, implement this as get_predecessors on the flipped graph

Done.

@Marsella8 Marsella8 marked this pull request as ready for review July 1, 2024 23:03
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r4, 5 of 9 files at r5, all commit messages.
Reviewable status: 9 of 13 files reviewed, 6 unresolved discussions (waiting on @Marsella8 and @wmdi)


lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):

using SubParallelComputationGraphView =
    OutputLabelledOpenMultiDiGraphView<Operator, ParallelTensor>;

Prefer dtgen-wrapped type (see current SubParallelComputationGraph, which honestly should probably be a View anyway so feel free to just change it to be a View)


lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):

}

struct TimedNode { // Node and associated finishing time

Prefer dtgen over visitable


lib/compiler/src/cost_estimator.cc line 68 at r5 (raw file):

    std::unordered_set<Node> copy(frontier);
    for (Node const &node : copy) {
      auto views = device_mapping.machine_views.at(node);

Prefer actual type over auto

Code quote:

auto

lib/compiler/src/cost_estimator.cc line 73 at r5 (raw file):

              devices.begin(), devices.end(), [&occupied](device_id_t d) {
                return occupied[d] == false;
              })) {

Prefer functions from containers.h

Code quote:

      if (std::all_of(
              devices.begin(), devices.end(), [&occupied](device_id_t d) {
                return occupied[d] == false;
              })) {

lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):

              })) {
        float cost = node_estimate_cost(node, g, estimator, device_mapping);
        processing.push({node, current_time + cost});

Pulling some of this out into separate lambdas might improve readability (say, a lambda named start_node_processing)


lib/pcg/src/device_id.cc line 19 at r5 (raw file):

// Most likely not the best way to do it.
device_id_t operator+(device_id_t device, size_t increment) {
  if (std::holds_alternative<gpu_id_t>(device)) {

Refactor to use dtgen (but otherwise this isn't bad, just replace the static_casts with actual field accesses)

Is this an operation we actually want to support for device_id_t, or should this just be done in the context of other functions?


lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):

Previously, Marsella8 wrote…

Done.

Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)

Pietro Max Marsella added 2 commits July 15, 2024 00:15
Copy link
Contributor Author

@Marsella8 Marsella8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1, 2 of 8 files at r4, 9 of 9 files at r5.
Reviewable status: 11 of 14 files reviewed, 6 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer dtgen-wrapped type (see current SubParallelComputationGraph, which honestly should probably be a View anyway so feel free to just change it to be a View)

Tried to sync my branch so as to dtgen and wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. the machine-related one)?


lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer dtgen over visitable

See comment above

Also, we will always use dtgen going forward for all structs, including the ones that are only for private use / limited to a single module right?


lib/compiler/src/cost_estimator.cc line 68 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer actual type over auto

Done.


lib/compiler/src/cost_estimator.cc line 73 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer functions from containers.h

Done.


lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Pulling some of this out into separate lambdas might improve readability (say, a lambda named start_node_processing)

Done, let me know if this is ok.

As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?


lib/pcg/src/device_id.cc line 19 at r5 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Refactor to use dtgen (but otherwise this isn't bad, just replace the static_casts with actual field accesses)

Is this an operation we actually want to support for device_id_t, or should this just be done in the context of other functions?

Due to problems above with dtgen, will clean up everything into the machine-related branch.

I don't think we'll ever intend for the user to directly increment device_id_t types, but it is useful for the functions that are part of machine_views.cc and associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)


lib/utils/src/graph/algorithms.cc line 380 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)

Now it's changed.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @Marsella8)


lib/compiler/include/compiler/cost_estimator.h line 12 at r5 (raw file):

Previously, Marsella8 wrote…

Tried to sync my branch so as to dtgen and wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. the machine-related one)?

You're going to need to merge with repo-refactor anyway, so if you can't get dtgen to work I'm guessing the branch won't be mergeable. That said, if you want I'm already making the change to this specific struct in a different PR, so you can skip this one (though it would still be good to update the other structs such as TimedNode in this PR)


lib/compiler/src/cost_estimator.cc line 29 at r5 (raw file):

Previously, Marsella8 wrote…

See comment above

Also, we will always use dtgen going forward for all structs, including the ones that are only for private use / limited to a single module right?

I'd say use it when you can--it only captures a limited subset of what C++ structs can do (e.g., no methods, etc.), but it also removes a lot of difficulty of making your structs support expected behaviors (printability, equality, hashing, etc.) yourself, so if you struct fits in its semantic model I'd say use it


lib/compiler/src/cost_estimator.cc line 75 at r5 (raw file):

Previously, Marsella8 wrote…

Done, let me know if this is ok.

As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?

We use lambdas for functions that aren't reusable across the codebase. They're especially useful when you want to manipulate local variables, like occupied here, as the lambdas can manipulate them directly. When in doubt I'd say start with lambdas


lib/compiler/src/cost_estimator.cc line 41 at r6 (raw file):

};

bool predecessors_have_been_processed(

This should be a lambda


lib/compiler/src/cost_estimator.cc line 44 at r6 (raw file):

    std::unordered_set<Node> const &predecessors,
    std::unordered_set<TimedNode> processed) {
  std::unordered_set<Node> simple_processed =

processed_nodes?

Code quote:

 simple_processed

lib/compiler/src/cost_estimator.cc line 47 at r6 (raw file):

      transform(processed, [](TimedNode const &tn) { return tn.node; });

  return all_of(predecessors, [&simple_processed](Node p) {

Suggestion:

  return all_of(predecessors, [&](Node p) {

lib/compiler/src/cost_estimator.cc line 48 at r6 (raw file):

  return all_of(predecessors, [&simple_processed](Node p) {
    return simple_processed.find(p) != simple_processed.end();

contains from utils/containers.h


lib/compiler/src/cost_estimator.cc line 55 at r6 (raw file):

                                     MachineMapping const &device_mapping) {
  return device_mapping.machine_views.at(node).device_ids();
}

Should be moved to the corresponding header for MachineMapping

Code quote:

std::vector<device_id_t> get_devices(Node const &node,
                                     MachineMapping const &device_mapping) {
  return device_mapping.machine_views.at(node).device_ids();
}

lib/compiler/src/cost_estimator.cc line 106 at r6 (raw file):

      std::vector<device_id_t> devices = get_devices(node, device_mapping);
      if (all_of(devices,
                 [&occupied](device_id_t d) { return occupied[d] == false; })) {

Don't worry about explicitly managing lambda references unless there's a good reason (e.g., they're going to live outside of local scope)

Suggestion:

                 [&](device_id_t d) { return occupied[d] == false; })) {

lib/compiler/src/cost_estimator.cc line 117 at r6 (raw file):

      finish_node_processing(finished);

      // Adding candidates to the frontier

FYI these comments aren't really adding a huge amount here and can probably be removed. With the lambdas added the code is readable enough that I don't think they're contributing much over the code itself


lib/compiler/src/cost_estimator.cc line 121 at r6 (raw file):

        std::unordered_set<Node> predecessors = get_predecessors(g, successor);

        if (predecessors_have_been_processed(predecessors, processed)) {

Why not just have predecessors_have_been_processed call get_predecessors directly?

Code quote:

predecessors_have_been_processed(predecessors, processed)

lib/compiler/test/CMakeLists.txt line 13 at r6 (raw file):

    doctest
    utils-test-common
    pcg

Shouldn't be necessary, pcg is already pulled in by compiler (but this isn't a huge deal)


lib/pcg/src/device_id.cc line 19 at r5 (raw file):

Previously, Marsella8 wrote…

Due to problems above with dtgen, will clean up everything into the machine-related branch.

I don't think we'll ever intend for the user to directly increment device_id_t types, but it is useful for the functions that are part of machine_views.cc and associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)

In that case I'd just do the unwrapping manually in machine_views.cc, etc. Only expose an add operator if it's actually a valid operation to add device ids in general, which I'm not sure is the case (though I could be convinced of)


lib/pcg/src/device_id.cc line 17 at r6 (raw file):

}

// Most likely not the best way to do it.

?


lib/pcg/src/machine_view.cc line 30 at r6 (raw file):

    StridedRectangleSide side = this->rect.at(ff_dim_t{0});
    for (device_id_t id = this->start; id < this->start + side.get_num_points();
         id = id + 1) {

The stride of the side doesn't appear to be used?


lib/pcg/src/machine_view.cc line 30 at r6 (raw file):

    StridedRectangleSide side = this->rect.at(ff_dim_t{0});
    for (device_id_t id = this->start; id < this->start + side.get_num_points();
         id = id + 1) {

Suggestion:

      id++) {

lib/pcg/src/machine_view.cc line 33 at r6 (raw file):

      ids.push_back(id);
    }
    return ids;

Either move the declaration of ids in a scope or the return out a scope, having them in different scopes is weird and confusing


lib/pcg/src/strided_rectangle.cc line 21 at r6 (raw file):

  StridedRectangleSide side = this->sides.at(dim);
  return side;
}

Suggestion:

StridedRectangleSide StridedRectangle::at(ff_dim_t const &dim) const {
  return this->sides.at(dim);
}

@lockshaw lockshaw changed the title Adding cost_estimator implementation considering parallelization Adding task-based simulator for PCGs Sep 18, 2024
@lockshaw lockshaw changed the base branch from repo-refactor to master December 16, 2024 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants