-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding task-based simulator for PCGs #1365
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Stylistically it matches the existing code, which is much appreciated 🙂
Obviously tests need to get finished off and there are probably some minor edge cases to be handled, but as an initial draft this looks really good
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/src/machine_mapping.cc
line 123 at r1 (raw file):
std::unordered_set<UpwardOpenMultiDiEdge> incoming_edges = get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform(
If the ordering here doesn't matter why is this a vector
?
lib/compiler/src/machine_mapping.cc
line 124 at r1 (raw file):
get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform( as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {
Suggestion:
as_vector(get_incoming_edges(g, node)), [&](UpwardOpenMultiDiEdge const &input_edge) {
lib/compiler/src/machine_mapping.cc
line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Suggestion:
for (auto const &[edge, _] : frontier_machine_views) {
lib/compiler/src/machine_mapping.cc
line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Maybe cleaner to just get the set of nodes with no incoming closed edges?
lib/compiler/src/machine_mapping.cc
line 176 at r1 (raw file):
if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) { return occupied[d] == false })) {
See all_of
in containers.h
Code quote:
if (std::all_of(devices.begin(), devices.end(), [&occupied](int d) {
return occupied[d] == false
})) {
lib/compiler/test/src/test_parallel_cost_estimator.cpp
line 10 at r1 (raw file):
TEST_CASE("parallel_estimate_cost") { // Test graph structure // /-- 1 --\
Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against
lib/compiler/test/src/test_parallel_cost_estimator.cpp
line 11 at r1 (raw file):
// Test graph structure // /-- 1 --\ // 0 - - 3
Suggestion:
// 0 - - 3
lib/utils/src/graph/algorithms.cc
line 380 at r1 (raw file):
std::unordered_map<Node, std::unordered_set<Node>> get_successors(DiGraphView const &g,
Alternatively, implement this as get_predecessors
on the flipped graph
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 4 of 5 files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)
… and related objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 7 files reviewed, 8 unresolved discussions (waiting on @lockshaw)
lib/compiler/src/machine_mapping.cc
line 124 at r1 (raw file):
get_incoming_edges(g, node); std::vector<ParallelTensorShape> inputs = transform( as_vector(incoming_edges), [&](UpwardOpenMultiDiEdge const &input_edge) {
Done.
lib/compiler/src/machine_mapping.cc
line 161 at r1 (raw file):
// Filling the frontier for (auto &[edge, _] : frontier_machine_views) {
Done.
lib/compiler/src/machine_mapping.cc
line 161 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Maybe cleaner to just get the set of nodes with no incoming closed edges?
Done.
lib/compiler/src/machine_mapping.cc
line 176 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See
all_of
incontainers.h
Done.
lib/compiler/test/src/test_parallel_cost_estimator.cpp
line 10 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add a test case that starts with the simpler case of a straight line? Might be an easier starting point and I think should have a clean analytical answer you can check against
Done.
lib/compiler/test/src/test_parallel_cost_estimator.cpp
line 11 at r1 (raw file):
// Test graph structure // /-- 1 --\ // 0 - - 3
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 4 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @lockshaw and @Marsella8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 13 files reviewed, 8 unresolved discussions (waiting on @lockshaw, @Marsella8, and @wmdi)
lib/compiler/src/machine_mapping.cc
line 123 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
If the ordering here doesn't matter why is this a
vector
?
Done.
lib/utils/src/graph/algorithms.cc
line 380 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Alternatively, implement this as
get_predecessors
on the flipped graph
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 8 files at r4, 5 of 9 files at r5, all commit messages.
Reviewable status: 9 of 13 files reviewed, 6 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/include/compiler/cost_estimator.h
line 12 at r5 (raw file):
using SubParallelComputationGraphView = OutputLabelledOpenMultiDiGraphView<Operator, ParallelTensor>;
Prefer dtgen-wrapped type (see current SubParallelComputationGraph
, which honestly should probably be a View
anyway so feel free to just change it to be a View
)
lib/compiler/src/cost_estimator.cc
line 29 at r5 (raw file):
} struct TimedNode { // Node and associated finishing time
Prefer dtgen
over visitable
lib/compiler/src/cost_estimator.cc
line 68 at r5 (raw file):
std::unordered_set<Node> copy(frontier); for (Node const &node : copy) { auto views = device_mapping.machine_views.at(node);
Prefer actual type over auto
Code quote:
auto
lib/compiler/src/cost_estimator.cc
line 73 at r5 (raw file):
devices.begin(), devices.end(), [&occupied](device_id_t d) { return occupied[d] == false; })) {
Prefer functions from containers.h
Code quote:
if (std::all_of(
devices.begin(), devices.end(), [&occupied](device_id_t d) {
return occupied[d] == false;
})) {
lib/compiler/src/cost_estimator.cc
line 75 at r5 (raw file):
})) { float cost = node_estimate_cost(node, g, estimator, device_mapping); processing.push({node, current_time + cost});
Pulling some of this out into separate lambdas might improve readability (say, a lambda named start_node_processing
)
lib/pcg/src/device_id.cc
line 19 at r5 (raw file):
// Most likely not the best way to do it. device_id_t operator+(device_id_t device, size_t increment) { if (std::holds_alternative<gpu_id_t>(device)) {
Refactor to use dtgen
(but otherwise this isn't bad, just replace the static_cast
s with actual field accesses)
Is this an operation we actually want to support for device_id_t
, or should this just be done in the context of other functions?
lib/utils/src/graph/algorithms.cc
line 380 at r1 (raw file):
Previously, Marsella8 wrote…
Done.
Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 5 files at r1, 2 of 8 files at r4, 9 of 9 files at r5.
Reviewable status: 11 of 14 files reviewed, 6 unresolved discussions (waiting on @lockshaw)
lib/compiler/include/compiler/cost_estimator.h
line 12 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer dtgen-wrapped type (see current
SubParallelComputationGraph
, which honestly should probably be aView
anyway so feel free to just change it to be aView
)
Tried to sync my branch so as to dtgen
and wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. the machine-related
one)?
lib/compiler/src/cost_estimator.cc
line 29 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
dtgen
overvisitable
See comment above
Also, we will always use dtgen
going forward for all structs, including the ones that are only for private use / limited to a single module right?
lib/compiler/src/cost_estimator.cc
line 68 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer actual type over
auto
Done.
lib/compiler/src/cost_estimator.cc
line 73 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer functions from
containers.h
Done.
lib/compiler/src/cost_estimator.cc
line 75 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Pulling some of this out into separate lambdas might improve readability (say, a lambda named
start_node_processing
)
Done, let me know if this is ok.
As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?
lib/pcg/src/device_id.cc
line 19 at r5 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Refactor to use
dtgen
(but otherwise this isn't bad, just replace thestatic_cast
s with actual field accesses)Is this an operation we actually want to support for
device_id_t
, or should this just be done in the context of other functions?
Due to problems above with dtgen
, will clean up everything into the machine-related
branch.
I don't think we'll ever intend for the user to directly increment device_id_t
types, but it is useful for the functions that are part of machine_views.cc
and associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)
lib/utils/src/graph/algorithms.cc
line 380 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Not seeing it done? (though it doesn't necessarily need to be, just saying I don't see a change)
Now it's changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 9 files at r5, 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @Marsella8)
lib/compiler/include/compiler/cost_estimator.h
line 12 at r5 (raw file):
Previously, Marsella8 wrote…
Tried to sync my branch so as to
dtgen
and wasn't able to get it working, best if i get this PR merged and then finish this off in a separate branch (e.g. themachine-related
one)?
You're going to need to merge with repo-refactor
anyway, so if you can't get dtgen
to work I'm guessing the branch won't be mergeable. That said, if you want I'm already making the change to this specific struct in a different PR, so you can skip this one (though it would still be good to update the other structs such as TimedNode
in this PR)
lib/compiler/src/cost_estimator.cc
line 29 at r5 (raw file):
Previously, Marsella8 wrote…
See comment above
Also, we will always use
dtgen
going forward for all structs, including the ones that are only for private use / limited to a single module right?
I'd say use it when you can--it only captures a limited subset of what C++ structs can do (e.g., no methods, etc.), but it also removes a lot of difficulty of making your structs support expected behaviors (printability, equality, hashing, etc.) yourself, so if you struct fits in its semantic model I'd say use it
lib/compiler/src/cost_estimator.cc
line 75 at r5 (raw file):
Previously, Marsella8 wrote…
Done, let me know if this is ok.
As a general question, when do we use lambdas inside the function vs just having a separate function do the same thing? Only when the function is only applicable tto the function we are trying to write?
We use lambdas for functions that aren't reusable across the codebase. They're especially useful when you want to manipulate local variables, like occupied
here, as the lambdas can manipulate them directly. When in doubt I'd say start with lambdas
lib/compiler/src/cost_estimator.cc
line 41 at r6 (raw file):
}; bool predecessors_have_been_processed(
This should be a lambda
lib/compiler/src/cost_estimator.cc
line 44 at r6 (raw file):
std::unordered_set<Node> const &predecessors, std::unordered_set<TimedNode> processed) { std::unordered_set<Node> simple_processed =
processed_nodes
?
Code quote:
simple_processed
lib/compiler/src/cost_estimator.cc
line 47 at r6 (raw file):
transform(processed, [](TimedNode const &tn) { return tn.node; }); return all_of(predecessors, [&simple_processed](Node p) {
Suggestion:
return all_of(predecessors, [&](Node p) {
lib/compiler/src/cost_estimator.cc
line 48 at r6 (raw file):
return all_of(predecessors, [&simple_processed](Node p) { return simple_processed.find(p) != simple_processed.end();
contains
from utils/containers.h
lib/compiler/src/cost_estimator.cc
line 55 at r6 (raw file):
MachineMapping const &device_mapping) { return device_mapping.machine_views.at(node).device_ids(); }
Should be moved to the corresponding header for MachineMapping
Code quote:
std::vector<device_id_t> get_devices(Node const &node,
MachineMapping const &device_mapping) {
return device_mapping.machine_views.at(node).device_ids();
}
lib/compiler/src/cost_estimator.cc
line 106 at r6 (raw file):
std::vector<device_id_t> devices = get_devices(node, device_mapping); if (all_of(devices, [&occupied](device_id_t d) { return occupied[d] == false; })) {
Don't worry about explicitly managing lambda references unless there's a good reason (e.g., they're going to live outside of local scope)
Suggestion:
[&](device_id_t d) { return occupied[d] == false; })) {
lib/compiler/src/cost_estimator.cc
line 117 at r6 (raw file):
finish_node_processing(finished); // Adding candidates to the frontier
FYI these comments aren't really adding a huge amount here and can probably be removed. With the lambdas added the code is readable enough that I don't think they're contributing much over the code itself
lib/compiler/src/cost_estimator.cc
line 121 at r6 (raw file):
std::unordered_set<Node> predecessors = get_predecessors(g, successor); if (predecessors_have_been_processed(predecessors, processed)) {
Why not just have predecessors_have_been_processed
call get_predecessors
directly?
Code quote:
predecessors_have_been_processed(predecessors, processed)
lib/compiler/test/CMakeLists.txt
line 13 at r6 (raw file):
doctest utils-test-common pcg
Shouldn't be necessary, pcg
is already pulled in by compiler
(but this isn't a huge deal)
lib/pcg/src/device_id.cc
line 19 at r5 (raw file):
Previously, Marsella8 wrote…
Due to problems above with
dtgen
, will clean up everything into themachine-related
branch.I don't think we'll ever intend for the user to directly increment
device_id_t
types, but it is useful for the functions that are part ofmachine_views.cc
and associated modules (e.g. if we want to iterate over the range of IDs to generate all possible IDs)
In that case I'd just do the unwrapping manually in machine_views.cc
, etc. Only expose an add operator if it's actually a valid operation to add device ids in general, which I'm not sure is the case (though I could be convinced of)
lib/pcg/src/device_id.cc
line 17 at r6 (raw file):
} // Most likely not the best way to do it.
?
lib/pcg/src/machine_view.cc
line 30 at r6 (raw file):
StridedRectangleSide side = this->rect.at(ff_dim_t{0}); for (device_id_t id = this->start; id < this->start + side.get_num_points(); id = id + 1) {
The stride
of the side
doesn't appear to be used?
lib/pcg/src/machine_view.cc
line 30 at r6 (raw file):
StridedRectangleSide side = this->rect.at(ff_dim_t{0}); for (device_id_t id = this->start; id < this->start + side.get_num_points(); id = id + 1) {
Suggestion:
id++) {
lib/pcg/src/machine_view.cc
line 33 at r6 (raw file):
ids.push_back(id); } return ids;
Either move the declaration of ids
in a scope or the return out a scope, having them in different scopes is weird and confusing
lib/pcg/src/strided_rectangle.cc
line 21 at r6 (raw file):
StridedRectangleSide side = this->sides.at(dim); return side; }
Suggestion:
StridedRectangleSide StridedRectangle::at(ff_dim_t const &dim) const {
return this->sides.at(dim);
}
Description of changes:
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is