-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New MachineView
representation
#1458
New MachineView
representation
#1458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 35 of 41 files at r1, all commit messages.
Reviewable status: 35 of 41 files reviewed, 14 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/pcg/include/pcg/device_coordinates.struct.toml
line 1 at r1 (raw file):
# DeviceCoordinates is exclusive to machine_view.cc, must not be used outside of it.
I don't think this requirement is necessary anymore with the new design?
lib/pcg/include/pcg/strided_rectangle.struct.toml
line 17 at r1 (raw file):
"utils/hash/unordered_multiset.h", "utils/fmt/unordered_multiset.h", ]
utils/hash
and utils/fmt
are only needed for the implementation, not the declaration
Suggestion:
includes = [
"pcg/strided_rectangle_side.dtg.h",
"<unordered_set>",
]
src_includes = [
"utils/hash/unordered_multiset.h",
"utils/fmt/unordered_multiset.h",
]
lib/pcg/src/pcg/device_id.cc
line 29 at r1 (raw file):
int get_raw_id(device_id_t device_id) { if (get_device_type(device_id) == DeviceType::GPU) {
switch
might be cleaner? You can also use overload
if you'd prefer
Code quote:
if (get_device_type(device_id) == DeviceType::GPU) {
lib/pcg/src/pcg/device_id.cc
line 35 at r1 (raw file):
} else { assert(false && "Unsupported DeviceType"); return -1;
assert
s will get removed in non-debug builds
Suggestion:
throw mk_runtime_error(fmt::format("Unsupported DeviceType {} for device_id_t {}", get_device_type(device_id), device_id));
lib/pcg/test/src/pcg/machine_view.cc
line 66 at r1 (raw file):
CHECK(expected == result); } SUBCASE("get_last_device_id") {
Test functions separately (i.e., group by get_last_device_id
rather than by MachineView
tested on) except for particularly trivial utility functions
lib/pcg/test/src/pcg/machine_view.cc
line 70 at r1 (raw file):
} } SUBCASE("MachineView #2") {
Use better test names
lib/pcg/test/src/pcg/machine_view.cc
line 93 at r1 (raw file):
} TEST_CASE("MachineView make_1d_machine_view - GPU") {
The return type here isn't really necessary and is a bit confusing
Suggestion:
TEST_CASE("make_1d_machine_view - GPU") {
lib/utils/include/utils/containers/foldl.h
line 27 at r1 (raw file):
C remaining(it, c.end()); return foldl(remaining, init, func); }
Suggestion:
template <typename C, typename F, typename E = typename C::value_type>
auto foldl1(C const &c, F func) -> typename C::value_type {
if (c.empty()) {
throw mk_runtime_error(fmt::format("foldl1 received empty container: {}", c));
}
std::optional<E> result;
for (E const &e : c) {
if (!result.has_value()) {
result = e;
} else {
result = f(result.value(), e);
}
}
return result.value();
}
lib/utils/include/utils/containers/range.h
line 11 at r1 (raw file):
template <typename T> std::vector<T> range(T start, T end, T step = 1) {
C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible
Suggestion:
std::vector<int> range(int start, int end, int step = 1) {
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
template <typename T> std::unordered_multiset<T> replicate(std::size_t n, T const &element) {
Prefer int
--size_t
intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet
Code quote:
std::size_t
lib/utils/include/utils/containers/scanl.h
line 9 at r1 (raw file):
template <typename C, typename F, typename T> std::vector<T> scanl(C const &c, T init, F const &op) {
Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂
lib/utils/include/utils/containers/scanl.h
line 24 at r1 (raw file):
template <typename C, typename F> auto scanl1(C const &c, F op) { using T = typename C::value_type;
See implementation of fold1
for a more readable version
lib/utils/test/src/utils/containers/foldl.cc
line 9 at r1 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("foldl") {
I can see you're enjoying Haskell 😛
lib/utils/test/src/utils/containers/range.cc
line 45 at r1 (raw file):
std::vector<int> correct = {}; CHECK(result == correct); }
What's the expected behavior of range(0, 10, -1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 24 of 46 files reviewed, 14 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/pcg/include/pcg/device_coordinates.struct.toml
line 1 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think this requirement is necessary anymore with the new design?
Not a requirement anymore, but I think the user will generally reason about the position of the devices using device_id_t
, so it probably won't be used.
lib/pcg/include/pcg/strided_rectangle.struct.toml
line 17 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
utils/hash
andutils/fmt
are only needed for the implementation, not the declaration
Done.
lib/pcg/src/pcg/device_id.cc
line 29 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
switch
might be cleaner? You can also useoverload
if you'd prefer
Done.
lib/pcg/src/pcg/device_id.cc
line 35 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
assert
s will get removed in non-debug builds
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 66 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Test functions separately (i.e., group by
get_last_device_id
rather than byMachineView
tested on) except for particularly trivial utility functions
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 70 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Use better test names
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 93 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
The return type here isn't really necessary and is a bit confusing
Done.
lib/utils/include/utils/containers/foldl.h
line 27 at r1 (raw file):
C remaining(it, c.end()); return foldl(remaining, init, func); }
cool pattern !
lib/utils/include/utils/containers/range.h
line 11 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
C++ doesn't do generics super well so I'd lean toward concrete types unless there's a strong need for something more flexible
Done.
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Prefer
int
--size_t
intuitively seems like it should be good (it represents non-negative values!) but actually what it does is just remove the failure case by overflowing so you can't even assert that the value is okay. At some point we may add a non-negative integer type with the proper semantics, but we don't have one yet
Ahh makes sense thanks.
lib/utils/include/utils/containers/scanl.h
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Can you add a docstring with a reference to https://hackage.haskell.org/package/base-4.20.0.1/docs/Prelude.html#v:scanl? Not everyone on the FlexFlow team does a bunch of Haskell 🙂
Done.
lib/utils/include/utils/containers/scanl.h
line 24 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See implementation of
fold1
for a more readable version
Done.
lib/utils/test/src/utils/containers/foldl.cc
line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I can see you're enjoying Haskell 😛
hahaha yes, I'm starting to see why you're so frustrated with C++. Functional code is so clean, wish the C++ syntax was a bit more accomodating...
lib/utils/test/src/utils/containers/range.cc
line 45 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's the expected behavior of
range(0, 10, -1)
?
empty range (this range is modelled after python's range
s, so I'd like to keep it consistent, but we can also change it to throw an error).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 41 files at r1, 19 of 24 files at r2, all commit messages.
Reviewable status: 44 of 49 files reviewed, 32 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/include/compiler/machine_mapping.h
line 63 at r2 (raw file):
std::unordered_set<StartInvariantMachineView> get_allowed_start_invariant_machine_views( MachineSpecification const &machinespec,
Suggestion:
MachineSpecification const &machine_spec,
lib/compiler/src/machine_mapping.cc
line 378 at r2 (raw file):
} std::vector<int> get_tensor_parallel_degrees(ParallelTensorShape const &shape) {
Should not return a vector
as that imposes an order on ParallelTensorShape
's dims, should return either an unordered_set
or a custom dtgen struct
lib/compiler/src/machine_mapping.cc
line 385 at r2 (raw file):
} bool is_valid_machine_view(MachineView const &mv,
A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
tensor_degrees = filter(tensor_degrees, [](int degree) { return degree != 1; }); return without_order(mv_degrees) == without_order(tensor_degrees);
This returns a multiset
, right?
Code quote:
without_order(mv_degrees)
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
Suggestion:
std::vector<int> const &tensor_dims
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
I think a return type annotation here would really improve readability
Code quote:
[](std::vector<int> tensor_dims, int total_devices) {
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
int max_stride_upper_bound = (total_devices + 1) / product(transform(tensor_dims, [](int degree) { return degree - 1; }));
This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)
Code quote:
int max_stride_upper_bound =
(total_devices + 1) /
product(transform(tensor_dims, [](int degree) { return degree - 1; }));
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
(total_devices + 1) / product(transform(tensor_dims, [](int degree) { return degree - 1; })); std::unordered_multiset<std::vector<int>> strides = cartesian_product(
Should this be a stride_size_t
?
Code quote:
int
lib/compiler/src/machine_mapping.cc
line 410 at r2 (raw file):
}; std::vector<int> tensor_dims = filter(get_tensor_parallel_degrees(shape),
Seems like a common pattern, probably good to pull out a separate function?
lib/compiler/src/machine_mapping.cc
line 413 at r2 (raw file):
[](int degree) { return degree != 1; }); std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
Why not just have a get_num_devices
or something function for MachineSpecification
?
Code quote:
int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node;
lib/compiler/src/machine_mapping.cc
line 414 at r2 (raw file):
std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node; for (std::vector<int> stride :
Suggestion:
for (std::vector<int> const &stride :
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
for (std::vector<int> stride : candidate_strides(tensor_dims, total_devices)) { for (int start_id = 0; start_id < total_devices; start_id++) {
Seems like this loop is intuitively creating StartInvariantMachineView
s and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things like std::vector<int>
or std::vector<std::unordered_set<int>>
is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)
lib/compiler/src/machine_mapping.cc
line 437 at r2 (raw file):
get_candidate_machine_views(machinespec, shape); views = filter(views, [&](MachineView const &view) { return is_valid_machine_view(view, shape);
Suggestion:
return is_valid_machine_view(view, shape) && is_valid_machine_view(view, machinespec);
lib/compiler/test/src/machine_mapping.cc
line 9 at r2 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("get_allowed_machine_view") {
Suggestion:
TEST_CASE("get_allowed_machine_views") {
lib/compiler/test/src/machine_mapping.cc
line 12 at r2 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);
Suggestion:
MachineSpecification ms = MachineSpecification{5, 1, 1, 0, 0};
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
[[fields]] name = "coords" type = "::FlexFlow::FFOrdered<int>"
I don't think FFOrdered
should be used here, since there's no correlation between MachineView
dim ordering and tensor dimension ordering. FFOrdered
represents things that are in the same order as the dims in a TensorShape
(or the shard dims in a ParallelTensorShape
)
lib/pcg/include/pcg/machine_view.h
line 20 at r2 (raw file):
size_t num_dims(MachineView const &mv); size_t num_devices(MachineView const &mv); size_t get_size(MachineView const &mv);
What is get_size
here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 9 at r2 (raw file):
namespace FlexFlow { MachineView to_start_dependent(StartInvariantMachineView const &mv,
Suggestion:
MachineView machine_view_from_start_invariant(StartInvariantMachineView const &mv,
lib/pcg/include/pcg/start_invariant_machine_view.h
line 11 at r2 (raw file):
MachineView to_start_dependent(StartInvariantMachineView const &mv, device_id_t const &start_id); StartInvariantMachineView to_start_invariant(MachineView const &mv);
Suggestion:
StartInvariantMachineView start_invariant_from_machine_view(MachineView const &mv);
lib/pcg/include/pcg/strided_rectangle.h
line 15 at r2 (raw file):
private: std::vector<StridedRectangleSide> _sides;
I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them
Suggestion:
std::vector<StridedRectangleSide> sides;
lib/pcg/include/pcg/strided_rectangle.h
line 28 at r2 (raw file):
bool operator>=(StridedRectangle const &) const; std::vector<StridedRectangleSide> get_sides() const;
For pure getters (things where the result is actually a field) it's safe to return a const &
. That said, if you're ever in doubt just return a value type, as that's always safe 🙂
Suggestion:
std::vector<StridedRectangleSide> const &get_sides() const;
lib/pcg/include/pcg/strided_rectangle.h
line 29 at r2 (raw file):
std::vector<StridedRectangleSide> get_sides() const; };
Suggestion:
private:
std::tuple<decltype(sides) const &> tie() const;
friend struct std::hash<StridedRectangle>;
};
lib/pcg/src/pcg/strided_rectangle.cc
line 21 at r2 (raw file):
bool StridedRectangle::operator==(StridedRectangle const &other) const { return std::tie(this->_sides) == std::tie(other._sides);
Suggestion:
return this->tie() == other.tie();
lib/pcg/src/pcg/strided_rectangle.cc
line 85 at r2 (raw file):
result ^= std::hash<std::vector<::FlexFlow::StridedRectangleSide>>{}( x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2);
FYI this is actually a function in hash-utils
called hash_combine
, we just inline it in dtgen
(for not really any good reason, we just do at the moment)
lib/pcg/src/pcg/strided_rectangle.cc
line 86 at r2 (raw file):
x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result;
Suggestion:
return get_std_hash(x.tie());
lib/utils/include/utils/containers/cartesian_product.h
line 13 at r2 (raw file):
template <typename Container> auto cartesian_product(Container const &containers) {
Seems like this is always going to be called over a std::vector<SomeContainer>
, so maybe do that specialization to simplify this code a bit?
Code quote:
auto cartesian_product(Container const &containers) {
lib/utils/include/utils/containers/cartesian_product.h
line 26 at r2 (raw file):
} for (const auto &item : ordered[depth]) {
Suggestion:
ordered.at(depth)
lib/utils/include/utils/containers/cartesian_product.h
line 28 at r2 (raw file):
for (const auto &item : ordered[depth]) { current.push_back(item); recurse(current, depth + 1);
Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering
lib/utils/src/utils/containers/any_of.h
line 0 at r2 (raw file):
Rename to any_of.cc
?
lib/utils/test/src/utils/containers/cartesian_product.cc
line 58 at r2 (raw file):
{1, 2}, {1, 3}, {1, 3}, {1, 2}}; CHECK(result == correct); }
What if one container is empty and the others aren't? Might be a good test case to add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 63 files reviewed, 32 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/include/compiler/machine_mapping.h
line 63 at r2 (raw file):
std::unordered_set<StartInvariantMachineView> get_allowed_start_invariant_machine_views( MachineSpecification const &machinespec,
Done.
lib/compiler/src/machine_mapping.cc
line 378 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Should not return a
vector
as that imposes an order onParallelTensorShape
's dims, should return either anunordered_set
or a custom dtgen struct
changed to unordered_multiset
(but have to sort the multiset in get_candidate_machine_views
, since I need the ordering to be coherent between stride and dimension).
lib/compiler/src/machine_mapping.cc
line 385 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
A blank line here or there would also help readability--you can use them to break up flows of steps into logical pieces
Done.
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This returns a
multiset
, right?
yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
ParallelTensorShape const &shape) { auto candidate_strides = [](std::vector<int> tensor_dims, int total_devices) {
Done.
lib/compiler/src/machine_mapping.cc
line 401 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I think a return type annotation here would really improve readability
Done.
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This is a bit complicated, I feel like a one-line comment would help me figure out what this dense piece of arithmetic is actually doing (or break up the nested expressions and give them each names. While composition of a bunch of functions can be concise, having intermediate variables can really help make code more readable as you can describe each intermediate value/step along the way. Conciseness is good when it benefits readability, but sometimes it gets in the way more than it helps)
I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could say max_stride_product = num_total_devices/num_devices_used_by_tensor
(where num_devices_used_by_tensor
is the product of the parallel dims) and so we could simply say that the max stride across any dimension is max_stride_product
, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Should this be a
stride_size_t
?
fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?
lib/compiler/src/machine_mapping.cc
line 410 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like a common pattern, probably good to pull out a separate function?
Done.
lib/compiler/src/machine_mapping.cc
line 413 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why not just have a
get_num_devices
or something function forMachineSpecification
?
Done.
lib/compiler/src/machine_mapping.cc
line 414 at r2 (raw file):
std::unordered_set<MachineView> machine_views; int total_devices = machinespec.num_nodes * machinespec.num_gpus_per_node; for (std::vector<int> stride :
Done.
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like this loop is intuitively creating
StartInvariantMachineView
s and then looping over possible start degrees? If so, could this be written to do that? Using more descriptive data structures instead of things likestd::vector<int>
orstd::vector<std::unordered_set<int>>
is super helpful for readability. This would also let you break up this function into more intuitive pieces (generating all start-invariant machine views, and then generating all possible start dims for them)
fixed it up a bit, is this enough? I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase. I could do something like using StrideVec = std::vector<stride_t>
to make it more readable, though I'm not sure it would make it more understandable.
lib/compiler/src/machine_mapping.cc
line 437 at r2 (raw file):
get_candidate_machine_views(machinespec, shape); views = filter(views, [&](MachineView const &view) { return is_valid_machine_view(view, shape);
Done.
lib/compiler/test/src/machine_mapping.cc
line 9 at r2 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("get_allowed_machine_view") {
Done.
lib/compiler/test/src/machine_mapping.cc
line 12 at r2 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification(5, 1, 1, 0, 0);
Done.
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think
FFOrdered
should be used here, since there's no correlation betweenMachineView
dim ordering and tensor dimension ordering.FFOrdered
represents things that are in the same order as the dims in aTensorShape
(or the shard dims in aParallelTensorShape
)
Got it, so FFOrdered
is only for Tensor dims?
lib/pcg/include/pcg/machine_view.h
line 20 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What is
get_size
here? Name is a bit unclear, might be good to either clarify the name or just add a quick doxygen docstring explaining what this does (I think I know what it is and I'm not sure I can think of a better name, so that would imply a doxygen docstring. That said, I'm not sure this function is actually useful anywhere?)
Obsolete and removed (originally it was the "volume", so product of all the sizes of the sides, which I thought I could use for doing bounds check, but it's now useless since the devices themselves can occupy a space smaller than that (e.g. imagine a 2D machine view with stride 2 and 2 devices per side, it technically occupies a 4x4 area but can also really fit into a 3x3)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 9 at r2 (raw file):
namespace FlexFlow { MachineView to_start_dependent(StartInvariantMachineView const &mv,
Done.
lib/pcg/include/pcg/start_invariant_machine_view.h
line 11 at r2 (raw file):
MachineView to_start_dependent(StartInvariantMachineView const &mv, device_id_t const &start_id); StartInvariantMachineView to_start_invariant(MachineView const &mv);
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 15 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't remember if there are reserved id issues with underscore prefixes, but convention-wise we don't usually use them
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 28 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For pure getters (things where the result is actually a field) it's safe to return a
const &
. That said, if you're ever in doubt just return a value type, as that's always safe 🙂
Done.
lib/pcg/include/pcg/strided_rectangle.h
line 29 at r2 (raw file):
std::vector<StridedRectangleSide> get_sides() const; };
Done.
lib/pcg/src/pcg/strided_rectangle.cc
line 21 at r2 (raw file):
bool StridedRectangle::operator==(StridedRectangle const &other) const { return std::tie(this->_sides) == std::tie(other._sides);
Done.
lib/pcg/src/pcg/strided_rectangle.cc
line 86 at r2 (raw file):
x.get_sides()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result;
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 13 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems like this is always going to be called over a
std::vector<SomeContainer>
, so maybe do that specialization to simplify this code a bit?
Yes you're right, fixed (not it's vector<Containers> -> unordered_multiset<vector<...>>
lib/utils/include/utils/containers/cartesian_product.h
line 26 at r2 (raw file):
} for (const auto &item : ordered[depth]) {
Done.
lib/utils/include/utils/containers/cartesian_product.h
line 28 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Would this be simpler as a non-recursive loop? Not a big issue as this is pretty readable, just wondering
I like the recursive version and found it simpler to implement.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 58 at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if one container is empty and the others aren't? Might be a good test case to add
Done.
lib/utils/src/utils/containers/any_of.h
line at r2 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rename to
any_of.cc
?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 30 of 66 files reviewed, 33 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/src/compiler/allowed_machine_views.cc
line 124 at r3 (raw file):
start_invariant_from_machine_view); }
Unsure where to put these functions, will we be using parallel_tensor_dim_idx
as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 33 unresolved discussions (waiting on @lockshaw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 36 of 36 files at r3, all commit messages.
Reviewable status: all files reviewed, 50 unresolved discussions (waiting on @Marsella8)
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "TensorToMachineViewInjection"
Why is this not a surjection? Seems like all of the MachineView
dims need to be mapped to?
lib/compiler/include/compiler/tensor_to_machine_view_injection.struct.toml
line 14 at r3 (raw file):
"utils/bidict/bidict.h", "utils/hash/unordered_map.h" ]
Suggestion:
includes = [
"pcg/machine_view_dim_idx.dtg.h",
"op-attrs/parallel_tensor_dim_idx.dtg.h",
"utils/bidict/bidict.h",
]
lib/compiler/src/compiler/allowed_machine_views.cc
line 36 at r3 (raw file):
} bool is_valid_machine_view(MachineView const &mv,
I don't see this function declared in allowed_machine_view.h
?
lib/compiler/src/compiler/allowed_machine_views.cc
line 124 at r3 (raw file):
Previously, Marsella8 wrote…
Unsure where to put these functions, will we be using
parallel_tensor_dim_idx
as the main way of indexing the ParallelTensorShape going forwards or is it just for doing the machine view to tensor mapping?
For now let's put get_parallel_dim_at_idx
and get_parallel_tensor_indices
into parallel_tensor_dim_idx.h
lib/compiler/src/compiler/allowed_machine_views.cc
line 152 at r3 (raw file):
std::unordered_set<machine_view_dim_idx> get_machine_view_indices(MachineView const &mv) {
Move to machine_view_dim_idx.h
lib/compiler/src/compiler/allowed_machine_views.cc
line 157 at r3 (raw file):
} bool is_valid_injection(TensorToMachineViewInjection const &injection,
Probably better to move this and the functions below to an tensor_to_machine_view_injection.h
file
lib/compiler/src/compiler/allowed_machine_views.cc
line 175 at r3 (raw file):
std::unordered_set<parallel_tensor_dim_idx> shape_indices = get_parallel_tensor_indices(shape); shape_indices = filter(shape_indices, [&](auto const idx) {
Suggestion:
(parallel_tensor_dim_idx const &idx)
lib/compiler/src/compiler/allowed_machine_views.cc
line 183 at r3 (raw file):
permutations(shape_indices)) { TensorToMachineViewInjection injection = TensorToMachineViewInjection(bidict(zip(sorted(mv_indices), p)));
Suggestion:
std::vector<machine_view_dim_idx> machine_view_dim_ordering = sorted(mv_indices);
std::unordered_set<TensorToMachineViewInjection> result;
for (std::vector<parallel_tensor_dim_idx> const &tensor_dim_ordering :
permutations(shape_indices)) {
TensorToMachineViewInjection injection =
TensorToMachineViewInjection(bidict(zip(machine_view_dim_ordering, tensor_dim_ordering)));
lib/compiler/test/src/allowed_machine_ views.cc
line 62 at r3 (raw file):
views.insert(mv); } return views;
Suggestion:
return unordered_set_of(transform(count(num_starts),
[&](int start) {
return MachineView{
device_id_t{gpu_id_t{start}},
StridedRectangle{{
StridedRectangleSide{num_points_t{2}, stride_t{stride1}},
StridedRectangleSide{num_points_t{3}, stride_t{stride2}}
}},
};
}));
lib/compiler/test/src/allowed_machine_ views.cc
line 64 at r3 (raw file):
return views; }; std::unordered_set<MachineView> correct;
Is it possible to instead consider a smaller MachineSpecification
where the set of MachineView
s is small enough that you don't need all this infrastructure to generate it? This testcase is a bit complicated and might be rather annoying to debug
lib/compiler/test/src/allowed_machine_ views.cc
line 136 at r3 (raw file):
make_2d_view(/*stride1*/ 3, /*stride2*/ 1), make_2d_view(/*stride1*/ 1, /*stride2*/ 3), make_2d_view(/*stride1*/ 1, /*stride2*/ 4)};
Suggestion:
std::unordered_set<StartInvariantMachineView> correct = {
make_2d_view(/*stride1*/ 1, /*stride2*/ 1),
make_2d_view(/*stride1*/ 2, /*stride2*/ 1),
make_2d_view(/*stride1*/ 1, /*stride2*/ 2),
make_2d_view(/*stride1*/ 3, /*stride2*/ 1),
make_2d_view(/*stride1*/ 1, /*stride2*/ 3),
make_2d_view(/*stride1*/ 1, /*stride2*/ 4),
};
lib/compiler/test/src/allowed_machine_ views.cc
line 140 at r3 (raw file):
std::unordered_set<StartInvariantMachineView> result = get_allowed_start_invariant_machine_views(ms, shape); CHECK(result == correct);
Suggestion:
std::unordered_set<StartInvariantMachineView> result =
get_allowed_start_invariant_machine_views(ms, shape);
CHECK(result == correct);
lib/compiler/test/src/allowed_machine_ views.cc
line 163 at r3 (raw file):
StridedRectangleSide{num_points_t(2), stride_t(4)}, StridedRectangleSide{num_points_t(3), stride_t(1)}}, }};
Suggestion:
TEST_CASE("get_all_tensor_to_machine_view_injections") {
ParallelTensorShape shape = ParallelTensorShape{
ParallelTensorDims{
FFOrdered<ShardParallelDim>{
ShardParallelDim{10, 3},
},
ReplicaParallelDimSet{
SumDegree{2},
DiscardCopyDegree{2},
},
},
DataType::FLOAT,
};
MachineView view =
MachineView{
device_id_from_index(0, DeviceType::GPU),
StridedRectangle{{
StridedRectangleSide{num_points_t(2), stride_t(1)},
StridedRectangleSide{num_points_t(2), stride_t(4)},
StridedRectangleSide{num_points_t(3), stride_t(1)},
}},
};
lib/compiler/test/src/allowed_machine_ views.cc
line 178 at r3 (raw file):
std::unordered_set<TensorToMachineViewInjection> result = get_all_tensor_to_machine_view_injections(view, shape); CHECK(correct == result);
Suggestion:
std::unordered_set<TensorToMachineViewInjection> correct = {
TensorToMachineViewInjection{b1}, TensorToMachineViewInjection{b2}};
std::unordered_set<TensorToMachineViewInjection> result =
get_all_tensor_to_machine_view_injections(view, shape);
CHECK(correct == result);
lib/compiler/test/src/machine_mapping.cc
line 7 at r3 (raw file):
TEST_SUITE(FF_TEST_SUITE) { // TEST_CASE("MachineMapping::combine") {
What's happening with this file?
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "parallel_tensor_dim_idx"
Naming convention-wise we usually either do ParallelTensorDimIdx
or parallel_tensor_dim_idx_t
, generally to allow parallel_tensor_dim_idx
to be a variable name
Suggestion:
name = "parallel_tensor_dim_idx_t"
lib/op-attrs/include/op-attrs/parallel_tensor_dim_idx.variant.toml
line 10 at r3 (raw file):
"fmt", ] explicit_constructors = false
Prefer explicit_constructors = true
unless there's a good reason. What's the reason here?
lib/pcg/include/pcg/device_coordinates.struct.toml
line 18 at r2 (raw file):
Previously, Marsella8 wrote…
Got it, so
FFOrdered
is only for Tensor dims?
Yes, to distinguish between the FlexFlow/pytorch ordering of dimensions and the flipped legion ordering of dimensions
lib/pcg/include/pcg/device_coordinates.struct.toml
line 17 at r3 (raw file):
"utils/fmt/vector.h", ]
Suggestion:
includes = [
"<vector>",
]
src_includes = [
"utils/hash/vector.h",
"utils/fmt/vector.h",
]
lib/pcg/include/pcg/device_coordinates.struct.toml
line 20 at r3 (raw file):
[[fields]] name = "coords"
Suggestion:
name = "raw_coords"
lib/pcg/include/pcg/machine_view_dim_idx.struct.toml
line 2 at r3 (raw file):
namespace = "FlexFlow" name = "machine_view_dim_idx"
For convention we usually either use MachineViewDimIdx
or machine_view_dim_idx_t
so that machine_view_dim_idx
can be a variable name
Suggestion:
name = "machine_view_dim_idx_t"
lib/pcg/include/pcg/strided_rectangle.h
line 17 at r3 (raw file):
std::vector<StridedRectangleSide> sides; std::tuple<std::vector<StridedRectangleSide> const &> tie() const; friend struct std::hash<StridedRectangle>;
Usually we put member variables (i.e., sides
) into their own private:
or public:
section for readability
Suggestion:
private:
std::tuple<std::vector<StridedRectangleSide> const &> tie() const;
friend struct std::hash<StridedRectangle>;
lib/pcg/src/pcg/device_id.cc
line 36 at r3 (raw file):
default: throw mk_runtime_error( fmt::format("Unsupported DeviceType {}", get_device_type(device_id)));
Why remove logging of the device_id_t
?
lib/pcg/src/pcg/device_id.cc
line 43 at r3 (raw file):
switch (device_type) { case DeviceType::GPU: return device_id_t(gpu_id_t(idx));
Prefer {...}
initialization for consistency
Suggestion:
return device_id_t{gpu_id_t{idx}};
lib/pcg/src/pcg/machine_view.cc
line 33 at r3 (raw file):
sum(transform(zip(coefficients, as_vector(point.coords)), [](auto const pair) { return pair.first * pair.second; })) + get_raw_id(mv.start);
Suggestion:
size_t coord_offset = sum(transform(zip(coefficients, as_vector(point.coords)),
[](auto const pair) { return pair.first * pair.second; }));
size_t raw_id = get_raw_id(mv.start) + coord_offset;
lib/pcg/src/pcg/machine_view.cc
line 37 at r3 (raw file):
return ((get_device_type(mv) == DeviceType::CPU) ? device_id_t(cpu_id_t(raw_id)) : device_id_t(gpu_id_t(raw_id)));
Suggestion:
return device_id_from_index(raw_id, get_device_type(mv));
lib/pcg/src/pcg/machine_view.cc
line 43 at r3 (raw file):
std::vector<std::vector<int>> ranges = transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) { return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);
Pull the lambda out into a separate function (get_points
or something) in strided_rectangle_side.h
Probably also worth pulling out a get_coords
or something in strided_rectangle.h
Code quote:
transform(mv.rect.get_sides(), [](StridedRectangleSide const &side) {
return range(0, get_side_size(side).unwrapped, side.stride.unwrapped);
lib/pcg/src/pcg/machine_view.cc
line 47 at r3 (raw file):
std::unordered_set<DeviceCoordinates> devices_as_points = unordered_set_of( transform(cartesian_product(ranges), [](auto const &point) { return DeviceCoordinates(point); }));
Prefer concrete type
Suggestion:
(std::vector<int> const &point)
lib/pcg/src/pcg/machine_view.cc
line 55 at r3 (raw file):
} device_id_t get_last_device_id(MachineView const &mv) {
Suggestion:
device_id_t get_maximum_device_id(MachineView const &mv) {
lib/pcg/src/pcg/machine_view.cc
line 59 at r3 (raw file):
// transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) { // return s.stride.unwrapped; // }));
Delete comment
Code quote:
// DeviceCoordinates last_device = DeviceCoordinates(
// transform(mv.rect.get_sides(), [](StridedRectangleSide const &s) {
// return s.stride.unwrapped;
// }));
lib/pcg/src/pcg/machine_view.cc
line 67 at r3 (raw file):
} std::vector<num_points_t> get_num_devices_per_dim(MachineView const &mv) {
Do MachineView
s have ordered sides?
Code quote:
std::vector<num_points_t>
lib/pcg/src/pcg/machine_view.cc
line 73 at r3 (raw file):
} std::vector<side_size_t> get_side_size_per_dim(MachineView const &mv) {
Do MachineView
s have ordered sides?
Code quote:
std::vector<side_size_t>
lib/pcg/src/pcg/strided_rectangle.cc
line 24 at r3 (raw file):
std::tuple<std::vector<StridedRectangleSide> const &> StridedRectangle::tie() const { return std::tie(sides);
Suggestion:
return std::tie(this->sides);
lib/pcg/src/pcg/strided_rectangle.cc
line 52 at r3 (raw file):
std::vector<StridedRectangleSide> const &StridedRectangle::get_sides() const { return sides;
Always use explicit this
to reduce ambiguity
Suggestion:
return this->sides;
lib/pcg/test/src/pcg/machine_view.cc
line 56 at r3 (raw file):
}}; gpu_id_t start(0); MachineView mv{device_id_t{start}, rect};
Suggestion:
MachineView mv = MachineView{
gpu_id_t{0},
StridedRectangle{{
StridedRectangleSide(num_points_t(2), stride_t{3}),
StridedRectangleSide(num_points_t(2), stride_t{2}),
}},
};
lib/pcg/test/src/pcg/machine_view.cc
line 59 at r3 (raw file):
SUBCASE("get_device_ids") { std::unordered_set<device_id_t> expected = make_gpu_device_ids({0, 2, 12, 14});
Why aren't the device ids here {0, 2, 3, 5}
? Or at the very least not {0, 2, 6, 8}
? I feel like I don't understand the calculation here
lib/pcg/test/src/pcg/start_invariant_machine_view.cc
line 16 at r3 (raw file):
MachineView input = MachineView{start, rect}; MachineView result = machine_view_from_start_invariant(
This is a good property for a rapidcheck
test, but it would be good to also have a concrete test of each of the directions that isn't just testing invertibility
lib/pcg/test/src/pcg/strided_rectangle.cc
line 17 at r3 (raw file):
StridedRectangle r1 = StridedRectangle{{s1, s0}}; CHECK(r0 == r1); CHECK(r1.get_sides() == std::vector<StridedRectangleSide>{s0, s1});
How much do we care that the sides are sorted vs just caring that there is a canonical order?
lib/utils/include/utils/bidict/bidict.h
line 25 at r3 (raw file):
} bidict(std::vector<std::pair<L, R>> init)
Any particular reason for adding this over the existing constructors?
lib/utils/include/utils/bidict/bidict.h
line 26 at r3 (raw file):
bidict(std::vector<std::pair<L, R>> init) : bidict(init.begin(), init.end()) {}
Suggestion:
explicit bidict(std::vector<std::pair<L, R>> init)
: bidict(init.begin(), init.end()) {}
lib/utils/include/utils/containers/cartesian_product.h
line 17 at r3 (raw file):
std::function<void(V &, size_t)> recurse = [&](V ¤t, size_t depth) { if (depth == containers.size()) {
Suggestion:
template <typename C, typename Elem = typename C::value_type>
std::unordered_multiset<std::vector<Elem>>
cartesian_product(std::vector<C> const &containers) {
std::unordered_multiset<std::vector<Elem>> result;
std::function<void(std::vector<Elem> &, size_t)> recurse = [&](std::vector<Elem> ¤t, size_t depth) {
if (depth == containers.size()) {
lib/utils/include/utils/containers/cartesian_product.h
line 22 at r3 (raw file):
} for (const auto &item : containers.at(depth)) {
Suggestion:
for (Elem const &item : containers.at(depth)) {
lib/utils/include/utils/containers/filter.h
line 53 at r3 (raw file):
std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f); return result; }
Suggestion:
template <typename Elem, typename F>
std::unordered_multiset<Elem> filter(std::unordered_multiset<Elem> const &m,
F const &f) {
std::unordered_multiset<Elem> result;
std::copy_if(m.cbegin(), m.cend(), std::inserter(result, result.begin()), f);
return result;
}
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
template <typename C, typename V = std::vector<typename C::value_type>> auto permutations(C const &container) {
FYI there's also a lazy implementation of permutations
on the substitutions-fix
branch at https://github.com/lockshaw/FlexFlow/blob/substitutions-fix/lib/utils/include/utils/containers/get_all_permutations.h (note that having this version too is fine, just wanted to make you aware)
lib/utils/include/utils/containers/permutations.h
line 13 at r3 (raw file):
template <typename C, typename V = std::vector<typename C::value_type>> auto permutations(C const &container) {
Suggestion:
template <typename C, typename Elem = typename C::value_type>
std::unordered_set<std::vector<Elem>> permutations(C const &container) {
lib/utils/include/utils/containers/replicate.h
line 9 at r1 (raw file):
Previously, Marsella8 wrote…
Ahh makes sense thanks.
Not seeing this fix?
lib/utils/test/src/utils/containers/cartesian_product.cc
line 10 at r1 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("cartesian_product") { SUBCASE("empty") {
Why remove this test case?
lib/utils/test/src/utils/containers/permutations.cc
line 5 at r3 (raw file):
#include <doctest/doctest.h> #include <unordered_set> #include <vector>
Suggestion:
#include <unordered_set>
#include <vector>
#include "utils/fmt/vector.h"
#include "utils/fmt/unordered_set.h"
lib/compiler/src/machine_mapping.cc
line 393 at r2 (raw file):
Previously, Marsella8 wrote…
yes, since different dimensions might have the same degree (we want to see if there exists a mapping between the 2 (multi)sets of dimensions, so order does not matter)
Might be good to rename that function to multiset_of
to remove ambiguity
lib/compiler/src/machine_mapping.cc
line 404 at r2 (raw file):
Previously, Marsella8 wrote…
I've simplified, but it's still a bit tricky to explain:
Naively, we could think that, given a (2,3) stride for example, it would result in 3*2=6 as many tiles occupied with respect to having all strides be 1, so we could saymax_stride_product = num_total_devices/num_devices_used_by_tensor
(wherenum_devices_used_by_tensor
is the product of the parallel dims) and so we could simply say that the max stride across any dimension ismax_stride_product
, so we do the cartesian product and then filter downstream.
This however, doesn't quite work: consider, for example, a 2D MachineView with 2x2 points, and stride 2 across each dimension, and suppose there are 9 total devices. While the "volume" of the MachineView is technically 4x4, it can really fit into a 3x3 and thus we could fit it with the existing devices (since the outer volume can be cut without messing with any of the devices). To address this, I find not the number of total devices used by the tensor, but, the total number of "inner" devices, essentially the ones such that they have associated with them a full stride "volume". So I find the max stride for these using the naive procedure (which works since they all have full stride volume) and I know that if a given stride is too large for them then surely it's too large for the full set of devices, which essentially contains them. (Note that this solutions is incredibly underoptimized, if the generation turns out to be to slow I'll fix it).
Ah okay, yeah I'd add that explanation in as a comment. The code makes a lot more sense with that bit of context
lib/compiler/src/machine_mapping.cc
line 405 at r2 (raw file):
Previously, Marsella8 wrote…
fixed. is there a way to cast a container containing type X to the same container but with type Y (so not using transform and casting within the lambda)?
I don't think so, and generally we avoid casting anyway in favor of explicit construction
lib/compiler/src/machine_mapping.cc
line 416 at r2 (raw file):
I'm a bit hesitant using dtgen since I don't want to give too much visibility to components that are only used within a very narrow piece of the codebase
That makes sense, but I think having a MultiDimensionalStride
or something like that would be fine, especially as it's a pretty simple/obvious concept and could be used elsewhere e.g., to move get_strided_rectangle
out into its own function that constructs a StridedRectangle
from a starting point, a number of repetitions in each dim, and a MultiDimensionalStride
fixed it up a bit, is this enough?
It's definitely a big improvement! I do think breaking it up into stages that produce different named data structures (i.e., first generating the candidate MultiDimensionalStride
s, then using them to generate the StartInvariantMachineView
s, and then using them to generate the MachineView
s) would help, along with a brief comment explaining the stride generation logic which is inherently a bit complicated. It might also be a good idea to add a brief comment to the candidate generation functions explaining the guarantees (if any) they provide on their outputs: for example, are each of the outputs guaranteed to have at least one valid MachineView
, or can there be outputs for which no valid MachineView
s exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 21 files at r9.
Reviewable status: 73 of 96 files reviewed, 30 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/include/compiler/allowed_machine_views.h
line 15 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
These two overloads seem nontrivially different(?), so giving them different (and clearer) names might help
second one was redundant.
lib/compiler/src/compiler/allowed_machine_views.cc
line 67 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Can any of this get pulled out into separate functions that make logical sense by themselves? Each statement is okay to read but the sheer number of statements is a bit intense and doesn't provide a lot of walking the reader through. If it really is this complicated that's fine (there are places where stuff just fundamentally needs a bunch of code that can't get decomposed), but I'd like to be convinced of that 🙂
I've tried to split it up a bit more, but it's probably not enough. Not super sure how to express the candidate_strides
computation more cleanly otherwise.
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's going on here?
the interface for MachineView
has changed so there is not really an equivalent for make_1d_machine_view
anymore.
It would have to be something like a OperatorTaskSpace
which encodes the number of points (so degree of the only dimension) and then a MachineView
which tells us the stride and the StartCoordinate within the MachineSpecification space.
lib/op-attrs/src/op-attrs/parallel_tensor_dim_idx_t.cc
line 10 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't know how widely
parallel_tensor_dim_idx_t
will be used, but it's such an obvious thing to have conceptually that I think it makes sense to integrate intoparallel_tensor_shape.h
, etc.
Done. note that since we don't have the bijective mapping stuff anymore, parallel_tensor_dim_idx
is not needed anymore, but I've moved it to 'parallel_tensor_shape.cc` for future use.
lib/pcg/include/pcg/machine_specification.h
line 19 at r8 (raw file):
bool is_valid_machine_space_coordinates(MachineSpecification const &ms, MachineSpaceCoordinate const &coord);
Done.
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why are these separate
vectors
rather than a singlevector
?
we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.
lib/pcg/include/pcg/machine_view_dim_idx_t.h
line 9 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why a
vector
here?
old residue from the previous interface, now it's deleted
lib/pcg/include/pcg/task_space_operator.h
line 12 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Why not just
get_task_space_coordinates
?
Done.
lib/pcg/include/pcg/task_space_operator.h
line 15 at r8 (raw file):
TaskSpaceCoordinate get_maximum_fragment_coordinate(TaskSpaceOperator const &task);
Done.
lib/pcg/include/pcg/task_space_operator.h
line 18 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Clearer and doesn't require new terminology (i.e., "fragment")--not that I mind new terminology when necessary, just don't think it's necessary here
Done.
lib/pcg/src/pcg/delete.c
line at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What's going on here?
sorry old file where I dumped some stuff, now deleted
lib/pcg/src/pcg/machine_specification.cc
line 43 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Might be better to throw an exception so that the out-of-bounds case can be tested? We can't really test that an
assert
is fired
Done + added some tests
lib/pcg/src/pcg/machine_view.cc
line 47 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
This needs simplifying (pull out some lambdas and/or functions--for example, a function that adds
DeviceCoordinate
s together would make a lot of sense, or maybe a function that gives you a number based on astride
in aStridedRectangleDim
?). When you're dealing with wrapper types, instead of just unwrapping them in your function it's often better to write small wrapper functions that do the unwrapping for you (where those functions logically stand by themselves) and then use those, e.g., the functions incomputation_graph.h
changed, lmk if this is better (old implementation was crap, I apologize)
lib/pcg/test/src/pcg/machine_view.cc
line 35 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Add argument name comments?
Done.
lib/utils/include/utils/containers/get_all_permutations.h
line 105 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'd move this to a separate
get_all_permutations_with_repetitions.h
file unless there's a strong reason it should be here
Done.
lib/utils/include/utils/containers/transform.h
line 50 at r8 (raw file):
template <typename F, typename In, typename Out = decltype(std::declval<F>()(std::declval<In>()))>
Done.
lib/utils/include/utils/containers/transform.h
line 53 at r8 (raw file):
std::set<Out> transform(std::set<In> const &v, F const &f) { std::set<Out> result; for (auto const &e : v) {
Done.
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 56 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Come up with better testcase descriptions. The goal of these should be a high level discussion of semantic properties and/or the failure cases they're trying to check--"container size = 3, n = 1" is something I can trivially see from the code and isn't really (at least in the current wording) a high-level idea. If there's a reason this testcase is here (say, testing the limiting case that we have only one element), then the description should be clarified. If there's not a good high-level reason for that specific testcase, it should probably be removed 🙂
great point, yeah tests weren't super targeted, slightly retooled them to be more informative.
lib/utils/test/src/utils/containers/get_all_permutations.cc
line 60 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'm guessing this terminology is from https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition--that said, the naming is somewhat confusing (as even wikipedia notes), so either come up with a slightly clearer name for this (isn't this just essentially cartesian product in the case that the tuple size is >= the list size?) or add an explicit docstring with links to that wikipedia page
afaik, permutations_with_repetition is the most standard name for this kind of behaviour. I've added a docstring
lib/utils/test/src/utils/containers/range.cc
line 2 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't these have to be
test/utils/doctest/fmt/vector.h
now?
Done.
lib/utils/include/utils/fmt/unordered_multiset.h
line 27 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Delete?
Done.
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 17 at r7 (raw file):
Previously, lockshaw (Colin Unger) wrote…
"inter" by itself is unclear. "inter"-what?
Done.
lib/pcg/include/pcg/machine_specification_coordinate.struct.toml
line 21 at r7 (raw file):
[[fields]] name = "intra"
Done.
lib/pcg/include/pcg/task_space_operator.struct.toml
line 2 at r9 (raw file):
namespace = "FlexFlow" name = "TaskSpaceOperator"
Done.
lib/pcg/include/pcg/task_space_operator.struct.toml
line 24 at r9 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is there a value to using
num_points
now? It doesn't seem like there's really an alternative forOperatorTaskSpace
(like, there's no way for it to be strided, so the extra safety doesn't seem as worth it now?)
agree, only thing i can think of is that num_points_t
makes clear what the degree of an operator means. changed back to int
lib/utils/include/utils/containers/unordered_multiset_of.h
line 10 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
It's not really a big deal either way, but technically
std::cbegin
andstd::cend
work on raw arrays. This was really more of a "C++ interesting FYI" than a particularly important comment tbh
got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 21 files at r9, 34 of 36 files at r10, all commit messages.
Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 18 at r10 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/1,
/*num_cpus_per_node=*/5,
/*num_gpus_per_node=*/5,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/compiler/test/src/allowed_machine_views.cc
line 24 at r10 (raw file):
MachineView{{{stride_t{1}}}, {{MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{0, 0, DeviceType::GPU}},
Suggestion:
MachineSpaceCoordinate{
/*node=idx=*/0,
/*device_idx=*/0,
DeviceType::GPU,
}},
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
SUBCASE("2 degrees of parallelism") { MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/3,
/*num_cpus_per_node=*/3,
/*num_gpus_per_node=*/3,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/compiler/test/src/allowed_machine_views.cc
line 48 at r10 (raw file):
OperatorTaskSpace task = OperatorTaskSpace{{2, 3}}; auto make_2d_views = [&](int start_x,
Suggestion:
auto make_2d_view
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
auto make_2d_views = [&](int start_x, int start_y,
Aren't these in machine space? What does y
mean in a machine space?
Code quote:
int start_y,
lib/compiler/test/src/allowed_machine_views.cc
line 63 at r10 (raw file):
auto inter = MachineSpecificationDimension::INTER_NODE; std::unordered_set<MachineView> correct = { make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),
Suggestion:
make_2d_views(0, 0, /*stride1=*/1, /*stride2=*/1, inter, intra),
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, Marsella8 wrote…
the interface for
MachineView
has changed so there is not really an equivalent formake_1d_machine_view
anymore.
It would have to be something like aOperatorTaskSpace
which encodes the number of points (so degree of the only dimension) and then aMachineView
which tells us the stride and the StartCoordinate within the MachineSpecification space.
Got it, so we likely need to (at some point) add a primitive in op-attrs
that lets us take an UnmappedCostEstimateKey
and return the OperatorTaskSpace
for the operator?
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, Marsella8 wrote…
we can have them as a single vector of pairs (or some custom other data type) but i think it would make it a bit more clunky to iterate over either vector. I can change it if you think it's preferrable.
I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe MachineDimensionStride
? or StridedMachineDimension
? or even something as basic as MachineDimensionWithStride
? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names) .struct.toml
datatype to represent the combination of the two
lib/pcg/test/src/pcg/machine_specification.cc
line 11 at r10 (raw file):
TEST_CASE("MachineSpecification") { MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/4,
/*num_cpus_per_node=*/16,
/*num_gpus_per_node=*/8,
/*inter_node_bandwidth=*/100.0f,
/*intra_node_bandwidth=*/200.0f,
};
lib/pcg/test/src/pcg/machine_specification.cc
line 29 at r10 (raw file):
SUBCASE("valid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 12, DeviceType::CPU};
Suggestion:
MachineSpaceCoordinate{
/*node_idx=*/2,
/*device_idx=*/12,
DeviceType::CPU,
};
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
CHECK(correct == result); } SUBCASE("invalid MachineSpaceCoordinate") {
Suggestion:
SUBCASE("MachineSpaceCoordinate out of bounds of machine spec") {
lib/pcg/test/src/pcg/machine_specification.cc
line 37 at r10 (raw file):
SUBCASE("invalid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 18, DeviceType::CPU};
Suggestion:
MachineSpaceCoordinate{
/*node_idx=*/2,
/*device_idx=*/18,
DeviceType::CPU,
};
lib/pcg/test/src/pcg/machine_view.cc
line 52 at r6 (raw file):
Previously, Marsella8 wrote…
Currently the indexing is done "jumping along with the stride" (e.g 3x3 machine view, 2x2 points, 2x2 stride, we have (0,0) be in position (0, 0), (1,0) be in the real position (2, 0), etc...). Depends on whether we want to index the machineview directly (in which case it's better to have the current indexing) or not (in which case you'd prefer the second one). I personally like the second one best, but when we were discussing it a few weeks ago you had used the current one and had assumed you preferred it.
I think this was an out-of-date comment from when I reviewed pre-our-few-weeks-ago-discussion, sorry for the mixup
lib/pcg/test/src/pcg/machine_view.cc
line 38 at r10 (raw file):
/*num_gpus_per_node*/ 6, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/1,
/*num_cpus_per_node=*/6,
/*num_gpus_per_node=*/6,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/pcg/test/src/pcg/machine_view.cc
line 40 at r10 (raw file):
0}; SUBCASE("Fragment 0") {
Didn't we get rid of "fragment"?
Code quote:
SUBCASE("Fragment 0") {
lib/pcg/test/src/pcg/machine_view.cc
line 49 at r10 (raw file):
} SUBCASE("Fragment 1") {
Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?
Actually looks like not as there's only one dim, so I'm not sure why Fragment 2
is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).
Code quote:
SUBCASE("Fragment 1") {
lib/pcg/test/src/pcg/machine_view.cc
line 58 at r10 (raw file):
} SUBCASE("Fragment 2") {
What happens if the TaskSpaceCoordinate
is out of bounds?
lib/pcg/test/src/pcg/machine_view.cc
line 74 at r10 (raw file):
{{MachineSpecificationDimension::INTER_NODE, MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{1, 2, DeviceType::GPU}};
Feels a bit weird having the start
field last? 😂
Code quote:
MachineSpaceCoordinate{1, 2, DeviceType::GPU}}
lib/pcg/test/src/pcg/machine_view.cc
line 80 at r10 (raw file):
/*num_gpus_per_node*/ 5, 0, 0};
Suggestion:
MachineSpecification ms = MachineSpecification{
/*num_nodes=*/3,
/*num_cpus_per_node=*/5,
/*num_gpus_per_node=*/5,
/*inter_node_bandwidth=*/0,
/*intra_node_bandwidth=*/0,
};
lib/pcg/test/src/pcg/machine_view.cc
line 82 at r10 (raw file):
0}; SUBCASE("Fragment (0,0)") {
See above about better testcase naming
Code quote:
SUBCASE("Fragment (0,0)") {
lib/pcg/test/src/pcg/machine_view.cc
line 85 at r10 (raw file):
TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}}; MachineSpaceCoordinate correct = MachineSpaceCoordinate{1, 2, DeviceType::GPU};
Suggestion:
MachineSpaceCoordinate{/*node_idx=*/1, /*device_idx=*/2, DeviceType::GPU};
lib/pcg/test/src/pcg/machine_view.cc
line 124 at r10 (raw file):
{{MachineSpecificationDimension::INTER_NODE, MachineSpecificationDimension::INTRA_NODE, MachineSpecificationDimension::INTRA_NODE}},
Rather than having duplicate INTER_NODE
/INTRA_NODE
values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?
Code quote:
OperatorTaskSpace task = OperatorTaskSpace{{2, 2, 2}};
MachineView mv =
MachineView{{{stride_t{1}, stride_t{2}, stride_t{1}}},
{{MachineSpecificationDimension::INTER_NODE,
MachineSpecificationDimension::INTRA_NODE,
MachineSpecificationDimension::INTRA_NODE}},
lib/pcg/test/src/pcg/machine_view.cc
line 128 at r10 (raw file):
MachineSpecification ms = MachineSpecification{/*num_nodes*/ 2, /*num_cpus_per_node*/ 8,
Fix parameter naming comments
lib/pcg/test/src/pcg/machine_view.cc
line 133 at r10 (raw file):
0}; SUBCASE("Fragment (0,0,1)") {
Better testcase names (see above)
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 16 at r10 (raw file):
* @details *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition **/
Suggestion:
/**
* @brief For a given container `c` and integer `n`, return all possible vectors
* of size `n` that only contain (possibly duplicated) elements of `c`.
* @details
* https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition
**/
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 18 at r10 (raw file):
**/ template <typename T> struct permutations_with_repetition_container {
Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing get_permutations
tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason
lib/utils/test/src/utils/containers/cartesian_product.cc
line 4 at r10 (raw file):
#include "test/utils/doctest/fmt/unordered_multiset.h" #include "utils/fmt/unordered_multiset.h" #include "utils/fmt/vector.h"
Are these still necessary?
Code quote:
#include "utils/fmt/unordered_multiset.h"
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/replicate.cc
line 4 at r10 (raw file):
#include "test/utils/doctest/fmt/vector.h" #include "utils/fmt/unordered_set.h" #include "utils/fmt/vector.h"
Are these still necessary?
Code quote:
#include "utils/fmt/unordered_set.h"
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/scanl.cc
line 3 at r10 (raw file):
#include "utils/containers/scanl.h" #include "test/utils/doctest/fmt/vector.h" #include "utils/fmt/vector.h"
Is this still necessary?
Code quote:
#include "utils/fmt/vector.h"
lib/utils/test/src/utils/containers/scanl.cc
line 40 at r10 (raw file):
} TEST_CASE("scanl1") {
What happens if the input to scanl1
is empty?
Code quote:
TEST_CASE("scanl1") {
lib/pcg/include/pcg/operator_task_space.h
line 1 at r10 (raw file):
#ifndef _FLEXFLOW_PCG_INCLUDE_operator_task_space_H
Capitalization?
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
What if the task space dim list is empty?
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
Suggestion:
TEST_CASE("get_task_space_coordinates") {
lib/pcg/test/src/pcg/operator_task_space.cc
line 11 at r10 (raw file):
SUBCASE("get_task_space_coordinates") { SUBCASE("2D Task") {
Suggestion:
SUBCASE("2D Task Space") {
lib/pcg/test/src/pcg/operator_task_space.cc
line 19 at r10 (raw file):
TaskSpaceCoordinate{{0, 1}}, TaskSpaceCoordinate{{1, 0}}, TaskSpaceCoordinate{{1, 1}}}};
Suggestion:
std::unordered_set<TaskSpaceCoordinate> correct = {
TaskSpaceCoordinate{{0, 0}},
TaskSpaceCoordinate{{0, 1}},
TaskSpaceCoordinate{{1, 0}},
TaskSpaceCoordinate{{1, 1}}.
};
lib/pcg/test/src/pcg/operator_task_space.cc
line 32 at r10 (raw file):
TaskSpaceCoordinate{{0, 0, 1}}, TaskSpaceCoordinate{{0, 1, 0}}, TaskSpaceCoordinate{{0, 1, 1}}}};
Suggestion:
std::unordered_set<TaskSpaceCoordinate> correct = {
TaskSpaceCoordinate{{0, 0, 0}},
TaskSpaceCoordinate{{0, 0, 1}},
TaskSpaceCoordinate{{0, 1, 0}},
TaskSpaceCoordinate{{0, 1, 1}},
};
lib/pcg/test/src/pcg/operator_task_space.cc
line 38 at r10 (raw file):
} } SUBCASE("get_task_space_maximum_coordinate") {
Suggestion:
TEST_CASE("get_task_space_maximum_coordinate") {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 99 of 101 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 18 at r10 (raw file):
SUBCASE("1 degree of parallelism") { MachineSpecification ms = MachineSpecification{1, 5, 5, 0, 0};
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 24 at r10 (raw file):
MachineView{{{stride_t{1}}}, {{MachineSpecificationDimension::INTRA_NODE}}, MachineSpaceCoordinate{0, 0, DeviceType::GPU}},
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
SUBCASE("2 degrees of parallelism") { MachineSpecification ms = MachineSpecification{3, 3, 3, 0, 0};
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 48 at r10 (raw file):
OperatorTaskSpace task = OperatorTaskSpace{{2, 3}}; auto make_2d_views = [&](int start_x,
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Aren't these in machine space? What does
y
mean in a machine space?
Machine space is the 2D space described by machine specification. (!= Task Space, which is N-dimensional, with N dimension of the parallel tensor). So we have a vector of strides, a vector of dimensions to which to map each dimension of the task, and then a 2D coordinate (start_x, start_y) which is the node and device indices.
lib/compiler/test/src/allowed_machine_views.cc
line 63 at r10 (raw file):
auto inter = MachineSpecificationDimension::INTER_NODE; std::unordered_set<MachineView> correct = { make_2d_views(0, 0, /*stride1*/ 1, /*stride2*/ 1, inter, intra),
Done.
lib/local-execution/test/src/test_local_cost_estimator.cc
line 11 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Got it, so we likely need to (at some point) add a primitive in
op-attrs
that lets us take anUnmappedCostEstimateKey
and return theOperatorTaskSpace
for the operator?
yes pretty much (we don't really need OperatorTaskSpace
as long as we have a datatype from which we can get a vector of the parallel degrees, but obviously OperatorTaskSpace makes things more clean)
lib/pcg/test/src/pcg/machine_specification.cc
line 11 at r10 (raw file):
TEST_CASE("MachineSpecification") { MachineSpecification ms = MachineSpecification{4, 16, 8, 100.0f, 200.0f};
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 29 at r10 (raw file):
SUBCASE("valid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 12, DeviceType::CPU};
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
CHECK(correct == result); } SUBCASE("invalid MachineSpaceCoordinate") {
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 37 at r10 (raw file):
SUBCASE("invalid MachineSpaceCoordinate") { MachineSpaceCoordinate coord = MachineSpaceCoordinate{2, 18, DeviceType::CPU};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 38 at r10 (raw file):
/*num_gpus_per_node*/ 6, 0, 0};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 40 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Didn't we get rid of "fragment"?
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 49 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Come up with better case names. It appears you have three subcases here for the 2d thing, so I'm assuming you're checking the start point and then checking the stepping in each of the two dimensions?
Actually looks like not as there's only one dim, so I'm not sure why
Fragment 2
is here at all. Try to think about your subcase names as explaining to me why they're they're, not just what they are (you can add near-infinite testcases, so your names should give me some idea of why this particular set of cases was chosen).
made a docstring for each case that explains it pretty extensively.
lib/pcg/test/src/pcg/machine_view.cc
line 58 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What happens if the
TaskSpaceCoordinate
is out of bounds?
now returns an optional if out of bound.
(also added testing)
lib/pcg/test/src/pcg/machine_view.cc
line 74 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Feels a bit weird having the
start
field last? 😂
lol fair enough, changed :-)
lib/pcg/test/src/pcg/machine_view.cc
line 80 at r10 (raw file):
/*num_gpus_per_node*/ 5, 0, 0};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 82 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
See above about better testcase naming
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 85 at r10 (raw file):
TaskSpaceCoordinate coord = TaskSpaceCoordinate{{0, 0}}; MachineSpaceCoordinate correct = MachineSpaceCoordinate{1, 2, DeviceType::GPU};
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 124 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Rather than having duplicate
INTER_NODE
/INTRA_NODE
values for the first time in the 3d case, why not have a separate 2d case that is just duplicates?
good point, added new test case
lib/pcg/test/src/pcg/machine_view.cc
line 128 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Fix parameter naming comments
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 133 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Better testcase names (see above)
Done.
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 16 at r10 (raw file):
* @details *https://en.wikipedia.org/wiki/Permutation#Permutations_with_repetition **/
Done.
lib/utils/include/utils/containers/get_all_permutations_with_repetition.h
line 18 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Would probably be better to just do this as a strict function rather than a lazy iterator until we actually have a concrete need to not materialize the whole container at once (I probably should have taken my own advice when writing
get_permutations
tbh). I'm not against the container implementation, but it's usually going to be much more complex and harder to read so we should only do it if we have a real reason
Done.
lib/utils/test/src/utils/containers/cartesian_product.cc
line 4 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Are these still necessary?
Done.
lib/utils/test/src/utils/containers/replicate.cc
line 4 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Are these still necessary?
Done.
lib/utils/test/src/utils/containers/scanl.cc
line 3 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is this still necessary?
Done.
lib/utils/test/src/utils/containers/scanl.cc
line 40 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What happens if the input to
scanl1
is empty?
returns an empty list (added test case)
lib/pcg/include/pcg/operator_task_space.h
line 1 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Capitalization?
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("operator_task_space functions") { SUBCASE("get_task_space_coordinates") {
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What if the task space dim list is empty?
Returns a single empty coordinate (vacuously makes sense?). Not sure what a zero dimensional TaskSpaceOperator would represent, whether something with no parallelism or a straight up wrong data-type (so something with no parallelism would be a TaskSpaceOperator with 1 or more ones. Let me know what you think is best.
lib/pcg/test/src/pcg/operator_task_space.cc
line 11 at r10 (raw file):
SUBCASE("get_task_space_coordinates") { SUBCASE("2D Task") {
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 19 at r10 (raw file):
TaskSpaceCoordinate{{0, 1}}, TaskSpaceCoordinate{{1, 0}}, TaskSpaceCoordinate{{1, 1}}}};
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 32 at r10 (raw file):
TaskSpaceCoordinate{{0, 0, 1}}, TaskSpaceCoordinate{{0, 1, 0}}, TaskSpaceCoordinate{{0, 1, 1}}}};
Done.
lib/pcg/test/src/pcg/operator_task_space.cc
line 38 at r10 (raw file):
} } SUBCASE("get_task_space_maximum_coordinate") {
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 87 of 102 files reviewed, 38 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/pcg/include/pcg/machine_view.struct.toml
line 29 at r8 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I'd go with one vector due to the "make invalid states unrepresentable" reasoning (one vector prevents the two from having different lengths). I'd add a custom well-named (maybe
MachineDimensionStride
? orStridedMachineDimension
? or even something as basic asMachineDimensionWithStride
? Feel free to use something else if you can come up with something, I'm not exactly in love with any of the above names).struct.toml
datatype to represent the combination of the two
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 15 files at r11, all commit messages.
Reviewable status: 100 of 102 files reviewed, 7 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
Previously, Marsella8 wrote…
Done.
Doesn't appear to have changed?
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
Previously, Marsella8 wrote…
Machine space is the 2D space described by machine specification. (!= Task Space, which is N-dimensional, with N dimension of the parallel tensor). So we have a vector of strides, a vector of dimensions to which to map each dimension of the task, and then a 2D coordinate (start_x, start_y) which is the node and device indices.
Then why not just call it start_node_idx
and start_device_idx
?
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
Previously, Marsella8 wrote…
Done.
I think you're missing "out of bounds" in the new name?
lib/pcg/test/src/pcg/machine_view.cc
line 49 at r10 (raw file):
Previously, Marsella8 wrote…
made a docstring for each case that explains it pretty extensively.
My concern was more that the names weren't really justifying why that point was included for testing, but as it appears that all of the possible points are tested exhaustively I guess this is fine
lib/pcg/test/src/pcg/machine_view.cc
line 245 at r11 (raw file):
/** * The tasks will thus be distributed like this:
proj format
appears to have butchered your nice diagram--there should be a clang-format comment you can insert to disable clang-format for these lines
lib/pcg/test/src/pcg/operator_task_space.cc
line 9 at r10 (raw file):
Previously, Marsella8 wrote…
Returns a single empty coordinate (vacuously makes sense?). Not sure what a zero dimensional TaskSpaceOperator would represent, whether something with no parallelism or a straight up wrong data-type (so something with no parallelism would be a TaskSpaceOperator with 1 or more ones. Let me know what you think is best.
Yeah this is fine, I was mostly just curious and it's a bit nice to have edgecases like this documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 100 of 102 files reviewed, 7 unresolved discussions (waiting on @lockshaw and @wmdi)
lib/compiler/test/src/allowed_machine_views.cc
line 45 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Doesn't appear to have changed?
Done.
lib/compiler/test/src/allowed_machine_views.cc
line 49 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Then why not just call it
start_node_idx
andstart_device_idx
?
Done.
lib/pcg/test/src/pcg/machine_specification.cc
line 35 at r10 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I think you're missing "out of bounds" in the new name?
Done.
lib/pcg/test/src/pcg/machine_view.cc
line 245 at r11 (raw file):
Previously, lockshaw (Colin Unger) wrote…
proj format
appears to have butchered your nice diagram--there should be a clang-format comment you can insert to disable clang-format for these lines
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r11, 3 of 3 files at r12, all commit messages.
Reviewable status: 101 of 102 files reviewed, 9 unresolved discussions (waiting on @Marsella8 and @wmdi)
lib/pcg/include/pcg/machine_view_dimension.struct.toml
line 20 at r12 (raw file):
[[fields]] name = "stride" type = "::FlexFlow::stride_t"
Any particular reason to keep this as a stride_t
rather than just dropping it to an int
? Not a big issue, just curious
lib/pcg/src/pcg/machine_view.cc
line 33 at r12 (raw file):
static std::vector<int> get_projection_indices(MachineView const &mv,
Suggestion:
get_projection_indices_for_dimension(MachineView const &mv,
lib/pcg/src/pcg/machine_view.cc
line 39 at r12 (raw file):
std::vector<MachineSpecificationDimension> projections = get_projections(mv); for (size_t i = 0; i < projections.size(); ++i) { if (projections[i] == dimension) {
Suggestion:
projections.at(i)
lib/pcg/src/pcg/machine_view.cc
line 44 at r12 (raw file):
} return projection_indices; }
Suggestion:
std::vector<int> projection_indices
= filter(count(projections.size()),
[&](int idx) {
return projections.at(idx) == dimension;
});
return projection_indices;
}
lib/pcg/src/pcg/machine_view.cc
line 46 at r12 (raw file):
} static int compute_index(int start_idx,
Try to aim for clearer function names even if they're static, especially if they take this number of parameters (in fact, arguably the sheer number of shared parameters imply that this should probably just be a lambda in get_machine_space_coordinate
as it doesn't really seem to stand alone
lib/pcg/src/pcg/machine_view.cc
line 62 at r12 (raw file):
sizes.push_back(dim_size); coord_points.push_back(coord.raw_coord[i]); strides.push_back(mv_strides[i].unwrapped);
Code-style wise probably cleaner to generate sizes
, coord_points
, and strides
as three separate transform
expressions rather than one big combined loop
Code quote:
for (int i : projection_indices) {
int dim_size = task.degrees[i] * mv_strides[i].unwrapped;
sizes.push_back(dim_size);
coord_points.push_back(coord.raw_coord[i]);
strides.push_back(mv_strides[i].unwrapped);
lib/pcg/src/pcg/machine_view.cc
line 78 at r12 (raw file):
MachineView const &mv, TaskSpaceCoordinate const &coord, MachineSpecification const &ms) {
What about splitting this into a function that takes a MachineView
and generates a DeduplicatedMachineView
(which has all of the strides multiplied so you can't produce redundant points), and then a corresponding function that takes a DeduplicatedMachineView
which then can do a much simpler computation to find the MachineSpaceCoordinate
. Thoughts?
Overall one of the best way you can make code more readable is to introduce (sane and conceptually clear) intermediate types, and then break a complicated function that does A -> D
into A -> B -> C -> D
Code quote:
std::optional<MachineSpaceCoordinate>
get_machine_space_coordinate(OperatorTaskSpace const &task,
MachineView const &mv,
TaskSpaceCoordinate const &coord,
MachineSpecification const &ms) {
lib/pcg/src/pcg/machine_view.cc
line 96 at r12 (raw file):
return ms_coord; } std::unordered_set<MachineSpaceCoordinate>
Is it possible that this should be an std::unordered_multiset
, or is MachineView
defined such as to make this impossible?
lib/pcg/src/pcg/machine_view.cc
line 103 at r12 (raw file):
return transform( get_task_space_coordinates(task), [&](TaskSpaceCoordinate const &c) { return get_machine_space_coordinate(task, mv, c, ms).value();
Try not to abbreviate these variables too much or the code starts to become rather hard to skim as I keep having to chase down definitions unnecessarily
Suggestion:
return get_machine_space_coordinate(task, machine_view, coord, machine_spec).value();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r11.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @Marsella8)
lib/compiler/src/compiler/allowed_machine_views.cc
line 67 at r6 (raw file):
Previously, Marsella8 wrote…
I've tried to split it up a bit more, but it's probably not enough. Not super sure how to express the
candidate_strides
computation more cleanly otherwise.
No worries, the key thing is that the high-level flow is clear. As long as you name get_candidate_strides
clearly so I know what it does I can just ignore the details of it and still get a good idea of how the overall code works. Breaking things into separate lambdas, naming things clearly, having solid intermediate types, and having a clear logical flow (generate candidate values, generate an overapproximation of the space, then filter) like you did here is great and much more important than low-level style.
lib/compiler/src/compiler/allowed_machine_views.cc
line 27 at r12 (raw file):
OperatorTaskSpace const &task, MachineSpecification const &ms) { std::optional<MachineSpaceCoordinate> maximum_device_coords =
Suggestion:
std::optional<MachineSpaceCoordinate> maximum_device_coord
lib/compiler/src/compiler/allowed_machine_views.cc
line 36 at r12 (raw file):
* The returned set includes all valid machine views, and might contain invalid ones. This function should never be used externally (see * `get_allowed_partial_machine_view_mappings` instead). There is no
Is this the right name?
Code quote:
`get_allowed_partial_machine_view_mappings
lib/compiler/src/compiler/allowed_machine_views.cc
line 40 at r12 (raw file):
its possible for all * `MachineView`s to be invalid) */
Suggestion:
/* Generates a set of candidate `MachineView`s
* The returned set includes all valid machine views, and might contain
* invalid ones. This function should not be used externally (see
* `get_allowed_partial_machine_view_mappings` instead). There is no
* guarantee that a non-empty returned set contains a valid machine view (i.e.
* it's possible for all the returned `MachineView`s to be invalid)
*/
lib/compiler/src/compiler/allowed_machine_views.cc
line 61 at r12 (raw file):
std::vector<stride_t> single_stride_range = transform(range(1, max_stride_upper_bound + 1), [](int stride) { return stride_t(stride); });
for consistency
Suggestion:
[](int stride) { return stride_t{stride}; });
lib/compiler/src/compiler/allowed_machine_views.cc
line 75 at r12 (raw file):
std::unordered_set<MachineSpaceCoordinate> result; for (int i : range(ms.num_nodes)) { for (int j : range(get_num_devices_per_node(ms, device_type))) {
Suggestion:
for (int node_idx : range(ms.num_nodes)) {
for (int device_idx : range(get_num_devices_per_node(ms, device_type))) {
lib/compiler/src/compiler/allowed_machine_views.cc
line 99 at r12 (raw file):
candidate_starts(machine_spec, device_type)) { for (std::vector<MachineSpecificationDimension> const &proj : candidate_projections(task)) {
FYI the whole projection vs machine-spec-dim is naming-wise a bit inconsistent and weird. Not a big issue as at least neither term is overloaded to also mean something else (you just have two words for one concept), but it can be a bit confusing at times
Code quote:
for (std::vector<MachineSpecificationDimension> const &proj :
candidate_projections(task)) {
lib/compiler/src/compiler/allowed_machine_views.cc
line 103 at r12 (raw file):
transform(zip(strides.raw_strides, proj), [&](auto const &p) { return MachineViewDimension{p.first, p.second}; });
I would add a function to MachineView
for constructing a MachineView
from a start and separate lists of strides
and projections
and then use it here to simplify this code a bit
Code quote:
std::vector<MachineViewDimension> dimensions =
transform(zip(strides.raw_strides, proj), [&](auto const &p) {
return MachineViewDimension{p.first, p.second};
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @lockshaw)
lib/compiler/src/compiler/allowed_machine_views.cc
line 67 at r6 (raw file):
Previously, lockshaw (Colin Unger) wrote…
No worries, the key thing is that the high-level flow is clear. As long as you name
get_candidate_strides
clearly so I know what it does I can just ignore the details of it and still get a good idea of how the overall code works. Breaking things into separate lambdas, naming things clearly, having solid intermediate types, and having a clear logical flow (generate candidate values, generate an overapproximation of the space, then filter) like you did here is great and much more important than low-level style.
great tip, will keep it in mind in the future
lib/compiler/src/compiler/allowed_machine_views.cc
line 27 at r12 (raw file):
OperatorTaskSpace const &task, MachineSpecification const &ms) { std::optional<MachineSpaceCoordinate> maximum_device_coords =
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 36 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is this the right name?
changed to the correct one
lib/compiler/src/compiler/allowed_machine_views.cc
line 40 at r12 (raw file):
its possible for all * `MachineView`s to be invalid) */
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 75 at r12 (raw file):
std::unordered_set<MachineSpaceCoordinate> result; for (int i : range(ms.num_nodes)) { for (int j : range(get_num_devices_per_node(ms, device_type))) {
Done.
lib/compiler/src/compiler/allowed_machine_views.cc
line 99 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
FYI the whole projection vs machine-spec-dim is naming-wise a bit inconsistent and weird. Not a big issue as at least neither term is overloaded to also mean something else (you just have two words for one concept), but it can be a bit confusing at times
agreed, replaced projection since it's not only unclear but a bit of a misnomer, since the operation we are doing is not really projection as much as it is flattening an N-dim space into a 2D one.
lib/compiler/src/compiler/allowed_machine_views.cc
line 103 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I would add a function to
MachineView
for constructing aMachineView
from a start and separate lists ofstrides
andprojections
and then use it here to simplify this code a bit
done
lib/pcg/include/pcg/machine_view_dimension.struct.toml
line 20 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Any particular reason to keep this as a
stride_t
rather than just dropping it to anint
? Not a big issue, just curious
no strong reason / preference, I don't mind it since it makes things a bit more explicit / clear. I can change it if needed
lib/pcg/src/pcg/machine_view.cc
line 33 at r12 (raw file):
static std::vector<int> get_projection_indices(MachineView const &mv,
Done.
lib/pcg/src/pcg/machine_view.cc
line 39 at r12 (raw file):
std::vector<MachineSpecificationDimension> projections = get_projections(mv); for (size_t i = 0; i < projections.size(); ++i) { if (projections[i] == dimension) {
Done.
lib/pcg/src/pcg/machine_view.cc
line 44 at r12 (raw file):
} return projection_indices; }
Done.
lib/pcg/src/pcg/machine_view.cc
line 46 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Try to aim for clearer function names even if they're static, especially if they take this number of parameters (in fact, arguably the sheer number of shared parameters imply that this should probably just be a lambda in
get_machine_space_coordinate
as it doesn't really seem to stand alone
yeah, put into a lambda, much clearer now
lib/pcg/src/pcg/machine_view.cc
line 62 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Code-style wise probably cleaner to generate
sizes
,coord_points
, andstrides
as three separatetransform
expressions rather than one big combined loop
done
lib/pcg/src/pcg/machine_view.cc
line 78 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
What about splitting this into a function that takes a
MachineView
and generates aDeduplicatedMachineView
(which has all of the strides multiplied so you can't produce redundant points), and then a corresponding function that takes aDeduplicatedMachineView
which then can do a much simpler computation to find theMachineSpaceCoordinate
. Thoughts?Overall one of the best way you can make code more readable is to introduce (sane and conceptually clear) intermediate types, and then break a complicated function that does
A -> D
intoA -> B -> C -> D
not sure I fully agree, what would DeduplicatedMachineView
represent exactly (the partial strides /degrees for each dimension)? I think the current logic is decently clear, at least at the top level. Cleaned up a bit, let me know if this is better
lib/pcg/src/pcg/machine_view.cc
line 96 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Is it possible that this should be an
std::unordered_multiset
, or isMachineView
defined such as to make this impossible?
each TaskSpace coordinate maps to a unique MachineSpaceCoordinate, so it's good.
lib/pcg/src/pcg/machine_view.cc
line 103 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Try not to abbreviate these variables too much or the code starts to become rather hard to skim as I keep having to chase down definitions unnecessarily
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r13, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marsella8)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 39 at r13 (raw file):
MachineSpecification const &ms); std::unordered_set<MachineSpaceCoordinate>
Seems a bit odd to return a MachineSpaceCoordinate
here rather than some other type like MachineSpaceOffset
or something to represent the fact that StartInvariantMachineView
doesn't actually have a start
lib/pcg/src/pcg/machine_view.cc
line 78 at r12 (raw file):
Previously, Marsella8 wrote…
not sure I fully agree, what would
DeduplicatedMachineView
represent exactly (the partial strides /degrees for each dimension)? I think the current logic is decently clear, at least at the top level. Cleaned up a bit, let me know if this is better
My understanding of how TaskSpaceCoordinate
-> MachineSpaceCoordinate
works is that for the most part it's a pretty simple calculation: given a machine view, you have machine_space_coord_x = task_space_coord_a * stride_a + start_x
. The complexity comes in when you start to have multiple task space dimensions projected to the same machine space dimension, so you end up with, e.g., machine_space_coord_x = task_space_coord_a * stride_a + task_space_coord_b * stride_b + start_x
, and then you have an issue where you can have two non-equal pairs (task_space_coord_a, task_space_coord_b)
and (task_space_coord_a', task_space_coord_b')
such that task_space_coord_a * stride_a + task_space_coord_b * stride_b == task_space_coord_a' * stride_a + task_space_coord_b' * stride_b
, leading to multiple TaskSpaceCoordinate
s mapping to the same device. Correct me if I'm wrong, but the way we fix this is by introducing an order within the task space dimensions that map to a single machine space dimension (e.g., [a, b]
here), and then we change the computation to task_space_coord_a * stride_a * (task_space_size_b * stride_b) + task_space_coord_b * stride_b + start_x
, which makes the mapping one-to-one. As the logic for computing this is more complex, one way to simplify things would be to argue that stride_a * task_space_size_b * stride_b
simply functions as a "normalized"/"deduplicated" stride stride_a'
, and then you could divide the computation into MachineView
-> NormalizedMachineView
, and then use the simpler computation for transforming a TaskSpaceCoordinate
into a MachineSpaceCoordinatethrough a
NormalizedMachineView`
The current code is reasonably nice so this isn't critical, just throwing it out there as a potential improvement as I don't think you'll be able to get the existing code much nicer via only style changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marsella8)
lib/pcg/include/pcg/start_invariant_machine_view.h
line 39 at r13 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Seems a bit odd to return a
MachineSpaceCoordinate
here rather than some other type likeMachineSpaceOffset
or something to represent the fact thatStartInvariantMachineView
doesn't actually have astart
yeah agree, added new dtgen type.
lib/pcg/src/pcg/machine_view.cc
line 78 at r12 (raw file):
Previously, lockshaw (Colin Unger) wrote…
My understanding of how
TaskSpaceCoordinate
->MachineSpaceCoordinate
works is that for the most part it's a pretty simple calculation: given a machine view, you havemachine_space_coord_x = task_space_coord_a * stride_a + start_x
. The complexity comes in when you start to have multiple task space dimensions projected to the same machine space dimension, so you end up with, e.g.,machine_space_coord_x = task_space_coord_a * stride_a + task_space_coord_b * stride_b + start_x
, and then you have an issue where you can have two non-equal pairs(task_space_coord_a, task_space_coord_b)
and(task_space_coord_a', task_space_coord_b')
such thattask_space_coord_a * stride_a + task_space_coord_b * stride_b == task_space_coord_a' * stride_a + task_space_coord_b' * stride_b
, leading to multipleTaskSpaceCoordinate
s mapping to the same device. Correct me if I'm wrong, but the way we fix this is by introducing an order within the task space dimensions that map to a single machine space dimension (e.g.,[a, b]
here), and then we change the computation totask_space_coord_a * stride_a * (task_space_size_b * stride_b) + task_space_coord_b * stride_b + start_x
, which makes the mapping one-to-one. As the logic for computing this is more complex, one way to simplify things would be to argue thatstride_a * task_space_size_b * stride_b
simply functions as a "normalized"/"deduplicated" stridestride_a'
, and then you could divide the computation intoMachineView
->NormalizedMachineView
, and then use the simpler computation for transforming aTaskSpaceCoordinate
into a MachineSpaceCoordinatethrough a
NormalizedMachineView`The current code is reasonably nice so this isn't critical, just throwing it out there as a potential improvement as I don't think you'll be able to get the existing code much nicer via only style changes
aah ok got it know thanks! I think I'll leave it like this for now, but will take this into consideration when going over that CPU kernel tests PR you tagged me in to see if it makes sense to have a centralized set of functions for working with these indices transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r14, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @Marsella8)
…e-view' into pietro-machine-view
get_allowed_machine_views
MachineView
representation
Description of changes:
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is