Graph Library #1521

Marsella8 · 2024-10-10T04:25:16Z

Description of changes:

Extending testing functionality for graph library
Bug fixing
General clean up

Related Issues:

Linked Issues:

Issue #

Issues closed by this PR:

Closes #

This change is

codecov · 2024-10-10T04:54:59Z

Codecov Report

Attention: Patch coverage is 95.67010% with 42 lines in your changes missing coverage. Please review.

Please upload report for BASE (master@93298ed). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...h/series_parallel/series_parallel_decomposition.cc	66.66%	12 Missing ⚠️
.../utils/graph/instances/hashmap_undirected_graph.cc	37.50%	10 Missing ⚠️
lib/utils/include/utils/containers/zip.h	30.00%	7 Missing ⚠️
lib/utils/src/utils/graph/algorithms.cc	0.00%	3 Missing ⚠️
.../graph/instances/unordered_set_undirected_graph.cc	0.00%	2 Missing ⚠️
...rc/utils/graph/undirected/undirected_edge_query.cc	0.00%	2 Missing ⚠️
...raph/multidigraph/algorithms/get_incoming_edges.cc	88.88%	1 Missing ⚠️
...raph/multidigraph/algorithms/get_outgoing_edges.cc	88.88%	1 Missing ⚠️
.../utils/graph/series_parallel/parallel_reduction.cc	96.15%	1 Missing ⚠️
...rc/utils/graph/series_parallel/series_reduction.cc	97.43%	1 Missing ⚠️
... and 2 more

Additional details and impacted files

@@            Coverage Diff            @@
##             master    #1521   +/-   ##
=========================================
  Coverage          ?   79.78%           
=========================================
  Files             ?      880           
  Lines             ?    29673           
  Branches          ?      792           
=========================================
  Hits              ?    23676           
  Misses            ?     5997           
  Partials          ?        0

Flag	Coverage Δ
unittests	`79.78% <95.67%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
lib/utils/include/utils/containers/find.h	`100.00% <100.00%> (ø)`
...nclude/utils/graph/dataflow_graph/dataflow_graph.h	`44.44% <ø> (ø)`
lib/utils/include/utils/graph/digraph/digraph.h	`100.00% <ø> (ø)`
...b/utils/include/utils/graph/digraph/digraph_view.h	`100.00% <ø> (ø)`
...ls/include/utils/graph/multidigraph/multidigraph.h	`100.00% <ø> (ø)`
lib/utils/include/utils/graph/node/graph_view.h	`100.00% <ø> (ø)`
.../include/utils/graph/undirected/undirected_graph.h	`100.00% <ø> (ø)`
...ude/utils/graph/undirected/undirected_graph_view.h	`100.00% <ø> (ø)`
lib/utils/src/utils/graph/digraph/algorithms.cc	`100.00% <100.00%> (ø)`
...plete_bipartite_composite/get_cbc_decomposition.cc	`97.50% <100.00%> (ø)`
... and 33 more

…y-testing

Marsella8

Reviewable status: 0 of 41 files reviewed, 1 unresolved discussion (waiting on @lockshaw)

lib/utils/src/utils/graph/series_parallel/series_parallel_decomposition.cc line 82 at r2 (raw file):

}

bool is_empty(Node const &node) {

borrowed from #1562, will fix them there if needed

…eedup

…ting

lockshaw

Reviewed 23 of 32 files at r1, 12 of 16 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 42 of 47 files reviewed, 35 unresolved discussions (waiting on @Marsella8)

lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 10 at r3 (raw file):

using namespace FlexFlow;

TEST_CASE("get_connected_components") {

Suggestion:

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_connected_components") {

lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc line 41 at r3 (raw file):

  }

  SUBCASE("3 components") {

Also test the case where everything is one component

Also test the case where the graph is empty

lib/utils/test/src/utils/graph/digraph/algorithms/get_dominators.cc line 34 at r3 (raw file):

      std::unordered_set<Node> correct = {n.at(0)};
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("small acyclic graph") {
      std::vector<Node> n = add_nodes(g, 4);
      std::vector<DirectedEdge> e = {
          DirectedEdge{n.at(0), n.at(3)},
          DirectedEdge{n.at(0), n.at(1)},
          DirectedEdge{n.at(0), n.at(2)},
          DirectedEdge{n.at(1), n.at(2)},
      };
      add_edges(g, e);

      SUBCASE("get_dominators(..., Node const &)") {
        Node node = n.at(2);
        std::unordered_set<Node> correct = {n.at(0), n.at(2)};
        std::unordered_set<Node> result = get_dominators(g, node);
        CHECK(correct == result);
      }
  
      SUBCASE("get_dominators(..., std::unordered_set<Node> const &)") {
        std::unordered_set<Node> nodes = {n.at(1), n.at(3)};
        std::unordered_set<Node> result = get_dominators(g, nodes);
        std::unordered_set<Node> correct = {n.at(0)};
        CHECK(correct == result);
      }
    }

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 20 at r3 (raw file):

std::unordered_multiset<Node> get_nodes(Node const &);

bool is_empty(Node const &node);

How would you end up with an empty SP decomposition? That doesn't seem like it should be allowed (i.e., might be a good idea to change the container in the splits to be some wrapper type of std::vector that requires itself to be non-empty)

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 28 at r3 (raw file):

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

How is this a well-behaved operation? Like, what are the semantics of this? Are you sure we want this function as part of the API for SeriesParallelDecomposition?

Code quote:

SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
                                        Node const &node);

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 30 at r3 (raw file):

                                        Node const &node);

// duplicate nodes within `sp` are counted multiple times

Doxygen style please 🙂

Code quote:

// duplicate nodes within `sp` are counted multiple times

lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h line 33 at r3 (raw file):

size_t num_nodes(SeriesParallelDecomposition const &sp);

SeriesParallelDecomposition serial_composition(

For consistency, never use "serial" in the FF codebase, always use "series"

Suggestion:

SeriesParallelDecomposition series_composition(

lib/utils/include/utils/graph/series_parallel/series_reduction.h line 19 at r3 (raw file):

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g);

What is an "extended series reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedSeriesReduction) that is a wrapper type around std::vector<MultiDiEdge> rather than just using a raw vector.

Code quote:

    find_all_extended_series_reductions(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/undirected/undirected.cc line 15 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE_TEMPLATE(

Why are there two tests here?

lib/utils/test/src/utils/graph/undirected/undirected.cc line 18 at r3 (raw file):

      "UndirectedGraph implementations", T, HashmapUndirectedGraph) {

    RC_SUBCASE("Full", [&]() {

Why is this using rapidcheck?

lib/utils/src/utils/graph/views/views.cc line 2 at r3 (raw file):

#include "utils/graph/views/views.h"
#include "utils/bidict/algorithms/right_entries.h"

I'm not seeing `right_entries used anywhere here in new code. Was this already necessary and just missing, or can this be removed?

lib/utils/src/utils/graph/views/views.cc line 70 at r3 (raw file):

UndirectedEdge to_undirected_edge(DirectedEdge const &e) {
  return UndirectedEdge{{e.src, e.dst}};

Does commutative_pair not have an explicit constructor?

Code quote:

{e.src, e.dst}

lib/utils/src/utils/graph/views/views.cc line 122 at r3 (raw file):

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

Can't you just union the query_sets (i.e., q.srcs and q.dsts) and then do the query, rather than unioning the query results?

Code quote:

  std::unordered_set<UndirectedEdge> undirected_edges =
      set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
                g.query_edges(UndirectedEdgeQuery{q.dsts}));

lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 12 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("get_unchecked_dfs_ordering") {

IIRC the semantics of unchecked_dfs_ordering are essentially that it is free to visit nodes multiple times (i.e., in a diamond graph of 4 nodes (a -> b, a -> c, b -> d, c -> d), it would either yield [a, b, d, c, d] or [a, c, d, b, d]--it would be good to add a testcase for this

lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc line 24 at r3 (raw file):

      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }

Suggestion:

    SUBCASE("simple path") {
      std::vector<Node> correct = {n[0], n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
      CHECK(correct == result);
    }
    
    SUBCASE("start from non-initial node") {
      std::vector<Node> correct = {n[1], n[2], n[3]};
      std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[1]});
      CHECK(correct == result);
    }

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 11 at r3 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  TEST_CASE("DiGraph - algorithms.cc") {

Prefer one TEST_CASE per function, rather than one SUBCASE per function

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 16 at r3 (raw file):

    std::vector<Node> n = add_nodes(g, 4);
    std::vector<DirectedEdge> e = {
        DirectedEdge{n[0], n[1]},

Prefer .at for bounds checking

Suggestion:

        DirectedEdge{n.at(0), n.at(1)},

lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc line 55 at r3 (raw file):

    }

    SUBCASE("get_sinks") {

Potentially rename to get_terminal_nodes/get_initial_nodes following https://reviewable.io/reviews/flexflow/flexflow-train/1565#-OGCWYwtE_rcz9e1WizH

lib/utils/test/src/utils/containers/contains.cc line 19 at r3 (raw file):

      std::unordered_set<int> s = {1, 2, 3, 4, 5};
      CHECK(contains(s, 3));
      CHECK(!contains(s, 6));

Doesn't doctest require this to use CHECK_FALSE? iirc doctest has an issue with CHECK(!...)

Code quote:

      CHECK(!contains(s, 6));

lib/utils/test/src/utils/graph/views/views.cc line 49 at r3 (raw file):

      CHECK(result == expected);
    }
  }

Probably the best way to test these (due to the complexity of the querying interface) would be to use rapidcheck to generate a bunch of operations and then ensure the behaviors are the same between the view and the non-view graph that would be generated. This is going to be annoying to write, but without restructuring things this is probably the correct approach. The good thing is that the equivalence testing code only needs to be written once per graphview type and then can be reused for each of the views.

Code quote:

  TEST_CASE("UndirectedSubgraphView") {
    UndirectedGraph g = UndirectedGraph::create<HashmapUndirectedGraph>();
    std::vector<Node> n = add_nodes(g, 5);
    add_edges(g,
              {UndirectedEdge{{n.at(0), n.at(3)}},
               UndirectedEdge{{n.at(1), n.at(1)}},
               UndirectedEdge{{n.at(1), n.at(2)}},
               UndirectedEdge{{n.at(1), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(3)}},
               UndirectedEdge{{n.at(2), n.at(4)}}});
    std::unordered_set<Node> sub_nodes = {n.at(0), n.at(1), n.at(3)};
    UndirectedGraphView view = view_subgraph(g, sub_nodes);

    SUBCASE("get_nodes") {
      std::unordered_set<Node> expected = {n.at(0), n.at(1), n.at(3)};

      std::unordered_set<Node> result = get_nodes(view);

      CHECK(result == expected);
    }

    SUBCASE("get_edges") {
      std::unordered_set<UndirectedEdge> expected = {
          UndirectedEdge{{n.at(0), n.at(3)}},
          UndirectedEdge{{n.at(1), n.at(1)}},
          UndirectedEdge{{n.at(1), n.at(3)}},
      };

      std::unordered_set<UndirectedEdge> result = get_edges(view);

      CHECK(result == expected);
    }
  }

lib/utils/include/utils/graph/series_parallel/parallel_reduction.h line 15 at r3 (raw file):

    find_parallel_reduction(MultiDiGraphView const &);

std::unordered_map<DirectedEdge, std::unordered_set<MultiDiEdge>>

What is an "extended parallel reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation

Also, it might be nice to create an actual datatype (e.g., ExtendedParallelReduction) that is a wrapper type around std::unordered_set<MultiDiEdge> rather than just using a raw unordered set

If this is just for accelerating the SP decomposition algorithm's rewriting step, it might be better to put these functions (and the corresponding ones for "extended series reductions" in that file rather than in the general "parallel_reduction.h" header)

lib/utils/test/src/utils/graph/multidigraph/algorithms/get_outgoing_edges.cc line 35 at r3 (raw file):

    SUBCASE("node has no outgoing edges") {
      std::unordered_set<MultiDiEdge> result = get_outgoing_edges(g, n.at(2));

Would be nice for this node to have some incoming edges to ensure they're not accidentally picked up

lib/utils/test/src/utils/graph/multidigraph/algorithms/get_incoming_edges.cc line 31 at r3 (raw file):

    SUBCASE("node has no incoming edges") {
      std::unordered_set<MultiDiEdge> result = get_incoming_edges(g, n.at(2));

Would be nice for this node to have some outgoing edges to ensure they're not accidentally picked up

lib/utils/include/utils/graph/node/node.struct.toml line 9 at r3 (raw file):

  "fmt",
  "json",
  "rapidcheck",

How are you using a default Arbitrary instance for Node? I have no issue adding generators for Node, but the default instance seems unlikely to be helpful as Node is only meaningful in the context of a graph?

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 263 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[1], e[2]}};

Prefer .at for bounds checking

Suggestion:

          {e.at(0), e.at(1), e.at(2)}};

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 267 at r3 (raw file):

    }

    SUBCASE("2 linear strands") {

Suggestion:

    SUBCASE("2 linear strands that share a common terminal node") {

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 277 at r3 (raw file):

      std::unordered_set<std::vector<MultiDiEdge>> result =
          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {{e[0], e[2]},

Prefer .at for bounds checking

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 300 at r3 (raw file):

          find_all_extended_series_reductions(g);
      std::unordered_set<std::vector<MultiDiEdge>> correct = {
          {e[0], e[2], e[7]}, {e[3], e[6]}, {e[5], e[9]}};

Prefer .at for bounds checking

lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc line 372 at r3 (raw file):

        std::unordered_set<MultiDiEdge> result_edges = get_edges(g);
        std::unordered_set<MultiDiEdge> correct_edges = [&] {
          std::unordered_set<MultiDiEdge> new_edges = unordered_set_of(e);

Why not just set_minus here?

lib/utils/include/utils/graph/multidigraph/algorithms/get_outgoing_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_outgoing_edges_map(MultiDiGraphView const &g);

lib/utils/include/utils/graph/multidigraph/algorithms/get_incoming_edges.h line 12 at r3 (raw file):

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges(MultiDiGraphView const &g);

Suggestion:

std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
    get_incoming_edges_map(MultiDiGraphView const &g);

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 27 at r3 (raw file):

      DirectedEdgeQuery result = directed_edge_query_all();

      CHECK(matches_edge(result, DirectedEdge{n.at(0), n.at(1)}));

Do you really need 5 checks here? It feels like the behavior of directed_edge_query_all could have been tested on a much smaller graph (or even without a graph at all, using rapidcheck). This feels like unnecessary reusing of g, which is why I recommend doing one TEST_CASE per function instead of one SUBCASE per function unless you have very good reason.

In fact, I don't see how g is necessary for any of the cases in this file?

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 38 at r3 (raw file):

          DirectedEdgeQuery{query_set{n.at(0)}, query_set{n.at(1)}};

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

Check that it doesn't match reversed edges

Code quote:

      CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));

lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc line 57 at r3 (raw file):

        CHECK(result == correct);
      }
      SUBCASE("intersection with std::nullopt") {

Suggestion:

      SUBCASE("intersection with matchall") {

lib/utils/src/utils/graph/series_parallel/series_reduction.cc line 79 at r3 (raw file):

  }
  return unordered_set_of(values(strands));
}

Newlines please--this is really hard to read

Code quote:

std::unordered_set<std::vector<MultiDiEdge>>
    find_all_extended_series_reductions(MultiDiGraphView const &g) {
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> incoming_edges =
      get_incoming_edges(g);
  std::unordered_map<Node, std::unordered_set<MultiDiEdge>> outgoing_edges =
      get_outgoing_edges(g);
  std::unordered_map<Node, std::vector<MultiDiEdge>> strands;
  std::unordered_map<Node, Node> node_to_head_of_strand;
  for (Node const &n : get_topological_ordering(g)) {
    if ((incoming_edges.at(n).size() == 1) &&
        (outgoing_edges.at(n).size() == 1)) {
      MultiDiEdge incoming = get_only(incoming_edges.at(n));
      MultiDiEdge outgoing = get_only(outgoing_edges.at(n));
      Node pre = g.get_multidiedge_src(incoming);
      if (contains_key(node_to_head_of_strand, pre)) {
        Node head = node_to_head_of_strand.at(pre);
        node_to_head_of_strand.emplace(n, head);
        strands.at(head).push_back(outgoing);
      } else {
        node_to_head_of_strand.emplace(n, n);
        strands[n].push_back(incoming);
        strands[n].push_back(outgoing);
      }
    }
  }
  return unordered_set_of(values(strands));
}

Marsella8

Reviewable status: 42 of 47 files reviewed, 35 unresolved discussions (waiting on @lockshaw)

lib/utils/include/utils/graph/node/node.struct.toml line 9 at r3 (raw file):