Change OpCostMetrics.memory to be a nonnegative_int (Issue 1584) #1585

Closed
wants to merge 6 commits into from

victorli2002
Collaborator

@victorli2002 victorli2002 commented Jan 21, 2025

  • More refactoring
  • Further restructuring
  • Pin spdlog version
  • Add framework for quickcheck-style testing
  • Operator params refactoring
  • Operator and parallel dim mapping record refactoring
  • More operator params fixing
  • More param refactoring
  • Fix some includes
  • Add operator params interface
  • Fix up logging, start simplifying graph library
  • Add framework for testing simplified graph library
  • Fix operator includes
  • Continued testing and graph library
  • Fix graph tests
  • Fix include
  • [ElementUnary] Kernel refactor
  • Add dfs traversals and is_acyclic
  • Add additional acyclic subcase
  • [MultiHeadAttention] Kernels
  • More simplified graph library
  • Update spdlog and rapidcheck deps to use https address to avoid ssh permission issue.
  • Move kernel functions and params
  • Fix compile error in operator_params
  • Fix namespaces
  • setup bin
  • Add visit_struct
  • Add substitutions lib and fix cmake
  • Add initial draft of sp decomposition
  • Start refactoring substitutions
  • Add first draft of substitution logic
  • Continue drafting unity search algorithm
  • Remove accidental fmt submodule
  • Do some rearranging in ffr
  • Remaining params -> attrs for clarity
  • Additional cleaning and reordering
  • Fix some cmake issues
  • Fix includes and update based on renaming
  • Move kernel fns
  • Add cuda flag
  • Add FF_USE_* macro definitions
  • Add initial draft of task_specs
  • Add imaginary accessor interface
  • agg spec kernels
  • aggregate kernels
  • Layer norm kernels
  • reduce kernels
  • reverse kernels
  • Add most of implementation of task specs
  • Enable CMake to generate position independent code (-fPIC). This is needed to link spdlog.a to the shared libraries generated by FlexFlow.
  • topk kernels
  • Try out a bunch of likely-unnecessary template metaprogramming
  • Rename op-impl to kernels
  • Fix up batchnorm kernels
  • Batch norm implementation kernel calls
  • Remove cache operator
  • Fix kernel includes
  • Dropout kernels [placeholder mod]
  • Fix dropout
  • Move cuda and hip helpers, fix elem unary
  • Work on de-legioning kernels and some renaming
  • Add cast dispatch mechanism
  • Update combine to use dt dispatch
  • Partial concat updates
  • Work on v2 of taskspec interface
  • Reenable compiler files
  • fix op-meta -> op-attrs in include paths
  • Further task_spec and serialization work
  • Use task_spec tensor acc for aggregate_spec
  • Work on implementing OpTaskArgumentAccessor
  • fix some incomplete renaming
  • Implement tensor accessing
  • further renaming
  • Implement argument access
  • Remove kernel wrapper functions
  • Further task spec refactoring
  • Runtime reorganization
  • OpMeta fixes
  • Fix some include not founds
  • Rename visitable
  • De-virtualize op-attrs
  • Large number of runtime build error fixes and other improvements
  • Further model.cc refactoring
  • Create even more compile errors
  • draft unity search
  • DataTypeDispatch for all kernels
  • draft graph split in unity algorithm
  • Start refactoring modelspec
  • Add mutability back to labelled open multi di graph
  • Add fmt, expected, continue ModelSpec
  • Make DeviceID a strong_typedef
  • Fix a bunch of virtual issues
  • Add a bunch of static_asserts
  • Refactor visitable cmp
  • Fused and fused parallel
  • minor fix
  • record best strategy in unity search
  • Add tensordims, add graphview infra
  • fix INodeLabelledMultiDiGraph polymorphic copy compliant
  • remove dup formatter templates for MultiDi{Input,Output}
  • unsafe_view_as_flipped take IDiGraphView
  • add static asserts for abstract struct IUndirectedGraphView
  • add UndirectedGraphView
  • update CXX_STANDARD to 17 for utils
  • add update to header file to match change to impl
  • Update some function signatures
  • Merge commit "Add multi-objective global memory search algorithm"
  • Move the file locations that couldn't be done when merging the commits
  • Fix utils enough to get to linking stage
  • add strategy cache
  • TSR BatchMatmul, fixes for aggregate, MHA
  • Implement undirected get_subgraph
  • TaskSpec refactor (1/n)
  • non-op task spec and initializer refactoring
  • more implementation
  • Start on refactoring ops
  • More movement toward final op and task_spec interfaces
  • More op_task_spec -> task_spec resolution
  • adapt unity algorithm for new graph interfaces & some unit tests for open graphs
  • add some labelled graph interfaces & unit tests for labelled graphs
  • Optimizer and taskspec improvements
  • Dropout with new task spec, tests
  • Start to implement high-level execution coordination
  • Add SimTaskBinding interface to ease writing measure_operator_cost
  • unit tests for optimal_cost
  • Various fixes moving towards invocation interpretation
  • Fixes to tests, agg/agg_spec
  • Renaming and continued task invocation work
  • Rapidcheck for serialization
  • OpTaskSpec interface
  • Split up task invocation compilation logic
  • add top-level search
  • minor upd
  • split graph utils apart
  • rapidcheck fix, among others
  • Continue pulling task invocations together at the top level
  • Start sketching out high level flow and c ffi
  • More ffi work
  • split unity_algorithm into multiple files & minor fix
  • add operator method for MultiDiGraphView, MultiDiGraph
  • add method get_outgoing_edges for DiGraphView and MultiDiGraphView
  • add implement for get_sinks
  • add the method that need to implement in PR-README.md
  • fix the AdjacencyMultiDiGraph::AdjacencyMultiDiGraph
  • finish DirectedEdgeQuery query_intersection method
  • Start to define bindings and serialization
  • add query_intersection for DirectedEdgeQuery
  • finish 24 methods
  • add constructor function for class OutputMultiDiEdge, InputMultiDiEdge, AdjacencyMultiDiGraph, MultiDiInput
  • implement maybe_owned_ref::maybe_owned_ref(T* ptr)
  • add maybe_owned_ref::maybe_owned_ref(std::shared_ptr ptr)
  • Task spec cleanup
  • Add task_spec readme
  • add num_nodes for GraphView
  • Add argument types to task_spec readme
  • Attempt to fix lib README mermaid
  • add query_edges for MultiDiGraphView, UndirectedGraphView
  • Add cow capabilities for MultiDiGraph
  • fix the comments and add cow_ptr_t
  • fix the comments and ignore the cow_ptr_t
  • fix the code format
  • Update build instructions for repo-refactor
  • Format
  • Prevent format.sh from formatting triton/
  • fix the cow_ptr_t
  • Add first part of graph readme and refactor multidigraph indices
  • Revert "fix the cow_ptr_t"
  • Fix fa glyphs in graph/README.md
  • Fix formatting issue
  • Add cow_ptr_t fix along the lines of @lambda7xx add cow_ptr_t.h #758
  • Add example graph->view coercion
  • Simplify and remove some unnecessary copies from cow_ptr_t
  • Fix json and some other pcg bugs
  • Find good middle path for constructor creation
  • Cleanup new visitable aggregate interface
  • Move most of op-attrs over to new visitable interface
  • Add fixes and refactoring for computation graph json serialization
  • Start on internals diagram
  • Point internals diagram link to raw svg
  • Label Chart Elements (Label Chart Elements #828)
  • Add dynamo to diagram
  • Add participant reminders in tracing diagram
  • Add code snippet to pytorch diagram
  • Update v1/parallel_tensor.h to new visitable interface
  • Start working on serialization
  • Attempt fix of the cuda_fp16.h build issue (Fix cuda_fp16.h not found issue #834)
  • Move views.h over to non-virtual interface
  • fix the inplace_sorted_by in container.h
  • Add temporary workaround for nccl/cudnn/cuda build on sapling (Fix/sapling nccl hack #838)
  • start to implement the contract relevant
  • finish other method except unsafe related
  • add implement for the unsafe_create
  • A bunch of runtime build fixes after task_spec move
  • add test for algorithm
  • Further runtime-build-focused fixes
  • Update some visitable and typedefs
  • Formatting
  • Update formatting script
  • Formatting
  • add test for algorithm
  • add test for test_stack_map.cc
  • fix bidict and add test for bidict
  • add test for stack_vector and find a bug in stack_vector emplace_back method
  • add test for stack_string
  • add test for test_disjoint_set and fix the bug of test_disjoint_set
  • refine the code according to the PR
  • refine the PR
  • fix the doctest
  • refine the code
  • fix the bug
  • modify the get_imm_post_dominator
  • fix the lib/CMakeLists.txt
  • refine the get_weakly_connected_components
  • modify the lib/CMakeLists.txt
  • add test for containers and fix bug for containers.h
  • Fix container.h (Fix container.h #850)
  • refine the utils
  • complete get_subgraph for LabelledOpenMultiDiGraph & minor fix
  • use new pcg interfaces
  • remove duplicated subgraph view
  • fix the get_mutable
  • implement the add_nodes(DiGraph)
  • use DiGraph::create() to replace AdjacencyDiGraph
  • use DiGraphView to implement the DiGraph::query_nodes
  • use MultiDiGraph in test
  • remove std::hash for JoinNodeKey
  • use the fmt
  • use the optional to replace tl::optional in algorithm.cc
  • Flesh out the various wrapper and interface types for utils/graph
  • Formatting
  • refine the code
  • format the code
  • merge the conflict
  • format
  • need to implement the query_keys and query_values in query_set.h
  • implement the query_values and query_keys for std::unordered_map and bidict fix the bug of containers
  • format the code
  • fix the bug of query_set.h
  • add test_adjacency_multidigraph.cc
  • add algorithm test and format the code
  • remove the unuseful
  • leave the ViewOpenMultiDiGraphAsMultiDiGraph::query_edges
  • implement the ViewOpenMultiDiGraphAsMultiDiGraph::query_edges
  • format the code
  • Fix some errors in compiler after merge
  • add test for flatmap
  • fix the bug of containers
  • fix some bug of tuple and leave tuple_slice_t, tuple_head_t, tuple_tail_t and get function with invalid index to test in the future
  • add test for record_format
  • Cleanup new required implementation
  • Format
  • Generalize Containers.h (Generalize Containers.h #862)
  • fix the bug in algorithm.cc and add test for method like get_imm_dominators, get_dominators, get_neighbors, get_sinks, get_bfs, get_predecessors
  • format the code
  • add test for test_dot_file
  • add test for deduplicated_priority_queue.h
  • add test for random_utils.cc
  • add unit tests for machine mapping and dp algorithm
  • add unit test for unity algorithm
  • fix compile errors from filter and support_interator_tag
  • fix compile errors from filter and support_interator_tag (fix compile errors from filter and support_interator_tag #877)
  • minor fixes for compiler
  • Algorithms things
  • use the unsafe_create
  • try to fix the undirected
  • make UndirectedGraphView(std::shared_ptr ptr) private
  • change the return type of get_neightbors
  • use for loop in add_nodes
  • remove friend in MultiDiGraphView
  • remove friend and format the code
  • format the code and fix the public
  • use filter in views
  • do not use loop in test_algorithms
  • use add_nodes, add_edges in MultiDiGraph
  • use add_nodes in DiGraph
  • modify the test
  • add comments for unsafe_create
  • refine the comment for unsafe_create
  • fix the changes and remove operator==
  • remove the add_nodes and add_edges in DiGraph
  • remove with_src_node
  • add get_neightbors for UndirectedGraphView
  • have some bug for get_connected_components(UndirectedGraphView const & g)
  • add test for weakly_connect_components
  • add test for test_seq
  • remove tl::nullopt
  • remove comment
  • remove the filter_keys for bidict
  • have some problem in JoinNodeKey
  • remove the t1::nullopt
  • update
  • use includes to replace allowd_values
  • fix the cmake
  • fix test_algorithms
  • remove the comment
  • do not use declaration
  • remove the cmake
  • add struct should_only_be_used_internally_tag_t
  • finish the graphview
  • remove unsafe_create
  • fix the cmake
  • check the whole std::unordered_map in get_imm_dominators
  • refine the comment for unsafe_create
  • fix the bug in traversal.cc(udi &udi::operator++())
  • fix the cmake
  • try to fix the disjoint_set
  • fix the test_disjoint_set.cc
  • try to fix the stack_vector
  • fix the tuple.h
  • use subcase in test_bidict
  • add begin/end for bidict
  • fix the test_container
  • fix the test_container
  • check the whole container
  • fix the test_container
  • refine the test_container
  • refine the test_dot_file.cc
  • adjust the test_dot_file
  • clean up generator codes and minor fix
  • format
  • format
  • Remove fmt submodule in lib (Remove fmt submodule in lib #899)
  • use check_eq in test_disjoint_set.cc
  • use check to replace check_eq in the test_stack_vector
  • use check to replace the check_eq
  • optimize the test_random_utils.cc
  • refine the stack_map
  • refine the test_tuple
  • add comments for test_variant
  • fix the undirect.h
  • add internal_only_tag.h
  • fix the undirect.h
  • Clang format 16 (Clang format 16 #911)
  • use unsafe_create_without_ownership to replace unsafe_create
  • use req in struct JoinNodeKey
  • fix the struct JoinNodeKey
  • add UndirectedGraphView as_undirected(MultiDiGraphView const &)
  • refine the get_imm_post_dominator
  • refine the comment for unsafe_create_without_ownership
  • fix the UndirectedGraph::operator UndirectedGraphView
  • refine the test
  • has some bug about the get_neighbors
  • convert DiGraph to Graph
  • remove the unsafe
  • refine the node
  • refine the query_intersection
  • refine the query_intersection
  • refine the algorithms
  • try to debug
  • add more test for tuple
  • add more test for stack_string
  • add test for variant
  • Ci update (Ci update #898)
  • Add visitable_formatter
  • Add FF_VISIT_FMTABLE
  • Update formatting to match repo-refactor
  • Misc post-merge fixes
  • serial parallel composition
  • remove commented out codes
  • view MultiDiGraph as labelled
  • make machine mapping immutable
  • minor fix & format
  • move general codes into proper places
  • format
  • minor fix & format
  • Build fixes
  • Update test to use node ports
  • minor fix
  • Undo changes to lib/CMakeLists.txt
  • Add exception throw to value_all
  • add unit tests for compiler (add unit tests for compiler #870)
  • Move over to GraphInternal
  • Add GraphInternals implementation
  • Many miscellaneous fixes to utils
  • update substitutions to align with latest changes
  • format
  • Add first draft of docs for visitable (Add first draft of docs for visitable #890)
  • Change constructor to not be protected
  • draft substitutions
  • format
  • Build Attention and Aggregate Spec (Build Attention and Aggregate Spec #886)
  • Fix AdjacencyMultiDiGraph default construction error (Fix AdjacencyMultiDiGraph default construction error #1005)
  • Remove outdated runtime files (Remove outdated runtime files #1006)
  • Delete include (Delete include #1007)
  • further draft substitution
  • format
  • Added missing iterator include (Added missing iterator include #1015)
  • Rename DeviceSpecificArg to DeviceSpecific (Rename DeviceSpecificArg to DeviceSpecific #1009)
  • Per lib build checks (Per lib build checks #1022)
  • minor fix
  • Fix test errors in utils
  • Format
  • Add missing constructor
  • refactor the pattern graph to be OutputLabelledOpenMultiDiGraph
  • format
  • minor fix
  • Small set of graph fixes
  • updates
  • Address lambda comments
  • Format
  • Fix strange nccl build issue from -w flag
  • readme for substitutions
  • fix the cmake
  • format
  • format
  • check substitution validity
  • Fix fmt bugs
  • Revert lambda lib/CMakeLists.txt changes
  • initialize tests for substitutions
  • Fix bug in pcg build
  • Bump c++ version to 17 (Update cmake to use C++17 #1067)
  • fix
  • format
  • remove output tensor computation
  • start to implement the softmax_kernel
  • add API interface
  • softmax kernels version0.1
  • combine
  • concat
  • add empty method
  • copy some old code and implement topK version 0.1
  • add API method
  • transpose version0.1
  • start to do the Repartition
  • modify the backward and forward
  • start to implement the init
  • start measure_operator_cost
  • partition version 0.1
  • implement get_operator_attrs
  • get parallel operator attributes & minor fix
  • format
  • concat
  • combine
  • format
  • Conv 2D Op (Conv 2D Op #1112)
  • match open graphs
  • conv2d typo (Conv2D Typo #1135)
  • cuda
  • format
  • format and cuda
  • delete comments
  • format and fix cc
  • format
  • minor fix
  • Implement substitutions (Implement substitutions #1011)
  • update the softmax
  • Serialize jobs in per-lib-checks workflow (Serialize jobs in per-lib-checks workflow #1149)
  • update the topk
  • update the transpose
  • add update and leave task to implement
  • update the reduce
  • use exceptions
  • use exceptions
  • use exceptions in partition.cc
  • fix signatures and bind_arg
  • fix
  • fix type error
  • refine the softmax by binding new thing in init
  • leave the index
  • format the code
  • format repo-refactor (Format Repo Refactor #1168)
  • combine
  • finish concat
  • concat
  • add delete
  • conv2d typo fix
  • finish element_binary
  • element_binary
  • conv2d
  • concat
  • concat
  • finish concat
  • Fix signature return
  • Comment CHECK_FMTABLE
  • Add signature for other ops
  • Repo refactor ci (Repo refactor ci  #1083)
  • Call fwd sig
  • Remove namespace std
  • Format
  • Fix signature
  • Dropout Op (Dropout Op #1134)
  • Flat Operator (Flat Operator #1137)
  • Batch Norm Op (Batch Norm Op #1110)
  • fix the init_task
  • add topk
  • add allocator to allocate memory for index_ptr
  • remove the old comment
  • fix the input_tensor
  • start to update the kernel
  • fix the transpose_kernels cu
  • fix the transpose op
  • substitutions build
  • fmt
  • fmt
  • fmt
  • fix issues caused by merge
  • Purge MOE operators (Purge MOE operators #1177)
  • Cast Op (Cast Op #1111)
  • Update lib/runtime/src/ops/softmax.cc
  • Update lib/runtime/src/ops/softmax.cc
  • Update lib/runtime/src/ops/softmax.cc
  • fix
  • fix the error
  • fix the kernel
  • fix the topk error and add indices
  • fix the error
  • format the code
  • format the code
  • Split OP ( Split OP #1107)
  • fix the typo
  • fix the typo
  • fix the typo
  • fix the semi
  • fix the format
  • Replicate OP (Replicate OP  #1101)
  • implement some missing functions
  • format
  • Reshape OP ( Reshape OP #1100)
  • Reverse OP (Reverse OP #1105)
  • Pool2D OP (Pool2D OP #1182)
  • Reduce OP (Reduce OP #1118)
  • Reduction OP (Reduction OP #1120)
  • Update submodule (Update fmt submodule #1212)
  • substitutions tests pass
  • fmt
  • Batch Matmul Op (Batch Matmul Op #1023)
  • improve at for OutputLabelledOpenMultiDiGraph
  • graph get_ptr fix
  • fmt
  • Embedding (Embedding Operator #1256)
  • update fmt
  • Hip kernel fix (Hip kernel fix #1178)
  • remove unnecessary virtual
  • format
  • linear operator (linear operator #1180)
  • LayerNorm OP draft (LayerNorm OP draft #1186)
  • Element Unary Op (Element Unary Op #1257)
  • Remove unnecessary dependencies and allow using external installs (Remove unnecessary dependencies and allow using external installs #1321)
  • Re-merge Finish implementing compiler #1229 (Re-merge #1229 #1346)
  • Add external tl-expected via nix, add proj via flake instead of submodule (Add option to use external tl::expected #1347)
  • Commit proj toml and update proj version (Commit proj toml and update proj version #1350)
  • Kernel build (Kernel build #1366)
  • Code Coverage Support (Code Coverage Support #1380)
  • Resurrect substitution-to-dot (Resurrect substitution-to-dot #1351)
  • Hip Refactor (Hip Refactor #1359)
  • Computation Graph and Builder (Computation Graph and Builder #1388)
  • Local allocator (Local allocator #1386)
  • Code Coverage Support (Code Coverage Support #1396)
  • Hip Refactor for optimizer, partition and pool (Hip Refactor for optimizer, partition and pool #1379)
  • Graph Documentation (Graph Documentation #1391)
  • Local Execution: Op refactor (Local Execution: Op refactor #1389)
  • Hip Refactor for element_binary_kernels, unary kernels, and embedding kernels (Hip Refactor for element_binary_kernels, unary kernels, and embedding kernels #1369)
  • PCG serialization, rapidcheck, dtgen, and shape inference (PCG serialization, rapidcheck, dtgen, and shape inference #1394)
  • filtering out dtgen related files in code coverage report (filtering out dtgen related files in code coverage report  #1406)
  • refactor for softmax, split, topk, transpose (refactor for softmax, split, topk, transpose #1404)
  • Hip refactor for loss, dropout, flat, and gather ( Hip refactor for loss, dropout, flat, and gather #1373)
  • hip refactor for reduce, reduction, replicate, reshape and reverse (hip refactor for reduce, reduction, replicate, reshape and reverse #1403)
  • Hip refactor for attention, batch, combine, cast, conv (Hip refactor for attention, batch, combine, cast, conv #1402)
  • Local backing (Local backing #1400)
  • Add ParallelComputationGraphBuilder (Add ParallelComputationGraphBuilder #1411)
  • Local Cost Estimator (Local Cost Estimator #1410)
  • Run proj dtgen in CI (Run proj dtgen in CI #1424)
  • Add unit tests for subset of kernels (Add unit tests for subset of kernels #1384)
  • Implementations for methods for machine_views and associated modules (Implementations for methods for machine_views and associated modules  #1429)
  • Add DataflowGraph and fix part of substitutions (Add DataflowGraph and fix part of substitutions #1449)
  • Implement get_allowed_machine_views (Implement get_allowed_machine_views #1455)
  • add compiler to CI (Add compiler to CI #1457)
  • Local execution tests (Local execution tests #1418)
  • Add Transformer Model PCG (Add Transformer Model PCG #1453)
  • Re-enable substitutions (Re-enable substitutions #1471)
  • Add tool for exporting and visualizing model architectures and SP decompositions (Add tool for exporting and visualizing model architectures and SP decompositions #1490)
  • Add interface for differentiating inputs and weights in CG & PCG (Add interface for differentiating inputs and weights in CG & PCG #1493)
  • Add Inception-v3 model (Add Inception-v3 model #1495)
  • Add Candle Uno Model PCG (Add Candle Uno Model CG #1479)
  • Add BERT model computation graph (Add BERT model computation graph #1488)
  • Unity device mapping algorithm (Unity device mapping algorithm #1459)
  • Fix concretize_abstract_tensor_set_movement (Fix concretize_abstract_tensor_set_movement #1519)
  • Unordered StridedRectangle, get_allowed_machine_views (New MachineView representation #1458)
  • Utils: Refactor and Test Updates (Utils: Refactor and Test Updates #1464)
  • Changed ff_dim to ff_dim_t, added in nonnegative_int type
  • Add runs-on repo config (Add runs-on repo config #1559)
  • Update README to reference flexflow-serve (Update README to reference flexflow-serve #1560)
  • added recurse_n (added recurse_n #1563)
  • Memory optimization algorithm (Memory optimization algorithm #1523)
  • Enable cross-directory ccache (Enable cross-directory ccache #1539)
  • Changed ff_dim_t to use nonnegative_int, added relative_ff_dim_t that uses int
  • Adding value_type and ordered_value_type to some files
  • Temporarily remove stack_contents
  • Format

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

@victorli2002 victorli2002 deleted the nn_int branch January 22, 2025 00:34
2 participants