Proper handling of repeated fp16 conversion. #310
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
When converting a model to fp16, PR #293 changes it such that we raise an error if the model already contains
Cast
nodes or fp16 types, while still allowing the user to force the conversion. This means that a user could repeatedly convert a model to fp16. However, the converter currently exhibits unexpected behaviors while handling repeated fp16 conversions, including creating orphaned and self-recurring nodes (see issue #276). This PR fixes the converter behavior when handling repeated fp16 conversions.This PR is the combination of the following 3 fixes. I'm putting the 3 fixes in one PR to not lose context about the solution to issue #276. For readability in git blame, I'm happy to break it up into 3 separate PRs.
Fix 1: When the converter inserts
Cast
nodes, use UUID for naming.Previously, we use a fixed string format to name the
Cast
nodes inserted by the converter, which results in name collision if we repeatedly run the converter. The converter currently does not handle name collisions at all, resulting in unexpected behavior (this is a separate issue that could be fixed).This PR changes to using UUIDs when naming inserted
Cast
nodes to avoid name collision.Fix 2: Add
Cast
output to value info block list.The converter has a notion of "value info block list"; the converter changes every
value_info
in the input graph to fp16, except for those in the value info block list. Currently, we only add the value info from the output of "op block list" or "node block list" (i.e. ops and nodes that do not go through the fp16 conversion). However, when a graph containsCast
nodes, the output ofCast
nodes cannot change types because otherwise it invalidates the very cast operation itself.This PR adds the output of
Cast
nodes to value info block list.NOTE: An alternative fix is to add
Cast
nodes to theDEFAULT_OP_BLOCK_LIST
. I dislike it because it breaks the semantics that the default list contains "ops that are not supported for float16 in ONNX Runtime" according the manual here.Fix 3: Only remove cast pairs with fp32 input types.
The converter currently has a heuristics to remove "unnecessary" cast node pairs (defined in
remove_unnecessary_cast_node
) with the following pattern:upstream --> cast_to_fp16 --> cast_to_fp32 --> downstream
,upstream --> downstream
.However, the cast node pair is only truly "unnecessary" if the input type is fp16. This PR adds an additional guard to remove cast node pairs with the following pattern:
upstream --fp32--> cast_to_fp16 --> cast_to_fp32 --> downstream
,upstream --> downstream
,while remaining the following pattern intact:
upstream --fp16--> cast_to_fp16 --> cast_to_fp32 --> downstream
.Testing
Using the ONNX model submitted in #276, run a simple test script:
Resulting graph before fix
Resulting graph after fix
Note on redundant cast nodes
Note that after the fix, repeated conversion of the same model to fp16 will result in redundant cast node chains (e.g.
--fp32--> cast_to_fp16 --> cast_to_fp16 --> cast_to_fp16
). This is a result of theremove_unnecessary_cast_node
function using pattern matching heuristics instead of a true graph walking algorithm. While it is out of the scope of this PR, we could potentially implement an algorithm that inspects all "cast chains" in the graph and shorten them to a single "effective cast" operator.Issues closed
Closes #276