-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lower polygeist.subindex through memref.reinterpret_cast #147
base: main
Are you sure you want to change the base?
Conversation
092c1ad
to
3e0cdb7
Compare
// CHECK: %[[VAL_1:.*]] = memref.alloca() : memref<30x30xi32> | ||
// CHECK: %[[VAL_2:.*]] = arith.constant 30 : index | ||
// CHECK: %[[VAL_3:.*]] = arith.muli %[[VAL_0]], %[[VAL_2]] : index | ||
// CHECK: %[[VAL_4:.*]] = memref.reinterpret_cast %[[VAL_1]] to offset: {{\[}}%[[VAL_3]]], sizes: [30], strides: [1] : memref<30x30xi32> to memref<30xi32> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to extract the previous stride and then add this new value. Lest this not work if you have two such operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been done in the updated commit (see the new test for reference). Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm not understanding, or I'm still not seeing it.
Can you add a subindex of subindex test (where both subindices are just offsets, rather than rank reducing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, could you have the memref argument be passed in as an argument?
In other words I would've expected %[[VAL_3:.*]]
to set the new offset = oldoffset + index * dimsize, whereas it is currently just index * dimsize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a subindex of subindex test (where both subindices are just offsets, rather than rank reducing?
I've added an extra test for the case of a non-rank reducing subindex operation. Again, this initial PR only covers the case of statically sized memrefs (which takes care of the most pressing issues on my sides). subindex to subindex of statically sized memories should therefore hold transitively.
Additionally, could you have the memref argument be passed in as an argument?
In other words I would've expected %[[VAL_3:.*]] to set the new offset = oldoffset + index * dimsize, whereas it is currently just index * dimsize
Not sure i understand what you mean by this. Could you show a snippet of the IR that you expect here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my worry is that you don't appear to be adding to the offset (e.g. you are setting the new offset). Suppose you have two subindex's in a row, the total offset should include terms from both subindex operations. Is that the case presently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless i am misunderstanding the reinterpret_cast operation, the offset is included in the value that the reinterpret_cast returns. So when two subindex operations are in a row, the first one will be converted to a reinterpret cast that has the same semantics as the initial subindex operation. Lowering the second is therefore independent of this.
But again an IR example of expected behaviour here might clarify any confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
llvm.mlir.global internal constant @str0("data[%d][%d]=%d\0A\00")
func @set(%arg0: memref<?xi32>) attributes {llvm.linkage = #llvm.linkage<external>} {
%c3_i32 = arith.constant 3 : i32
affine.store %c3_i32, %arg0[0] : memref<?xi32>
return
}
func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
%c3_i32 = arith.constant 3 : i32
%c2 = arith.constant 2 : index
%c1 = arith.constant 1 : index
%c0_i32 = arith.constant 0 : i32
%c0 = arith.constant 0 : index
%0 = llvm.mlir.undef : i32
%1 = memref.alloca() : memref<2x3xi32>
affine.store %c0_i32, %1[0, 0] : memref<2x3xi32>
affine.store %c0_i32, %1[0, 1] : memref<2x3xi32>
affine.store %c0_i32, %1[0, 2] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 0] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 1] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 2] : memref<2x3xi32>
%2 = "polygeist.subindex"(%1, %c1) : (memref<2x3xi32>, index) -> memref<?xi32>
%3 = "polygeist.subindex"(%2, %c2) : (memref<?xi32>, index) -> memref<?xi32>
affine.store %c3_i32, %3[0] : memref<?xi32>
scf.for %arg0 = %c0 to %c2 step %c1 {
%4 = arith.index_cast %arg0 : index to i32
scf.for %arg1 = %c0 to %c2 step %c1 {
%5 = arith.index_cast %arg1 : index to i32
%6 = llvm.mlir.addressof @str0 : !llvm.ptr<array<17 x i8>>
%7 = llvm.getelementptr %6[%c0_i32, %c0_i32] : (!llvm.ptr<array<17 x i8>>, i32, i32) -> !llvm.ptr<i8>
%8 = memref.load %1[%arg0, %arg1] : memref<2x3xi32>
%9 = llvm.call @printf(%7, %4, %5, %8) : (!llvm.ptr<i8>, i32, i32, i32) -> i32
}
}
return %0 : i32
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such cases are not covered by the PR due to the dynamically sized memrefs.
Again, this PR intends to lower polygeist.subindex
operations with static memrefs. By having this, even if it is only a subset of all possible polygeist.subindex
operations that are converted, we still expand the set of C programs that Polygeist can emit using upstream MLIR dialect operations. It is vital that the Polygeist dialect operations are converted for any downstream tools to be able to consume the output IR.
I am not excluding that I'll in the future look more closely into how polygeist.subindex
operations with dynamically shaped memrefs can be lowered into something meaningful, but for now, our usecase requires statically shaped memrefs and as such that is the most pressing issue to have resolved in Polygeist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh apologies, let me make that a statically sized subindex in that case (also regardless running the pass on the above crashes, which shuoldn't happen...)
Regardless, my concern here is (perhaps because of unfamiliarity with reinterpret_cast) that one of the offsets will be lost.
module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
llvm.mlir.global internal constant @str0("data[%d][%d]=%d\0A\00")
func @set(%arg0: memref<?xi32>) attributes {llvm.linkage = #llvm.linkage<external>} {
%c3_i32 = arith.constant 3 : i32
affine.store %c3_i32, %arg0[0] : memref<?xi32>
return
}
func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
%c3_i32 = arith.constant 3 : i32
%c2 = arith.constant 2 : index
%c1 = arith.constant 1 : index
%c0_i32 = arith.constant 0 : i32
%c0 = arith.constant 0 : index
%0 = llvm.mlir.undef : i32
%1 = memref.alloca() : memref<2x3xi32>
affine.store %c0_i32, %1[0, 0] : memref<2x3xi32>
affine.store %c0_i32, %1[0, 1] : memref<2x3xi32>
affine.store %c0_i32, %1[0, 2] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 0] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 1] : memref<2x3xi32>
affine.store %c0_i32, %1[1, 2] : memref<2x3xi32>
%2 = "polygeist.subindex"(%1, %c1) : (memref<2x3xi32>, index) -> memref<3xi32>
%3 = "polygeist.subindex"(%2, %c2) : (memref<3xi32>, index) -> memref<?xi32>
affine.store %c3_i32, %3[0] : memref<?xi32>
scf.for %arg0 = %c0 to %c2 step %c1 {
%4 = arith.index_cast %arg0 : index to i32
scf.for %arg1 = %c0 to %c2 step %c1 {
%5 = arith.index_cast %arg1 : index to i32
%6 = llvm.mlir.addressof @str0 : !llvm.ptr<array<17 x i8>>
%7 = llvm.getelementptr %6[%c0_i32, %c0_i32] : (!llvm.ptr<array<17 x i8>>, i32, i32) -> !llvm.ptr<i8>
%8 = memref.load %1[%arg0, %arg1] : memref<2x3xi32>
%9 = llvm.call @printf(%7, %4, %5, %8) : (!llvm.ptr<i8>, i32, i32, i32) -> i32
}
}
return %0 : i32
}
}
3e0cdb7
to
120bc76
Compare
afc0b95
to
ecbd27b
Compare
@wsmoses mind taking another look at this? |
ecbd27b
to
dab65e8
Compare
This should be a (hopefully) foolproof method of performing indexing into a memref. A reintrepret_cast is inserted with a dynamic index calculated from the subindex index operand + the product of the sizes of the target type. This has been added as a separate conversion pass instead of through the canonicalization drivers. When added as a canonicalization, the conversion may preemptively apply, resulting in sub-par IR. Nevertheless, i think it has its merits to have a polygeist op lowering pass which can be used as a fallback to convert the dialect operations, if canonicalization fails. For now, just added support for statically shaped memrefs (enough to fix the regression on my side) but should be possible for dynamically shaped as well.
ff96f5c
to
524ccc8
Compare
This should be a (hopefully) foolproof method of performing indexing into a memref. A reintrepret_cast is inserted with a dynamic index calculated from the subindex index operand + the product of the sizes of the target type.
This has been added as a separate conversion pass instead of through the canonicalization drivers. When added as a canonicalization, the conversion may preemptively apply, resulting in sub-par IR. Nevertheless, i think it has its merits to have a polygeist op lowering pass which can be used as a fallback to convert the dialect operations, if canonicalization fails.
For now, just added support for statically shaped memrefs (enough to fix the regression on my side) but should be possible for dynamically shaped as well.
Sort of fixes #141