Lower polygeist.subindex through memref.reinterpret_cast #147

Open · mortbopet wants to merge 6 commits into main from subindex_lowering2
Conversation

@mortbopet (Contributor) commented on Jan 7, 2022

This should be a (hopefully) foolproof method of performing indexing into a memref. A reinterpret_cast is inserted with a dynamic offset calculated from the subindex index operand multiplied by the product of the sizes of the target type.

This has been added as a separate conversion pass rather than through the canonicalization drivers. When run as a canonicalization, the conversion may apply preemptively, resulting in sub-par IR. Nevertheless, I think there is merit in having a Polygeist op lowering pass that can be used as a fallback to convert the dialect operations if canonicalization fails.

For now, I have only added support for statically shaped memrefs (enough to fix the regression on my side), but it should be possible to handle dynamically shaped memrefs as well.
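
As a rough before/after sketch of the lowering on a statically shaped memref (SSA names are illustrative; the shape matches the new test):

  %sub = "polygeist.subindex"(%buf, %idx) : (memref<30x30xi32>, index) -> memref<30xi32>

becomes

  %c30 = arith.constant 30 : index
  %off = arith.muli %idx, %c30 : index
  %cast = memref.reinterpret_cast %buf to offset: [%off], sizes: [30], strides: [1] : memref<30x30xi32> to memref<30xi32>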

Sort of fixes #141

@mortbopet marked this pull request as draft January 7, 2022 10:27
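
// Note on the new test below: %[[VAL_3]] is the dynamic offset passed to memref.reinterpret_cast, computed as the subindex index %[[VAL_0]] times the inner dimension size (30).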
// CHECK: %[[VAL_1:.*]] = memref.alloca() : memref<30x30xi32>
// CHECK: %[[VAL_2:.*]] = arith.constant 30 : index
// CHECK: %[[VAL_3:.*]] = arith.muli %[[VAL_0]], %[[VAL_2]] : index
// CHECK: %[[VAL_4:.*]] = memref.reinterpret_cast %[[VAL_1]] to offset: {{\[}}%[[VAL_3]]], sizes: [30], strides: [1] : memref<30x30xi32> to memref<30xi32>
Member

I think you may need to extract the previous stride and then add this new value; otherwise this will not work if you have two such operations.

Contributor Author

This should have been done in the updated commit (see the new test for reference). Let me know what you think!

Member

Perhaps I'm not understanding, or I'm still not seeing it.

Can you add a subindex-of-subindex test (where both subindices are just offsets, rather than rank-reducing)?

Member

Additionally, could you have the memref be passed in as a function argument?

In other words, I would've expected %[[VAL_3:.*]] to set the new offset = oldoffset + index * dimsize, whereas it is currently just index * dimsize.

Contributor Author

> Can you add a subindex-of-subindex test (where both subindices are just offsets, rather than rank-reducing)?

I've added an extra test for the case of a non-rank-reducing subindex operation. Again, this initial PR only covers the case of statically sized memrefs (which takes care of the most pressing issues on my side); subindex of subindex for statically sized memrefs should therefore hold transitively.

> Additionally, could you have the memref be passed in as a function argument?
> In other words, I would've expected %[[VAL_3:.*]] to set the new offset = oldoffset + index * dimsize, whereas it is currently just index * dimsize.

I'm not sure I understand what you mean by this. Could you show a snippet of the IR that you expect here?

Member

I guess my worry is that you don't appear to be adding to the offset (i.e. you are setting the new offset). Suppose you have two subindex operations in a row: the total offset should include terms from both subindex operations. Is that the case presently?

Contributor Author

Unless I am misunderstanding the reinterpret_cast operation, the offset is included in the value that the reinterpret_cast returns. So when two subindex operations are in a row, the first one will be converted to a reinterpret_cast that has the same semantics as the initial subindex operation; lowering the second is therefore independent of the first.

But again, an IR example of the expected behaviour here might clarify any confusion.

Member

module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
  llvm.mlir.global internal constant @str0("data[%d][%d]=%d\0A\00")
  func @set(%arg0: memref<?xi32>) attributes {llvm.linkage = #llvm.linkage<external>} {
    %c3_i32 = arith.constant 3 : i32
    affine.store %c3_i32, %arg0[0] : memref<?xi32>
    return
  }
  func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c3_i32 = arith.constant 3 : i32
    %c2 = arith.constant 2 : index
    %c1 = arith.constant 1 : index
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %0 = llvm.mlir.undef : i32
    %1 = memref.alloca() : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 0] : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 1] : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 2] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 0] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 1] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 2] : memref<2x3xi32>
    %2 = "polygeist.subindex"(%1, %c1) : (memref<2x3xi32>, index) -> memref<?xi32>
    %3 = "polygeist.subindex"(%2, %c2) : (memref<?xi32>, index) -> memref<?xi32>
    affine.store %c3_i32, %3[0] : memref<?xi32>
    scf.for %arg0 = %c0 to %c2 step %c1 {
      %4 = arith.index_cast %arg0 : index to i32
      scf.for %arg1 = %c0 to %c2 step %c1 {
        %5 = arith.index_cast %arg1 : index to i32
        %6 = llvm.mlir.addressof @str0 : !llvm.ptr<array<17 x i8>>
        %7 = llvm.getelementptr %6[%c0_i32, %c0_i32] : (!llvm.ptr<array<17 x i8>>, i32, i32) -> !llvm.ptr<i8>
        %8 = memref.load %1[%arg0, %arg1] : memref<2x3xi32>
        %9 = llvm.call @printf(%7, %4, %5, %8) : (!llvm.ptr<i8>, i32, i32, i32) -> i32
      }
    }
    return %0 : i32
  }
}

Contributor Author

Such cases are not covered by this PR, since they involve dynamically shaped memrefs.

Again, this PR intends to lower polygeist.subindex operations on statically shaped memrefs. Even if this only converts a subset of all possible polygeist.subindex operations, it still expands the set of C programs that Polygeist can emit using upstream MLIR dialect operations. It is vital that the Polygeist dialect operations are converted for any downstream tools to be able to consume the output IR.
I do not rule out looking more closely in the future at how polygeist.subindex operations with dynamically shaped memrefs can be lowered into something meaningful, but for now our use case requires statically shaped memrefs, so that is the most pressing issue to have resolved in Polygeist.

Member

Oh, apologies, let me make that a statically sized subindex in that case (also, regardless, running the pass on the above crashes, which shouldn't happen...).

Regardless, my concern here is (perhaps because of unfamiliarity with reinterpret_cast) that one of the offsets will be lost.

module attributes {llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
  llvm.mlir.global internal constant @str0("data[%d][%d]=%d\0A\00")
  func @set(%arg0: memref<?xi32>) attributes {llvm.linkage = #llvm.linkage<external>} {
    %c3_i32 = arith.constant 3 : i32
    affine.store %c3_i32, %arg0[0] : memref<?xi32>
    return
  }
  func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c3_i32 = arith.constant 3 : i32
    %c2 = arith.constant 2 : index
    %c1 = arith.constant 1 : index
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %0 = llvm.mlir.undef : i32
    %1 = memref.alloca() : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 0] : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 1] : memref<2x3xi32>
    affine.store %c0_i32, %1[0, 2] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 0] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 1] : memref<2x3xi32>
    affine.store %c0_i32, %1[1, 2] : memref<2x3xi32>
    %2 = "polygeist.subindex"(%1, %c1) : (memref<2x3xi32>, index) -> memref<3xi32>
    %3 = "polygeist.subindex"(%2, %c2) : (memref<3xi32>, index) -> memref<?xi32>
    affine.store %c3_i32, %3[0] : memref<?xi32>
    scf.for %arg0 = %c0 to %c2 step %c1 {
      %4 = arith.index_cast %arg0 : index to i32
      scf.for %arg1 = %c0 to %c2 step %c1 {
        %5 = arith.index_cast %arg1 : index to i32
        %6 = llvm.mlir.addressof @str0 : !llvm.ptr<array<17 x i8>>
        %7 = llvm.getelementptr %6[%c0_i32, %c0_i32] : (!llvm.ptr<array<17 x i8>>, i32, i32) -> !llvm.ptr<i8>
        %8 = memref.load %1[%arg0, %arg1] : memref<2x3xi32>
        %9 = llvm.call @printf(%7, %4, %5, %8) : (!llvm.ptr<i8>, i32, i32, i32) -> i32
      }
    }
    return %0 : i32
  }
}
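
For concreteness, %3 above should alias %1[1, 2], i.e. flat offset 1 * 3 + 2 = 5, so the second lowered cast must fold in the offset produced by the first. A hypothetical composed lowering (illustrative names and sizes, mirroring the form used in the new tests):

  %c3 = arith.constant 3 : index
  %off0 = arith.muli %c1, %c3 : index    // first subindex: 1 * 3 = 3
  %s2 = memref.reinterpret_cast %1 to offset: [%off0], sizes: [3], strides: [1] : memref<2x3xi32> to memref<3xi32>
  %off1 = arith.addi %off0, %c2 : index  // second subindex: 3 + 2 = 5; the previous offset must be included
  %s3 = memref.reinterpret_cast %1 to offset: [%off1], sizes: [%c1], strides: [1] : memref<2x3xi32> to memref<?xi32>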

@mortbopet force-pushed the subindex_lowering2 branch 2 times, most recently from afc0b95 to ecbd27b on January 29, 2022 08:54
@mortbopet marked this pull request as ready for review January 29, 2022 08:54
@mortbopet requested a review from wsmoses January 29, 2022 08:54
@mortbopet (Contributor Author)

@wsmoses mind taking another look at this?

mortbopet added 6 commits May 24, 2022 11:01
@mortbopet force-pushed the subindex_lowering2 branch from ff96f5c to 524ccc8 on May 24, 2022 15:34