Composability of higher order functions #736

dsharlet · 2023-09-11T19:43:42Z

Hi all, I started playing with Mojo in the last few days. I wrote up a more comprehensive set of notes from my experience here: https://github.com/dsharlet/mojo_comments

One of the issues mentioned there is that, as best I understand it, each higher order function needs its own new nested function to be applied to. A common pattern is to tile two loops, apply vectorize_unroll to x and unroll to y within the inner tile, and parallelize the outer tile loop over y. This requires 4 higher order functions, and 4 different functions to apply them to! For example:

fn matmul_tile_output(
    C: Matrix, A: Matrix, B: Matrix, rt: Runtime
):

  # Function 1: computes one tile_i x tile_j tile of C; passed to tile.
  @parameter
  fn calc_tile[tile_j: Int, tile_i: Int](jo: Int, io: Int):
    # Zero the output tile.
    for i in range(io, io + tile_i):
      for j in range(jo, jo + tile_j):
        C.store[1](i, j, 0)

    for k in range(0, A.cols):
      # Function 2: one row of the tile; passed to unroll.
      @parameter
      fn calc_tile_row[i: Int]():
        # Function 3: nelts columns of that row; passed to vectorize_unroll.
        @parameter
        fn calc_tile_cols[nelts: Int](j: Int):
          C.store[nelts](
            io + i, jo + j,
            C.load[nelts](io + i, jo + j) + A[io + i, k] * B.load[nelts](k, jo + j)
          )

        vectorize_unroll[nelts, tile_j // nelts, calc_tile_cols](tile_j)

      unroll[tile_i, calc_tile_row]()

  # nelts is defined outside this snippet.
  alias tile_i = 4
  alias tile_j = nelts*4
  tile[calc_tile, tile_j, tile_i](C.cols, C.rows)

(I haven't parallelized the outer i loop yet; once I do, there will be 4 functions here, not 3.)

Lambdas seem like a possible workaround, e.g.:

@parameter
fn calc_tile[tile_w: Int, tile_h: Int](xo: Int, yo: Int):
  ...

parallelize[
  lambda yo: tile[calc_tile, tile_w, tile_h](a.width, range(yo*tile_h, (yo + 1)*tile_h))
](a.height // tile_h)

But obviously this will get messy really quickly. It also requires the higher order functions to understand ranges, not just sizes (which I think would be very helpful anyway). Otherwise, the body of the function itself has to be modified to add an offset. Even with the above technique, it would take two lambdas to do this without modifying the code being tiled/parallelized!
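
To make the point about ranges concrete, here is a minimal sketch in plain Python rather than Mojo, since the lambda-based form above is only a proposal. Nothing here is Mojo's actual API: `tile`, `parallelize`, and `visited_tiles` are hypothetical stand-ins, `tile` is assumed to accept ranges instead of sizes, and `parallelize` runs its tasks sequentially just to show the call structure.

```python
from typing import Callable, List, Tuple


def tile(body: Callable[[int, int], None],
         tile_w: int, tile_h: int,
         xs: range, ys: range) -> None:
    """Hypothetical range-aware tile: call body(xo, yo) for each tile origin."""
    for yo in range(ys.start, ys.stop, tile_h):
        for xo in range(xs.start, xs.stop, tile_w):
            body(xo, yo)


def parallelize(task: Callable[[int], None], num_tasks: int) -> None:
    """Sequential stand-in for a parallel loop: call task(i) for each i."""
    for i in range(num_tasks):
        task(i)


def visited_tiles(width: int, height: int,
                  tile_w: int, tile_h: int) -> List[Tuple[int, int]]:
    """Record the tile origins reached when parallelize is composed with tile."""
    origins: List[Tuple[int, int]] = []

    def calc_tile(xo: int, yo: int) -> None:
        # The body only sees absolute tile origins; it adds no offsets itself.
        origins.append((xo, yo))

    # Each task handles one strip of tile_h rows, expressed as a range of y.
    parallelize(
        lambda strip: tile(calc_tile, tile_w, tile_h,
                           range(width),
                           range(strip * tile_h, (strip + 1) * tile_h)),
        height // tile_h,
    )
    return origins


print(visited_tiles(8, 4, 4, 2))  # [(0, 0), (4, 0), (0, 2), (4, 2)]
```

Because `tile` takes a range for y, the strip offset lives entirely in the lambda and `calc_tile` stays untouched. With a size-only `tile`, the offset would have to be added inside the body or through a second wrapper.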


Replies: 0 comments
