Hi all, I started playing with Mojo in the last few days. I wrote up a more comprehensive set of notes from my experience here: https://github.com/dsharlet/mojo_comments
One of the issues mentioned there is that, as best I understand it, each higher-order function you want to apply needs its own new layer of function to apply it to. A common pattern is that I want to tile two loops, vectorize_unroll x and unroll y in the inner tile, and parallelize the outer tile loop over y. This requires 4 higher-order functions, and 4 different functions to apply them to! For example:
    fn matmul_tile_output(
        C: Matrix, A: Matrix, B: Matrix, rt: Runtime
    ):
        @parameter
        fn calc_tile[tile_j: Int, tile_i: Int](jo: Int, io: Int):
            # Zero the output tile.
            for i in range(io, io + tile_i):
                for j in range(jo, jo + tile_j):
                    C.store[1](i, j, 0)
            for k in range(0, A.cols):
                @parameter
                fn calc_tile_row[i: Int]():
                    @parameter
                    fn calc_tile_cols[nelts: Int](j: Int):
                        C.store[nelts](io + i, jo + j, C.load[nelts](io + i, jo + j) + A[io + i, k] * B.load[nelts](k, jo + j))
                    vectorize_unroll[nelts, tile_j // nelts, calc_tile_cols](tile_j)
                unroll[tile_i, calc_tile_row]()
        alias tile_i = 4
        alias tile_j = nelts*4
        tile[calc_tile, tile_j, tile_i](C.cols, C.rows)
(I haven't parallelized the outer i loop yet, otherwise there would be 4 functions here, not 3.)
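For comparison, here is roughly the loop nest those nested functions are meant to produce, written as explicit loops. This is only a sketch in plain Python rather than Mojo: `matmul_tiled` is a hypothetical name, the matrices are lists of rows, and it assumes the tile sizes divide the output shape evenly. Each loop is annotated with the higher-order function it stands in for.

```python
def matmul_tiled(A, B, tile_i=2, tile_j=2):
    """Explicit-loop stand-in for the nested higher-order calls (hypothetical helper).

    A and B are lists of rows. Tile sizes are assumed to divide the output shape.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    assert rows % tile_i == 0 and cols % tile_j == 0
    # Zero-initialized output, like the "zero the output tile" step.
    C = [[0] * cols for _ in range(rows)]
    for io in range(0, rows, tile_i):        # tile: outer loop over y (the one to parallelize)
        for jo in range(0, cols, tile_j):    # tile: outer loop over x
            for k in range(inner):           # reduction loop, written by hand
                for i in range(tile_i):      # unroll: fixed trip count over tile rows
                    for j in range(tile_j):  # vectorize_unroll: contiguous run of tile columns
                        C[io + i][jo + j] += A[io + i][k] * B[k][jo + j]
    return C
```

Five loops, but only one statement of actual work in the middle, which is why needing a separate named function per level feels heavy.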
Lambdas seem like a possible workaround, but obviously this will get messy really quickly. It also requires the higher-order functions to understand ranges, not just sizes (which I think would be very helpful anyway); otherwise the body of the function itself has to be modified to add an offset. The lambda technique would require two lambdas to do this without modifying the code being tiled/parallelized!
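To make the range-versus-size point concrete, here is a small sketch in plain Python (not Mojo, and not the actual Mojo library signatures). All of the helpers are hypothetical: `unroll` takes only a size, `unroll_range` is an imagined range-aware variant, and `tile` assumes the tile size divides the extent. The body being tiled only knows about absolute indices and is never edited.

```python
def unroll(count, body):
    """Size-only higher-order function: calls body(0), ..., body(count - 1)."""
    for i in range(count):
        body(i)


def unroll_range(start, stop, body):
    """Range-aware variant (hypothetical): calls body(start), ..., body(stop - 1)."""
    for i in range(start, stop):
        body(i)


def tile(extent, size, body):
    """Calls body(offset) for each tile start. Assumes size divides extent."""
    for offset in range(0, extent, size):
        body(offset)


def make_scale_body(data, factor):
    """The code being tiled: scales one element, addressed by absolute index."""
    def body(i):
        data[i] *= factor
    return body


def run_with_offsets(data, factor, size):
    body = make_scale_body(data, factor)
    # Size-only inner function: a second lambda is needed just to re-add the tile offset.
    tile(len(data), size, lambda io: unroll(size, lambda i: body(io + i)))


def run_with_ranges(data, factor, size):
    body = make_scale_body(data, factor)
    # Range-aware inner function: the body is passed through unchanged.
    tile(len(data), size, lambda io: unroll_range(io, io + size, body))
```

Both drivers do the same work. The only difference is whether the offset arithmetic lives in an extra adapter lambda or inside the higher-order function itself.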