Task Fusion, Constant Conversion Optimization, and 27pt stencil benchmark #150
Open
shivsundram wants to merge 55 commits into nv-legate:branch-24.03 from shivsundram:shiv1/op_fusion3
Commits
c7575d8  double add fused op (shivsundram)
9c3879c  all inputs serialized for fused op (shivsundram)
0a5fae3  fusion via inlining, as well as function based fusion in fused_binary (shivsundram)
2b0b3b0  scalars reductions and opids, need to remove dynamic allocations (shivsundram)
c301582  fusion metadata passed via serialization now (shivsundram)
01f9956  reuse serializer (shivsundram)
72e03b3  some timing scripts, also stuff (shivsundram)
4232c3c  remove profiling files (shivsundram)
5c67081  re add in examples (shivsundram)
983985a  partial fusion
2ce8002  merge attempt 1
caac1ee  finishing merge
ba128b1  op registry working
5bc9d7b  add new fused dir
3c9698b  Update the package version for the release (marcinz)
02d8ce6  Fix #111 (magnatelee)
37273ae  Merge pull request #116 from magnatelee/typo-fix (magnatelee)
b698b33  Decrease relative tolerance in allclose for float16 values (marcinz)
ac314de  Revert "Decrease relative tolerance in allclose for float16 values" (marcinz)
4167c7a  Allow greater margin of error for tensordot with float16 (marcinz)
395ff6d  gpu fused op
465a044  reduction fix
65ffccf  merge
5d3eab1  merge again
615c95a  re add cuda fused
13f95ad  fixing fuse file
ce77d59  more fused stuff
00897a6  Merge branch 'shiv1/op_fusion2' of github.com:shivsundram/legate.nump…
0d60b82  last merge fixes
e840297  constant optimization
8eb7694  better constant opt
becf41a  batch syncs for black scholes
b52c7bd  fused op cleanup
8c97fe0  merging in new branch
1ae85c5  black scholes adjustment
7a58b6d  add missing header
5a38ef1  27 pt stencil
f230fcb  only do constant optimization for deferred arrays for some reason
7922c3d  remove old files, change to constant optimization
d5908e1  cleanup
dce6226  cleanup
1c9fd1f  cleanup
1665cb5  constant opt adjustment
e541cf8  merging
aedf22d  merging
a3dd95a  undo last change
1fe56d3  cleanup fused op
c8c69b8  more cleanup
4c104dd  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
b4302a3  omp changes
e265929  merge conflict
92d8590  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
7391628  one more merge conflict
00687d9  merge conflict
1783d65  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
@@ -0,0 +1,158 @@
+#!/usr/bin/env python
+
+# Copyright 2021 NVIDIA Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+import argparse
+import datetime
+import math
+
+from benchmark import run_benchmark
+
+import cunumeric as np
+
+
+def initialize(N):
+    print("Initializing stencil grid...")
+    grid = np.zeros((N + 2, N + 2, N + 2))
+    grid[:, :, 0] = -273.15
+    grid[:, 0, :] = -273.15
+    grid[0, :, :] = -273.15
+    grid[:, :, -1] = 273.15
+    grid[:, -1, :] = 273.15
+    grid[-1, :, :] = 273.15
+
+    return grid
+
+
+def run(grid, I, N):  # noqa: E741
+    print("Running Jacobi 27 stencil...")
+
+    # one
+    g000 = grid[0:-2, 0:-2, 0:-2]
+    g001 = grid[0:-2, 0:-2, 1:-1]
+    g002 = grid[0:-2, 0:-2, 2:]
+
+    g010 = grid[0:-2, 1:-1, 0:-2]
+    g011 = grid[0:-2, 1:-1, 1:-1]
+    g012 = grid[0:-2, 1:-1, 2:]
+
+    g020 = grid[0:-2, 2:, 0:-2]
+    g021 = grid[0:-2, 2:, 1:-1]
+    g022 = grid[0:-2, 2:, 2:]
+
+    # two
+    g100 = grid[1:-1, 0:-2, 0:-2]
+    g101 = grid[1:-1, 0:-2, 1:-1]
+    g102 = grid[1:-1, 0:-2, 2:]
+
+    g110 = grid[1:-1, 1:-1, 0:-2]
+    g111 = grid[1:-1, 1:-1, 1:-1]
+    g112 = grid[1:-1, 1:-1, 2:]
+
+    g120 = grid[1:-1, 2:, 0:-2]
+    g121 = grid[1:-1, 2:, 1:-1]
+    g122 = grid[1:-1, 2:, 2:]
+
+    # three
+    g200 = grid[2:, 0:-2, 0:-2]
+    g201 = grid[2:, 0:-2, 1:-1]
+    g202 = grid[2:, 0:-2, 2:]
+
+    g210 = grid[2:, 1:-1, 0:-2]
+    g211 = grid[2:, 1:-1, 1:-1]
+    g212 = grid[2:, 1:-1, 2:]
+
+    g220 = grid[2:, 2:, 0:-2]
+    g221 = grid[2:, 2:, 1:-1]
+    g222 = grid[2:, 2:, 2:]
+
+    for i in range(I):
+        g00 = g000 + g001 + g002
+        g01 = g010 + g011 + g012
+        g02 = g020 + g021 + g022
+        g10 = g100 + g101 + g102
+        g11 = g110 + g111 + g112
+        g12 = g120 + g121 + g122
+        g20 = g200 + g201 + g202
+        g21 = g210 + g211 + g212
+        g22 = g220 + g221 + g222
+
+        g0 = g00 + g01 + g02
+        g1 = g10 + g11 + g12
+        g2 = g20 + g21 + g22
+
+        res = g0 + g1 + g2
+        work = 0.037 * res
+        g111[:] = work
+    total = np.sum(g111)
+    return total / (N ** 2)
+
+
+def run_stencil(N, I, timing):  # noqa: E741
+    start = datetime.datetime.now()
+    grid = initialize(N)
+    average = run(grid, I, N)
+    # This will sync the timing because we will need to wait for the result
+    assert not math.isnan(average)
+    stop = datetime.datetime.now()
+    print("Average energy is %.8g" % average)
+    delta = stop - start
+    total = delta.total_seconds() * 1000.0
+    if timing:
+        print("Elapsed Time: " + str(total) + " ms")
+    return total
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "-i",
+        "--iter",
+        type=int,
+        default=100,
+        dest="I",
+        help="number of iterations to run",
+    )
+    parser.add_argument(
+        "-n",
+        "--num",
+        type=int,
+        default=100,
+        dest="N",
+        help="number of elements in one dimension",
+    )
+    parser.add_argument(
+        "-t",
+        "--time",
+        dest="timing",
+        action="store_true",
+        help="perform timing",
+    )
+    parser.add_argument(
+        "-b",
+        "--benchmark",
+        type=int,
+        default=1,
+        dest="benchmark",
+        help="number of times to benchmark this application (default 1 "
+        "- normal execution)",
+    )
+    args = parser.parse_args()
+    run_benchmark(
+        run_stencil, args.benchmark, "Stencil", (args.N, args.I, args.timing)
+    )
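The 27 named slices in `run` follow a regular pattern: `gXYZ` is the interior of the grid shifted by offsets X, Y, Z in {0, 1, 2} along the three axes. The sketch below restates one iteration of that update as a loop over offsets. It is an illustration only: it uses plain NumPy instead of `cunumeric`, the helper names `stencil_views` and `jacobi27_step` are hypothetical and do not appear in the PR, and it does not reproduce the benchmark's grouped order of additions (which may matter for how operations are fused).

```python
import itertools

import numpy as np


def stencil_views(grid):
    """Return the 27 shifted interior views of a 3D grid with a one-cell halo.

    Hypothetical helper: views[(1, 1, 1)] corresponds to g111 in the benchmark.
    """
    nz, ny, nx = (s - 2 for s in grid.shape)
    views = {}
    for dz, dy, dx in itertools.product(range(3), repeat=3):
        # Offset 0 -> [0:-2], offset 1 -> [1:-1], offset 2 -> [2:]
        views[(dz, dy, dx)] = grid[dz:dz + nz, dy:dy + ny, dx:dx + nx]
    return views


def jacobi27_step(grid, weight=0.037):
    """Apply one 27-point update in place on the interior of grid."""
    views = stencil_views(grid)
    # The sum is materialized in a fresh array before the interior is
    # overwritten, so every output cell sees the previous iteration's values.
    res = sum(views.values())
    views[(1, 1, 1)][:] = weight * res


if __name__ == "__main__":
    n = 4
    grid = np.zeros((n + 2, n + 2, n + 2))
    grid[:, :, 0] = grid[:, 0, :] = grid[0, :, :] = -273.15
    grid[:, :, -1] = grid[:, -1, :] = grid[-1, :, :] = 273.15
    jacobi27_step(grid)
    print(grid[1:-1, 1:-1, 1:-1].mean())
```

The weight 0.037 is roughly 1/27, so each interior cell becomes approximately the mean of its 3x3x3 neighbourhood. Because the views alias `grid`, writing through `views[(1, 1, 1)]` updates the grid itself, just as `g111[:] = work` does in the benchmark.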
The above code is a scalar constant optimization: it avoids dispatching `CONVERT` operations for a scalar constant, since the constant's value is embedded in the code and is therefore already known.
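The code this comment refers to is not reproduced on this page, so the following is only a toy model of the idea, not the PR's implementation. Every name in it (`ToyRuntime`, `ToyArray`, `multiply_naive`, `multiply_constant_opt`) is hypothetical. It assumes that, without the optimization, a scalar operand such as the `0.037` in `0.037 * res` is wrapped in its own type and then cast by a dispatched `CONVERT` task, whereas with the optimization the cast happens on the host because the value is already known.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRuntime:
    """Hypothetical stand-in for a task runtime; it only records dispatched ops."""

    dispatched: list = field(default_factory=list)

    def dispatch(self, op_name):
        self.dispatched.append(op_name)


@dataclass
class ToyArray:
    """Hypothetical array: a list of values plus an element type."""

    values: list
    dtype: type


def convert(runtime, arr, dtype):
    """Cast an array by dispatching a CONVERT task."""
    runtime.dispatch("CONVERT")
    return ToyArray([dtype(v) for v in arr.values], dtype)


def multiply_naive(runtime, scalar, arr):
    """Wrap the scalar in its own type, then cast it with a dispatched task."""
    wrapped = ToyArray([scalar], type(scalar))
    if wrapped.dtype is not arr.dtype:
        wrapped = convert(runtime, wrapped, arr.dtype)
    runtime.dispatch("MULTIPLY")
    return ToyArray([wrapped.values[0] * v for v in arr.values], arr.dtype)


def multiply_constant_opt(runtime, scalar, arr):
    """Cast the scalar on the host; its value is known, so no CONVERT task."""
    constant = arr.dtype(scalar)
    runtime.dispatch("MULTIPLY")
    return ToyArray([constant * v for v in arr.values], arr.dtype)


if __name__ == "__main__":
    data = ToyArray([1.0, 2.0], float)
    naive_rt, opt_rt = ToyRuntime(), ToyRuntime()
    multiply_naive(naive_rt, 2, data)
    multiply_constant_opt(opt_rt, 2, data)
    print(naive_rt.dispatched, opt_rt.dispatched)
```

In this model both paths compute the same result, but the optimized path dispatches one operation per multiply instead of two. Inside an iteration loop like the one in the stencil benchmark, that would save one dispatched task per iteration for each scalar constant.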