DaCe 0.14.3
What's Changed
Scope Schedules
The schedule type of a scope (e.g., a Map) is now also determined by the surrounding storage. If the surrounding storage is ambiguous, DaCe raises a descriptive exception. This means that code such as the example below:
import dace

@dace.program
def add(a: dace.float32[10, 10] @ dace.StorageType.GPU_Global,
        b: dace.float32[10, 10] @ dace.StorageType.GPU_Global):
    return a + b @ b
will now automatically run the + and @ operators on the GPU.
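The idea of deriving a schedule from the surrounding storage, and failing when the storage is ambiguous, can be illustrated with a small toy model. This is a minimal sketch in plain Python, not DaCe's implementation: the `Storage` and `Schedule` enums, the mapping between them, and `infer_schedule` are all hypothetical names introduced only for illustration.

```python
from enum import Enum, auto


class Storage(Enum):
    """Hypothetical stand-in for a storage type (not dace.StorageType)."""
    CPU_HEAP = auto()
    GPU_GLOBAL = auto()


class Schedule(Enum):
    """Hypothetical stand-in for a schedule type (not dace.ScheduleType)."""
    CPU = auto()
    GPU_DEVICE = auto()


# Assumed mapping from where data lives to the schedule that can access it.
STORAGE_TO_SCHEDULE = {
    Storage.CPU_HEAP: Schedule.CPU,
    Storage.GPU_GLOBAL: Schedule.GPU_DEVICE,
}


def infer_schedule(surrounding_storage):
    """Pick a schedule for a scope from the storage of the data around it."""
    candidates = {STORAGE_TO_SCHEDULE[s] for s in surrounding_storage}
    if len(candidates) != 1:
        # Zero or several candidates: refuse to guess and report the options.
        names = sorted(c.name for c in candidates)
        raise ValueError(f"Cannot infer a unique schedule, candidates: {names}")
    return candidates.pop()


# Both operands live in GPU global memory, so the scope is scheduled on the GPU.
gpu_schedule = infer_schedule([Storage.GPU_GLOBAL, Storage.GPU_GLOBAL])
```

Mixing `Storage.CPU_HEAP` and `Storage.GPU_GLOBAL` in the same call raises `ValueError` in this sketch, mirroring the exception raised for ambiguous storage.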
DaCe Profiler
Easier interface for profiling applications: dace.profile and dace.instrument can now be used within Python with a simple API:
with dace.profile(repetitions=100) as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print all execution times of the last called program (other_program)
print(profiler.times[-1])
Where instrumentation is applied can be controlled with filters in the form of strings and wildcards, or with a function:
with dace.instrument(dace.InstrumentationType.GPU_Events,
                     filter='*add??') as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print instrumentation report for last call
print(profiler.reports[-1])
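To get a feel for what a wildcard filter such as '*add??' selects, the pattern can be tried against a list of names with Python's standard `fnmatch` module. This is only a sketch: it assumes the filter follows the usual glob conventions (`*` matches any run of characters, `?` exactly one), and the element names below are made up for illustration.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical element names; real names depend on the program being instrumented.
names = ["vector_add_0", "add01", "my_add12", "multiply", "add"]

# '*add??' requires "add" followed by exactly two more characters at the end.
selected = select(names, "*add??")
print(selected)
```

Under these assumptions, `"add"` is not selected because nothing follows it, while `"vector_add_0"` is selected because `_0` fills the two `?` positions.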
With dace.builtin_hooks.instrument_data, the same technique can be applied to instrument data containers.
Improved Data Instrumentation
Data container instrumentation can now also be used conditionally, so that data container contents are saved and restored only if certain conditions are met. In addition, data instrumentation now saves the SDFG's symbol values at the time of dumping data, allowing an entire SDFG's state / context to be restored from data reports.
Restricted SSA for Scalars and Symbols
Two new passes (ScalarFission and StrictSymbolSSA) allow fissioning of scalar data containers (or arrays of size 1) and symbols into separate containers and symbols respectively, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.
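The renaming idea can be illustrated on straight-line code, where every read is reached by exactly one write and no Phi-nodes are ever needed. The following is a toy sketch in plain Python, not the ScalarFission or StrictSymbolSSA implementation: the statement encoding and the `fission_scalars` helper are hypothetical and ignore control flow entirely.

```python
def fission_scalars(statements):
    """Give every write its own versioned name.

    `statements` is a list of (target, operands) pairs in execution order.
    """
    version = {}  # name -> index of the most recent write seen so far
    result = []
    for target, operands in statements:
        # Reads refer to the latest write of each operand; names that were
        # never written here are treated as inputs and left unchanged.
        renamed = [f"{op}_{version[op]}" if op in version else op
                   for op in operands]
        version[target] = version.get(target, -1) + 1
        result.append((f"{target}_{version[target]}", renamed))
    return result


program = [
    ("tmp", ["a", "b"]),       # tmp = f(a, b)
    ("out", ["tmp"]),          # out = g(tmp)
    ("tmp", ["c"]),            # tmp = h(c), an independent second write
    ("res", ["tmp", "out"]),   # res = k(tmp, out)
]

fissioned = fission_scalars(program)
for target, operands in fissioned:
    print(target, "<-", operands)
```

After renaming, the two writes to `tmp` become `tmp_0` and `tmp_1`, which makes it explicit that they are independent and could live in separate containers.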
Extending Cutout Capabilities
SDFG Cutouts can now be taken from more than one state.
Additionally, taking cutouts that only access a subset of a data container (e.g., A[2:5] from a data container A of size N) results in the cutout receiving an "Alibi Node" to represent only that subset of the data (A_cutout[0:3] -> A[2:5], where A_cutout is of size 4). This allows cutouts to be significantly smaller and have a smaller memory footprint, simplifying debugging and localized optimization.
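The effect of representing only the accessed subset can be sketched with plain Python lists. This is an illustration, not the cutout implementation: it reads the ranges above as inclusive on both ends (an assumption, suggested by the stated size of 4), and `make_cutout_view` and `write_back` are hypothetical helpers.

```python
def make_cutout_view(full, first, last):
    """Copy the inclusive range full[first..last] into a zero-based container."""
    cutout = full[first:last + 1]
    return cutout, first  # the offset maps cutout index i back to full[first + i]


def write_back(full, cutout, offset):
    """Copy the cutout's contents back into the matching range of the original."""
    full[offset:offset + len(cutout)] = cutout


A = list(range(100, 110))                  # original container, N = 10
A_cutout, offset = make_cutout_view(A, 2, 5)

# The cutout holds 4 elements instead of 10; index 0 corresponds to A[2].
A_cutout[0] = -1
write_back(A, A_cutout, offset)
print(len(A_cutout), A)
```

Only the small container has to be allocated and inspected while working on the cutout, which is where the reduced memory footprint comes from.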
Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data before the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything that a cutout writes to, that may be read externally or may influence the behavior of the remaining SDFG. This allows isolating all side effects of changes to a particular cutout, allowing transformations to be tested and verified in isolation and simplifying debugging.
Bug Fixes, Compatibility Improvements, and Other Changes
- SymPy 1.12 Compatibility by @alexnick83 in #1256
- GPU Grid-Strided Tiling by @C-TC in #1249
- Fix MapInterchange for Maps with dynamic inputs by @alexnick83 in #1244
- Assortment of fixes for dynamic Maps on GPU (dynamic thread blocks) by @alexnick83 in #1246
- Tuning Compatibility Fixes by @lukastruemper in #1234
- Inline preprocessor command by @tbennun in #1242
- unsqueeze_memlet fixes by @alexnick83 in #1203
- Fix-intermediate-nodes by @alexnick83 in #1212
- Fix for LoopToMap when applied on multi-nested loops by @alexnick83 in #1207
- Fix-nested-sdfg-deepcopy by @alexnick83 in #1221
- Fix integer division in Python frontend by @tbennun in #1196
- Fix augmented assignment on scalar in condition by @tbennun in #1225
- Fix internal subscript access if already existed by @tbennun in #1228
- Fix atomic operation detection for exactly-overlapping ranges by @tbennun in #1230
- Fix-gpu-transform-copy-out by @alexnick83 in #1231
- Fix-interstate-free-symbols by @alexnick83 in #1238
- Fix nested access with nested symbol dependency by @alexnick83 in #1239
- Fix import in the transformations tutorial. by @lamyiowce in #1210
- LoopToMap detects shared transients by @alexnick83 in #1200
- Faster CI and reachability checks for codecov.io by @tbennun in #1213
- Map-fission-single-data-multi-connectors by @alexnick83 in #1216
- Add library path to HIP CMake by @tbennun in #1219
- BatchedMatMul: MKL gemm_batch support by @lukastruemper in #1181
Full Changelog: v0.14.2...v0.14.3
Please let us know if there are any regressions with this new release.