Release DaCe 0.14.3 · spcl/dace

What's Changed

Scope Schedules

The schedule type of a scope (e.g., a Map) is now also inferred from the surrounding storage. If the surrounding storage is ambiguous, DaCe fails with a descriptive exception. This means that code such as the following:

import dace

@dace.program
def add(a: dace.float32[10, 10] @ dace.StorageType.GPU_Global,
        b: dace.float32[10, 10] @ dace.StorageType.GPU_Global):
    return a + b @ b

will now automatically run the + and @ operators on the GPU.
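
The new rule can be pictured with the simplified model below. It is a hypothetical sketch, not DaCe's implementation: the `STORAGE_TO_SCHEDULE` table, the `infer_schedule` helper, and `AmbiguousScheduleError` are invented for illustration, and plain strings stand in for DaCe's `StorageType` and `ScheduleType` members.

```python
# Hypothetical, simplified model of inferring a scope's schedule from the
# storage of the data it accesses. NOT DaCe's actual implementation.

STORAGE_TO_SCHEDULE = {
    "GPU_Global": "GPU_Device",
    "CPU_Heap": "CPU_Multicore",
}


class AmbiguousScheduleError(Exception):
    """Raised when the surrounding storage does not determine one schedule."""


def infer_schedule(storages):
    """Return the single schedule implied by all accessed storage types."""
    # Storage types missing from the table are ignored in this sketch.
    candidates = {STORAGE_TO_SCHEDULE[s] for s in storages if s in STORAGE_TO_SCHEDULE}
    if len(candidates) == 1:
        return candidates.pop()
    if not candidates:
        # Assumption: nothing constrains the scope, so keep a default schedule.
        return "Default"
    raise AmbiguousScheduleError(
        f"Cannot infer schedule: storages {sorted(set(storages))} "
        f"imply {sorted(candidates)}"
    )


print(infer_schedule(["GPU_Global", "GPU_Global"]))  # GPU_Device
```

In this sketch, mixed storage such as `["GPU_Global", "CPU_Heap"]` raises `AmbiguousScheduleError`, matching the behavior described above, while the fallback to a default schedule is an assumption.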

(#1262 by @tbennun)

DaCe Profiler

A simpler interface for profiling applications: dace.profile and dace.instrument can now be used directly within Python:

with dace.profile(repetitions=100) as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print all execution times of the last called program (other_program)
print(profiler.times[-1])
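
To make the meaning of `repetitions` and `profiler.times[-1]` concrete, here is a pure-Python analogue under stated assumptions. `SimpleProfiler` and `simple_profile` are hypothetical and not part of DaCe. Unlike `dace.profile`, which in the example above picks up program calls made inside the `with` block, this sketch requires each call to go through an explicit `run` method.

```python
import time
from contextlib import contextmanager


class SimpleProfiler:
    """Hypothetical stand-in for a repetition-based profiler (not dace.profile)."""

    def __init__(self, repetitions):
        self.repetitions = repetitions
        self.times = []  # one list of per-repetition timings (seconds) per call

    def run(self, program, *args, **kwargs):
        """Call `program` `repetitions` times, record each run, return the last result."""
        timings = []
        result = None
        for _ in range(self.repetitions):
            start = time.perf_counter()
            result = program(*args, **kwargs)
            timings.append(time.perf_counter() - start)
        self.times.append(timings)
        return result


@contextmanager
def simple_profile(repetitions=100):
    """Yield a profiler object that collects timings inside a `with` block."""
    yield SimpleProfiler(repetitions)


with simple_profile(repetitions=5) as profiler:
    profiler.run(sum, range(1000))
    profiler.run(sorted, [3, 1, 2])

# Timings of the last profiled call: a list with one entry per repetition.
print(len(profiler.times[-1]))  # 5
```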

Which parts of a program are instrumented can be controlled with a filter, given either as a string with wildcards or as a function:

with dace.instrument(dace.InstrumentationType.GPU_Events, 
                     filter='*add??') as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print instrumentation report for last call
print(profiler.reports[-1])
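
The string form of the filter resembles a shell-style wildcard pattern. Assuming glob-like semantics (`*` matches any run of characters, `?` exactly one), Python's standard `fnmatch` module shows which names a pattern such as `'*add??'` would select. The element names and the `make_filter` helper are hypothetical; these notes do not specify how DaCe matches filters internally.

```python
from fnmatch import fnmatchcase


def make_filter(spec):
    """Turn a wildcard string or a predicate function into a predicate over names."""
    if callable(spec):
        return spec
    # '*' matches any run of characters (possibly empty), '?' exactly one character.
    return lambda name: fnmatchcase(name, spec)


# Hypothetical element names to filter.
names = ["add", "vector_add", "vector_add01", "add_x", "matadd2d"]

by_pattern = make_filter("*add??")
print([n for n in names if by_pattern(n)])  # ['vector_add01', 'add_x', 'matadd2d']

# A function filter, here keeping only names that end in a digit.
by_function = make_filter(lambda name: name[-1].isdigit())
print([n for n in names if by_function(n)])  # ['vector_add01']
```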

With dace.builtin_hooks.instrument_data, the same technique can be applied to instrument data containers.

(#1197 by @tbennun)

Improved Data Instrumentation

Data container instrumentation can now also be used conditionally, so that data container contents are saved and restored only if certain conditions are met. In addition, data instrumentation now saves the SDFG's symbol values at the time data is dumped, allowing an SDFG's entire state and context to be restored from data reports.
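
A minimal sketch of conditional dumping, assuming a report that stores deep copies of container contents together with the symbol values in effect at dump time. `DataReport`, `maybe_save`, and `restore` are hypothetical names, not DaCe API.

```python
import copy


class DataReport:
    """Hypothetical report of saved data container contents and symbol values."""

    def __init__(self):
        # Each snapshot is a (symbols, containers) pair captured at dump time.
        self.snapshots = []

    def maybe_save(self, containers, symbols, condition):
        """Save copies of containers and symbols only if condition(...) holds."""
        if not condition(containers, symbols):
            return False
        self.snapshots.append((dict(symbols), copy.deepcopy(containers)))
        return True

    def restore(self, index=-1):
        """Return fresh copies of a saved context so it can be re-executed."""
        symbols, containers = self.snapshots[index]
        return dict(symbols), copy.deepcopy(containers)


# Usage: only dump when the (hypothetical) symbol N is larger than 2.
report = DataReport()
data = {"A": [1.0, 2.0, 3.0]}
report.maybe_save(data, {"N": 3}, lambda c, s: s["N"] > 2)  # saved
report.maybe_save(data, {"N": 1}, lambda c, s: s["N"] > 2)  # skipped
data["A"][0] = 99.0  # later changes do not affect the saved copy
symbols, containers = report.restore()
print(symbols, containers)  # {'N': 3} {'A': [1.0, 2.0, 3.0]}
```

Storing symbol values alongside the data is what lets the restored context be re-run with the same sizes and parameters it had when it was dumped.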

(#1202, #1208 by @phschaad)

Restricted SSA for Scalars and Symbols

Two new passes (ScalarFission and StrictSymbolSSA) allow fissioning of scalar data containers (or arrays of size 1) and symbols into separate containers and symbols respectively, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.
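
The toy below models the restricted-SSA idea for a single scalar: writes that may reach a common read must share one container (otherwise a Phi-node would be needed), while all other writes receive fresh names. It is an illustrative sketch, not the actual `ScalarFission` pass. The `fission` function and its encoding of reaching writes are hypothetical, and the reaching sets are assumed to come from an analysis like the one described above.

```python
def fission(name, writes, reads):
    """Group writes to `name` so that every read sees exactly one container.

    writes: list of write identifiers, in program order.
    reads: dict mapping each read identifier to the non-empty set of writes
           that may reach it.
    Returns a dict mapping each write and read identifier to a container name.
    """
    parent = {w: w for w in writes}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Writes reaching the same read must share a container (no Phi-nodes).
    for reaching in reads.values():
        first, *rest = sorted(reaching, key=writes.index)
        for w in rest:
            parent[find(w)] = find(first)

    names, result = {}, {}
    for w in writes:
        root = find(w)
        if root not in names:
            names[root] = f"{name}_{len(names)}"
        result[w] = names[root]
    for r, reaching in reads.items():
        result[r] = result[next(iter(reaching))]
    return result


# Toy program (reaching writes noted per read):
#   x = 1          # w1
#   y = x + 1      # r1 sees {w1}
#   if c: x = 2    # w2
#   else: x = 3    # w3
#   z = x          # r2 sees {w2, w3}
#   x = 4          # w4
#   print(x)       # r3 sees {w4}
mapping = fission(
    "x",
    ["w1", "w2", "w3", "w4"],
    {"r1": {"w1"}, "r2": {"w2", "w3"}, "r3": {"w4"}},
)
print(mapping)
```

Here `x` is split into three containers: `x_0` for w1, `x_1` shared by w2 and w3 (both reach r2), and `x_2` for w4.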

(#1198, #1214 by @phschaad)

Extending Cutout Capabilities

SDFG Cutouts can now be taken from more than one state.

Additionally, taking a cutout that accesses only a subset of a data container (e.g., A[2:5] from a data container A of size N) gives the cutout an "Alibi Node" that represents only that subset of the data (A_cutout[0:3] -> A[2:5], where A_cutout is of size 4). This makes cutouts significantly smaller, with a smaller memory footprint, which simplifies debugging and localized optimization.

Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data before the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything the cutout writes to that may be read externally or may influence the behavior of the remaining SDFG. This isolates all side effects of changes to a particular cutout, so that transformations can be tested and verified in isolation, which also simplifies debugging.
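
A simplified sketch of how such a configuration could be derived, assuming the cutout is given as an ordered list of operations with whole-container read and write sets, and that the set of containers read after the cutout is known. `cutout_configuration` is a hypothetical helper, not DaCe's implementation.

```python
def cutout_configuration(ops, read_after):
    """Compute (inputs, outputs) for a cutout.

    ops: ordered list of (reads, writes) pairs, one per operation in the cutout.
    read_after: containers that the rest of the program may read after the cutout.
    """
    inputs, written = set(), set()
    for reads, writes in ops:
        # A container read before the cutout itself writes it must come from outside.
        inputs |= set(reads) - written
        written |= set(writes)
    # Only writes that the remaining program can observe are outputs.
    outputs = written & set(read_after)
    return inputs, outputs


ops = [
    ({"A"}, {"tmp"}),       # tmp = f(A)
    ({"tmp", "B"}, {"C"}),  # C = g(tmp, B)
    ({"C"}, {"tmp"}),       # tmp = h(C)
]
inputs, outputs = cutout_configuration(ops, read_after={"C", "D"})
print(sorted(inputs), sorted(outputs))  # ['A', 'B'] ['C']
```

Because whole containers are tracked, a container that the cutout only partially overwrites is not reported as an input here; a more precise analysis would need to compare subsets.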

(#1201 by @phschaad)

Bug Fixes, Compatibility Improvements, and Other Changes

SymPy 1.12 Compatibility by @alexnick83 in #1256
GPU Grid-Strided Tiling by @C-TC in #1249
Fix MapInterchange for Maps with dynamic inputs by @alexnick83 in #1244
Assortment of fixes for dynamic Maps on GPU (dynamic thread blocks) by @alexnick83 in #1246
Tuning Compatibility Fixes by @lukastruemper in #1234
Inline preprocessor command by @tbennun in #1242
unsqueeze_memlet fixes by @alexnick83 in #1203
Fix-intermediate-nodes by @alexnick83 in #1212
Fix for LoopToMap when applied on multi-nested loops by @alexnick83 in #1207
Fix-nested-sdfg-deepcopy by @alexnick83 in #1221
Fix integer division in Python frontend by @tbennun in #1196
Fix augmented assignment on scalar in condition by @tbennun in #1225
Fix internal subscript access if already existed by @tbennun in #1228
Fix atomic operation detection for exactly-overlapping ranges by @tbennun in #1230
Fix-gpu-transform-copy-out by @alexnick83 in #1231
Fix-interstate-free-symbols by @alexnick83 in #1238
Fix nested access with nested symbol dependency by @alexnick83 in #1239
Fix import in the transformations tutorial. by @lamyiowce in #1210
LoopToMap detects shared transients by @alexnick83 in #1200
Faster CI and reachability checks for codecov.io by @tbennun in #1213
Map-fission-single-data-multi-connectors by @alexnick83 in #1216
Add library path to HIP CMake by @tbennun in #1219
BatchedMatMul: MKL gemm_batch support by @lukastruemper in #1181

Full Changelog: v0.14.2...v0.14.3

Please let us know if there are any regressions with this new release.
