Add more comments around getKernelRuntimeFor (#2871)
Co-authored-by: Christian Sarofeen <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>
3 people authored Sep 2, 2024
1 parent fabbc1a commit dca416d
Showing 1 changed file with 37 additions and 3 deletions.
40 changes: 37 additions & 3 deletions csrc/kernel_cache.cpp
@@ -661,10 +661,39 @@ DynamicTransformInitialInfo& FusionExecutorCache::initialInfo() {
return initial_info_.value();
}

// getKernelRuntimeFor inspects the inputs to find a usable FusionKernelRuntime
// as quickly as possible. To do so we cache at multiple levels:
// A. If we have seen these inputs before, we re-use the FusionKernelRuntime
// we used last time. Here, we mean the same input tensor sizes, as well as
// same input scalars if they are used to compute an intermediate or output
// tensor size.
// B. We check how we should concretize the dynamic fusion using these
// inputs. If we have not concretized the fusion this way previously, then we
// concretize it and create a new FusionKernelRuntime, which means segmenting
// and compiling new kernels. Otherwise, we check whether we can re-use any of
// the previously-segmented runtimes.
// i. We look at all FusionKernelRuntimes that have been used with
// this concretized fusion.
// ii. For each of those runtimes, we compare the heuristic parameters for
// each segment to those that we compute using the current inputs.
// If we do not find any runtimes whose heuristic parameters match, then we
// create a new FusionKernelRuntime, which means segmenting and compiling all
// new kernels.
//
// In summary, we have the following paths, in order of hottest to coldest:
// 1. Input ID cache hit: re-use runtime used last time these inputs were seen
// 2. Concretization match, runtime heuristic params match: re-use runtime
// after checking concretization/heuristics.
// 3. Concretization match but no runtime heuristic params match. Segment
// to create new FusionKernelRuntime
// 4. Concretization is unseen: Segment to create a new FusionKernelRuntime
// For re-used shapes, path 1 is most relevant. For dynamic shape problems with
// a large number of unique shapes, path 2 is important. Paths 3 and 4 are slow
// since they both involve re-segmentation and re-compilation of the Fusion.
FusionKernelRuntime* FusionExecutorCache::getKernelRuntimeFor(
const KernelArgumentHolder& args,
std::optional<PrimDataType> forced_index_type) {
- // Check for id hit case
+ // Check for id hit case (Path 1)
auto unique_id_opt = args.getCacheId();
NVF_CHECK(
unique_id_opt.has_value(),
@@ -696,7 +725,7 @@ FusionKernelRuntime* FusionExecutorCache::getKernelRuntimeFor(
}

// Initialize or fetch vector of FusionKernelRuntime objects associated with
- // each pair of device ID and
+ // each pair of device ID and concretization info.
auto config = std::make_pair(args.getDeviceIndex(), conc_info);
auto& kernel_runtimes = kernel_runtimes_.try_emplace(config).first->second;
auto result =
@@ -712,13 +741,17 @@ FusionKernelRuntime* FusionExecutorCache::getKernelRuntimeFor(

FusionKernelRuntime* kernel_runtime = nullptr;

// reusing indicates whether we are following Path 2 (true) or Paths 3/4
// (false)
bool reusing = false;
// By default, we try to avoid recompiling whenever possible. However, this
// can lead to suboptimal code if we only check that a compiled kernel is able
// to run with some inputs, instead of whether it is optimal to do so. The
// NVFUSER_DISABLE=kernel_reuse option is a coarse tool that just enforces
// that whenever we encounter a new set of input shapes we segment and compile
- // a new FusionKernelRuntime.
+ // a new FusionKernelRuntime. Effectively, this option disables Paths 2 and 3
+ // above so that we only have Path 1 (hottest re-use path) and Path 4 (full
+ // recompile).
if (!isOptionDisabled(DisableOption::KernelReuse)) {
auto reuse_it = std::find_if(
kernel_runtimes.begin(),
@@ -740,6 +773,7 @@ FusionKernelRuntime* FusionExecutorCache::getKernelRuntimeFor(
}

if (!reusing) {
// Paths 3 or 4
// cache miss, need to re-build an optimized graph for this case

// Clone fusion_ so that we can safely use an ExpressionEvaluator on it, for
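The four paths described in the new comment can be illustrated with a small stand-alone sketch. This is not nvFuser code: `ToyCache`, `Runtime`, `Path`, and `lookup` are hypothetical names, a single integer stands in for the per-segment heuristic parameters, and another integer stands in for the concretization info. It only mirrors the lookup order (input ID first, then device/concretization, then heuristic match) and the effect the comment attributes to `NVFUSER_DISABLE=kernel_reuse`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for FusionKernelRuntime (assumption, not the real type).
struct Runtime {
  int heuristic_params;  // stands in for the per-segment heuristic parameters
};

// The four paths named in the comment, hottest to coldest.
enum class Path {
  InputIdHit = 1,
  HeuristicMatch = 2,
  HeuristicMiss = 3,
  NewConcretization = 4
};

class ToyCache {
 public:
  // kernel_reuse_enabled == false models NVFUSER_DISABLE=kernel_reuse.
  explicit ToyCache(bool kernel_reuse_enabled = true)
      : kernel_reuse_enabled_(kernel_reuse_enabled) {}

  std::pair<Runtime*, Path> lookup(
      std::size_t input_id, int device, int conc_info, int heuristics) {
    // Path 1: these exact inputs were seen before.
    if (auto it = id_to_runtime_.find(input_id); it != id_to_runtime_.end()) {
      return {it->second, Path::InputIdHit};
    }

    // Fetch (or create) the runtimes for this device/concretization pair.
    auto [entry, inserted] =
        runtimes_.try_emplace(std::make_pair(device, conc_info));
    auto& candidates = entry->second;

    // Path 2: look for a runtime whose heuristic parameters match.
    Runtime* found = nullptr;
    if (kernel_reuse_enabled_) {
      for (auto& rt : candidates) {
        if (rt->heuristic_params == heuristics) {
          found = rt.get();
          break;
        }
      }
    }

    Path path = Path::HeuristicMatch;
    if (found == nullptr) {
      // Paths 3 and 4: build a new runtime (segmenting and compiling in the
      // real code). With reuse disabled, every miss is reported as Path 4,
      // following the comment's description of that option.
      path = (inserted || !kernel_reuse_enabled_) ? Path::NewConcretization
                                                  : Path::HeuristicMiss;
      candidates.push_back(std::make_unique<Runtime>(Runtime{heuristics}));
      found = candidates.back().get();
    }

    assert(found != nullptr);
    id_to_runtime_[input_id] = found;  // makes the next identical call Path 1
    return {found, path};
  }

 private:
  bool kernel_reuse_enabled_;
  std::unordered_map<std::size_t, Runtime*> id_to_runtime_;
  std::map<std::pair<int, int>, std::vector<std::unique_ptr<Runtime>>> runtimes_;
};
```

In this sketch, a first call with a given input ID takes Path 4, a repeated call with the same ID takes Path 1, a new ID with the same concretization and heuristics takes Path 2, and a new ID with different heuristics takes Path 3. Constructing `ToyCache(false)` leaves only Paths 1 and 4 reachable.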
