
[Question] Is this a bug and should I be expecting a segmentation in this example? #3576

Open
jjsjann123 opened this issue Dec 12, 2024 · 2 comments


@jjsjann123
Collaborator

In the code snippet below, where we use expand to change a broadcast dimension into a non-broadcast one, I'm seeing our scheduler forced to segment, generating two kernels. But if I switch the expand to a pad, we can actually handle it in a single kernel.

Is this expected behavior? I suspect this is because expand doesn't create a dependency between the broadcast ID and the non-broadcast ID.

TEST_F(PointwiseTest, WhyIsItSegmenting) {
  preseg_passes::OptimizationPassGuard<preseg_passes::MarkAliasesPreparePass>
      optimization_guard(false);
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  // tv0 {b1, i0}
  TensorView* tv0 = TensorViewBuilder().shape({1, -1}).build();
  fusion->addInput(tv0);
  // tv1 {i2, b1, i0}
  TensorView* tv1 = TensorViewBuilder().shape({-1, 1, -1}).build();
  fusion->addInput(tv1);

  // tv2 {i2, b1, i0}
  auto tv2 = add(tv1, tv0);
  fusion->addOutput(tv2);

  // tv3 {i3, i0}
#if 0
  auto tv3 = pad(tv0, {IrBuilder::create<Val>(0L), IrBuilder::create<Val>(0L), IrBuilder::create<Val>(4L), IrBuilder::create<Val>(4L)});
#else
  auto tv3 = expand(tv0,
      {IrBuilder::create<Val>(9L),
       tv0->axis(1)->extent()});
#endif
  // tv4 {i3*i0}
  auto tv4 = reshape(tv3, {9, 5}, {45});
  fusion->addOutput(tv4);
 
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1, 5}, {5, 1}, options);
  at::Tensor input1 = at::empty_strided({7, 1, 5}, {5, 5, 1}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
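For reference, one way to confirm the segmentation programmatically (a sketch, assuming the FusionKernelRuntime query API used elsewhere in the test suite: getMostRecentKernelRuntime, isSegmented, fusionSegments; it also needs <iostream>) would be to add something like this right after runFusionWithInputs:

  // Sketch (assumed API): report how many kernels the fusion was split into.
  FusionKernelRuntime* runtime = executor_cache.getMostRecentKernelRuntime();
  if (runtime->isSegmented()) {
    // Expand path: expected to land here with two groups.
    std::cout << "segmented into "
              << runtime->fusionSegments()->groups().size() << " groups"
              << std::endl;
  } else {
    // Pad path: expected to stay a single kernel.
    std::cout << "single kernel" << std::endl;
  }
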
@jjsjann123
Collaborator Author

It's surprising to me that codegen can handle resize more flexibly than expand. Not sure about performance, though; I should go double-check the actual kernel. I'm not sure how expand is lowered in codegen, or whether having a single kernel actually wins on kernel time.

@naoyam
Collaborator

naoyam commented Dec 13, 2024

For the pointwise scheduler, this should be segmented because there is no single tensor that has all of the concrete IDs (here, tv2 has i2 but not i3, while tv4 has i3 but not i2). I'm not saying that scheduling this fusion as a single kernel is absolutely impossible, but because of how the current pointwise scheduler works, it should be segmented.
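
To make the coverage requirement concrete, here is a standalone sketch (not the actual pointwise scheduler code; plain strings stand in for IterDomains, and hasFullCoverageTensor is a hypothetical helper): a fusion can only be handled as one pointwise kernel if some tensor's IDs cover the union of all concrete IDs.

#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Illustration only: returns true if some tensor's ID set covers the union
// of all concrete IDs across the listed tensors.
bool hasFullCoverageTensor(const std::vector<std::set<std::string>>& tensors) {
  std::set<std::string> all_ids;
  for (const auto& t : tensors) {
    all_ids.insert(t.begin(), t.end());
  }
  for (const auto& t : tensors) {
    // std::includes: is all_ids a subset of t? (both ranges are sorted)
    if (std::includes(t.begin(), t.end(), all_ids.begin(), all_ids.end())) {
      return true;
    }
  }
  return false;
}

// Expand version of the repro: tv2 covers {i0, i2}, tv4 covers {i0, i3}.
// Neither covers the union {i0, i2, i3}, so the fusion gets segmented:
//   hasFullCoverageTensor({{"i0", "i2"}, {"i0", "i3"}}) == false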

As for the pad case, that's probably because the initial version of resize support focused on being able to schedule resize-based ops within each of the pointwise and reduction schedulers, so it was designed to be flexible without much consideration of the actual implications. It was meant to be a short-term preliminary version, but we hadn't spent much effort on it since then until recently.
