
[Question] Is this a bug and should I be expecting a segmentation in this example? #3576

Open
jjsjann123 opened this issue Dec 12, 2024 · 2 comments


@jjsjann123
Collaborator

In the code snippet below, where we use expand to change a broadcast dimension into a non-broadcast one, I'm seeing our scheduler forced to segment, generating two kernels. But if I switch the expand to a pad, we can actually handle it in a single kernel.

Is this expected behavior? I suspect this is because expand doesn't create a dependency between the broadcast ID and the non-broadcast ID.

TEST_F(PointwiseTest, WhyIsItSegmenting) {
  preseg_passes::OptimizationPassGuard<preseg_passes::MarkAliasesPreparePass>
      optimization_guard(false);
  auto fusion_ptr = std::make_unique<Fusion>();
  auto fusion = fusion_ptr.get();
  FusionGuard fg(fusion);

  // tv0 {b1, i0}
  TensorView* tv0 = TensorViewBuilder().shape({1, -1}).build();
  fusion->addInput(tv0);
  // tv1 {i2, b1, i0}
  TensorView* tv1 = TensorViewBuilder().shape({-1, 1, -1}).build();
  fusion->addInput(tv1);

  // tv2 {i2, b1, i0}
  auto tv2 = add(tv1, tv0);
  fusion->addOutput(tv2);

  // tv3 {i3, i0}
#if 0
  auto tv3 = pad(tv0, {IrBuilder::create<Val>(0L), IrBuilder::create<Val>(0L), IrBuilder::create<Val>(4L), IrBuilder::create<Val>(4L)});
#else
  auto tv3 = expand(tv0,
      {IrBuilder::create<Val>(9L),
       tv0->axis(1)->extent()});
#endif
  // tv4 {i3*i0}
  auto tv4 = reshape(tv3, {9, 5}, {45});
  fusion->addOutput(tv4);
 
  FusionExecutorCache executor_cache(std::move(fusion_ptr));
  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  at::Tensor input0 = at::empty_strided({1, 5}, {5, 1}, options);
  at::Tensor input1 = at::empty_strided({7, 1, 5}, {5, 5, 1}, options);
  auto cg_outputs = executor_cache.runFusionWithInputs({input0, input1});
  testValidate(fusion, cg_outputs, {input0, input1}, __LINE__, __FILE__);
}
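For reference, one way to confirm the segmentation programmatically (a sketch, assuming the FusionKernelRuntime query API used elsewhere in the test suite: getMostRecentKernelRuntime, isSegmented, fusionSegments; it also needs <iostream>) would be to add something like this right after runFusionWithInputs:

  // Sketch (assumed API): report how many kernels the fusion was split into.
  FusionKernelRuntime* runtime = executor_cache.getMostRecentKernelRuntime();
  if (runtime->isSegmented()) {
    // Expand path: expected to land here with two groups.
    std::cout << "segmented into "
              << runtime->fusionSegments()->groups().size() << " groups"
              << std::endl;
  } else {
    // Pad path: expected to stay a single kernel.
    std::cout << "single kernel" << std::endl;
  }
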
@jjsjann123
Collaborator Author

It's surprising to me that codegen can handle resize more flexibly than expand. Not sure about performance, though; I should go double-check the actual kernel. I'm not sure how expand is lowered in codegen, or whether having a single kernel actually wins on kernel time.

@naoyam
Collaborator

naoyam commented Dec 13, 2024

For the pointwise scheduler, this should be segmented because there is no single tensor that has all of the concrete IDs (here, tv2 has i2 but not i3, while tv4 has i3 but not i2). I'm not saying that scheduling this fusion as a single kernel is absolutely impossible, but because of how the current pointwise scheduler works, it should be segmented.
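
To make the coverage requirement concrete, here is a standalone sketch (not the actual pointwise scheduler code; plain strings stand in for IterDomains, and hasFullCoverageTensor is a hypothetical helper): a fusion can only be handled as one pointwise kernel if some tensor's IDs cover the union of all concrete IDs.

#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Illustration only: returns true if some tensor's ID set covers the union
// of all concrete IDs across the listed tensors.
bool hasFullCoverageTensor(const std::vector<std::set<std::string>>& tensors) {
  std::set<std::string> all_ids;
  for (const auto& t : tensors) {
    all_ids.insert(t.begin(), t.end());
  }
  for (const auto& t : tensors) {
    // std::includes: is all_ids a subset of t? (both ranges are sorted)
    if (std::includes(t.begin(), t.end(), all_ids.begin(), all_ids.end())) {
      return true;
    }
  }
  return false;
}

// Expand version of the repro: tv2 covers {i0, i2}, tv4 covers {i0, i3}.
// Neither covers the union {i0, i2, i3}, so the fusion gets segmented:
//   hasFullCoverageTensor({{"i0", "i2"}, {"i0", "i3"}}) == false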

As for the pad case, that's probably because the initial version of resize support focused on being able to schedule resize-based ops within each of the pointwise and reduction schedulers, so it was designed to be flexible without much consideration of the actual implications. It was meant to be a short-term preliminary version, but we hadn't spent much effort on it since then until recently.
