Partially lower fusion to optimize deserialization performance. #558

Closed
wants to merge 113 commits into from

rdspring1 (Collaborator) commented Jul 4, 2023

This PR optimizes deserialization time by running only the analysis step of GpuLower. All lowering passes are skipped, while the information needed to run the kernel is retained. Deserialization is estimated to be ~75% faster than rerunning GpuLower in full.
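The split between analyses and passes can be pictured with a minimal C++ sketch. `Lowering`, `LoweringState`, `KernelSummarySketch`, and `Step` are hypothetical names for illustration only, not nvFuser's GpuLower API. In this model, analyses record summary data and passes rewrite the IR; the deserialization path corresponds to `run(/*analysis_only=*/true)`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the data a kernel summary keeps.
struct KernelSummarySketch {
  bool has_grid_reduction = false;
  std::vector<std::string> global_buffers;  // names of buffers to allocate at runtime
};

// State threaded through lowering: summary data plus the rewritten IR
// (modeled here as strings) that only code generation needs.
struct LoweringState {
  KernelSummarySketch summary;
  std::vector<std::string> lowered_ir;
};

using Step = std::function<void(LoweringState&)>;

// Hypothetical two-phase lowering: analyses fill in the summary,
// passes transform the IR for code generation.
struct Lowering {
  std::vector<Step> analyses;
  std::vector<Step> passes;

  // analysis_only == true models the deserialization path: the kernel binary
  // already exists, so the IR-rewriting passes are skipped.
  LoweringState run(bool analysis_only) const {
    LoweringState state;
    for (const Step& step : analyses) {
      step(state);
    }
    // Invariant of this sketch: analyses only record summary data.
    assert(state.lowered_ir.empty() && "analyses should not emit lowered IR");
    if (!analysis_only) {
      for (const Step& step : passes) {
        step(state);
      }
    }
    return state;
  }
};
```

In this sketch, the property that makes the fast path safe is that everything needed at runtime is produced by the analyses, so dropping the passes loses only the IR that code generation would have consumed.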

After kernel compilation, we only need the information in GpuLower::KernelSummary to create a new ExecutorEntry at runtime. The kir::Allocate nodes are generated directly without the rest of the lowering process.

Our approach for creating kir::Allocate nodes is based on the ExpressionEvaluator. During serialization, we store a set of base IterDomains and the sequence of operations that builds each TensorView's domains from them. The kir::Allocate nodes are not inserted into the kir::Kernel because they are only used to calculate buffer sizes at kernel runtime.
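A minimal C++ sketch of the replay idea, assuming the serialized form is a flat list of operations over the runtime extents of the base IterDomains. `OpKind`, `SerializedOp`, `SerializedAllocate`, `replayExtents`, and `allocationBytes` are hypothetical names; the actual serde schema and ExpressionEvaluator interface differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical op codes for a serialized extent computation (not nvFuser's schema).
enum class OpKind { Input, Const, Mul, CeilDiv };

// One serialized operation. The meaning of a/b depends on kind:
//   Input:       a = index of a runtime base-IterDomain extent
//   Const:       a = literal value (e.g. a split factor)
//   Mul/CeilDiv: a, b = indices of earlier results
struct SerializedOp {
  OpKind kind;
  int64_t a;
  int64_t b;
};

// Hypothetical record for a buffer to size at runtime: the op whose result
// is its element count, and the element size in bytes.
struct SerializedAllocate {
  int64_t size_op;
  int64_t elem_bytes;
};

int64_t ceilDiv(int64_t x, int64_t y) {
  assert(y > 0 && "split factor must be positive");
  return (x + y - 1) / y;
}

// Replays the ops in order, like a tiny expression evaluator.
// results[i] holds the value produced by ops[i]; .at() throws if an op
// refers to a result that has not been computed yet.
std::vector<int64_t> replayExtents(const std::vector<SerializedOp>& ops,
                                   const std::vector<int64_t>& input_extents) {
  std::vector<int64_t> results;
  results.reserve(ops.size());
  for (const SerializedOp& op : ops) {
    switch (op.kind) {
      case OpKind::Input:
        results.push_back(input_extents.at(op.a));
        break;
      case OpKind::Const:
        results.push_back(op.a);
        break;
      case OpKind::Mul:
        results.push_back(results.at(op.a) * results.at(op.b));
        break;
      case OpKind::CeilDiv:
        results.push_back(ceilDiv(results.at(op.a), results.at(op.b)));
        break;
    }
  }
  return results;
}

// Buffer size in bytes for one allocation, given replayed results.
int64_t allocationBytes(const SerializedAllocate& alloc,
                        const std::vector<int64_t>& results) {
  return results.at(alloc.size_op) * alloc.elem_bytes;
}
```

For example, a TensorView with domain [I0, I1] whose inner axis is split by 4 could be recorded as Input(I0), Input(I1), Const(4), CeilDiv(I1, 4), Mul(I0, outer), Mul(prev, 4). With I0 = 3 and I1 = 10, the replay yields 3 * ceilDiv(10, 4) * 4 = 36 elements, or 144 bytes for a 4-byte dtype. Replaying such a list requires no lowering passes.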

TODOs:

  • ExpressionSerializer and ExpressionBuilder build TensorView and kir::Allocate nodes for global intermediate buffers and dynamic shared memory.
  • Store all information to run fusion in KernelSummary, which is stored in FusionExecutor.
  • Partially recreate and then deserialize KernelSummary
  • Add support for LoadStoreOp
  • Create separate paths for RNGOps and TMA kir nodes

rdspring1 (Collaborator Author)

!build

rdspring1 closed this Dec 13, 2024

rdspring1 (Collaborator Author)

Need to update for new fusion executor dispatch system.

Labels: serde (serialization + deserialization)