Training support #93

pommedeterresautee · 2022-10-12T10:17:22Z

Right now, all Triton kernels implement only inference (the forward pass).
To support training, the backward pass must be implemented as well.
Each kernel is already wrapped in a torch.autograd.Function, so this only means adding a backward method to an existing class; no refactoring should be needed.
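A minimal sketch of what adding a backward pass to an existing torch.autograd.Function looks like. `FusedLinearReLU` is a hypothetical class: its forward uses plain PyTorch ops as a stand-in for a Triton kernel, and ReLU stands in for whichever activations are supported. The backward applies the chain rule by hand, and `torch.autograd.gradcheck` compares it against numerical gradients in double precision.

```python
import torch


class FusedLinearReLU(torch.autograd.Function):
    """Hypothetical example computing y = relu(x @ W^T + b).

    The forward body is a plain-PyTorch stand-in for a Triton kernel;
    the point is where the hand-written backward method goes.
    """

    @staticmethod
    def forward(ctx, x, weight, bias):
        pre_act = x @ weight.t() + bias  # real code: launch the Triton forward kernel
        ctx.save_for_backward(x, weight, pre_act)
        return torch.relu(pre_act)

    @staticmethod
    def backward(ctx, grad_out):
        x, weight, pre_act = ctx.saved_tensors
        # Chain rule through ReLU: gradient only flows where pre_act > 0
        grad_pre = grad_out * (pre_act > 0).to(grad_out.dtype)
        grad_x = grad_pre @ weight       # (B, out) @ (out, in) -> (B, in)
        grad_weight = grad_pre.t() @ x   # (out, B) @ (B, in) -> (out, in)
        grad_bias = grad_pre.sum(dim=0)  # sum over the batch dimension
        return grad_x, grad_weight, grad_bias


# Check the hand-written backward against numerical gradients (double precision)
torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
weight = torch.randn(5, 3, dtype=torch.double, requires_grad=True)
bias = torch.randn(5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(FusedLinearReLU.apply, (x, weight, bias)))
```

Saving the pre-activation is one option; recomputing it in the backward pass trades extra compute for lower memory use.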

The expected training improvements are those of Flash attention:

2-4X faster training for the attention alone (the linear layer and layernorm/rmsnorm kernels should bring their own gains)
Support for very long sequences (up to 16K tokens) with a limited compute budget
A reduced memory footprint, which is especially useful for contrastive learning (see the comment in Dao-AILab/flash-attention#49, "Is it safe to have Wqkv be float16?", reporting a 10X batch size increase)

Below are pointers that can be used for the implementation, along with a list of our main modifications to the original forward (fw) implementations:

Flash attention: based on https://github.com/openai/triton/blob/master/python/tutorials/06-fused-attention.py#L238. We mainly added support for multiple heads, batching, a (broadcastable) mask, and any sequence length.
Linear layer: https://github.com/facebookresearch/xformers/blob/main/xformers/triton/fused_linear_layer.py#L53. We changed the matmul implementation and added support for more activations.
Layernorm / rmsnorm: https://github.com/openai/triton/blob/master/python/tutorials/05-layer-norm.py#L70. We changed the implementation (Welford formula) and added support for rmsnorm.
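For the normalization kernels, a NumPy reference like the one below could serve as a parity check for a Triton backward kernel. It is a sketch, not the project's implementation: all function names are hypothetical, the row statistics are computed with Welford's online update (mirroring the forward change listed above), and each backward reuses the `x_hat` and `rstd` saved by its forward so no statistics are recomputed. The rmsnorm backward is the layernorm formula without the mean terms.

```python
import numpy as np


def welford_mean_var(x):
    """Row-wise mean and biased variance using Welford's online update.

    Columns are consumed one at a time (like a kernel streaming a row),
    and every row is updated at once.
    """
    n_rows, n_cols = x.shape
    mean = np.zeros(n_rows)
    m2 = np.zeros(n_rows)  # running sum of squared deviations from the mean
    for k in range(n_cols):
        delta = x[:, k] - mean
        mean += delta / (k + 1)
        m2 += delta * (x[:, k] - mean)
    return mean, m2 / n_cols


def layernorm_fwd(x, w, b, eps=1e-5):
    mean, var = welford_mean_var(x)
    rstd = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mean[:, None]) * rstd[:, None]
    # Save the normalized input and 1/std for the backward pass
    return x_hat * w + b, (x_hat, rstd)


def layernorm_bwd(dy, w, saved):
    x_hat, rstd = saved
    wdy = dy * w
    # Per row: dx = rstd * (wdy - mean(wdy) - x_hat * mean(wdy * x_hat))
    dx = rstd[:, None] * (
        wdy
        - wdy.mean(axis=1, keepdims=True)
        - x_hat * (wdy * x_hat).mean(axis=1, keepdims=True)
    )
    dw = (dy * x_hat).sum(axis=0)  # reduce over rows
    db = dy.sum(axis=0)
    return dx, dw, db


def rmsnorm_fwd(x, w, eps=1e-5):
    # No mean subtraction: scale each row by 1 / sqrt(mean(x^2) + eps)
    rstd = 1.0 / np.sqrt((x * x).mean(axis=1) + eps)
    x_hat = x * rstd[:, None]
    return x_hat * w, (x_hat, rstd)


def rmsnorm_bwd(dy, w, saved):
    x_hat, rstd = saved
    wdy = dy * w
    # Same as layernorm_bwd without the mean(wdy) term
    dx = rstd[:, None] * (wdy - x_hat * (wdy * x_hat).mean(axis=1, keepdims=True))
    dw = (dy * x_hat).sum(axis=0)
    return dx, dw
```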

Training support will require porting our forward-pass modifications to the backward (bw) pass of each kernel, where it makes sense, to keep parity between the two.
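For the attention kernel, a dense NumPy reference is handy for checking a Triton bw kernel against the batch, multi-head, broadcastable-mask and arbitrary-sequence-length cases listed above. This is a sketch under assumptions: the function names are hypothetical, and the mask is taken to be additive (0 for kept positions, a large negative value for masked ones) and broadcastable to `(batch, heads, q_len, k_len)`. Unlike a Flash attention kernel, which recomputes the probabilities `P` block by block during the backward pass, this reference simply stores the full `P` matrix.

```python
import numpy as np


def attention_fwd(q, k, v, mask, scale):
    """Dense reference forward. q: (B, H, M, D), k and v: (B, H, N, D).

    mask is additive and broadcastable to (B, H, M, N); M and N may differ.
    """
    s = np.einsum("bhmd,bhnd->bhmn", q, k) * scale + mask
    s = s - s.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(s)
    p /= p.sum(axis=-1, keepdims=True)
    o = np.einsum("bhmn,bhnd->bhmd", p, v)
    return o, p


def attention_bwd(do, q, k, v, o, p, scale):
    """Backward pass given the output gradient do (same shape as o)."""
    dv = np.einsum("bhmn,bhmd->bhnd", p, do)  # dV = P^T dO
    dp = np.einsum("bhmd,bhnd->bhmn", do, v)  # dP = dO V^T
    # Softmax backward: dS = P * (dP - rowsum(P * dP)), with rowsum(P * dP) = rowsum(dO * O)
    delta = (do * o).sum(axis=-1, keepdims=True)
    ds = p * (dp - delta)
    dq = np.einsum("bhmn,bhnd->bhmd", ds, k) * scale  # dQ = dS K * scale
    dk = np.einsum("bhmn,bhmd->bhnd", ds, q) * scale  # dK = dS^T Q * scale
    return dq, dk, dv
```

Because the mask is a constant added before the softmax, it needs no gradient of its own: masked positions have `P = 0`, so they contribute nothing to `dS`. A backward kernel therefore only has to apply the same mask when it recomputes `P`.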

Supersedes #11


pommedeterresautee added the enhancement (New feature or request) label on Oct 12, 2022

pommedeterresautee mentioned this issue Oct 12, 2022

Add support of training to triton kernels #11

Closed
