Beam search logit refactor #771

rhenry-nv · 2020-12-04T22:43:25Z

Description

Refactors the beam search when --n-best is specified so the retrieval of logits from the GPU is batched.

On a standard transformer with 3 decoder layers, this change improved performance by up to 10% when --n-best is specified. It also reduces CPU-GPU communication.

This does not contain a table similar to other PRs from #743 since the model motivating this change is different from the proxy model used in #743.

List of changes:

Refactors beam search and adds new tensor operators to support batched retrieval.
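
As a rough illustration of the batching idea (not the code in this PR), the sketch below gathers the selected logits into one contiguous buffer, so that a single bulk copy can replace one copy per logit. It runs on the CPU only; the function name `gatherLogits` and its signature are hypothetical and are not part of Marian's API. In a GPU setting the gather would run on the device and be followed by one device-to-host memcpy of the output buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Marian's API: collect the logits at the given
// flat indices into one contiguous buffer. The result can then be moved
// with a single bulk copy instead of one copy per logit.
std::vector<float> gatherLogits(const std::vector<float>& logits,
                                const std::vector<std::size_t>& indices) {
  std::vector<float> out(indices.size());
  const float* src = logits.data();  // raw pointers, no per-element accessors
  float* dst = out.data();
  for (std::size_t i = 0; i < indices.size(); ++i) {
    assert(indices[i] < logits.size());  // guard against out-of-range indices
    dst[i] = src[indices[i]];
  }
  return out;
}
```

Calling `gatherLogits(logits, {3, 0, 4})` returns the three requested values in index order, which is the layout a single transfer needs.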

Added dependencies: none

How to test

Ran regression tests and they passed.

CMake command: cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on -DCOMPILE_CUDA_SM35=off -DCOMPILE_CUDA_SM50=off -DCOMPILE_CUDA_SM60=off -DCOMPILE_CUDA_SM70=on -DCOMPILE_CUDA_SM75=off -DCOMPILE_TESTS=on

Ubuntu - 18.04.3 LTS
nvcc - 10.1.243
gcc - 7.5.0

Checklist

I have tested the code manually
I have run regression tests
I have read and followed CONTRIBUTING.md
I have updated CHANGELOG.md

rhenry-nv added 7 commits December 4, 2020 12:15

Batches logit retrieval from device so one memcpy is issued for all …

8e14bef

…logits instead of one memcpy per logit.

Changed getherFromIndices to use pointers instead of get and set for …

3036e47

…CPU backend. Should be more performant + the code is more readable.

Changes class variable to follow Marian convention and adds licenses …

112f148

…to top of modified files

Fixes compile errors on other operating systems

5b35482

Adds licenses

11cfc59

Update change log

23af079

Merge remote-tracking branch 'public/master' into beam_search_logit_r…

f9e1180

…efactor

rhenry-nv mentioned this pull request Apr 9, 2021

Adds better Affine support for GPUs when using CUDA 11. Introduces a new bias addition kernel for CUDA < 11 #778

Merged

4 tasks

snukky added the performance label Nov 22, 2021
