Beam search logit refactor #771

Open
wants to merge 7 commits into
base: master

rhenry-nv
Contributor

Description

Refactors beam search so that, when --n-best is specified, the retrieval of logits from the GPU is batched.

On a standard transformer with 3 decoder layers, this change yielded improvements of up to 10% when --n-best is specified. It also reduces CPU-GPU communication.

This PR does not include a table like those in the other PRs from #743, because the model motivating this change differs from the proxy model used in #743.

List of changes:

  • Refactors beam search and adds new tensor operators to support batched retrieval.
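To illustrate why batching the retrieval reduces CPU-GPU communication, here is a minimal Python sketch. It is not the code in this PR: `FakeDevice`, `get`, `gather`, and the index values are hypothetical names invented for the illustration, and the "device" is just a Python list with a counter that records how many simulated device-to-host copies take place.

```python
class FakeDevice:
    """Hypothetical stand-in for GPU memory that counts device-to-host copies."""

    def __init__(self, logits):
        self._logits = list(logits)  # flat buffer of scores "on the device"
        self.transfers = 0           # number of simulated device-to-host copies

    def get(self, index):
        """Copy a single value to the host (one transfer per call)."""
        self.transfers += 1
        return self._logits[index]

    def gather(self, indices):
        """Copy many values to the host in a single transfer."""
        self.transfers += 1
        return [self._logits[i] for i in indices]


def fetch_one_by_one(device, indices):
    # Unbatched retrieval: one transfer per requested logit.
    return [device.get(i) for i in indices]


def fetch_batched(device, indices):
    # Batched retrieval: collect all indices first, then transfer once.
    return device.gather(indices)


logits = [-0.1, -2.3, -0.7, -1.5, -0.2, -3.0]
wanted = [4, 0, 2]  # hypothetical flat indices of hypotheses kept in the beam

slow = FakeDevice(logits)
fast = FakeDevice(logits)
unbatched = fetch_one_by_one(slow, wanted)
batched = fetch_batched(fast, wanted)

print(unbatched == batched)            # both approaches return the same values
print(slow.transfers, fast.transfers)  # transfer counts for each approach
```

Both approaches return the same values; the difference is only in how many separate copies are issued, which is the cost the batched path is meant to cut.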

Added dependencies: none

How to test

Ran regression tests and they passed.

CMake command: cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on -DCOMPILE_CUDA_SM35=off -DCOMPILE_CUDA_SM50=off -DCOMPILE_CUDA_SM60=off -DCOMPILE_CUDA_SM70=on -DCOMPILE_CUDA_SM75=off -DCOMPILE_TESTS=on

Ubuntu - 18.04.3 LTS
nvcc - 10.1.243
gcc - 7.5.0

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md


2 participants