Small optimizations #768
base: master
Conversation
…n. Reduced the memcpys from 3 to 1 for the general case.
… default stream. This has each thread issue calls to its own stream instead of the global default stream when Marian is compiled to use a default stream per thread (see the sketch after these commit notes).
… used for better tensor core usage.
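As a rough illustration of the per-thread-stream point above (a standalone example of my own, not Marian code): when a file is compiled with nvcc's --default-stream per-thread option, work submitted to the default stream by different host threads goes to separate per-thread streams instead of serializing on the single global default stream. Something like:

// Hypothetical standalone example; compile with: nvcc --default-stream per-thread streams.cu
#include <cuda_runtime.h>
#include <thread>

__global__ void scale(float* x, int n, float a) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

void worker(float* d_x, int n) {
  // Launch on the "default" stream: under per-thread compilation this is
  // cudaStreamPerThread, i.e. a distinct stream for each host thread.
  scale<<<(n + 255) / 256, 256>>>(d_x, n, 2.0f);
  cudaStreamSynchronize(cudaStreamPerThread);
}

int main() {
  const int n = 1 << 20;
  float *d_a, *d_b;
  cudaMalloc(&d_a, n * sizeof(float));
  cudaMalloc(&d_b, n * sizeof(float));
  std::thread t1(worker, d_a, n);
  std::thread t2(worker, d_b, n);  // its kernel can overlap with t1's on the GPU
  t1.join(); t2.join();
  cudaFree(d_a); cudaFree(d_b);
  return 0;
}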
src/data/corpus.cpp
Outdated
@@ -1,3 +1,8 @@
/* Part of this file was contributed by NVIDIA under license:
 * Copyright (C) 2020 NVIDIA Corporation
 * SPDX-License-Identifier: MIT
What is this for? It seems the only change is the rounding of maxDims. What NVidia contribution was made here?
In general, I would be opposed to changing the comment style for licence. Marian source code does not have license information in the source files directly, but rather in a separate license file. Please let's continue to follow that pattern.
Yes, that was the only contribution. However, I was told to include a notice even in files where I make one-line changes. This is just me following instructions.
Also, I just saw the second part of your comment. Is there a process for adding NVIDIA to the license file? I'm not sure what the solution is here.
The one that hasn't been updated since 2016 and doesn't even name Microsoft? I guess add it to your PR and shame Marcin.
I checked internally and I can add NVIDIA to the main license file and remove the notices in all the other files!
I will take care of this in all the PRs I have submitted.
Edit: Let me know if the license change looks ok.
Expr indices;
// I think this doesn't work if model split among gpus but not sure if it matters

for (auto& state : states_) {
Please add a comment explaining what this logic does, as it is not obvious why the old code is not working.
There is nothing wrong with the old code. This just needs to check if we need to ship indices to the GPU. I will add a comment explaining what this does.
I added a comment in a recent commit but I'm not sure if it's clear enough. If so, feel free to resolve this.
if (state.output) {
  indices = state.output->graph()->indices(selIdx);
  break;
}
What happens if neither? Is that a valid condition? If not, let's change this to else
I think neither is a valid condition
Ah, because it's a loop, sorry. But what happens if it never matches any of the conditions? Then indices ends up being NULL. Is it worth an ABORT_IF for that?
Good point. I think indices can only end up being NULL if all of the states' output and cell fields are null. In that case, we will return a vector of nulls, which was the behavior of the original code. (Also, I think these values are NULL on the first run of a network, so we want this behavior).
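To summarize the thread above, a commented sketch of the logic (mirroring the diff fragments quoted here; the cell check is assumed from the discussion, and names follow the diff rather than any final version of the code):

Expr indices;
// Find the first state that already has a graph and build the device-side
// index tensor there, so selIdx is shipped to the GPU only once.
for (auto& state : states_) {
  if (state.output) {
    indices = state.output->graph()->indices(selIdx);
    break;
  }
  // (Per the discussion, state.cell can be checked the same way.)
}
// If every state's output (and cell) is null, e.g. on the first run of the
// network, indices stays null and the original behavior is preserved.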
Description
This PR splits out some small optimizations from PR #743.
Performance improvements from this PR as measured on a Titan V using a proxy transformer model:
Times with 1 stream
Times with 2 streams
List of changes:
How to test
I ran the regression tests. On Volta, some regression tests fail due to the additional use of tensor cores. This is because prior to CUDA 11, cuBLAS does not use tensor cores if matrices fail to meet alignment requirements. This restriction is lifted in CUDA >= 11.
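For context on the alignment point (my own sketch, not code from this PR): pre-CUDA-11 cuBLAS only takes the tensor-core path for half-precision GEMMs when the relevant dimensions are suitably aligned (typically multiples of 8), which is what rounding maxDims up is meant to guarantee:

// Hypothetical helper illustrating the rounding idea; the name is mine.
inline size_t roundUpToMultipleOf8(size_t dim) {
  return (dim + 7) / 8 * 8;  // e.g. 13 -> 16, 16 stays 16
}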
The differences in the float output from the failing regression tests are small. A list of them is provided here:
OS: Ubuntu 18.04.3 LTS
Compiler: gcc 7.5.0
nvcc version: 10.1.243
cmake command:
cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on -DCOMPILE_CUDA_SM35=off -DCOMPILE_CUDA_SM50=off -DCOMPILE_CUDA_SM60=off -DCOMPILE_CUDA_SM70=on -DCOMPILE_CUDA_SM75=off -DCOMPILE_TESTS=on
Checklist