
Discrepancy between TRT engines for same models - TensorRT Issue cross-reference #113

Open
bernardrb opened this issue May 24, 2024 · 1 comment

Comments

@bernardrb
bernardrb commented May 24, 2024

We are trying to reproduce the results for EfficientViT-SAM, but are running into problems with certain models. In this case, the issue is with l2_encoder.onnx. In summary, these are the results:

L2 - FP16
{"all": 0.0, "large": 0.0, "medium": 0.0, "small": 0.0}

L2 - FP32
{"all": 79.12385607181146, "large": 83.05853600575689, "medium": 81.50597370444349, "small": 74.8830670481846}

All details about the setup are available through the link: Accuracy failure of TensorRT 8.6.3 when running trtexec built engine on GPU RTX4090
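For reference, a minimal sketch of the trtexec invocations that would produce the two engines compared above, assuming TensorRT 8.6 and a local ONNX export; file names are placeholders, not the exact paths from our setup:

```shell
# FP32 baseline engine (the variant that scores ~79 mIoU "all")
trtexec --onnx=l2_encoder.onnx --saveEngine=l2_encoder_fp32.engine

# FP16 engine -- the variant that collapses to all-zero metrics above
trtexec --onnx=l2_encoder.onnx --saveEngine=l2_encoder_fp16.engine --fp16
```

The only difference between the two builds is the `--fp16` flag, which lets TensorRT lower any layer to half precision; numerically sensitive layers (e.g. attention softmax) can overflow or underflow in FP16 and wipe out the output.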

@ovunctuzel-bc

ovunctuzel-bc commented May 29, 2024

I was able to resolve a similar issue by setting some layers in the attention block to FP32 precision; it might help in this case as well. With that change I still retained a 2× speedup compared to the full-FP32 model.
See: #116 (comment)
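A sketch of what that per-layer precision override can look like with the TensorRT 8.6 Python builder API. The layer-name substrings (`"attn"`, `"softmax"`) are illustrative guesses, not the actual layer names in l2_encoder.onnx; inspect your parsed network to find which layers to pin:

```python
# Build a mostly-FP16 engine while pinning attention layers to FP32.
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def build_mixed_precision_engine(onnx_path: str, engine_path: str) -> None:
    builder = trt.Builder(LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Without this flag TensorRT may treat per-layer precision as a hint only.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        # Hypothetical name patterns -- replace with your network's layer names.
        if "attn" in layer.name or "softmax" in layer.name.lower():
            layer.precision = trt.float32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.DataType.FLOAT)

    engine_bytes = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine_bytes)
```

The same effect is available from the command line via trtexec's `--layerPrecisions` and `--precisionConstraints=obey` options, which avoids writing a builder script at all.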
