
README.md: Add 3rd Party Inference Speed Dashboard #2244

Open — wants to merge 1 commit into `main`
Commits on Sep 22, 2024

  1. README.md: Add 3rd Party Inference Speed Dashboard

    Hi TensorRT-LLM team,
    
    As an NVIDIA Inception startup, I would like to add a community resource link to an inference speed dashboard.
    
    The inference speed dashboard includes:
    
    - Comprehensive benchmarks across several optimization techniques: FP16, FP8, INT8-weight-only, and INT4-weight-only.
    - Comprehensive benchmarks across model architectures: Llama3-8b, Gemma2-27b, RecurrentGemma-9b, and Mamba2-2.7b.
    - Comprehensive coverage of batch sizes, input lengths, and output lengths: batch sizes range from 1 to 32 (128 for Mamba), and input and output lengths range from 32 to 4096.
    
    Additionally, I plan to publish the source code (website) and benchmark script by the end of this year.
    matichon-vultureprime authored Sep 22, 2024
    Commit d7bbc7b
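
The benchmark dimensions described in the commit message can be pictured as a configuration grid. Below is a minimal Python sketch of that grid; it assumes power-of-two batch sizes and a sampled set of sequence lengths between 32 and 4096, since the PR text gives only the ranges, not the exact values used by the dashboard:

```python
# Hypothetical enumeration of the benchmark grid described in the PR text.
# The precision/model names come from the commit message; the batch-size
# progression and sequence-length sampling are assumptions for illustration.
from itertools import product

PRECISIONS = ["FP16", "FP8", "INT8-weight-only", "INT4-weight-only"]
MODELS = ["Llama3-8b", "Gemma2-27b", "RecurrentGemma-9b", "Mamba2-2.7b"]
SEQ_LENS = [32, 128, 512, 1024, 2048, 4096]  # assumed sampling of 32..4096


def batch_sizes(model: str) -> list[int]:
    """Power-of-two batch sizes up to 32, or up to 128 for Mamba models."""
    limit = 128 if "Mamba" in model else 32
    sizes, bs = [], 1
    while bs <= limit:
        sizes.append(bs)
        bs *= 2
    return sizes


def benchmark_grid():
    """Yield every (precision, model, batch, input_len, output_len) combo."""
    for precision, model in product(PRECISIONS, MODELS):
        for batch in batch_sizes(model):
            for in_len, out_len in product(SEQ_LENS, SEQ_LENS):
                yield precision, model, batch, in_len, out_len


configs = list(benchmark_grid())
```

Even with this coarse sampling, the grid yields a few thousand configurations, which is why a dashboard is more practical than a static results table.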