Test - Inference Dashboard #2
Inference Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface, and timm. We run these experiments on A100 GPUs. This is an inference run. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio. To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks. A sketch of how the passrate and geometric mean speedup are derived follows the list of summary metrics below.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
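As a rough, hypothetical sketch of how these aggregate metrics are typically computed (the per-model records below are made up, not the dashboard's actual data or parsing code): passrate is the fraction of models that pass the accuracy check, and the geometric mean speedup is taken over eager/compiled latency ratios for the passing models only.

```python
from math import exp, log

# Hypothetical per-model records: (model name, passed accuracy check, eager ms, compiled ms).
# The real dashboard aggregates results from the benchmark runner's output; values here are made up.
results = [
    ("resnet50", True, 42.0, 30.0),
    ("bert_base", True, 55.0, 41.0),
    ("failing_model", False, 10.0, 9.0),  # fails accuracy, so excluded from perf metrics
]

# Passrate: fraction of models that pass the accuracy check.
passrate = sum(passed for _, passed, _, _ in results) / len(results)

# Speedup is normalized against native PyTorch (eager); accuracy failures are removed first.
speedups = [eager / compiled for _, passed, eager, compiled in results if passed]
geomean_speedup = exp(sum(log(s) for s in speedups) / len(speedups))

print(f"passrate={passrate:.0%}  geomean speedup={geomean_speedup:.2f}x")
```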
torchbench suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float32 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Build Summary

Run name: day_354_20_12_22_performance_float32_370
Commit hashes: pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
TorchDynamo config flags: torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
Torch version: torch: 2.0.0a0+git88c581b
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8302
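For reference, the flags and environment listed above can be mirrored in a local session roughly as follows; this is only a sketch of the recorded settings, not the dashboard's actual benchmark invocation.

```python
import os

# Environment variable from the build summary (compute capability 8.0 matches A100).
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"

import torch
import torch._dynamo

# TorchDynamo config flag recorded for this run.
torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False

# Sanity-check that the toolchain matches the build summary.
print(torch.__version__)               # expected: 2.0.0a0+git88c581b
print(torch.backends.cudnn.version())  # expected: 8302
```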
Inference Performance Dashboard for float16 precision

Executive Summary

We evaluate different backends across three benchmark suites - torchbench, huggingface, and timm. We run these experiments on A100 GPUs. This is an inference run. For accuracy, we check the numerical correctness of the forward-pass outputs. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency numbers and the peak memory footprint reduction ratio. A sketch of this kind of accuracy check is shown after the list of summary metrics below.

Caveats

To measure performance, compilation latency, and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate
Geometric mean speedup
Mean compilation time (seconds)
Peak memory footprint compression ratio (higher is better)
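For context on the per-model Accuracy columns below: the check compares compiled forward-pass outputs against native PyTorch (eager). The following is a minimal, assumed sketch of such a check; the toy model, tolerances, and default torch.compile backend are placeholders, not the dashboard's exact configuration.

```python
import torch

# Toy model standing in for a benchmark model (placeholder, not from the suites above).
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example_input = torch.randn(8, 64)

compiled_model = torch.compile(model)  # default backend assumed

with torch.no_grad():
    expected = model(example_input)          # native PyTorch (eager) reference
    actual = compiled_model(example_input)   # compiled forward pass

# The model "passes" if the compiled outputs match eager within tolerance.
passed = torch.allclose(actual, expected, rtol=1e-3, atol=1e-3)
print("accuracy check:", "PASS" if passed else "FAIL")
```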
torchbench suite with float16 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
huggingface suite with float16 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
timm_models suite with float16 precision

Performance speedup
Accuracy
Compilation latency (sec)
Peak Memory Compression Ratio
Absolute latency (ms)
Build Summary

Run name: day_355_21_12_22_performance_float16_459
Commit hashes: pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
TorchDynamo config flags: torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
Torch version: torch: 2.0.0a0+git88c581b
Environment variables: TORCH_CUDA_ARCH_LIST = 8.0
GPU details: CUDNN VERSION: 8302
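The "Peak memory footprint compression ratio (higher is better)" reported above can be read as eager peak memory divided by compiled peak memory; under that assumption, a minimal GPU sketch looks like this (the model, sizes, and measurement protocol are placeholders, not the dashboard's exact setup):

```python
import torch

def peak_memory_mib(fn, *args):
    """Run fn once and return the peak CUDA memory allocated, in MiB."""
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        fn(*args)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# Placeholder model and input; the real suites use torchbench/huggingface/timm models.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU()).cuda().eval()
x = torch.randn(256, 1024, device="cuda")

eager_peak = peak_memory_mib(model, x)
compiled_peak = peak_memory_mib(torch.compile(model), x)

# Higher is better: >1 means the compiled run had a smaller peak footprint than eager.
print(f"peak memory compression ratio: {eager_peak / compiled_peak:.2f}")
```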
Testing the inference numbers