Add t4 for llm perf leaderboard #238

Merged · 18 commits · Aug 19, 2024
Changes from 14 commits
14 changes: 9 additions & 5 deletions .github/workflows/update_llm_perf_cuda_pytorch.yaml
@@ -1,9 +1,9 @@
 name: Update LLM Perf Benchmarks - CUDA PyTorch

-on:
-  workflow_dispatch:
-  schedule:
-    - cron: "0 0 * * *"
+on:
+  workflow_dispatch: # Manual trigger
+  release: # Trigger on new release
+    types: [published]
IlyasMoutawwakil (Member) · Aug 15, 2024:
I don't think this needs commenting, and why on release?

Collaborator (PR author) replied:
OK, I can remove the comments.

Good question. I think it would be more efficient to run the full benchmark with each release of the pip package rather than on a daily basis. Running it daily seems wasteful, as the hardware remains unchanged and we're simply repeating the benchmark for every code change. Since users are likely to benchmark using the PyPI package, it makes more sense to align this workflow with each release. We could also run it manually if we discover any issues with our benchmarks. However, if you prefer running the benchmark daily, I can revert to that schedule. Just let me know your preference.

Member replied:
I guess there's a misunderstanding: the daily trigger runs different benchmarks (different model + optimization + quantization combinations) each time because it skips already-benchmarked configurations. It is also a way to benchmark all configurations without being limited by the 6-hour time constraint of the runners.

Collaborator (PR author) replied:
Thanks for the explanation, it makes much more sense now. I removed the release trigger and kept the original schedule.
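
A minimal sketch of the skip-already-benchmarked sweep described in the comment above. The list contents and the `is_benchmarked` helper are hypothetical stand-ins, not the script's actual names; the real logic lives in llm_perf/update_llm_perf_cuda_pytorch.py.

```python
# Sketch only: a scheduled run covers new (model, attn, weights) combinations
# by skipping configurations that already have published results.
from itertools import product

MODELS = ["gpt2", "mistralai/Mistral-7B-v0.1"]  # illustrative model IDs
ATTN_IMPLEMENTATIONS = ["eager", "sdpa", "flash_attention_2"]
WEIGHTS_CONFIGS = ["float16", "4bit-awq-gemm", "4bit-gptq-exllama-v2"]


def is_benchmarked(model: str, attn_implementation: str, weights_config: str) -> bool:
    """Hypothetical check against already-pushed benchmark reports."""
    return False


def benchmark_stub(model: str, attn_implementation: str, weights_config: str) -> None:
    """Stand-in for the real benchmark_cuda_pytorch(...) shown in the diff below."""
    print(f"benchmarking {model} / {attn_implementation} / {weights_config}")


def run_scheduled_sweep() -> None:
    for model, attn_implementation, weights_config in product(
        MODELS, ATTN_IMPLEMENTATIONS, WEIGHTS_CONFIGS
    ):
        if is_benchmarked(model, attn_implementation, weights_config):
            continue  # already covered by a previous scheduled run
        benchmark_stub(model, attn_implementation, weights_config)
```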


 concurrency:
   cancel-in-progress: true
@@ -18,7 +18,11 @@ jobs:
       fail-fast: false
       matrix:
         subset: [unquantized, bnb, awq, gptq]
-        machine: [{ name: 1xA10, runs-on: [single-gpu, nvidia-gpu, a10, ci] }]
+
+        machine: [
+          {name: 1xA10, runs-on: {group: 'aws-g5-4xlarge-plus'}},
+          {name: 1xT4, runs-on: {group: 'aws-g4dn-2xlarge'}}
+        ]

     runs-on: ${{ matrix.machine.runs-on }}

2 changes: 1 addition & 1 deletion llm_perf/update_llm_perf_cuda_pytorch.py
@@ -134,7 +134,7 @@ def benchmark_cuda_pytorch(model, attn_implementation, weights_config):
         quantization_scheme=quant_scheme,
         quantization_config=quant_config,
         attn_implementation=attn_implementation,
-        hub_kwargs={"trust_remote_code": True},
+        model_kwargs={"trust_remote_code": True},
     )

     benchmark_config = BenchmarkConfig(
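
A hedged usage sketch of the renamed argument, assuming the kwargs in the hunk above are fields of optimum-benchmark's PyTorchConfig and BenchmarkConfig (as in the library's README-style API); the model ID is hypothetical.

```python
# Sketch only: trust_remote_code is now passed through model_kwargs
# (previously hub_kwargs). Field names are assumed from the hunk above.
from optimum_benchmark import BenchmarkConfig, InferenceConfig, ProcessConfig, PyTorchConfig

backend_config = PyTorchConfig(
    model="some-org/custom-remote-code-model",  # hypothetical model ID
    device="cuda",
    device_ids="0",
    model_kwargs={"trust_remote_code": True},  # was hub_kwargs before this change
)

benchmark_config = BenchmarkConfig(
    name="trust-remote-code-example",
    launcher=ProcessConfig(),
    scenario=InferenceConfig(latency=True, memory=True),
    backend=backend_config,
)
```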
2 changes: 1 addition & 1 deletion llm_perf/update_llm_perf_leaderboard.py
@@ -32,7 +32,7 @@ def gather_benchmarks(subset: str, machine: str):

 def update_perf_dfs():
     for subset in ["unquantized", "bnb", "awq", "gptq"]:
-        for machine in ["1xA10", "1xA100"]:
+        for machine in ["1xA10", "1xA100", "1xT4"]:
             try:
                 gather_benchmarks(subset, machine)
             except Exception:
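
A hedged sketch of what the per-machine gather could look like; the dataset naming scheme below is illustrative, not the real one used by gather_benchmarks.

```python
# Sketch only: one Hub dataset per (subset, machine) pair, pulled with the
# `datasets` library. The repo_id pattern is an assumption for illustration.
from datasets import load_dataset


def gather_benchmarks_sketch(subset: str, machine: str):
    repo_id = f"optimum-benchmark/llm-perf-pytorch-cuda-{subset}-{machine}"  # illustrative ID
    return load_dataset(repo_id, split="train")


# Newly added 1xT4 reports may not exist yet for every subset, which is why
# update_perf_dfs above wraps gather_benchmarks in a try/except.
```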
1 change: 1 addition & 0 deletions optimum_benchmark/backends/config.py
@@ -73,6 +73,7 @@ def __post_init__(self):
             self.library,
             revision=self.model_kwargs.get("revision", None),
             token=self.model_kwargs.get("token", None),
+            trust_remote_code=self.model_kwargs.get("trust_remote_code", False),
         )

         if self.device is None:
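
A minimal sketch of the flow this hunk enables, mirroring the same `.get(...)` defaults: a backend config whose model_kwargs carry trust_remote_code now forwards the flag when the model type is inferred in `__post_init__`. The model ID is hypothetical; the called function is the one patched in the next file.

```python
# Sketch only: forwarding trust_remote_code from model_kwargs to model-type inference.
from optimum_benchmark.task_utils import infer_model_type_from_model_name_or_path

model_kwargs = {"revision": "main", "trust_remote_code": True}

model_type = infer_model_type_from_model_name_or_path(
    "some-org/custom-remote-code-model",  # hypothetical model ID
    revision=model_kwargs.get("revision", None),
    token=model_kwargs.get("token", None),
    trust_remote_code=model_kwargs.get("trust_remote_code", False),  # defaults to False
)
print(model_type)
```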
5 changes: 4 additions & 1 deletion optimum_benchmark/task_utils.py
@@ -190,6 +190,7 @@ def infer_model_type_from_model_name_or_path(
     library_name: Optional[str] = None,
     revision: Optional[str] = None,
     token: Optional[str] = None,
+    trust_remote_code: bool = False,
 ) -> str:
     if library_name is None:
         library_name = infer_library_from_model_name_or_path(model_name_or_path, revision=revision, token=token)
@@ -216,7 +217,9 @@
             break

     else:
-        transformers_config = get_transformers_pretrained_config(model_name_or_path, revision=revision, token=token)
+        transformers_config = get_transformers_pretrained_config(
+            model_name_or_path, revision=revision, token=token, trust_remote_code=trust_remote_code
+        )
         inferred_model_type = transformers_config.model_type

     if inferred_model_type is None:
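
A hedged sketch of the final hop, assuming get_transformers_pretrained_config is essentially a thin wrapper around transformers.AutoConfig.from_pretrained (which does accept revision, token, and trust_remote_code); this is illustrative, not the library's exact implementation.

```python
# Sketch only: how the forwarded trust_remote_code flag reaches transformers.
from transformers import AutoConfig


def get_transformers_pretrained_config_sketch(model_name_or_path: str, **kwargs):
    # kwargs now include trust_remote_code in addition to revision/token
    return AutoConfig.from_pretrained(model_name_or_path, **kwargs)


config = get_transformers_pretrained_config_sketch(
    "some-org/custom-remote-code-model",  # hypothetical model ID
    revision="main",
    trust_remote_code=True,  # allow loading a config class defined in the model repo
)
print(config.model_type)
```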