v0.4.0 #18

Merged
merged 28 commits into from
Nov 28, 2024
Conversation

XkunW
Contributor

@XkunW XkunW commented Nov 28, 2024

PR Type

[Release]
v0.4.0

Short Description

  • Onboarded several new models and two new model types: text embedding models and reward reasoning models.
  • Added a metrics command that streams performance metrics for the inference server.
  • Enabled more launch command options: --max-num-seqs, --model-weights-parent-dir, --pipeline-parallelism, --enforce-eager.
  • Improved support for launching custom models.
  • Improved command response time.
  • Improved visuals for list command.

XkunW added 28 commits October 29, 2024 10:27
… models, added Llama 3.2 and Llama 3.1 Nemotron
…d list command based on model type, updated READMEs, removed debugging code
…ling for when errors that doesn't affect server launching show up in err logs
@XkunW XkunW merged commit d221dae into main Nov 28, 2024
1 check failed