forked from pytorch/serve

Serve, optimize and scale PyTorch models in production

Kanya-Mo/serve

TorchServe

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production.

Requires Python >= 3.8

curl http://127.0.0.1:8080/predictions/bert -T input.txt
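
The curl call above uploads input.txt to the inference endpoint; curl's -T flag sends the file as the body of an HTTP PUT. A minimal Python sketch of the same request, using only the standard library, might look like the following. The helper names (build_prediction_request, predict) and the Content-Type value are illustrative assumptions and not part of TorchServe; the host, port, and model name (bert) are taken from the command above.

```python
import urllib.request


def build_prediction_request(payload: bytes,
                             model_name: str = "bert",
                             base_url: str = "http://127.0.0.1:8080"):
    """Build (but do not send) a PUT request to /predictions/<model_name>.

    Hypothetical helper; it mirrors `curl <url> -T input.txt`.
    """
    url = f"{base_url}/predictions/{model_name}"
    # Assumption: send the body as raw bytes. Without an explicit header,
    # urllib would label the body as form-encoded data.
    headers = {"Content-Type": "application/octet-stream"}
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


def predict(path: str) -> str:
    """Upload the file at `path` and return the response body as text.

    Assumes a TorchServe instance is running locally with a model
    registered under the name used in build_prediction_request.
    """
    with open(path, "rb") as f:
        request = build_prediction_request(f.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running and a model registered as bert, calling predict("input.txt") should return the same response body that the curl command prints.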

🚀 Quick start with TorchServe

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
pip install torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
pip install torchserve-nightly torch-model-archiver-nightly torch-workflow-archiver-nightly

🚀 Quick start with TorchServe (conda)

# Install dependencies
# cuda is optional
python ./ts_scripts/install_dependencies.py --cuda=cu121

# Latest release
conda install -c pytorch torchserve torch-model-archiver torch-workflow-archiver

# Nightly build
conda install -c pytorch-nightly torchserve torch-model-archiver torch-workflow-archiver

Getting started guide

🐳 Quick Start with Docker

# Latest release
docker pull pytorch/torchserve

# Nightly build
docker pull pytorch/torchserve-nightly

Refer to torchserve docker for details.

🤖 Quick Start LLM Deployment

#export token=<HUGGINGFACE_HUB_TOKEN>
docker build . -f docker/Dockerfile.llm -t ts/llm

docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token

curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
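
The curl request above posts a JSON body with a prompt and a token limit to the model endpoint. A rough Python equivalent, again using only the standard library, is sketched below. The function names (build_llm_request, generate) are hypothetical; the URL, JSON fields, and header come from the curl command, and the shape of the response is not assumed.

```python
import json
import urllib.request


def build_llm_request(prompt: str,
                      max_new_tokens: int = 50,
                      url: str = "http://localhost:8080/predictions/model"):
    """Build (but do not send) the JSON POST request shown in the curl example."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the raw response text.

    Assumes the ts/llm container from the docker run command is up
    and listening on port 8080.
    """
    with urllib.request.urlopen(build_llm_request(prompt)) as response:
        return response.read().decode("utf-8")
```

Once the container is serving, generate("Hello, my name is") should return the same text that the curl command prints.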

Refer to LLM deployment (docs/llm_deployment.md) for details and other methods.

⚡ Why TorchServe

🤔 How does TorchServe work

πŸ† Highlighted Examples

For more examples

πŸ›‘οΈ TorchServe Security Policy

SECURITY.md

🤓 Learn More

https://pytorch.org/serve

🫂 Contributing

We welcome all contributions!

To learn more about how to contribute, see the contributor guide here.

📰 News

💖 All Contributors

Made with contrib.rocks.

βš–οΈ Disclaimer

This repository is jointly operated and maintained by Amazon, Meta and a number of individual contributors listed in the CONTRIBUTORS file. For questions directed at Meta, please send an email to [email protected]. For questions directed at Amazon, please send an email to [email protected]. For all other questions, please open up an issue in this repository here.

TorchServe acknowledges the Multi Model Server (MMS) project, from which it was derived.
