diff --git a/README.md b/README.md index 08586df612..4b49c3cfbe 100644 --- a/README.md +++ b/README.md @@ -52,155 +52,146 @@ limitations under the License. -A CPU runtime that takes advantage of sparsity within neural networks to reduce compute. Read [more about sparsification](https://docs.neuralmagic.com/user-guides/sparsification). -Neural Magic's DeepSparse is able to integrate into popular deep learning libraries (e.g., Hugging Face, Ultralytics) allowing you to leverage DeepSparse for loading and deploying sparse models with ONNX. -ONNX gives the flexibility to serve your model in a framework-agnostic environment. -Support includes [PyTorch,](https://pytorch.org/docs/stable/onnx.html) [TensorFlow,](https://github.com/onnx/tensorflow-onnx) [Keras,](https://github.com/onnx/keras-onnx) and [many other frameworks](https://github.com/onnx/onnxmltools). +[DeepSparse](https://github.com/neuralmagic/deepsparse) is a CPU inference runtime that takes advantage of sparsity within neural networks to execute inference quickly. Coupled with [SparseML](https://github.com/neuralmagic/sparseml), an open-source optimization library, DeepSparse enables you to achieve GPU-class performance on commodity hardware. + +
+ +
+
+For details of training a sparse model for deployment with DeepSparse, [check out SparseML](https://github.com/neuralmagic/sparseml).
## Installation
-Install DeepSparse Community as follows:
+DeepSparse is available in two editions:
+1. DeepSparse Community is free for evaluation, research, and non-production use with our [DeepSparse Community License](https://neuralmagic.com/legal/engine-license-agreement/).
+2. DeepSparse Enterprise requires a [trial license](https://neuralmagic.com/deepsparse-free-trial/) or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
+
+#### Install via Docker (Recommended)
+
+DeepSparse Community is available as a container image hosted on the [GitHub Container Registry](https://github.com/neuralmagic/deepsparse/pkgs/container/deepsparse).
```bash
-pip install deepsparse
+docker pull ghcr.io/neuralmagic/deepsparse:1.4.2
+docker tag ghcr.io/neuralmagic/deepsparse:1.4.2 deepsparse-docker
+docker run -it deepsparse-docker
```
-DeepSparse is available in two editions:
-1. [**DeepSparse Community**](#installation) is open-source and free for evaluation, research, and non-production use with our [DeepSparse Community License](https://neuralmagic.com/legal/engine-license-agreement/).
-2. [**DeepSparse Enterprise**](https://docs.neuralmagic.com/products/deepsparse-ent) requires a Trial License or [can be fully licensed](https://neuralmagic.com/legal/master-software-license-and-service-agreement/) for production, commercial applications.
-
-## 🧰 Hardware Support and System Requirements
+- [Check out the Docker page](https://github.com/neuralmagic/deepsparse/tree/main/docker/) for more details.
-To ensure that your CPU is compatible with DeepSparse, it is recommended to review the [Supported Hardware for DeepSparse](https://docs.neuralmagic.com/user-guides/deepsparse-engine/hardware-support) documentation.
+#### Install via PyPI
+DeepSparse Community is also available via PyPI. We recommend using a virtual environment.
-To ensure that you get the best performance from DeepSparse, it has been thoroughly tested on Python versions 3.7-3.10, ONNX versions 1.5.0-1.12.0, ONNX opset version 11 or higher, and manylinux compliant systems. It is highly recommended to use a [virtual environment](https://docs.python.org/3/library/venv.html) when running DeepSparse. Please note that DeepSparse is only supported natively on Linux. For those using Mac or Windows, running Linux in a Docker or virtual machine is necessary to use DeepSparse.
+```bash
+pip install deepsparse
+```
-## Features
+- [Check out the Installation page](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/installation.md) for optional dependencies.
-- 👩💻 Pipelines for [NLP](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/transformers), [CV Classification](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/image_classification), [CV Detection](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolo), [CV Segmentation](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/yolact) and more!
-- 🔌 [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) -- 📜 [DeepSparse Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark) -- ☁️ [Cloud Deployments and Demos](https://github.com/neuralmagic/deepsparse/tree/main/examples) +## Hardware Support and System Requirements -### 👩💻 Pipelines +[Supported Hardware for DeepSparse](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/hardware-support.md) -Pipelines are a high-level Python interface for running inference with DeepSparse across select tasks in NLP and CV: +DeepSparse is tested on Python versions 3.7-3.10, ONNX versions 1.5.0-1.12.0, ONNX opset version 11 or higher, and manylinux compliant systems. Please note that DeepSparse is only supported natively on Linux. For those using Mac or Windows, running Linux in a Docker or virtual machine is necessary to use DeepSparse. -| NLP | CV | -|-----------------------|---------------------------| -| Text Classification `"text_classification"` | Image Classification `"image_classification"` | -| Token Classification `"token_classification"` | Object Detection `"yolo"` | -| Sentiment Analysis `"sentiment_analysis"` | Instance Segmentation `"yolact"` | -| Question Answering `"question_answering"` | Keypoint Detection `"open_pif_paf"` | -| MultiLabel Text Classification `"text_classification"` | | -| Document Classification `"text_classification"` | | -| Zero-Shot Text Classification `"zero_shot_text_classification"` | | +## Deployment APIs +DeepSparse includes three deployment APIs: -**NLP Example** | Question Answering -```python -from deepsparse import Pipeline +- **Engine** is the lowest-level API. With Engine, you pass tensors and receive the raw logits. +- **Pipeline** wraps the Engine with pre- and post-processing. With Pipeline, you pass raw data and receive the prediction. +- **Server** wraps Pipelines with a REST API using FastAPI. With Server, you send raw data over HTTP and receive the prediction. -qa_pipeline = Pipeline.create( - task="question-answering", - model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni", -) +### Engine -inference = qa_pipeline(question="What's my name?", context="My name is Snorlax") -``` -**CV Example** | Image Classification +The example below downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo, compiles the model, and runs inference on randomly generated input. ```python -from deepsparse import Pipeline +from deepsparse import Engine +from deepsparse.utils import generate_random_inputs, model_to_path -cv_pipeline = Pipeline.create( - task='image_classification', - model_path='zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none', -) +# download onnx, compile +zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none" +batch_size = 1 +compiled_model = Engine(model=zoo_stub, batch_size=batch_size) + +# run inference (input is raw numpy tensors, output is raw scores) +inputs = generate_random_inputs(model_to_path(zoo_stub), batch_size) +output = compiled_model(inputs) +print(output) -input_image = "my_image.png" -inference = cv_pipeline(images=input_image) +# > [array([[-0.3380675 , 0.09602544]], dtype=float32)] << raw scores ``` +### DeepSparse Pipelines -### 🔌 DeepSparse Server +Pipeline is the default API for interacting with DeepSparse. 
Similar to Hugging Face Pipelines, DeepSparse Pipelines wrap Engine with pre- and post-processing (as well as other utilities), enabling you to send raw data to DeepSparse and receive the post-processed prediction.
-DeepSparse Server is a tool that enables you to serve your models and pipelines directly from your terminal.
+The example below downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo, sets up a pipeline, and runs inference on sample data.
-The server is built on top of two powerful libraries: the FastAPI web framework and the Uvicorn web server. This combination ensures that DeepSparse Server delivers excellent performance and reliability. Install with this command:
+```python
+from deepsparse import Pipeline
-```bash
-pip install deepsparse[server]
+# download onnx, set up pipeline
+zoo_stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
+sentiment_analysis_pipeline = Pipeline.create(
+    task="sentiment-analysis",    # name of the task
+    model_path=zoo_stub,          # zoo stub or path to local onnx file
+)
+
+# run inference (input is a sentence, output is the prediction)
+prediction = sentiment_analysis_pipeline("I love using DeepSparse Pipelines")
+print(prediction)
+# > labels=['positive'] scores=[0.9954759478569031]
```
-#### Single Model
+#### Additional Resources
+- Check out the [Use Cases Page](https://github.com/neuralmagic/deepsparse/tree/main/docs/use-cases) for more details on supported tasks.
+- Check out the [Pipelines User Guide](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/deepsparse-pipelines.md) for more usage details.
-Once installed, the following example CLI command is available for running inference with a single BERT model:
### DeepSparse Server
-```bash
-deepsparse.server \
-    task question_answering \
-    --model_path "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
-```
+Server wraps Pipelines with REST APIs, enabling you to stand up a model serving endpoint running DeepSparse. This enables you to send raw data to DeepSparse over HTTP and receive the post-processed predictions.
-To look up arguments run: `deepsparse.server --help`.
-
-#### Multiple Models
-To deploy multiple models in your setup, a `config.yaml` file should be created. In the example provided, two BERT models are configured for the question-answering task:
-
-```yaml
-num_workers: 1
-endpoints:
-    - task: question_answering
-      route: /predict/question_answering/base
-      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none
-      batch_size: 1
-    - task: question_answering
-      route: /predict/question_answering/pruned_quant
-      model: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni
-      batch_size: 1
-```
+DeepSparse Server is launched from the command line, configured via arguments or a server configuration file.
The following downloads a 90% pruned-quantized BERT model for sentiment analysis in ONNX format from SparseZoo and launches a sentiment analysis endpoint:
-After the `config.yaml` file has been created, the server can be started by passing the file path as an argument:
```bash
-deepsparse.server config config.yaml
+deepsparse.server \
+  --task sentiment-analysis \
+  --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
```
-Read the [DeepSparse Server](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server) README for further details.
-
-### 📜 DeepSparse Benchmark
+Sending a request:
-DeepSparse Benchmark, a command-line (CLI) tool, is used to evaluate the DeepSparse Engine's performance with ONNX models. This tool processes arguments, downloads and compiles the network into the engine, creates input tensors, and runs the model based on the selected scenario.
-
-Run `deepsparse.benchmark -h` to look up arguments:
+```python
+import requests
-```shell
-deepsparse.benchmark [-h] [-b BATCH_SIZE] [-i INPUT_SHAPES] [-ncores NUM_CORES] [-s {async,sync,elastic}] [-t TIME]
-                     [-w WARMUP_TIME] [-nstreams NUM_STREAMS] [-pin {none,core,numa}] [-e ENGINE] [-q] [-x EXPORT_PATH]
-                     model_path
+url = "http://localhost:5543/predict" # Server's port defaults to 5543
+obj = {"sequences": "Snorlax loves my Tesla!"}
+response = requests.post(url, json=obj)
+print(response.text)
+# {"labels":["positive"],"scores":[0.9965094327926636]}
```
+#### Additional Resources
+- Check out the [Use Cases Page](https://github.com/neuralmagic/deepsparse/tree/main/docs/use-cases) for more details on supported tasks.
+- Check out the [Server User Guide](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/deepsparse-server.md) for more usage details.
-Refer to the [Benchmark](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark) README for examples of specific inference scenarios.
## ONNX
-### 🦉 Custom ONNX Model Support
+DeepSparse accepts models in the ONNX format. ONNX models can be passed in one of two ways:
-DeepSparse is capable of accepting ONNX models from two sources:
+- **SparseZoo Stub**: [SparseZoo](https://sparsezoo.neuralmagic.com/) is an open-source repository of sparse models. The examples on this page use SparseZoo stubs to identify models and download them for deployment in DeepSparse.
-**SparseZoo ONNX**: This is an open-source repository of sparse models available for download. [SparseZoo](https://github.com/neuralmagic/sparsezoo) offers inference-optimized models, which are trained using repeatable sparsification recipes and state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).
-
-**Custom ONNX**: Users can provide their own ONNX models, whether dense or sparse. By plugging in a custom model, users can compare its performance with other solutions.
+- **Local ONNX File**: Users can provide their own ONNX models, whether dense or sparse.
For example: ```bash -> wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx -Saving to: ‘mobilenetv2-7.onnx’ +wget https://github.com/onnx/models/raw/main/vision/classification/mobilenet/model/mobilenetv2-7.onnx ``` -Custom ONNX Benchmark example: ```python -from deepsparse import compile_model +from deepsparse import Engine from deepsparse.utils import generate_random_inputs onnx_filepath = "mobilenetv2-7.onnx" batch_size = 16 @@ -209,34 +200,35 @@ batch_size = 16 inputs = generate_random_inputs(onnx_filepath, batch_size) # Compile and run -engine = compile_model(onnx_filepath, batch_size) -outputs = engine.run(inputs) +compiled_model = Engine(model=onnx_filepath, batch_size=batch_size) +outputs = compiled_model(inputs) +print(outputs[0].shape) +# (16, 1000) << batch, num_classes ``` -The [GitHub repository](https://github.com/neuralmagic/deepsparse) repository contains package APIs and examples that help users swiftly begin benchmarking and performing inference on sparse models. - -### Scheduling Single-Stream, Multi-Stream, and Elastic Inference +## Inference Modes -DeepSparse offers different inference scenarios based on your use case. Read more details here: [Inference Types](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md). +DeepSparse offers different inference scenarios based on your use case. -⚡ **Single-stream** scheduling: the latency/synchronous scenario, requests execute serially. [`default`] +**Single-stream** scheduling: the latency/synchronous scenario, requests execute serially. [`default`] It's highly optimized for minimum per-request latency, using all of the system's resources provided to it on every request it gets. -⚡ **Multi-stream** scheduling: the throughput/asynchronous scenario, requests execute in parallel. +**Multi-stream** scheduling: the throughput/asynchronous scenario, requests execute in parallel. The most common use cases for the multi-stream scheduler are where parallelism is low with respect to core count, and where requests need to be made asynchronously without time to batch them. -## Resources -#### Libraries -- [DeepSparse](https://docs.neuralmagic.com/deepsparse/) -- [SparseML](https://docs.neuralmagic.com/sparseml/) -- [SparseZoo](https://docs.neuralmagic.com/sparsezoo/) -- [Sparsify](https://docs.neuralmagic.com/sparsify/) +- [Check out the Scheduler User Guide](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/scheduler.md) for more details. + +## Additional Resources +- [Benchmarking Performance](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide/deepsparse-benchmarking.md) +- [User Guide](https://github.com/neuralmagic/deepsparse/tree/main/docs/user-guide) +- [Use Cases](https://github.com/neuralmagic/deepsparse/tree/main/docs/use-cases) +- [Cloud Deployments and Demos](https://github.com/neuralmagic/deepsparse/tree/main/examples/) #### Versions - [DeepSparse](https://pypi.org/project/deepsparse) | stable @@ -251,7 +243,6 @@ The most common use cases for the multi-stream scheduler are where parallelism i ### Be Part of the Future... And the Future is Sparse! - Contribute with code, examples, integrations, and documentation as well as bug reports and feature requests! 
[Learn how here.](https://github.com/neuralmagic/deepsparse/blob/main/CONTRIBUTING.md) For user help or questions about DeepSparse, sign up or log in to our **[Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)**. We are growing the community member by member and happy to see you there. Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue.](https://github.com/neuralmagic/deepsparse/issues) You can get the latest news, webinar and event invites, research papers, and other ML Performance tidbits by [subscribing](https://neuralmagic.com/subscribe/) to the Neural Magic community. diff --git a/docs/neural-magic-workflow.png b/docs/neural-magic-workflow.png new file mode 100644 index 0000000000..f870b4e97c Binary files /dev/null and b/docs/neural-magic-workflow.png differ diff --git a/docs/_static/css/nm-theme-adjustment.css b/docs/old/_static/css/nm-theme-adjustment.css similarity index 100% rename from docs/_static/css/nm-theme-adjustment.css rename to docs/old/_static/css/nm-theme-adjustment.css diff --git a/docs/_templates/versions.html b/docs/old/_templates/versions.html similarity index 100% rename from docs/_templates/versions.html rename to docs/old/_templates/versions.html diff --git a/docs/api/.gitkeep b/docs/old/api/.gitkeep similarity index 100% rename from docs/api/.gitkeep rename to docs/old/api/.gitkeep diff --git a/docs/api/deepsparse.rst b/docs/old/api/deepsparse.rst similarity index 100% rename from docs/api/deepsparse.rst rename to docs/old/api/deepsparse.rst diff --git a/docs/api/deepsparse.transformers.rst b/docs/old/api/deepsparse.transformers.rst similarity index 100% rename from docs/api/deepsparse.transformers.rst rename to docs/old/api/deepsparse.transformers.rst diff --git a/docs/api/deepsparse.utils.rst b/docs/old/api/deepsparse.utils.rst similarity index 100% rename from docs/api/deepsparse.utils.rst rename to docs/old/api/deepsparse.utils.rst diff --git a/docs/api/modules.rst b/docs/old/api/modules.rst similarity index 100% rename from docs/api/modules.rst rename to docs/old/api/modules.rst diff --git a/docs/conf.py b/docs/old/conf.py similarity index 100% rename from docs/conf.py rename to docs/old/conf.py diff --git a/docs/debugging-optimizing/diagnostics-debugging.md b/docs/old/debugging-optimizing/diagnostics-debugging.md similarity index 100% rename from docs/debugging-optimizing/diagnostics-debugging.md rename to docs/old/debugging-optimizing/diagnostics-debugging.md diff --git a/docs/debugging-optimizing/example-log.md b/docs/old/debugging-optimizing/example-log.md similarity index 100% rename from docs/debugging-optimizing/example-log.md rename to docs/old/debugging-optimizing/example-log.md diff --git a/docs/debugging-optimizing/index.rst b/docs/old/debugging-optimizing/index.rst similarity index 100% rename from docs/debugging-optimizing/index.rst rename to docs/old/debugging-optimizing/index.rst diff --git a/docs/debugging-optimizing/numactl-utility.md b/docs/old/debugging-optimizing/numactl-utility.md similarity index 100% rename from docs/debugging-optimizing/numactl-utility.md rename to docs/old/debugging-optimizing/numactl-utility.md diff --git a/docs/favicon.ico b/docs/old/favicon.ico similarity index 100% rename from docs/favicon.ico rename to docs/old/favicon.ico diff --git a/docs/index.rst b/docs/old/index.rst similarity index 100% rename from docs/index.rst rename to docs/old/index.rst diff --git a/docs/source/c++api-overview.md 
b/docs/old/source/c++api-overview.md similarity index 100% rename from docs/source/c++api-overview.md rename to docs/old/source/c++api-overview.md diff --git a/docs/source/hardware.md b/docs/old/source/hardware.md similarity index 100% rename from docs/source/hardware.md rename to docs/old/source/hardware.md diff --git a/docs/source/icon-deepsparse.png b/docs/old/source/icon-deepsparse.png similarity index 100% rename from docs/source/icon-deepsparse.png rename to docs/old/source/icon-deepsparse.png diff --git a/docs/source/multi-stream.png b/docs/old/source/multi-stream.png similarity index 100% rename from docs/source/multi-stream.png rename to docs/old/source/multi-stream.png diff --git a/docs/source/scheduler.md b/docs/old/source/scheduler.md similarity index 100% rename from docs/source/scheduler.md rename to docs/old/source/scheduler.md diff --git a/docs/source/single-stream.png b/docs/old/source/single-stream.png similarity index 100% rename from docs/source/single-stream.png rename to docs/old/source/single-stream.png diff --git a/docs/use-cases/README.md b/docs/use-cases/README.md new file mode 100644 index 0000000000..8d7532d398 --- /dev/null +++ b/docs/use-cases/README.md @@ -0,0 +1,90 @@ + + +# Use Cases + +There are three interfaces for interacting with DeepSparse: + +- **Engine** is the lowest-level API that enables you to compile a model and run inference on raw input tensors. + +- **Pipeline** is the default DeepSparse API. Similar to Hugging Face Pipelines, it wraps Engine with task-specific pre-processing and post-processing steps, allowing you to make requests on raw data and receive post-processed predictions. + +- **Server** is a REST API wrapper around Pipelines built on FastAPI and Uvicorn. It enables you to start a model serving endpoint running DeepSparse with a single CLI. + +This directory offers examples using each API in various supported tasks. 
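+
+For completeness, here is a minimal sketch of the Engine interface as well (Pipeline and Server examples follow below). It reuses the sentiment-analysis SparseZoo stub from the examples on this page; the printed logits are illustrative only:
+
+```python
+from deepsparse import Engine
+from deepsparse.utils import generate_random_inputs, model_to_path
+
+# compile the model from a SparseZoo stub (or a path to a local ONNX file)
+stub = "zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none"
+compiled_model = Engine(model=stub, batch_size=1)
+
+# Engine takes raw numpy tensors and returns raw scores
+inputs = generate_random_inputs(model_to_path(stub), 1)
+print(compiled_model(inputs))
+```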
+
+### Supported Tasks
+
+DeepSparse supports the following tasks out of the box:
+
+| NLP | CV |
+|-----------------------|---------------------------|
+| [Text Classification `"text-classification"`](nlp/text-classification.md) | [Image Classification `"image_classification"`](cv/image-classification.md) |
+| [Token Classification `"token-classification"`](nlp/token-classification.md) | [Object Detection `"yolo"`](cv/object-detection-yolov5.md) |
+| [Sentiment Analysis `"sentiment-analysis"`](nlp/sentiment-analysis.md) | [Instance Segmentation `"yolact"`](cv/image-segmentation-yolact.md) |
+| [Question Answering `"question-answering"`](nlp/question-answering.md) | |
+| [Zero-Shot Text Classification `"zero-shot-text-classification"`](nlp/zero-shot-text-classification.md) | |
+| [Embedding Extraction `"transformers_embedding_extraction"`](nlp/transformers-embedding-extraction.md) | |
+
+### Examples
+
+**Pipeline Example** | Sentiment Analysis
+
+Here's an example of how a task is used to create a Pipeline:
+
+```python
+from deepsparse import Pipeline
+
+pipeline = Pipeline.create(
+    task="sentiment_analysis",
+    model_path="zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none")
+
+print(pipeline("I love DeepSparse Pipelines!"))
+# labels=['positive'] scores=[0.998009443283081]
+```
+
+**Server Example** | Sentiment Analysis
+
+Here's an example of how a task is used to create a Server:
+
+```bash
+deepsparse.server \
+  --task sentiment_analysis \
+  --model_path zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
+```
+
+Making a request:
+
+```python
+import requests
+
+# Uvicorn is running on this port
+url = 'http://0.0.0.0:5543/predict'
+
+# send the data
+obj = {"sequences": "Sending requests to DeepSparse Server is fast and easy!"}
+resp = requests.post(url=url, json=obj)
+
+# receive the post-processed output
+print(resp.text)
+# >> {"labels":["positive"],"scores":[0.9330279231071472]}
+```
+
+### Additional Resources
+
+- [Custom Tasks](../user-guide/deepsparse-pipelines.md#custom-use-case)
+- [Pipeline User Guide](../user-guide/deepsparse-pipelines.md)
+- [Server User Guide](../user-guide/deepsparse-server.md)
diff --git a/docs/use-cases/cv/embedding-extraction.md b/docs/use-cases/cv/embedding-extraction.md
new file mode 100644
index 0000000000..ff7e9f7ad1
--- /dev/null
+++ b/docs/use-cases/cv/embedding-extraction.md
@@ -0,0 +1,130 @@
+
+
+# Deploying Embedding Extraction Models With DeepSparse
+This page explains how to deploy an Embedding Extraction Pipeline with DeepSparse.
+
+## Installation Requirements
+This use case requires the installation of [DeepSparse Server](../../user-guide/installation.md).
+
+Confirm your machine is compatible with our [hardware requirements](../../user-guide/hardware-support.md).
+
+## Model Format
+The Embedding Extraction Pipeline enables you to generate embeddings in any domain, meaning you can use it with any ONNX model. It (optionally) removes the projection head from the model, such that you can re-use SparseZoo models and custom models you have trained in the embedding extraction scenario.
+
+There are two options for passing a model to the Embedding Extraction Pipeline:
+
+- Pass a Local ONNX File
+- Pass a SparseZoo Stub (which identifies an ONNX model in the SparseZoo)
+
+## DeepSparse Pipelines
+Pipeline is the default interface for interacting with DeepSparse.
+
+Like Hugging Face Pipelines, DeepSparse Pipelines wrap pre- and post-processing around the inference performed by the Engine. This creates a clean API that allows you to pass raw text and images to DeepSparse and receive the post-processed predictions, making it easy to add DeepSparse to your application.
+
+We will use the `Pipeline.create()` constructor to create an instance of an embedding extraction Pipeline with a 95% pruned-quantized version of ResNet-50 trained on `imagenet`. We can then pass images to the `Pipeline` and receive the embeddings. All of the pre-processing is handled by the `Pipeline`.
+
+The Embedding Extraction Pipeline handles some useful actions around inference:
+
+- First, on initialization, the Pipeline (optionally) removes a projection head from a model. You can use the `emb_extraction_layer` argument to specify which layer to return. If your ONNX model has no projection head, you can set `emb_extraction_layer=None` (the default) to skip this step.
+
+- Second, as with all DeepSparse Pipelines, it handles pre-processing such that you can pass raw input. You will notice that in addition to the typical task argument used in `Pipeline.create()`, the Embedding Extraction Pipeline includes a `base_task` argument. This argument tells the Pipeline the domain of the model, such that the Pipeline can figure out what pre-processing to do.
+
+Download an image to use with the Pipeline:
+```bash
+wget https://huggingface.co/spaces/neuralmagic/image-classification/resolve/main/lion.jpeg
+```
+
+This is an example of extracting the last layer from ResNet-50:
+
+```python
+from deepsparse import Pipeline
+
+# this step removes the projection head before compiling the model
+rn50_embedding_pipeline = Pipeline.create(
+    task="embedding-extraction",
+    base_task="image-classification",  # tells the pipeline to expect images and normalize input with ImageNet means/stds
+    model_path="zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none",
+    emb_extraction_layer=-3,  # extracts last layer before projection head and softmax
+)
+
+# this step runs pre-processing, inference and returns an embedding
+embedding = rn50_embedding_pipeline(images="lion.jpeg")
+print(len(embedding.embeddings[0][0]))
+# 2048 << size of final layer>>
+```
+
+### Cross Use Case Functionality
+Check out the [Pipeline User Guide](../../user-guide/deepsparse-pipelines.md) for more details on configuring the Pipeline.
+
+## DeepSparse Server
+As an alternative to the Python API, DeepSparse Server allows you to serve an Embedding Extraction Pipeline over HTTP. Configuring the server uses the same parameters and schemas as the Pipelines.
+
+Once launched, a `/docs` endpoint is created with full endpoint descriptions and support for making sample requests.
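+
+For example, assuming the default host and port used elsewhere in these docs, you can fetch the auto-generated docs page once the server below is running:
+
+```bash
+# FastAPI serves interactive endpoint docs (Swagger UI) at /docs
+curl http://0.0.0.0:5543/docs
+```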
+
+This configuration file sets `emb_extraction_layer` to -3:
+```yaml
+# config.yaml
+endpoints:
+  - task: embedding_extraction
+    model: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none
+    kwargs:
+      base_task: image_classification
+      emb_extraction_layer: -3
+```
+Spin up the server:
+```bash
+deepsparse.server --config_file config.yaml
+```
+
+Make requests to the server:
+```python
+import requests, json
+url = "http://0.0.0.0:5543/predict/from_files"
+paths = ["lion.jpeg"]
+files = [("request", open(img, 'rb')) for img in paths]
+resp = requests.post(url=url, files=files)
+result = json.loads(resp.text)
+
+print(len(result["embeddings"][0][0]))
+
+# 2048 << size of final layer>>
+```
+## Using a Custom ONNX File
+Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files for embedding extraction.
+
+The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training.
+Click Download on the [ResNet-50 - ImageNet page](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) to download an ONNX ResNet model for demonstration.
+
+Extract the downloaded file and use the ResNet-50 ONNX model for embedding extraction:
+```python
+from deepsparse import Pipeline
+
+# this step removes the projection head before compiling the model
+rn50_embedding_pipeline = Pipeline.create(
+    task="embedding-extraction",
+    base_task="image-classification",  # tells the pipeline to expect images and normalize input with ImageNet means/stds
+    model_path="resnet.onnx",
+    emb_extraction_layer=-3,  # extracts last layer before projection head and softmax
+)
+
+# this step runs pre-processing, inference and returns an embedding
+embedding = rn50_embedding_pipeline(images="lion.jpeg")
+print(len(embedding.embeddings[0][0]))
+# 2048
+```
+### Cross Use Case Functionality
+Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server.
diff --git a/docs/use-cases/cv/image-classification.md b/docs/use-cases/cv/image-classification.md
new file mode 100644
index 0000000000..6d99374dd2
--- /dev/null
+++ b/docs/use-cases/cv/image-classification.md
@@ -0,0 +1,283 @@
+
+
+# Deploying Image Classification Models with DeepSparse
+
+This page explains how to benchmark and deploy an image classification model with DeepSparse.
+
+There are three interfaces for interacting with DeepSparse:
+- **Engine** is the lowest-level API that enables you to compile a model and run inference on raw input tensors.
+
+- **Pipeline** is the default DeepSparse API. Similar to Hugging Face Pipelines, it wraps Engine with pre-processing
+and post-processing steps, allowing you to make requests on raw data and receive post-processed predictions.
+
+- **Server** is a REST API wrapper around Pipelines built on [FastAPI](https://fastapi.tiangolo.com/) and [Uvicorn](https://www.uvicorn.org/). It enables you to start a model serving
+endpoint running DeepSparse with a single CLI.
+
+This example uses ResNet-50. For a full list of pre-sparsified image classification models, [check out the SparseZoo](https://sparsezoo.neuralmagic.com/?domain=cv&sub_domain=classification&page=1).
+
+## Installation Requirements
+
+This use case requires the installation of [DeepSparse Server](../../user-guide/installation.md).
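+
+As a quick sketch, the Server components are typically installed as a pip extra (see the installation guide linked above for the authoritative steps):
+
+```bash
+pip install deepsparse[server]
+```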
+
+Confirm your machine is compatible with our [hardware requirements](../../user-guide/hardware-support.md).
+
+## Benchmarking
+
+We can use the benchmarking utility to demonstrate DeepSparse's performance. We ran the numbers below on an AWS `c6i.2xlarge` instance (4 cores).
+
+### ONNX Runtime Baseline
+
+As a baseline, let's check out ONNX Runtime's performance on ResNet-50. Make sure you have ORT installed (`pip install onnxruntime`).
+
+```bash
+deepsparse.benchmark \
+  zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none \
+  -b 64 -s sync -nstreams 1 \
+  -e onnxruntime
+
+> Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none
+> Batch Size: 64
+> Scenario: sync
+> Throughput (items/sec): 71.83
+```
+ONNX Runtime achieves 72 items/second with batch 64.
+
+### DeepSparse Speedup
+
+Now, let's run DeepSparse on an inference-optimized sparse version of ResNet-50. This model has been 95% pruned, while retaining >99% accuracy of the dense baseline on the `imagenet` dataset.
+
+```bash
+deepsparse.benchmark \
+  zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none \
+  -b 64 -s sync -nstreams 1 \
+  -e deepsparse
+
+> Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none
+> Batch Size: 64
+> Scenario: sync
+> Throughput (items/sec): 345.69
+```
+
+DeepSparse achieves 346 items/second, a 4.8x speed-up over ONNX Runtime!
+
+## DeepSparse Engine
+Engine is the lowest-level API for interacting with DeepSparse. As much as possible, we recommend using the Pipeline API, but Engine is available if you want to handle pre- or post-processing yourself.
+
+With Engine, we can compile an ONNX file and run inference on raw tensors.
+
+Here's an example, using a 95% pruned-quantized ResNet-50 trained on `imagenet` from SparseZoo:
+```python
+from deepsparse import Engine
+from deepsparse.utils import generate_random_inputs, model_to_path
+import numpy as np
+
+# download onnx from sparsezoo and compile with batch size 1
+sparsezoo_stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none"
+batch_size = 1
+compiled_model = Engine(
+    model=sparsezoo_stub,   # sparsezoo stub or path to local ONNX
+    batch_size=batch_size   # defaults to batch size 1
+)
+
+# input is raw numpy tensors, output is raw scores for classes
+inputs = generate_random_inputs(model_to_path(sparsezoo_stub), batch_size)
+output = compiled_model(inputs)
+print(output)
+
+# [array([[-7.73529887e-01, 1.67251182e+00, -1.68212160e-01,
+# ....
+# 1.26290070e-05, 2.30549040e-06, 2.97072188e-06, 1.90549777e-04]], dtype=float32)]
+```
+## DeepSparse Pipelines
+Pipeline is the default interface for interacting with DeepSparse.
+
+Like Hugging Face Pipelines, DeepSparse Pipelines wrap pre- and post-processing around the inference performed by the Engine. This creates a clean API that allows you to pass raw text and images to DeepSparse and receive the post-processed predictions, making it easy to add DeepSparse to your application.
+
+Let's start by downloading a sample image:
+```bash
+wget https://huggingface.co/spaces/neuralmagic/image-classification/resolve/main/lion.jpeg
+```
+
+We will use the `Pipeline.create()` constructor to create an instance of an image classification Pipeline with a 95% pruned-quantized version of ResNet-50. We can then pass images to the Pipeline and receive the predictions.
All the pre-processing (such as resizing the images and normalizing the inputs) is handled by the `Pipeline`. + +Passing the image as a JPEG to the Pipeline: + +```python +from deepsparse import Pipeline + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" +pipeline = Pipeline.create( + task="image_classification", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX +) + +# run inference on image file +prediction = pipeline(images=["lion.jpeg"]) +print(prediction.labels) +# [291] << class index of "lion" in imagenet +``` + +Passing the image as a numpy array to the Pipeline: + +```python +from deepsparse import Pipeline +from PIL import Image +import numpy as np + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" +pipeline = Pipeline.create( + task="image_classification", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX +) + +im = np.array(Image.open("lion.jpeg")) + +# run inference on image file +prediction = pipeline(images=[im]) +print(prediction.labels) + +# [291] << class index of "lion" in imagenet +``` + +### Use Case Specific Arguments +The Image Classification Pipeline contains additional arguments for configuring a `Pipeline`. + +#### Top K + +The `top_k` argument specifies the number of classes to return in the prediction. + +```python +from deepsparse import Pipeline + +sparsezoo_stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" +pipeline = Pipeline.create( + task="image_classification", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX + top_k=3, +) + +# run inference on image file +prediction = pipeline(images="lion.jpeg") +print(prediction.labels) +# labels=[291, 260, 244] +``` +#### Class Names + +The `class_names` argument defines a dictionary containing the desired class mappings. 
+ +```python +from deepsparse import Pipeline + +classes = {0: 'tench, Tinca tinca',1: 'goldfish, Carassius auratus',2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',3: 'tiger shark, Galeocerdo cuvieri',4: 'hammerhead, hammerhead shark',5: 'electric ray, crampfish, numbfish, torpedo',6: 'stingray',7: 'cock', 8: 'hen', 9: 'ostrich, Struthio camelus', 10: 'brambling, Fringilla montifringilla', 11: 'goldfinch, Carduelis carduelis', 12: 'house finch, linnet, Carpodacus mexicanus', 13: 'junco, snowbird', 14: 'indigo bunting, indigo finch, indigo bird, Passerina cyanea', 15: 'robin, American robin, Turdus migratorius', 16: 'bulbul', 17: 'jay', 18: 'magpie', 19: 'chickadee', 20: 'water ouzel, dipper', 21: 'kite', 22: 'bald eagle, American eagle, Haliaeetus leucocephalus', 23: 'vulture', 24: 'great grey owl, great gray owl, Strix nebulosa', 25: 'European fire salamander, Salamandra salamandra', 26: 'common newt, Triturus vulgaris', 27: 'eft', 28: 'spotted salamander, Ambystoma maculatum', 29: 'axolotl, mud puppy, Ambystoma mexicanum', 30: 'bullfrog, Rana catesbeiana', 31: 'tree frog, tree-frog', 32: 'tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui', 33: 'loggerhead, loggerhead turtle, Caretta caretta', 34: 'leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea', 35: 'mud turtle', 36: 'terrapin', 37: 'box turtle, box tortoise', 38: 'banded gecko', 39: 'common iguana, iguana, Iguana iguana', 40: 'American chameleon, anole, Anolis carolinensis', 41: 'whiptail, whiptail lizard', 42: 'agama', 43: 'frilled lizard, Chlamydosaurus kingi', 44: 'alligator lizard', 45: 'Gila monster, Heloderma suspectum', 46: 'green lizard, Lacerta viridis', 47: 'African chameleon, Chamaeleo chamaeleon', 48: 'Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis', 49: 'African crocodile, Nile crocodile, Crocodylus niloticus', 50: 'American alligator, Alligator mississipiensis', 51: 'triceratops', 52: 'thunder snake, worm snake, Carphophis amoenus', 53: 'ringneck snake, ring-necked snake, ring snake', 54: 'hognose snake, puff adder, sand viper', 55: 'green snake, grass snake', 56: 'king snake, kingsnake', 57: 'garter snake, grass snake', 58: 'water snake', 59: 'vine snake', 60: 'night snake, Hypsiglena torquata', 61: 'boa constrictor, Constrictor constrictor', 62: 'rock python, rock snake, Python sebae', 63: 'Indian cobra, Naja naja', 64: 'green mamba', 65: 'sea snake', 66: 'horned viper, cerastes, sand viper, horned asp, Cerastes cornutus', 67: 'diamondback, diamondback rattlesnake, Crotalus adamanteus', 68: 'sidewinder, horned rattlesnake, Crotalus cerastes', 69: 'trilobite', 70: 'harvestman, daddy longlegs, Phalangium opilio', 71: 'scorpion', 72: 'black and gold garden spider, Argiope aurantia', 73: 'barn spider, Araneus cavaticus', 74: 'garden spider, Aranea diademata', 75: 'black widow, Latrodectus mactans', 76: 'tarantula', 77: 'wolf spider, hunting spider', 78: 'tick', 79: 'centipede', 80: 'black grouse', 81: 'ptarmigan', 82: 'ruffed grouse, partridge, Bonasa umbellus', 83: 'prairie chicken, prairie grouse, prairie fowl', 84: 'peacock', 85: 'quail', 86: 'partridge', 87: 'African grey, African gray, Psittacus erithacus', 88: 'macaw', 89: 'sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita', 90: 'lorikeet', 91: 'coucal', 92: 'bee eater', 93: 'hornbill', 94: 'hummingbird', 95: 'jacamar', 96: 'toucan', 97: 'drake', 98: 'red-breasted merganser, Mergus serrator', 99: 'goose', 100: 'black swan, Cygnus atratus', 101: 
'tusker', 102: 'echidna, spiny anteater, anteater', 103: 'platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus', 104: 'wallaby, brush kangaroo', 105: 'koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus', 106: 'wombat', 107: 'jellyfish', 108: 'sea anemone, anemone', 109: 'brain coral', 110: 'flatworm, platyhelminth', 111: 'nematode, nematode worm, roundworm', 112: 'conch', 113: 'snail', 114: 'slug', 115: 'sea slug, nudibranch', 116: 'chiton, coat-of-mail shell, sea cradle, polyplacophore', 117: 'chambered nautilus, pearly nautilus, nautilus', 118: 'Dungeness crab, Cancer magister', 119: 'rock crab, Cancer irroratus', 120: 'fiddler crab', 121: 'king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica', 122: 'American lobster, Northern lobster, Maine lobster, Homarus americanus', 123: 'spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish', 124: 'crayfish, crawfish, crawdad, crawdaddy', 125: 'hermit crab', 126: 'isopod', 127: 'white stork, Ciconia ciconia', 128: 'black stork, Ciconia nigra', 129: 'spoonbill', 130: 'flamingo', 131: 'little blue heron, Egretta caerulea', 132: 'American egret, great white heron, Egretta albus', 133: 'bittern', 134: 'crane', 135: 'limpkin, Aramus pictus', 136: 'European gallinule, Porphyrio porphyrio', 137: 'American coot, marsh hen, mud hen, water hen, Fulica americana', 138: 'bustard', 139: 'ruddy turnstone, Arenaria interpres', 140: 'red-backed sandpiper, dunlin, Erolia alpina', 141: 'redshank, Tringa totanus', 142: 'dowitcher', 143: 'oystercatcher, oyster catcher', 144: 'pelican', 145: 'king penguin, Aptenodytes patagonica', 146: 'albatross, mollymawk', 147: 'grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus', 148: 'killer whale, killer, orca, grampus, sea wolf, Orcinus orca', 149: 'dugong, Dugong dugon', 150: 'sea lion', 151: 'Chihuahua', 152: 'Japanese spaniel', 153: 'Maltese dog, Maltese terrier, Maltese', 154: 'Pekinese, Pekingese, Peke', 155: 'Shih-Tzu', 156: 'Blenheim spaniel', 157: 'papillon', 158: 'toy terrier', 159: 'Rhodesian ridgeback', 160: 'Afghan hound, Afghan', 161: 'basset, basset hound', 162: 'beagle', 163: 'bloodhound, sleuthhound', 164: 'bluetick', 165: 'black-and-tan coonhound', 166: 'Walker hound, Walker foxhound', 167: 'English foxhound', 168: 'redbone', 169: 'borzoi, Russian wolfhound', 170: 'Irish wolfhound', 171: 'Italian greyhound', 172: 'whippet', 173: 'Ibizan hound, Ibizan Podenco', 174: 'Norwegian elkhound, elkhound', 175: 'otterhound, otter hound', 176: 'Saluki, gazelle hound', 177: 'Scottish deerhound, deerhound', 178: 'Weimaraner', 179: 'Staffordshire bullterrier, Staffordshire bull terrier', 180: 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier', 181: 'Bedlington terrier', 182: 'Border terrier', 183: 'Kerry blue terrier', 184: 'Irish terrier', 185: 'Norfolk terrier', 186: 'Norwich terrier', 187: 'Yorkshire terrier', 188: 'wire-haired fox terrier', 189: 'Lakeland terrier', 190: 'Sealyham terrier, Sealyham', 191: 'Airedale, Airedale terrier', 192: 'cairn, cairn terrier', 193: 'Australian terrier', 194: 'Dandie Dinmont, Dandie Dinmont terrier', 195: 'Boston bull, Boston terrier', 196: 'miniature schnauzer', 197: 'giant schnauzer', 198: 'standard schnauzer', 199: 'Scotch terrier, Scottish terrier, Scottie', 200: 'Tibetan terrier, chrysanthemum dog', 201: 'silky terrier, Sydney silky', 202: 'soft-coated wheaten terrier', 203: 'West Highland white 
terrier', 204: 'Lhasa, Lhasa apso', 205: 'flat-coated retriever', 206: 'curly-coated retriever', 207: 'golden retriever', 208: 'Labrador retriever', 209: 'Chesapeake Bay retriever', 210: 'German short-haired pointer', 211: 'vizsla, Hungarian pointer', 212: 'English setter', 213: 'Irish setter, red setter', 214: 'Gordon setter', 215: 'Brittany spaniel', 216: 'clumber, clumber spaniel', 217: 'English springer, English springer spaniel', 218: 'Welsh springer spaniel', 219: 'cocker spaniel, English cocker spaniel, cocker', 220: 'Sussex spaniel', 221: 'Irish water spaniel', 222: 'kuvasz', 223: 'schipperke', 224: 'groenendael', 225: 'malinois', 226: 'briard', 227: 'kelpie', 228: 'komondor', 229: 'Old English sheepdog, bobtail', 230: 'Shetland sheepdog, Shetland sheep dog, Shetland', 231: 'collie', 232: 'Border collie', 233: 'Bouvier des Flandres, Bouviers des Flandres', 234: 'Rottweiler', 235: 'German shepherd, German shepherd dog, German police dog, alsatian', 236: 'Doberman, Doberman pinscher', 237: 'miniature pinscher', 238: 'Greater Swiss Mountain dog', 239: 'Bernese mountain dog', 240: 'Appenzeller', 241: 'EntleBucher', 242: 'boxer', 243: 'bull mastiff', 244: 'Tibetan mastiff', 245: 'French bulldog', 246: 'Great Dane', 247: 'Saint Bernard, St Bernard', 248: 'Eskimo dog, husky', 249: 'malamute, malemute, Alaskan malamute', 250: 'Siberian husky', 251: 'dalmatian, coach dog, carriage dog', 252: 'affenpinscher, monkey pinscher, monkey dog', 253: 'basenji', 254: 'pug, pug-dog', 255: 'Leonberg', 256: 'Newfoundland, Newfoundland dog', 257: 'Great Pyrenees', 258: 'Samoyed, Samoyede', 259: 'Pomeranian', 260: 'chow, chow chow', 261: 'keeshond', 262: 'Brabancon griffon', 263: 'Pembroke, Pembroke Welsh corgi', 264: 'Cardigan, Cardigan Welsh corgi', 265: 'toy poodle', 266: 'miniature poodle', 267: 'standard poodle', 268: 'Mexican hairless', 269: 'timber wolf, grey wolf, gray wolf, Canis lupus', 270: 'white wolf, Arctic wolf, Canis lupus tundrarum', 271: 'red wolf, maned wolf, Canis rufus, Canis niger', 272: 'coyote, prairie wolf, brush wolf, Canis latrans', 273: 'dingo, warrigal, warragal, Canis dingo', 274: 'dhole, Cuon alpinus', 275: 'African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus', 276: 'hyena, hyaena', 277: 'red fox, Vulpes vulpes', 278: 'kit fox, Vulpes macrotis', 279: 'Arctic fox, white fox, Alopex lagopus', 280: 'grey fox, gray fox, Urocyon cinereoargenteus', 281: 'tabby, tabby cat', 282: 'tiger cat', 283: 'Persian cat', 284: 'Siamese cat, Siamese', 285: 'Egyptian cat', 286: 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor', 287: 'lynx, catamount', 288: 'leopard, Panthera pardus', 289: 'snow leopard, ounce, Panthera uncia', 290: 'jaguar, panther, Panthera onca, Felis onca', 291: 'lion, king of beasts, Panthera leo', 292: 'tiger, Panthera tigris', 293: 'cheetah, chetah, Acinonyx jubatus', 294: 'brown bear, bruin, Ursus arctos', 295: 'American black bear, black bear, Ursus americanus, Euarctos americanus', 296: 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus', 297: 'sloth bear, Melursus ursinus, Ursus ursinus', 298: 'mongoose', 299: 'meerkat, mierkat', 300: 'tiger beetle', 301: 'ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle', 302: 'ground beetle, carabid beetle', 303: 'long-horned beetle, longicorn, longicorn beetle', 304: 'leaf beetle, chrysomelid', 305: 'dung beetle', 306: 'rhinoceros beetle', 307: 'weevil', 308: 'fly', 309: 'bee', 310: 'ant, emmet, pismire', 311: 'grasshopper, hopper', 312: 'cricket', 313: 'walking stick, 
walkingstick, stick insect', 314: 'cockroach, roach', 315: 'mantis, mantid', 316: 'cicada, cicala', 317: 'leafhopper', 318: 'lacewing, lacewing fly', 319: "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk", 320: 'damselfly', 321: 'admiral', 322: 'ringlet, ringlet butterfly', 323: 'monarch, monarch butterfly, milkweed butterfly, Danaus plexippus', 324: 'cabbage butterfly', 325: 'sulphur butterfly, sulfur butterfly', 326: 'lycaenid, lycaenid butterfly', 327: 'starfish, sea star', 328: 'sea urchin', 329: 'sea cucumber, holothurian', 330: 'wood rabbit, cottontail, cottontail rabbit', 331: 'hare', 332: 'Angora, Angora rabbit', 333: 'hamster', 334: 'porcupine, hedgehog', 335: 'fox squirrel, eastern fox squirrel, Sciurus niger', 336: 'marmot', 337: 'beaver', 338: 'guinea pig, Cavia cobaya', 339: 'sorrel', 340: 'zebra', 341: 'hog, pig, grunter, squealer, Sus scrofa', 342: 'wild boar, boar, Sus scrofa', 343: 'warthog', 344: 'hippopotamus, hippo, river horse, Hippopotamus amphibius', 345: 'ox', 346: 'water buffalo, water ox, Asiatic buffalo, Bubalus bubalis', 347: 'bison', 348: 'ram, tup', 349: 'bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis', 350: 'ibex, Capra ibex', 351: 'hartebeest', 352: 'impala, Aepyceros melampus', 353: 'gazelle', 354: 'Arabian camel, dromedary, Camelus dromedarius', 355: 'llama', 356: 'weasel', 357: 'mink', 358: 'polecat, fitch, foulmart, foumart, Mustela putorius', 359: 'black-footed ferret, ferret, Mustela nigripes', 360: 'otter', 361: 'skunk, polecat, wood pussy', 362: 'badger', 363: 'armadillo', 364: 'three-toed sloth, ai, Bradypus tridactylus', 365: 'orangutan, orang, orangutang, Pongo pygmaeus', 366: 'gorilla, Gorilla gorilla', 367: 'chimpanzee, chimp, Pan troglodytes', 368: 'gibbon, Hylobates lar', 369: 'siamang, Hylobates syndactylus, Symphalangus syndactylus', 370: 'guenon, guenon monkey', 371: 'patas, hussar monkey, Erythrocebus patas', 372: 'baboon', 373: 'macaque', 374: 'langur', 375: 'colobus, colobus monkey', 376: 'proboscis monkey, Nasalis larvatus', 377: 'marmoset', 378: 'capuchin, ringtail, Cebus capucinus', 379: 'howler monkey, howler', 380: 'titi, titi monkey', 381: 'spider monkey, Ateles geoffroyi', 382: 'squirrel monkey, Saimiri sciureus', 383: 'Madagascar cat, ring-tailed lemur, Lemur catta', 384: 'indri, indris, Indri indri, Indri brevicaudatus', 385: 'Indian elephant, Elephas maximus', 386: 'African elephant, Loxodonta africana', 387: 'lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens', 388: 'giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca', 389: 'barracouta, snoek', 390: 'eel', 391: 'coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch', 392: 'rock beauty, Holocanthus tricolor', 393: 'anemone fish', 394: 'sturgeon', 395: 'gar, garfish, garpike, billfish, Lepisosteus osseus', 396: 'lionfish', 397: 'puffer, pufferfish, blowfish, globefish', 398: 'abacus', 399: 'abaya', 400: "academic gown, academic robe, judge's robe", 401: 'accordion, piano accordion, squeeze box', 402: 'acoustic guitar', 403: 'aircraft carrier, carrier, flattop, attack aircraft carrier', 404: 'airliner', 405: 'airship, dirigible', 406: 'altar', 407: 'ambulance', 408: 'amphibian, amphibious vehicle', 409: 'analog clock', 410: 'apiary, bee house', 411: 'apron', 412: 'ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin', 413: 'assault rifle, assault gun', 414: 
'backpack, back pack, knapsack, packsack, rucksack, haversack', 415: 'bakery, bakeshop, bakehouse', 416: 'balance beam, beam', 417: 'balloon', 418: 'ballpoint, ballpoint pen, ballpen, Biro', 419: 'Band Aid', 420: 'banjo', 421: 'bannister, banister, balustrade, balusters, handrail', 422: 'barbell', 423: 'barber chair', 424: 'barbershop', 425: 'barn', 426: 'barometer', 427: 'barrel, cask', 428: 'barrow, garden cart, lawn cart, wheelbarrow', 429: 'baseball', 430: 'basketball', 431: 'bassinet', 432: 'bassoon', 433: 'bathing cap, swimming cap', 434: 'bath towel', 435: 'bathtub, bathing tub, bath, tub', 436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon', 437: 'beacon, lighthouse, beacon light, pharos', 438: 'beaker', 439: 'bearskin, busby, shako', 440: 'beer bottle', 441: 'beer glass', 442: 'bell cote, bell cot', 443: 'bib', 444: 'bicycle-built-for-two, tandem bicycle, tandem', 445: 'bikini, two-piece', 446: 'binder, ring-binder', 447: 'binoculars, field glasses, opera glasses', 448: 'birdhouse', 449: 'boathouse', 450: 'bobsled, bobsleigh, bob', 451: 'bolo tie, bolo, bola tie, bola', 452: 'bonnet, poke bonnet', 453: 'bookcase', 454: 'bookshop, bookstore, bookstall', 455: 'bottlecap', 456: 'bow', 457: 'bow tie, bow-tie, bowtie', 458: 'brass, memorial tablet, plaque', 459: 'brassiere, bra, bandeau', 460: 'breakwater, groin, groyne, mole, bulwark, seawall, jetty', 461: 'breastplate, aegis, egis', 462: 'broom', 463: 'bucket, pail', 464: 'buckle', 465: 'bulletproof vest', 466: 'bullet train, bullet', 467: 'butcher shop, meat market', 468: 'cab, hack, taxi, taxicab', 469: 'caldron, cauldron', 470: 'candle, taper, wax light', 471: 'cannon', 472: 'canoe', 473: 'can opener, tin opener', 474: 'cardigan', 475: 'car mirror', 476: 'carousel, carrousel, merry-go-round, roundabout, whirligig', 477: "carpenter's kit, tool kit", 478: 'carton', 479: 'car wheel', 480: 'cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM', 481: 'cassette', 482: 'cassette player', 483: 'castle', 484: 'catamaran', 485: 'CD player', 486: 'cello, violoncello', 487: 'cellular telephone, cellular phone, cellphone, cell, mobile phone', 488: 'chain', 489: 'chainlink fence', 490: 'chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour', 491: 'chain saw, chainsaw', 492: 'chest', 493: 'chiffonier, commode', 494: 'chime, bell, gong', 495: 'china cabinet, china closet', 496: 'Christmas stocking', 497: 'church, church building', 498: 'cinema, movie theater, movie theatre, movie house, picture palace', 499: 'cleaver, meat cleaver, chopper', 500: 'cliff dwelling', 501: 'cloak', 502: 'clog, geta, patten, sabot', 503: 'cocktail shaker', 504: 'coffee mug', 505: 'coffeepot', 506: 'coil, spiral, volute, whorl, helix', 507: 'combination lock', 508: 'computer keyboard, keypad', 509: 'confectionery, confectionary, candy store', 510: 'container ship, containership, container vessel', 511: 'convertible', 512: 'corkscrew, bottle screw', 513: 'cornet, horn, trumpet, trump', 514: 'cowboy boot', 515: 'cowboy hat, ten-gallon hat', 516: 'cradle', 517: 'crane', 518: 'crash helmet', 519: 'crate', 520: 'crib, cot', 521: 'Crock Pot', 522: 'croquet ball', 523: 'crutch', 524: 'cuirass', 525: 'dam, dike, dyke', 526: 'desk', 527: 'desktop computer', 528: 'dial telephone, dial phone', 529: 'diaper, nappy, napkin', 530: 'digital clock', 531: 'digital watch', 532: 'dining table, board', 533: 'dishrag, dishcloth', 534: 'dishwasher, dish 
washer, dishwashing machine', 535: 'disk brake, disc brake', 536: 'dock, dockage, docking facility', 537: 'dogsled, dog sled, dog sleigh', 538: 'dome', 539: 'doormat, welcome mat', 540: 'drilling platform, offshore rig', 541: 'drum, membranophone, tympan', 542: 'drumstick', 543: 'dumbbell', 544: 'Dutch oven', 545: 'electric fan, blower', 546: 'electric guitar', 547: 'electric locomotive', 548: 'entertainment center', 549: 'envelope', 550: 'espresso maker', 551: 'face powder', 552: 'feather boa, boa', 553: 'file, file cabinet, filing cabinet', 554: 'fireboat', 555: 'fire engine, fire truck', 556: 'fire screen, fireguard', 557: 'flagpole, flagstaff', 558: 'flute, transverse flute', 559: 'folding chair', 560: 'football helmet', 561: 'forklift', 562: 'fountain', 563: 'fountain pen', 564: 'four-poster', 565: 'freight car', 566: 'French horn, horn', 567: 'frying pan, frypan, skillet', 568: 'fur coat', 569: 'garbage truck, dustcart', 570: 'gasmask, respirator, gas helmet', 571: 'gas pump, gasoline pump, petrol pump, island dispenser', 572: 'goblet', 573: 'go-kart', 574: 'golf ball', 575: 'golfcart, golf cart', 576: 'gondola', 577: 'gong, tam-tam', 578: 'gown', 579: 'grand piano, grand', 580: 'greenhouse, nursery, glasshouse', 581: 'grille, radiator grille', 582: 'grocery store, grocery, food market, market', 583: 'guillotine', 584: 'hair slide', 585: 'hair spray', 586: 'half track', 587: 'hammer', 588: 'hamper', 589: 'hand blower, blow dryer, blow drier, hair dryer, hair drier', 590: 'hand-held computer, hand-held microcomputer', 591: 'handkerchief, hankie, hanky, hankey', 592: 'hard disc, hard disk, fixed disk', 593: 'harmonica, mouth organ, harp, mouth harp', 594: 'harp', 595: 'harvester, reaper', 596: 'hatchet', 597: 'holster', 598: 'home theater, home theatre', 599: 'honeycomb', 600: 'hook, claw', 601: 'hoopskirt, crinoline', 602: 'horizontal bar, high bar', 603: 'horse cart, horse-cart', 604: 'hourglass', 605: 'iPod', 606: 'iron, smoothing iron', 607: "jack-o'-lantern", 608: 'jean, blue jean, denim', 609: 'jeep, landrover', 610: 'jersey, T-shirt, tee shirt', 611: 'jigsaw puzzle', 612: 'jinrikisha, ricksha, rickshaw', 613: 'joystick', 614: 'kimono', 615: 'knee pad', 616: 'knot', 617: 'lab coat, laboratory coat', 618: 'ladle', 619: 'lampshade, lamp shade', 620: 'laptop, laptop computer', 621: 'lawn mower, mower', 622: 'lens cap, lens cover', 623: 'letter opener, paper knife, paperknife', 624: 'library', 625: 'lifeboat', 626: 'lighter, light, igniter, ignitor', 627: 'limousine, limo', 628: 'liner, ocean liner', 629: 'lipstick, lip rouge', 630: 'Loafer', 631: 'lotion', 632: 'loudspeaker, speaker, speaker unit, loudspeaker system, speaker system', 633: "loupe, jeweler's loupe", 634: 'lumbermill, sawmill', 635: 'magnetic compass', 636: 'mailbag, postbag', 637: 'mailbox, letter box', 638: 'maillot', 639: 'maillot, tank suit', 640: 'manhole cover', 641: 'maraca', 642: 'marimba, xylophone', 643: 'mask', 644: 'matchstick', 645: 'maypole', 646: 'maze, labyrinth', 647: 'measuring cup', 648: 'medicine chest, medicine cabinet', 649: 'megalith, megalithic structure', 650: 'microphone, mike', 651: 'microwave, microwave oven', 652: 'military uniform', 653: 'milk can', 654: 'minibus', 655: 'miniskirt, mini', 656: 'minivan', 657: 'missile', 658: 'mitten', 659: 'mixing bowl', 660: 'mobile home, manufactured home', 661: 'Model T', 662: 'modem', 663: 'monastery', 664: 'monitor', 665: 'moped', 666: 'mortar', 667: 'mortarboard', 668: 'mosque', 669: 'mosquito net', 670: 'motor scooter, scooter', 671: 'mountain bike, 
all-terrain bike, off-roader', 672: 'mountain tent', 673: 'mouse, computer mouse', 674: 'mousetrap', 675: 'moving van', 676: 'muzzle', 677: 'nail', 678: 'neck brace', 679: 'necklace', 680: 'nipple', 681: 'notebook, notebook computer', 682: 'obelisk', 683: 'oboe, hautboy, hautbois', 684: 'ocarina, sweet potato', 685: 'odometer, hodometer, mileometer, milometer', 686: 'oil filter', 687: 'organ, pipe organ', 688: 'oscilloscope, scope, cathode-ray oscilloscope, CRO', 689: 'overskirt', 690: 'oxcart', 691: 'oxygen mask', 692: 'packet', 693: 'paddle, boat paddle', 694: 'paddlewheel, paddle wheel', 695: 'padlock', 696: 'paintbrush', 697: "pajama, pyjama, pj's, jammies", 698: 'palace', 699: 'panpipe, pandean pipe, syrinx', 700: 'paper towel', 701: 'parachute, chute', 702: 'parallel bars, bars', 703: 'park bench', 704: 'parking meter', 705: 'passenger car, coach, carriage', 706: 'patio, terrace', 707: 'pay-phone, pay-station', 708: 'pedestal, plinth, footstall', 709: 'pencil box, pencil case', 710: 'pencil sharpener', 711: 'perfume, essence', 712: 'Petri dish', 713: 'photocopier', 714: 'pick, plectrum, plectron', 715: 'pickelhaube', 716: 'picket fence, paling', 717: 'pickup, pickup truck', 718: 'pier', 719: 'piggy bank, penny bank', 720: 'pill bottle', 721: 'pillow', 722: 'ping-pong ball', 723: 'pinwheel', 724: 'pirate, pirate ship', 725: 'pitcher, ewer', 726: "plane, carpenter's plane, woodworking plane", 727: 'planetarium', 728: 'plastic bag', 729: 'plate rack', 730: 'plow, plough', 731: "plunger, plumber's helper", 732: 'Polaroid camera, Polaroid Land camera', 733: 'pole', 734: 'police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria', 735: 'poncho', 736: 'pool table, billiard table, snooker table', 737: 'pop bottle, soda bottle', 738: 'pot, flowerpot', 739: "potter's wheel", 740: 'power drill', 741: 'prayer rug, prayer mat', 742: 'printer', 743: 'prison, prison house', 744: 'projectile, missile', 745: 'projector', 746: 'puck, hockey puck', 747: 'punching bag, punch bag, punching ball, punchball', 748: 'purse', 749: 'quill, quill pen', 750: 'quilt, comforter, comfort, puff', 751: 'racer, race car, racing car', 752: 'racket, racquet', 753: 'radiator', 754: 'radio, wireless', 755: 'radio telescope, radio reflector', 756: 'rain barrel', 757: 'recreational vehicle, RV, R.V.', 758: 'reel', 759: 'reflex camera', 760: 'refrigerator, icebox', 761: 'remote control, remote', 762: 'restaurant, eating house, eating place, eatery', 763: 'revolver, six-gun, six-shooter', 764: 'rifle', 765: 'rocking chair, rocker', 766: 'rotisserie', 767: 'rubber eraser, rubber, pencil eraser', 768: 'rugby ball', 769: 'rule, ruler', 770: 'running shoe', 771: 'safe', 772: 'safety pin', 773: 'saltshaker, salt shaker', 774: 'sandal', 775: 'sarong', 776: 'sax, saxophone', 777: 'scabbard', 778: 'scale, weighing machine', 779: 'school bus', 780: 'schooner', 781: 'scoreboard', 782: 'screen, CRT screen', 783: 'screw', 784: 'screwdriver', 785: 'seat belt, seatbelt', 786: 'sewing machine', 787: 'shield, buckler', 788: 'shoe shop, shoe-shop, shoe store', 789: 'shoji', 790: 'shopping basket', 791: 'shopping cart', 792: 'shovel', 793: 'shower cap', 794: 'shower curtain', 795: 'ski', 796: 'ski mask', 797: 'sleeping bag', 798: 'slide rule, slipstick', 799: 'sliding door', 800: 'slot, one-armed bandit', 801: 'snorkel', 802: 'snowmobile', 803: 'snowplow, snowplough', 804: 'soap dispenser', 805: 'soccer ball', 806: 'sock', 807: 'solar dish, solar collector, solar furnace', 808: 'sombrero', 809: 'soup bowl', 810: 'space bar', 811: 
'space heater', 812: 'space shuttle', 813: 'spatula', 814: 'speedboat', 815: "spider web, spider's web", 816: 'spindle', 817: 'sports car, sport car', 818: 'spotlight, spot', 819: 'stage', 820: 'steam locomotive', 821: 'steel arch bridge', 822: 'steel drum', 823: 'stethoscope', 824: 'stole', 825: 'stone wall', 826: 'stopwatch, stop watch', 827: 'stove', 828: 'strainer', 829: 'streetcar, tram, tramcar, trolley, trolley car', 830: 'stretcher', 831: 'studio couch, day bed', 832: 'stupa, tope', 833: 'submarine, pigboat, sub, U-boat', 834: 'suit, suit of clothes', 835: 'sundial', 836: 'sunglass', 837: 'sunglasses, dark glasses, shades', 838: 'sunscreen, sunblock, sun blocker', 839: 'suspension bridge', 840: 'swab, swob, mop', 841: 'sweatshirt', 842: 'swimming trunks, bathing trunks', 843: 'swing', 844: 'switch, electric switch, electrical switch', 845: 'syringe', 846: 'table lamp', 847: 'tank, army tank, armored combat vehicle, armoured combat vehicle', 848: 'tape player', 849: 'teapot', 850: 'teddy, teddy bear', 851: 'television, television system', 852: 'tennis ball', 853: 'thatch, thatched roof', 854: 'theater curtain, theatre curtain', 855: 'thimble', 856: 'thresher, thrasher, threshing machine', 857: 'throne', 858: 'tile roof', 859: 'toaster', 860: 'tobacco shop, tobacconist shop, tobacconist', 861: 'toilet seat', 862: 'torch', 863: 'totem pole', 864: 'tow truck, tow car, wrecker', 865: 'toyshop', 866: 'tractor', 867: 'trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi', 868: 'tray', 869: 'trench coat', 870: 'tricycle, trike, velocipede', 871: 'trimaran', 872: 'tripod', 873: 'triumphal arch', 874: 'trolleybus, trolley coach, trackless trolley', 875: 'trombone', 876: 'tub, vat', 877: 'turnstile', 878: 'typewriter keyboard', 879: 'umbrella', 880: 'unicycle, monocycle', 881: 'upright, upright piano', 882: 'vacuum, vacuum cleaner', 883: 'vase', 884: 'vault', 885: 'velvet', 886: 'vending machine', 887: 'vestment', 888: 'viaduct', 889: 'violin, fiddle', 890: 'volleyball', 891: 'waffle iron', 892: 'wall clock', 893: 'wallet, billfold, notecase, pocketbook', 894: 'wardrobe, closet, press', 895: 'warplane, military plane', 896: 'washbasin, handbasin, washbowl, lavabo, wash-hand basin', 897: 'washer, automatic washer, washing machine', 898: 'water bottle', 899: 'water jug', 900: 'water tower', 901: 'whiskey jug', 902: 'whistle', 903: 'wig', 904: 'window screen', 905: 'window shade', 906: 'Windsor tie', 907: 'wine bottle', 908: 'wing', 909: 'wok', 910: 'wooden spoon', 911: 'wool, woolen, woollen', 912: 'worm fence, snake fence, snake-rail fence, Virginia fence', 913: 'wreck', 914: 'yawl', 915: 'yurt', 916: 'web site, website, internet site, site', 917: 'comic book', 918: 'crossword puzzle, crossword', 919: 'street sign', 920: 'traffic light, traffic signal, stoplight', 921: 'book jacket, dust cover, dust jacket, dust wrapper', 922: 'menu', 923: 'plate', 924: 'guacamole', 925: 'consomme', 926: 'hot pot, hotpot', 927: 'trifle', 928: 'ice cream, icecream', 929: 'ice lolly, lolly, lollipop, popsicle', 930: 'French loaf', 931: 'bagel, beigel', 932: 'pretzel', 933: 'cheeseburger', 934: 'hotdog, hot dog, red hot', 935: 'mashed potato', 936: 'head cabbage', 937: 'broccoli', 938: 'cauliflower', 939: 'zucchini, courgette', 940: 'spaghetti squash', 941: 'acorn squash', 942: 'butternut squash', 943: 'cucumber, cuke', 944: 'artichoke, globe artichoke', 945: 'bell pepper', 946: 'cardoon', 947: 'mushroom', 948: 'Granny Smith', 949: 'strawberry', 950: 'orange', 951: 'lemon', 952: 'fig', 953: 
'pineapple, ananas', 954: 'banana', 955: 'jackfruit, jak, jack', 956: 'custard apple', 957: 'pomegranate', 958: 'hay', 959: 'carbonara', 960: 'chocolate sauce, chocolate syrup', 961: 'dough', 962: 'meat loaf, meatloaf', 963: 'pizza, pizza pie', 964: 'potpie', 965: 'burrito', 966: 'red wine', 967: 'espresso', 968: 'cup', 969: 'eggnog', 970: 'alp', 971: 'bubble', 972: 'cliff, drop, drop-off', 973: 'coral reef', 974: 'geyser', 975: 'lakeside, lakeshore', 976: 'promontory, headland, head, foreland', 977: 'sandbar, sand bar', 978: 'seashore, coast, seacoast, sea-coast', 979: 'valley, vale', 980: 'volcano', 981: 'ballplayer, baseball player', 982: 'groom, bridegroom', 983: 'scuba diver', 984: 'rapeseed', 985: 'daisy', 986: "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum", 987: 'corn', 988: 'acorn', 989: 'hip, rose hip, rosehip', 990: 'buckeye, horse chestnut, conker', 991: 'coral fungus', 992: 'agaric', 993: 'gyromitra', 994: 'stinkhorn, carrion fungus', 995: 'earthstar', 996: 'hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa', 997: 'bolete', 998: 'ear, spike, capitulum', 999: 'toilet tissue, toilet paper, bathroom tissue'} + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none" +pipeline = Pipeline.create( + task="image_classification", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX + class_names=classes, +) + +# run inference on image file +prediction = pipeline(images="lion.jpeg") +print(prediction.labels) +# labels=['lion, king of beasts, Panthera leo'] +``` +### Cross Use Case Functionality +Check out the [Pipeline User Guide](../../user-guide/deepsparse-pipelines.md) for more details on configuring a Pipeline. + +## DeepSparse Server +Built on the popular FastAPI and Uvicorn stack, DeepSparse Server enables you to set up a REST endpoint for serving inferences over HTTP. Since DeepSparse Server wraps the Pipeline API, it inherits all the utilities provided by Pipelines. + +The CLI command below launches an image classification pipeline with a 95% pruned ResNet model: + +```bash +deepsparse.server \ + --task image_classification \ + --model_path zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none +``` +You should see Uvicorn report that it is running on http://0.0.0.0:5543. Once launched, a /docs path is created with full endpoint descriptions and support for making sample requests. + +Here is an example client request, using the Python requests library for formatting the HTTP: + +```python +import requests + +url = 'http://0.0.0.0:5543/predict/from_files' +path = ['lion.jpeg'] # just put the name of images in here +files = [('request', open(img, 'rb')) for img in path] +resp = requests.post(url=url, files=files) +print(resp.text) +# {"labels":[291],"scores":[24.185693740844727]} +``` +#### Use Case Specific Arguments + +To use a use-case specific argument, create a server configuration file for passing the argument via kwargs. 
+ +This configuration file sets `top_k` to 3: +```yaml +# image_classification-config.yaml +endpoints: + - task: image_classification + model: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95_quant-none + kwargs: + top_k: 3 +``` + +Start the server: +```bash +deepsparse.server --config-file image_classification-config.yaml +``` + +Make a request over HTTP: + +```python +import requests + +url = 'http://0.0.0.0:5543/predict/from_files' +path = ['lion.jpeg'] # just put the name of images in here +files = [('request', open(img, 'rb')) for img in path] +resp = requests.post(url=url, files=files) +print(resp.text) +# {"labels":[291,260,244],"scores":[24.185693740844727,18.982254028320312,16.390701293945312]} +``` +## Using a Custom ONNX File +Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model. + +The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training. +Click Download on the [ResNet-50 - ImageNet page](https://sparsezoo.neuralmagic.com/models/cv%2Fclassification%2Fresnet_v1-50%2Fpytorch%2Fsparseml%2Fimagenet%2Fpruned95_uniform_quant-none) to download an ONNX ResNet model for demonstration. + +Extract the downloaded file and use the ResNet-50 ONNX model for inference: +```python +from deepsparse import Pipeline + +# compile the local ONNX file with batch size 1 +pipeline = Pipeline.create( + task="image_classification", + model_path="resnet.onnx", # sparsezoo stub or path to local ONNX +) + +# run inference on image file +prediction = pipeline(images=["lion.jpeg"]) +print(prediction.labels) +# [291] +``` +### Cross Use Case Functionality + +Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server. diff --git a/docs/use-cases/cv/image-segmentation-yolact.md b/docs/use-cases/cv/image-segmentation-yolact.md new file mode 100644 index 0000000000..cc7be1e044 --- /dev/null +++ b/docs/use-cases/cv/image-segmentation-yolact.md @@ -0,0 +1,251 @@ + + +# Deploying Image Segmentation Models with DeepSparse + +This page explains how to benchmark and deploy an image segmentation model with DeepSparse. + +There are three interfaces for interacting with DeepSparse: +- **Engine** is the lowest-level API that enables you to compile a model and run inference on raw input tensors. + +- **Pipeline** is the default DeepSparse API. Similar to Hugging Face Pipelines, it wraps Engine with pre-processing +and post-processing steps, allowing you to make requests on raw data and receive post-processed predictions. + +- **Server** is a REST API wrapper around Pipelines built on [FastAPI](https://fastapi.tiangolo.com/) and [Uvicorn](https://www.uvicorn.org/). It enables you to start a model serving +endpoint running DeepSparse with a single CLI. + +We will walk through an example of each using YOLACT. + +## Installation Requirements + +This use case requires the installation of [DeepSparse Server](../../user-guide/installation.md). + +Confirm your machine is compatible with our [hardware requirements](../../user-guide/hardware-support.md). + +## Benchmarking + +We can use the benchmarking utility to demonstrate DeepSparse's performance. The numbers below were run on a 4 core `c6i.2xlarge` instance in AWS. + +### ONNX Runtime Baseline + +As a baseline, let's check out ONNX Runtime's performance on YOLACT. Make sure you have ORT installed (`pip install onnxruntime`). 
+ +```bash +deepsparse.benchmark \ + zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/base-none \ + -b 64 -s sync -nstreams 1 \ + -e onnxruntime + +> Original Model Path: zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/base-none +> Batch Size: 64 +> Scenario: sync +> Throughput (items/sec): 3.5290 +``` + +ONNX Runtime achieves 3.5 items/second with batch 64. + +### DeepSparse Speedup +Now, let's run DeepSparse on an inference-optimized sparse version of YOLACT. This model has been 82.5% pruned and quantized to INT8, while retaining >99% of the accuracy of the dense baseline on the `coco` dataset. + +```bash +deepsparse.benchmark \ + zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none \ + -b 64 -s sync -nstreams 1 \ + -e deepsparse + +> Original Model Path: zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none +> Batch Size: 64 +> Scenario: sync +> Throughput (items/sec): 23.2061 +``` + +DeepSparse achieves 23 items/second, a 6.6x speed-up over ONNX Runtime! + +## DeepSparse Engine +Engine is the lowest-level API for interacting with DeepSparse. As much as possible, we recommend using the Pipeline API, but Engine is available if you want to handle pre- or post-processing yourself. + +With Engine, we can compile an ONNX file and run inference on raw tensors. + +Here's an example, using an 82.5% pruned-quantized YOLACT model from SparseZoo: + +```python +from deepsparse import Engine +from deepsparse.utils import generate_random_inputs, model_to_path +import numpy as np + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none" +batch_size = 1 +compiled_model = Engine( + model=sparsezoo_stub, # sparsezoo stub or path to local ONNX + batch_size=batch_size # defaults to batch size 1 +) + +# input is raw numpy tensors, output is raw data +inputs = generate_random_inputs(model_to_path(sparsezoo_stub), batch_size) +output = compiled_model(inputs) + +print(output[0].shape) +print(output) + +# (1, 19248, 4) + +# [array([[[ 0.444973 , -0.02015 , -1.3631972 , -0.9219434 ], +# ... +# 9.50585604e-02, 4.13608968e-01, 1.57236055e-01]]]], dtype=float32)] +``` + +## DeepSparse Pipelines +Pipeline is the default interface for interacting with DeepSparse. + +Like Hugging Face Pipelines, DeepSparse Pipelines wrap pre- and post-processing around the inference performed by the Engine. This creates a clean API that allows you to pass raw text and images to DeepSparse and receive the post-processed predictions, making it easy to add DeepSparse to your application. + +Let's start by downloading a sample image: +```bash +wget https://huggingface.co/spaces/neuralmagic/cv-yolact/resolve/main/thailand.jpeg +``` +We will use the `Pipeline.create()` constructor to create an instance of an image segmentation Pipeline with an 82% pruned-quantized version of YOLACT trained on `coco`. We can then pass images to the `Pipeline` and receive the predictions. All the pre-processing (such as resizing the images) is handled by the `Pipeline`. 
+ +```python +from deepsparse.pipeline import Pipeline + +model_stub = "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none" +yolact_pipeline = Pipeline.create( + task="yolact", + model_path=model_stub, +) + +images = ["thailand.jpeg"] +predictions = yolact_pipeline(images=images) +# predictions has attributes `boxes`, `classes`, `masks` and `scores` +predictions.classes[0] +# [20,......, 5] +``` + +### Use Case Specific Arguments +The Image Segmentation Pipeline contains additional arguments for configuring a `Pipeline`. + +#### Classes +The `class_names` argument defines the class label mappings; in the example below, passing `"coco"` maps the class IDs to the COCO class names. + +```python +from deepsparse.pipeline import Pipeline + +model_stub = "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none" + +yolact_pipeline = Pipeline.create( + task="yolact", + model_path=model_stub, + class_names="coco", +) + +images = ["thailand.jpeg"] +predictions = yolact_pipeline(images=images, confidence_threshold=0.2, nms_threshold=0.5) +# predictions has attributes `boxes`, `classes`, `masks` and `scores` +predictions.classes[0] +# ['elephant','elephant','person',...'zebra','stop sign','bus'] +``` + +### Annotate CLI +You can also use the annotate command to have the engine save an annotated photo on disk. +```bash +deepsparse.instance_segmentation.annotate --source thailand.jpeg # Try --source 0 to annotate your live webcam feed +``` +Running the above command will create an `annotation-results` folder and save the annotated image inside. + +If a `--model_filepath` arg isn't provided, then `zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none` will be used by default. + +![Annotation Results](images/result-0.jpg) + +### Cross Use Case Functionality +Check out the [Pipeline User Guide](../../user-guide/deepsparse-pipelines.md) for more details on configuring a Pipeline. + +## DeepSparse Server +Built on the popular FastAPI and Uvicorn stack, DeepSparse Server enables you to set up a REST endpoint for serving inferences over HTTP. Since DeepSparse Server wraps the Pipeline API, it inherits all the utilities provided by Pipelines. + +The CLI command below launches an image segmentation pipeline with an 82% pruned-quantized YOLACT model: + +```bash +deepsparse.server \ + --task yolact \ + --model_path "zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none" --port 5543 +``` +Run inference: +```python +import requests +import json + +url = 'http://0.0.0.0:5543/predict/from_files' +path = ['thailand.jpeg'] # list of images for inference +files = [('request', open(img, 'rb')) for img in path] +resp = requests.post(url=url, files=files) +annotations = json.loads(resp.text) # dictionary of annotation results +boxes, classes, masks, scores = annotations["boxes"], annotations["classes"], annotations["masks"], annotations["scores"] +``` +#### Use Case Specific Arguments +To use the `class_names` argument, create a Server configuration file for passing the argument via kwargs. 
+ +This configuration file sets `class_names` to `coco`: + +```yaml +# yolact-config.yaml +endpoints: + - task: yolact + model: zoo:cv/segmentation/yolact-darknet53/pytorch/dbolya/coco/pruned82_quant-none + kwargs: + class_names: coco +``` +Start the server: +```bash +deepsparse.server --config-file yolact-config.yaml +``` +Run inference: +```python +import requests +import json + +url = 'http://0.0.0.0:5543/predict/from_files' +path = ['thailand.jpeg'] # list of images for inference +files = [('request', open(img, 'rb')) for img in path] +resp = requests.post(url=url, files=files) +annotations = json.loads(resp.text) # dictionary of annotation results +boxes, classes, masks, scores = annotations["boxes"], annotations["classes"], annotations["masks"], annotations["scores"] +``` + +## Using a Custom ONNX File +Apart from using models from the SparseZoo, DeepSparse allows you to define custom ONNX files when deploying a model. + +The first step is to obtain the ONNX model. You can obtain the file by converting your model to ONNX after training. +Click Download on the [YOLACT page](https://sparsezoo.neuralmagic.com/models/cv%2Fsegmentation%2Fyolact-darknet53%2Fpytorch%2Fdbolya%2Fcoco%2Fpruned82_quant-none) to download an ONNX YOLACT model for demonstration. + +Extract the downloaded file and use the YOLACT ONNX model for inference: +```python +from deepsparse.pipeline import Pipeline + +yolact_pipeline = Pipeline.create( + task="yolact", + model_path="yolact.onnx", +) + +images = ["thailand.jpeg"] +predictions = yolact_pipeline(images=images) +# predictions has attributes `boxes`, `classes`, `masks` and `scores` +predictions.classes[0] +# [20,20, .......0, 0,24] +``` +### Cross Use Case Functionality + +Check out the [Server User Guide](../../user-guide/deepsparse-server.md) for more details on configuring the Server. diff --git a/docs/use-cases/cv/images/result-0.jpg b/docs/use-cases/cv/images/result-0.jpg new file mode 100644 index 0000000000..f485f70677 Binary files /dev/null and b/docs/use-cases/cv/images/result-0.jpg differ diff --git a/docs/use-cases/cv/images/result.jpg b/docs/use-cases/cv/images/result.jpg new file mode 100644 index 0000000000..3a6df579ca Binary files /dev/null and b/docs/use-cases/cv/images/result.jpg differ diff --git a/docs/use-cases/cv/object-detection-yolov5.md b/docs/use-cases/cv/object-detection-yolov5.md new file mode 100644 index 0000000000..1843a4d6ee --- /dev/null +++ b/docs/use-cases/cv/object-detection-yolov5.md @@ -0,0 +1,323 @@ + + +# Deploying YOLOv5 Object Detection Models with DeepSparse + +This page explains how to benchmark and deploy a YOLOv5 object detection model with DeepSparse. + +There are three interfaces for interacting with DeepSparse: +- **Engine** is the lowest-level API that enables you to compile a model and run inference on raw input tensors. + +- **Pipeline** is the default DeepSparse API. Similar to Hugging Face Pipelines, it wraps Engine with pre-processing and post-processing steps, allowing you to make requests on raw data and receive post-processed predictions. + +- **Server** is a REST API wrapper around Pipelines built on [FastAPI](https://fastapi.tiangolo.com/) and [Uvicorn](https://www.uvicorn.org/). It enables you to start a model serving endpoint running DeepSparse with a single CLI. + +This example uses YOLOv5s. For a full list of pre-sparsified object detection models, [check out the SparseZoo](https://sparsezoo.neuralmagic.com/?domain=cv&sub_domain=detection&page=1). 
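+
+Throughout this page, models are referenced by SparseZoo stubs (the `zoo:...` identifiers used in the commands and snippets below). DeepSparse resolves a stub to a local ONNX file when it compiles the model, and you can perform that resolution yourself to inspect the file. As a small sketch (assuming DeepSparse is installed, which the next section covers), the `model_to_path` helper used later in the Engine example downloads the model behind a stub and returns its local path:
+
+```python
+from deepsparse.utils import model_to_path
+
+# resolve the SparseZoo stub to a local ONNX file (downloaded on first use)
+stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none"
+print(model_to_path(stub))
+```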
+ +## Installation Requirements + +This use case requires the installation of [DeepSparse Server and YOLO](../../user-guide/installation.md). + +Confirm your machine is compatible with our [hardware requirements](../../user-guide/hardware-support.md). + +## Benchmarking + +We can use the benchmarking utility to demonstrate DeepSparse's performance. The numbers below were run on a 4 core `c6i.2xlarge` instance in AWS. + +### ONNX Runtime Baseline + +As a baseline, let's check out ONNX Runtime's performance on YOLOv5s. Make sure you have ORT installed (`pip install onnxruntime`). + +```bash +deepsparse.benchmark \ + zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none \ + -b 64 -s sync -nstreams 1 \ + -e onnxruntime + +> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/base-none +> Batch Size: 64 +> Scenario: sync +> Throughput (items/sec): 12.2369 +``` +ONNX Runtime achieves 12 items/second with batch 64. + +### DeepSparse Speedup +Now, let's run DeepSparse on an inference-optimized sparse version of YOLOv5s. This model has been 85% pruned and quantized. + +```bash +deepsparse.benchmark \ + zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none \ + -b 64 -s sync -nstreams 1 + +> Original Model Path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none +> Batch Size: 64 +> Scenario: sync +> Throughput (items/sec): 72.55 +``` +DeepSparse achieves 73 items/second, a 5.5x speed-up over ONNX Runtime! + +## DeepSparse Engine +Engine is the lowest-level API for interacting with DeepSparse. As much as possible, we recommend using the Pipeline API, but Engine is available if you want to handle pre- or post-processing yourself. + +With Engine, we can compile an ONNX file and run inference on raw tensors. + +Here's an example, using an 85% pruned-quantized YOLOv5s model from SparseZoo: + +```python +from deepsparse import Engine +from deepsparse.utils import generate_random_inputs, model_to_path +import numpy as np + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none" +batch_size = 1 +compiled_model = Engine( + model=sparsezoo_stub, # sparsezoo stub or path to local ONNX + batch_size=batch_size # defaults to batch size 1 +) +# input is raw numpy tensors, output is raw scores for classes +inputs = generate_random_inputs(model_to_path(sparsezoo_stub), batch_size) +output = compiled_model(inputs) + +print(output[0].shape) +print(output[0]) + +# (1, 25200, 85) +# [array([[[5.54789925e+00, 4.28643513e+00, 9.98156166e+00, ..., +# ... +# -6.13238716e+00, -6.80812788e+00, -5.50403357e+00]]]]], dtype=float32)] +``` + +## DeepSparse Pipeline +Pipeline is the default interface for interacting with DeepSparse. + +Like Hugging Face Pipelines, DeepSparse Pipelines wrap pre- and post-processing around the inference performed by the Engine. This creates a clean API that allows you to pass raw text and images to DeepSparse and receive the post-processed predictions, making it easy to add DeepSparse to your application. + +Let's start by downloading a sample image: +```bash +wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg +``` + +We will use the `Pipeline.create()` constructor to create an instance of an object detection Pipeline with an 85% pruned version of YOLOv5s trained on `coco`. We can then pass images to the `Pipeline` and receive the predictions. 
All the pre-processing (such as resizing the images and running NMS) is handled by the `Pipeline`. + +```python +from deepsparse import Pipeline + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none" +yolo_pipeline = Pipeline.create( + task="yolo", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX +) +images = ["basilica.jpg"] + +# run inference on image file +pipeline_outputs = yolo_pipeline(images=images) +print(pipeline_outputs.boxes) +print(pipeline_outputs.labels) + +# [[[262.56866455078125, 483.48693108558655, 514.8401184082031, 611.7606239318848], [542.7222747802734, 385.72591066360474, 591.0432586669922, 412.0340189933777], [728.4929351806641, 403.6355793476105, 769.6295471191406, 493.7961976528168], [466.83229064941406, 383.6878204345703, 530.7117462158203, 408.8705735206604], [309.2399597167969, 396.0068359375, 362.10223388671875, 435.58393812179565], [56.86535453796387, 409.39830899238586, 99.50672149658203, 497.8857614994049], [318.8877868652344, 388.9980583190918, 449.08460998535156, 587.5987024307251], [793.9356079101562, 390.5112290382385, 861.0441284179688, 489.4586777687073], [449.93934631347656, 441.90707445144653, 574.4951934814453, 539.5000758171082], [99.09783554077148, 381.93165946006775, 135.13665390014648, 458.19711089134216], [154.37461853027344, 386.8395175933838, 188.95138549804688, 469.1738815307617], [14.558289527893066, 396.7127945423126, 54.229820251464844, 487.2396695613861], [704.1891632080078, 398.2202727794647, 739.6305999755859, 471.5654203891754], [731.9091796875, 380.60836935043335, 761.627197265625, 414.56129932403564]]] << list of bounding boxes >> + +# [['3.0', '2.0', '0.0', '2.0', '2.0', '0.0', '0.0', '0.0', '3.0', '0.0', '0.0', '0.0', '0.0', '0.0']] << list of label ids >> +``` + +### Use Case Specific Arguments +The YOLOv5 pipeline contains additional arguments for configuring a Pipeline. + +#### Image Shape + +DeepSparse runs with static shapes. By default, YOLOv5 inferences run with images of shape 640x640. The Pipeline accepts images of any size and scales the images to image shape specified by the ONNX graph. + +We can override the image shape used by DeepSparse with the `image_size` argument. In the example below, we run the inferences at 320x320. + +```python +from deepsparse import Pipeline + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none" +yolo_pipeline = Pipeline.create( + task="yolo", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX + image_size=(320,320) +) +images = ["basilica.jpg"] + +# run inference on image file +pipeline_outputs = yolo_pipeline(images=images) +print(pipeline_outputs.boxes) +print(pipeline_outputs.labels) +``` + +#### Class Names +We can specify class names for the labels by passing a dictionary. In the example below, we just use +the first 4 classes from COCO for the sake of a quick example. 
+ +```python +from deepsparse import Pipeline + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none" +yolo_pipeline = Pipeline.create( + task="yolo", + model_path=sparsezoo_stub, # sparsezoo stub or path to local ONNX + class_names={"0":"person", "1":"bicycle", "2":"car", "3":"motorcycle"} + +) +images = ["basilica.jpg"] + +# run inference on image file +pipeline_outputs = yolo_pipeline(images=images) +print(pipeline_outputs.labels) +# [['motorcycle', 'car', 'person', 'car', 'car', 'person', 'person', 'person', 'motorcycle', 'person', 'person', 'person', 'person', 'person']] +``` + +#### IOU and Conf Threshold +We can also configure the thresholds for making detections in YOLO. + +```python +from deepsparse import Pipeline + +# download onnx from sparsezoo and compile with batch size 1 +sparsezoo_stub = "zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned85_quant-none" +yolo_pipeline = Pipeline.create( + task="yolo", + model_path=sparsezoo_stub +) + +images = ["basilica.jpg"] + +# low threshold inference +pipeline_outputs_low_conf = yolo_pipeline(images=images, iou_thres=0.3, conf_thres=0.1) +print(len(pipeline_outputs_low_conf.boxes[0])) +# 37 <+ +
+ +## Server Configuration + +You can configure DeepSparse Server via YAML files. + +### Basic Example + +Let us walk through a basic example of deploying via a configuration file. + +The following creates an endpoint running a 90% pruned-quantized version of +BERT trained on the SST2 dataset for the sentiment analysis task. + +```yaml +# config.yaml +endpoints: + - task: sentiment-analysis + model: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none +``` + +We can then spin up with the `--config-file` argument: + +```bash +deepsparse.server \ + --config-file config.yaml +``` + +Sending a request: +```python +import requests +url = "http://localhost:5543/predict" +obj = {"sequences": "I love querying DeepSparse launched from a config file!"} +print(requests.post(url, json=obj).text) + +# >>> {"labels":["positive"],"scores":[0.9136188626289368]} +``` + +### Server Level Options + +At the server level, there are a few arguments that can be toggled. + +#### Physical Resources +`num_cores` specifies the number of cores that DeepSparse runs on. By default, +DeepSparse runs on all available cores. + +#### Scheduler +`num_workers` configures DeepSparse's scheduler. + +If `num_workers = 1` (the default), DeepSparse uses its "synchronous" scheduler, which allocates as many resources as possible +to each request. This format optimizes per-request latency. By setting `num_workers > 1`, DeepSparse +utilizes its multi-stream scheduler, which processes multiple requests at the same time. +In deployment scenarios with low batch sizes and high core counts, using the "multi-stream" scheduler +can increase throughput by allowing DeepSparse to better saturate the cores. + +The following configuration creates a Server with DeepSparse running on two cores, +with two workers available to process requests concurrently. + +```yaml +# server-level-options-config.yaml +num_cores: 2 +num_workers: 2 + +endpoints: + - task: sentiment-analysis + model: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none +``` + +We can also adjust the port by providing the `--port` argument. + +Spinning up: +```bash +deepsparse.server \ + --config-file server-level-options-config.yaml \ + --port 5555 +``` + +We can then query the Server with the same pattern, querying on port 5555: +```python +import requests +url = "http://localhost:5555/predict" +obj = {"sequences": "I love querying DeepSparse launched from a config file!"} +print(requests.post(url, json=obj).text) + +# >>> {"labels":["positive"],"scores":[0.9136188626289368]} +``` + +### Multiple Endpoints + +To serve multiple models from the same context, we can add an additional endpoint +to the server configuration file. + +Here is an example that stands up two sentiment analysis endpoints, one using a +dense unoptimized BERT and one using a 90% pruned-quantized BERT. 
+ +```yaml +# multiple-endpoint-config.yaml +endpoints: + - task: sentiment-analysis + model: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none + route: /sparse/predict + name: sparse-sentiment-analysis + + - task: sentiment-analysis + model: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/base-none + route: /dense/predict + name: dense-sentiment-analysis +``` + +Spinning up: +```bash +deepsparse.server \ + --config-file multiple-endpoint-config.yaml +``` + +Making a request: +```python +import requests + +obj = {"sequences": "I love querying the multi-model server!"} + +sparse_url = "http://localhost:5543/sparse/predict" +print(f"From the sparse model: {requests.post(sparse_url, json=obj).text}") + +dense_url = "http://localhost:5543/dense/predict" +print(f"From the dense model: {requests.post(dense_url, json=obj).text}") + +# >>> From the sparse model: {"labels":["positive"],"scores":[0.9942120313644409]} +# >>> From the dense model: {"labels":["positive"],"scores":[0.998753547668457]} +``` + +### Endpoint Level Configuration + +We can also configure the properties of each endpoint, including task-specific +arguments from within the YAML file. + +For instance, the following configuration file creates two endpoints. + +The first is a text classification endpoint, using a 90% pruned-quantized BERT model trained on +IMDB for document classification (which means the model is tuned to classify long +sequence lengths). We configure this endpoint with batch size 1 and sequence length +of 512. Since sequence length is a task-specific argument used only in Transformers Pipelines, +we will pass this in `kwargs` in the YAML file. + +The second is a sentiment analysis endpoint. We will use the default +sequence length (128) with batch size 3. + +```yaml +# advanced-endpoint-config.yaml + +endpoints: + - task: text-classification + model: zoo:nlp/document_classification/obert-base/pytorch/huggingface/imdb/pruned90_quant-none + route: /text-classification/predict + name: text-classification + batch_size: 1 + kwargs: + sequence_length: 512 # uses 512 sequence len (transformers pipeline specific) + top_k: 2 # returns top 2 scores (text-classification pipeline specific arg) + + - task: sentiment-analysis + model: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none + route: /sentiment-analysis/predict + name: sentiment-analysis + batch_size: 3 +``` + +Spinning up: +```bash +deepsparse.server \ + --config-file advanced-endpoint-config.yaml +``` + +Making requests: +```python +import requests + +# batch 1 +document_obj = {"sequences": "I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. \ + I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, \ + stilted dialogues, CG that doesn't match the background, and painfully one-dimensional characters cannot be overcome with a 'sci-fi' setting. (I'm sure \ + there are those of you out there who think Babylon 5 is good sci-fi TV. It's not. It's clichéd and uninspiring.) While US viewers might like emotion and \ + character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. \ + It's really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. 
Their actions and reactions are wooden \ + and predictable, often painful to watch. The makers of Earth KNOW it's rubbish as they have to always say 'Gene Roddenberry's Earth...' otherwise people \ + would not continue watching. Roddenberry's ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks \ + really brings this home) trudging Trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. \ + Jeeez! Dallas all over again."} + +# batch 3 +short_obj = {"sequences": [ + "I love how easy it is to configure DeepSparse Server!", + "It was very challenging to configure my old deep learning inference platform", + "YAML is the best format for configuring my infrastructure" +]} + +document_classification_url = "http://localhost:5543/text-classification/predict" +print(requests.post(document_classification_url, json=document_obj).text) + +sentiment_analysis_url = "http://localhost:5543/sentiment-analysis/predict" +print(requests.post(sentiment_analysis_url, json=short_obj).text) + +# >>> {"labels":[["0","1"]],"scores":[[0.9994900226593018,0.0005100301350466907]]} +# >>> {"labels":["positive","negative","positive"],"scores":[0.9665533900260925,0.9952980279922485,0.9939143061637878]} +``` + +Check out the [Use Case](../use-cases) page for detailed documentation on task-specific arguments that can be applied to the Server via `kwargs`. + +## Custom Use Cases + +Stay tuned for documentation on using a custom DeepSparse Pipeline within the Server! + +## Multi-Stream + +Stay tuned for documentation on multi-stream scheduling with DeepSparse! + +## Logging + +Stay tuned for documentation on DeepSparse Logging! + +## Hot Reloading + +Stay tuned for documentation on Hot Reloading! diff --git a/docs/user-guide/hardware-support.md b/docs/user-guide/hardware-support.md new file mode 100644 index 0000000000..6602699e00 --- /dev/null +++ b/docs/user-guide/hardware-support.md @@ -0,0 +1,30 @@ + + +# Supported Hardware for DeepSparse + +With support for AVX2, AVX-512, and VNNI instruction sets, DeepSparse is validated to work on x86 Intel (Haswell generation and later) and AMD (Zen 2 and later) CPUs running Linux. +Mac and Windows require running Linux in a Docker or virtual machine. 
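+
+If you want to check programmatically which instruction sets DeepSparse detects on your machine, the helpers in `deepsparse.cpu` can be queried directly. This is a minimal sketch (assuming DeepSparse is already installed; the exact fields of the returned architecture object may vary by version):
+
+```python
+from deepsparse.cpu import cpu_architecture, cpu_vnni_compatible
+
+# detected CPU architecture details for this machine
+print(cpu_architecture())
+
+# True if the CPU supports VNNI instructions for optimized INT8 (quantized) inference
+print(cpu_vnni_compatible())
+```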
+ +Here is a table detailing specific support for some algorithms over different microarchitectures: + +| x86 Extension | Microarchitectures | Kernel Sparsity | Sparse Quantization | +|:----------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:---------------:|:-------------------:| +| [AMD AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2) | [Zen 2,](https://en.wikipedia.org/wiki/Zen_2) [Zen 3](https://en.wikipedia.org/wiki/Zen_3) | optimized | emulated | +| [AMD AVX-512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX-512) VNNI | [Zen 4](https://en.wikipedia.org/wiki/Zen_4) | optimized | optimized | +| [Intel AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2) | [Haswell,](https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29) [Broadwell,](https://en.wikipedia.org/wiki/Broadwell_%28microarchitecture%29) and newer | optimized | emulated | +| [Intel AVX-512](https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512) | [Skylake](https://en.wikipedia.org/wiki/Skylake_%28microarchitecture%29), [Cannon Lake](https://en.wikipedia.org/wiki/Cannon_Lake_%28microarchitecture%29), and newer | optimized | emulated | +| [Intel AVX-512](https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512) VNNI (DL Boost) | [Cascade Lake](https://en.wikipedia.org/wiki/Cascade_Lake_%28microarchitecture%29), [Ice Lake](https://en.wikipedia.org/wiki/Ice_Lake_%28microprocessor%29), [Cooper Lake](https://en.wikipedia.org/wiki/Cooper_Lake_%28microarchitecture%29), [Tiger Lake](https://en.wikipedia.org/wiki/Tiger_Lake_%28microprocessor%29) | optimized | optimized | diff --git a/docs/user-guide/installation.md b/docs/user-guide/installation.md new file mode 100644 index 0000000000..75b7346bb6 --- /dev/null +++ b/docs/user-guide/installation.md @@ -0,0 +1,47 @@ + + +# DeepSparse Installation + +DeepSparse is tested on Python 3.7-3.10, ONNX 1.5.0-1.10.1, ONNX opset version 11+ and is [manylinux compliant](https://peps.python.org/pep-0513/). + +It currently supports Intel and AMD AVX2, AVX-512, and VNNI x86 instruction sets. + +## General Install + +Use the following command to install DeepSparse with pip: + +```bash +pip install deepsparse +``` + +## Installing the Server + +DeepSparse Server allows you to serve models and pipelines through an HTTP interface using the `deepsparse.server` CLI. +To install, use the following extra option: + +```bash +pip install deepsparse[server] +``` + +## Installing YOLO + +The Ultralytics YOLOv5 models require extra dependencies for deployment. To use YOLO models, install with the following extra option: + +```bash +pip install deepsparse[yolo] # just yolo requirements +pip install deepsparse[yolo,server] # both yolo + server requirements +``` diff --git a/docs/user-guide/scheduler.md b/docs/user-guide/scheduler.md new file mode 100644 index 0000000000..a26c8aa282 --- /dev/null +++ b/docs/user-guide/scheduler.md @@ -0,0 +1,75 @@ + + +# Inference Types With DeepSparse Scheduler + +This page explains the various settings for DeepSparse, which enable you to tune the performance to your workload. 
+ +Schedulers are the system software that handles the distribution of work across cores in parallel computation. +The goal of a good scheduler is to ensure that, while work is available, cores are not sitting idle. +Put another way, as long as parallel tasks are available, all cores should be kept busy. + +## Single Stream (Default) +In most use cases, the default scheduler is the preferred choice when running inferences with DeepSparse. +The default scheduler is highly optimized for minimum per-request latency, using all of the resources provided to it for every request it receives. +Often, particularly when working with large batch sizes, the scheduler is able to distribute the workload of a single request across as many cores as it's provided. + +*Single-stream scheduling; requests execute serially by default:* + + + + +## Multi-Stream + +There are circumstances in which adding more cores does not improve performance. If the computation can't be divided up to produce enough parallelism (while maximizing use of the CPU cache), then adding more cores simply adds more compute power with little work to apply it to. + +An alternative, multi-stream scheduler is provided with the software. In cases where parallelism is low, sending multiple requests simultaneously can more adequately saturate the available cores. In other words, if speedup can't be achieved by adding more cores, then perhaps speedup can be achieved by adding more work. + +If increasing core count does not decrease latency, that's a strong indicator that parallelism is low in your particular model/batch-size combination. It may be that total throughput can be increased by making more requests simultaneously. Using the [deepsparse.engine.Scheduler API,](https://docs.neuralmagic.com/archive/deepsparse/api/deepsparse.html#module-deepsparse.engine) the multi-stream scheduler can be selected, and requests made by multiple Python threads will be handled concurrently. + +*Multi-stream scheduling; requests execute in parallel and may better utilize hardware resources:* + + + + + +Whereas the default scheduler will queue up requests made simultaneously and handle them serially, the multi-stream scheduler allows multiple requests to be run in parallel. The `num_streams` argument to the Engine/Context classes controls how the multi-stream scheduler partitions the machine. Each stream maps to a contiguous set of hardware threads. By default, only one hyperthread per core is used. There is no sharing amongst the partitions and it is generally good practice to make sure the `num_streams` value evenly divides into your number of cores. By default `num_streams` is set to multiplex requests across L3 caches. + +Here's an example. Consider a machine with 2 sockets, each with 8 cores. In this case, the multi-stream scheduler will create two streams, one per socket by default. The first stream will contain cores 0-7 and the second stream will contain cores 8-15. + +Manually increasing `num_streams` to 3 will result in the following stream breakdown: threads 0-5 in the first stream, 6-10 in the second, and 11-15 in the last. This is problematic for our 2-socket system. The second stream (threads 6-10) is straddling both sockets, meaning that each request being serviced by that stream is going to incur a performance penalty each time one of its threads makes a remote memory access. The impact of this penalty will depend on the workload, but it will likely be significant. + +Manually increasing `num_streams` to 4 is interesting. 
Here's the stream breakdown: threads 0-3 in the first stream, 4-7 in the second, 8-11 in the third, and 12-15 in the fourth. Each stream is only making memory accesses that are local to its socket, which is good. However, the first two and last two streams are sharing the same L3 cache, which can result in worse performance due to cache thrashing. Depending on the workload, though, the performance gain from the increased parallelism may negate this penalty. + +The most common use cases for the multi-stream scheduler are where parallelism is low with respect to core count, and where requests need to be made asynchronously without time to batch them. Implementing a model server may fit such a scenario and be ideal for using multi-stream scheduling. + +## Enabling a Scheduler + +Depending on your engine execution strategy, enable one of these options by running: + +```python +engine = compile_model(model_path, scheduler="single_stream") +``` + +or: + +```python +engine = compile_model(model_path, scheduler="multi_stream", num_streams=None) # None is the default +``` + +or pass in the enum value directly, since` "multi_stream" == Scheduler.multi_stream`. + +By default, the scheduler will map to a single stream. diff --git a/src/deepsparse/__init__.py b/src/deepsparse/__init__.py index 0c0e7e0d1b..7ac2b698ef 100644 --- a/src/deepsparse/__init__.py +++ b/src/deepsparse/__init__.py @@ -31,11 +31,19 @@ cpu_vnni_compatible, ) from .engine import * -from .tasks import * from .timing import * from .pipeline import * from .loggers import * from .version import __version__, is_release -from .analytics import deepsparse_analytics as _analytics -_analytics.send_event("python__init") +try: + from sparsezoo.package import check_package_version as _check_package_version + + _check_package_version( + package_name=__name__ if is_release else f"{__name__}-nightly", + package_version=__version__, + ) +except Exception as err: + print( + f"Need sparsezoo version above 0.9.0 to run Neural Magic's latest-version check\n{err}" + ) diff --git a/src/deepsparse/benchmark/__init__.py b/src/deepsparse/benchmark/__init__.py index 432d48cf44..91d264f339 100644 --- a/src/deepsparse/benchmark/__init__.py +++ b/src/deepsparse/benchmark/__init__.py @@ -14,10 +14,5 @@ # flake8: noqa -from deepsparse.analytics import deepsparse_analytics as _analytics - from .ort_engine import * from .results import * - - -_analytics.send_event("python__benchmark__init") diff --git a/src/deepsparse/benchmark/benchmark_model.py b/src/deepsparse/benchmark/benchmark_model.py index ae37d7807f..f66c36039c 100644 --- a/src/deepsparse/benchmark/benchmark_model.py +++ b/src/deepsparse/benchmark/benchmark_model.py @@ -422,7 +422,6 @@ def benchmark_model( "seconds_to_run": time, "num_streams": num_streams, "benchmark_result": benchmark_result, - "fraction_of_supported_ops": getattr(model, "fraction_of_supported_ops", None), } # Export results diff --git a/src/deepsparse/benchmark/ort_engine.py b/src/deepsparse/benchmark/ort_engine.py index d2d61e83a1..d16b14578e 100644 --- a/src/deepsparse/benchmark/ort_engine.py +++ b/src/deepsparse/benchmark/ort_engine.py @@ -19,6 +19,8 @@ import numpy from deepsparse.utils import ( + get_input_names, + get_output_names, model_to_path, override_onnx_batch_size, override_onnx_input_shapes, @@ -100,6 +102,9 @@ def __init__( self._num_cores = num_cores self._input_shapes = input_shapes + self._input_names = get_input_names(self._model_path) + self._output_names = get_output_names(self._model_path) + if providers is None: 
providers = onnxruntime.get_available_providers() self._providers = providers @@ -209,34 +214,6 @@ def scheduler(self) -> None: """ return None - @property - def input_names(self) -> List[str]: - """ - :return: The ordered names of the inputs. - """ - return [node_arg.name for node_arg in self._eng_net.get_inputs()] - - @property - def input_shapes(self) -> List[Tuple]: - """ - :return: The ordered shapes of the inputs. - """ - return [tuple(node_arg.shape) for node_arg in self._eng_net.get_inputs()] - - @property - def output_names(self) -> List[str]: - """ - :return: The ordered names of the outputs. - """ - return [node_arg.name for node_arg in self._eng_net.get_outputs()] - - @property - def output_shapes(self) -> List[Tuple]: - """ - :return: The ordered shapes of the outputs. - """ - return [tuple(node_arg.shape) for node_arg in self._eng_net.get_outputs()] - @property def providers(self) -> List[str]: """ @@ -282,8 +259,8 @@ def run( """ if val_inp: self._validate_inputs(inp) - inputs_dict = {name: value for name, value in zip(self.input_names, inp)} - return self._eng_net.run(self.output_names, inputs_dict) + inputs_dict = {name: value for name, value in zip(self._input_names, inp)} + return self._eng_net.run(self._output_names, inputs_dict) def timed_run( self, inp: List[numpy.ndarray], val_inp: bool = False diff --git a/src/deepsparse/cpu.py b/src/deepsparse/cpu.py index 6f4144c876..cb3ed6e7f5 100644 --- a/src/deepsparse/cpu.py +++ b/src/deepsparse/cpu.py @@ -18,10 +18,8 @@ import json import os -import platform import subprocess import sys -from distutils.version import StrictVersion from typing import Any, Tuple @@ -41,7 +39,6 @@ VALID_VECTOR_EXTENSIONS = {"avx2", "avx512", "neon", "sve"} -MINIMUM_DARWIN_VERSION = "13.0.0" class _Memoize: @@ -143,55 +140,6 @@ def _parse_arch_bin() -> architecture: raise OSError(error_msg.format(ex)) -def allow_experimental_darwin() -> bool: - """ - Check if experimental Darwin support is allowed. - """ - try: - allow = int(os.getenv("NM_ALLOW_DARWIN", "0")) - except ValueError: - allow = False - return allow - - -def get_darwin_version() -> str: - """ - If we are running Darwin, get the current version. Otherwise return None. - """ - if sys.platform.startswith("darwin"): - return platform.mac_ver()[0] - return None - - -def check_darwin_support() -> bool: - """ - Check if the system is running Darwin and it meets the minimum version - requirements. - """ - if sys.platform.startswith("darwin") and allow_experimental_darwin(): - ver = get_darwin_version() - return StrictVersion(ver) >= StrictVersion(MINIMUM_DARWIN_VERSION) - return False - - -def platform_error_msg() -> str: - """ - Generate unsupported platform error message. - """ - if allow_experimental_darwin(): - darwin_str = f" or MacOS >= {MINIMUM_DARWIN_VERSION}" - else: - darwin_str = "" - - darwin_ver = get_darwin_version() - if darwin_ver: - current_os = f"MacOS {darwin_ver}" - else: - current_os = sys.platform - - return f"Neural Magic: Only Linux{darwin_str} is supported, not '{current_os}'." 
- - def cpu_architecture() -> architecture: """ Detect the CPU details on linux systems @@ -207,8 +155,10 @@ def cpu_architecture() -> architecture: :return: an instance of the architecture class """ - if not (sys.platform.startswith("linux") or check_darwin_support()): - raise OSError(platform_error_msg()) + if not sys.platform.startswith("linux"): + raise OSError( + "Neural Magic: Only Linux is supported, not '{}'.".format(sys.platform) + ) arch = _parse_arch_bin() isa_type_override = os.getenv("NM_ARCH", None) diff --git a/src/deepsparse/engine.py b/src/deepsparse/engine.py index 169b36c023..4cbcf0e86c 100644 --- a/src/deepsparse/engine.py +++ b/src/deepsparse/engine.py @@ -24,13 +24,8 @@ import numpy from tqdm.auto import tqdm -from deepsparse.analytics import deepsparse_analytics as _analytics from deepsparse.benchmark import BenchmarkResults -from deepsparse.utils import ( - generate_random_inputs, - model_to_path, - override_onnx_input_shapes, -) +from deepsparse.utils import model_to_path, override_onnx_input_shapes try: @@ -186,7 +181,6 @@ def __init__( scheduler: Scheduler = None, input_shapes: List[List[int]] = None, ): - _analytics.send_event("python__engine__init") self._model_path = model_to_path(model) self._batch_size = _validate_batch_size(batch_size) self._num_cores = _validate_num_cores(num_cores) @@ -303,34 +297,6 @@ def fraction_of_supported_ops(self) -> float: """ return round(self._eng_net.fraction_of_supported_ops(), 4) - @property - def input_names(self) -> List[str]: - """ - :return: The ordered names of the inputs. - """ - return self._eng_net.input_names() - - @property - def input_shapes(self) -> List[Tuple]: - """ - :return: The ordered shapes of the inputs. - """ - return self._eng_net.input_dims() - - @property - def output_names(self) -> List[str]: - """ - :return: The ordered names of the outputs. - """ - return self._eng_net.output_names() - - @property - def output_shapes(self) -> List[Tuple]: - """ - :return: The ordered shapes of the outputs. - """ - return self._eng_net.output_dims() - @property def cpu_avx_type(self) -> str: """ @@ -348,13 +314,6 @@ def cpu_vnni(self) -> bool: """ return self._cpu_vnni - def generate_random_inputs(self) -> List[numpy.ndarray]: - """ - Generate random data that matches the type and shape of the ONNX model - :return: List of random tensors - """ - return generate_random_inputs(self.model_path, self.batch_size) - def run( self, inp: List[numpy.ndarray], @@ -571,6 +530,47 @@ def benchmark_loader( return results + def analyze( + self, + inp: List[numpy.ndarray], + num_iterations: int = 20, + num_warmup_iterations: int = 5, + optimization_level: int = 1, + imposed_as: Optional[float] = None, + imposed_ks: Optional[float] = None, + ): + """ + Function to analyze a model's performance in the DeepSparse Engine. + + Note 1: Analysis is currently only supported on a single socket. + + :param inp: The list of inputs to pass to the engine for analyzing inference. + The expected order is the inputs order as defined in the ONNX graph. + :param num_iterations: The number of times to repeat execution of the model + while analyzing, default is 20 + :param num_warmup_iterations: The number of times to repeat execution of the model + before analyzing, default is 5 + :param optimization_level: The amount of graph optimizations to perform. + The current choices are either 0 (minimal) or 1 (all), default is 1 + :param imposed_as: Imposed activation sparsity, defaults to None. 
+ Will force the activation sparsity from all ReLu layers in the graph + to match this desired sparsity level (percentage of 0's in the tensor). + Beneficial for seeing how AS affects the performance of the model. + :param imposed_ks: Imposed kernel sparsity, defaults to None. + Will force all prunable layers in the graph to have weights with + this desired sparsity level (percentage of 0's in the tensor). + Beneficial for seeing how pruning affects the performance of the model. + :return: the analysis structure containing the performance details of each layer + """ + return self._eng_net.benchmark( + inp, + num_iterations, + num_warmup_iterations, + optimization_level, + imposed_as, + imposed_ks, + ) + def _validate_inputs(self, inp: List[numpy.ndarray]): if isinstance(inp, str) or not isinstance(inp, List): raise ValueError("inp must be a list, given {}".format(type(inp))) @@ -603,115 +603,6 @@ def _properties_dict(self) -> Dict: } -class DebugAnalysisEngine(Engine): - """ - A subclass of Engine that supports debug analysis. - - :param model: Either a path to the model's onnx file, a SparseZoo model stub - prefixed by 'zoo:', a SparseZoo Model object, or a SparseZoo ONNX File - object that defines the neural network - :param batch_size: The batch size of the inputs to be used with the engine - :param num_cores: The number of physical cores to run the model on. If more - cores are requested than are available on a single socket, the engine - will try to distribute them evenly across as few sockets as possible. - :param num_streams: The max number of requests the model can handle - concurrently. - :param scheduler: The kind of scheduler to execute with. Pass None for the default. - :param input_shapes: The list of shapes to set the inputs to. Pass None to use model as-is. - :param num_iterations: The number of iterations to run benchmarking for. - Default is 20 - :param num_warmup_iterations: T number of iterations to warm up engine before - benchmarking. These executions will not be counted in the benchmark - results that are returned. Useful and recommended to bring - the system to a steady state. Default is 5 - :param include_inputs: If True, inputs from forward passes during benchmarking - will be added to the results. Default is False - :param include_outputs: If True, outputs from forward passes during benchmarking - will be added to the results. Default is False - :param show_progress: If True, will display a progress bar. Default is False - :param scheduler: The kind of scheduler to execute with. Pass None for the default. 
- """ - - def __init__( - self, - model: Union[str, "Model", "File"], - batch_size: int = 1, - num_cores: int = None, - scheduler: Scheduler = None, - input_shapes: List[List[int]] = None, - num_iterations: int = 20, - num_warmup_iterations: int = 5, - optimization_level: int = 1, - imposed_as: Optional[float] = None, - imposed_ks: Optional[float] = None, - ): - self._model_path = model_to_path(model) - self._batch_size = _validate_batch_size(batch_size) - self._num_cores = _validate_num_cores(num_cores) - self._scheduler = _validate_scheduler(scheduler) - self._input_shapes = input_shapes - self._cpu_avx_type = AVX_TYPE - self._cpu_vnni = VNNI - - num_streams = _validate_num_streams(None, self._num_cores) - if self._input_shapes: - with override_onnx_input_shapes( - self._model_path, self._input_shapes - ) as model_path: - self._eng_net = LIB.deepsparse_engine( - model_path, - self._batch_size, - self._num_cores, - num_streams, - self._scheduler.value, - None, - "external", - num_iterations, - num_warmup_iterations, - optimization_level, - imposed_as, - imposed_ks, - ) - else: - self._eng_net = LIB.deepsparse_engine( - self._model_path, - self._batch_size, - self._num_cores, - num_streams, - self._scheduler.value, - None, - "external", - num_iterations, - num_warmup_iterations, - optimization_level, - imposed_as, - imposed_ks, - ) - - def analyze( - self, - inp: List[numpy.ndarray], - val_inp: bool = True, - ) -> List[numpy.ndarray]: - """ - Function to analyze a model's performance in the DeepSparse Engine. - - Note 1: Analysis is currently only supported on a single socket. - - :param inp: The list of inputs to pass to the engine for analyzing inference. - The expected order is the inputs order as defined in the ONNX graph. - :param val_inp: Validate the input to the model to ensure numpy array inputs - are setup correctly for the DeepSparse Engine - :return: the analysis structure containing the performance details of each layer - """ - if val_inp: - self._validate_inputs(inp) - - [out, bench_info] = self._eng_net.benchmark_execute(inp) - - return bench_info - - class Context(object): """ Contexts can be used to run multiple instances of the MultiModelEngine with the same @@ -969,17 +860,19 @@ def model_debug_analysis( :param scheduler: The kind of scheduler to execute with. Pass None for the default. 
:return: the analysis structure containing the performance details of each layer """ - model = DebugAnalysisEngine( + model = compile_model( model=model, batch_size=batch_size, num_cores=num_cores, scheduler=scheduler, input_shapes=input_shapes, + ) + + return model.analyze( + inp, num_iterations=num_iterations, num_warmup_iterations=num_warmup_iterations, optimization_level=optimization_level, imposed_as=imposed_as, imposed_ks=imposed_ks, ) - - return model.analyze(inp) diff --git a/src/deepsparse/image_classification/__init__.py b/src/deepsparse/image_classification/__init__.py index cf62c40992..00ceb5828e 100644 --- a/src/deepsparse/image_classification/__init__.py +++ b/src/deepsparse/image_classification/__init__.py @@ -18,10 +18,6 @@ import warnings from collections import namedtuple -from deepsparse.analytics import deepsparse_analytics as _analytics - - -_analytics.send_event("python__image_classification__init") _LOGGER = _logging.getLogger(__name__) _Dependency = namedtuple("_Dependency", ["name", "import_name", "version", "necessary"]) diff --git a/src/deepsparse/open_pif_paf/__init__.py b/src/deepsparse/open_pif_paf/__init__.py index 78fe2add68..8d3ec2e88e 100644 --- a/src/deepsparse/open_pif_paf/__init__.py +++ b/src/deepsparse/open_pif_paf/__init__.py @@ -12,9 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. # flake8: noqa -from deepsparse.analytics import deepsparse_analytics as _analytics - from .utils import * - - -_analytics.send_event("python__open_pif_paf__init") diff --git a/src/deepsparse/server/README.md b/src/deepsparse/server/README.md index c564ef1363..1f8a84ebb4 100644 --- a/src/deepsparse/server/README.md +++ b/src/deepsparse/server/README.md @@ -54,7 +54,7 @@ Usage: deepsparse.server [OPTIONS] COMMAND [ARGS]... prometheus: port: 6100 text_log_save_dir: /home/deepsparse-server/prometheus - text_log_save_frequency: 30 + text_log_save_freq: 30 endpoints: - task: question_answering ... diff --git a/src/deepsparse/server/__init__.py b/src/deepsparse/server/__init__.py index 3ff5d0fb10..4e63b031c7 100644 --- a/src/deepsparse/server/__init__.py +++ b/src/deepsparse/server/__init__.py @@ -19,9 +19,4 @@ the DeepSparse Engine. """ -from deepsparse.analytics import deepsparse_analytics as _analytics - from .cli import main - - -_analytics.send_event("python__server__init") diff --git a/src/deepsparse/server/cli.py b/src/deepsparse/server/cli.py index 1b323e28e3..82aed430b0 100644 --- a/src/deepsparse/server/cli.py +++ b/src/deepsparse/server/cli.py @@ -199,7 +199,7 @@ def main( prometheus: port: 6100 text_log_save_dir: /home/deepsparse-server/prometheus - text_log_save_frequency: 30 + text_log_save_freq: 30 endpoints: - task: question_answering ... 
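Since `model_debug_analysis` now compiles the model with `compile_model()` and then calls the new `Engine.analyze()` shown earlier in this diff, a short usage sketch may be helpful; the ONNX path and input shape are placeholders for illustration:

```python
import numpy
from deepsparse import compile_model

# Compile once, then analyze per-layer performance directly on the engine.
# "path/to/model.onnx" and the 1x3x224x224 input are placeholders.
engine = compile_model(model="path/to/model.onnx", batch_size=1)

inputs = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
analysis = engine.analyze(
    inputs,
    num_iterations=20,
    num_warmup_iterations=5,
    optimization_level=1,
)
# `analysis` holds the per-layer performance details described in the analyze() docstring.
```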
diff --git a/src/deepsparse/transformers/__init__.py b/src/deepsparse/transformers/__init__.py index 3e084cece9..b0f7ca22f6 100644 --- a/src/deepsparse/transformers/__init__.py +++ b/src/deepsparse/transformers/__init__.py @@ -22,10 +22,6 @@ import logging as _logging import pkg_resources -from deepsparse.analytics import deepsparse_analytics as _analytics - - -_analytics.send_event("python__transformers__init") _EXPECTED_VERSION = "4.23.1" diff --git a/src/deepsparse/utils/onnx.py b/src/deepsparse/utils/onnx.py index c571e62850..aec78e885a 100644 --- a/src/deepsparse/utils/onnx.py +++ b/src/deepsparse/utils/onnx.py @@ -21,7 +21,6 @@ import numpy import onnx -from onnx.mapping import TENSOR_TYPE_TO_NP_TYPE from deepsparse.utils.extractor import Extractor @@ -36,6 +35,7 @@ sparsezoo_import_error = sparsezoo_err __all__ = [ + "ONNX_TENSOR_TYPE_MAP", "model_to_path", "get_external_inputs", "get_external_outputs", @@ -50,6 +50,23 @@ _LOGGER = logging.getLogger(__name__) +ONNX_TENSOR_TYPE_MAP = { + 1: numpy.float32, + 2: numpy.uint8, + 3: numpy.int8, + 4: numpy.uint16, + 5: numpy.int16, + 6: numpy.int32, + 7: numpy.int64, + 9: numpy.bool_, + 10: numpy.float16, + 11: numpy.float64, + 12: numpy.uint32, + 13: numpy.uint64, + 14: numpy.complex64, + 15: numpy.complex128, +} + def save_onnx(model: Model, model_path: str, external_data_file: str) -> bool: """ @@ -98,9 +115,9 @@ def translate_onnx_type_to_numpy(tensor_type: int): :param tensor_type: Integer representing a type in ONNX spec :return: Corresponding numpy type """ - if tensor_type not in TENSOR_TYPE_TO_NP_TYPE: + if tensor_type not in ONNX_TENSOR_TYPE_MAP: raise Exception("Unknown ONNX tensor type = {}".format(tensor_type)) - return TENSOR_TYPE_TO_NP_TYPE[tensor_type] + return ONNX_TENSOR_TYPE_MAP[tensor_type] def model_to_path(model: Union[str, Model, File]) -> str: diff --git a/src/deepsparse/yolact/__init__.py b/src/deepsparse/yolact/__init__.py index bee4474d74..86aaaa5de4 100644 --- a/src/deepsparse/yolact/__init__.py +++ b/src/deepsparse/yolact/__init__.py @@ -18,10 +18,6 @@ import warnings from collections import namedtuple -from deepsparse.analytics import deepsparse_analytics as _analytics - - -_analytics.send_event("python__yolact__init") _LOGGER = _logging.getLogger(__name__) _Dependency = namedtuple("_Dependency", ["name", "version", "necessary", "import_name"]) diff --git a/src/deepsparse/yolo/__init__.py b/src/deepsparse/yolo/__init__.py index 28a2af36d0..135b18a839 100644 --- a/src/deepsparse/yolo/__init__.py +++ b/src/deepsparse/yolo/__init__.py @@ -14,11 +14,6 @@ # flake8: noqa -from deepsparse.analytics import deepsparse_analytics as _analytics - from .annotate import * from .pipelines import * from .schemas import * - - -_analytics.send_event("python__yolov5__init") diff --git a/src/deepsparse/yolo/pipelines.py b/src/deepsparse/yolo/pipelines.py index 935fc9a1d4..c3866433f3 100644 --- a/src/deepsparse/yolo/pipelines.py +++ b/src/deepsparse/yolo/pipelines.py @@ -163,12 +163,6 @@ class properties into an inference ready onnx file to be compiled by the model_path = model_to_path(self.model_path) if self._image_size is None: self._image_size = get_onnx_expected_image_shape(onnx.load(model_path)) - if self._image_size == (0, 0): - raise ValueError( - "The model does not have a static image size shape. " - "Specify the expected image size by passing the" - "`image_size` argument to the pipeline." 
- ) else: # override model input shape to given image size if isinstance(self._image_size, int): diff --git a/src/deepsparse/yolo/utils/utils.py b/src/deepsparse/yolo/utils/utils.py index baa4c18721..07e7b87ec2 100644 --- a/src/deepsparse/yolo/utils/utils.py +++ b/src/deepsparse/yolo/utils/utils.py @@ -359,8 +359,6 @@ def modify_yolo_onnx_input_shape( model_input = model.graph.input[0] initial_x, initial_y = get_onnx_expected_image_shape(model) - if initial_x == initial_y == 0: - initial_x, initial_y = image_shape if not (isinstance(initial_x, int) and isinstance(initial_y, int)): return model_path, None # model graph does not have static integer input shape diff --git a/src/deepsparse/yolov8/__init__.py b/src/deepsparse/yolov8/__init__.py index a55c36903d..9efc49cd88 100644 --- a/src/deepsparse/yolov8/__init__.py +++ b/src/deepsparse/yolov8/__init__.py @@ -14,13 +14,8 @@ # flake8: noqa -from deepsparse.analytics import deepsparse_analytics as _analytics - from .annotate import * from .pipelines import * from .schemas import * from .utils import * from .validation import * - - -_analytics.send_event("python__yolov8__init") diff --git a/src/deepsparse/yolov8/utils/validation/helpers.py b/src/deepsparse/yolov8/utils/validation/helpers.py index a951ae8f5d..0db35c462d 100644 --- a/src/deepsparse/yolov8/utils/validation/helpers.py +++ b/src/deepsparse/yolov8/utils/validation/helpers.py @@ -12,53 +12,16 @@ # See the License for the specific language governing permissions and # limitations under the License. import argparse -import glob import os import warnings from typing import List, Optional, Union -import yaml - import torch from deepsparse.yolo import YOLOOutput as YOLODetOutput from deepsparse.yolov8.schemas import YOLOSegOutput -from ultralytics.yolo.data.utils import ROOT - - -__all__ = ["data_from_dataset_path", "schema_to_tensor", "check_coco128_segmentation"] - - -def data_from_dataset_path(data: str, dataset_path: str) -> str: - """ - Given a dataset name, fetch the yaml config for the dataset - from the Ultralytics dataset repo, overwrite its 'path' - attribute (dataset root dir) to point to the `dataset_path` - and finally save it to the current working directory. - This allows to create load data yaml config files that point - to the arbitrary directories on the disk. - - :param data: name of the dataset (e.g. 
"coco.yaml") - :param dataset_path: path to the dataset directory - :return: a path to the new yaml config file - (saved in the current working directory) - """ - ultralytics_dataset_path = glob.glob(os.path.join(ROOT, "**", data), recursive=True) - if len(ultralytics_dataset_path) != 1: - raise ValueError( - "Expected to find a single path to the " - f"dataset yaml file: {data}, but found {ultralytics_dataset_path}" - ) - ultralytics_dataset_path = ultralytics_dataset_path[0] - with open(ultralytics_dataset_path, "r") as f: - yaml_config = yaml.safe_load(f) - yaml_config["path"] = dataset_path - yaml_save_path = os.path.join(os.getcwd(), data) - # save the new dataset yaml file - with open(yaml_save_path, "w") as outfile: - yaml.dump(yaml_config, outfile, default_flow_style=False) - return yaml_save_path +__all__ = ["schema_to_tensor", "check_coco128_segmentation"] def schema_to_tensor( diff --git a/src/deepsparse/yolov8/validation.py b/src/deepsparse/yolov8/validation.py index 7412b4975c..cc8fd1fbaa 100644 --- a/src/deepsparse/yolov8/validation.py +++ b/src/deepsparse/yolov8/validation.py @@ -12,8 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -from typing import Optional - import click from deepsparse import Pipeline @@ -22,7 +20,6 @@ DeepSparseDetectionValidator, DeepSparseSegmentationValidator, check_coco128_segmentation, - data_from_dataset_path, ) from ultralytics.yolo.cfg import get_cfg from ultralytics.yolo.utils import DEFAULT_CFG @@ -66,6 +63,16 @@ show_default=True, help="Validation batch size", ) +@click.option( + "--stride", + type=int, + default=32, + show_default=True, + help="YOLOv8 can handle arbitrary sized images as long as " + "both sides are a multiple of 32. This is because the " + "maximum stride of the backbone is 32 and it is a fully " + "convolutional network.", +) @click.option( "--engine-type", default=DEEPSPARSE_ENGINE, @@ -88,21 +95,15 @@ show_default=True, help="A subtask of YOLOv8 to run. Default is `detection`.", ) -@click.option( - "--dataset-path", - type=str, - default=None, - help="Path to override default dataset path.", -) def main( dataset_yaml: str, model_path: str, batch_size: int, num_cores: int, engine_type: str, + stride: int, device: str, subtask: str, - dataset_path: Optional[str], ): pipeline = Pipeline.create( @@ -123,8 +124,6 @@ def main( f"Dataset yaml {dataset_yaml} is not supported. " f"Supported dataset configs are {SUPPORTED_DATASET_CONFIGS})" ) - if dataset_path is not None: - args.data = data_from_dataset_path(args.data, dataset_path) classes = {label: class_ for (label, class_) in enumerate(COCO_CLASSES)} if subtask == "detection":