GreenWaves-Technologies/visual_wake_words

Visual Wake Words for GAP9

This project runs the visual wake words task on the GAP9 chip. The NNs used for this task are taken from open source projects.

The model can be chosen via Kconfig. NNTool takes the corresponding model path and generates the Autotiler code using the nntool_generate_model.py script. Once the Autotiler model is generated, it is compiled and run to generate the final NN GAP code.

The same script can be used to test the deployable model by executing inference in NNTool on the provided image.

The application has two operating modes, selectable via Kconfig and described below:

INFERENCE:

In this mode the application simply runs the NN on samples read from files. It can emulate the DEMO mode by enabling INFERENCE_RESIZER in Kconfig: the input image is then resized from the CAMERA size, emulating images coming from a similar camera. Otherwise no resize is applied and the images from files are expected to already have the correct size.

You can check the expected results using the test_nntool_inference target: it takes the same image used by the C code on GAP and runs NNTool inference on it using nntool_generate_model.py --mode=test.

DEMO:

NOTE: Only usable on a board with the OV5647 camera.

In this mode the application runs on GAP9 taking inputs from the camera and running inference with the selected NN.

Accuracy

To test the accuracy of the models you can use the scripts in the accuracy folder:

  1. Download the COCO dataset using download_coco.sh.
  2. Create the VWW annotations from the COCO dataset with the visualwakewords package using create_vww_dataset.sh (it automatically clones the repo and creates the annotations).
  3. Run the accuracy script test_accuracy.py. By default it runs the original tflite model; if you provide the --test_nntool flag, it runs the model in NNTool instead.
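Step 2 turns COCO object annotations into a binary person / no-person label per image. The following is a minimal sketch of that labelling rule as defined for the Visual Wake Words dataset, not the code run by create_vww_dataset.sh. The function name `vww_label`, the 0.5% area threshold, and the use of the bounding-box area (rather than the annotation's own area field) are assumptions made for illustration; the script in this repository may use different values.

```python
from typing import Dict, List

# Assumed defaults, following the Visual Wake Words dataset definition;
# create_vww_dataset.sh may pass different values.
PERSON_CATEGORY_ID = 1     # "person" in the COCO category list
MIN_AREA_FRACTION = 0.005  # box must cover more than 0.5% of the image


def vww_label(image: Dict, annotations: List[Dict],
              category_id: int = PERSON_CATEGORY_ID,
              min_area_fraction: float = MIN_AREA_FRACTION) -> int:
    """Return 1 if any box of the target category is large enough, else 0."""
    image_area = image["width"] * image["height"]
    for ann in annotations:
        if ann["category_id"] != category_id:
            continue
        # COCO bounding boxes are stored as [x, y, width, height].
        _, _, box_w, box_h = ann["bbox"]
        if (box_w * box_h) / image_area > min_area_fraction:
            return 1
    return 0
```

For a 640x480 image, a 100x100 person box yields label 1, while a 10x10 person box is treated as too small and yields label 0.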

NOTE: The accuracies below were calculated with the scripts above. The image preprocessing is just a bilinear resize of the original COCO image, without cropping. Do not compare these numbers with publicly available accuracy metrics, since the image preprocessing might differ.
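To make "bilinear resize without cropping" concrete, here is a small pure-Python sketch on a grayscale image stored as nested lists. It is illustrative only and is not the code used by test_accuracy.py. The function name `bilinear_resize` and the half-pixel-centre sampling convention are assumptions; image libraries differ in how they align pixels, which is one reason accuracy figures can differ between pipelines.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def bilinear_resize(src: Image, out_h: int, out_w: int) -> Image:
    """Resize src to (out_h, out_w) with bilinear interpolation, no cropping.

    The whole source image is squeezed or stretched into the target size,
    so the aspect ratio is not preserved. Sampling uses half-pixel centres
    (an assumption; other implementations may align pixels differently).
    """
    in_h, in_w = len(src), len(src[0])
    scale_y, scale_x = in_h / out_h, in_w / out_w
    out: Image = []
    for oy in range(out_h):
        # Map the output pixel centre back into source coordinates.
        sy = min(max((oy + 0.5) * scale_y - 0.5, 0.0), in_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for ox in range(out_w):
            sx = min(max((ox + 0.5) * scale_x - 0.5, 0.0), in_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Because nothing is cropped, a wide input resized to a square output is simply squashed horizontally; every source region still contributes to the result.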

| Model | TFLite Acc | NNTool Acc |
|---|---|---|
| visual_wake_quant.tflite | 89.45% (0: 97.13%, 1: 82.75%) | 88.57% (0: 97.71%, 1: 80.59%) |
| vww_96_int8.tflite | 79.56% (0: 94.04%, 1: 66.91%) | 79.41% (0: 94.01%, 1: 66.71%) |
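The figures in parentheses read as per-class accuracies for labels 0 and 1; that interpretation is an assumption, since it is not spelled out here. Under it, a sketch of how an overall and a per-class accuracy could be computed from ground-truth labels and predictions is shown below. The function name `accuracy_report` is hypothetical and this is not the code in test_accuracy.py.

```python
from typing import Dict, Sequence, Tuple


def accuracy_report(labels: Sequence[int],
                    preds: Sequence[int]) -> Tuple[float, Dict[int, float]]:
    """Return (overall accuracy, per-class accuracy), both as fractions."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    total: Dict[int, int] = {}
    correct: Dict[int, int] = {}
    for y, p in zip(labels, preds):
        # Count samples and hits per ground-truth class.
        total[y] = total.get(y, 0) + 1
        correct[y] = correct.get(y, 0) + (1 if y == p else 0)
    overall = sum(correct.values()) / len(labels)
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    return overall, per_class
```

With this definition the overall accuracy is weighted by how many images each class has, so it need not equal the plain mean of the two per-class values.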