Merge branch 'MaximIntegratedAI:develop' into voyager4

asyatrhl · Feb 5, 2024 · 317b3d2
2 parents 4b1f8f1 + 4f664d7
commit 317b3d2

Showing 7 changed files with 33 additions and 28 deletions.
diff --git a/.gitignore b/.gitignore
@@ -2,9 +2,10 @@
 *.mem
 *.prof
 /.mypy_cache/
+/.venv/
 /.vscode/
 **/build/
-/data/
+/data
 /latest_log_dir
 /latest_log_file
 /logs/

diff --git a/docs/FacialRecognitionSystem.md b/docs/FacialRecognitionSystem.md
@@ -1,16 +1,16 @@
 # Facial Recognition System
 
-This document aims to explain facial recognition applications for MAX7800x series microcontrollers. Facial recognition task consists from three main parts, face detection, face identification and dot product.
+This document aims to explain facial recognition applications for MAX7800x series microcontrollers. The facial recognition task consists of three main parts: face detection, face identification and dot product:
 
-    - The face detection model detects faces in the captured image and extracts a rectangular sub-image containing only one face.
-    - The face Identification model identifies a person from their facial images by generating the embedding for a given face image.
-    - The dot product layer outputs the dot product representing the similarity between the embedding from the given image and embeddings in the database.
+- The face detection model detects faces in the captured image and extracts a rectangular sub-image containing only one face.
+- The face Identification model identifies a person from their facial images by generating the embedding for a given face image.
+- The dot product layer outputs the dot product representing the similarity between the embedding from the given image and embeddings in the database.
 
 Figure 1 depicts the facial recognition system sequential diagram.
 
 <img src="facialrecognition.png" style="zoom: 50%;" />
 
-Figure 1. MAX7800x facial recognition system
+Figure 1. MAX7800x facial recognition system
 
 ## Dataset
 
@@ -28,28 +28,32 @@ FaceID and Face Detection tasks share the same ground truth pickle, and it will
 
 ## Face Detection
 
-To be able to localize faces in a facial recognition system, a face detection algorithm is generally used in facial recognition systems. Face detection is an object detection problem that has various solutions in the literature. In this work, a face detection algorithm that will run on MAX7800x series microcontrollers with a real-time performance was targeted.
+To be able to localize faces in a facial recognition system, a face detection algorithm is generally used in facial recognition systems. Face detection is an object detection problem that has various solutions in the literature. In this work, a face detection algorithm that will run on MAX7800x series microcontrollers with real-time performance was targeted.
 
-For digit detection problem, previously, a Tiny SSD[2] based MAX7800x object detection algorithm was developed , named Tinier SSD. The face detection model is a modified version of the digit detection model. The modification was realized to reduce the number of parameters and enable larger input size.
+For the digit detection problem, previously, a TinySSD[2] based MAX7800x object detection algorithm was developed, named TinierSSD. The face detection model is a modified version of the digit detection model. The modification reduces the number of parameters and enables larger input sizes.
 
-To train the facedetection model, "scripts/train_facedet_tinierssd.sh" script can be used.
+To train the face detection model, `scripts/train_facedet_tinierssd.sh` can be used.
 
 ## FaceID
 
 To train a FaceID model for MAX7800x microcontrollers, there are multiple steps. As the MAX7800x FaceID models will be trained in a knowledge distillation fashion, the first step will be downloading a backbone checkpoint for the teacher model.
 
-The suggested teacher model is IR-152, but the other teacher models defined in "model_irse_drl.py" may be used as well. Please review the terms and conditions at face.evoLVe[3] repository, and download the checkpoint according to your teacher model selection. Then, the checkpoint should be placed in a folder (e.g. "pretrained") in the root directory of the repository. By using the "--backbone-checkpoint" argument, the path to the checkpoint should be given to the training script.
+The suggested teacher model is IR-152, but the other teacher models defined in `model_irse_drl.py` may be used as well. Please review the terms and conditions at face.evoLVe[3] repository, and download the checkpoint according to your teacher model selection.
 
-There are two FaceID models, one for the MAX78000 and one for the MAX78002. The MAX78000 one is named faceid_112, and it is a relatively lightweight model. To enable more performance on MAX78002, a more complex model was developed, which is named mobilefacenet_112. To train the FaceID models, "scripts/train_faceid_112.sh" and "scripts/train_mobilefacenet_112.sh" scripts can be used, respectivey. Training scripts will realize Dimensionality Reduction and Relation Based-Knowledge Knowledge Distillation steps automatically. A summary of Dimensionality Reduction and Relation-Based Knowledge Distillation can be found in the following sub-sections.
+By default, both `scripts/train_faceid_112.sh` and `scripts/train_mobilefacenet_112.sh` use `Backbone_IR_152_Epoch_112_Batch_2547328_Time_2019-07-13-02-59_checkpoint.pth` which needs to be placed in the folder `pretrained` in the root directory of the repository. This checkpoint can be found via the [Model Zoo](https://github.com/ZhaoJ9014/face.evoLVe?tab=readme-ov-file#Model-Zoo) section of the face.evoLVe repository under “IR-152”.
+
+There are two FaceID models, one for the MAX78000 and one for the MAX78002. The MAX78000 one is named `faceid_112`, and it is a relatively lightweight model. To enable more performance on MAX78002, a more complex model was developed, which is named `mobilefacenet_112`. To train the FaceID models, `scripts/train_faceid_112.sh` and `scripts/train_mobilefacenet_112.sh` scripts can be used, respectively. By using the `--backbone-checkpoint` argument, the path to the checkpoint can be changed.
+
+The training scripts will run the Dimensionality Reduction and Relation Based-Knowledge Knowledge Distillation steps automatically. A summary of Dimensionality Reduction and Relation-Based Knowledge Distillation can be found in the following sub-sections.
 
 ### Dimensionality Reduction on the Teacher Model
 
-Reducing embedding dimensionality can greatly reduce the post-processing operations and memory usage for the facial recognition system. To achieve this, the teacher backbone will be frozen and two additional Conv1d layers will be added to the teacher models. These additions are called dimension reduction layers. For the example in the repository, the length of the embeddings produced by the teacher model is 512 and the optimum length for the student model is found to be 64. Still, other choices like 32, 128 or 256 can be examined for different application areas. A summary of the dimensionality reduction is shown in Figure 2, and dimension reduction layers' details are represented in Table 1.
+Reducing embedding dimensionality can greatly reduce the post-processing operations and memory usage for the facial recognition system. To achieve this, the teacher backbone will be frozen and two additional Conv1d layers will be added to the teacher models. These additions are called dimension reduction layers. For the example in the repository, the length of the embeddings produced by the teacher model is 512 and the optimum length for the student model is found to be 64. Still, other choices like 32, 128 or 256 can be examined for different application areas. A summary of the dimensionality reduction is shown in Figure 2, and dimension reduction layer details are shown in Table 1.
 
 
 <img src="dimensionreductionlayers.png" style="zoom: 30%;" />
 
-Figure 2. Dimensionality Reduction
+Figure 2. Dimensionality Reduction
 
 
 
@@ -63,29 +67,29 @@ Table 1. Dimension Reduction Layers
 
 
 
-To train dimensionality reduction layers Sub-Center ArcFace loss is used. The SubCenterArcFace Loss was presented in the [4], summary of the training framework can be seen in Figure 3. The loss function uses cosine similarity as the distance metric, and in the framework embedding network is trained as a part of the classification problem. The Normalized Sub-Centers(also known as the prototypes) must be learned from zero as no model is available to extract embeddings at the beginning.
+To train dimensionality reduction layers, the Sub-Center ArcFace loss is used. The SubCenterArcFace Loss was presented in the [4], and a summary of the training framework can be seen in Figure 3. This loss function uses cosine similarity as the distance metric, and in the framework embedding network is trained as a part of the classification problem. The Normalized Sub-Centers (also known as the prototypes) must be learned from zero as no model is available to extract embeddings at the beginning.
 
 <img src="SubCenterArcFaceLoss.png" style="zoom: 100%;" />
 
-Figure 3. Sub-Center ArcFace Loss[4]
+Figure 3. Sub-Center ArcFace Loss[4]
 
 ### Relation-Based Knowledge Distillation
 
-The knowledge distillation choice for the FaceID models was a relation-based one. The distillation loss was calculated as the MSE between teacher model and student model.
+The knowledge distillation choice for the FaceID models is relation-based. The distillation loss is calculated as the MSE between teacher model and student model.
 
-To train the student FaceID models, no student loss was used, so student weight was set to 0.
+To train the student FaceID models, no student loss is used, so the student weight is set to 0.
 
-From Figure 4, a visual can be seen for the relation-based knowledge distillation.
+Figure 4 visually represents the relation-based knowledge distillation.
 
 <img src="RelationBasedKD.png" style="zoom: 100%;" />
 
-Figure 4. Relation-Based Knowledge Distillation[5]
+Figure 4. Relation-Based Knowledge Distillation[5]
 
 
 
 ## Dot Product Layer
 
-The dot product layer weights will be populated with the embeddings that are generated by MAX7800x FaceID models. Outputs of the FaceID models are normalized at both inference and recording. Therefore, the result of the dot product layer equals cosine similarity. Using the cosine similarity as a distance metric, the image is identified as either one of the known subjects or 'Unknown' depending on the embedding distances. To record new people in the database, there are two options. The first one is using the Python scripts that are available on the SDK demos. The second option is to use "record on hardware" mode which does not require any external connection. The second option is not available for all platforms, so please check SDK demo ReadMEs to see if it is supported.
+The dot product layer weights will be populated with the embeddings that are generated by MAX7800x FaceID models. The outputs of the FaceID models are normalized at both inference and recording. Therefore, the result of the dot product layer equals cosine similarity. Using the cosine similarity as a distance metric, the image is identified as either one of the known subjects or “Unknown”, depending on the embedding distances. To record new people in the database, there are two options. The first one is using the Python scripts that are available with the SDK demos. The second option is to use a “record on hardware” mode which does not require any external connection. The second option is not available for all platforms, therefore please check the SDK demo README files.
 
 
 

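Note on the Dot Product Layer paragraph in `docs/FacialRecognitionSystem.md`: the text states that, because the FaceID outputs are normalized, the dot product equals cosine similarity, and that a face is labeled as a known subject or "Unknown" from that score. The following is a minimal pure-Python sketch of that idea, not the SDK implementation. The helper names, the toy 4-element embeddings, and the `threshold` value are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Reference cosine similarity computed without pre-normalizing."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def identify(query, database, threshold):
    """Return (name, score) of the best match, or ("Unknown", score).

    `database` maps names to already-normalized embeddings, mirroring
    the dot product layer weights; `threshold` is an assumed cutoff.
    """
    q = l2_normalize(query)
    best_name, best_score = None, -1.0
    for name, emb in database.items():
        score = dot(q, emb)  # equals cosine similarity for unit vectors
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return "Unknown", best_score
    return best_name, best_score


# Toy database with two recorded subjects (illustrative values only).
database = {
    "alice": l2_normalize([1.0, 0.0, 0.0, 0.0]),
    "bob": l2_normalize([0.0, 1.0, 0.0, 0.0]),
}

print(identify([0.9, 0.1, 0.0, 0.0], database, threshold=0.8))
print(identify([0.0, 0.0, 1.0, 0.0], database, threshold=0.8))
```

Storing normalized embeddings means the per-query work is a single dot product per recorded subject, with no division at comparison time.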
diff --git a/models/model_irse_drl.py b/models/model_irse_drl.py
@@ -307,11 +307,11 @@ def create_model(input_size=(112, 112),  # pylint: disable=unused-argument
             model.load_state_dict(torch.load(backbone_checkpoint,
                                              map_location=torch.device('cpu')))
         except FileNotFoundError:
-            print('Backbone checkpoint was not found. Please follow the '
-                  'instructions in the docs/FacialRecognitionSystem.md file, '
-                  'FaceID section to download the backbone checkpoint.',
+            print(f'Backbone checkpoint {backbone_checkpoint} not found. Please follow the '
+                  'instructions in docs/FacialRecognitionSystem.md, section ## FaceID, '
+                  'to download the backbone checkpoint.',
                   file=sys.stderr)
-            sys.exit()
+            sys.exit(1)
     for param in model.parameters():
         param.requires_grad = False
     drl = DRL(dimensionality)

diff --git a/scripts/evaluate_faceid_112.sh b/scripts/evaluate_faceid_112.sh
@@ -1,2 +1,2 @@
 #!/bin/sh
-python train.py --model ai85faceidnet_112 --dataset VGGFace2_FaceID --kd-student-wt 0 --kd-distill-wt 1  --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --evaluate --device MAX78000 --exp-load-weights-from ../ai8x-synthesis/trained/ai85-faceid_112-qat-q.pth.tar -8 --use-bias --save-sample 10 --slice-sample "$@"
+python train.py --model ai85faceidnet_112 --dataset VGGFace2_FaceID --kd-student-wt 0 --kd-distill-wt 1  --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --evaluate --device MAX78000 --exp-load-weights-from ../ai8x-synthesis/trained/ai85-faceid_112-qat-q.pth.tar -8 --use-bias --save-sample 10 --slice-sample "$@"
diff --git a/scripts/evaluate_mobilefacenet_112.sh b/scripts/evaluate_mobilefacenet_112.sh
@@ -1,2 +1,2 @@
 #!/bin/sh
-python train.py --model ai87netmobilefacenet_112 --dataset VGGFace2_FaceID --kd-student-wt 0 --kd-distill-wt 1  --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --evaluate --device MAX78002 --exp-load-weights-from ../ai8x-synthesis/trained/ai87-mobilefacenet_112_qat_best-q.pth.tar -8 --use-bias --save-sample 10 --slice-sample "$@"
+python train.py --model ai87netmobilefacenet_112 --dataset VGGFace2_FaceID --kd-student-wt 0 --kd-distill-wt 1  --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --evaluate --device MAX78002 --exp-load-weights-from ../ai8x-synthesis/trained/ai87-mobilefacenet_112_qat_best-q.pth.tar -8 --use-bias --save-sample 10 --slice-sample "$@"
diff --git a/scripts/train_faceid_112.sh b/scripts/train_faceid_112.sh
@@ -1,3 +1,3 @@
 #!/bin/sh
-python train.py --epochs 4 --optimizer Adam --lr 0.001 --scaf-lr 1e-2 --scaf-scale 32 --copy-output-folder pretrained/ir152_dim64 --wd 5e-4 --deterministic --workers 8 --qat-policy None  --model ir_152 --dr 64 --backbone-checkpoint pretrained/Backbone_IR_152_Epoch_112_Batch_2547328_Time_2019-07-13-02-59_checkpoint.pth --use-bias --dataset VGGFace2_FaceID_dr --batch-size 64 --device MAX78000 --validation-split 0 --print-freq 250 "$@"
+python train.py --epochs 4 --optimizer Adam --lr 0.001 --scaf-lr 1e-2 --scaf-scale 32 --copy-output-folder pretrained/ir152_dim64 --wd 5e-4 --deterministic --workers 8 --qat-policy None  --model ir_152 --dr 64 --backbone-checkpoint pretrained/Backbone_IR_152_Epoch_112_Batch_2547328_Time_2019-07-13-02-59_checkpoint.pth --use-bias --dataset VGGFace2_FaceID_dr --batch-size 64 --device MAX78000 --validation-split 0 --print-freq 250 "$@" || exit 1
 python train.py --epochs 80 --optimizer Adam --lr 0.001 --compress policies/schedule-faceid_112.yaml --kd-student-wt 0 --kd-distill-wt 1 --qat-policy policies/qat_policy_faceid_112.yaml --model ai85faceidnet_112 --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --wd 0 --deterministic --workers 8 --use-bias --dataset VGGFace2_FaceID --batch-size 256 --device MAX78000 --print-freq 100 --validation-split 0 "$@"
diff --git a/scripts/train_mobilefacenet_112.sh b/scripts/train_mobilefacenet_112.sh
@@ -1,3 +1,3 @@
 #!/bin/sh
-python train.py --epochs 4 --optimizer Adam --lr 0.001 --scaf-lr 1e-2 --scaf-scale 32 --copy-output-folder pretrained/ir152_dim64 --wd 5e-4 --deterministic --workers 8 --qat-policy None  --model ir_152 --dr 64 --backbone-checkpoint pretrained/Backbone_IR_152_Epoch_112_Batch_2547328_Time_2019-07-13-02-59_checkpoint.pth --use-bias --dataset VGGFace2_FaceID_dr --batch-size 64 --device MAX78000 --validation-split 0 --print-freq 250 "$@"
+python train.py --epochs 4 --optimizer Adam --lr 0.001 --scaf-lr 1e-2 --scaf-scale 32 --copy-output-folder pretrained/ir152_dim64 --wd 5e-4 --deterministic --workers 8 --qat-policy None  --model ir_152 --dr 64 --backbone-checkpoint pretrained/Backbone_IR_152_Epoch_112_Batch_2547328_Time_2019-07-13-02-59_checkpoint.pth --use-bias --dataset VGGFace2_FaceID_dr --batch-size 64 --device MAX78000 --validation-split 0 --print-freq 250 "$@" || exit 1
 python train.py --epochs 35 --optimizer Adam --lr 0.001 --compress policies/schedule-mobilefacenet_112.yaml --kd-student-wt 0 --kd-distill-wt 1 --qat-policy policies/qat_policy_mobilefacenet_112.yaml --model ai87netmobilefacenet_112 --kd-teacher ir_152 --kd-resume pretrained/ir152_dim64/best.pth.tar --kd-relationbased --wd 0 --deterministic --workers 8 --use-bias --dataset VGGFace2_FaceID --batch-size 100 --device MAX78002 --validation-split 0 --print-freq 100 "$@"
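
Note on `--kd-student-wt 0 --kd-distill-wt 1` in the second training stage: the documentation in this commit says the distillation loss is the MSE between teacher and student, and that no student loss is used. The following is a minimal sketch of how such a weighted combination behaves. It assumes the total loss is a simple weighted sum and compares two plain vectors; the exact quantities the repository compares for the relation-based loss are not shown in this diff, and the function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(student_out, teacher_out, student_loss,
                  student_wt=0.0, distill_wt=1.0):
    """Weighted sum of a student loss and an MSE distillation loss.

    The default weights mirror --kd-student-wt 0 --kd-distill-wt 1;
    the weighted-sum form itself is an assumption of this sketch.
    """
    distill_loss = mse(student_out, teacher_out)
    return student_wt * student_loss + distill_wt * distill_loss


# Toy outputs (illustrative values only).
student_out = [1.0, 2.0, 3.0]
teacher_out = [1.0, 2.0, 5.0]

# With a student weight of 0, the student loss value has no effect.
total_a = combined_loss(student_out, teacher_out, student_loss=10.0)
total_b = combined_loss(student_out, teacher_out, student_loss=0.0)
print(total_a, total_b)
```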