QATv2 updates, minor bug fixes #331

Merged: 12 commits, Oct 30, 2024
4 changes: 2 additions & 2 deletions .github/linters/.python-lint
@@ -7,11 +7,11 @@ ignored-classes = ModelProto
max-line-length = 99
[DESIGN]
max-locals=100
-max-statements=350
+max-statements=360
min-public-methods=1
max-branches=130
max-module-lines=5000
max-args=20
max-returns=10
-max-attributes=25
+max-attributes=30
max-nested-blocks=10
21 changes: 15 additions & 6 deletions README.md
@@ -1,6 +1,6 @@
# ADI MAX78000/MAX78002 Model Training and Synthesis

-July 22, 2024
+August 27, 2024

**Note: This branch requires PyTorch 2. Please see the archive-1.8 branch for PyTorch 1.8 support. [KNOWN_ISSUES](KNOWN_ISSUES.txt) contains a list of known issues.**

@@ -1620,13 +1620,15 @@ When using the `-8` command line switch, all module outputs are quantized to 8-b

The last layer can optionally use 32-bit output for increased precision. This is simulated by adding the parameter `wide=True` to the module function call.

-##### Weights: Quantization-Aware Training (QAT)
+##### Weights and Activations: Quantization-Aware Training (QAT)

Quantization-aware training (QAT) is enabled by default. QAT is controlled by a policy file, specified by `--qat-policy`.

-* After `start_epoch` epochs, training will learn an additional parameter that corresponds to a shift of the final sum of products.
+* After `start_epoch` epochs, an intermediate epoch without backpropagation is run to collect activation statistics. Each layer's activation range is then determined from the collected activations, trading off range against resolution. QAT then starts, and an additional parameter (`output_shift`) is learned that shifts activations to compensate for the scaling down of weights and biases.
* `weight_bits` describes the number of bits available for weights.
* `overrides` allows specifying the `weight_bits` on a per-layer basis.
+* `outlier_removal_z_score` defines the z-score threshold used for outlier removal during activation range calculation. (default: 8.0)
+* `shift_quantile` defines the quantile of the parameter distribution used for the `output_shift` parameter. (default: 1.0)

By default, weights are quantized to 8 bits after 30 epochs, as specified in `policies/qat_policy.yaml`. A more refined example that specifies weight sizes for individual layers can be seen in `policies/qat_policy_cifar100.yaml`.
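
For orientation, a minimal policy file might combine the parameters above as follows. This is a sketch only: the values and the layer name `fc` under `overrides` are placeholders, not taken from a shipped policy file.

```yaml
# Illustrative QAT policy sketch; values and the layer name "fc" are placeholders
start_epoch: 30               # floating-point epochs before QAT begins
weight_bits: 8                # default number of bits for weights
outlier_removal_z_score: 8.0  # z-score threshold for activation outlier removal
shift_quantile: 1.0           # quantile of the parameter distribution for output_shift
overrides:                    # per-layer weight_bits
  fc:
    weight_bits: 4
```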

@@ -1745,7 +1747,7 @@ For both approaches, the `quantize.py` software quantizes an existing PyTorch ch

#### Quantization-Aware Training (QAT)

-Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights: Quantization-Aware Training (QAT)](#weights-quantization-aware-training-qat). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
+Quantization-aware training is the better-performing approach and is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat)). No additional arguments (other than input, output, and device) are needed for `quantize.py`.

The input checkpoint to `quantize.py` is either `qat_best.pth.tar`, the best QAT epoch’s checkpoint, or `qat_checkpoint.pth.tar`, the final QAT epoch’s checkpoint.
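
A typical invocation could therefore look like the sketch below. The paths are placeholders, and the positional input/output arguments together with the `--device` switch are assumed from the description above rather than copied from a documented example.

```shell
# Placeholder paths; quantize the best QAT checkpoint for a MAX78000 target
python quantize.py proj/qat_best.pth.tar proj/qat_best-q.pth.tar --device MAX78000
```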

@@ -2004,7 +2006,7 @@ The behavior of a training session might change when Quantization Aware Training
While there can be multiple reasons for this, check two important settings that can influence the training behavior:

* The initial learning rate may be set too high. Reduce LR by a factor of 10 or 100 by specifying a smaller initial `--lr` on the command line, and possibly by reducing the epoch `milestones` for further reduction of the learning rate in the scheduler file specified by `--compress`. Note that the selected optimizer and the batch size both affect the learning rate.
-* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights: Quantization-Aware Training (QAT)](#weights:-auantization-aware-training \(qat\)).*
+* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT policy file specified by `--qat-policy`, and increase the total number of training epochs by raising the value of the `--epochs` command line argument and by editing `ending_epoch` in the scheduler file specified by `--compress` (see the schedule sketch below). *See also the rule of thumb discussed in the section [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat).*
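
The sketch below shows where `milestones` and `ending_epoch` typically live in a schedule file. Everything else in it (the `lr_schedulers`/`policies` structure, the `MultiStepLR` class, and all numeric values) is an assumption loosely modeled on the schedule files shipped in `policies/`, not an exact copy.

```yaml
# Illustrative schedule sketch; only milestones and ending_epoch are named in the text above
lr_schedulers:
  training_lr:
    class: MultiStepLR
    milestones: [100, 150, 180]   # epochs at which the learning rate is reduced
    gamma: 0.2                    # multiplier applied to the LR at each milestone

policies:
  - lr_scheduler:
      instance_name: training_lr
    starting_epoch: 0
    ending_epoch: 200             # raise together with --epochs when delaying QAT
    frequency: 1
```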



@@ -2209,6 +2211,7 @@ The following table describes the most important command line arguments for `ai8
| `--no-unload` | Do not create the `cnn_unload()` function | |
| `--no-kat` | Do not generate the `check_output()` function (disable known-answer test) | |
| `--no-deduplicate-weights` | Do not deduplicate weights and bias values | |
+| `--no-scale-output` | Do not use scales from the checkpoint to recover the output range when generating the `cnn_unload()` function | |
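
For illustration, the new switch would simply be appended to a generator invocation such as the sketch below. The script name `ai8xize.py` and every switch other than those in the table above are assumptions for the sake of the example.

```shell
# Hypothetical invocation; only --no-scale-output is taken from the table above
python ai8xize.py --test-dir demos --prefix my-net --device MAX78000 \
    --checkpoint-file trained/qat_best-q.pth.tar --config-file networks/my-net.yaml \
    --no-scale-output
```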

### YAML Network Description

@@ -2330,6 +2333,12 @@ The following keywords are required for each `unload` list item:
`width`: Data width (optional, defaults to 8) — either 8 or 32
`write_gap`: Gap between data words (optional, defaults to 0)

+When `--no-scale-output` is not specified, scales from the checkpoint file are used to recover the output range. If an 8-bit output has a non-zero scale, the output is scaled and kept at 16 bits; if the scale is zero, the output remains 8 bits. 32-bit outputs are always kept at 32 bits.

+Example:

+![Unload Array](docs/unload_example.png)

##### `layers` (Mandatory)

`layers` is a list that defines the per-layer description, as shown below:
@@ -2654,7 +2663,7 @@ Example:
By default, the final layer is used as the output layer. Output layers are checked using the known-answer test, and they are copied from hardware memory when `cnn_unload()` is called. The tool also checks that output layer data isn’t overwritten by any later layers.

When specifying `output: true`, any layer (or a combination of layers) can be used as an output layer.
-*Note:* When `unload:` is used, output layers are not used for generating `cnn_unload()`.
+*Note:* When `--no-unload` is used, output layers are not used for generating `cnn_unload()`.

Example:
`output: true`
Binary file modified README.pdf