4:2 compressors validate

intel · Dec 7, 2024 · d562567
1 parent ffe15af
commit d562567

Showing 43 changed files with 3,080 additions and 1,206 deletions.
diff --git a/.github/workflows/general.yml b/.github/workflows/general.yml
@@ -62,6 +62,9 @@ jobs:
 
       - name: Generate HTML for examples
         run: tool/gh_actions/create_htmls.sh
+
+      - name: Check temporary test files
+        run: tool/gh_actions/check_tmp_test.sh
 
       # https://github.com/devcontainers/ci/blob/main/docs/github-action.md
       - name: Build dev container and run tests in it

diff --git a/.gitignore b/.gitignore
@@ -18,6 +18,7 @@ tmp*
 confapp/.vscode/*
 *tracker.json
 *tracker.log
+devtools_options.yaml
 *.sv
 
 # Exceptions

diff --git a/doc/README.md b/doc/README.md
@@ -65,6 +65,7 @@ Some in-development items will have opened issues, as well. Feel free to create
 - Counters
   - [Summation](./components/summation.md#sum)
   - [Binary counter](./components/summation.md#counter)
+  - [Gated counter](./components/summation.md#gated-counter)
   - Gray counter
 - Pseudorandom
   - LFSR
@@ -73,8 +74,9 @@ Some in-development items will have opened issues, as well. Feel free to create
   - CRC
   - [Parity](./components/parity.md)
   - Interleaving
-- Clocking
+- Gating
   - [Clock gating](./components/clock_gating.md)
+  - [Toggle gating](./components/toggle_gate.md)
 - Data flow
   - Ready/Valid
   - Connect/Disconnect

diff --git a/doc/components/divider.md b/doc/components/divider.md
@@ -1,13 +1,13 @@
 # Divider
 
-ROHD HCL provides an integer divider module to get the dividend of numerator and denominator operands. The divider implementation is not pipelined and has a maximum latency of the bit width of the operands.
+ROHD HCL provides an integer divider module that computes the quotient and remainder of its dividend and divisor operands. The divider implementation is not pipelined and has a minimum latency of 3 cycles. The maximum latency depends on the width of the operands (upper bound of `O(WIDTH**2)`). Note that latency increases exponentially as the absolute difference between the dividend and the divisor increases (worst case: the largest possible dividend with a divisor of 1).
 
 ## Interface
 
 The inputs to the divider module are:
 
 * `clock` => clock for synchronous logic
-* `reset` => reset for synchronous logic (active high)
+* `reset` => reset for synchronous logic (active high, synchronous to `clock`)
 * `dividend` => the numerator operand
 * `divisor` => the denominator operand
 * `isSigned` => should the operands of the division be treated as signed integers
@@ -30,7 +30,7 @@ To initiate a new request, it is expected that the requestor drive `validIn` to
 
 When the division is complete, the module will assert the `validOut` signal along with the numerical values of `quotient` and `remainder` representing the division result and the signal `divZero` to indicate whether or not a division by zero occurred. The module will hold these signal values until `readyOut` is driven high by the integrating environment. The integrating environment must assume that `quotient` and `remainder` are meaningless if `divZero` is asserted.
 
-### Mathematical Properties
+## Mathematical Properties
 
 For the division, implicit rounding towards 0 is always performed. I.e., a negative quotient will always be rounded up if the dividend is not evenly divisible by the divisor. Note that this behavior is not uniform across all programming languages (for example, Python rounds towards negative infinity).
 
@@ -65,3 +65,7 @@ if (divIntf.validOut.value.toBool()) {
 }
 
 ```
+
+## Future Considerations
+
+In the future, an optimization might be added in which the `remainder` output is optional and controlled by a build time constructor parameter. If the remainder does not need to be computed, the implementation's upper bound latency can be significantly improved (`O(WIDTH**2)` => `O(WIDTH)`).
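The round-toward-zero behavior described under Mathematical Properties happens to match Dart's own integer operators, so a plain-Dart reference model for checking results can be sketched as below. The helper name `referenceDivide` is hypothetical and not part of ROHD-HCL. The sketch also assumes that the remainder takes the sign of the dividend, which follows from truncating division but is not stated in the divider documentation.

```dart
/// Hypothetical software reference model for a truncating integer divide.
/// Returns null for division by zero, mirroring the note that `quotient`
/// and `remainder` are meaningless when `divZero` is asserted.
/// Uses a record return type, which requires Dart 3.
({int quotient, int remainder})? referenceDivide(int dividend, int divisor) {
  if (divisor == 0) {
    return null;
  }
  // `~/` truncates toward zero; `remainder` takes the sign of the dividend
  // (unlike `%`, which never returns a negative result in Dart).
  return (
    quotient: dividend ~/ divisor,
    remainder: dividend.remainder(divisor),
  );
}

void main() {
  final result = referenceDivide(-7, 2)!;
  print('${result.quotient} ${result.remainder}'); // prints "-3 -1"
  print(referenceDivide(5, 0)); // prints "null"
}
```

A model like this could be compared against the sampled `quotient` and `remainder` outputs in a testbench; Python's `//` would give `-4` for the same operands, which is the language difference the documentation warns about.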
diff --git a/doc/components/multiplier.md b/doc/components/multiplier.md
@@ -4,7 +4,12 @@ ROHD-HCL provides an abstract `Multiplier` module which multiplies two
 numbers represented as two `Logic`s, potentially of different widths,
 treating them as either signed (2s complement) or unsigned. It
 produces the product as a `Logic` with width equal to the sum of the
-widths of the inputs. As of now, we have the following implementations
+widths of the inputs. The signedness of each operand is either fixed by a parameter
+or selectable at runtime, e.g., `signedMultiplicand` or `selectSignedMultiplicand`.
+The multiplier also outputs a signal indicating whether the result is to be
+treated as signed.
+
+As of now, we have the following implementations
 of this abstract `Module`:
 
 - [Carry Save Multiplier](#carry-save-multiplier)
@@ -13,7 +18,13 @@ of this abstract `Module`:
 An additional kind of abstract module provided is a
 `MultiplyAccumulate` module which multiplies two numbers represented
 as two `Logic`s and adds the result to a third `Logic` with width
-equal to the sum of the widths of the main inputs. We have a
+equal to the sum of the widths of the main inputs. Similar to the `Multiplier`,
+the signedness of each operand is either fixed by a parameter
+or selectable at runtime, e.g., `signedMultiplicand` or `selectSignedMultiplicand`.
+The multiply-accumulate also outputs a signal indicating whether the result is to be
+treated as signed.
+
+We have a
 high-performance implementation:
 
 - [Compression Tree Multiply Accumulate](#compression-tree-multiply-accumulate)
@@ -22,7 +33,7 @@ The compression tree based arithmetic units are built from a set of components f
 
 ## Carry Save Multiplier
 
-Carry save multiplier is a digital circuit used for performing multiplication operations. It
+The carry-save multiplier is a digital circuit used for performing multiplication operations. It
 is particularly useful in applications that require high speed
 multiplication, such as digital signal processing.
 
@@ -31,7 +42,8 @@ The
 module in ROHD-HCL accept input parameters the clock `clk` signal,
 reset `reset` signal, `Logic`s' a and b as the input pin and the name
 of the module `name`. Note that the width of the inputs must be the
-same or `RohdHclException` will be thrown.
+same or `RohdHclException` will be thrown. The output latency is equal to the width of the inputs
+and is given by `latency` on the component.
 
 An example is shown below to multiply two inputs of signals that have 4-bits of width.
 
@@ -82,16 +94,18 @@ digital signal processing.
 The parameters of the
 `CompressionTreeMultiplier` are:
 
-- Two input terms `a` and `b` which can be different widths
-- The radix used for Booth encoding (2, 4, 8, and 16 are currently supported)
-- The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (optional)
-- `signed` parameter: whether the operands should be treated as signed (2s complement) or unsigned
+- Two input terms `a` and `b` which can be different widths.
+- The radix used for Booth encoding (2, 4, 8, and 16 are currently supported).
+- The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (optional).
 - `ppGen` parameter: the type of `PartialProductGenerator` to use which has derived classes for different styles of sign extension. In some cases this adds an extra row to hold a sign bit.
-- An optional `selectSigned` control signal which overrides the `signed` configuration allowing for runtime control of signed or unsigned operation with the same hardware. `signed` must be false if using this control signal.
+- `signedMultiplicand` parameter: whether the multiplicand (first arg) should be treated as signed (2s complement) or unsigned.
+- `signedMultiplier` parameter: whether the multiplier (second arg) should be treated as signed (2s complement) or unsigned.
+- An optional `selectSignedMultiplicand` control signal which overrides the `signedMultiplicand` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplicand` must be false if using this control signal.
+- An optional `selectSignedMultiplier` control signal which overrides the `signedMultiplier` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplier` must be false if using this control signal.
 - An optional `clk`, as well as `enable` and `reset` that are used to add a pipestage in the `ColumnCompressor` to allow for pipelined operation.
 - An optional `use42Compressors` boolean enables the `ColumnCompressor` to use 4:2 compressors in addition to 3:2 (Full Adder) and 2:2 (Half Adder) compressors.
 
-Here is an example of use of the `CompressionTreeMultiplier`:
+Here is an example of using the `CompressionTreeMultiplier` with one signed input:
 
 ```dart
     const widthA = 6;
@@ -104,7 +118,7 @@ Here is an example of use of the `CompressionTreeMultiplier`:
     b.put(3);
 
     final multiplier =
-        CompressionTreeMultiplier(a, b, radix, signed: true);
+        CompressionTreeMultiplier(a, b, radix, signedMultiplicand: true);
 
     final product = multiplier.product;
 
@@ -124,13 +138,17 @@ The parameters of the
 - The accumulate input term `c` which must have width as sum of the two operand widths + 1.
 - The radix used for Booth encoding (2, 4, 8, and 16 are currently supported)
 - The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (default Kogge-Stone).
-- `signed` parameter: whether the operands should be treated as signed (2s complement) or unsigned
 - `ppGen` parameter: the type of `PartialProductGenerator` to use which has derived classes for different styles of sign extension. In some cases this adds an extra row to hold a sign bit (default `PartialProductGeneratorCompactRectSignExtension`).
-- An optional `selectSigned` control signal which overrides the `signed` configuration allowing for runtime control of signed or unsigned operation with the same hardware. `signed` must be false if using this control signal.
+- `signedMultiplicand` parameter: whether the multiplicand (first arg) should be treated as signed (2s complement) or unsigned
+- `signedMultiplier` parameter: whether the multiplier (second arg) should be treated as signed (2s complement) or unsigned
+- `signedAddend` parameter: whether the addend (third arg) should be treated as signed (2s complement) or unsigned
+- An optional `selectSignedMultiplicand` control signal which overrides the `signedMultiplicand` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplicand` must be false if using this control signal.
+- An optional `selectSignedMultiplier` control signal which overrides the `signedMultiplier` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplier` must be false if using this control signal.
+- An optional `selectSignedAddend` control signal which overrides the `signedAddend` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedAddend` must be false if using this control signal.
 - An optional `clk`, as well as `enable` and `reset` that are used to add a pipestage in the `ColumnCompressor` to allow for pipelined operation.
 - An optional `use42Compressors` boolean enables the `ColumnCompressor` to use 4:2 compressors in addition to 3:2 (Full Adder) and 2:2 (Half Adder) compressors.
 
-Here is an example of using the `CompressionTreeMultiplyAccumulate`:
+Here is an example of using the `CompressionTreeMultiplyAccumulate` with all inputs as signed:
 
 ```dart
     const widthA = 6;
@@ -144,7 +162,7 @@ Here is an example of using the `CompressionTreeMultiplyAccumulate`:
     b.put(3);
     c.put(5);
 
-    final multiplier = CompressionTreeMultiplyAccumulate(a, b, c, radix, signed: true);
+    final multiplier = CompressionTreeMultiplyAccumulate(a, b, c, radix, signedMultiplicand: true, signedMultiplier: true, signedAddend: true);
 
     final accumulate = multiplier.accumulate;
     

diff --git a/doc/components/multiplier_components.md b/doc/components/multiplier_components.md
@@ -51,7 +51,7 @@ row  slice  mult
 
 A few things to note: first, that we are negating by 1s complement (so we need a -0) and second, these rows do not add up to (18: 10010). For Booth encoded rows to add up properly, they need to be in 2s complement form, and they need to be sign-extended.
 
- Here is the matrix with crude sign extension (this formatting is available from our `PartialProductGenerator` component). With 2s complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).
+ Here is the matrix with crude (`brute`) sign extension (the table formatting is available from our `PartialProductGenerator` component). With 2s complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).
 
 ```text
             7  6  5  4  3  2  1  0  
@@ -64,7 +64,7 @@ A few things to note: first, that we are negating by 1s complement (so we need a
             0  0  0  1  0  0  1  0  : 00010010 = 18 (18)
  ```
 
- There are more compact ways of doing sign-extension which result in far fewer additions. Here is an example of compact sign-extension:  
+ There are more compact ways of doing sign-extension which result in far fewer additions. Here is an example of `compact` sign-extension, where the last row, which carries only a sign bit, is folded into the previous row:  
 
 ```text
             7  6  5  4  3  2  1  0  
@@ -86,7 +86,7 @@ And of course, with higher radix-encoding, we select more bits at a time from th
             0  0  0  1  0  0  1  0  : 00010010 = 18 (18)
 ```
 
-Note that radix-4 shifts by 2 positions each row, but with only two rows and with sign-extension adding an LSB bit, you only see a shift of 1 in row 1.
+Note that radix-4 shifts by 2 positions each row. With only two rows, and with sign-extension adding an LSB bit to each row, you see a shift of only 1 in row 1; in a larger example, you would see the two-bit shift in the following rows.
 
 ## Partial Product Generator
 
@@ -222,7 +222,7 @@ Finally, we produce the product.
 
 ```dart
     final pp =
-        PartialProductGeneratorCompactRectSignExtension(a, b, RadixEncoder(radix), signed: true);
+        PartialProductGeneratorCompactRectSignExtension(a, b, RadixEncoder(radix), signedMultiplicand: true, signedMultiplier: true);
     final compressor = ColumnCompressor(pp)..compress();
     final adder = ParallelPrefixAdder(
         compressor.extractRow(0), compressor.extractRow(1), BrentKung.new);

diff --git a/doc/components/summation.md b/doc/components/summation.md
@@ -38,3 +38,11 @@ The `Counter` also has a `Counter.simple` constructor which is intended for very
 // A counter which increments by 1 each cycle up to 5, then rolls over.
 Counter.simple(clk: clk, reset: reset, maxValue: 5);
 ```
+
+## Gated Counter
+
+The `GatedCounter` is a version of a `Counter` which contains a number of power-saving features, including clock gating to save on flop power and enable gating to avoid unnecessary combinational toggles.
+
+The `GatedCounter` has a `clkGatePartitionIndex` which determines a dividing line for the counter to be clock gated, such that flops at or above that index will be clock gated independently from the flops below that index. This is an effective method of saving extra power on many counters because the upper bits of the counter may change much less frequently than the lower bits (or vice versa). If the index is negative or greater than or equal to the width of the counter, then the whole counter will be clock gated in unison.
+
+The `gateToggles` flag enables `ToggleGate` insertion on a per-interface basis to help reduce combinational toggles within the design when interfaces are not enabled.
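The partitioning rule for `clkGatePartitionIndex` described above can be sketched in plain Dart. The helper `clockGatePartitions` is hypothetical and only models which bit ranges would share a clock gate; it is not how ROHD-HCL builds the hardware. Treating an index of 0 as a single group is an assumption of this sketch, made because no flops lie below index 0.

```dart
/// Hypothetical helper (not part of ROHD-HCL): lists the bit ranges that
/// would be clock gated together for a counter of [width] bits.
/// Each range is `[low, high]`, inclusive.
List<List<int>> clockGatePartitions(int width, int clkGatePartitionIndex) {
  // A negative or too-large index means the whole counter is gated in unison.
  // Index 0 leaves nothing below the dividing line, so this sketch assumes
  // it also yields a single group.
  if (clkGatePartitionIndex <= 0 || clkGatePartitionIndex >= width) {
    return [
      [0, width - 1],
    ];
  }
  return [
    [0, clkGatePartitionIndex - 1], // lower bits, gated as one group
    [clkGatePartitionIndex, width - 1], // bits at or above the index
  ];
}

void main() {
  print(clockGatePartitions(8, 4)); // prints "[[0, 3], [4, 7]]"
  print(clockGatePartitions(8, 8)); // prints "[[0, 7]]"
  print(clockGatePartitions(8, -1)); // prints "[[0, 7]]"
}
```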
diff --git a/doc/components/toggle_gate.md b/doc/components/toggle_gate.md
@@ -0,0 +1,16 @@
+# Toggle Gate
+
+The `ToggleGate` component is intended to help save power by avoiding unnecessary toggles through combinational logic. It accomplishes this by flopping the previous value of data and muxing the previous value to the `gatedData` output if the `enable` is low. By default, the flops within the `ToggleGate` are also clock gated for extra power savings, but this can be controlled via a `ClockGateControlInterface`.
+
+As an example use case, if you have a large arithmetic unit but only care about the result when a `valid` bit is high, you could use a `ToggleGate` so that the inputs to that combinational logic do not change unless `valid` is high.
+
+```dart
+final toggleGate = ToggleGate(
+  clk: clk,
+  reset: reset,
+  enable: arithmeticDataValid,
+  data: arithmeticData,
+);
+
+BigArithmeticUnit(dataIn: toggleGate.gatedData);
+```
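The hold-and-mux behavior described for `ToggleGate` can be pictured with a small cycle-based software model in plain Dart. This is a minimal sketch rather than the ROHD-HCL implementation: the class `ToggleGateModel`, its `step` method, and the reset value of 0 are all hypothetical, and it assumes the held value is the last data seen while `enable` was high.

```dart
/// Hypothetical cycle-based model of the described behavior; it is not the
/// ROHD-HCL `ToggleGate` and ignores clock gating entirely.
class ToggleGateModel {
  /// Models the flop holding the previous data (assumed reset value: 0).
  int _held;

  ToggleGateModel({int resetValue = 0}) : _held = resetValue;

  /// Evaluates one clock cycle and returns the modeled `gatedData`.
  int step({required bool enable, required int data}) {
    if (enable) {
      // New data passes through and is captured for later cycles.
      _held = data;
    }
    // When `enable` is low, the held value is returned, so downstream
    // logic sees no change.
    return _held;
  }
}

void main() {
  final gate = ToggleGateModel();
  final outputs = [
    gate.step(enable: true, data: 5),
    gate.step(enable: false, data: 9),
    gate.step(enable: false, data: 2),
    gate.step(enable: true, data: 7),
  ];
  print(outputs); // prints "[5, 5, 5, 7]"
}
```

In the run above, the input changes on every cycle, but the modeled output changes only on cycles where `enable` is high, which is the toggle reduction the component is aiming for.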
diff --git a/lib/rohd_hcl.dart b/lib/rohd_hcl.dart
@@ -22,4 +22,5 @@ export 'src/serialization/serialization.dart';
 export 'src/shift_register.dart';
 export 'src/sort.dart';
 export 'src/summation/summation.dart';
+export 'src/toggle_gate.dart';
 export 'src/utils.dart';