4:2 compressors validate
ganewto committed Dec 7, 2024
1 parent ffe15af commit d562567
Showing 43 changed files with 3,080 additions and 1,206 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/general.yml
@@ -62,6 +62,9 @@ jobs:

- name: Generate HTML for examples
run: tool/gh_actions/create_htmls.sh

- name: Check temporary test files
run: tool/gh_actions/check_tmp_test.sh

# https://github.com/devcontainers/ci/blob/main/docs/github-action.md
- name: Build dev container and run tests in it
1 change: 1 addition & 0 deletions .gitignore
@@ -18,6 +18,7 @@ tmp*
confapp/.vscode/*
*tracker.json
*tracker.log
devtools_options.yaml
*.sv

# Exceptions
4 changes: 3 additions & 1 deletion doc/README.md
@@ -65,6 +65,7 @@ Some in-development items will have opened issues, as well. Feel free to create
- Counters
- [Summation](./components/summation.md#sum)
- [Binary counter](./components/summation.md#counter)
- [Gated counter](./components/summation.md#gated-counter)
- Gray counter
- Pseudorandom
- LFSR
@@ -73,8 +74,9 @@ Some in-development items will have opened issues, as well. Feel free to create
- CRC
- [Parity](./components/parity.md)
- Interleaving
- Clocking
- Gating
- [Clock gating](./components/clock_gating.md)
- [Toggle gating](./components/toggle_gate.md)
- Data flow
- Ready/Valid
- Connect/Disconnect
10 changes: 7 additions & 3 deletions doc/components/divider.md
@@ -1,13 +1,13 @@
# Divider

ROHD HCL provides an integer divider module to get the dividend of numerator and denominator operands. The divider implementation is not pipelined and has a maximum latency of the bit width of the operands.
ROHD HCL provides an integer divider module to get the quotient and the remainder of dividend and divisor operands. The divider implementation is not pipelined and has a minimum latency of 3 cycles. The maximum latency is dependent on the width of the operands (upper bound of `O(WIDTH**2)`). Note that latency increases exponentially as the absolute difference between the dividend and the divisor increases (worst case: largest possible dividend and divisor of 1).

## Interface

The inputs to the divider module are:

* `clock` => clock for synchronous logic
* `reset` => reset for synchronous logic (active high)
* `reset` => reset for synchronous logic (active high, synchronous to `clock`)
* `dividend` => the numerator operand
* `divisor` => the denominator operand
* `isSigned` => should the operands of the division be treated as signed integers
@@ -30,7 +30,7 @@ To initiate a new request, it is expected that the requestor drive `validIn` to

When the division is complete, the module will assert the `validOut` signal along with the numerical values of `quotient` and `remainder` representing the division result and the signal `divZero` to indicate whether or not a division by zero occurred. The module will hold these signal values until `readyOut` is driven high by the integrating environment. The integrating environment must assume that `quotient` and `remainder` are meaningless if `divZero` is asserted.

### Mathematical Properties
## Mathematical Properties

For the division, implicit rounding towards 0 is always performed. I.e., a negative quotient will always be rounded up if the dividend is not evenly divisible by the divisor. Note that this behavior is not uniform across all programming languages (for example, Python rounds towards negative infinity).
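
The rounding rule above can be checked against a plain software model. The sketch below is only an illustration in ordinary Dart, not the ROHD HCL divider: `truncQuotient` and `truncRemainder` are hypothetical helper names, and the remainder convention shown (sign follows the dividend) is an assumption that follows from `dividend == quotient * divisor + remainder` under truncation, not something this section states.

```dart
// Illustrative software model only; not the ROHD HCL divider implementation.

/// Quotient rounded towards 0 (Dart's `~/` truncates).
int truncQuotient(int dividend, int divisor) => dividend ~/ divisor;

/// Remainder consistent with truncation (sign follows the dividend).
int truncRemainder(int dividend, int divisor) => dividend.remainder(divisor);

void main() {
  for (final pair in [[7, 2], [-7, 2], [7, -2], [-7, -2]]) {
    final n = pair[0];
    final d = pair[1];
    final q = truncQuotient(n, d);
    final r = truncRemainder(n, d);
    // Identity that must hold for truncating division.
    if (q * d + r != n) {
      throw StateError('identity violated for $n / $d');
    }
    print('$n / $d => quotient $q, remainder $r');
  }
}
```

For `-7 / 2` this model gives a quotient of `-3` (rounded up towards 0), whereas a floor-based language such as Python gives `-4`.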

@@ -65,3 +65,7 @@ if (divIntf.validOut.value.toBool()) {
}
```

## Future Considerations

In the future, an optimization might be added in which the `remainder` output is optional and controlled by a build time constructor parameter. If the remainder does not need to be computed, the implementation's upper bound latency can be significantly improved (`O(WIDTH**2)` => `O(WIDTH)`).
48 changes: 33 additions & 15 deletions doc/components/multiplier.md
@@ -4,7 +4,12 @@ ROHD-HCL provides an abstract `Multiplier` module which multiplies two
numbers represented as two `Logic`s, potentially of different widths,
treating them as either signed (2s complement) or unsigned. It
produces the product as a `Logic` with width equal to the sum of the
widths of the inputs. As of now, we have the following implementations
widths of the inputs. The signs of the operands are either fixed by a parameter,
or runtime selectable, e.g.: `signedMultiplicand` or `selectSignedMultiplicand`.
The output of the multiplier also has a signal telling us if the result is to be
treated as signed.
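
To make the effect of per-operand signedness concrete, here is a minimal software sketch in plain Dart. It does not use the ROHD-HCL classes: `interpret` and `modelProduct` are hypothetical helpers, and only the parameter names `signedMultiplicand` and `signedMultiplier` are borrowed from the text. It shows how the same input bit pattern yields a different product when one operand is treated as 2s complement.

```dart
// Illustrative software model only; not the ROHD-HCL Multiplier.

/// Interprets the low [width] bits of [bits] as 2s complement when [signed],
/// otherwise as an unsigned value.
int interpret(int bits, int width, {required bool signed}) {
  final raw = bits & ((1 << width) - 1);
  final signBit = 1 << (width - 1);
  return (signed && (raw & signBit) != 0) ? raw - (1 << width) : raw;
}

/// Returns the product as a bit pattern of width widthA + widthB.
int modelProduct(int a, int widthA, int b, int widthB,
    {bool signedMultiplicand = false, bool signedMultiplier = false}) {
  final va = interpret(a, widthA, signed: signedMultiplicand);
  final vb = interpret(b, widthB, signed: signedMultiplier);
  return (va * vb) & ((1 << (widthA + widthB)) - 1);
}

void main() {
  const widthA = 6;
  const widthB = 3;
  const a = 0x3A; // 111010: 58 unsigned, -6 as 6-bit 2s complement
  const b = 3;
  print(modelProduct(a, widthA, b, widthB)); // 174
  // 494 is the 9-bit 2s complement pattern for -18.
  print(modelProduct(a, widthA, b, widthB, signedMultiplicand: true)); // 494
}
```

The 9-bit result must itself be read as signed or unsigned to recover the value, which is why a separate indication of the product's signedness is useful.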

As of now, we have the following implementations
of this abstract `Module`:

- [Carry Save Multiplier](#carry-save-multiplier)
@@ -13,7 +18,13 @@ of this abstract `Module`:
An additional kind of abstract module provided is a
`MultiplyAccumulate` module which multiplies two numbers represented
as two `Logic`s and adds the result to a third `Logic` with width
equal to the sum of the widths of the main inputs. We have a
equal to the sum of the widths of the main inputs. Similar to the `Multiplier`,
the signs of the operands are either fixed by a parameter,
or runtime selectable, e.g.: `signedMultiplicand` or `selectSignedMultiplicand`.
The output of the multiply-accumulate also has a signal telling us if the result is to be
treated as signed.

We have a
high-performance implementation:

- [Compression Tree Multiply Accumulate](#compression-tree-multiply-accumulate)
@@ -22,7 +33,7 @@ The compression tree based arithmetic units are built from a set of components f

## Carry Save Multiplier

Carry save multiplier is a digital circuit used for performing multiplication operations. It
The carry-save multiplier is a digital circuit used for performing multiplication operations. It
is particularly useful in applications that require high speed
multiplication, such as digital signal processing.

@@ -31,7 +42,8 @@ The
module in ROHD-HCL accepts as input parameters the clock signal `clk`, the
reset signal `reset`, the `Logic`s `a` and `b` as input pins, and the name
of the module `name`. Note that the width of the inputs must be the
same or `RohdHclException` will be thrown.
same or `RohdHclException` will be thrown. The output latency is equal to the width of the inputs
and is given by `latency` on the component.

An example is shown below to multiply two input signals that are each 4 bits wide.

@@ -82,16 +94,18 @@ digital signal processing.
The parameters of the
`CompressionTreeMultiplier` are:

- Two input terms `a` and `b` which can be different widths
- The radix used for Booth encoding (2, 4, 8, and 16 are currently supported)
- The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (optional)
- `signed` parameter: whether the operands should be treated as signed (2s complement) or unsigned
- Two input terms `a` and `b` which can be different widths.
- The radix used for Booth encoding (2, 4, 8, and 16 are currently supported).
- The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (optional).
- `ppGen` parameter: the type of `PartialProductGenerator` to use which has derived classes for different styles of sign extension. In some cases this adds an extra row to hold a sign bit.
- An optional `selectSigned` control signal which overrides the `signed` configuration allowing for runtime control of signed or unsigned operation with the same hardware. `signed` must be false if using this control signal.
- `signedMultiplicand` parameter: whether the multiplicand (first arg) should be treated as signed (2s complement) or unsigned.
- `signedMultiplier` parameter: whether the multiplier (second arg) should be treated as signed (2s complement) or unsigned.
- An optional `selectSignedMultiplicand` control signal which overrides the `signedMultiplicand` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplicand` must be false if using this control signal.
- An optional `selectSignedMultiplier` control signal which overrides the `signedMultiplier` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplier` must be false if using this control signal.
- An optional `clk`, as well as `enable` and `reset` that are used to add a pipestage in the `ColumnCompressor` to allow for pipelined operation.
- An optional `use42Compressors` boolean enables the `ColumnCompressor` to use 4:2 compressors in addition to 3:2 (Full Adder) and 2:2 (Half Adder) compressors.
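
For readers unfamiliar with the compressor terminology in the last item, the following bit-level sketch shows a textbook 4:2 compressor built from two chained full adders (3:2 compressors). It is plain Dart for illustration; `fullAdder` and `compressor42` are hypothetical names, and whether the `ColumnCompressor` constructs its 4:2 cells this way is an assumption.

```dart
// Illustrative bit-level model only; not the ROHD-HCL ColumnCompressor.

/// Full adder (3:2 compressor) on single bits: returns [sum, carry].
List<int> fullAdder(int a, int b, int c) =>
    [a ^ b ^ c, (a & b) | (a & c) | (b & c)];

/// 4:2 compressor: four input bits plus a carry-in are reduced to
/// [sum, carry, cout], where carry and cout each have weight 2.
List<int> compressor42(int a, int b, int c, int d, int cin) {
  final first = fullAdder(a, b, c);
  final second = fullAdder(first[0], d, cin);
  return [second[0], second[1], first[1]];
}

void main() {
  // Exhaustively check that the weighted outputs equal the input bit count.
  for (var v = 0; v < 32; v++) {
    final bits = [for (var i = 0; i < 5; i++) (v >> i) & 1];
    final out = compressor42(bits[0], bits[1], bits[2], bits[3], bits[4]);
    final total = bits.reduce((x, y) => x + y);
    if (out[0] + 2 * (out[1] + out[2]) != total) {
      throw StateError('mismatch for input pattern $v');
    }
  }
  print('4:2 compressor identity holds for all 32 input combinations');
}
```

Because `cout` depends only on the first three inputs and not on `cin`, carries do not ripple through a row of such cells, which is the usual motivation for using them alongside 3:2 and 2:2 compressors.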

Here is an example of use of the `CompressionTreeMultiplier`:
Here is an example of use of the `CompressionTreeMultiplier` with one signed input:

```dart
const widthA = 6;
@@ -104,7 +118,7 @@ Here is an example of use of the `CompressionTreeMultiplier`:
b.put(3);
final multiplier =
CompressionTreeMultiplier(a, b, radix, signed: true);
CompressionTreeMultiplier(a, b, radix, signedMultiplicand: true);
final product = multiplier.product;
@@ -124,13 +138,17 @@ The parameters of the
- The accumulate input term `c`, whose width must equal the sum of the two operand widths plus 1.
- The radix used for Booth encoding (2, 4, 8, and 16 are currently supported)
- The type of `ParallelPrefix` tree used in the final `ParallelPrefixAdder` (default Kogge-Stone).
- `signed` parameter: whether the operands should be treated as signed (2s complement) or unsigned
- `ppGen` parameter: the type of `PartialProductGenerator` to use which has derived classes for different styles of sign extension. In some cases this adds an extra row to hold a sign bit (default `PartialProductGeneratorCompactRectSignExtension`).
- An optional `selectSigned` control signal which overrides the `signed` configuration allowing for runtime control of signed or unsigned operation with the same hardware. `signed` must be false if using this control signal.
- `signedMultiplicand` parameter: whether the multiplicand (first arg) should be treated as signed (2s complement) or unsigned
- `signedMultiplier` parameter: whether the multiplier (second arg) should be treated as signed (2s complement) or unsigned
- `signedAddend` parameter: whether the addend (third arg) should be treated as signed (2s complement) or unsigned
- An optional `selectSignedMultiplicand` control signal which overrides the `signedMultiplicand` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplicand` must be false if using this control signal.
- An optional `selectSignedMultiplier` control signal which overrides the `signedMultiplier` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedMultiplier` must be false if using this control signal.
- An optional `selectSignedAddend` control signal which overrides the `signedAddend` parameter allowing for runtime control of signed or unsigned operation with the same hardware. `signedAddend` must be false if using this control signal.
- An optional `clk`, as well as `enable` and `reset` that are used to add a pipestage in the `ColumnCompressor` to allow for pipelined operation.
- An optional `use42Compressors` boolean enables the `ColumnCompressor` to use 4:2 compressors in addition to 3:2 (Full Adder) and 2:2 (Half Adder) compressors.

Here is an example of using the `CompressionTreeMultiplyAccumulate`:
Here is an example of using the `CompressionTreeMultiplyAccumulate` with all inputs as signed:

```dart
const widthA = 6;
@@ -144,7 +162,7 @@ Here is an example of using the `CompressionTreeMultiplyAccumulate`:
b.put(3);
c.put(5);
final multiplier = CompressionTreeMultiplyAccumulate(a, b, c, radix, signed: true);
final multiplier = CompressionTreeMultiplyAccumulate(a, b, c, radix, signedMultiplicand: true, signedMultiplier: true, signedAddend: true);
final accumulate = multiplier.accumulate;
8 changes: 4 additions & 4 deletions doc/components/multiplier_components.md
@@ -51,7 +51,7 @@ row slice mult

A few things to note: first, that we are negating by 1s complement (so we need a -0) and second, these rows do not add up to (18: 10010). For Booth encoded rows to add up properly, they need to be in 2s complement form, and they need to be sign-extended.

Here is the matrix with crude sign extension (this formatting is available from our `PartialProductGenerator` component). With 2s complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).
Here is the matrix with a crude sign extension `brute` (the table formatting is available from our `PartialProductGenerator` component). With 2s complementation, and sign bits folded in (note the LSB of each row has a sign term from the previous row), these addends are correctly formed and add to (18: 10010).

```text
7 6 5 4 3 2 1 0
@@ -64,7 +64,7 @@ A few things to note: first, that we are negating by 1s complement (so we need a
0 0 0 1 0 0 1 0 : 00010010 = 18 (18)
```

There are more compact ways of doing sign-extension which result in far fewer additions. Here is an example of compact sign-extension:
There are more compact ways of doing sign-extension which result in far fewer additions. Here is an example of `compact` sign-extension, where the last row which carries only a sign bit is folded into the previous row:

```text
7 6 5 4 3 2 1 0
@@ -86,7 +86,7 @@ And of course, with higher radix-encoding, we select more bits at a time from th
0 0 0 1 0 0 1 0 : 00010010 = 18 (18)
```

Note that radix-4 shifts by 2 positions each row, but with only two rows and with sign-extension adding an LSB bit, you only see a shift of 1 in row 1.
Note that radix-4 shifts by 2 positions each row. With only two rows, and with sign-extension adding an LSB bit to each row, you only see a shift of 1 in row 1; in a larger example you would see the two-bit shift in the following rows.
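
The two-bit shift per row comes from the radix-4 Booth recoding itself: each row corresponds to one digit in the range -2..2 with weight 4^i. The sketch below is a plain-Dart illustration of that recoding, not the `RadixEncoder` implementation; `boothRadix4Digits` and `reconstruct` are hypothetical names and the operand value is chosen for this example only.

```dart
// Illustrative model of radix-4 Booth recoding; not the ROHD-HCL RadixEncoder.

/// Radix-4 Booth digits (each in -2..2), least significant first, for a
/// [width]-bit 2s complement multiplier.
List<int> boothRadix4Digits(int multiplier, int width) {
  int bit(int i) {
    if (i < 0) return 0; // implicit 0 below the LSB
    final idx = i < width ? i : width - 1; // sign-extend above the MSB
    return (multiplier >> idx) & 1;
  }

  final digits = <int>[];
  for (var i = 0; i < width; i += 2) {
    digits.add(bit(i - 1) + bit(i) - 2 * bit(i + 1));
  }
  return digits;
}

/// Sums the digits with weights 4^i, i.e. a shift of 2 bits per digit.
int reconstruct(List<int> digits) {
  var value = 0;
  for (var i = digits.length - 1; i >= 0; i--) {
    value = value * 4 + digits[i];
  }
  return value;
}

void main() {
  final digits = boothRadix4Digits(0xD, 4); // 1101 is -3 in 4-bit 2s complement
  print(digits); // [1, -1]
  print(reconstruct(digits)); // -3
}
```

Each digit selects 0, ±1 or ±2 times the multiplicand for its row, so a 4-bit multiplier needs only two partial-product rows here.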

## Partial Product Generator

@@ -222,7 +222,7 @@ Finally, we produce the product.

```dart
final pp =
PartialProductGeneratorCompactRectSignExtension(a, b, RadixEncoder(radix), signed: true);
PartialProductGeneratorCompactRectSignExtension(a, b, RadixEncoder(radix), signedMultiplicand: true, signedMultiplier: true);
final compressor = ColumnCompressor(pp)..compress();
final adder = ParallelPrefixAdder(
compressor.extractRow(0), compressor.extractRow(1), BrentKung.new);
8 changes: 8 additions & 0 deletions doc/components/summation.md
@@ -38,3 +38,11 @@ The `Counter` also has a `Counter.simple` constructor which is intended for very
// A counter which increments by 1 each cycle up to 5, then rolls over.
Counter.simple(clk: clk, reset: reset, maxValue: 5);
```

## Gated Counter

The `GatedCounter` is a version of a `Counter` which contains a number of power-saving features including clock gating to save on flop power and enable gating to avoid unnecessary combinational toggles.

The `GatedCounter` has a `clkGatePartitionIndex` which determines a dividing line for the counter to be clock gated such that flops at or above that index will be independently clock gated from the flops below that index. This is an effective method of saving extra power on many counters because the upper bits of the counter may change much less frequently than the lower bits (or vice versa). If the index is negative or greater than or equal to the width of the counter, then the whole counter will be clock gated in unison.
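
The partitioning rule described above can be written out as a small function. This is a plain-Dart sketch of the rule as stated in the paragraph, not code from `GatedCounter`; `partitionFlops` and the map keys are hypothetical names used only for illustration.

```dart
// Illustrative model of the partitioning rule; not the GatedCounter source.

/// Groups flop bit indices of a [width]-bit counter by clock-gating domain.
Map<String, List<int>> partitionFlops(int width, int clkGatePartitionIndex) {
  final all = [for (var i = 0; i < width; i++) i];
  // Negative or out-of-range index: the whole counter is gated in unison.
  if (clkGatePartitionIndex < 0 || clkGatePartitionIndex >= width) {
    return {'unison': all};
  }
  return {
    // Flops below the index share one clock gate...
    'lower': all.sublist(0, clkGatePartitionIndex),
    // ...and flops at or above the index are gated independently of them.
    'upper': all.sublist(clkGatePartitionIndex),
  };
}

void main() {
  print(partitionFlops(8, 4)); // lower: bits 0..3, upper: bits 4..7
  print(partitionFlops(8, -1)); // single group covering bits 0..7
}
```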

The `gateToggles` flag will enable `ToggleGate` insertion on a per-interface basis to help reduce combinational toggles within the design when interfaces are not enabled.
16 changes: 16 additions & 0 deletions doc/components/toggle_gate.md
@@ -0,0 +1,16 @@
# Toggle Gate

The `ToggleGate` component is intended to help save power by avoiding unnecessary toggles through combinational logic. It accomplishes this by flopping the previous value of data and muxing the previous value to the `gatedData` output if the `enable` is low. By default, the flops within the `ToggleGate` are also clock gated for extra power savings, but it can be controlled via a `ClockGateControlInterface`.

As an example use case, if you have a large arithmetic unit but only care about the result when a `valid` bit is high, you could use a `ToggleGate` so that the inputs to that combinational logic do not change unless `valid` is high.

```dart
final toggleGate = ToggleGate(
clk: clk,
reset: reset,
enable: arithmeticDataValid,
data: arithmeticData,
);
BigArithmeticUnit(dataIn: toggleGate.gatedData);
```
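
The hold-when-disabled behavior can also be pictured with a cycle-level software model. The sketch below is plain Dart and is not the `ToggleGate` source: `ToggleGateModel` is a hypothetical class, and it assumes that the internal flop captures `data` on enabled cycles and resets to 0.

```dart
// Illustrative cycle-level model only; not the ROHD-HCL ToggleGate.

class ToggleGateModel {
  /// Models the flop holding the previously passed value (assumed reset value 0).
  int _held;

  ToggleGateModel({int resetValue = 0}) : _held = resetValue;

  /// Returns the modelled `gatedData` value for one cycle.
  int step({required bool enable, required int data}) {
    if (enable) {
      _held = data; // remember the value for later disabled cycles
      return data;
    }
    return _held; // enable low: downstream logic sees no new toggles
  }
}

void main() {
  final gate = ToggleGateModel();
  print(gate.step(enable: true, data: 5)); // 5
  print(gate.step(enable: false, data: 9)); // 5 (held)
  print(gate.step(enable: true, data: 9)); // 9
}
```

In the second cycle the input changes to 9 but the output stays at 5, so logic fed by the gated output does not switch while `enable` is low.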
1 change: 1 addition & 0 deletions lib/rohd_hcl.dart
@@ -22,4 +22,5 @@ export 'src/serialization/serialization.dart';
export 'src/shift_register.dart';
export 'src/sort.dart';
export 'src/summation/summation.dart';
export 'src/toggle_gate.dart';
export 'src/utils.dart';