PrUcess is a low-power, multi-clock, configurable digital processing system. It receives commands (unsigned arithmetic operations, logical operations, and register file read and write operations) from an external source through a UART receiver module, executes them, and transmits the results through a UART transmitter module.
This is a full ASIC design project (from RTL to GDS). It goes through the ASIC design flow from frontend to backend:
- System's architecture design.
- Synthesizable Verilog RTL modelling (behavioral modelling, structural modelling, and FSM coding) of all the system blocks from scratch (UART transmitter and receiver, integer clock divider, ALU, register file, parametrized data and bit synchronizers, reset synchronizer, and system's main controller).
- Solving CDC issues by using synchronizers.
- Functional verification using self-checking testbenches and automated Python verification environments, with the testbenches run in Modelsim.
- Logic synthesis using Synopsys Design Compiler & 130 nm technology node.
- Formal verification post logic synthesis using Synopsys Formality.
- Design for testability (DFT) using Synopsys DFT Compiler.
- Formal verification post DFT using Synopsys Formality.
- Physical design (floor planning, power planning, placement, CTS, routing, timing closure, and chip finishing) using Cadence Innovus.
- Formal verification post physical design using Synopsys Formality.
- System's Specifications
- System Top Level Module
- System Functional Verification
- UART Transmitter
- UART Receiver
- Clock Divider
- Clock Gating Cell
- ALU
- Register File
- Bus Synchronizer
- Data Synchronizer
- System Controller
- UART Transmitter Controller
- UART Receiver Controller
- Reset Synchronizer
- Logic Synthesis
- Post - Logic Synthesis Formal Verification
- Design For Testability (DFT)
- Post - DFT Formal Verification
- Physical Design
- Final Chip Layout
- Post - Physical Design Formal Verification
UART is a standard serial communication protocol widely used in many applications. Oversampling is a technique used in UART receivers to improve the accuracy and reliability of the received data. In a UART receiver, data is received as a series of binary bits that are transmitted asynchronously with respect to a clock signal. To correctly interpret the received data, the receiver must sample the incoming signal at the correct time to capture the correct value of each bit. Oversampling involves sampling the incoming signal at a higher frequency than the baud rate of the transmitted data. This means that multiple samples are taken during the transmission of each bit, allowing the receiver to more accurately determine the timing and value of each bit. Oversampling also helps to mitigate the effects of noise and other signal distortions that can cause errors in the received data. By taking multiple samples of each bit, the receiver can detect and correct for these errors, improving the overall reliability of the data transmission.
The system includes two asynchronous clock domains (the reference clock domain and the UART clock domain). A command is received by the UART receiver and passed through a synchronizer (to solve the CDC issues) to the system controller, which decodes and executes it. The controller then sends the result through another synchronizer to the UART transmitter, which transmits it serially.
Parameter | Default Value | Description |
---|---|---|
DATA_WIDTH | 8 | It is the size of: registers, ALU operands, UART transmitter frames, and UART receiver frames. |
REGISTER_FILE_DEPTH | 16 | The number of registers in the register file. |
SYNCHRONIZER_STAGE_COUNT | 2 | The number of stages in a synchronizer. |
oversampling_prescale | the default value after resetting the system is 8 | The ratio between the frequency of the UART receiver clock and the frequency of the UART transmitter clock. |
Clock Domain | Clock Names | Modules | Frequency |
---|---|---|---|
Reference clock domain | | | Reference clock frequency = ALU clock frequency = 40 MHz |
UART clock domain | | | |
Note that the oversampling prescale can have the values (8, 16, or 32) but 32 is used in the simulations and backend flow to ensure that the UART receiver is functioning correctly in the worst case (highest clock frequency).
- UART: It consists of UART receiver which receives the commands and UART transmitter that transmits the commands' results.
- Clock divider: An integer clock divider that can divide the source clock by a division ratio of up to 32. It divides the UART clock to produce the UART transmitter clock, with a division ratio equal to the oversampling prescale.
- ALU: It executes unsigned arithmetic operations and logical operations.
- Clock gating cell: It is used to gate the ALU clock because the ALU is idle for a significant amount of time (the ALU operates on a very fast clock compared with the UART, so it waits a long time to receive a new command).
- Register file.
- System controller: It is the main controller of the system. It consists of UART transmitter controller and UART receiver controller. The UART transmitter controller controls the UART transmitter by sending to it the data to be sent serially after it is ready (ALU result or register file data). The UART receiver controller controls the ALU and register file control signals based on the received frames from the UART receiver.
- Reset synchronizer: It is used to synchronize the global reset to all clock domains.
- Bus synchronizer: This module can be used to synchronize a single bit or a Gray-encoded bus between two asynchronous clock domains. It is a generic module (setting BUS_WIDTH = 1 makes it a single-bit synchronizer).
- Data synchronizer: It is used to synchronize a bus by using a bit synchronizer and pulse generator to synchronize the bus's data valid signal.
- Addition (+)
- Subtraction (-)
- Multiplication (*)
- Division (/)
- Bit-wise AND (&)
- Bit-wise OR (|)
- Bit-wise NAND (~&)
- Bit-wise NOR (~|)
- Bit-wise XOR (^)
- Bit-wise XNOR (~^)
- Is equal (==)
- Is greater than (>)
- Is less than (<)
- Shift right (>>1)
- Shift left (<<1)
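
The operations listed above can be captured in a small Python reference model, in the spirit of the project's Python verification environment. This is only a sketch under stated assumptions: the function name `alu_reference` is hypothetical, operations are selected by their operator symbol rather than by the 4-bit `ALU_function` code (whose encoding is not given here), inverted results are assumed to be masked to DATA_WIDTH bits, subtraction is assumed to wrap within the 2 * DATA_WIDTH result, and division by zero is assumed to return 0.

```python
DATA_WIDTH = 8                                # default from the parameter table
OPERAND_MASK = (1 << DATA_WIDTH) - 1          # 0xFF for 8-bit operands
RESULT_MASK = (1 << (2 * DATA_WIDTH)) - 1     # ALU result is 2 * DATA_WIDTH bits


def alu_reference(op: str, a: int, b: int) -> int:
    """Return the expected unsigned ALU result for one operation symbol."""
    a &= OPERAND_MASK
    b &= OPERAND_MASK
    operations = {
        "+": lambda: a + b,
        "-": lambda: a - b,                       # assumption: wraps modulo 2**16
        "*": lambda: a * b,
        "/": lambda: a // b if b else 0,          # assumption: divide-by-zero gives 0
        "&": lambda: a & b,
        "|": lambda: a | b,
        "~&": lambda: ~(a & b) & OPERAND_MASK,    # assumption: masked to DATA_WIDTH
        "~|": lambda: ~(a | b) & OPERAND_MASK,
        "^": lambda: a ^ b,
        "~^": lambda: ~(a ^ b) & OPERAND_MASK,
        "==": lambda: int(a == b),
        ">": lambda: int(a > b),
        "<": lambda: int(a < b),
        ">>1": lambda: a >> 1,
        "<<1": lambda: a << 1,
    }
    return operations[op]() & RESULT_MASK


if __name__ == "__main__":
    print(hex(alu_reference("*", 0xFF, 0xFF)))   # widest product still fits in 16 bits
```

The assumptions about wrapping, masking, and division by zero should be checked against the RTL before the model is used to generate expected results.
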
- Register file write command. This command consists of 3 frames as follows:
  - Command opcode (0xAA)
  - Register file write address
  - Register file write data
- Register file read command. This command consists of 2 frames as follows:
  - Command opcode (0xBB)
  - Register file read address
- ALU operation with operands command. The operands of the ALU are connected to the first two registers of the register file, so to execute this command the operands are first written to the first two registers in the register file, then the result is evaluated. This command consists of 4 frames as follows:
  - Command opcode (0xCC)
  - Operand A
  - Operand B
  - ALU function
- ALU operation without operands command. This command executes the ALU operation directly on the values stored in the first two registers in the register file. This command consists of 2 frames as follows:
  - Command opcode (0xDD)
  - ALU function
In all ALU commands, the UART transmitter sends two consecutive frames (because the size of the ALU result is double the size of the frame).
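
The command formats above can be sketched as frame lists in Python. The opcodes come from the list above; the helper names are hypothetical, the ALU function code is passed in as a plain number because its encoding is not listed here, and the result is assumed to be sent lower byte first (matching the TRANSMIT_LOWER_ALU_RESULT / TRANSMIT_UPPER_ALU_RESULT state order named in the transmitter controller section).

```python
WRITE_OPCODE = 0xAA
READ_OPCODE = 0xBB
ALU_WITH_OPERANDS_OPCODE = 0xCC
ALU_WITHOUT_OPERANDS_OPCODE = 0xDD
FRAME_MASK = 0xFF  # one frame carries DATA_WIDTH = 8 bits by default


def register_write(address: int, data: int) -> list:
    """3 frames: opcode, write address, write data."""
    return [WRITE_OPCODE, address & FRAME_MASK, data & FRAME_MASK]


def register_read(address: int) -> list:
    """2 frames: opcode, read address."""
    return [READ_OPCODE, address & FRAME_MASK]


def alu_with_operands(a: int, b: int, function_code: int) -> list:
    """4 frames: opcode, operand A, operand B, ALU function."""
    return [ALU_WITH_OPERANDS_OPCODE, a & FRAME_MASK, b & FRAME_MASK,
            function_code & FRAME_MASK]


def alu_without_operands(function_code: int) -> list:
    """2 frames: opcode, ALU function."""
    return [ALU_WITHOUT_OPERANDS_OPCODE, function_code & FRAME_MASK]


def split_alu_result(result: int) -> list:
    """Split a 16-bit ALU result into two frames (assumed order: lower, upper)."""
    return [result & FRAME_MASK, (result >> 8) & FRAME_MASK]


if __name__ == "__main__":
    print([hex(f) for f in register_write(0x04, 0x5A)])
    print([hex(f) for f in split_alu_result(0x1234)])
```
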
- The parity configuration of UART (parity enable and parity type).
- The oversampling prescale (division ratio) of the UART receiver.
Note that these configurations are outputs of the register file (reference clock domain) and inputs to blocks that operate on the UART clock, so metastability could in principle occur because the source and destination domains are asynchronous to one another. However, no synchronizers are used for these signals because they are quasi-static: they remain stable for long periods of time, so they are held long enough to be captured by even the slowest clock domain without the risk of metastability.
Blue-colored signals are the system's input ports, while red-colored signals are the system's output ports.
Port | Direction | Width | Description |
---|---|---|---|
reference_clk | input | 1 | The main clock of the system. |
UART_clk | input | 1 | UART clock (the clock of the UART receiver). |
reset | input | 1 | Unsynchronized global active low asynchronous reset. |
serial_data_in | input | 1 | The data which is received serially by the UART receiver. |
serial_data_out | output | 1 | The output of the UART transmitter (it is also the output of the mux that selects between start, serial data, parity, or stop bits according to the state of the transmission). |
parity_error | output | 1 | A signal to indicate that there is parity mismatch between the received parity bit and the calculated parity bit. |
frame_error | output | 1 | A signal to indicate that the start bit or the stop bit was incorrect. |
The whole system is verified through an automated Python environment which does the following:
- Generates the opcodes of all the given commands in an external file.
- Generates all the expected results that should be transmitted serially through the UART transmitter in an external file.
- Generates the memory file which corresponds to the final values that should be stored in the register file after the execution of all the commands.
- Compares the results of the Verilog testbench (transmitted through UART transmitter) and the generated memory file with the expected results' file and expected memory file.
- Reports any mismatch that occurs in the testbench.
- Reports the number of passed and failed testcases.
Sample test cases:
Any arbitrary test case can be written in `functional_verification/system_top/test_cases_generator.py` as follows:
The script `functional_verification/system_top/run.tcl` then runs the Python script to generate the expected results, runs the testbench to generate the actual results, and compares the two to report any mismatch.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Generated clock produced from the clock divider whose source clock is UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_type | input | 1 | A signal to indicate the parity type (1 for odd, 0 for even). |
parity_enable | input | 1 | A signal to enable the transmission of the parity bit in the frame. |
data_valid | input | 1 | A signal that indicates that there is data to be transmitted. |
parallel_data | input | DATA_WIDTH (default value is 8) | The data to be transmitted by the UART transmitter. |
serial_data_out | output | 1 | The output of the transmitter (It is also the output of the mux that selects between start, serial data, parity, or stop bits according to the state of the transmission). |
busy | output | 1 | A signal that indicates that the transmitter is currently in operation and it can't transmit new data. |
Note that if any omitted condition occurs, the current state won't change.
This FSM controls the following output ports according to the current state: busy, serial_enable, bit_select.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Generated clock produced from the clock divider whose source clock is UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_enable | input | 1 | A signal to enable the transmission of the parity bit in the frame. |
data_valid | input | 1 | A signal that indicates that there is data to be transmitted. |
serial_enable | output | 1 | A signal to enable the operation of the serializer. |
bit_select | output | 2 | The output mux selection bits that select the output bit (start bit, serial data bit, parity bit, or stop bit). Their value is decided according to the current state of the transmission. |
serial_data_index | output | log2(DATA_WIDTH) (default value is 3) | A number between 0 and 7 that indicates the index of the bit to be transmitted serially. |
busy | output | 1 | A signal that indicates that the transmitter is currently in operation and it can't transmit new data. |
It produces serial bits from the parallel data (from LSB to MSB) according to the input index.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Generated clock produced from the clock divider whose source clock is UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parallel_data | input | DATA_WIDTH (default value is 8) | The data to be transmitted by the UART transmitter. |
serial_enable | input | 1 | A signal to enable the operation of the serializer. |
serial_data_index | input | log2(DATA_WIDTH) (default value is 3) | A number between 0 and 7 that indicates the index of the bit to be transmitted serially. |
serial_data | output | 1 | The bit that is serially transmitted from the UART transmitter. |
It calculates the parity bit according to the parity type.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Generated clock produced from the clock divider whose source clock is UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_type | input | 1 | A signal to indicate the parity type (1 for odd, 0 for even). |
parity_enable | input | 1 | A signal to enable the transmission of the parity bit in the frame. |
parallel_data | input | DATA_WIDTH (default value is 8) | The data to be transmitted by the UART transmitter. |
parity_bit | output | 1 | The parity bit of the parallel data to be transmitted. |
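
As a minimal sketch of the parity calculation, the following Python function follows the `parity_type` encoding from the table above (1 for odd, 0 for even) and the standard convention that the parity bit makes the total number of ones (data plus parity) even or odd. The function name `parity_bit` and its signature are illustrative only and are not taken from the RTL.

```python
def parity_bit(data: int, parity_type: int, data_width: int = 8) -> int:
    """Return the parity bit for `data` (parity_type: 1 = odd, 0 = even)."""
    masked = data & ((1 << data_width) - 1)
    ones = bin(masked).count("1")
    even_parity = ones & 1           # 1 when the data holds an odd number of ones
    return even_parity ^ parity_type  # invert the bit for odd parity


if __name__ == "__main__":
    print(parity_bit(0b00000111, parity_type=0))  # three ones in the data
```

In hardware terms this is the XOR reduction of the data bits, optionally inverted by `parity_type`.
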
It selects between the start bit, serial data bit, parity bit, and stop bit according to the current state of the transmission.
Port | Direction | Width | Description |
---|---|---|---|
bit_select | input | 2 | The output mux selection bits that selects the output bit (start bit, serial data bit, parity bit, or stop bit). |
serial_data | input | 1 | The bit that is serially transmitted from the UART transmitter (i.e. the output bit from the serializer). |
parity_bit | input | 1 | The parity bit of the parallel data to be transmitted. |
mux_out | output | 1 | The output of the mux that selects between start, serial data, parity, or stop bits according to the state of the transmission. |
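
Putting the serializer, parity calculator, and output mux together, the bit sequence on `serial_data_out` for one frame can be sketched in Python as follows. This is an illustrative model, not the RTL: the function name `build_frame` is hypothetical, and it assumes one start bit (0), data sent LSB first, an optional parity bit, and one stop bit (1).

```python
def build_frame(data: int, parity_enable: int, parity_type: int,
                data_width: int = 8) -> list:
    """Return the list of bits driven on the serial line for one frame."""
    bits = [0]                                             # start bit
    data_bits = [(data >> i) & 1 for i in range(data_width)]  # LSB first
    bits += data_bits
    if parity_enable:
        even_parity = sum(data_bits) & 1
        bits.append(even_parity ^ parity_type)             # 1 = odd, 0 = even
    bits.append(1)                                         # stop bit
    return bits


if __name__ == "__main__":
    print(build_frame(0xA5, parity_enable=1, parity_type=0))
```

With the default DATA_WIDTH of 8, a frame is 11 bits long when parity is enabled and 10 bits long otherwise.
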
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the `run.tcl` script.
The following figure illustrates how oversampling with prescale = 8 works: the received bit is sampled 3 times, and the output bit of the sampler is the value that appears most often among those 3 samples.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_type | input | 1 | A signal to indicate the parity type (1 for odd, 0 for even). |
parity_enable | input | 1 | A signal to enable the transmission of the parity bit in the frame. |
prescale | input | 6 | The ratio between the frequency of the receiver and the frequency of the transmitter (the available oversampling_prescale values are: 8, 16, 32). |
serial_data_in | input | 1 | The data which is received serially. |
data_valid | output | 1 | A signal to indicate that the received data was free of errors. |
parallel_data | output | DATA_WIDTH (default value is 8) | The data which is received serially bit by bit. |
parity_error | output | 1 | A signal to indicate that there is parity mismatch between the received parity bit and the calculated parity bit. |
frame_error | output | 1 | A signal to indicate that the start bit or the stop bit was incorrect. |
Note that if any omitted condition occurs, the current state won't change.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_enable | input | 1 | A signal to enable the transmission of the parity bit in the frame. |
prescale | input | 6 | The ratio between the frequency of the receiver and the frequency of the transmitter (the available prescale values are: 8, 16, 32). |
serial_data_in | input | 1 | The data which is received serially. |
start_bit_error | input | 1 | A signal to indicate that the sampled start bit is wrong (i.e. the samples are 011 or 111 or 110 or 101). |
parity_bit_error | input | 1 | A signal to indicate that the sampled parity bit is wrong. |
stop_bit_error | input | 1 | A signal to indicate that the sampled stop bit is wrong (i.e. the samples are 100 or 000 or 001 or 010). |
edge_count | input | 5 | A counter value which indicates the number of the current edge. Its value depends on the prescale value (because prescale of value 8 means that the counter should stop at 7 and wrap around again). |
edge_count_done | input | 1 | A signal to indicate that a full cycle of the UART transmitter clock has passed (when the prescale value is 8, edge_count_done becomes high when the edge counter value is 7). |
start_bit_check_enable | output | 1 | A signal to enable the operation of the start bit checker. |
parity_bit_check_enable | output | 1 | A signal to enable the operation of the parity bit checker. |
stop_bit_check_enable | output | 1 | A signal to enable the operation of the stop bit checker. |
edge_counter_and_data_sampler_enable | output | 1 | A signal to enable the operation of the edge counter and data sampler. |
deserializer_enable | output | 1 | A signal to enable the operation of the deserializer. |
data_index | output | log2(DATA_WIDTH) (default value is 3) | The index of the bit to be received in the frame. |
data_valid | output | 1 | A signal to indicate that the received data by the UART receiver was free of errors. |
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
prescale | input | 6 | The ratio between the frequency of the receiver and the frequency of the transmitter (the available prescale values are: 8, 16, 32). |
enable | input | 1 | A signal to enable the operation of the edge counter. |
edge_count | output | 5 | A counter value which indicates the number of the current edge. Its value depends on the prescale value (because prescale of value 8 means that the counter should stop at 7 and wrap around again). |
edge_count_done | output | 1 | A signal to indicate that a full cycle of the UART transmitter clock has passed (when the prescale value is 8, edge_count_done becomes high when the edge counter value is 7). |
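
The wrap-around behavior described for `edge_count` and `edge_count_done` can be sketched cycle by cycle in Python. This is a simplified model under assumptions: `enable` is held high for the whole trace, the counter starts from 0 after reset, and the function name `edge_counter_trace` is hypothetical.

```python
def edge_counter_trace(prescale: int, cycles: int) -> list:
    """Return (edge_count, edge_count_done) for each UART clock cycle."""
    count = 0
    trace = []
    for _ in range(cycles):
        done = int(count == prescale - 1)   # high on the last edge of a bit period
        trace.append((count, done))
        count = 0 if done else count + 1    # wrap around after prescale - 1
    return trace


if __name__ == "__main__":
    for count, done in edge_counter_trace(prescale=8, cycles=10):
        print(count, done)
```
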
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
serial_data_in | input | 1 | The data which is received serially. |
prescale | input | 5 | The ratio between the frequency of the receiver and the frequency of the transmitter (the available prescale values are: 8, 16, 32). These are the 5 MSBs of the prescale, because the data sampler module operates on the prescale after shifting its value. |
enable | input | 1 | A signal to enable the data sampler. |
edge_count | input | 5 | A counter value which indicates the number of the current edge. Its value depends on the prescale value (because prescale of value 8 means that the counter should stop at 7 and wrap around again). |
sampled_bit | output | 1 | The resulting sampled bit out of three samples taken at three different edges. It is equal to the bit appearing the most times in the samples (e.g. if samples = 101, sampled_bit = 1. if samples = 100, sampled_bit = 0). |
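
The majority vote behind `sampled_bit` can be sketched in Python. The voting rule follows the table above; the choice of sampling edges is an assumption (the three edges around the middle of the bit period, derived from the prescale shifted right by one), and the names `majority` and `sample_bit` are hypothetical.

```python
def majority(samples: list) -> int:
    """Return the value that appears at least twice among three samples."""
    return int(sum(samples) >= 2)


def sample_bit(line_samples: list, prescale: int) -> int:
    """Vote over three samples of one bit period (one entry per UART clock edge)."""
    half = prescale >> 1                     # prescale after shifting its value
    taps = (half - 1, half, half + 1)        # assumption: edges around mid-bit
    return majority([line_samples[i] for i in taps])


if __name__ == "__main__":
    noisy_one = [1, 1, 1, 1, 0, 1, 1, 1]     # a single glitch in the middle
    print(sample_bit(noisy_one, prescale=8))
```

A single corrupted sample near the middle of the bit is outvoted by the other two, which is the noise tolerance that oversampling provides.
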
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
enable | input | 1 | A signal to enable the operation of the deserializer. |
data_index | input | log2(DATA_WIDTH) (default value is 3) | The index of the bit to be received in the frame. |
sampled_bit | input | 1 | The resulting sampled bit out of three samples taken at three different edges. It is equal to the bit appearing the most times in the samples (e.g. if samples = 101, sampled_bit = 1. if samples = 100, sampled_bit = 0). |
parallel_data | output | DATA_WIDTH (default value is 8) | The data which is received serially bit by bit. |
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
parity_type | input | 1 | A signal to indicate the parity type (1 for odd, 0 for even). |
enable | input | 1 | A signal to enable the operation of the parity bit checker. |
sampled_bit | input | 1 | The sampled bit from the data sampler. It is always the parity bit for this module because it is only enabled when the parity bit is received. |
parallel_data | input | DATA_WIDTH (default value is 8) | The data which is received serially bit by bit. |
parity_bit_error | output | 1 | A signal to indicate that there is parity mismatch between the received parity bit and the calculated parity bit. |
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
enable | input | 1 | A signal to enable the operation of the start bit checker. |
sampled_bit | input | 1 | The sampled bit from the data sampler. It is always the start bit for this module because it is only enabled when the start bit is received. |
start_bit_error | output | 1 | A signal to indicate that the start bit is incorrect (the sampled bit is 1). |
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | UART clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
enable | input | 1 | A signal to enable the operation of the stop bit checker. |
sampled_bit | input | 1 | The sampled bit from the data sampler. It is always the stop bit for this module because it is only enabled when the stop bit is received. |
stop_bit_error | output | 1 | A signal to indicate that the stop bit is incorrect (the sampled bit is 0). |
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the `run.tcl` script.
Port | Direction | Width | Description |
---|---|---|---|
reference_clk | input | 1 | The source clock (UART clock). |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
clk_divider_enable | input | 1 | An enable signal for the clock divider. |
division_ratio | input | 6 | The division ratio of the clock divider (oversampling_prescale). It is connected to register_file[3]. |
output_clk | output | 1 | The output divided clock (UART_transmitter_clk). |
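
A behavioral sketch of an integer clock divider, written in Python with one output sample per source clock cycle, is shown below. It is not derived from the RTL: the function name `divided_clock` is hypothetical, and it assumes the output is high for `division_ratio // 2` source cycles and low for the rest of the period, so odd ratios give an unequal high and low time at this resolution.

```python
def divided_clock(division_ratio: int, source_cycles: int) -> list:
    """Return the divided clock level for each source clock cycle."""
    if not 2 <= division_ratio <= 32:
        # assumption: ratios below 2 are not modeled; 32 is the documented maximum
        raise ValueError("division_ratio must be between 2 and 32")
    high_cycles = division_ratio // 2
    return [int((n % division_ratio) < high_cycles) for n in range(source_cycles)]


if __name__ == "__main__":
    for ratio in (3, 5, 6, 8):               # ratios used in the test cases
        print(ratio, divided_clock(ratio, 2 * ratio))
```

The output period is always `division_ratio` source cycles, which is the property the waveform test cases check.
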
This module is verified through logic simulation in Modelsim. The simulation can be run using the `run.tcl` script.
Test case (1) (division_ratio = 3)
Test case (2) (division_ratio = 5)
Test case (3) (division_ratio = 6)
Test case (4) (division_ratio = 8)
Latch-based clock gating cell:
This module is replaced by the integrated clock gating (ICG) cell from the standard cell library. The ICG is a full-custom cell whose path delays are well balanced, which eliminates pulse clipping and spurious clocking (these issues can appear only after fabrication, due to imbalanced delays; they do not appear in simulation, where the latch-based clock gating cell behaves ideally). The replacement is done automatically by the `place_ICG_cell.tcl` script, which eases placing the ICG cell (in the backend flow) and removing it (in functional simulation and verification).
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The source clock to be gated (reference clock). |
clk_enable | input | 1 | An enable signal for the clock gating. |
gated_clk | output | 1 | The output gated clock (ALU clock). |
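
The behavior of a latch-based clock gate can be sketched in Python as a sampled-waveform model. It assumes the usual arrangement of a latch that is transparent while the clock is low, followed by an AND gate; the function name `gate_clock` is hypothetical, and each list entry represents one half clock cycle.

```python
def gate_clock(clk_samples: list, enable_samples: list) -> list:
    """Return gated_clk for each sample of clk and clk_enable."""
    latch_q = 0
    gated = []
    for clk, enable in zip(clk_samples, enable_samples):
        if clk == 0:
            latch_q = enable          # latch is transparent while the clock is low
        gated.append(clk & latch_q)   # AND gate: clock passes only when latched enable is 1
    return gated


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1, 0, 1]
    enable = [0, 0, 1, 1, 1, 0, 0, 0]
    print(gate_clock(clk, enable))
```

In this example the enable drops while the clock is high, yet the third clock pulse still passes at full width, because the latch holds its value until the clock goes low again.
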
This module is verified through logic simulation in Modelsim. The simulation can be run using the `run.tcl` script.
'ALU result' logic diagram:
'ALU result valid' logic diagram:
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The ALU result is synchronized to this clock (reference clock). |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
A | input | DATA_WIDTH (default value is 8) | ALU's first operand (it is connected to register_file[0]). |
B | input | DATA_WIDTH (default value is 8) | ALU's second operand (it is connected to register_file[1]). |
ALU_function | input | 4 | A binary code to determine the function of the ALU. |
enable | input | 1 | ALU enable signal. |
ALU_result_valid | output | 1 | A signal to indicate the ALU result is valid. |
ALU_result | output | 2 * DATA_WIDTH (default value is 16) | The result of the ALU. |
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the `run.tcl` script.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The read and write operations are synchronized to this clock (reference clock). |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
address | input | log2(REGISTER_FILE_DEPTH) (default value is 4) | The address of the register to be read from or written to. |
write_enable | input | 1 | A signal to enable the write operation. |
write_data | input | DATA_WIDTH (default value is 8) | The data to be written in the given address. |
read_enable | input | 1 | A signal to enable the read operation. |
read_data_valid | output | 1 | A signal to indicate that the data on the 'read_data' bus is a valid data. |
read_data | output | DATA_WIDTH (default value is 8) | The data read from the given address. |
register0 | output | DATA_WIDTH (default value is 8) | The first register in the register file, it stores the first operand of the ALU. |
register1 | output | DATA_WIDTH (default value is 8) | The second register in the register file, it stores the second operand of the ALU. |
register2 | output | DATA_WIDTH (default value is 8) | The third register in the register file, it stores the parity configuration (parity enable and parity type) of the UART. |
register3 | output | DATA_WIDTH (default value is 8) | The fourth register in the register file, it stores the value of the oversampling prescale used in the clock divider (it is the ratio between the clock frequency of the UART receiver and the clock frequency of the UART transmitter). |
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the `run.tcl` script.
It acts as a Gray-encoded bus synchronizer or a single-bit synchronizer according to the value of the BUS_WIDTH parameter. It consists of multiple registers (single-bit or multi-bit) connected in cascade, and the number of stages is parametrized with a default value of 2.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The clock of the destination domain. |
reset | input | 1 | Global active low asynchronous reset of the destination domain after synchronization. |
asynchronous_data | input | BUS_WIDTH (default value is 1) | The data to be synchronized (it is sent from another asynchronous domain to the destination domain). |
synchronous_data | output | BUS_WIDTH (default value is 1) | The data after synchronization to the destination domain. |
This module is verified through a Python script that generates all possible binary Gray codes of a given size. The testbench reads the Gray codes from the external file, and waveform simulation is performed in Modelsim. The simulation can be run using the `run.tcl` script.
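
The Gray-code generation step can be sketched in a few lines of Python. This is not the project's script; the function name `gray_codes` is hypothetical, and it uses the standard binary-reflected Gray code `n ^ (n >> 1)`.

```python
def gray_codes(width: int) -> list:
    """Return all binary-reflected Gray codes of the given bit width, in order."""
    return [n ^ (n >> 1) for n in range(1 << width)]


if __name__ == "__main__":
    for code in gray_codes(3):
        print(format(code, "03b"))
```

Consecutive codes differ in exactly one bit, which is why a Gray-encoded bus can pass through a multi-bit synchronizer: at most one bit is changing when the destination clock samples it.
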
This module is used to synchronize any arbitrary bus by synchronizing its 'data valid' signal through a bit synchronizer, then passing it through a pulse generator (flip-flop + NOT gate + AND gate) to produce a pulse whose width is one clock period of the destination domain. This pulse can be treated as a new 'data valid' signal synchronized to the destination domain, and the data can then be read safely from the 'asynchronous_data' port without the risk of entering metastability.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The clock of the destination domain. |
reset | input | 1 | Global active low asynchronous reset of the destination domain after synchronization. |
asynchronous_data_valid | input | 1 | A signal to indicate that the data on the 'asynchronous_data' bus is valid. |
asynchronous_data | input | BUS_WIDTH (default value is 8) | The data to be synchronized (it is sent from another asynchronous domain to the destination domain). |
Q_pulse_generator | output | 1 | The output of the pulse generator register. |
synchronous_data_valid | output | 1 | A signal to indicate that the synchronized data is valid. |
synchronous_data | output | BUS_WIDTH (default value is 8) | The data after synchronization to the destination domain. |
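
The bit synchronizer followed by the pulse generator can be sketched as a cycle-based Python model. It is a simplification under assumptions: each list entry is the value of `asynchronous_data_valid` seen at one destination clock edge, all registers reset to 0, metastability itself is not modeled, and the function name `data_valid_pulse` is hypothetical.

```python
def data_valid_pulse(valid_samples: list, stages: int = 2) -> list:
    """Return the pulse generator output for each destination clock cycle."""
    sync = [0] * stages        # bit synchronizer registers (index 0 is the first stage)
    q_pulse_generator = 0      # flip-flop inside the pulse generator
    pulses = []
    for valid in valid_samples:
        # Combinational output: synchronized valid AND NOT its delayed copy.
        pulses.append(sync[-1] & (q_pulse_generator ^ 1))
        # Rising clock edge: shift the synchronizer and update the pulse register.
        q_pulse_generator = sync[-1]
        sync = [valid] + sync[:-1]
    return pulses


if __name__ == "__main__":
    print(data_valid_pulse([0, 1, 1, 1, 1, 0, 0, 0]))
```

However long the source holds 'data valid' high, the output is a single one-cycle pulse, and the source must drive it low again before a second pulse can be produced.
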
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the `run.tcl` script.
This module consists of two submodules (UART Transmitter Controller and UART Receiver Controller). The transmitter controller sends an enable signal to the receiver controller to prevent it from processing a frame sent from the UART receiver while another frame is being sent through the UART transmitter.
This module consists of two FSMs:
- Main FSM: It controls the state which corresponds to which frame to send by the UART transmitter (e.g. TRANSMIT_REGISTER_FILE_DATA, TRANSMIT_LOWER_ALU_RESULT, TRANSMIT_UPPER_ALU_RESULT)
- Transmission FSM: It is used to determine the status of the transmission (e.g. NO_TRANSMISSION, TRANSMISSION_BEGAN, TRANSMISSION_ENDED). It is used in the Main FSM to transition its state when the transmission has ended.
Main FSM: This FSM controls the following output ports according to the current state: transmitter_parallel_data_valid, transmitter_parallel_data, UART_receiver_controller_enable.
The "WAIT_FOR_UPPER_ALU_RESULT" state exists so that "transmitter_parallel_data_valid" can be driven low, which lets the pulse generator accept a new "data_valid" signal. While "transmitter_parallel_data_valid" is high (because of the transmission of the lower ALU result), the output of the pulse generator's NOT gate is low, and it does not change until "transmitter_parallel_data_valid" goes low. After that, the output of the NOT gate is high and a new "data_valid" signal can be sent. To exit the "WAIT_FOR_UPPER_ALU_RESULT" state, the output of the pulse generator register, "Q_pulse_generator", must be low (to ensure that the zero sent during the "WAIT_FOR_UPPER_ALU_RESULT" state has reached the pulse generator); the upper ALU result can then be sent with a new "data_valid" signal. Note that "Q_pulse_generator" is asynchronous to the reference clock domain (it is generated in the UART transmitter clock domain), so a bit synchronizer is used to synchronize it to the reference clock domain.
Transmission FSM:
Note that if any omitted condition occurs, the current state won't change.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Reference clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
ALU_result_valid | input | 1 | A signal to indicate the ALU result is valid. |
ALU_result | input | 2 * DATA_WIDTH (default value is 16) | The result of the ALU. |
read_data_valid | input | 1 | A signal to indicate that the data on the 'read_data' bus is valid. |
read_data | input | DATA_WIDTH (default value is 8) | The data read from the given address from the register file. |
transmitter_busy_synchronized | input | 1 | The UART transmitter busy signal after synchronization. |
transmitter_Q_pulse_generator | input | 1 | The output signal of the pulse generator of the data synchronizer that synchronizes the UART transmitter data after being synchronized to the reference clock domain. |
transmitter_parallel_data_valid | output | 1 | A signal to indicate that there is new data to be transmitted. |
transmitter_parallel_data | output | DATA_WIDTH (default value is 8) | The data sent to the UART transmitter to transmit it serially. |
UART_receiver_controller_enable | output | 1 | A signal to enable the operation of the UART receiver controller. It is used to prevent the processing of frames while another frame is being sent by the UART transmitter. |
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the run.tcl script.
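Because "ALU_result" is 2 * DATA_WIDTH bits wide while "transmitter_parallel_data" is only DATA_WIDTH bits wide, the result is sent as two frames, lower half first. The following Python sketch shows only the splitting arithmetic. The function name is hypothetical and the snippet does not model the FSM timing.

```python
DATA_WIDTH = 8  # default value from the port table


def split_alu_result(alu_result, data_width=DATA_WIDTH):
    """Return the (lower, upper) halves of a 2*data_width-bit result."""
    mask = (1 << data_width) - 1
    lower = alu_result & mask
    upper = (alu_result >> data_width) & mask
    return lower, upper


# Example: 200 * 3 = 600 = 0x0258 does not fit in one 8-bit frame.
lower, upper = split_alu_result(200 * 3)
frames = [lower, upper]  # lower half is transmitted first
```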
This module controls the ALU control signals (ALU_function, ALU_enable, ALU_clk_enable) and register file control signals (address, write_enable, write_data, read_enable) based on the received frames from the UART receiver (i.e. according to the command to be executed).
The "EVALUATE_RESULT" state is a dummy state that delays the return to the "IDLE" state by one cycle, so that the "ALU_clk_enable" signal goes high for one cycle after the ALU result is evaluated.
The waveform after executing an ALU operation is shown in the following figure.
Note that if any omitted condition occurs, the current state won't change.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | Reference clock. |
reset | input | 1 | Global active low asynchronous reset after synchronization. |
enable | input | 1 | A signal to enable the operation of the controller. It is used to prevent the processing of frames while another frame is being sent by the UART transmitter. |
parallel_data_valid_synchronized | input | 1 | A synchronized signal to indicate that new data has been received by the UART receiver. |
parallel_data_synchronized | input | DATA_WIDTH (default value is 8) | The data received by the UART receiver. |
ALU_function | output | 4 | A binary code to determine the function of the ALU. |
ALU_enable | output | 1 | A signal to enable the operation of the ALU. |
ALU_clk_enable | output | 1 | A signal to enable the clock gating cell which outputs the ALU clock. |
address | output | log2(REGISTER_FILE_DEPTH) (default value is 4) | The address of the register to be read from or written to. |
write_enable | output | 1 | A signal to enable the write operation in the register file. |
write_data | output | DATA_WIDTH (default value is 8) | The data to be written in the given address in the register file. |
read_enable | output | 1 | A signal to enable the read operation in the register file. |
This module is verified through a self-checking testbench in Modelsim. The testbench can be run using the run.tcl script.
It is used to synchronize a global reset signal to different clock domains. It consists of multiple flip-flops connected in a cascaded scheme, and the number of stages is parametrized with a default value of 2.
Port | Direction | Width | Description |
---|---|---|---|
clk | input | 1 | The clock of the destination domain. |
reset | input | 1 | Unsynchronized global active low reset. |
reset_synchronized | output | 1 | The reset signal after synchronization to the destination domain. |
This module is verified through logic simulation in Modelsim. The simulation can be run using the run.tcl script.
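The behavior of a cascaded reset synchronizer can be sketched with a small Python model. It assumes the common arrangement in which the active-low reset clears all stages asynchronously and a constant 1 is shifted through the flip-flops after the reset is released, so that de-assertion is synchronous to the destination clock. The source only states that the stages are cascaded, so this arrangement and all names below are assumptions.

```python
class ResetSynchronizer:
    """Behavioral model: asynchronous assertion, synchronous de-assertion."""

    def __init__(self, stages=2):
        self.regs = [0] * stages  # assumed to start in reset

    def apply_reset(self, reset):
        """Asynchronous path: an active-low reset clears every stage at once."""
        if reset == 0:
            self.regs = [0] * len(self.regs)

    def clock(self, reset):
        """Rising clock edge: shift a constant 1 through the stages."""
        self.apply_reset(reset)
        if reset == 1:
            self.regs = [1] + self.regs[:-1]
        return self.regs[-1]  # models reset_synchronized


sync = ResetSynchronizer(stages=2)
# Reset held low for one edge, then released.
trace = [sync.clock(r) for r in [0, 1, 1, 1]]
```

With the default of 2 stages, the synchronized reset in this model is released on the second clock edge after the input reset goes high.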
The whole system is synthesized using Synopsys Design Compiler and "TSMC 130nm CL013G-FSG Process 1.2V MetroTM v1.0" Standard Cell Library. There are 3 scripts that are used in the synthesis process:
- 'constraints.tcl': It defines all clock sources using 'create_clock' and 'create_generated_clock', defines asynchronous clock groups using 'set_clock_groups -asynchronous', sets input and output delays, sets the output load on the output ports, and sets the different operating conditions to be used in timing analysis (slow-slow library is used for setup analysis, fast-fast library is used for hold analysis).
- 'logic_synthesis_script.tcl': It places the ICG instead of the RTL module, reads libraries, reads RTL modules, links and compiles the design, generates the reports, netlist, and SDF file, and replaces the ICG with the RTL module again (to be used in simulations).
- 'run.sh': It runs the 'logic_synthesis_script.tcl' and produces the log file which contains all the steps of the synthesis and their outputs and cleans the directory from temporary files after the synthesis is done.
Formal verification takes the original RTL modules (golden RTL files) and the netlist generated from the synthesis process, performs functional equivalence checking, and reports whether the two designs (RTL and netlist) have the same functionality. The formal verification is run using Synopsys Formality. There are 2 scripts that are used in the formal verification process:
- 'formal_verification_script.tcl': It places the ICG instead of the RTL module, reads libraries, reads golden RTL modules, reads the netlist generated from the synthesis process, compares the two designs, and replaces the ICG with the RTL module again (to be used in simulations).
- 'run.sh': It runs the 'formal_verification_script.tcl' and produces the log file which contains all the steps of the formal verification and their outputs and cleans the directory from temporary files after the verification is done.
DFT is used to insert additional logic (not used in functional mode) to ensure that the chip is free of manufacturing errors. The main concept is to create scan chains consisting of all registers in the system to be able to observe and control most of the nodes of the circuit (i.e. achieve high coverage). After DFT insertion, all the flip-flops (except shift registers) are replaced with scan flip-flops.
DFT ports:
- scan_clk: The clock which the system uses in test mode
- scan_reset: The reset which the system uses in the test mode
- SE: Scan enable signal. When SE = 1, all the registers in the scan chain act as a single shift register. When SE = 0, the output of each register passes through the combinational logic following it (this is used to test nodes in the circuit). SE is connected to the SE pins of all the scan flops.
- test_mode: A signal to indicate whether the chip is operating in the functional mode or the test mode.
- SI: The input to the scan chain. The width of this port is the number of the scan chains
- SO: The output from the scan chain. The width of this port is the number of the scan chains
Number of scan chains: These results are produced after the logic synthesis indicating that there should exist 3 scan chains in the system if each chain has 100 flip-flops.
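The relation between the flip-flop count and the number of chains is a ceiling division, sketched below in Python. The total flip-flop count is not stated in this section, so the value 280 is purely hypothetical. Any count from 201 to 300 gives 3 chains when each chain holds at most 100 flip-flops.

```python
import math


def scan_chain_count(flip_flops, max_chain_length=100):
    """Smallest number of chains so that no chain exceeds max_chain_length."""
    return math.ceil(flip_flops / max_chain_length)


# 280 is a hypothetical flip-flop count, not a figure from the reports.
example = scan_chain_count(280)
```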
The logic of the scan_clk and scan_reset is manually inserted by using 2x1 muxes for all the clocks and resets in the system. The clock gating cell is bypassed by connecting its enable pin with (test_mode | enable) (i.e. the clock gating cell will be always enabled in the test mode). This leads to a violation on this cell because it is not controllable.
There are 3 modes of operation after the DFT logic is inserted:
- Functional mode: test_mode = 0, SE = 0
- Test scan mode: test_mode = 1, SE = 1
- Test capture mode: test_mode = 1, SE = 0
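The manually inserted test logic and the three modes can be summarized with a small Python truth-table sketch. It only restates the behavior described above (2x1 muxes selected by test_mode, and the clock gating enable ORed with test_mode). The function names and the string signal values are hypothetical and used for illustration only.

```python
def mux2(select, in0, in1):
    """2x1 multiplexer: in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def dft_signals(test_mode, clk, scan_clk, reset, scan_reset, gate_enable):
    """Signals seen by the logic after the test muxes are inserted."""
    return {
        "clk": mux2(test_mode, clk, scan_clk),
        "reset": mux2(test_mode, reset, scan_reset),
        # The clock gating cell is always enabled in test mode.
        "gating_enable": test_mode | gate_enable,
    }


def operating_mode(test_mode, se):
    """Map (test_mode, SE) to the mode names listed above."""
    modes = {(0, 0): "functional", (1, 1): "test scan", (1, 0): "test capture"}
    # (0, 1) is not listed in the source, so it is reported as undefined.
    return modes.get((test_mode, se), "undefined")


in_test = dft_signals(1, "ref_clk", "scan_clk", "rst", "scan_rst", 0)
in_functional = dft_signals(0, "ref_clk", "scan_clk", "rst", "scan_rst", 0)
```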
The whole system is synthesized and DFT insertion is performed using Synopsys DFT Compiler and "TSMC 130nm CL013G-FSG Process 1.2V MetroTM v1.0" Standard Cell Library. The synthesis and DFT insertion are repeated 3 times (once for each mode of operation) to ensure that there are no violations in any mode. There are 3 scripts that are used in the DFT insertion process:
- 'constraints.tcl': Same constraints as the logic synthesis process, but it additionally defines the scan clock to be used in timing analysis.
- 'DFT_script.tcl': Same operations as the logic synthesis process, but it additionally defines the DFT ports, identifies the shift registers in the design so that they are not replaced with scan flip-flops, and inserts the DFT logic.
- 'run.sh': It runs the 'DFT_script.tcl' 3 times by using the 'change_mode.py' script which changes the mode of operation automatically and produces the log file which contains all the steps of the synthesis and their outputs and cleans the directory from temporary files after the DFT insertion is done.
There exist 3 don't-compare points, which are the 3 output pins of the SO port. They are not verified because they don't exist in the golden RTL files (i.e. their logic was automatically inserted by the DFT tool).
In this step, the netlist generated from the DFT process, the standard cells' LEF file, the technology LEF file (6 metal layers are used), and the system_top LEF file (which specifies the layout of the whole system) are loaded. The MMMC (multi-mode multi-corner) constraints file is also loaded. This file defines the three modes of operation of the system (as explained in the DFT section) and all the corners to be used in static timing analysis (the corners are defined by the fast-fast library and the slow-slow library).
The chip size is 240.67 × 160.0 μm².
The power planning step is performed to reduce the IR drop and minimize the effect of electromigration. In this step: power rings, power stripes, and power rails are inserted.
Place the standard cells and optimize if there is any timing violation.
Build the clock tree to minimize the clock skew between registers and optimize if there is any timing violation.
Route the standard cells and optimize if there is any timing violation.
Insert filler cells, generate gate level netlist (used in gate-level simulation (GLS)), gate level netlist with PG pins (used for analog IR drop simulation), SDF file (used in GLS), and GDS file which is used in manufacturing of the chip.
There exist 3 don't-compare points, which are the 3 output pins of the SO port. They are not verified because they don't exist in the golden RTL files (i.e. their logic was automatically inserted by the DFT tool).
Gate-level simulation using:
- SDF file generated from the physical design process.
- Testbench used in the RTL functional verification.
- Gate-level netlist generated from the physical design process.
- Verilog standard cell library.