Debug Network Flow Control #2

Open
wsong83 opened this issue Jan 14, 2016 · 6 comments
Comments

@wsong83
Member

wsong83 commented Jan 14, 2016

One concern, already discussed by email with Stefan, is a possible deadlock: debug modules cannot guarantee that a packet will be fully accepted, i.e. that their local buffer has enough space for it.

Here is another one.
The Debug Packet Datagram (DPD) used between the host and the physical transport has a length field in its header. However, debug packets themselves have no length field.
The host interface therefore has to receive the whole packet to count its length before it can send the packet out in DPD format.
This requires the host interface to have enough space for at least one debug packet; however, according to the spec, a debug packet can be arbitrarily long.
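
A minimal sketch of the buffering problem described above, in Python. The flit model (a payload word plus a last flag), the function name `to_datagram`, the `max_flits` parameter, and the `[length, word0, ...]` output layout are assumptions made for illustration; they are not taken from the spec.

```python
from typing import Iterable, List, Tuple

# Hypothetical flit model: (payload word, last flag).
Flit = Tuple[int, bool]


def to_datagram(flits: Iterable[Flit], max_flits: int) -> List[int]:
    """Buffer one whole packet, then emit [length, word0, word1, ...].

    The length is only known once the last flit has arrived, so the
    buffer must hold the complete packet. max_flits models the size
    of that buffer.
    """
    buffer: List[int] = []
    for word, last in flits:
        if len(buffer) == max_flits:
            # An unbounded packet length cannot be supported here.
            raise OverflowError("packet exceeds host interface buffer")
        buffer.append(word)
        if last:
            return [len(buffer)] + buffer
    raise ValueError("stream ended before the last flit")
```

With an 8-flit buffer, `to_datagram([(0xA, False), (0xB, False), (0xC, True)], 8)` yields `[3, 0xA, 0xB, 0xC]`, while any packet longer than the buffer is rejected. This is why an arbitrary packet length and a length-prefixed format do not fit together without a size limit.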

Besides, consider the scenario where the host interface is busy and a long debug packet arrives for it. The host interface is supposed to ignore the packet (ready low) because it is busy, so the packet continues cycling on the NoC. The number of pipeline stages available in the ring NoC is limited (it equals the number of ring routers). What happens when the packet header comes back around to the sending router while that router is still sending flits? Is the header lost, or must the router stop sending flits? If the router stops sending, is it supposed to remember where to resume? And could another router see a gap (once the host interface begins to accept) and start sending a packet even though the previous packet has not finished? One solution could be to require all routers to track tail flits.

@wsong83 wsong83 changed the title Concerns related to the langth of debug packets Concerns related to the length of debug packets Jan 14, 2016
@wallento
Member

Hi Wei,

you are right, the spec should be updated to include a size limit for debug packets. This limit is the minimum buffer size of the new Host Interface Module, which is derived from https://github.com/TUM-LIS/lisnoc/blob/master/rtl/infrastructure/lisnoc_packet_buffer.v
This should solve the inconsistencies, right?

Regarding the second point: packets generally cannot pass their destination. If the destination is not ready, the ring blocks. Otherwise you would run into serious deadlock problems.
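
A toy Python model of this blocking behaviour, as a sketch only. It assumes one flit slot per router and that every flit on the ring is bound for the same destination; the function name `step` and its parameters are hypothetical.

```python
from typing import List, Optional


def step(slots: List[Optional[int]], dest: int, dest_ready: bool) -> Optional[int]:
    """Advance a uni-directional ring by one cycle.

    slots[i] is the flit buffered at router i, or None. A flit at
    router dest leaves the ring only when dest_ready is true.
    Otherwise it stays put and the flits behind it stall.
    Returns the ejected flit, if any.
    """
    n = len(slots)
    ejected = None
    if slots[dest] is not None and dest_ready:
        ejected, slots[dest] = slots[dest], None
    # Walk upstream from the destination so that a slot freed this
    # cycle can be refilled by its upstream neighbour. The flit at
    # dest is never a mover, so it cannot pass its destination.
    for k in range(1, n):
        i = (dest - k) % n
        nxt = (i + 1) % n
        if slots[i] is not None and slots[nxt] is None:
            slots[nxt], slots[i] = slots[i], None
    return ejected
```

Starting from `[1, 2, None, None]` with `dest=2` and the destination not ready, the flits pile up as `[None, 1, 2, None]` and stay there on every further cycle until `dest_ready` becomes true.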

But this leads me to my plan to open a long-term issue to discuss flow control in the debug network. I will hijack this one now :)

wallento added a commit that referenced this issue Jan 14, 2016
As highlighted in #2 the packet cannot be arbitrary long as it has to
be fully buffered before being transfered to the host.
@wallento wallento changed the title Concerns related to the length of debug packets Debug Network Flow Control Jan 14, 2016
@wsong83
Member Author

wsong83 commented Jan 14, 2016

I am still not sure how the NoC can block, considering that it is distributed and has no backpressure.
A global enable/block signal would be rather heavy.
There also seem to be potential clock domain issues:
what happens if the routers are not all in the same clock domain?
Maybe you can force them into one domain by putting cross-domain FIFOs inside the ring routers or even the debug modules.
For the first test chip, I assume we can ignore the clock domain issue as there is no DVFS yet.

@wallento
Member

There is flit flow control from router to router for backpressure.
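
As a rough illustration of router-to-router flit flow control, the Python sketch below models each router's buffer as a small FIFO that exposes `valid` and `ready`. The class name `OutputBuffer`, the `depth` parameter, and the `transfer` helper are assumptions for this example, not names from the actual RTL.

```python
from collections import deque


class OutputBuffer:
    """A bounded FIFO of flits with valid/ready status."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.fifo: deque = deque()

    @property
    def ready(self) -> bool:
        # The upstream side may push while there is free space.
        return len(self.fifo) < self.depth

    @property
    def valid(self) -> bool:
        # The downstream side may pop while a flit is available.
        return len(self.fifo) > 0


def transfer(src: OutputBuffer, dst: OutputBuffer) -> bool:
    """Move one flit in this cycle only if src is valid and dst is ready."""
    if src.valid and dst.ready:
        dst.fifo.append(src.fifo.popleft())
        return True
    return False
```

When `dst` is full, `transfer` returns `False` and the flit stays in `src`. Chained around a ring, this local stall is what propagates backpressure hop by hop without any global block signal.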

All routers must be in the same clock domain. Clock domain crossing should occur in each debug module between the debugged clock domain and the debug clock domain. There can be multiple different debugged clock domains, but only one debug clock domain.

Does that sound reasonable?

@wallento
Member

@wsong83
Member Author

wsong83 commented Jan 14, 2016

Yes, router-to-router flow control should be fine.
So far I am OK with the clock domain approach, but it may need more investigation once DVFS is actually implemented. My small concern is the number of FIFOs needed for clock domain crossing if the FIFOs are in the debug modules.
Thanks for the clarification.

@wallento
Member

Debug Network

The debug network is a ring with the following properties:

  • Output-buffered to allow backpressure
  • Flit flow control with valid and ready
  • Multi-flit packets with first and last bit
  • Wormhole routing, meaning no packet can be interleaved with another
  • Distributed routing based on a header field
  • Deadlock-free at the packet level (message-dependent deadlocks are still possible)
  • Uni-directional, or two uni-directional rings combined to form a bi-directional one
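
To make the wormhole property concrete, here is a small Python sketch of an output port that grants one input at a flit marked first and keeps that grant until the flit marked last has passed. The `WormholeMux` class, the flit tuple layout, and the fixed-priority choice between inputs are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical flit model: (payload, first flag, last flag).
Flit = Tuple[str, bool, bool]


class WormholeMux:
    """Output port that holds its grant from first flit to last flit."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None  # input currently holding the output

    def select(self, heads: List[Optional[Flit]]) -> Optional[int]:
        """Return the index of the input that may send this cycle, or None."""
        if self.owner is None:
            # A new packet is granted only at a flit marked first.
            # Assumption: the lower input index wins.
            for i, flit in enumerate(heads):
                if flit is not None and flit[1]:
                    self.owner = i
                    break
        granted = self.owner
        if granted is not None:
            flit = heads[granted]
            if flit is None:
                # The owner has a gap; the output stays reserved.
                return None
            if flit[2]:
                # The last flit releases the output.
                self.owner = None
        return granted
```

A gap in the middle of a packet does not release the output, so a second sender cannot slip its packet in between the flits of the first one.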

Host Interface

The host interface has the same width as the debug network. Instead of the extra first and last bits, it converts each packet to a length-value format. The same applies in the other direction.
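
For the host-to-network direction, the conversion can be sketched in Python as follows. The datagram layout `[length, word0, ...]`, the flit tuple `(word, first, last)`, and the name `from_datagram` are hypothetical and chosen only to illustrate the idea.

```python
from typing import List, Tuple

# Hypothetical flit model: (payload word, first flag, last flag).
Flit = Tuple[int, bool, bool]


def from_datagram(datagram: List[int]) -> List[Flit]:
    """Turn [length, word0, word1, ...] into flits with first/last bits."""
    if not datagram:
        raise ValueError("empty datagram")
    length, words = datagram[0], datagram[1:]
    if length == 0 or length != len(words):
        raise ValueError("length field does not match payload")
    return [(w, i == 0, i == length - 1) for i, w in enumerate(words)]
```

Because the length arrives first in this direction, flits can in principle be forwarded as they come in, which is not possible in the network-to-host direction.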

Packets from the Host

Packets from the host go into the debug module. For the moment we can safely assume they are processed sufficiently fast.

Packets from the Debug Modules

The debug modules can produce packets at an arbitrary rate.

Backpressure from the Host Interface

The host interface can be blocked, so that backpressure propagates into the network. If the network can no longer absorb it with its buffers, the debug modules are blocked in turn. Hence the debug modules should be able to buffer debug packets. When a debug module's buffer overflows, it has to raise an overflow signal.

Two strategies for handling overflow signals are currently planned:

  • Drop packets until the next packet fits into the local buffer. A small extra packet reporting the number of lost packets should be inserted.
  • Clock gating of the whole system, except peripherals. This needs further discussion: in previous use cases it was pretty robust, but we only had very little I/O.
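
The first strategy could look roughly like the Python sketch below. It assumes the buffer capacity is counted in packets and that the loss report occupies one slot like any other packet; the `DroppingBuffer` class and the `("lost", n)` / `("data", packet)` tags are made up for this example.

```python
from collections import deque
from typing import Deque, Tuple


class DroppingBuffer:
    """Local packet buffer that drops on overflow and reports the loss."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumption: counted in packets
        self.queue: Deque[Tuple[str, object]] = deque()
        self.lost = 0             # packets dropped since the last report

    def push(self, packet: object) -> bool:
        # After an overflow, the loss report and the packet must both fit.
        needed = 2 if self.lost else 1
        if len(self.queue) + needed > self.capacity:
            self.lost += 1
            return False
        if self.lost:
            self.queue.append(("lost", self.lost))
            self.lost = 0
        self.queue.append(("data", packet))
        return True

    def pop(self) -> Tuple[str, object]:
        return self.queue.popleft()
```

The host then sees a `("lost", n)` entry directly before the first packet that was accepted after the overflow, so it knows where the trace has a hole and how large it is.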

Problem: How to implement proper flow control?

This leads to the question of how to implement proper flow control, namely how to prevent one debug module from blocking another. I think there are many strategies out there. I will think about it, but maybe we can also get some real networking/NoC experts into the discussion.
