At its root, #750 is about a race condition between the DMA
engine and the STM32 Ethernet, and the host CPU. We can
prevent this race by properly maintaining the Rx tail pointer.
Some background information to contextual this change:
The device and host participate in a simple protocol to drive
a state machine: the host gives descriptors to the device
that specify buffers to DMA incoming Ethernet frames into.
The host gives a descriptor to the device by setting the
ownership bit and setting the tail pointer: this driver
handles the former properly, but does not handle the latter
correctly.
The device consumes these descriptors as frames arrive and
it DMAs them host memory. In turn, the device lets the host
know that it is done with a descriptor by doing two things:
1. Resetting the ownership bit in the descriptor, and
2. Sending an interrupt to the host.
Internally, the device _also_ increments its "head" pointer in
the Rx ring (modulo the size of the ring). If the device hits
the end of the ring (as specified by the value in the Rx tail
pointer register), it goes into a "Suspended" state and stops
DMAing packets to the host; it resumes when the Rx tail pointer
register is written again (note: writing a value that is the
same as the head value on the _first_ write doesn't seem to
start the device).
Both device and host maintain (or, perhaps more precisely,
should maintain) two pointers into the ring: a producer pointer,
and a consumer pointer.
One may think of this as the host producing available
descriptors that are consumed by the device, and the device
producing received frames that are consumed by the host.
The tail pointer sets a demarkation point between the host
and device: descriptors before the tail pointer are owned by
the device, those after are owned by the host. The ownership
bit is used to communicate to the host that the device is
done with a given descriptor.
Put another way, the tail pointer represents the host's producer
pointer; the host's consumer pointer is simply the next
descriptor in the ring after the last one that it processed.
The device's consumer pointer is whatever the host set the tail
pointer to, and it's producer pointer is its head pointer. If
the device encounters a situation where _its_ head and tail
pointer are equal, the ring is full, and the DMA engine enters
the suspended state until the tail pointer is rewritten.
An invariant that both sides of the protocol must maintain is
that host software ensures that it's producer pointer (that
is, what it sets the tail pointer to) is always equal to or
(ie, the ring is full), or greater than (ie, there is space
available in the ring) to the devices's producer pointer
(that is, the devices's "head" pointer).
Of course, we talk about an ordering relationship here,
but all of this is happening modulo the size of the ring,
but that's a detail.
Note that there are two cases here:
1. The first time you write to the tail pointer to set
things in motion at the beginning of time, and
2. Every other time.
For the first time, this change writes the address of the last
valid descriptor in the ring into the tail pointer register (not
one beyond the last!), as we have observed there is no non-racy
way to provide the full ring to the device immediately after
reset: writing the beginning of the ring to the tail pointer
does not start the peripheral, and writing just beyond the end
of it open us up to a race against updating the first descriptor
(until the first time the tail pointer is reset).
After consuming and resetting a descriptor on the host, we write
the address of the next descriptor in the ring to the tail
pointer, indicating that the just-processed descriptor is
available to the device. As we do this for every received frame
we do not starve the ring.
In this scenario, the device follows the host around the ring;
the problem with the first write is solved since, after the
first frame reception, the device's head pointer will be
somewhere beyond the first descriptor in the ring. Thus,
writing the tail pointer to the address of (say) the 0'th
element is fine, since that won't be equal to the head pointer
on the peripheral. Similarly, all available descriptors can
be consumed by the device.
Finally, this allows for some simplification with respect to
barriers: since the tail pointer indicates which descriptors
are owned by the device and which by the host, the host can
batch updates to the descriptors and do a single flush at the
end of its processing, write before writing to the tail pointer.
This was already done in `rx_notify`.
A similar change for the Tx ring has also been sent.
I have validated that, whereas I could reproduce this issue in
a demo branch prior to this change, I cannot reproduce it with
this change.