Merge remote-tracking branch 'upstream/master'
Robbbert committed May 14, 2024
2 parents 3967dcb + b5205c7 commit d95d2e3
Showing 5 changed files with 407 additions and 8 deletions.
167 changes: 167 additions & 0 deletions docs/source/techspecs/cpu_device.rst
@@ -0,0 +1,167 @@
CPU devices
===========

.. contents:: :local:


1. Overview
-----------

CPU device derivatives are used, unsurprisingly, to implement the
emulation of CPUs, MCUs and SoCs. A CPU device is first of all a
combination of ``device_execute_interface``,
``device_memory_interface``, ``device_state_interface`` and
``device_disasm_interface``. Refer to the associated documentation
where it exists.

Two further features are specific to CPU devices: the DRC and
interruptibility support.


2. DRC
------

TODO.


3. Interruptibility
-------------------

3.1 Definition
~~~~~~~~~~~~~~

An interruptible CPU is defined as a core which is able to suspend the
execution of an instruction at any time, exit ``execute_run``, then at
the next call of ``execute_run`` keep going from where it was. This
includes being able to abort an issued memory access, quit
``execute_run``, then upon the next call of ``execute_run`` reissue the
exact same access.


3.2 Implementation requirements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Memory accesses must be done with ``read_interruptible`` or
``write_interruptible`` on a ``memory_access_specific`` or a
``memory_access_cache``. Each access must be of bus width and
bus-aligned.

After each access the core must test whether ``icount <= 0``. This
test should be done after ``icount`` has been decremented by the time
taken by the access itself, to limit the number of tests. When
``icount`` reaches 0 or less it means that the instruction emulation
needs to be suspended.

To know whether the access needs to be re-issued,
``access_to_be_redone()`` needs to be called. If it returns true then
the time taken by the access needs to be credited back, since it
hasn't yet happened, and the access will need to be re-issued. The
call to ``access_to_be_redone()`` clears the reissue flag. If you
need to check the flag without clearing it use
``access_to_be_redone_noclear()``.

The core needs to do enough bookkeeping to eventually restart the
instruction execution just before the access or just after the test,
depending on whether the access needs to be reissued.

Finally, to signal interruptibility support to the rest of the
infrastructure, the core must override ``cpu_is_interruptible()`` to
return true.


3.3 Example implementation with generators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To ensure decent performance, the current implementations (h8, 6502
and 68000) use a Python generator to generate two versions of each
instruction interpreter, one for the normal emulation, and one for
restarting the instruction.

The restarted version looks like this (for a CPU taking 4 cycles per
access):

.. code-block:: C++

    void device::execute_inst_restarted()
    {
        switch(m_inst_substate) {
        case 0:
            [...]

            m_address = [...];
            m_mask = [...];
            [[fallthrough]];
        case 42:
            m_result = specific.read_interruptible(m_address, m_mask);
            m_icount -= 4;
            if(m_icount <= 0) {
                if(access_to_be_redone()) {
                    m_icount += 4;
                    m_inst_substate = 42;
                } else
                    m_inst_substate = 43;
                return;
            }
            [[fallthrough]];
        case 43:
            [...] = m_result;
            [...]
        }
        m_inst_substate = 0;
        return;
    }

The non-restarted version is the same thing with the switch and the
final ``m_inst_substate`` clearing removed.

.. code-block:: C++

    void device::execute_inst_non_restarted()
    {
        [...]
        m_address = [...];
        m_mask = [...];
        m_result = specific.read_interruptible(m_address, m_mask);
        m_icount -= 4;
        if(m_icount <= 0) {
            if(access_to_be_redone()) {
                m_icount += 4;
                m_inst_substate = 42;
            } else
                m_inst_substate = 43;
            return;
        }
        [...] = m_result;
        [...]
        return;
    }

The main loop then looks like this:

.. code-block:: C++

    void device::execute_run()
    {
        if(m_inst_substate)
            call appropriate restarted instruction handler
        while(m_icount > 0) {
            debugger_instruction_hook(m_pc);
            call appropriate non-restarted instruction handler
        }
    }

The idea is thus that ``m_inst_substate`` indicates where in an
instruction one is, but only when an interruption happens. It
otherwise stays at 0 and is essentially never looked at. Having two
versions of the interpreter removes the overhead of the switch and of
the end-of-instruction substate clearing from the normal path.

Using a generator-based method is not a requirement, but an
alternative which does not have unacceptable performance implications
has not yet been found.
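
The substate mechanism can be tried outside MAME with a small
stand-alone model. In the sketch below, ``mock_bus``, ``mock_core``
and all their members are hypothetical stand-ins, not MAME classes.
The way the mock bus forces ``icount`` to 0 when it aborts an access is
an assumption made for the illustration. Only the control flow is meant
to mirror the listings above, and for brevity a single resumable
handler is used instead of the two generated versions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a memory_access_specific; not a MAME class.
struct mock_bus {
    std::uint8_t mem[16] = {};
    int stall_reads = 0;   // number of upcoming reads that will be aborted
    bool redo = false;     // reissue flag, set when an access is aborted

    std::uint8_t read_interruptible(unsigned address, int &icount) {
        if (stall_reads > 0) {
            --stall_reads;
            redo = true;
            icount = 0;    // assumption: an aborted access ends the timeslice
            return 0;
        }
        return mem[address & 15];
    }

    // Returns the reissue flag and clears it.
    bool access_to_be_redone() { bool r = redo; redo = false; return r; }
};

// Hypothetical core with one instruction: add the byte at pc to acc.
struct mock_core {
    mock_bus &bus;
    int icount = 0;
    int substate = 0;      // 0 means "not in the middle of an instruction"
    unsigned pc = 0, address = 0, acc = 0;
    std::uint8_t result = 0;
    int instructions_done = 0;

    explicit mock_core(mock_bus &b) : bus(b) {}

    void execute_inst() {
        switch (substate) {
        case 0:
            address = pc;
            [[fallthrough]];
        case 1:
            result = bus.read_interruptible(address, icount);
            icount -= 4;
            if (icount <= 0) {
                if (bus.access_to_be_redone()) {
                    icount += 4;   // the access did not happen: credit it back
                    substate = 1;  // resume just before the access
                } else
                    substate = 2;  // resume just after the test
                return;
            }
            [[fallthrough]];
        case 2:
            acc += result;
            ++pc;
            ++instructions_done;
        }
        substate = 0;
    }

    void execute_run(int cycles) {
        icount = cycles;
        if (substate)
            execute_inst();        // finish the suspended instruction first
        while (icount > 0)
            execute_inst();
    }
};
```

Running the model with one stalled read and short timeslices shows an
instruction being suspended before its access, then another one being
suspended after it, with the accumulator only updated once each
instruction actually completes.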


3.4 Interaction with DRC
~~~~~~~~~~~~~~~~~~~~~~~~

At this point, interruptibility and DRC are entirely incompatible. We
do not have a method to quit the generated code before or after an
access. It's theoretically possible but definitely non-trivial.

1 change: 1 addition & 0 deletions docs/source/techspecs/index.rst
@@ -16,6 +16,7 @@ MAME’s source or working on scripts that run within the MAME framework.
device_rom_interface
device_disasm_interface
memory
cpu_device
floppy
nscsi
m6502
156 changes: 149 additions & 7 deletions docs/source/techspecs/memory.rst
@@ -276,6 +276,77 @@ or the view can be disabled using the ``disable`` method. A disabled
view can be re-enabled at any time.


.. _3.5:

3.5 Bus contention handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some specific CPUs have been upgraded to be interruptible, which makes
it possible to add bus contention and wait state capabilities. Being
interruptible means, in practice, that an instruction can be
interrupted at any time and the ``execute_run`` method of the core
exited. Other devices can then run, and eventually control returns to
the core and the instruction continues from the point where it was
interrupted. Importantly, this can be triggered from a handler, and
can even be used to interrupt just before the access that is currently
being done (i.e. the continuation will redo the access).

CPUs that support this declare the capability by overriding the
method ``cpu_is_interruptible`` to return true.

Three intermediate contention handlers can be added to accesses:

* ``before_delay``: wait a number of cycles before doing the access.
* ``after_delay``: wait a number of cycles after doing the access.
* ``before_time``: wait for a given time before doing the access.

For the delay handlers, a method or lambda is called which returns the
number of cycles to wait (as a u32).

The ``before_time`` handler is special. Time is measured against the
current value of ``cpu->total_cycles()``, which is the number of
cycles elapsed since the last reset of the CPU. That value is passed
to the method as a u64, and the method must return, also as a u64, the
earliest time at which the access can be done, which can be equal to
the passed-in time. From there two things can happen. If the running
CPU has enough cycles left to consume to reach that time, the
necessary number of cycles is consumed and the access is done.
Otherwise the remaining cycles are consumed, the access is aborted,
scheduling happens, and eventually the access is redone. In that case
the method is called again with the new current time, and must return
the (probably identical) earliest time again. This repeats until
enough cycles are available to do the access directly.
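
The two outcomes can be modelled with a small stand-alone function.
This is only a sketch of the decision described above, not MAME's
internal code: ``attempt_access`` and ``attempt_result`` are
hypothetical names, and the exact boundary condition (whether a wait
equal to the remaining cycles still lets the access through) is an
assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using u64 = std::uint64_t;
using offs_t = std::uint32_t;

// Hypothetical result of one attempt at a before_time-guarded access.
struct attempt_result {
    bool done;   // true if the access went through
    u64 now;     // CPU time, in cycles, after the attempt
    int icount;  // cycles left in the current timeslice
};

// Model of one attempt: ask the handler for the earliest access time,
// then either consume the wait and do the access, or burn the rest of
// the timeslice and abort so that the access is retried later.
inline attempt_result attempt_access(offs_t address, u64 now, int icount,
        const std::function<u64 (offs_t, u64)> &before_time)
{
    assert(icount > 0);
    const u64 earliest = before_time(address, now);
    assert(earliest >= now);            // the handler may not answer in the past
    const u64 wait = earliest - now;
    if (wait < u64(icount))             // assumption: strictly fewer cycles needed
        return { true, earliest, icount - int(wait) };
    return { false, now + u64(icount), 0 };
}
```

With a handler that always answers 100, an attempt at time 90 with 20
cycles left succeeds at time 100, while the same attempt with only 5
cycles left is aborted at time 95 and has to be repeated in a later
timeslice.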

This approach makes it possible, for instance, to handle consecutive
DMAs. A first DMA grabs the bus for a transfer. The method then
answers with the end time of that DMA as the earliest time for the
access. If no timer fires before that time, the access happens just
after the DMA finishes. But if a timer elapses before that and, as a
consequence, another DMA is queued while the first is running, the
cycle is aborted for lack of remaining time and the method is
eventually called again. It then returns the time at which the second
DMA will finish, and all is well.
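
A minimal model of such a handler is sketched below. The
``dma_bus_model`` type and its members are hypothetical and only
illustrate the bookkeeping. The ``before_time`` method has the same
shape as the handler described above (offset and current time in, the
earliest access time out).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;
using offs_t = std::uint32_t;

// Hypothetical model of a bus that DMA transfers can hold.
struct dma_bus_model {
    u64 busy_until = 0;   // time at which the last queued DMA ends

    // Queue a transfer of the given length; it starts once the bus is free.
    void start_dma(u64 now, u64 length) {
        const u64 start = std::max(now, busy_until);
        busy_until = start + length;
    }

    // before_time-style handler: the CPU may access the bus once it is free.
    u64 before_time(offs_t, u64 current_time) const {
        return std::max(current_time, busy_until);
    }
};
```

If a second transfer is queued while the CPU is waiting for the first,
the next call to the handler simply reports the later end time.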

It also makes it possible to reduce the earliest time when
circumstances require it. For instance, a PIO latch that waits up to
64 cycles for data to arrive can return the current time + 64 as a
target (at which point a bus error is triggered, for instance). If a
timer elapses and fills the latch in the meantime, the method is
called again and this time can simply return the current time to let
the access pass through. Beware that if the elapsed timer did not
fill the latch, the method must return the time it returned
previously, i.e. the initial access time + 64. Otherwise irrelevant
timers or simple scheduling quantum effects will delay the timeout,
possibly forever if the quantum is small enough.
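
The caveat about returning the same deadline can be made concrete with
a small model. Everything in ``pio_latch_model`` is hypothetical. In
particular, clearing the wait state from ``access_done`` assumes that
the real read handler is the place where the wait ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;
using offs_t = std::uint32_t;

// Hypothetical model of a PIO latch with a 64-cycle timeout.
struct pio_latch_model {
    static constexpr u64 TIMEOUT = 64;
    bool full = false;      // set by whatever fills the latch
    bool waiting = false;   // true while an access is pending
    u64 deadline = 0;       // fixed on the first call for a given access

    // before_time-style handler.
    u64 before_time(offs_t, u64 current_time) {
        if (full)
            return current_time;               // data arrived: let the access pass
        if (!waiting) {
            waiting = true;
            deadline = current_time + TIMEOUT; // computed once per access
        }
        return std::max(deadline, current_time); // never move the deadline
    }

    // To be called when the access itself finally happens.
    void access_done() { waiting = false; full = false; }
};
```

A call at time 100 answers 164. A repeated call at time 120, caused by
an unrelated timer, still answers 164 rather than 184, so the timeout
cannot drift.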

Contention handlers on the same address are taken into account in the
order ``before_time``, ``before_delay``, then ``after_delay``.
Contention handlers of the same type on the same address are
last-one-wins. Installing any non-contention handler on a range where
a contention handler was present removes it.


4. Address maps API
-------------------

@@ -292,13 +363,14 @@ The general syntax for entries uses method chaining:

.. code-block:: C++

    map(start, end).handler(...).handler_qualifier(...).range_qualifier().contention();

The values start and end define the range, the handler() block
determines how the access is handled, the handler_qualifier() block
specifies some aspects of the handler (memory sharing for instance)
and the range_qualifier() block refines the range (mirroring, masking,
lane selection, etc.). The contention methods handle bus contention
and wait states for CPUs supporting them.

The map follows a “last one wins” principle, where the handler specified
last is selected when multiple handlers match a given address.
@@ -607,7 +679,20 @@ behaviour. An example of use the i960 which marks burstable zones
that way (they have a specific hardware-level support).


4.5 Contention
~~~~~~~~~~~~~~

.. code-block:: C++

    (...).before_time(method).(...)
    (...).before_delay(method).(...)
    (...).after_delay(method).(...)

These three methods add contention handlers to a handler. See
section `3.5`_. Multiple contention methods can be added to one
handler.


4.6 View setup
~~~~~~~~~~~~~~

.. code-block:: C++
@@ -641,6 +726,7 @@ can be installed only once. A view can also be part of “what was there
before”.



5. Address space dynamic mapping API
------------------------------------

@@ -803,8 +889,32 @@ with an optional mirror and flags.
Install a device address with an address map in a space. The
``unitmask``, ``cswidth`` and ``flags`` arguments are optional.

5.9 Contention
~~~~~~~~~~~~~~

.. code-block:: C++

    using ws_time_delegate = device_delegate<u64 (offs_t, u64)>;
    using ws_delay_delegate = device_delegate<u32 (offs_t)>;

    space.install_read_before_time(addrstart, addrend, addrmirror, ws_time_delegate)
    space.install_write_before_time(addrstart, addrend, addrmirror, ws_time_delegate)
    space.install_readwrite_before_time(addrstart, addrend, addrmirror, ws_time_delegate)

    space.install_read_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
    space.install_write_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
    space.install_readwrite_before_delay(addrstart, addrend, addrmirror, ws_delay_delegate)

    space.install_read_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
    space.install_write_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)
    space.install_readwrite_after_delay(addrstart, addrend, addrmirror, ws_delay_delegate)

Install a contention handler in the decode path. The ``addrmirror``
parameter is optional.


5.10 View installation
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: C++

@@ -820,3 +930,35 @@ by indexing to call a dynamic mapping method on it.

A view can be installed into a variant of another view without issues,
with only the usual constraint of single installation.

5.11 Taps
~~~~~~~~~

.. code-block:: C++

    using tap = std::function<void (offs_t offset, uNN &data, uNN mem_mask)>;

    memory_passthrough_handler mph = space.install_read_tap(addrstart, addrend, name, read_tap, &mph);
    memory_passthrough_handler mph = space.install_write_tap(addrstart, addrend, name, write_tap, &mph);
    memory_passthrough_handler mph = space.install_readwrite_tap(addrstart, addrend, name, read_tap, write_tap, &mph);

    mph.remove();

A tap is a method that is called when a specific range of addresses
is accessed, without overriding the actual access. Taps can change
the data passed around. A write tap happens before the access, and
can change the value to be written. A read tap happens after the
access, and can change the value returned.

Taps must be of the same width and alignment as the bus. Multiple
taps can act over the same addresses.

The ``memory_passthrough_handler`` object collates a number of taps
and allows removing them all in one call. The ``mph`` parameter is
optional and a new one will be created if absent.

Taps are lost when a new handler is installed at the same addresses
(under the usual principle of last one wins). If they need to be
preserved, one should install a change notifier on the address space,
and remove + reinstall the taps when notified.
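
The ordering of taps relative to the access can be illustrated with a
stand-alone model. ``tapped_space_model`` is hypothetical and far
simpler than a real address space: it is 8 bits wide, has a single
256-byte RAM, and its taps cover every address instead of a range.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using u8 = std::uint8_t;
using offs_t = std::uint32_t;

// Same shape as the tap signature above, fixed to an 8-bit bus.
using tap8 = std::function<void (offs_t offset, u8 &data, u8 mem_mask)>;

// Hypothetical model of a space with taps installed over all addresses.
struct tapped_space_model {
    u8 ram[256] = {};
    std::vector<tap8> read_taps;
    std::vector<tap8> write_taps;

    u8 read(offs_t offset) {
        u8 data = ram[offset & 0xff];      // the actual access happens first
        for (auto &tap : read_taps)
            tap(offset, data, 0xff);       // read taps may change the result
        return data;
    }

    void write(offs_t offset, u8 data) {
        for (auto &tap : write_taps)
            tap(offset, data, 0xff);       // write taps run before the access
        ram[offset & 0xff] = data;
    }
};
```

A write tap that masks a bit changes what is stored, while a read tap
that sets a bit changes only what the reader sees and leaves the
stored value untouched.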
