Refactoring and support for QD #65

Merged
merged 66 commits into from
Sep 17, 2024

sk1p
Member

@sk1p sk1p commented May 16, 2024

The idea is to extract the common parts into a shared crate, have common traits, etc., and then implement QD Merlin as a proof of concept.

Fixes #61

@sk1p sk1p added the enhancement New feature or request label May 16, 2024
sk1p added 24 commits June 7, 2024 20:57
* generic background thread trait: works
* generic connection types: works
* generic cam client types: WIP
* Fix stats: properly count the last frame stack
* Fix `impl_py_connection` close implementation to actually
  `Option::take` the connection implementation and call `close` on it
* Add a lot of debug logging to make it easier to diagnose issues
* Fix off-by-one in decoding function
* Implement dispatch by dtype in decoding logic
  (`Complex{32,64}` still missing)
* Python interface for decoding parts of a stack
* Extract helper `try_cast_if_safe`
* Add support for explicitly freeing empty stack frames
* Implement methods for (unsafe) zero-copy data access
* Simplify: move `GetStats` impl to common `frame_stack` module
* Add `_Py*Connection::get_meta` for access to full `FrameMeta` vector
* Use generic connection and background thread infrastructure for ASI
  MPX3
* libertem_dectris: rename misleading `common` modules to `base_types`
* Extract `num_from_byte_slice` helper
* Build up basic infrastructure and types
    * Scaffolding for connection, decoder, cam client etc.
* `background_thread`: work on header parsing, reading and peeking
* Extract `three_way_shift` helper
* Add `SharedSlabAllocator.try_get_mut`, which returns a `Result` instead
  of an `Option`, so we can easily convert it into other error types
  instead of having to use `ok_or(..)` or similar.
* Fix peek: don't retry infinitely, as the requested buffer size may be
  larger than the underlying socket buffer, in which case we will never
  peek enough! Keep some retrying in case there really is not enough
  data yet
* Parsers for the line protocol: MPX prefix, acquisition header, frame
  header
* `recv_frame` that implements logic for receiving into either a primary
  stack frame or a spare (can be extracted into a generic function in
  the future! very similar for other impls)
* `acquisition` function: receive all the frames
* Properly send `ReceiverMsg::ReceiverArmed` when we have connected
  (which is what `start_passive` is waiting for)
* Introduce `DecoderTargetPixelType` trait and implement that for
  all the common int/float formats
* Add some more test cases for frame/acquisition header parsing
* Fix nasty off-by-one for some test cases
* Start implementing decoding
    * u1/u8/u16/u32/u64 for single and quad
    * r1/r6/r12 for single, raw formats are still missing for quad
* Decoders are still to be validated!
* Add a generic `RawType` trait and `R1`, `R6`, `R12` implementations
* Fix decoding of interleaved quad raw format
    * One generic impl that uses the underlying
      `decode_chunk`/`decode_all` functions on the `RawType` trait
* Add hacky Python interface for decoders for integration testing; not
  for production use (later: for offline decoding? needs better
  interface though...)
* Make `QdFrameMeta` constructible from parts (only for testing)
* Add tests for `decode_ints_be` and `try_cast_if_safe` helpers
* Add another helper: `try_cast_primitive`; useful for decoders etc.
* Check input/output buffer sizes when decoding integer formatted frames
* `start_passive`
    * short-circuit if we are already Armed
    * drop the GIL while waiting for the message from the background
      thread
    * Allow customizing the timeout for waiting for the status change;
      this is useful, for example, when we know that the bg thread
      first has to drain stuff
* Implement draining (mostly for backwards-compat.)
* Add numerous logging messages
* `QdFrameMeta::parse_bytes`: ignore additional data after the header
* Add `QdAcquisitionHeader::frames_in_acquisition` Python method
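
The integer-decoding bullets above (`decode_ints_be`, `try_cast_if_safe`, and the input/output buffer size check) describe a common pattern: read big-endian integers, validate buffer sizes up front, and refuse lossy casts. Below is a minimal sketch of that pattern, assuming `u16` input pixels. `checked_cast` and `decode_u16_be` are hypothetical names, not the functions from this PR.

```rust
use std::convert::TryFrom;

/// Hypothetical helper in the spirit of `try_cast_if_safe`: convert a value
/// into the target type, returning `None` instead of silently truncating.
fn checked_cast<I, O: TryFrom<I>>(value: I) -> Option<O> {
    O::try_from(value).ok()
}

/// Hypothetical decoder for big-endian `u16` pixels: checks buffer sizes
/// first, then decodes each pixel and casts it into the output type.
fn decode_u16_be<O: TryFrom<u16>>(input: &[u8], output: &mut [O]) -> Result<(), String> {
    // Two input bytes per output pixel; reject mismatched buffers up front.
    if input.len() != output.len() * 2 {
        return Err(format!(
            "size mismatch: {} input bytes for {} output pixels",
            input.len(),
            output.len()
        ));
    }
    for (chunk, out) in input.chunks_exact(2).zip(output.iter_mut()) {
        let value = u16::from_be_bytes([chunk[0], chunk[1]]);
        // Fail instead of wrapping if the value doesn't fit the output type.
        *out = checked_cast(value)
            .ok_or_else(|| format!("value {} does not fit the output type", value))?;
    }
    Ok(())
}
```

Decoding the bytes `01 02 00 FF` into a `u32` buffer yields `[258, 255]`, while decoding them into a `u8` buffer fails because 258 does not fit.
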
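
The MPX prefix parser can be sketched in the same spirit. This assumes the prefix is the ASCII string `MPX,`, followed by a ten-digit decimal length and a comma (15 bytes in total). That layout is our assumption about the Merlin TCP stream, not something stated in this PR, and `parse_mpx_prefix` is a hypothetical name.

```rust
/// Length of the assumed prefix: "MPX," + 10 ASCII digits + ",".
const MPX_PREFIX_LEN: usize = 15;

/// Hypothetical parser: returns the announced payload length, or `None`
/// if `buf` does not start with a complete, well-formed prefix.
fn parse_mpx_prefix(buf: &[u8]) -> Option<u64> {
    if buf.len() < MPX_PREFIX_LEN || !buf.starts_with(b"MPX,") || buf[14] != b',' {
        return None;
    }
    let digits = &buf[4..14];
    if !digits.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // All ten bytes are ASCII digits, so this is valid UTF-8 and fits a u64.
    std::str::from_utf8(digits).ok()?.parse().ok()
}
```

In this sketch a partial buffer simply yields `None`, so the caller has to wait (or peek) until at least 15 bytes are available before parsing.
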
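
`start_passive` waits for the background thread to send `ReceiverMsg::ReceiverArmed`, with a configurable timeout. Below is a sketch of such a wait using a standard-library channel. The single-variant enum and `wait_for_armed` are hypothetical simplifications, and releasing the GIL around the wait is not shown.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical message type; only `ReceiverArmed` is named in the PR.
#[derive(Debug, PartialEq)]
enum ReceiverMsg {
    ReceiverArmed,
}

/// Block until the background thread reports "armed", or give up after
/// `timeout` (for example, when it first has to drain old data).
fn wait_for_armed(rx: &Receiver<ReceiverMsg>, timeout: Duration) -> Result<(), String> {
    match rx.recv_timeout(timeout) {
        Ok(ReceiverMsg::ReceiverArmed) => Ok(()),
        Err(RecvTimeoutError::Timeout) => Err(format!("not armed after {:?}", timeout)),
        Err(RecvTimeoutError::Disconnected) => Err("background thread is gone".to_string()),
    }
}
```

Distinguishing `Timeout` from `Disconnected` lets the caller tell "still busy" apart from "the background thread has exited".
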
sk1p added 5 commits July 23, 2024 16:27
Default to the old `RecoveryStrategy::ImmediateReconnect`, but allow
switching to `RecoveryStrategy::DrainThenReconnect`.

Need to experiment to see whether the drain strategy should become the default.

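
A minimal sketch of how such a setting could be modelled, assuming a plain Rust enum with `ImmediateReconnect` as its `Default`. The variant names come from the text above; `recovery_steps` and its step strings are hypothetical, and the meaning of "drain" (reading and discarding pending data) is our reading of it.

```rust
/// Sketch of a reconnect policy; only the variant names are from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum RecoveryStrategy {
    /// Reconnect right away (the previous behaviour, kept as the default).
    #[default]
    ImmediateReconnect,
    /// Presumably: read and discard pending data first, then reconnect.
    DrainThenReconnect,
}

/// Hypothetical helper listing the steps each strategy implies.
fn recovery_steps(strategy: RecoveryStrategy) -> &'static [&'static str] {
    match strategy {
        RecoveryStrategy::ImmediateReconnect => &["reconnect"],
        RecoveryStrategy::DrainThenReconnect => &["drain", "reconnect"],
    }
}
```

Marking the old behaviour with `#[default]` keeps existing callers unchanged while making the drain strategy an explicit opt-in.
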
On the slow macOS 12 workers, there were some spurious timeouts. Let's
see if this is enough.

The test is based on some assumptions, but this should cover the non-raw
use case.
sk1p added 22 commits July 24, 2024 17:18
Right now, we only support the "pass through the raw data" option,
meaning we don't zero out the border pixels. But we do insert
a two-pixel gap between the sensors, meaning `Layout::L2x2G` results in
a 514x514 output.

In quad raw mode, we need to somehow map the input size (1024x256) to
a sensible size, which depends on the layout and thus also the gap mode.
This is implemented here for the normal quad setup; the EELS layout
might need a fix in the future, too.
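
The shape arithmetic from the two paragraphs above, as a sketch: four 256x256 chips arrive as a 1024x256 raw frame, and a 2x2 layout with a two-pixel gap needs 2 * 256 + 2 = 514 pixels per side. Only `Layout::L2x2G` is named in the text; the gap-less `L2x2` variant, the constants and `output_shape` are assumptions for illustration.

```rust
/// Hypothetical subset of layouts; only `L2x2G` is named in the PR.
#[derive(Debug, Clone, Copy)]
enum Layout {
    /// 2x2 chips without a gap (assumed variant).
    L2x2,
    /// 2x2 chips with a two-pixel gap between the sensors.
    L2x2G,
}

/// Each chip is 256x256 pixels; the raw quad frame is 4 chips = 1024x256.
const CHIP_SIZE: usize = 256;
/// Width of the inserted gap, in pixels.
const GAP: usize = 2;

/// Map a layout to the (height, width) of the assembled output frame.
fn output_shape(layout: Layout) -> (usize, usize) {
    match layout {
        Layout::L2x2 => (2 * CHIP_SIZE, 2 * CHIP_SIZE),
        Layout::L2x2G => (2 * CHIP_SIZE + GAP, 2 * CHIP_SIZE + GAP),
    }
}
```

The gap layout has more output pixels (514 * 514) than the raw frame provides (1024 * 256), so the extra gap pixels must be filled in rather than copied.
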
* All decoders are now >= 1 GiB/s on my system with AVX2 enabled, most
  of them >= 6 GiB/s, greatly increasing efficiency and freeing up
  resources for actual work
    * Instead of using a reversible iterator for output, decode into
      temporary array and reverse that
    * Provide const length guarantees for input and output of the
      `decode_chunk` function
    * Provide more guarantees to the generic `Decoder` impls (like being
      able to convert using `_ as OutputType` from `u8` and `u16`)
* More comprehensive benchmarks
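
The "decode into a temporary array and reverse that" and const-length ideas can be illustrated with a toy 1-bit decoder. This is a sketch, not the PR's `decode_chunk`: it assumes 8 input bytes hold 64 pixels, most significant bit first, and `decode_chunk_1bit`/`decode_all_1bit` are hypothetical names. Fixed-size array parameters give the compiler exact lengths, which can help it drop bounds checks and vectorize the loop.

```rust
use std::convert::TryInto;

/// Hypothetical 1-bit decoder: one 8-byte chunk -> 64 pixels (MSB first,
/// by assumption). Fixed-size arrays make both lengths compile-time constants.
fn decode_chunk_1bit(input: &[u8; 8], output: &mut [u8; 64]) {
    let word = u64::from_be_bytes(*input);
    // Decode in bit order 0..64 into a temporary array...
    let mut tmp = [0u8; 64];
    for (i, px) in tmp.iter_mut().enumerate() {
        *px = ((word >> i) & 1) as u8;
    }
    // ...then reverse it once, rather than writing through a reversed iterator.
    tmp.reverse();
    *output = tmp;
}

/// Decode a whole buffer chunk by chunk; lengths must match exactly.
fn decode_all_1bit(input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len() % 8, 0, "input must be a whole number of chunks");
    assert_eq!(input.len() * 8, output.len(), "need 8 pixels per input byte");
    for (inp, out) in input.chunks_exact(8).zip(output.chunks_exact_mut(64)) {
        // `chunks_exact` guarantees these slice-to-array conversions succeed.
        decode_chunk_1bit(inp.try_into().unwrap(), out.try_into().unwrap());
    }
}
```

With input bytes `80 00 00 00 00 00 00 01`, only the first and last output pixels are 1.
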

AVX2 looks to be the most important feature; this also adds a more
complete list of Zen2-ish features as one additional version.

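
How the multiple versions are generated isn't shown here (a crate such as `multiversion` is one common option). The sketch below uses the standard-library mechanism instead: a `#[target_feature(enable = "avx2")]` wrapper selected at runtime with `is_x86_feature_detected!`. The summing kernel is a toy stand-in for a real decoder, and all function names are hypothetical.

```rust
/// Portable kernel. `#[inline(always)]` lets it be inlined into the
/// AVX2-enabled wrapper below, so that copy is compiled with AVX2.
#[inline(always)]
fn sum_u16_portable(data: &[u16]) -> u64 {
    data.iter().map(|&x| u64::from(x)).sum()
}

/// Same kernel, compiled with AVX2 enabled; only safe to call on CPUs
/// that actually support AVX2.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_u16_avx2(data: &[u16]) -> u64 {
    sum_u16_portable(data)
}

/// Pick the AVX2 version at runtime if the CPU supports it.
fn sum_u16(data: &[u16]) -> u64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked for AVX2 support.
            return unsafe { sum_u16_avx2(data) };
        }
    }
    sum_u16_portable(data)
}
```

Both paths compute the same result; only the instructions the compiler may use differ.
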
Directly use the macro-generated type instead.

This is a combination of the acquisition header and the first frame
header, meaning we can accurately get the detector shape.

Still exporting the `QdAcquisitionHeader` to Python, which can be used
to parse it from raw bytes.

When calling `wait_for_arm`, we must first cancel any already running
acquisitions, wait for the system to be idle, and only then arm it again.
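
The arming sequence described above (cancel, wait until idle, arm) can be sketched against a hypothetical `CamControl` trait. The trait, its methods, `prepare_and_arm` and the polling limit are all illustrative assumptions, not the actual control interface.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical control interface; the method names are illustrative only.
trait CamControl {
    fn cancel_acquisition(&mut self);
    fn is_idle(&mut self) -> bool;
    fn arm(&mut self);
}

/// Cancel whatever is running, poll until the detector is idle (giving up
/// after `max_polls` attempts), and only then arm it again.
fn prepare_and_arm<C: CamControl>(cam: &mut C, max_polls: usize) -> Result<(), String> {
    cam.cancel_acquisition();
    let mut polls = 0;
    while !cam.is_idle() {
        polls += 1;
        if polls >= max_polls {
            return Err(format!("detector still busy after {} polls", polls));
        }
        thread::sleep(Duration::from_millis(1));
    }
    cam.arm();
    Ok(())
}
```

Bounding the idle wait means a detector that never settles produces an error instead of a hang, and `arm` is never called in that case.
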
@sk1p sk1p merged commit 0f1683b into LiberTEM:main Sep 17, 2024
33 checks passed
@sk1p sk1p deleted the qd-merlin branch September 17, 2024 15:44
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Port our QD MerlinEM support to rust