Draft the design document and prepare a rough mockup of the C++ API #1

Merged (25 commits) on Dec 16, 2024. Changes from 9 commits.
5 changes: 3 additions & 2 deletions README.md
@@ -1,2 +1,3 @@
-# yakut-native
-A CLI tool for diagnostics and debugging of Cyphal networks, written in C++, suitable for embedded computers
+# OpenCyphal Vehicle System Management Daemon
+
+👻
122 changes: 122 additions & 0 deletions docs/DESIGN.md
@@ -0,0 +1,122 @@
# OpenCyphal Vehicle System Management Daemon for GNU/Linux

This project, the OpenCyphal Vehicle System Management Daemon (OCVSMD), implements a user-facing C++14 library backed by a GNU/Linux daemon that asynchronously performs certain common operations on an OpenCyphal network. Because it is based on LibCyphal, the solution can in principle support every transport protocol that LibCyphal supports, notably Cyphal/UDP and Cyphal/CAN.

The implementation is planned to proceed in multiple stages. The milestones achieved at every stage are described here along with the overall longer-term vision.

The design of the C++ API is inspired by the [`ravemodemfactory`](https://github.com/aleksander0m/ravemodemfactory) project (see `src/librmf/rmf-operations.h`).

[Yakut](https://github.com/OpenCyphal/yakut) is a distantly related project with the following key differences:

- Yakut is a developer tool, while OCVSMD is a well-packaged component intended for deployment in production systems.

- Yakut is a user-interactive tool with a CLI, while OCVSMD is equipped with a machine-friendly interface -- a C++ API. Eventually, OCVSMD may be equipped with a CLI as well, but it will always come secondary to the well-formalized C++ API.

- OCVSMD will be suitable for embedded Linux systems including such systems running on single-core "cross-over" processors.

- OCVSMD will be robust.

## Long-term vision

Not all of the listed items will be implemented exactly as envisioned at the time of writing. This description nonetheless indicates the general direction in which the project is expected to develop.

OCVSMD is focused on solving problems that are pervasive in intra-vehicular OpenCyphal networks with minimal focus on any application-specific details. This list may eventually include:

- Publish/subscribe on Cyphal subjects with arbitrary DSDL data types loaded at runtime, with the message objects represented as dynamically typed structures. More on this below.
- RPC client for invoking arbitrarily-typed RPC servers with DSDL types loaded at runtime.
- Support for the common Cyphal network services out of the box, configurable via the daemon API:
  - File server running with the specified set of root directories (see Yakut).
  - Firmware update on a directly specified remote node with a specified filename.
  - Automatic firmware update as implemented in Yakut.
  - Centralized (eventually could be distributed for fault tolerance) plug-and-play node-ID allocation server.
- Depending on how the named topics project develops (many an intern has despaired over it), the Cyphal resource name server may also be implemented as part of OCVSMD at some point.
pavel-kirienko marked this conversation as resolved.
- A possible future node authentication protocol may also be implemented in this project.

Being a daemon designed for unattended operation in embedded vehicular computers, OCVSMD must meet the following requirements:

- Ability to operate from a read-only filesystem.
- Startup time much shorter than that of Yakut. This should not be an issue for a native application, since most of Yakut's startup time is spent on Python runtime initialization, compilation, and module importing.
- Local node configuration (possibly redundant transport configuration, node-ID, node description, etc.) is loaded from a file, as is common for daemons.

### Dynamic DSDL loading

Dynamic DSDL loading is proposed to be implemented by creating serializer objects whose behavior is defined by the DSDL definition ingested at runtime. The deserialization method accepts a byte stream and produces a DSDL object model with named field accessors, similar to what one would find in a JSON serialization library. The serialization method is the inverse of that. Naturally, said model will heavily utilize PMR (polymorphic memory resources, `std::pmr`) for storage. An API mockup is given in `dsdl.hpp`.
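The following sketch makes the idea concrete on a deliberately tiny scale. Here, the "definition" is a runtime list of named, byte-aligned, little-endian unsigned fields. A single generic pair of functions then turns bytes into a name-addressable object and back. All names (`RuntimeField`, `RuntimeType`, `DynObject`, `deserialize`, `serialize`) are hypothetical and do not follow the `dsdl.hpp` mockup. Real DSDL serialization is bit-oriented and also covers signed, floating-point, array, union, and nested composite types, none of which are modeled here. The actual object model would also use PMR containers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical runtime type model: a composite made only of byte-aligned, little-endian unsigned fields.
struct RuntimeField
{
    std::string name;
    std::size_t size_bytes;  // 1..8
};
using RuntimeType = std::vector<RuntimeField>;

// Hypothetical dynamically typed object: values are looked up by field name, like keys in a JSON object.
class DynObject
{
public:
    std::optional<std::uint64_t> get(const std::string_view name) const
    {
        const auto it = fields_.find(std::string(name));
        if (it == fields_.end())
        {
            return std::nullopt;  // no such field
        }
        return it->second;
    }

    void set(const std::string& name, const std::uint64_t value) { fields_[name] = value; }

private:
    std::map<std::string, std::uint64_t> fields_;
};

// Deserialization: bytes in, object model out. Empty if the input is too short for the type.
std::optional<DynObject> deserialize(const RuntimeType& type, const std::vector<std::uint8_t>& input)
{
    DynObject out;
    std::size_t offset = 0;
    for (const auto& field : type)
    {
        assert((field.size_bytes >= 1) && (field.size_bytes <= 8));
        if ((offset + field.size_bytes) > input.size())
        {
            return std::nullopt;
        }
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < field.size_bytes; ++i)
        {
            value |= static_cast<std::uint64_t>(input[offset + i]) << (8U * i);  // little-endian
        }
        out.set(field.name, value);
        offset += field.size_bytes;
    }
    return out;
}

// Serialization: the inverse. Missing fields are encoded as zero; excess high bits are truncated.
std::vector<std::uint8_t> serialize(const RuntimeType& type, const DynObject& obj)
{
    std::vector<std::uint8_t> out;
    for (const auto& field : type)
    {
        const std::uint64_t value = obj.get(field.name).value_or(0);
        for (std::size_t i = 0; i < field.size_bytes; ++i)
        {
            out.push_back(static_cast<std::uint8_t>(value >> (8U * i)));
        }
    }
    return out;
}
```

For example, take the type `{{"node_id", 2}, {"mode", 1}}`. The bytes `7B 00 03` deserialize into an object whose `node_id` is 123 and whose `mode` is 3. Serializing that object yields the same three bytes.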

One approach assumes that instances of `dsdl::Object` are not exchanged between the client and the daemon; instead, only their serialized representations are transferred between the processes; thus, the entire DSDL support machinery exists in the client's process only. This approach involves certain work duplication between clients, and may impair their ability to start up quickly if DSDL parsing needs to be done. Another approach is to use shared-memory-friendly containers, e.g., via specialized PMR.

Irrespective of how the dynamic DSDL loading is implemented, the standard data types located in the `uavcan` namespace will be compiled into both the daemon and the clients, as they are used in the API definition -- more on this below.

### C++ API

The API will consist of several well-segregated C++ interfaces, each dedicated to a particular feature subset. The interface-based design is chosen to simplify testing in client applications. The API intentionally does not hide the structure of the Cyphal protocol itself; that is to say, it is low-level by design. Higher-level abstractions can be built on top of it on the client side rather than on the daemon side, which keeps the IPC protocol stable.

The `Error` type used in the API definition here is a placeholder for the actual algebraic type listing all possible error states per API entity.

The main file of the C++ API is the `daemon.hpp`, which contains the abstract factory `Daemon` for the specialized interfaces, as well as the static factory factory (sic) `connect() -> Daemon`.
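Every feature is exposed as an abstract interface, so client code can be unit-tested against an in-process fake instead of a running daemon. Below is a minimal sketch of such a fake for the file server. It reproduces the list semantics documented in `file_server.hpp`: duplicates are allowed, `remove_root` removes a single copy, and unknown paths are not an error. To keep the sketch compilable as C++17, it assumes a simplified interface that reports errors as an engaged `std::optional<std::string>` instead of `std::expected<void, Error>`. It also skips path canonicalization. `SimpleFileServer`, `FakeFileServer`, and `register_firmware` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical C++17 stand-in for ocvsmd::FileServer; an engaged optional carries an error message.
class SimpleFileServer
{
public:
    virtual ~SimpleFileServer() = default;
    virtual std::optional<std::string> add_root(std::string_view path) = 0;
    virtual std::optional<std::string> remove_root(std::string_view path) = 0;
    virtual std::vector<std::string> list_roots() const = 0;
};

// In-memory fake for unit tests: no IPC, no filesystem access, no canonicalization.
class FakeFileServer final : public SimpleFileServer
{
public:
    std::optional<std::string> add_root(const std::string_view path) override
    {
        if (path.empty())
        {
            return std::string("empty path");
        }
        roots_.emplace_back(path);  // duplicates are allowed: one entry per call
        return std::nullopt;
    }

    std::optional<std::string> remove_root(const std::string_view path) override
    {
        const auto it = std::find(roots_.begin(), roots_.end(), path);
        if (it != roots_.end())
        {
            roots_.erase(it);  // only one copy is removed
        }
        return std::nullopt;  // an unknown path is not an error
    }

    std::vector<std::string> list_roots() const override { return roots_; }

private:
    std::vector<std::string> roots_;
};

// Example client code under test: it depends only on the interface, not on the daemon.
bool register_firmware(SimpleFileServer& file_server, const std::string& image_path)
{
    return !file_server.add_root(image_path).has_value();
}
```

In production, the same client function would receive the real implementation. In the actual API, that implementation is obtained from the `Daemon` returned by `connect()`.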

### Anonymous mode considerations

Normally, the daemon should have a node-ID of its own. It should be possible to run it without one, in the anonymous mode, with limited functionality:

- The Monitor will not be able to query GetInfo.
- The RegisterClient, PnPNodeIDAllocator, FileServer, NodeCommandClient, etc. will not be operational.

### Configuration file format

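The daemon configuration is stored in a TSV-like text file. Each row contains a key, followed by at least one whitespace character, followed by the value. The keys are register names. A value may itself contain whitespace, such as a description or a list of redundant interfaces. Example: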

```tsv
uavcan.node.id 123
uavcan.node.description This is the OCVSMD
uavcan.udp.iface 192.168.1.33 192.168.2.33
```

For the standard register names, refer to <https://github.com/OpenCyphal/public_regulated_data_types/blob/f9f67906cc0ca5d7c1b429924852f6b28f313cbf/uavcan/register/384.Access.1.0.dsdl#L103-L199>.
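A loader for this format can be quite small. The sketch below makes several assumptions. The key ends at the first whitespace character (space or tab). The value is the rest of the row with surrounding whitespace trimmed. Blank rows are skipped, and a repeated key overrides the earlier one. The function name `parse_config` is hypothetical, and converting the raw strings into typed register values is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical loader: maps register names to their raw (untyped) string values.
std::map<std::string, std::string> parse_config(std::istream& in)
{
    const auto is_space = [](const char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::map<std::string, std::string> registers;
    std::string line;
    while (std::getline(in, line))
    {
        std::size_t key_begin = 0;
        while ((key_begin < line.size()) && is_space(line[key_begin])) ++key_begin;
        if (key_begin == line.size()) continue;  // blank row
        std::size_t key_end = key_begin;
        while ((key_end < line.size()) && !is_space(line[key_end])) ++key_end;
        std::size_t value_begin = key_end;
        while ((value_begin < line.size()) && is_space(line[value_begin])) ++value_begin;
        std::size_t value_end = line.size();  // also drops a trailing '\r' from CRLF files
        while ((value_end > value_begin) && is_space(line[value_end - 1])) --value_end;
        // In this sketch, a repeated key overrides the earlier one.
        registers[line.substr(key_begin, key_end - key_begin)] = line.substr(value_begin, value_end - value_begin);
    }
    return registers;
}
```

When applied to the example above, this yields three registers. For example, `uavcan.udp.iface` maps to the single string `192.168.1.33 192.168.2.33`, which is left for the caller to split.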

### CLI

TBD

### Common use cases

#### Firmware update

Per the design of OpenCyphal's standard network services, the firmware update process is entirely driven by the node being updated (updatee) rather than the node providing the new firmware file (updater). It is possible to infer the progress of the update indirectly by observing the offset of the file reads done by the updatee. However, this solution is fragile because there is ultimately no guarantee that the updatee will read the file sequentially, or even read it in its entirety. Per the OpenCyphal design, the only relevant parameters of a remote node that can be identified robustly are:

- Whether a firmware update is currently in progress or not.
- The version numbers, CRC, and VCS ID of the firmware that is currently being executed.

The proposed API allows one to commence an update process and wait for its completion as follows:

1. Identify the node that requires a firmware update, and locate a suitable firmware image file on the local machine.
2. `daemon.get_file_server().add_root(firmware_path)`, where `firmware_path` is the path to the new image.
3. `daemon.get_node_command_client().begin_software_update(node_id, firmware_name)`, where `firmware_name` is the last component of the `firmware_path`.
4. Using `daemon.get_monitor().snap()`, ensure that the node in question has entered the firmware update mode. Abort if not.
5. Using `daemon.get_monitor().snap()`, wait until the node has left the firmware update mode.
6. Using `daemon.get_monitor().snap()`, ensure that the firmware version numbers match those of the new image.

It is possible to build a convenience method that manages the above steps. Such a method would be executed on the client side rather than on the daemon side.
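Such a client-side convenience method could look roughly like the following sketch of steps 2 to 6. It is written against a hypothetical, trimmed facade (`UpdateBackend`, `NodeView`, `UpdateResult`, `find_node`, and `update_firmware` are illustrative names, not part of the mockup), so that it compiles as C++17 and can be exercised with a fake. In a real client, these calls would map to `FileServer::add_root()`, `NodeCommandClient::begin_software_update()`, and `Monitor::snap()`. Several further assumptions apply:
- The snapshot table is sorted by node-ID, as the `Monitor::Snapshot` comment specifies, so the node is located with a binary search.
- Polling is bounded by a count rather than a deadline.
- Only the major and minor versions are compared.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical, trimmed view of one Monitor::Shadow entry.
struct NodeView
{
    std::uint16_t node_id;
    bool is_online;
    bool in_update_mode;  // i.e., the heartbeat reports the software update mode
    std::uint8_t sw_major;
    std::uint8_t sw_minor;
};

// Hypothetical facade over the daemon interfaces that the procedure needs.
class UpdateBackend
{
public:
    virtual ~UpdateBackend() = default;
    virtual bool add_root(const std::string& path) = 0;
    virtual bool begin_software_update(std::uint16_t node_id, const std::string& file_name) = 0;
    virtual std::vector<NodeView> snap() = 0;  // sorted by node_id
};

enum class UpdateResult { ok, add_root_failed, command_failed, not_entered, not_finished, version_mismatch };

// Binary search in the node-ID-ordered snapshot table.
std::optional<NodeView> find_node(const std::vector<NodeView>& table, const std::uint16_t node_id)
{
    const auto it = std::lower_bound(table.begin(), table.end(), node_id,
                                     [](const NodeView& v, const std::uint16_t id) { return v.node_id < id; });
    if ((it == table.end()) || (it->node_id != node_id)) return std::nullopt;
    return *it;
}

UpdateResult update_firmware(UpdateBackend& backend, const std::uint16_t node_id, const std::string& firmware_path,
                             const std::uint8_t want_major, const std::uint8_t want_minor,
                             const std::chrono::milliseconds poll_period, const int max_polls)
{
    if (!backend.add_root(firmware_path)) return UpdateResult::add_root_failed;  // step 2
    // The root is the image file itself, so the updatee requests it by its last path component.
    const std::string file_name = firmware_path.substr(firmware_path.find_last_of('/') + 1);
    if (!backend.begin_software_update(node_id, file_name)) return UpdateResult::command_failed;  // step 3
    const auto poll_until = [&](auto predicate) {
        for (int i = 0; i < max_polls; ++i)
        {
            const auto node = find_node(backend.snap(), node_id);
            if (node && predicate(*node)) return true;
            std::this_thread::sleep_for(poll_period);
        }
        return false;
    };
    if (!poll_until([](const NodeView& n) { return n.is_online && n.in_update_mode; }))  // step 4
        return UpdateResult::not_entered;
    if (!poll_until([](const NodeView& n) { return n.is_online && !n.in_update_mode; }))  // step 5
        return UpdateResult::not_finished;
    const auto node = find_node(backend.snap(), node_id);  // step 6
    if (!node || (node->sw_major != want_major) || (node->sw_minor != want_minor))
        return UpdateResult::version_mismatch;
    return UpdateResult::ok;
}
```

Step 5 requires the node to be online again. This prevents a node that is rebooting into the new image from being mistaken for one that has finished. A more thorough version would also compare the CRC and VCS revision ID.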

##### Progress monitoring

To enable monitoring the progress of a firmware update process, the following solutions have been considered and rejected:

- Add a general-purpose numerical field to `uavcan.node.ExecuteCommand.1` that returns the progress information when an appropriate command (a new standard command) is sent. This is rejected because an RPC-based solution is undesirable.

- Report the progress via `uavcan.node.Heartbeat.1.vendor_specific_status_code`. This is rejected because the VSSC is vendor-specific, so it shouldn't be relied on by the standard.

The plan that is tentatively agreed upon is to define a new standard message with a fixed port-ID for progress reporting. The message will likely be placed in the diagnostics namespace as `uavcan.diagnostic.ProgressReport` with a fixed port-ID of 8183. The `uavcan.node.ExecuteCommand.1` RPC may return a flag indicating whether the progress of the freshly launched process will be reported via the new message.
Collaborator


With the fixed port-ID approach, it looks to me like there can potentially be only one operation in progress. IMO a more flexible approach would be for ExecuteCommand to have a new field with the desired port-ID (e.g., named progressPortId), on which the server should publish uavcan.diagnostic.ProgressReport messages for the operation being executed. If progressPortId is set (>0), the client wants progress reports. With this approach:

- we don't need the "RPC may return a flag indicating if the progress ..." thing
- the RPC client can decide whether progress reports are needed or not
- potentially multiple simultaneous commands could be executed, each with its own flow of progress messages

Alternatively, especially if a fixed port-ID is desired, we should at least add some unique operation ID (int64?) to ExecuteCommand and to the corresponding ProgressReport messages. That way, it is possible to tell which operation a particular progress message relates to. Again, the default operation ID of 0 could indicate that the client doesn't want progress reports for this command.

Member Author


> With this approach...:

I was originally reluctant to agree to this because opening a new publisher on the updatee is a highly fallible operation that is better done at the initialization stage, but then I concluded that the advantages probably outweigh the disadvantages. You have already enumerated the advantages of this approach correctly. I might add that the fixed port-ID solution is too reminiscent of DroneCAN in its rigidity, and if one wants DroneCAN, one knows where to find it.

In the most basic case, each node sending ExecuteCommand requests would have a subscriber for the progress reports configured via a register; the same subject-ID will be sent to the clients when progress reports are desired.

In this case we don't need a new message type as we can just leverage uavcan.primitive.scalar.Natural8.1.


I believe that in our software install, we currently handle this by querying a register on the updatee. Would that be sufficient?

`yakut register [NODE_ID] updater.download.status`

Member Author


@lydia-at-amazon That would work, but we considered a similar solution where we add a progress field to the response of uavcan.node.ExecuteCommand. Scott pointed out that an RPC-based solution is considered harmful, so we focused on pub/sub-based alternatives. If RPC is acceptable, then I think your current register-based approach is probably the most obvious choice.


For maintenance operations, such as software update, we're already using lots of RPC services, so I think it should be ok to use the RPC-based solution.


If this message-based approach is chosen, the daemon will subscribe to the message and provide the latest received progress value per node via the monitor interface.
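If the message-based approach is adopted, the daemon-side state behind this could be as small as a table holding the latest report per node. The following minimal sketch assumes that the progress value is a percentage (0 to 100) carried as a single small natural number, in the spirit of the `uavcan.primitive.scalar.Natural8.1` suggestion in the discussion above. `ProgressTable` and its methods are hypothetical and not part of the mockup. Reports older than a caller-chosen age are hidden. `forget()` is meant to be called when the node restarts, mirroring how `Monitor::Shadow::info` is reset.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical daemon-side store of the latest progress report received from each node.
class ProgressTable
{
public:
    using Clock = std::chrono::steady_clock;

    struct Entry
    {
        std::uint8_t percent;  // assumption: progress is reported as 0..100
        Clock::time_point received_at;
    };

    // Called from the subscription callback; a newer report replaces the older one.
    void on_report(const std::uint16_t node_id, const std::uint8_t percent, const Clock::time_point now)
    {
        assert(percent <= 100);
        latest_[node_id] = Entry{percent, now};
    }

    // What the Monitor could expose per node; empty if nothing was received or the report is too old.
    std::optional<Entry> latest(const std::uint16_t node_id,
                                const Clock::time_point now,
                                const Clock::duration max_age) const
    {
        const auto it = latest_.find(node_id);
        if ((it == latest_.end()) || ((now - it->second.received_at) > max_age))
        {
            return std::nullopt;
        }
        return it->second;
    }

    // Drops the entry, e.g., when the node is detected to have restarted.
    void forget(const std::uint16_t node_id) { latest_.erase(node_id); }

private:
    std::map<std::uint16_t, Entry> latest_;
};
```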

## Milestone 0

This milestone covers only a barebones implementation, consisting of:

- The daemon itself, compatible with System V-style init only. Support for systemd will be introduced in a future milestone.
- Running a local Cyphal/UDP node. No support for other transports yet.
- Loading the configuration from the configuration file as defined above.
- File server.
- Node command client.

These items will be sufficient to perform firmware updates on remote nodes, but not to monitor the update progress. Progress monitoring will require the Monitor module.
35 changes: 35 additions & 0 deletions docs/daemon.hpp
@@ -0,0 +1,35 @@
#include <cstdint>
#include <expected>
#include <memory>

namespace ocvsmd
{

/// An abstract factory for the specialized interfaces.
class Daemon
{
public:
    /// Instances are owned through std::unique_ptr<Daemon>, so the destructor must be virtual.
    virtual ~Daemon() = default;

    virtual std::expected<std::unique_ptr<Publisher>, Error> make_publisher(const dsdl::Type& type,
                                                                            const std::uint16_t subject_id) = 0;

    virtual std::expected<std::unique_ptr<Subscriber>, Error> make_subscriber(const dsdl::Type& type,
                                                                              const std::uint16_t subject_id) = 0;

    virtual std::expected<std::unique_ptr<RPCClient>, Error> make_client(const dsdl::Type& type,
                                                                         const std::uint16_t server_node_id,
                                                                         const std::uint16_t service_id) = 0;

    virtual FileServer& get_file_server() = 0;
    virtual const FileServer& get_file_server() const = 0;

    virtual NodeCommandClient& get_node_command_client() = 0;

    virtual RegisterClient& get_register_client() = 0;

    virtual Monitor& get_monitor() = 0;
    virtual const Monitor& get_monitor() const = 0;

    virtual PnPNodeIDAllocator& get_pnp_node_id_allocator() = 0;
    virtual const PnPNodeIDAllocator& get_pnp_node_id_allocator() const = 0;
};

/// A factory for the abstract factory that connects to the daemon.
/// The pointer is never null on success.
std::expected<std::unique_ptr<Daemon>, Error> connect();

}  // namespace ocvsmd
47 changes: 47 additions & 0 deletions docs/dsdl.hpp
@@ -0,0 +1,47 @@
#include <cstddef>
#include <cstdint>
#include <expected>
#include <optional>
#include <span>
#include <string>
#include <string_view>
#include <tuple>
#include <unordered_map>

namespace ocvsmd::dsdl
{

class Type;

/// Full type name, major version, minor version.
/// Note: std::hash has no std::tuple specialization, so the map below needs a custom hasher.
using TypeNameAndVersion = std::tuple<std::pmr::string, std::uint8_t, std::uint8_t>;

/// Represents a DSDL object of any type.
class Object
{
    friend class Type;

public:
    /// Field accessor by name. Empty if no such field.
    std::optional<Object> operator[](const std::string_view field_name) const;

    /// Array element accessor by index. Empty if out of range.
    std::optional<std::span<Object>> operator[](const std::size_t array_index);
    std::optional<std::span<const Object>> operator[](const std::size_t array_index) const;

    /// Coercion to primitives (implicit truncation or loss of precision is possible).
    operator std::optional<std::int64_t>() const;
    operator std::optional<std::uint64_t>() const;
    operator std::optional<double>() const;

    /// Coercion from primitives (implicit truncation or loss of precision is possible).
    Object& operator=(const std::int64_t value);
    Object& operator=(const std::uint64_t value);
    Object& operator=(const double value);

    const Type& get_type() const noexcept;

    std::expected<void, Error> serialize(const std::span<std::byte> output) const;
    std::expected<void, Error> deserialize(const std::span<const std::byte> input);
};

/// Represents a parsed DSDL definition.
class Type
{
    friend std::pmr::unordered_map<TypeNameAndVersion, Type> read_namespaces(directories, pmr, ...);

public:
    /// Constructs a default-initialized Object of this Type.
    Object instantiate() const;
    ...
};

/// Reads all definitions from the specified namespaces and returns a mapping from the full type name
/// and version to its type model.
/// The function may cache the results per namespace, with an option to disable the cache.
std::pmr::unordered_map<TypeNameAndVersion, Type> read_namespaces(directories, pmr, ...);

}  // namespace ocvsmd::dsdl
26 changes: 26 additions & 0 deletions docs/file_server.hpp
@@ -0,0 +1,26 @@
#include <expected>
#include <string>
#include <string_view>
#include <vector>

namespace ocvsmd
{

/// The daemon always has the standard file server running.
/// This interface can be used to configure it.
/// It is not possible to stop the server; the closest alternative is to remove all root directories.
class FileServer
{
public:
    virtual ~FileServer() = default;

    /// When the file server handles a request, it will attempt to locate the path relative to each of its root
    /// directories. See Yakut for a hands-on example.
    /// The daemon will canonicalize the path and resolve symlinks.
    /// The same path may be added multiple times to avoid interference across different clients.
    /// The path may be that of a file rather than a directory.
    virtual std::expected<void, Error> add_root(const std::string_view path) = 0;

    /// Does nothing if such a root does not exist (no error reported).
    /// If such a root is listed more than once, only one copy is removed.
    /// The daemon will canonicalize the path and resolve symlinks.
    virtual std::expected<void, Error> remove_root(const std::string_view path) = 0;

    /// The returned paths are canonicalized. The entries are not unique.
    virtual std::expected<std::pmr::vector<std::pmr::string>, Error> list_roots() const = 0;
};

}  // namespace ocvsmd
68 changes: 68 additions & 0 deletions docs/monitor.hpp
@@ -0,0 +1,68 @@
#include <bitset>
#include <chrono>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

#include <uavcan/node/Heartbeat_1.hpp>
#include <uavcan/node/GetInfo_1.hpp>

namespace ocvsmd
{

/// The monitor continuously maintains a list of online nodes in the network.
class Monitor
{
public:
    using Heartbeat = uavcan::node::Heartbeat_1;
    using NodeInfo = uavcan::node::GetInfo_1::Response;

    virtual ~Monitor() = default;

    /// A shadow represents the latest known state of the remote node.
    /// The info struct is available only if the node responded to a uavcan.node.GetInfo request since its last bootup.
    /// GetInfo requests are sent continuously until a response is received.
    /// If heartbeat publications cease, the corresponding node is marked as offline.
    struct Shadow final
    {
        std::uint16_t node_id;

        bool is_online;  ///< If not online, the other fields contain the latest known information.

        std::chrono::system_clock::time_point last_heartbeat_at;
        Heartbeat last_heartbeat;

        /// The info is automatically reset when the remote node is detected to have restarted.
        /// It is automatically re-populated as soon as a GetInfo response is received.
        struct Info final
        {
            std::chrono::system_clock::time_point received_at;
            NodeInfo info;
        };
        std::optional<Info> info;

        /// The port list is automatically reset when the remote node is detected to have restarted.
        /// It is automatically re-populated as soon as an update is received.
        struct PortList final
        {
            std::chrono::system_clock::time_point received_at;
            std::bitset<65536> publishers;
            std::bitset<65536> subscribers;
            std::bitset<512> clients;
            std::bitset<512> servers;
        };
        std::optional<PortList> port_list;
    };

    struct Snapshot final
    {
        /// If a node appears online at least once, it will be given a slot in the table permanently.
        /// If it goes offline, it will be retained in the table, but its is_online field will be false.
        /// The table is ordered by node-ID. Use binary search for fast lookup.
        std::pmr::vector<Shadow> table;
        std::tuple<Heartbeat, NodeInfo> daemon;
        bool has_anonymous;  ///< If any anonymous nodes are online (e.g., someone is trying to get a PnP node-ID allocation)
    };

    /// Returns a snapshot of the current network state plus the daemon's own node state.
    virtual Snapshot snap() const = 0;

    // TODO: Eventually, we could equip the monitor with snooping support so that we could also obtain:
    // - Actual traffic per port.
    // - Update node info and local register cache without sending separate requests.
    // Yakut does that with the help of the snooping support in PyCyphal, but LibCyphal does not currently have that capability.
};

}  // namespace ocvsmd
42 changes: 42 additions & 0 deletions docs/node_command_client.hpp
@@ -0,0 +1,42 @@
#include <chrono>
#include <cstdint>
#include <expected>
#include <optional>
#include <span>
#include <string_view>
#include <unordered_map>

#include <uavcan/node/ExecuteCommand_1.hpp>

namespace ocvsmd
{

/// A helper for invoking the uavcan.node.ExecuteCommand service on the specified remote nodes.
/// The daemon always has a set of uavcan.node.ExecuteCommand clients ready.
class NodeCommandClient
{
public:
    using Request = uavcan::node::ExecuteCommand_1::Request;
    using Response = uavcan::node::ExecuteCommand_1::Response;

    /// Maps each node-ID to its response; an empty optional means that the node did not respond in time.
    using Result = std::expected<std::pmr::unordered_map<std::uint16_t, std::optional<Response>>, Error>;

    virtual ~NodeCommandClient() = default;

    /// All requests are sent concurrently, and the call returns when the last response has arrived
    /// or the timeout has expired.
    virtual Result send_custom_command(const std::span<const std::uint16_t> node_ids,
                                       const Request& request,
                                       const std::chrono::microseconds timeout = std::chrono::seconds{1}) = 0;

    /// A convenience method for invoking send_custom_command() with COMMAND_RESTART.
    Result restart(const std::span<const std::uint16_t> node_ids,
                   const std::chrono::microseconds timeout = std::chrono::seconds{1})
    {
        return send_custom_command(node_ids, {65535, ""}, timeout);  // 65535 = COMMAND_RESTART
    }

    /// A convenience method for invoking send_custom_command() with COMMAND_BEGIN_SOFTWARE_UPDATE.
    /// The file_path is relative to one of the roots configured in the file server.
    Result begin_software_update(const std::span<const std::uint16_t> node_ids,
                                 const std::string_view file_path,
                                 const std::chrono::microseconds timeout = std::chrono::seconds{1})
    {
        return send_custom_command(node_ids, {65533, file_path}, timeout);  // 65533 = COMMAND_BEGIN_SOFTWARE_UPDATE
    }

    // TODO: add convenience methods for the other standard commands.
};

}  // namespace ocvsmd