Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hex-encode defmt symbols to avoid compatibility issues. #878

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Dirbaio
Copy link
Contributor

@Dirbaio Dirbaio commented Oct 28, 2024

Hex-encode defmt symbols to avoid compatibility issues.

defmt currently puts json as-is in symbol names, which can contain special
characters not normally found in symbol names like quotes ", braces {}
and spaces .

This can cause compatibility issues with language features or tools that don't
expect this. For example it breaks sym in asm!.

This is a breaking change of the wire format, so this PR increases the
format numbre. defmt-decoder is updated to be able to decode the new format,
while keeping ability to decode older formats.

  • decoder: improve version mismatch error message, don't mention probe-run.
  • Hex-encode defmt symbols to avoid compatibility issues.

@Dirbaio
Copy link
Contributor Author

Dirbaio commented Oct 28, 2024

  • the cargo-semver-check ci error seems bogus? it's from a previous already-merged PR
  • How to fix backcompat CI tests? it's expected that they fail, this is a wire format version bump.

@jonathanpallant
Copy link
Contributor

I've pinged the team internally so we can work out how to fix those tests. We're on version 4 of the wire format already, so we must have a process for bumping it.

@@ -38,7 +41,11 @@ pub(super) enum SymbolTag {

impl Symbol {
pub fn demangle(raw: &str) -> anyhow::Result<Self> {
serde_json::from_str(raw)
let mut raw = Cow::from(raw);
if let Some(s) = raw.strip_prefix("__defmt_") {
Copy link
Contributor

@jonathanpallant jonathanpallant Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This magic string exists twice (once here, once in the macro), and it doesn't say a lot about the encoding used.

  • Perhaps it should be __defmt_hex_, allowing for some future __defmt_base64_ or similar.
  • Is there somewhere it can be defined once and imported in the other place? If not, we should document the canonical form in a third place (like the book) and have the other two places refer to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it should be _defmt_hex, allowing for some future _defmt_base64 or similar.

changed!

Is there somewhere it can be defined once and imported in the other place? If not, we should document the canonical form in a third place (like the book) and have the other two places refer to it.

there's no crate that's a shared dep of the macro and the decoder. I've added a section to the book.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no crate that's a shared dep of the macro and the decoder. I've added a section to the book.

There is! It's the defmt-parser. I think it makes sense to put the magic string there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh.

defmt git:(main) cargo tree -p defmt-macros | grep parser
├── defmt-parser v0.3.4 (/Users/jonathan/Documents/knurling-rs/defmt/parser)defmt git:(main) cargo tree -p defmt-decoder | grep parser
├── defmt-parser v0.3.4 (/Users/jonathan/Documents/knurling-rs/defmt/parser)

defmt currently puts json as-is in symbol names, which can contain special
characters not normally found in symbol names like quotes `"`, braces `{}`
and spaces ` `.

This can cause compatibility issues with language features or tools that don't
expect this. For example it breaks `sym` in `asm!`.

This is a *breaking change* of the wire format, so this PR increases the
format numbre. `defmt-decoder` is updated to be able to decode the new format,
while keeping ability to decode older formats.
@Urhengulas Urhengulas added the breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version label Nov 5, 2024
@Urhengulas
Copy link
Member

  • the cargo-semver-check ci error seems bogus? it's from a previous already-merged PR

Yes, that's a known problem. I marked the PR as breaking change, which is not true, but skips the check.

* How to fix backcompat CI tests? it's expected that they fail, this is a wire format version bump.

Just bump the commit hash

const REVISION_UNDER_TEST: &str = "0e92d3a88aa472377b964979f522829d961d8986";
.

Copy link
Contributor

@jonathanpallant jonathanpallant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, and one thought.

By hex encoding the symbols, you stop people from scanning the output of objdump -t and seeing the log strings, modules, etc.

Is there some other encoding which would solve the problem of invalid characters (", {, @, etc), but which retains the ability for a human to read the text? I'm thinking of something like C trigraphs perhaps. Or (shudder) C++/Rust symbol mangling.

- The double-quote character `'"'` causes issues with escaping if you use it with `sym` inside an `asm!()` call.

To deal with this, as of Wire Format Version 5, strings are encoded to bytes as UTF-8, and then the bytes are hex-encoded.
The symbol is prefixed with `__defmt_hex_` to denote it's is hex-encoded, and to allow for future expansion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The symbol is prefixed with `__defmt_hex_` to denote it's is hex-encoded, and to allow for future expansion.
The symbol is prefixed with `__defmt_hex_` to denote that it's hex-encoded, and to allow for future expansion.

@jonathanpallant
Copy link
Contributor

We should probably fix the input section name at the same time (i.e. the link_section = "...."). Currently this is JSON, unless you are on macOS when we hex encode it. We should probably always hex encode it, or hash it, or use the disambiguator. I *think* we do still need to put somewhere there (i.e. don't throw them all into .defmt.error` et al) so that the garbage collection can throw out symbols that aren't referenced (which happens at the input section level).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change fix / feature / improvement involves a breaking change and needs to wait until next minor version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants