serdect: is it actually constant-time? #1111

fjarri · 2023-06-20T21:35:49Z

The binary serializer uses serializer.serialize_tuple() and serialize_element() which, in some formats at least, makes it data-dependent. E.g. MessagePack prepends every element greater than 127 with 0xCC.

Also, this contradicts the documentation claim:

When using a binary format, the data is serialized as-is into binary.

What was the reason behind not using serialize_bytes()? Seems like it would provide better constant-time guarantees?

The text was updated successfully, but these errors were encountered:

tarcieri · 2023-06-20T21:47:54Z

The README describes it as follows:

This crate provides "best effort" constant-time helper methods for reducing the amount of timing variability involved in serializing/deserializing data when using serde, Rust's standard serialization framework.

Really the goal is to be less noisy than e.g. a JSON serialization of bytes as an integer array. PRs accepted to improve the wording in the README.

What was the reason behind not using serialize_bytes()? Seems like it would provide better constant-time guarantees?

The slice serializer just uses the Serialize::serialize impl on [u8]. I'm not familiar with how that differs from serialize_bytes, but if you think there's a compelling case I'd suggest opening a PR. We test with a number of different serde format impls so perhaps you can point out specific deficiencies with a particular one.

serialize_tuple is used by the array serializer, which avoids length prefixing the data because it's known a priori by the type system. That implementation avoids a problematic deficiency with serde which prevents it from working with const generics: serde-rs/serde#1937

fjarri · 2023-06-20T22:03:23Z

The slice serializer just uses the Serialize::serialize impl on [u8]. I'm not familiar with how that differs from serialize_bytes

It seems that it is different, since serialize_bytes serializes bytes verbatim in MessagePack, while serialize doesn't.

serialize_tuple is used by the array serializer, which avoids length prefixing the data because it's known a priori by the type system.

That's true, but then binary formats instead prefix separate elements, which kind of defeats the point.

tarcieri · 2023-06-20T22:05:15Z

Can you point to a specific implementation, and ideally, write some test cases which capture the issue?

tarcieri · 2023-06-20T22:07:27Z

cc @daxpedda

fjarri · 2023-06-20T22:10:40Z

Can you point to a specific implementation, and ideally, write some test cases which capture the issue?

Let me make an MRE.

Also, speaking of serialize_tuple(), I would argue that while it would be nice to have an array serialization with no length specifier, it is secondary to the intent of serdect which is to a) strive for constant-timeness, and b) serialize bytes unchanged.

tarcieri · 2023-06-20T22:11:15Z

Again, we provide both options. Use the slice serializer if you're okay with a length prefix.

fjarri · 2023-06-20T22:16:06Z

They are not completely interchangeable. slice::deserialize_hex_or_bin() doesn't fail if the buffer is larger than what's being deserialized, which is not what I want if I'm actually deserializing an array.

tarcieri · 2023-06-20T22:22:06Z

I am loathe to change it without an actual empirical demonstration of timing variability or other performance issue. Can you put one together?

tarcieri · 2023-06-20T22:23:18Z

Also note there is a long, long history of anecdotal reports like this with the various serde serializers across the @RustCrypto project, which is why we put this crate together and why it tests against various format implementations, so I would really like to see running code as the motivation for any changes.

fjarri · 2023-06-20T22:25:30Z

I am loathe to change it without an actual empirical demonstration of timing variability or other performance issue. Can you put one together?

On it now. But if the length of the serialized bytestring is data-dependent, doesn't timing variability naturally follow? Not to mention that it's quite unexpected to get a variable-sized serialized data when you thought you were serializing a constant-sized array.

tarcieri · 2023-06-20T22:30:04Z

If the length varies, then the timing will as well, yes.

However, that would seem like a sharp edge and counterclaim to your argument that using a length-prefixed serialization would be better than one guaranteed fixed-length by the type system, if anything.

fjarri · 2023-06-20T22:36:27Z

use serde::{Serialize, Serializer};

#[derive(Serialize)]
struct A(
    #[serde(serialize_with = "serdect::slice::serialize_hex_lower_or_bin")]
    [u8; 8]
    );

#[derive(Serialize)]
struct B(
    #[serde(serialize_with = "serdect::array::serialize_hex_lower_or_bin")]
    [u8; 8]
    );

struct C([u8; 8]);

impl Serialize for C {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.serialize_bytes(&self.0)
    }
}


fn main() {
    let a = A([1, 2, 254, 255, 3, 4, 252, 253]);

    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + slice: {:?}", b);

    let b = bincode::serialize(&a).unwrap();
    println!("bincode + slice: {:?}", b);

    let a = B([1, 2, 254, 255, 3, 4, 252, 253]);

    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + array: {:?}", b);

    let b = bincode::serialize(&a).unwrap();
    println!("bincode + array: {:?}", b);

    let a = C([1, 2, 254, 255, 3, 4, 252, 253]);

    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + serialize_bytes: {:?}", b);

    let b = bincode::serialize(&a).unwrap();
    println!("bincode + serialize_bytes: {:?}", b);
}

Output:

messagepack + slice: Ok([152, 1, 2, 204, 254, 204, 255, 3, 4, 204, 252, 204, 253])
bincode + slice: [8, 0, 0, 0, 0, 0, 0, 0, 1, 2, 254, 255, 3, 4, 252, 253]
messagepack + array: Ok([152, 1, 2, 204, 254, 204, 255, 3, 4, 204, 252, 204, 253])
bincode + array: [1, 2, 254, 255, 3, 4, 252, 253]
messagepack + serialize_bytes: Ok([196, 8, 1, 2, 254, 255, 3, 4, 252, 253])
bincode + serialize_bytes: [8, 0, 0, 0, 0, 0, 0, 0, 1, 2, 254, 255, 3, 4, 252, 253]

So Bincode behaves well, MessagePack doesn't.

However, that would seem like a sharp edge and counterclaim to your argument that using a length-prefixed serialization would be better than one guaranteed fixed-length by the type system, if anything.

The type system guarantees that the format will receive a constant-sized tuple, but not what it will do with it. Of course, giving it a byte slice instead does not guarantee it either, but it seems to me that it is a better "best effort" for constant-timeness.

tarcieri · 2023-06-20T22:38:59Z

I'm not sure I see anything actionable there?

fjarri · 2023-06-20T22:39:39Z

The "204" elements are not in the actual data and are added before any byte with the value above 127, making the serialization length (and time) data-dependent.

Also it shows the difference between serialize() (used in serdect::slice) and serialize_bytes

tarcieri · 2023-06-20T22:40:31Z

What are you suggesting as a mitigation?

(Honestly, this seems like an argument against MessagePack, not serdect)

tarcieri · 2023-06-20T22:41:26Z

It sounds like moving to serialize_bytes might be a good idea, but has other potential tradeoffs. Can you open a separate issue specific to that?

It also sounds like it might be a difficult breaking change, depending on how various format implementations handle it.

fjarri · 2023-06-20T22:43:41Z

(Honestly, this seems like an argument against MessagePack, not serdect)

There may be other formats that do the same. My argument is that serialize_bytes() provides better constant-time guarantees than serialize() or serialize_tuple(), across the whole existing zoo of formats.

Yes, it will be certainly a breaking ABI change, there's no way around it.

tarcieri · 2023-06-20T22:45:23Z

There may be other formats that do the same

We can only provide guarantees for the formats we test against, so for any format you want to have guarantees, it needs to be added to our test suite.

Obviously we can't provide guarantees for things we haven't tested against.

fjarri · 2023-06-20T22:48:23Z

If one variant works (in the sense of providing constant-time guarantees) in Bincode and CBOR (currently tested ones), and the other works in Bincode, CBOR, and MessagePack, I would argue that the latter variant is better. MessagePack is about 70 times more popular than CBOR according to crates.io download data.

tarcieri · 2023-06-20T22:53:09Z

Can you open a separate issue for a potential switch to serialize_bytes()?

Also I think we can close this issue as "asked and answered". If you feel the existing documentation is insufficient, please open a PR.

daxpedda · 2023-06-20T23:01:37Z

Still waiting for the new issue, but I'm not exactly following what the issue is here.

serialize_bytes() is for variable length data, which ours is not. serialize_tuple() seems to be the right call to me.
I think this is a deficiency in MessagePack, or it's implementation, I know nothing about MessagePack.

Also this doesn't seem like a constant-time issue to me, that different sizes have different timings is kind of unavoidable. I don't think this is what serdect is trying to solve. What it does intend to address is different data having different timings.
Please correct me if I'm wrong here @tarcieri.

fjarri · 2023-06-20T23:03:47Z

I think this is a deficiency in MessagePack

Deficiency or not, it's a de-facto standard used by a lot of people.

that different sizes have different timings is kind of unavoidable

Same sizes, different data.

tarcieri · 2023-06-20T23:05:14Z

We deal with both types of data. serialize_bytes() seems good for slices, less so for arrays since it adds an unnecessary length prefix.

Whether or not we need to add a redundant length prefix to every other format for encoded arrays as a workaround for how MessagePack handles tuples is a rather ugly and highly debatable tradeoff.

I think we should perhaps get a bit more diligent about our list of supported format implementations as it seems we should be worrying about formats that do wacky things like serialize binary data in variable-time. It'd be good to get a table added to the README.md at least.

fjarri · 2023-06-20T23:07:22Z

it seems we should be worrying about formats that do wacky things like serialize binary data in variable-time

To be fair, with the tuple approach, we are serializing separate u8s, not binary data. Which is probably funneled into some "unsized integer" internal representation, and that's how MessagePack sees it.

daxpedda · 2023-06-20T23:12:47Z

I think this is a deficiency in MessagePack

Deficiency or not, it's a de-facto standard used by a lot of people.

So MessagePack seems to support fixed sized arrays and tuples.
I'm not sure what exactly rmp_serde does, but MessagePack should support the right thing.

that different sizes have different timings is kind of unavoidable

Same sizes, different data.

Are you saying the output length is different depending on data if the size is the same?

fjarri · 2023-06-20T23:14:59Z

Are you saying the output length is different depending on data if the size is the same?

Yes, see the posted example. Every byte over 127 gets prefixed with 0xCC.

tarcieri · 2023-06-20T23:15:15Z

I'm not sure what exactly rmp_serde does, but MessagePack should support the right thing.

Curious if there are other Rust (or otherwise) implementations of MessagePack which handle this better

fjarri · 2023-06-20T23:17:07Z

So MessagePack seems to support fixed sized arrays and tuples.

It's not about that. The problem is that serializing u8 is funneled into serializing u64, which is then packed. It seems though that rmp has a write_u8 method, so I'll see if that helps.

daxpedda · 2023-06-20T23:28:22Z

Yes, see the posted example. Every byte over 127 gets prefixed with 0xCC.

Apologies, I think I finally understand the problem now, I should have been reading more carefully.

Yeah, that's definitely an issue. So as you are saying, rmp should use write_u8 when it can to avoid packing in this case. Let's hope this is an issue in the implementation, because I think it would be bad to have to switch to serialize_bytes() for fixed size arrays.

fjarri · 2023-06-20T23:32:07Z

I'm looking into it now, and it doesn't seem like the format supports fixed-size u8 values. write_u8 is exactly the one that adds the 0xCC prefix (which is a marker of a 1-byte integer, see the spec).

daxpedda · 2023-06-20T23:34:28Z

But shouldn't it add this prefix for every u8 then, regardless of it's value?

fjarri · 2023-06-20T23:35:30Z

Nope, see the first line:

positive fixint | 0xxxxxxx | 0x00 - 0x7f

A single byte starting from the bit 0 is an u8 under 128.

daxpedda · 2023-06-20T23:44:05Z

Ah I see ... that's so weird ...

Apologies if you already explained that, but why is serialize_bytes() not doing that?

fjarri · 2023-06-20T23:51:09Z

serialize_bytes() creates https://github.com/msgpack/msgpack/blob/master/spec.md#bin-format-family

daxpedda · 2023-06-20T23:56:37Z

I guess the issue is that serde doesn't really have proper support for fixed sized arrays, tuple is just a "bad" workaround ... which specifically MessagePack implementers can't really do anything about.

@tarcieri I think I'm in favor of changing this now, simply because serde doesn't properly support it. WDYT?

tarcieri · 2023-06-20T23:57:42Z

As I said, I think it might make sense for slices. Arrays are more debatable.

fjarri · 2023-06-20T23:57:49Z

Personally, while the array thing is inconvenient, I can live without it, but I would like to use serialize_bytes() at least for slices. I'm working on a PR now, having some issues with CBOR.

daxpedda · 2023-06-21T00:01:06Z

As I said, I think it might make sense for slices. Arrays are more debatable.

Yeah, I'm in favor of both. Though I understand that there is a tradeoff here.

fjarri · 2023-06-21T00:17:06Z

By the way, apparently CBOR uses packing as well. I replaced 0x0F with 0xFF in the EXAMPLE_BYTES in the tests, and both array and slice tests started to fail. It represents 0xFF as [24, 255]. serialize_bytes fixes this.

fjarri · 2023-06-21T00:35:21Z

Created PR #1112

See #1111 for the discussion leading to this PR.

fjarri mentioned this issue Jun 20, 2023

Use custom (de)serialization for SessionStaticKey into bytes nucypher/nucypher-core#63

Merged

7 tasks

tarcieri closed this as completed Jun 20, 2023

fjarri mentioned this issue Jun 21, 2023

serdect: safer bytes handling #1112

Merged

tarcieri pushed a commit that referenced this issue Oct 31, 2023

serdect: safer bytes handling (#1112)

57b7d02

See #1111 for the discussion leading to this PR.

tarcieri mentioned this issue Nov 1, 2023

hybrid-array: Added serde impls for Array RustCrypto/utils#979

Closed

serdect: is it actually constant-time? #1111

serdect: is it actually constant-time? #1111

Comments

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023 • edited Loading

tarcieri commented Jun 20, 2023

tarcieri commented Jun 20, 2023 • edited Loading

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

daxpedda commented Jun 20, 2023

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023 • edited Loading

fjarri commented Jun 20, 2023

daxpedda commented Jun 20, 2023 • edited Loading

fjarri commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

daxpedda commented Jun 20, 2023

fjarri commented Jun 20, 2023

daxpedda commented Jun 20, 2023 • edited Loading

fjarri commented Jun 20, 2023

daxpedda commented Jun 20, 2023

fjarri commented Jun 20, 2023 • edited Loading

daxpedda commented Jun 20, 2023

tarcieri commented Jun 20, 2023

fjarri commented Jun 20, 2023

daxpedda commented Jun 21, 2023

fjarri commented Jun 21, 2023

fjarri commented Jun 21, 2023 • edited Loading

fjarri commented Jun 20, 2023 •

edited

Loading

tarcieri commented Jun 20, 2023 •

edited

Loading

tarcieri commented Jun 20, 2023 •

edited

Loading

daxpedda commented Jun 20, 2023 •

edited

Loading

daxpedda commented Jun 20, 2023 •

edited

Loading

fjarri commented Jun 20, 2023 •

edited

Loading

fjarri commented Jun 21, 2023 •

edited

Loading