Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency in protocol buffers specification? #28

Open
Gabriella439 opened this issue Aug 1, 2017 · 4 comments
Open

Inconsistency in protocol buffers specification? #28

Gabriella439 opened this issue Aug 1, 2017 · 4 comments

Comments

@Gabriella439
Copy link
Contributor

The protocol buffers encoding specification says:

Normally, an encoded message would never have more than one instance of a non-repeated field. However, parsers are expected to handle the case in which they do. For numeric types and strings, if the same field appears multiple times, the parser accepts the last value it sees. For embedded message fields, the parser merges multiple instances of the same field, as if with the Message::MergeFrom method – that is, all singular scalar fields in the latter instance replace those in the former, singular embedded messages are merged, and repeated fields are concatenated. The effect of these rules is that parsing the concatenation of two encoded messages produces exactly the same result as if you had parsed the two messages separately and merged the resulting objects. That is, this:

MyMessage message;
message.ParseFromString(str1 + str2);

is equivalent to this:

MyMessage message, message2;
message.ParseFromString(str1);
message2.ParseFromString(str2);
message.MergeFrom(message2);

Let's call this a monoid homomorphism law for ParseFromString

However, this seems to conflict with another protocol buffers design decision to omit default fields when serializing message. Specifically, the protocol buffers 3 language guide says:

When a message is parsed, if the encoded message does not contain a particular singular element, the corresponding field in the parsed object is set to the default value for that field.

Defaulting values breaks the above monoid homomorphism law if you consider the following message type:

message Example {
  int32 foo = 1;
}

... and then you serialize the following two messages:

  • message1 : an Example message with the field foo set to 1
  • message2: an Example message with the field foo set to 0

If you deserialize each ByteString independently and combine the resulting Messages you would get the field foo set to 0 (since the latter Message would default to foo = 0 and fields from the latter Message take precedence), but if you combined the ByteStrings before deserializing, you would get the field foo set to 1 (since the 0 field was never explicitly serialized)

Is there something that I'm missing or is the specification inconsistent?

@j6carey
Copy link
Collaborator

j6carey commented Aug 1, 2017

I cannot tell from the documentation, but perhaps MergeFrom would take the 1 over the 0, even if the 0 comes second, simply because 0 is the default, and therefore equivalent in some sense to omission?

@crclark
Copy link
Collaborator

crclark commented Aug 1, 2017

This is indeed confusing. I just tried it with the official Python implementation, and both cases result in foo = 1.

I believe the key is interpreting the phrase

if the same field appears multiple times, the parser accepts the last value it sees.

It seems that "appears" means "occurs explicitly in the serialized message". A field set to 0 never "appears" because it's set to a default value, and so it isn't present in the serialized message.

This seems like a bad idea (it's surprising that MergeFrom doesn't merge in defaults), but I guess that's the spec...

Here's some code:

syntax = "proto3";

package test;

message Example {
  int32 foo = 1;
}
import test_pb2

x = test_pb2.Example()
x.foo = 1
xStr = x.SerializeToString()

y = test_pb2.Example()
y.foo = 0
yStr = y.SerializeToString()

fromConcat = test_pb2.Example()
fromConcat.ParseFromString(xStr + yStr)

fromMerge = test_pb2.Example()
fromMerge.ParseFromString(xStr)
yParsed = test_pb2.Example()
yParsed.ParseFromString(yStr)
fromMerge.MergeFrom(yParsed)

print('fromConcat: ' + fromConcat.__str__())
print('fromMerge: ' + fromMerge.__str__())

@Gabriella439
Copy link
Contributor Author

Gabriella439 commented Aug 1, 2017

That seems to be consistent with @j6carey's interpretation, which is that merging two scalar fields returns the last field with a non-default value. In other words:

newtype ProtobufInt32 = ProtobufInt { unProtobufInt :: Int32 }

instance Monoid ProtobufInt32 where
    mempty = ProtobufInt32 0

    mappend x (ProtobufInt32 0) = x
    mappend _  x                = x

... which is actually a law-abiding Monoid, albeit weird. That in turns preserves the monoid morphism law

@intractable
Copy link
Collaborator

intractable commented Aug 1, 2017

Agreed. Keep in mind that since the required/optional distinction of proto2 went away, default values are often used semantically to indicate value absence. I think this is why they provide things like Int32Value (among many similar types) for when one wants to e.g. actually use zero as a meaningful value.

So the Monoid is indeed weird but it makes sense along the lines of "merge only the present values together, and when no values are present, use the default value." That's my $0.02, anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants