-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reading of proto Uint32Value #3113
base: master
Are you sure you want to change the base?
Fix reading of proto Uint32Value #3113
Conversation
@@ -260,7 +260,7 @@ private <T> Builder<? extends Builder<?, GroupBuilder<T>>, GroupBuilder<T>> addF | |||
return builder.primitive(INT32, getRepetition(descriptor)); | |||
} | |||
if (messageType.equals(UInt32Value.getDescriptor())) { | |||
return builder.primitive(INT64, getRepetition(descriptor)); | |||
return builder.primitive(INT32, getRepetition(descriptor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An unsigned 32-bit integer might overflow a signed 32-bit integer, right? It would be good to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just added a test using the min and max value of Integer and doing a round trip to make sure we get it back. It looks a bit weird because uint32 are represented as signed 32 bits integer in java, but it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the test really test the out of bounds case. The Integer.MAX_VALUE
is 2^31-1 = 2147483647
, but the unsigned integer could go up to 2^32-1 = 4294967295
. Maybe the original author @mwong38 has an opinion on this (I'm not a protobuf expert myself).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's my understanding.
uint32
is not a java concept. Therefore a protobuf field of type uint32
, in the java generated code, will be treated as signed 32 bit integer. What java calls an int
.
Something like:
message TestMessage {
uint32 value = 1;
}
Will generate setters and getters that use an int
:
public Builder setValue(int value) // ....
However if I set the value to a negative value, for example builder.setValue(-1)
and call toString
, it will correctly interpret it as 4294967295.
Again what's important to note is that the underlying representation of a uint32 protobuf type in java is (32bit) int. Say I am to store, from a language that support unsigned integers (eg python), the value of a uint32
to 4294967295
. Then if I read it in java I would get -1
. But if I serialize the message again and pass it back to C or python, I would get back 4294967295
which is what I want.
I've extended the test to illustrate the behaviour better (adding calls to toString
). Also I've added -1
on top of Integer.MIN_VALUE
and Integer.MAX_VALUE
to correctly test the full spectrum of unsigned integer.
Rationale for this change
Fix for #3112
What changes are included in this PR?
addLong
implementation of theProtoUInt32ValueConverter
Are these changes tested?
There was extensive test coverage of this feature for round trip reading. The main change is the physical / underlying type change from int32 to int64 which is tested.
There are however no test for backward compatibility.
Are there any user-facing changes?
The parquet file generated from protobuf using Uint32Value will now use int32 instead of int64.
Related MR:
Closes #3112