Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reading of proto Uint32Value #3113

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -763,6 +763,12 @@ public ProtoUInt32ValueConverter(ParentValueContainer parent) {
this.parent = parent;
}

@Override
public void addInt(int value) {
parent.add(UInt32Value.of(value));
}

// This is left for backward compatibility with the old implementation which used int64 for uint32
@Override
public void addLong(long value) {
parent.add(UInt32Value.of(Math.toIntExact(value)));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -260,7 +260,7 @@ private <T> Builder<? extends Builder<?, GroupBuilder<T>>, GroupBuilder<T>> addF
return builder.primitive(INT32, getRepetition(descriptor));
}
if (messageType.equals(UInt32Value.getDescriptor())) {
return builder.primitive(INT64, getRepetition(descriptor));
return builder.primitive(INT32, getRepetition(descriptor));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An unsigned 32-bit integer might overflow a signed 32-bit integer, right? It would be good to test this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added a test using the min and max value of Integer and doing a round trip to make sure we get it back. It looks a bit weird because uint32 are represented as signed 32 bits integer in java, but it works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the test really test the out of bounds case. The Integer.MAX_VALUE is 2^31-1 = 2147483647, but the unsigned integer could go up to 2^32-1 = 4294967295. Maybe the original author @mwong38 has an opinion on this (I'm not a protobuf expert myself).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my understanding.

uint32 is not a java concept. Therefore a protobuf field of type uint32, in the java generated code, will be treated as signed 32 bit integer. What java calls an int.

Something like:

message TestMessage {
  uint32 value = 1;
}

Will generate setters and getters that use an int:

 public Builder setValue(int value) // ....

However if I set the value to a negative value, for example builder.setValue(-1) and call toString, it will correctly interpret it as 4294967295.

Again what's important to note is that the underlying representation of a uint32 protobuf type in java is (32bit) int. Say I am to store, from a language that support unsigned integers (eg python), the value of a uint32 to 4294967295. Then if I read it in java I would get -1. But if I serialize the message again and pass it back to C or python, I would get back 4294967295 which is what I want.

I've extended the test to illustrate the behaviour better (adding calls to toString). Also I've added -1 on top of Integer.MIN_VALUE and Integer.MAX_VALUE to correctly test the full spectrum of unsigned integer.

}
if (messageType.equals(BytesValue.getDescriptor())) {
return builder.primitive(BINARY, getRepetition(descriptor));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -763,7 +763,7 @@ void writeRawValue(Object value) {
class UInt32ValueWriter extends FieldWriter {
@Override
void writeRawValue(Object value) {
recordConsumer.addLong(((UInt32Value) value).getValue());
recordConsumer.addInteger(((UInt32Value) value).getValue());
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -678,6 +678,34 @@ public void testProto3WrappedMessageClass() throws Exception {
assertEquals(msgNonEmpty, result.get(1));
}

@Test
public void testProto3Uint32Behaviour() throws Exception {

TestProto3.SchemaConverterAllDatatypes intMin = TestProto3.SchemaConverterAllDatatypes.newBuilder()
.setOptionalUInt32(Integer.MIN_VALUE)
.build();
assertEquals(intMin.toString(), "optionalUInt32: 2147483648\n");
TestProto3.SchemaConverterAllDatatypes uintMin = TestProto3.SchemaConverterAllDatatypes.newBuilder()
.setOptionalUInt32(-1)
.build();
assertEquals(uintMin.toString(), "optionalUInt32: 4294967295\n");
TestProto3.SchemaConverterAllDatatypes uintMax = TestProto3.SchemaConverterAllDatatypes.newBuilder()
.setOptionalUInt32(Integer.MAX_VALUE)
.build();
assertEquals(uintMax.toString(), "optionalUInt32: 2147483647\n");

Configuration conf = new Configuration();
Path outputPath = new WriteUsingMR(conf).write(intMin, uintMin, uintMax);
ReadUsingMR readUsingMR = new ReadUsingMR(conf);
String customClass = TestProto3.SchemaConverterAllDatatypes.class.getName();
ProtoReadSupport.setProtobufClass(readUsingMR.getConfiguration(), customClass);
List<Message> result = readUsingMR.read(outputPath);

assertEquals(result.get(0), intMin);
assertEquals(result.get(1), uintMin);
assertEquals(result.get(2), uintMax);
}

/**
* Runs job that writes input to file and then job reading data back.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -400,7 +400,7 @@ public void testProto3ConvertWrappedMessageUnwrapped() throws Exception {
+ " optional int64 wrappedInt64 = 3;\n"
+ " optional int64 wrappedUInt64 = 4;\n"
+ " optional int32 wrappedInt32 = 5;\n"
+ " optional int64 wrappedUInt32 = 6;\n"
+ " optional int32 wrappedUInt32 = 6;\n"
+ " optional boolean wrappedBool = 7;\n"
+ " optional binary wrappedString (UTF8) = 8;\n"
+ " optional binary wrappedBytes = 9;\n"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,22 +22,7 @@
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import com.google.protobuf.BoolValue;
import com.google.protobuf.ByteString;
import com.google.protobuf.BytesValue;
import com.google.protobuf.Descriptors;
import com.google.protobuf.DoubleValue;
import com.google.protobuf.DynamicMessage;
import com.google.protobuf.FloatValue;
import com.google.protobuf.Int32Value;
import com.google.protobuf.Int64Value;
import com.google.protobuf.Message;
import com.google.protobuf.MessageOrBuilder;
import com.google.protobuf.StringValue;
import com.google.protobuf.Timestamp;
import com.google.protobuf.UInt32Value;
import com.google.protobuf.UInt64Value;
import com.google.protobuf.Value;
import com.google.protobuf.*;
import com.google.protobuf.util.Timestamps;
import java.io.IOException;
import java.time.LocalDate;
Expand Down Expand Up @@ -1372,6 +1357,40 @@ public void testProto3WrappedMessageUnwrappedRoundTrip() throws Exception {
gotBackFirst.getWrappedBytes().getValue());
}

@Test
public void testProto3WrappedMessageUnwrappedRoundTripUint32() throws Exception {

TestProto3.WrappedMessage msgMin = TestProto3.WrappedMessage.newBuilder()
.setWrappedUInt32(UInt32Value.of(Integer.MIN_VALUE))
.build();
assertEquals(TextFormat.shortDebugString(msgMin), "wrappedUInt32 { value: 2147483648 }");

TestProto3.WrappedMessage msgMax = TestProto3.WrappedMessage.newBuilder()
.setWrappedUInt32(UInt32Value.of(Integer.MAX_VALUE))
.build();
assertEquals(TextFormat.shortDebugString(msgMax), "wrappedUInt32 { value: 2147483647 }");

TestProto3.WrappedMessage msgMinusOne = TestProto3.WrappedMessage.newBuilder()
.setWrappedUInt32(UInt32Value.of(-1))
.build();
assertEquals(TextFormat.shortDebugString(msgMinusOne), "wrappedUInt32 { value: 4294967295 }");

Path tmpFilePath = TestUtils.someTemporaryFilePath();
ParquetWriter<MessageOrBuilder> writer = ProtoParquetWriter.<MessageOrBuilder>builder(tmpFilePath)
.withMessage(TestProto3.WrappedMessage.class)
.config(ProtoWriteSupport.PB_UNWRAP_PROTO_WRAPPERS, "true")
.build();
writer.write(msgMin);
writer.write(msgMax);
writer.write(msgMinusOne);
writer.close();
List<TestProto3.WrappedMessage> gotBack = TestUtils.readMessages(tmpFilePath, TestProto3.WrappedMessage.class);

assertEquals(msgMin, gotBack.get(0));
assertEquals(msgMax, gotBack.get(1));
assertEquals(msgMinusOne, gotBack.get(2));
}

@Test
public void testProto3WrappedMessageWithNullsRoundTrip() throws Exception {
TestProto3.WrappedMessage.Builder msg = TestProto3.WrappedMessage.newBuilder();
Expand Down