-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor XML deserialization #1042
Conversation
…rrect tag to decode from
writer.deserializeLoop(serdeCtx) { innerCtx -> | ||
members.forEach { member -> | ||
val name = member.getTrait<XmlNameTrait>()?.value ?: member.memberName | ||
write("// ${member.memberName} ${escape(member.id.toString())}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opinion; I don't think these member name comments are super useful, did you mean to include them or just used for debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to include them, I think they are helpful if you have to debug something it points you exactly to the model shape.
.../src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/XmlParserGenerator.kt
Outdated
Show resolved
Hide resolved
.../src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/XmlParserGenerator.kt
Show resolved
Hide resolved
runtime/serde/serde-xml/common/src/aws/smithy/kotlin/runtime/serde/xml/XmlTagReader.kt
Outdated
Show resolved
Hide resolved
return nextTok?.tagReader(reader).also { newScope -> | ||
last = newScope | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: not questioning the correctness but why is last
set to the newly created newScope
reader rather than the old input reader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the last tag reader we dispensed via nextTag
, we use it to ensure that when nextTag
is invoked again that we have the correct state.
runtime/serde/serde-xml/common/test/aws/smithy/kotlin/runtime/serde/xml/XmlTagReaderTest.kt
Show resolved
Hide resolved
@@ -8,20 +8,20 @@ This project contains micro benchmarks for the serialization implementation(s). | |||
./gradlew :runtime:serde:serde-benchmarks:jvmBenchmark | |||
``` | |||
|
|||
Baseline `0.7.8-beta` on EC2 **[m5.4xlarge](https://aws.amazon.com/ec2/instance-types/m5/)** in **OpenJK 1.8.0_312**: | |||
Baseline on EC2 **[m5.4xlarge](https://aws.amazon.com/ec2/instance-types/m5/)** in **Corretto-17.0.10.8.1**: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to keep the SDK version here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to but it's kind of a chicken and an egg problem. In a branch we aren't on a tagged version so we either guess what that version is going to be, use a commit sha/PR number, or just let the commit history speak for itself. I chose let the commit history speak for itself.
// FIXME - this task up-to-date checks are wrong, likely something is not setup right with inputs/outputs somewhere | ||
// for now just always run it | ||
outputs.upToDateWhen { false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: is there a backlog task for this?
// FIXME - https://github.com/awslabs/smithy-kotlin/issues/1040 | ||
// @Test | ||
// fun testUnitField() { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be uncommented and filled out? Even if failing, we can add an @Ignore
and then ensure it passes once the bug is fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is what to write as a test? Nothing we put here is going to be right at the moment.
...-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/core/AbstractCodeWriterExt.kt
Show resolved
Hide resolved
...in-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/SerdeExt.kt
Outdated
Show resolved
Hide resolved
...in-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/SerdeExt.kt
Outdated
Show resolved
Hide resolved
...in-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/SerdeExt.kt
Show resolved
Hide resolved
renderDeserializerBody(ctx, shape, members.toList(), writer) | ||
writer.write("return value ?: throw #T(#S)", RuntimeTypes.Serde.DeserializationException, "Deserialized union value unexpectedly null: ${symbol.name}") | ||
renderDeserializerBody(ctx, serdeCtx, shape, members.toList(), writer) | ||
writer.write("return value ?: throw #T(#S)", Serde.DeserializationException, "Deserialized union value unexpectedly null: ${symbol.name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: I generally find non-top-level imports to be confusing and would rather read RuntimeTypes.Serde.DeserializationException
than Serde.DeserializationException
, even though the latter is shorter.
runtime/serde/serde-xml/common/src/aws/smithy/kotlin/runtime/serde/xml/XmlStreamReader.kt
Outdated
Show resolved
Hide resolved
runtime/serde/serde-xml/common/src/aws/smithy/kotlin/runtime/serde/xml/XmlTagReader.kt
Outdated
Show resolved
Hide resolved
var cand = nextToken() | ||
while (cand != null && cand !is XmlToken.BeginElement) { | ||
cand = nextToken() | ||
} | ||
|
||
val nextTok = cand as? XmlToken.BeginElement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Might be simpler with a sequence:
val nextTok = generateSequence(::nextToken)
.filterIsInstance<XmlToken.BeginElement>()
.firstOrNull()
runtime/serde/serde-xml/common/src/aws/smithy/kotlin/runtime/serde/xml/XmlTagReader.kt
Outdated
Show resolved
Hide resolved
...-xml/common/src/aws/smithy/kotlin/runtime/serde/xml/deserialization/LexingXmlStreamReader.kt
Outdated
Show resolved
Hide resolved
.../src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/XmlParserGenerator.kt
Show resolved
Hide resolved
.../src/main/kotlin/software/amazon/smithy/kotlin/codegen/rendering/serde/XmlParserGenerator.kt
Show resolved
Hide resolved
Quality Gate failedFailed conditions |
@@ -14,7 +14,7 @@ internal data class Ec2QueryErrorResponse(val errors: List<Ec2QueryError>, val r | |||
internal data class Ec2QueryError(val code: String?, val message: String?) | |||
|
|||
@InternalApi | |||
public fun parseEc2QueryErrorResponse(payload: ByteArray): ErrorDetails { | |||
public suspend fun parseEc2QueryErrorResponse(payload: ByteArray): ErrorDetails { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: this suspend
is now unnecessary but I'm assuming you kept it for backwards-compatibility. should we also deprecate this suspend fun
and create a new non-suspending function?
same for parseRestXmlErrorResponse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could I chose not to for now but don't feel strongly. Yes I kept it for binary compat, this wasn't necessary for some time I don't think since our deserializers haven't been suspend for a very long time.
Issue #
aws-sdk-kotlin#1220
Description of changes
The context for this issue is in aws-sdk-kotlin#1220. Essentially we made a bad assumption that flat collections would always be serialized sequentially. In reality services are returning flat collections interspersed with other XML elements.
Our original approach to deserialization followed closely with kotlinx.serialization where we have a common
Serializer
andDeserializer
interface. Each format we want to support (xml, json, etc) implements those and then codegen is the same across all types. The issue is (1) we end up duplicating information already in the model (field traits) and (2) we have to bend over backwards to make the format work within the interface instead of just creating a runtime type that more closely matches the medium. We discussed as a team our options for addressing this issue and decided to just refactor the way we do XML deserialization to closer match that of Go + Rust. This was something we had discussed prior to GA and just didn't have time to do. Rather than implement a one off workaround tailored specifically to this issue we're going to move in the desired end state which is to generate serializers/deserializers specific to each format (starting with just XML deserialization).This is a large PR so I'm going to try and summarize the important bits for easier review. In particular because a lot of this PR is net new test code.
XmlParserGenerator
guts were replaced to no longer use the common struct/union deserializer and instead generate something directly off the lower levelserde-xml
types. See Codegen Output below for example output and differences.XmlTagReader
- new type that sits on top ofXmlStreamReader
and provides some small conveniences for iterating tokens and maintaining deserializer state.tests/codegen/serde-tests
. This new module has a bit of overlap with the existing protocol tests but the iteration time is quicker and is independent of the protocols. This new module has greater coverage than our previous unit tests and I even found some bugs with how we were generating nested lists/maps inside union types as well as found Union members targeting smithy.api#Unit generates extraneous structures #1040. The idea is the same as protocol tests, use the generated code to test with rather than hand writing tests that mimic the structure of generated code. This approach is both easier to write tests for but more accurate as it is testing the real codegen output as opposed to hand written versions of what we expect codegen to output.serde-benchmarks
project contained a companion moduleserde-codegen-support
that implemented a custom dummy protocol for json + xml. There wasn't anything specific to benchmarks here though so I refactored it to remove notion of benchmarks and moved it totests/serde
as a common module that can be re-used for both the serde benchmarks and the new codegen integration testsCodegen Output
For the S3
ListObjectVersions
output type:Previously
After:
Effect on Artifact Sizes
The 1.0.64 S3 release was 5,072,329 bytes. Local builds are coming in at 5,039,276 bytes (~0.6% smaller).
Benchmarks
I've updated the benchmarks. They are included inline here for easy review. The tl;dr is that the generated deserializers are adding less overhead to raw token lexing than before and as a result is faster.
The lexer internals didn't change so they are nearly the same as the prior baseline. The deserialize benchmark came
in at
33.566
ms/op compared to the prior90.697
ms/op (62% faster).Binary Compatibility
This change intentionally breaks binary compatibility on a few
@InternalApi
APIs:XmlDeserializer
- removed completely as it's no longer usedparseRestXmlError
/parseEc2QueryError
- removed erroneous suspendXmlToken
andXmlToken.QualifiedName
- renamed fields to improve readabilityBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.