fix!: improve case utils for more optimal member names #975

aajtodd · 2023-10-10T21:06:59Z

Issue #

Description of changes

Pull in splitOnWordBoundaries implementation from Rust to fix awslabs/aws-sdk-kotlin#1064.
Adds a snapshot of every member name from every AWS model, along with its current expected value. This lets us see the scope of these changes and, in the future, ensure we don't break existing names when we need to handle new edge cases. The same is done for SDK IDs.

This is a breaking change to how names for pretty much everything are formed.

See this commit for the differences in member and client names.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lauzadis

All the new service names seem good to me!

lauzadis · 2023-10-17T20:30:48Z

...ithy-kotlin-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/utils/CaseUtils.kt

-    // all non-alphanumeric characters: "acm-success"-> "acm success"
-    result = result.replace(Regex("[^A-Za-z0-9+]"), " ")
+    // emit the current word and update from the next character
+    val emit = { next: Char ->


this lambda function definitely seems Rusty

lauzadis · 2023-10-17T20:34:36Z

...ithy-kotlin-codegen/src/test/kotlin/software/amazon/smithy/kotlin/codegen/core/NamingTest.kt

+    @Test
+    fun testAllNames() {
+        // Set this to true to write a new test expectation file
+        val publishUpdate = true


Should we just always have this enabled?

lauzadis · 2023-10-17T20:35:32Z

...ithy-kotlin-codegen/src/test/kotlin/software/amazon/smithy/kotlin/codegen/core/NamingTest.kt

+        val publishUpdate = true
+        val allNames = this::class.java.getResource("/all-names-test-output.txt")?.readText()!!
+        val errors = mutableListOf<String>()
+        val output = StringBuilder()


can we add a header to the file input, actual?

lauzadis · 2023-10-17T20:35:52Z

...ithy-kotlin-codegen/src/test/kotlin/software/amazon/smithy/kotlin/codegen/core/NamingTest.kt

+    fun testClientNames() {
+        // jq '.. | select(.sdkId?).sdkId' codegen/sdk/aws-models/*.json > /tmp/sdk-ids.txt
+        // Set this to true to write a new test expectation file
+        val publishUpdate = true


same comment about whether this should be configurable or not

lauzadis · 2023-10-17T20:36:06Z

...ithy-kotlin-codegen/src/test/kotlin/software/amazon/smithy/kotlin/codegen/core/NamingTest.kt

+        val publishUpdate = true
+        val allNames = this::class.java.getResource("/sdk-ids-test-output.txt")?.readText()!!
+        val errors = mutableListOf<String>()
+        val output = StringBuilder()


also same comment about file header, it makes it easier to know what's what

lauzadis · 2023-10-17T20:42:59Z

codegen/smithy-kotlin-codegen/src/test/resources/sdk-ids.txt

Is there a way to make this auto-generated also?

ianbotsf · 2023-10-17T21:53:02Z

...ithy-kotlin-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/utils/CaseUtils.kt

+    // with minor changes (s3 and iot as whole words). Previously we used the Java v2 implementation
+    // https://github.com/aws/aws-sdk-java-v2/blob/2.20.162/utils/src/main/java/software/amazon/awssdk/utils/internal/CodegenNamingUtils.java#L36
+    // but this has some edge cases it doesn't handle well
+    val out = mutableListOf<String>()


Style: Could be replaced with return buildList { ... }.
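For illustration, a minimal sketch of what the `buildList` form looks like next to the `mutableListOf` form. The splitter here is a hypothetical stand-in that only splits on spaces; it is not the PR's `splitOnWordBoundaries` logic, and the function names are made up for the comparison.

```kotlin
// Hypothetical stand-in: accumulate into an explicit mutable list, then return it.
fun splitWithMutableList(input: String): List<String> {
    val out = mutableListOf<String>()
    var current = ""
    for (c in input) {
        if (c == ' ') {
            if (current.isNotEmpty()) out.add(current)
            current = ""
        } else {
            current += c
        }
    }
    if (current.isNotEmpty()) out.add(current)
    return out
}

// Same logic with buildList: the builder lambda receives the mutable list as `this`,
// and the caller only ever sees a read-only List.
fun splitWithBuildList(input: String): List<String> = buildList {
    var current = ""
    for (c in input) {
        if (c == ' ') {
            if (current.isNotEmpty()) add(current)
            current = ""
        } else {
            current += c
        }
    }
    if (current.isNotEmpty()) add(current)
}

fun main() {
    println(splitWithMutableList("acm success")) // prints [acm, success]
    println(splitWithBuildList("acm success"))   // prints [acm, success]
}
```

The behavior is the same; the `buildList` version just avoids the separate `out` variable and the trailing `return out`.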

ianbotsf · 2023-10-17T22:17:03Z

...ithy-kotlin-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/utils/CaseUtils.kt

+    }
+
+    // Skip cases like `AR[N]s`, `AC[L]s` but not `IAM[U]ser`
+    if (peek == 's' && (doublePeek == null || !doublePeek.isLowerCase())) {


Nit: (doublePeek == null || !doublePeek.isLowerCase()) → doublePeek?.isLowerCase() != true
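A quick sketch to check that the two forms agree. The wrapper function names are hypothetical; only the two boolean expressions come from the diff and the nit.

```kotlin
// Expression as written in the diff.
fun original(doublePeek: Char?): Boolean =
    doublePeek == null || !doublePeek.isLowerCase()

// Suggested form: `?.` yields null for a null receiver, and null != true.
fun simplified(doublePeek: Char?): Boolean =
    doublePeek?.isLowerCase() != true

fun main() {
    val samples: List<Char?> = listOf(null, 's', 'S', '2', '_')
    for (c in samples) {
        check(original(c) == simplified(c)) { "mismatch for $c" }
    }
    println("both forms agree on all samples")
}
```

Both return `true` for `null` and for any character that is not lower case, and `false` only for a lower-case character.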

ianbotsf · 2023-10-17T22:27:02Z

...ithy-kotlin-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/utils/CaseUtils.kt

+    // Skip cases like `DynamoD[B]v2`
+    return !(peek == 'v' && doublePeek?.isDigit() == true)


Correctness/Question: Why do we want to skip cases like this? It seems to me that "DynamoDBv2" in this case represents the words ["Dynamo", "DB", "v2"] not ["Dynamo", "DBv2"]. I think "IPv4" and "IPv6" are exceptions because they're actually whole initialisms but I think generally "vn" is the start of a new word. Several test expectations below changed for the worse IMHO (e.g., "TlsV1_2" → "Tlsv1_2", "MsChapV1" → "MsChapv1", etc.).

aajtodd · 2023-10-19

Changing this results in:

Member Names

DynamoDBv2Action,dynamoDbv2Action => dynamoDBv2Action (expected dynamoDbv2Action)
dynamoDBv2,dynamoDbv2 => dynamoDBv2 (expected dynamoDbv2)

Enum Names

expected 'MsChapv1' != actual 'MsChaPv1' for input: MS-CHAPv1
expected 'Tlsv1' != actual 'TlSv1' for input: TLSv1
expected 'Tlsv1_2' != actual 'TlSv1_2' for input: TLSv1.2
org.opentest4j.AssertionFailedError: expected 'MsChapv1' != actual 'MsChaPv1' for input: MS-CHAPv1
expected 'Tlsv1' != actual 'TlSv1' for input: TLSv1
expected 'Tlsv1_2' != actual 'TlSv1_2' for input: TLSv1.2

Client Names

SESv2,Sesv2 => SeSv2 (expected Sesv2)

ianbotsf · 2023-10-20

Right, turns out I didn't fully understand the endOfAcronym function. Since it determines whether current is the end of an acronym, I think we need to add a new special case between `// Not an acronym in progress` and `// We aren't at the next word yet`:

    if (nextChar == 'v' && peek?.isDigit() == true) {
        // Handle cases like `DynamoDB[v]2`
        return true
    }

This yields dynamoDbV2, MsChapV1, TlsV1_2, etc.

Full function code

    private fun endOfAcronym(current: String, nextChar: Char, peek: Char?, doublePeek: Char?): Boolean {
        if (!current.last().isUpperCase()) {
            // Not an acronym in progress
            return false
        }
        if (nextChar == 'v' && peek?.isDigit() == true) {
            // Handle cases like `DynamoDB[v]2`
            return true
        }
        if (!nextChar.isUpperCase()) {
            // We aren't at the next word yet
            return false
        }
        if (peek?.isLowerCase() != true) {
            return false
        }
        // Skip cases like `AR[N]s`, `AC[L]s` but not `IAM[U]ser`
        if (peek == 's' && doublePeek?.isLowerCase() != true) {
            return false
        }
        // Skip cases like `DynamoD[B]v2`
        return !(peek == 'v' && doublePeek?.isDigit() == true)
    }
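To make the effect of the proposed function concrete, here is a small sketch that calls it at the bracketed positions named in its comments. The function body is copied from the comment above, minus the `private` modifier so it can be called from anywhere; the `main` driver and the choice of arguments are illustrative assumptions about how the splitter would invoke it, not code from the PR.

```kotlin
fun endOfAcronym(current: String, nextChar: Char, peek: Char?, doublePeek: Char?): Boolean {
    if (!current.last().isUpperCase()) {
        // Not an acronym in progress
        return false
    }
    if (nextChar == 'v' && peek?.isDigit() == true) {
        // Handle cases like `DynamoDB[v]2`
        return true
    }
    if (!nextChar.isUpperCase()) {
        // We aren't at the next word yet
        return false
    }
    if (peek?.isLowerCase() != true) {
        return false
    }
    // Skip cases like `AR[N]s`, `AC[L]s` but not `IAM[U]ser`
    if (peek == 's' && doublePeek?.isLowerCase() != true) {
        return false
    }
    // Skip cases like `DynamoD[B]v2`
    return !(peek == 'v' && doublePeek?.isDigit() == true)
}

fun main() {
    // DynamoDB[v]2: the acronym "DB" ends and "v2" starts a new word.
    println(endOfAcronym("DynamoDB", 'v', '2', null)) // prints true
    // DynamoD[B]v2: "B" is still part of the acronym.
    println(endOfAcronym("DynamoD", 'B', 'v', '2'))   // prints false
    // AR[N]s: a trailing plural "s" does not end the acronym.
    println(endOfAcronym("AR", 'N', 's', null))       // prints false
    // IAM[U]ser: "U" starts the word "User".
    println(endOfAcronym("IAM", 'U', 's', 'e'))       // prints true
}
```

A `true` result means `current` is complete and `nextChar` begins the next word.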

ianbotsf · 2023-10-17T22:45:36Z

codegen/smithy-kotlin-codegen/src/test/resources/all-names-test-output.txt

Nit: This file appears to specifically be a CSV, not just a TXT. Applies to all-names.txt and sdk-ids-test-output.txt as well (and arguably sdk-ids.txt).

ianbotsf · 2023-10-17T22:47:28Z

...ithy-kotlin-codegen/src/test/kotlin/software/amazon/smithy/kotlin/codegen/core/NamingTest.kt

+        if (publishUpdate) {
+            File("sdk-ids-test-output.txt").writeText(output.toString())
+        }
+        if (errors.isNotEmpty()) {
+            fail(errors.joinToString("\n"))
+        }


Correctness/Question: Should we really publish the update if there are errors? Or should the error check (and potential fail) happen first?

aajtodd · 2023-10-19

The update doesn't overwrite the expectation file. It is written to the root of the project so that you can diff the two.
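A minimal sketch of the write-then-fail ordering discussed here, assuming an `input,expected` line format. The `compareNames` helper, the sample data, the placeholder transform, and the temp-file destination are all hypothetical; the PR's test writes to a file in the working directory instead.

```kotlin
import java.io.File

// Hypothetical helper: each line is "input,expected". Returns the regenerated
// file body ("input,actual" per line) and a list of mismatch messages.
fun compareNames(lines: List<String>, transform: (String) -> String): Pair<String, List<String>> {
    val errors = mutableListOf<String>()
    val output = StringBuilder()
    for (line in lines) {
        val (input, expected) = line.split(",", limit = 2)
        val actual = transform(input)
        output.appendLine("$input,$actual")
        if (actual != expected) errors += "$line => $actual (expected $expected)"
    }
    return output.toString() to errors
}

fun main() {
    val publishUpdate = true
    val snapshot = listOf("FooBar,fooBar", "BazQux,bazqux")
    // Placeholder transform: lower-case the first character only.
    val (output, errors) = compareNames(snapshot) { name -> name.replaceFirstChar { it.lowercaseChar() } }

    // Write the regenerated expectations first so they exist even when the check fails.
    if (publishUpdate) {
        val outFile = File.createTempFile("names-test-output", ".txt")
        outFile.writeText(output)
        println("wrote ${outFile.path}")
    }
    // Then report mismatches (a real test would call fail(...) here).
    if (errors.isNotEmpty()) println(errors.joinToString("\n"))
}
```

Because the output goes to a separate file rather than the checked-in expectation, writing it before failing is what makes the diff available for a failing run.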

ianbotsf · 2023-10-23T17:37:29Z

...ithy-kotlin-codegen/src/main/kotlin/software/amazon/smithy/kotlin/codegen/utils/CaseUtils.kt

+        // println("completeWordInProgress: $completeWordInProgress, curr: $currentWord")
+


Nit: Remove

sonarqubecloud · 2023-10-23T17:46:55Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

aajtodd added 4 commits October 13, 2023 12:53

add baseline expectations for case utils on all known member names today

85e8ca9

update case utils with failing tests

b13a341

refactor case utils to use a new word split algorithm

4aa1928

bump version

8b42d62

aajtodd force-pushed the fix-case-utils branch from 65c3f5f to 8b42d62 Compare October 13, 2023 16:53

aajtodd marked this pull request as ready for review October 13, 2023 16:55

aajtodd requested a review from a team as a code owner October 13, 2023 16:55

lauzadis approved these changes Oct 17, 2023

ianbotsf reviewed Oct 17, 2023

aajtodd added 4 commits October 19, 2023 10:02

feedback

5e9f6a9

minor tweaks to algorithm based on feedback

d9cd9d7

fix test

8666f18

fix downstream test

207cdcf

ianbotsf approved these changes Oct 23, 2023

fix ci and remove debug code

ad0a1d9

aajtodd merged commit cb23b6b into main Oct 23, 2023
13 of 14 checks passed

aajtodd deleted the fix-case-utils branch October 23, 2023 18:02

aajtodd mentioned this pull request Nov 15, 2023

fix: optimize splitOnWordBoundaries #999

Merged
