Bi-Directional Transcoding of Invalid Identifiers #3

miguelh-nvidia · 2024-03-04T12:24:42Z

Description of Proposal

TfMakeValidIdentifier is used in OpenUSD to convert any identifier into a valid identifier. However, the conversion is not bidirectional: the original identifier cannot be recovered from the result. For example, something like カーテンウォール would be transformed into ________________.
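The sketch below makes the information loss concrete. It is a simplified, byte-oriented sanitizer in the spirit of TfMakeValidIdentifier. It is an assumption made for illustration and is not the OpenUSD source. `MakeValidIdentifierSketch` is a hypothetical name, and current OpenUSD releases may treat UTF-8 characters differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified sanitizer in the spirit of TfMakeValidIdentifier
// (NOT the OpenUSD source): every byte that is not an ASCII letter, digit or
// '_' becomes '_', and so does a leading digit.
std::string MakeValidIdentifierSketch(const std::string& in) {
    if (in.empty()) return "_";
    std::string out = in;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(out[i]);
        const bool letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
        const bool digit = c >= '0' && c <= '9';
        if (!letter && !(digit && i > 0)) out[i] = '_';  // information is discarded here
    }
    return out;
}
```

Every rejected byte collapses to the same `_`, so distinct inputs produce identical outputs. For example, "a-b" and "a.b" both become "a_b", and the UTF-8 bytes of カ and ル both become "___". Because different inputs collide, no inverse function can exist.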

The objective of this proposal is to provide an alternative to TfMakeValidIdentifier that can take any identifier
(potentially containing invalid characters) and transform it into a valid OpenUSD identifier from which the original identifier can be recovered.
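The proposal is built on Bootstring, so a sketch of that algorithm shows why such an encoding is reversible. This sketch instantiates Bootstring with the standard RFC 3492 (Punycode) parameters. The proposal's actual parameters, digit alphabet and `tn__` framing are not reproduced. `Encode` and `Decode` are hypothetical names, the sketch works on already-decoded code points (`std::u32string`), and the RFC's overflow checks are omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Bootstring with the RFC 3492 (Punycode) parameters. The proposal's own
// parameters, digit alphabet and "tn__" framing are NOT reproduced here.
constexpr uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38, kDamp = 700;
constexpr uint32_t kInitialBias = 72, kInitialN = 128;

// Bias adaptation (RFC 3492, section 6.1).
uint32_t Adapt(uint32_t delta, uint32_t numPoints, bool firstTime) {
    delta = firstTime ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    uint32_t k = 0;
    for (; delta > ((kBase - kTMin) * kTMax) / 2; k += kBase) delta /= kBase - kTMin;
    return k + (kBase - kTMin + 1) * delta / (delta + kSkew);
}

// Threshold t for the digit at position k.
uint32_t Threshold(uint32_t k, uint32_t bias) {
    return k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
}

// Digits 0..25 map to 'a'..'z' and 26..35 map to '0'..'9'.
char EncodeDigit(uint32_t d) { return d < 26 ? char('a' + d) : char('0' + (d - 26)); }
std::optional<uint32_t> DecodeDigit(char c) {
    if (c >= 'a' && c <= 'z') return uint32_t(c - 'a');
    if (c >= 'A' && c <= 'Z') return uint32_t(c - 'A');
    if (c >= '0' && c <= '9') return uint32_t(c - '0' + 26);
    return std::nullopt;
}

// Encoder (RFC 3492, section 6.3). Overflow checks omitted for brevity.
std::string Encode(const std::u32string& input) {
    std::string out;
    for (char32_t c : input) if (c < 0x80) out += static_cast<char>(c);  // basic code points first
    const uint32_t b = static_cast<uint32_t>(out.size());
    if (b > 0) out += '-';
    uint32_t n = kInitialN, delta = 0, bias = kInitialBias, h = b;
    while (h < input.size()) {
        uint32_t m = UINT32_MAX;  // smallest code point >= n
        for (char32_t c : input) if (c >= n && c < m) m = c;
        delta += (m - n) * (h + 1);
        n = m;
        for (char32_t c : input) {
            if (c < n) ++delta;
            if (c != n) continue;
            uint32_t q = delta;  // emit delta as a variable-length integer
            for (uint32_t k = kBase;; k += kBase) {
                const uint32_t t = Threshold(k, bias);
                if (q < t) break;
                out += EncodeDigit(t + (q - t) % (kBase - t));
                q = (q - t) / (kBase - t);
            }
            out += EncodeDigit(q);
            bias = Adapt(delta, h + 1, h == b);
            delta = 0;
            ++h;
        }
        ++delta;
        ++n;
    }
    return out;
}

// Decoder (RFC 3492, section 6.2). Returns nullopt on malformed input.
std::optional<std::u32string> Decode(const std::string& input) {
    std::u32string out;
    std::size_t in = 0;
    const std::size_t delim = input.rfind('-');
    if (delim != std::string::npos && delim > 0) {
        for (std::size_t j = 0; j < delim; ++j) {
            if (static_cast<unsigned char>(input[j]) >= 0x80) return std::nullopt;
            out += static_cast<char32_t>(input[j]);
        }
        in = delim + 1;
    }
    uint32_t n = kInitialN, i = 0, bias = kInitialBias;
    while (in < input.size()) {
        const uint32_t oldI = i; uint32_t w = 1;
        for (uint32_t k = kBase;; k += kBase) {
            if (in >= input.size()) return std::nullopt;
            const std::optional<uint32_t> digit = DecodeDigit(input[in++]);
            if (!digit) return std::nullopt;
            i += *digit * w;
            const uint32_t t = Threshold(k, bias);
            if (*digit < t) break;
            w *= kBase - t;
        }
        const uint32_t len = static_cast<uint32_t>(out.size()) + 1;
        bias = Adapt(i - oldI, len, oldI == 0);
        n += i / len;
        i %= len;
        out.insert(out.begin() + i, static_cast<char32_t>(n));
        ++i;
    }
    return out;
}
```

With these parameters, `Encode(U"bücher")` yields "bcher-kva" and `Decode` reverses it. The same round trip works for an all-Katakana string such as カーテンウォール. That string has no basic code points, so its encoding contains no delimiter.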

Contributing

I agree to and accept the Supplemental Terms.

nvmkuruc

The proposed implementation should address whether or not invalid UTF-8 code points may be encoded or decoded, specifically code points greater than the maximum code point (U+10FFFF), the replacement code point (U+FFFD), and the surrogate range (U+D800 to U+DFFF).

proposals/transcoding_invalid_identifiers/README.md

miguelh-nvidia · 2024-03-05T13:32:45Z

> The proposed implementation should touch on whether or not invalid UTF-8 code points are allowed to be encoded or decoded, specifically code points that are greater than the max code point, the replacement code point, and the surrogate ranges.

Agreed, it was not clear in the proposed API. I changed it to reflect that:

- The input should be valid UTF-8.
- The output for invalid UTF-8 input is now specified.
- TfUtf8CodePoint is referenced for the other aspects (i.e. the maximum code point, the replacement code point, and the surrogate ranges).
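The limits raised in this exchange can be expressed as a small check applied when a code point is turned back into UTF-8, for example after decoding. This is a sketch. `kMaxCodePoint`, `CodePointToUtf8`, and the `allowReplacement` flag are hypothetical. The values come from the Unicode standard and are not read from TfUtf8CodePoint. Whether U+FFFD should be accepted is left as an explicit policy flag, because the thread does not settle it.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Limits discussed in the thread (values from the Unicode standard).
constexpr uint32_t kMaxCodePoint = 0x10FFFF;
constexpr uint32_t kReplacementCodePoint = 0xFFFD;
constexpr uint32_t kSurrogateFirst = 0xD800;
constexpr uint32_t kSurrogateLast = 0xDFFF;

// True if `cp` is a Unicode scalar value, i.e. encodable in well-formed UTF-8.
bool IsScalarValue(uint32_t cp) {
    return cp <= kMaxCodePoint && (cp < kSurrogateFirst || cp > kSurrogateLast);
}

// Hypothetical helper: convert `cp` to UTF-8, or return nullopt if it cannot
// be represented (surrogates, values above U+10FFFF) or if U+FFFD is rejected
// by policy.
std::optional<std::string> CodePointToUtf8(uint32_t cp, bool allowReplacement) {
    if (!IsScalarValue(cp) || (!allowReplacement && cp == kReplacementCodePoint)) {
        return std::nullopt;
    }
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```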

proposals/transcoding_invalid_identifiers/README.md

nvmkuruc · 2024-03-06T14:34:58Z

proposals/transcoding_invalid_identifiers/README.md

+);
+```
+
+Existing valid identifiers with `tn__` prefix will produce no changes.


At one point we had discussed that the right behavior is to attempt to decode and then re-encode. Motivation: let's say I have "tn__MünchenGermany_rEi5", an identifier previously encoded with SdfBootstringEncodeIdentifier, and I want to ensure it is ASCII-encoded with SdfBootstringEncodeAsciiIdentifier. This also lets the function act as a validator: run SdfBootstringEncodeIdentifier as a way of ensuring that an identifier is properly encoded.

Or vice versa: I'm leaving a domain that required ASCII identifiers, and now I want to "upgrade" to UTF-8.

I think the input for encoding should be any UTF-8 string, and we should not attempt to interpret what that string is. The output of encoding should be:

- An empty string, if the input is not a valid UTF-8 string.
- The same string, if the UTF-8 string is already in the domain of valid characters.
- An encoded string (i.e. with the tn__ prefix), if the UTF-8 string is not in the domain of valid characters.
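The three cases above can be sketched as a classification step that runs before any actual encoding. The sketch makes several assumptions. `ClassifyForEncoding` and `EncodeOutcome` are hypothetical names. `IsAsciiIdentifier` is an ASCII-only stand-in for the "domain of valid characters", roughly what an ASCII variant such as SdfBootstringEncodeAsciiIdentifier would need; OpenUSD's UTF-8 identifier rules accept more characters. The UTF-8 check treats truncated or overlong sequences, surrogates, and values above U+10FFFF as invalid.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// The three outcomes listed above (hypothetical names).
enum class EncodeOutcome { InvalidUtf8, Unchanged, NeedsEncoding };

// Strict UTF-8 check: rejects truncated and overlong sequences, surrogates
// (U+D800 to U+DFFF) and values above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    static const uint32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        uint32_t cp = 0;
        if (lead < 0x80) { ++i; continue; }
        if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > s.size()) return false;
        for (std::size_t j = 1; j < len; ++j) {
            const unsigned char cont = static_cast<unsigned char>(s[i + j]);
            if ((cont & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cont & 0x3F);
        }
        if (cp < kMinForLength[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// ASCII-only stand-in for the identifier domain: [A-Za-z_][A-Za-z0-9_]*.
bool IsAsciiIdentifier(const std::string& s) {
    if (s.empty()) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
        const bool digit = c >= '0' && c <= '9';
        if (!letter && !(digit && i > 0)) return false;
    }
    return true;
}

EncodeOutcome ClassifyForEncoding(const std::string& s) {
    if (!IsValidUtf8(s)) return EncodeOutcome::InvalidUtf8;
    return IsAsciiIdentifier(s) ? EncodeOutcome::Unchanged : EncodeOutcome::NeedsEncoding;
}
```

An encoder built this way would return the empty string for `InvalidUtf8`, return the input as-is for `Unchanged`, and produce a prefixed encoding only for `NeedsEncoding`.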

I think the three proposed methods are the minimum set of operations needed to fix the problem mentioned above. The decode-then-re-encode behavior can be composed from them:

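```cpp
#include <optional>
#include <string>

// Decode the identifier if it is already a valid encoded identifier
// (otherwise keep it as-is), then encode the result again.
std::optional<std::string>
SdfBootstringReencodeIdentifier(const std::string& identifier)
{
    std::string originalIdentifier =
        SdfBootstringDecodeIdentifier(identifier).value_or(identifier);
    return SdfBootstringEncodeIdentifier(originalIdentifier);
}
```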

This could be added to the API's set of methods, and I think its intent is clear: it checks whether the passed identifier is a valid encoded identifier, decodes it if so, and then encodes the result again.
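The sketch below applies the decode-then-encode composition to a toy codec to show why it is safe to apply repeatedly. The codec (`ToyEncode`, `ToyDecode`, `ToyReencode`) is hypothetical and deliberately not Bootstring. Valid ASCII identifiers pass through unchanged, and anything else becomes "tn__" followed by the hex of its bytes. For brevity it returns `std::string` rather than `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Toy codec (hypothetical, NOT Bootstring): valid ASCII identifiers pass
// through unchanged; anything else becomes "tn__" + lowercase hex of its bytes.
bool IsAsciiIdentifier(const std::string& s) {
    if (s.empty()) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
        if (!letter && !(i > 0 && c >= '0' && c <= '9')) return false;
    }
    return true;
}

std::string ToyEncode(const std::string& s) {
    if (IsAsciiIdentifier(s)) return s;
    static const char kHex[] = "0123456789abcdef";
    std::string out = "tn__";
    for (unsigned char c : s) {
        out += kHex[c >> 4];
        out += kHex[c & 0xF];
    }
    return out;
}

// Returns nullopt unless `s` looks like a toy-encoded identifier.
std::optional<std::string> ToyDecode(const std::string& s) {
    const std::string prefix = "tn__";
    if (s.compare(0, prefix.size(), prefix) != 0 || (s.size() - prefix.size()) % 2 != 0)
        return std::nullopt;
    auto nibble = [](char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = prefix.size(); i < s.size(); i += 2) {
        const int hi = nibble(s[i]), lo = nibble(s[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out += static_cast<char>(hi * 16 + lo);
    }
    return out;
}

// Same composition as SdfBootstringReencodeIdentifier, over the toy codec.
std::string ToyReencode(const std::string& identifier) {
    const std::string original = ToyDecode(identifier).value_or(identifier);
    return ToyEncode(original);
}
```

Re-encoding an already encoded identifier returns it unchanged, and a plain valid identifier also passes through. This idempotence is what lets the helper double as the validator described earlier in the thread.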

miguelh-nvidia requested review from asluk and nvmkuruc March 4, 2024 12:25

nvmkuruc reviewed Mar 4, 2024

proposals/transcoding_invalid_identifiers/README.md (resolved)

proposals/transcoding_invalid_identifiers/README.md (outdated, resolved)

nvmkuruc reviewed Mar 5, 2024

proposals/transcoding_invalid_identifiers/README.md (outdated, resolved)

nvmkuruc reviewed Mar 5, 2024

proposals/transcoding_invalid_identifiers/README.md (outdated, resolved)

nvmkuruc reviewed Mar 5, 2024

proposals/transcoding_invalid_identifiers/README.md (outdated, resolved)

nvmkuruc reviewed Mar 6, 2024

miguelh-nvidia added 5 commits April 26, 2024 19:35

- f04497c Bi-Directional Transcoding of Invalid Identifiers
- bbd7826 Add reviewer comments
- f337909 Wording
- 57c7e94 Add invalid examples, and other case of disadvantage
- 376b8e8 Add optional changes and break the invalid examples a bit further

miguelh-nvidia force-pushed the bidirectional_transcoding_identifiers branch from 4b2a201 to 376b8e8 April 26, 2024 17:36

miguelh-nvidia commented Mar 4, 2024

nvmkuruc left a comment

miguelh-nvidia commented Mar 5, 2024

nvmkuruc Mar 6, 2024

nvmkuruc Mar 6, 2024

miguelh-nvidia Mar 7, 2024 (edited)
