-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] [NFC] Fix as_bytes()
documentation
#3864
Conversation
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
return Span[Byte, StaticConstantOrigin]( | ||
ptr=self.unsafe_ptr(), length=self.byte_length() | ||
) | ||
|
||
@always_inline | ||
fn as_bytes(ref self) -> Span[Byte, __origin_of(self)]: | ||
"""Returns a contiguous slice of the bytes owned by this string. | ||
"""Returns a contiguous slice of bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see how this is better. The bytes are owned by the literal.
Returns a contiguous slice of bytes.
doesn't really describe what the bytes contain at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's to keep the documentation consistent across owning and non-owning types.
doesn't really describe what the bytes contain at all.
"utf-8 bytes" sounds less confusing 4u?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that "owned by this string" was descriptive about ownership and this PR would make it more ambiguous.
|
||
Returns: | ||
A contiguous slice pointing to the bytes owned by this string. | ||
A contiguous slice pointing to bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems grammatically/contextually confusing. If bytes
was a variable it would make sense, but in this case it doesn't read quite right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes is the plural of a byte (?), and a contiguous slice of bytes is [byte, byte, ...]
(?). Not sure how that's confusing. I'll change it to "utf-8 bytes" if that solves it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, I think it will be more restraining on the API (AsBytes trait) to say explicitly UTF-8, these bytes could encode anything. Bytes is a type in Python, so this makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could change just the String
, StringLiteral
, and StringSlice
docs to say UTF-8 bytes though WDYT?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was more:
A contiguous slice pointing to bytes.
What bytes? Who owns them? What do they contain? While this may seem obvious, your change removes that context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What bytes? [...] What do they contain?
UTF-8 bytes
Who owns them?
The origin that the Span carries
My main goal is to eventually abstract this API over all Stringlike types, owning and non-owning, once the compiler Origin issues are resolved (see PR #3677 files, it won't necessarily look like that but along those lines to be able to receive a T: Stringlike
and use .as_btyes()
instead of .unsafe_ptr()
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not seeing how this is more descriptive as a documentation line. It seems less descriptive to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I'd hold off a little on string unification work until the team has time to discuss your proposal. We should definitely improve the general API surface, but we need to think as a team how we want to handle it. For example, implicit conversion into StringSlice
will handle a lot of the general string cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I'm trying to make the documentation consistent across all Stringlike
types, not just owning ones. And that means making it more generic. I can add the encoding (UTF-8
), but even that information might not hold in the future if we implement UTF-16
& UTF-32
as a parametrized String
, and since Python uses encode(string, encoding) -> bytes
represents all possible encodings, that complicates things even further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @lsh that removing "the bytes owned by this string" isn't an improvement. You haven't made a compelling case for why having less information here helps people using the type.
There's no requirement that different types implementing the same trait have identical API doc for its methods. And where those methods have differences—like how the data is owned—the API doc should note it.
I'm going to close this out for now since I don't think this is better than what we have already. I also am opposed to removing the note about the null terminator, which is important to know. |
I never removed anything about null termination except a redundant code comment (because the docstrings have that info), besides Python doesn't enforce null terminated bytes. |
|
I'll talk more with the team about whether more generic and less descriptive docs are desirable for specific types. In general, I think a trait can have a more general description, but implementors should add specific information that may be relevant to the concrete usage |
I know that, but it also has the misleading docs: struct StringLiteral(...):
"""This type represents a string literal.
String literals are all null-terminated for compatibility with C APIs, but
this is subject to change. String literals store their length as an integer,
and this does not include the null terminator.
"""
... so whenever I'm doing things with all Stringlike types I just build a span with an unsafe pointer and their |
Yeah I also think we could be more specific and use "UTF-8 bytes that make up this string" or something like that, I'm mostly just standing by not calling those bytes owned |
Fix
as_bytes()
documentation