Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] [NFC] Fix as_bytes() documentation #3864

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions stdlib/src/builtin/string_literal.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -530,28 +530,28 @@ struct StringLiteral(

@always_inline
fn as_bytes(self) -> Span[Byte, StaticConstantOrigin]:
"""
Returns a contiguous Span of the bytes owned by this string.
"""Returns a contiguous slice of bytes.

Returns:
A contiguous slice pointing to the bytes owned by this string.
"""
A contiguous slice pointing to bytes.

Notes:
This does not include the trailing null terminator.
"""
return Span[Byte, StaticConstantOrigin](
ptr=self.unsafe_ptr(), length=self.byte_length()
)

@always_inline
fn as_bytes(ref self) -> Span[Byte, __origin_of(self)]:
"""Returns a contiguous slice of the bytes owned by this string.
"""Returns a contiguous slice of bytes.
Copy link
Contributor

@lsh lsh Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how this is better. The bytes are owned by the literal.

Returns a contiguous slice of bytes.

doesn't really describe what the bytes contain at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to keep the documentation consistent across owning and non-owning types.

doesn't really describe what the bytes contain at all.

"utf-8 bytes" sounds less confusing 4u?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that "owned by this string" was descriptive about ownership and this PR would make it more ambiguous.


Returns:
A contiguous slice pointing to the bytes owned by this string.
A contiguous slice pointing to bytes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems grammatically/contextually confusing. If bytes was a variable it would make sense, but in this case it doesn't read quite right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes is the plural of a byte (?), and a contiguous slice of bytes is [byte, byte, ...] (?). Not sure how that's confusing. I'll change it to "utf-8 bytes" if that solves it

Copy link
Contributor Author

@martinvuyk martinvuyk Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm, I think it will be more restraining on the API (AsBytes trait) to say explicitly UTF-8, these bytes could encode anything. Bytes is a type in Python, so this makes sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change just the String, StringLiteral, and StringSlice docs to say UTF-8 bytes though WDYT?.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was more:

A contiguous slice pointing to bytes.

What bytes? Who owns them? What do they contain? While this may seem obvious, your change removes that context.

Copy link
Contributor Author

@martinvuyk martinvuyk Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bytes? [...] What do they contain?

UTF-8 bytes

Who owns them?

The origin that the Span carries


My main goal is to eventually abstract this API over all Stringlike types, owning and non-owning, once the compiler Origin issues are resolved (see PR #3677 files, it won't necessarily look like that but along those lines to be able to receive a T: Stringlike and use .as_btyes() instead of .unsafe_ptr())

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not seeing how this is more descriptive as a documentation line. It seems less descriptive to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'd hold off a little on string unification work until the team has time to discuss your proposal. We should definitely improve the general API surface, but we need to think as a team how we want to handle it. For example, implicit conversion into StringSlice will handle a lot of the general string cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I'm trying to make the documentation consistent across all Stringlike types, not just owning ones. And that means making it more generic. I can add the encoding (UTF-8), but even that information might not hold in the future if we implement UTF-16 & UTF-32 as a parametrized String, and since Python uses encode(string, encoding) -> bytes represents all possible encodings, that complicates things even further.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @lsh that removing "the bytes owned by this string" isn't an improvement. You haven't made a compelling case for why having less information here helps people using the type.

There's no requirement that different types implementing the same trait have identical API doc for its methods. And where those methods have differences—like how the data is owned—the API doc should note it.


Notes:
This does not include the trailing null terminator.
"""
# Does NOT include the NUL terminator.
return Span[Byte, __origin_of(self)](
ptr=self.unsafe_ptr(), length=self.byte_length()
)
Expand Down
8 changes: 3 additions & 5 deletions stdlib/src/collections/string.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -1620,18 +1620,16 @@ struct String(

@always_inline
fn as_bytes(ref self) -> Span[Byte, __origin_of(self)]:
"""Returns a contiguous slice of the bytes owned by this string.
"""Returns a contiguous slice of bytes.

Returns:
A contiguous slice pointing to the bytes owned by this string.
A contiguous slice pointing to bytes.

Notes:
This does not include the trailing null terminator.
"""

# Does NOT include the NUL terminator.
return Span[Byte, __origin_of(self)](
ptr=self._buffer.unsafe_ptr(), length=self.byte_length()
ptr=self.unsafe_ptr(), length=self.byte_length()
)

@always_inline
Expand Down
8 changes: 3 additions & 5 deletions stdlib/src/memory/span.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -26,16 +26,14 @@ from memory import Pointer, UnsafePointer


trait AsBytes:
"""
The `AsBytes` trait denotes a type that can be returned as a immutable byte
span.
"""The `AsBytes` trait denotes a type that can be returned as a byte span.
"""

fn as_bytes(ref self) -> Span[Byte, __origin_of(self)]:
"""Returns a contiguous slice of the bytes owned by this string.
"""Returns a contiguous slice of bytes.

Returns:
A contiguous slice pointing to the bytes owned by this string.
A contiguous slice pointing to bytes.

Notes:
This does not include the trailing null terminator.
Expand Down
7 changes: 5 additions & 2 deletions stdlib/src/utils/string_slice.mojo
Original file line number Diff line number Diff line change
Expand Up @@ -755,10 +755,13 @@ struct StringSlice[is_mutable: Bool, //, origin: Origin[is_mutable]](

@always_inline
fn as_bytes(self) -> Span[Byte, origin]:
"""Get the sequence of encoded bytes of the underlying string.
"""Returns a contiguous slice of bytes.

Returns:
A slice containing the underlying sequence of encoded bytes.
A contiguous slice pointing to bytes.

Notes:
This does not include the trailing null terminator.
"""
return self._slice

Expand Down
Loading