[stdlib] [NFC] Fix `as_bytes()` documentation #3864

martinvuyk · 2024-12-11T17:05:42Z

Fix as_bytes() documentation

Signed-off-by: martinvuyk <[email protected]>

…mentation

Signed-off-by: martinvuyk <[email protected]>

lsh · 2024-12-12T23:30:03Z

stdlib/src/builtin/string_literal.mojo

        return Span[Byte, StaticConstantOrigin](
            ptr=self.unsafe_ptr(), length=self.byte_length()
        )

    @always_inline
    fn as_bytes(ref self) -> Span[Byte, __origin_of(self)]:
-        """Returns a contiguous slice of the bytes owned by this string.
+        """Returns a contiguous slice of bytes.


I'm not sure I see how this is better. The bytes are owned by the literal.

Returns a contiguous slice of bytes.

doesn't really describe what the bytes contain at all.

It's to keep the documentation consistent across owning and non-owning types.

doesn't really describe what the bytes contain at all.

"utf-8 bytes" sounds less confusing 4u?

My point was that "owned by this string" was descriptive about ownership and this PR would make it more ambiguous.

lsh · 2024-12-12T23:31:08Z

stdlib/src/builtin/string_literal.mojo


        Returns:
-            A contiguous slice pointing to the bytes owned by this string.
+            A contiguous slice pointing to bytes.


This seems grammatically/contextually confusing. If bytes was a variable it would make sense, but in this case it doesn't read quite right.

bytes is the plural of a byte (?), and a contiguous slice of bytes is [byte, byte, ...] (?). Not sure how that's confusing. I'll change it to "utf-8 bytes" if that solves it

Nvm, I think it will be more restraining on the API (AsBytes trait) to say explicitly UTF-8, these bytes could encode anything. Bytes is a type in Python, so this makes sense to me.

I could change just the String, StringLiteral, and StringSlice docs to say UTF-8 bytes though WDYT?.

My point was more:

A contiguous slice pointing to bytes.

What bytes? Who owns them? What do they contain? While this may seem obvious, your change removes that context.

What bytes? [...] What do they contain?

UTF-8 bytes

Who owns them?

The origin that the Span carries

My main goal is to eventually abstract this API over all Stringlike types, owning and non-owning, once the compiler Origin issues are resolved (see PR #3677 files, it won't necessarily look like that but along those lines to be able to receive a T: Stringlike and use .as_btyes() instead of .unsafe_ptr())

I'm still not seeing how this is more descriptive as a documentation line. It seems less descriptive to me.

Also I'd hold off a little on string unification work until the team has time to discuss your proposal. We should definitely improve the general API surface, but we need to think as a team how we want to handle it. For example, implicit conversion into StringSlice will handle a lot of the general string cases.

Because I'm trying to make the documentation consistent across all Stringlike types, not just owning ones. And that means making it more generic. I can add the encoding (UTF-8), but even that information might not hold in the future if we implement UTF-16 & UTF-32 as a parametrized String, and since Python uses encode(string, encoding) -> bytes represents all possible encodings, that complicates things even further.

I agree with @lsh that removing "the bytes owned by this string" isn't an improvement. You haven't made a compelling case for why having less information here helps people using the type.

There's no requirement that different types implementing the same trait have identical API doc for its methods. And where those methods have differences—like how the data is owned—the API doc should note it.

lsh · 2024-12-12T23:59:36Z

I'm going to close this out for now since I don't think this is better than what we have already. I also am opposed to removing the note about the null terminator, which is important to know.

martinvuyk · 2024-12-13T00:01:26Z

I never removed anything about null termination except a redundant code comment (because the docstrings have that info), besides Python doesn't enforce null terminated bytes.

lsh · 2024-12-13T00:02:59Z

StringLiteral does enforce null terminated bytes, even if Python does not. This makes StringLiteral directly compatible with FFI without reallocation.

lsh · 2024-12-13T00:05:57Z

I'll talk more with the team about whether more generic and less descriptive docs are desirable for specific types. In general, I think a trait can have a more general description, but implementors should add specific information that may be relevant to the concrete usage

martinvuyk · 2024-12-13T00:10:16Z

StringLiteral does enforce null terminated bytes, even if Python does not. This makes StringLiteral directly compatible with FFI without reallocation.

I know that, but it also has the misleading docs:

struct StringLiteral(...):
    """This type represents a string literal.

    String literals are all null-terminated for compatibility with C APIs, but
    this is subject to change. String literals store their length as an integer,
    and this does not include the null terminator.
    """
    ...

so whenever I'm doing things with all Stringlike types I just build a span with an unsafe pointer and their .byte_length(). I just want to make those operations more generic in functions. In PR #3861 you can see another attempt at going generic over owning or non-owning stringlike types, that is now blocked by UnsafePointer carrying an origin. There are many APIs which would benefit from being generic over T: Stringlike and avoid allocations or implicit casting

martinvuyk · 2024-12-13T00:12:24Z

I'll talk more with the team about whether more generic and less descriptive docs are desirable for specific types. In general, I think a trait can have a more general description, but implementors should add specific information that may be relevant to the concrete usage

Yeah I also think we could be more specific and use "UTF-8 bytes that make up this string" or something like that, I'm mostly just standing by not calling those bytes owned

Fix as_bytes() documentation

ec08bae

Signed-off-by: martinvuyk <[email protected]>

martinvuyk requested a review from a team as a code owner December 11, 2024 17:05

martinvuyk added 3 commits December 11, 2024 14:08

mojo format

f5b9cc3

Signed-off-by: martinvuyk <[email protected]>

Merge remote-tracking branch 'upstream/nightly' into fix-asbytes-docu…

feba72d

…mentation

unformat my_module.py

989e63a

Signed-off-by: martinvuyk <[email protected]>

lsh reviewed Dec 12, 2024

View reviewed changes

lsh closed this Dec 12, 2024

lsh reopened this Dec 13, 2024

JoeLoser assigned lsh Dec 17, 2024

martinvuyk closed this Dec 19, 2024

martinvuyk deleted the fix-asbytes-documentation branch December 19, 2024 13:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[stdlib] [NFC] Fix `as_bytes()` documentation #3864

[stdlib] [NFC] Fix `as_bytes()` documentation #3864

martinvuyk commented Dec 11, 2024

lsh Dec 12, 2024 •

edited

Loading

martinvuyk Dec 12, 2024

lsh Dec 12, 2024

lsh Dec 12, 2024

martinvuyk Dec 12, 2024

martinvuyk Dec 12, 2024 •

edited

Loading

martinvuyk Dec 12, 2024

lsh Dec 12, 2024

martinvuyk Dec 12, 2024 •

edited

Loading

lsh Dec 12, 2024

lsh Dec 12, 2024

martinvuyk Dec 12, 2024

arthurevans Dec 19, 2024

lsh commented Dec 12, 2024

martinvuyk commented Dec 13, 2024

lsh commented Dec 13, 2024

lsh commented Dec 13, 2024

martinvuyk commented Dec 13, 2024

martinvuyk commented Dec 13, 2024

[stdlib] [NFC] Fix as_bytes() documentation #3864

[stdlib] [NFC] Fix as_bytes() documentation #3864

Conversation

martinvuyk commented Dec 11, 2024

lsh Dec 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martinvuyk Dec 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martinvuyk Dec 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lsh commented Dec 12, 2024

martinvuyk commented Dec 13, 2024

lsh commented Dec 13, 2024

lsh commented Dec 13, 2024

martinvuyk commented Dec 13, 2024

martinvuyk commented Dec 13, 2024

[stdlib] [NFC] Fix `as_bytes()` documentation #3864

[stdlib] [NFC] Fix `as_bytes()` documentation #3864

lsh Dec 12, 2024 •

edited

Loading

martinvuyk Dec 12, 2024 •

edited

Loading

martinvuyk Dec 12, 2024 •

edited

Loading