[stdlib] Implement `Int.from_bytes()` and `Int.as_bytes()` #3795

msaelices · 2024-11-22T09:59:14Z

Similar to Python's int.from_bytes() and int.to_bytes().
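For reference, this is the Python behavior being mirrored. Python's `int.to_bytes()` takes a length and a byte order (plus an optional `signed` flag), and `int.from_bytes()` takes a byte order and `signed`. The short illustration below is plain Python, not the proposed Mojo API:

```python
# Round trip through Python's int.to_bytes / int.from_bytes with an
# explicit width (2 bytes) and byte order.
value = 1000  # 0x03E8

big = value.to_bytes(2, byteorder="big")        # most significant byte first
little = value.to_bytes(2, byteorder="little")  # least significant byte first
print(big.hex(), little.hex())  # 03e8 e803

# Decoding must use the same byte order that was used for encoding.
assert int.from_bytes(big, byteorder="big") == value
assert int.from_bytes(little, byteorder="little") == value

# Negative values need signed=True on both sides (two's complement).
neg = (-2).to_bytes(2, byteorder="little", signed=True)
print(neg.hex())  # feff
assert int.from_bytes(neg, byteorder="little", signed=True) == -2
```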

lsh · 2024-11-22T21:43:43Z

stdlib/src/builtin/int.mojo

@@ -1194,6 +1195,64 @@ struct Int(

        writer.write(self)

+    @staticmethod
+    fn from_bytes[
+        type: DType, big_endian: Bool = False


I'm in favor of adding a from_bytes and to_bytes method, but I'm not a fan of using a boolean to represent the endianness here. If we wanted to be Pythonic, we could use StringLiteral in the parameter and constrain it to be "little" or "big", or we could make an Endian wrapper struct which we can parameterize other methods on in the future. WDYT @JoeLoser?

I also don't think we want to add a DType parameter here. We can just get the sizeof[Int](), or perhaps add a helper to get the DType equivalent to Int.

I added these methods because they are very convenient for I/O and networking protocols. For example, I need them for a pure Mojo implementation of WebSockets.

If we don't have the DType parameter and we get an array of 60 bytes from some stream, how can we handle the following cases:

I want to get 15 UInt32 integers (15x4=60)

I want to get 15 Int32 integers

I want to get 30 UInt16 integers (30x2=60)

I want to get 30 Int16 integers
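To make these cases concrete, here is a Python sketch using the standard `struct` module, where the element type (width and signedness) is chosen explicitly by a format code rather than inferred from the data. The 60-byte payload is made up for illustration:

```python
import struct

# A made-up 60-byte payload, standing in for bytes read from a stream.
payload = bytes(range(196, 256))
assert len(payload) == 60

# The same bytes decoded with different element types (big-endian here).
u16 = struct.unpack(">30H", payload)  # 30 x UInt16 (30 x 2 = 60 bytes)
i16 = struct.unpack(">30h", payload)  # 30 x Int16: same bits, signed
u32 = struct.unpack(">15I", payload)  # 15 x UInt32 (15 x 4 = 60 bytes)
i32 = struct.unpack(">15i", payload)  # 15 x Int32

# The first pair of bytes is 0xC4 0xC5.
print(u16[0], i16[0])  # 50373 -15163

# 60 is not a multiple of 8, so the buffer cannot be split evenly into UInt64s.
print(len(payload) % struct.calcsize(">Q"))  # 4
```

The point the list above makes is visible here: the bytes alone do not say which of these decodings is intended, so the caller has to supply the element type.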

Python's int.to_bytes() takes length and byteorder arguments, and int.from_bytes() takes a byteorder argument (plus signed), but these are run-time arguments, which is slower; in Mojo we can resolve them at compile time.

If we want to implement something IO-related, like WebSockets, we already know the endianness of the target we want to read from or write to, so it's faster if we use parameters.
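Python has no compile-time parameters, but a loose analogy to fixing the element width and byte order up front is building a specialized `struct.Struct` once and reusing it for every read. This is a minimal sketch; `make_reader` is a hypothetical helper, and the analogy says nothing about Mojo's actual performance:

```python
import struct


def make_reader(fmt_char: str, big_endian: bool) -> struct.Struct:
    """Build a reader specialized for one element type and byte order.

    Hypothetical helper: the format is parsed once here instead of on
    every call, loosely mirroring a compile-time parameter.
    """
    prefix = ">" if big_endian else "<"
    return struct.Struct(prefix + fmt_char)


# Specialize once, e.g. for big-endian UInt16 fields in a network frame.
read_u16_be = make_reader("H", big_endian=True)

frame = b"\x00\x10\x01\x00"
values = [
    read_u16_be.unpack_from(frame, offset)[0]
    for offset in range(0, len(frame), read_u16_be.size)
]
print(values)  # [16, 256]
```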

If anything, these comments make me think that from_bytes and to_bytes belongs on SIMD, not Int.

I agree :) that's why I still said it should be a parameter, but my comments were questioning what the parameter should look like, because I'm not sold on Bool.

Yeah, if we follow Python's lead, bytes is binary Byte data that can be reinterpreted as UInt16 etc. We could also hold off on doing it like Python: let the DType be inferred and take bytes: Span[Scalar[D]].

I don't think type should be used as a parameter name as prolifically as it is, since it's the name of Python's built-in type function, which I think we'll eventually have in Mojo as well.

Suggested change

-type: DType, big_endian: Bool = False
+D: DType, big_endian: Bool = False

questioning what the parameter should look like, because I'm not sold on Bool.

As for this, I personally never liked sys.byteorder returning "big" or "little" because there is no third option, so a boolean makes sense to me 🤷‍♂️. But compatibility may be a solid argument in favor of it.

Completely agree with @martinvuyk. We are already diverging from Python's int.from_bytes() and int.to_bytes(). As a Python developer, I personally don't mind a slight adaptation (e.g., "big" -> True and "little" -> False). The real problem would be having no equivalent in Mojo's stdlib at all, forcing users to implement the logic themselves.
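Because there are only two byte orders, a Bool carries the same information as Python's `"big"`/`"little"` strings. A small Python sketch of that mapping; `int_from_bytes` is a hypothetical wrapper, not part of any existing API:

```python
import sys

# sys.byteorder can only ever be one of these two values.
assert sys.byteorder in ("big", "little")


def byteorder_str(big_endian: bool) -> str:
    # True -> "big", False -> "little"
    return "big" if big_endian else "little"


def int_from_bytes(data: bytes, big_endian: bool = False) -> int:
    # Hypothetical Bool-flavored wrapper over Python's int.from_bytes.
    return int.from_bytes(data, byteorder=byteorder_str(big_endian))


print(int_from_bytes(b"\x00\x10", big_endian=True))  # 16
print(int_from_bytes(b"\x00\x10"))                   # 4096
```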

martinvuyk

Hi @msaelices, cool to know someone else is also pushing things forward for binary serialization.

I have two main opinions:

I think we should let the DType be inferred and have the Span be of that DType. I know Python treats everything like bytes but I think it's something that will bite us in the long term. We have a statically typed language and we can make use of that. So IMO we should hold off on making everything revolve around Byte and having to if-else on every binary interface until we are absolutely sure we want to go that way.
I think we can hold off on controlling endianness in the parameter until users need it.

martinvuyk · 2024-11-25T19:16:52Z

martinvuyk · 2024-11-25T19:20:00Z

stdlib/src/builtin/int.mojo

+        var ptr: UnsafePointer[Byte] = UnsafePointer.address_of(bytes[0])
+        var type_ptr: UnsafePointer[Scalar[type]] = ptr.bitcast[Scalar[type]]()
+        var value = type_ptr[]
+
+        @parameter
+        if is_big_endian() and not big_endian:
+            value = byte_swap(value)
+        elif not is_big_endian() and big_endian:
+            value = byte_swap(value)
+        return int(value)


Suggested change

-var ptr: UnsafePointer[Byte] = UnsafePointer.address_of(bytes[0])
-var type_ptr: UnsafePointer[Scalar[type]] = ptr.bitcast[Scalar[type]]()
-var value = type_ptr[]
-
-@parameter
-if is_big_endian() and not big_endian:
-    value = byte_swap(value)
-elif not is_big_endian() and big_endian:
-    value = byte_swap(value)
-return int(value)
+var ptr = bytes.unsafe_ptr().bitcast[Scalar[type]]()
+
+@parameter
+if is_big_endian() and not big_endian:
+    return int(byte_swap(ptr[]))
+elif not is_big_endian() and big_endian:
+    return int(byte_swap(ptr[]))
+else:
+    return int(ptr[])

if is_big_endian() == big_endian: ... else: ...
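The check above boils down to: reinterpret the bytes in the host's native order, then byte-swap only if the host order differs from the requested one. A Python sketch of that logic for a 16-bit value (an illustration of the idea, not the Mojo code; `u16_from_bytes` and `byte_swap16` are hypothetical helpers):

```python
import struct
import sys


def byte_swap16(x: int) -> int:
    # Reverse the two bytes of a 16-bit value.
    return ((x & 0xFF) << 8) | (x >> 8)


def u16_from_bytes(data: bytes, big_endian: bool = False) -> int:
    # Reinterpret the first two bytes in native host order, like the
    # pointer bitcast in the Mojo code ("=" means native order, standard size).
    (native,) = struct.unpack("=H", data[:2])
    host_is_big_endian = sys.byteorder == "big"
    # One comparison covers both mismatch branches.
    if host_is_big_endian == big_endian:
        return native
    return byte_swap16(native)


print(u16_from_bytes(b"\x00\x10", big_endian=True))  # 16
```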

martinvuyk · 2024-11-25T19:25:35Z

stdlib/src/builtin/int.mojo

+        var ptr: UnsafePointer[Scalar[type]] = UnsafePointer.address_of(value)
+        var byte_ptr: UnsafePointer[Byte] = ptr.bitcast[Byte]()
+        var list = List[Byte](capacity=type_len)
+
+        # TODO: Maybe this can be a List.extend(ptr, count) method
+        memcpy(list.unsafe_ptr(), byte_ptr, type_len)
+        list.size = type_len
+


I don't think we should use more UnsafePointer APIs than absolutely necessary.

Suggested change

-var ptr: UnsafePointer[Scalar[type]] = UnsafePointer.address_of(value)
-var byte_ptr: UnsafePointer[Byte] = ptr.bitcast[Byte]()
-var list = List[Byte](capacity=type_len)
-
-# TODO: Maybe this can be a List.extend(ptr, count) method
-memcpy(list.unsafe_ptr(), byte_ptr, type_len)
-list.size = type_len
+var ptr = UnsafePointer.address_of(value)
+var list = List[Byte](capacity=type_len)
+memcpy(list.unsafe_ptr(), ptr.bitcast[Byte](), type_len)
+list.size = type_len

Agreed. Changed here: msaelices@30027df
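For the as_bytes direction, the suggested code copies the value's raw bytes into a buffer. A Python analogue using `ctypes.memmove` in place of `memcpy`, followed by a swap when the requested order differs from the host's; `u16_as_bytes` is a hypothetical helper for illustration:

```python
import ctypes
import sys


def u16_as_bytes(value: int, big_endian: bool = False) -> bytes:
    # Hold the value in a fixed-width 16-bit slot (out-of-range values wrap).
    raw = ctypes.c_uint16(value)
    size = ctypes.sizeof(raw)
    # Copy its in-memory bytes into a byte buffer, like memcpy in the Mojo code.
    buf = (ctypes.c_ubyte * size)()
    ctypes.memmove(ctypes.addressof(buf), ctypes.addressof(raw), size)
    native = bytes(buf)  # bytes in host order
    host_is_big_endian = sys.byteorder == "big"
    # Reverse only when the requested order differs from the host's.
    return native if host_is_big_endian == big_endian else native[::-1]


print(u16_as_bytes(258, big_endian=True).hex())  # 0102
```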

martinvuyk · 2024-11-25T19:26:34Z

stdlib/test/builtin/test_int.mojo

@@ -19,6 +19,8 @@ from testing import assert_equal, assert_true, assert_false, assert_raises
 from python import PythonObject
 from memory import UnsafePointer

+alias Bytes = List[Byte]


Suggestion: this definition has not been decided on yet. Could you move it into the scope of your test?

Sure! Done: msaelices@7766ea3

msaelices · 2024-11-27T00:12:14Z

Hi @msaelices, cool to know someone else is also pushing things forward for binary serialization.

I have two main opinions:

I think we should let the DType be inferred and have the Span be of that DType. I know Python treats everything like bytes but I think it's something that will bite us in the long term. We have a statically typed language and we can make use of that. So IMO we should hold off on making everything revolve around Byte and having to if-else on every binary interface until we are absolutely sure we want to go that way.

Great suggestion. I did it here: msaelices@63fa5b4

Unfortunately, the tests fail with errors like this one:

/home/msaelices/src/mojo/stdlib/test/builtin/test_int.mojo:252:62: error: invalid call to 'from_bytes': failed to infer implicit parameter 'is_mutable' of argument 'bytes' type 'Span'            
    assert_equal(Int.from_bytes[DType.int16, big_endian=True](Bytes(0, 16)), 16)

My concern is that we would have to complicate an already complicated signature by explicitly passing the is_mutable or origin parameter. Do you have any ideas?

I think we can hold off on controlling endianness in the parameter until users need it.

martinvuyk · 2024-11-27T00:47:27Z

Hmm, yet another place where origins get annoying 🙄. This looks like a bug; it should work, given:

struct Span:
    ...
    fn __init__(out self, ref [origin]list: List[T, *_]): ...

You could try forcing a cast of the origin, something like:

assert_equal(Int.from_bytes[DType.int16, big_endian=True](Span(Bytes(0, 16)).get_immutable()), 16)

but this should definitely not be necessary.

msaelices · 2024-11-28T09:12:59Z

Maybe this change in the compiler will help here: 3c4f57c

I'll probably check it today.

msaelices · 2024-11-28T12:43:39Z

I could not get this working even after the compiler added support for setting the implicit is_mutable parameter :(

I'm afraid we need to revert some of the changes from [6d395a1c691000ac06fa1bcbd96305d6359d1784] to make it work. Done here: msaelices@b5e09c5

@martinvuyk we can always revisit this in the future and make it more general.

lsh · 2024-12-04T19:44:29Z

stdlib/src/builtin/int.mojo

@@ -1212,6 +1217,65 @@ struct Int(

        writer.write(self)

+    @staticmethod
+    fn from_bytes[


I don't think my comment before was ever properly addressed. I think this method should be on SIMD, not Int. That way the DType makes more sense. We can think more about what it would look like on Int later, and doing int(UInt64.from_bytes(span)) doesn't seem like too much of a hit to me.

Sorry, I missed it. Done:
msaelices@776b4df
msaelices@265b648

Sorry for all the change requests, but to be clear I am comfortable merging these changes on SIMD, but not on Int. I think the SIMD API makes a ton of sense, but the int API would need some more thought since taking a DType feels wrong.

Agreed, I think we should/could land the uncontroversial part (from/to_byte on SIMD) first and iterate. WDYT @msaelices?

Yes, actually the whole point of this PR was to help Python developers who need to migrate Python code using int.from_bytes() and int.to_bytes(). So, to me, it's easier if we have it on the Int struct too, and iterate it.

I disagree with having it on the Int struct too because we would either:

Break code users relied on

Never get around to revising it and have a substandard API

I'm also not persuaded by the idea that the goal is to mirror a Python API, considering your earlier argument in favor of breaking away from the Python API by using a parameter. As I mentioned before, I think int(UInt64.from_bytes(my_span)) is fine for now.
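The shape being proposed (choose a fixed-width scalar type first, then convert to a plain integer) can be sketched in Python with a hypothetical table mapping type names to `struct` format codes. `_FORMATS` and `scalar_from_bytes` are illustrative names only, not the SIMD API that was merged:

```python
import struct

# Hypothetical table standing in for fixed-width scalar types: width and
# signedness come from the type, not from a separate length argument.
_FORMATS = {
    "uint16": "H", "int16": "h",
    "uint32": "I", "int32": "i",
    "uint64": "Q", "int64": "q",
}


def scalar_from_bytes(dtype: str, data: bytes, big_endian: bool = False) -> int:
    fmt = (">" if big_endian else "<") + _FORMATS[dtype]
    size = struct.calcsize(fmt)
    if len(data) < size:
        raise ValueError(f"{dtype} needs {size} bytes, got {len(data)}")
    (value,) = struct.unpack_from(fmt, data)
    return value


# Roughly the shape of int(UInt64.from_bytes(span)): pick the scalar type
# first, then convert the result to a plain integer.
n = int(scalar_from_bytes("uint64", bytes([0, 0, 0, 0, 0, 0, 1, 0]), big_endian=True))
print(n)  # 256
```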

lsh · 2024-12-22T02:51:00Z

!sync

lsh · 2025-01-07T02:11:46Z

!sync

modularbot · 2025-01-07T02:56:00Z

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions.

modularbot · 2025-01-07T17:43:33Z

Landed in ab7bfa5! Thank you for your contribution 🎉

… (#53303) [External] [stdlib] Implement `Int.from_bytes()` and `Int.as_bytes()` Similar to the Python's [int.from_bytes()](https://docs.python.org/3/library/stdtypes.html#int.from_bytes) and [int.to_bytes()](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) one. Co-authored-by: Manuel Saelices <[email protected]> Co-authored-by: Lukas Hermann <[email protected]> Closes #3795 MODULAR_ORIG_COMMIT_REV_ID: 1e78cc9dd643009a98c7f3de77c4c250e7c5ea3f

msaelices added 6 commits November 22, 2024 10:53

Implement Int.from_bytes() function

fc608a7

Similar to the Python int.from_bytes() one Signed-off-by: Manuel Saelices <[email protected]>

Add entry to the changelog about Int.from_bytes()

cd91096

Signed-off-by: Manuel Saelices <[email protected]>

Make the Int.from_bytes() accept an span of bytes instead of a list

3eca741

Signed-off-by: Manuel Saelices <[email protected]>

[stdlib] Move the Int.from_bytes() big_endian arg to a comptime param

b9f189d

Signed-off-by: Manuel Saelices <[email protected]>

[stdlib] Implement Int.as_bytes()

cc97ffe

Signed-off-by: Manuel Saelices <[email protected]>

[stdlib] Update changelog to include Int.as_bytes()

b580bc9

Signed-off-by: Manuel Saelices <[email protected]>

msaelices requested a review from a team as a code owner November 22, 2024 09:59

msaelices mentioned this pull request Nov 22, 2024

[stdlib] Implement Int.from_bytes() and Int.as_bytes() #3768

Closed

msaelices added 2 commits November 22, 2024 11:06

[stdlib] Simplify is_big_endian import

569e6ff

Signed-off-by: Manuel Saelices <[email protected]>

Optimize copying bytes into the array as we know it's a trivial type

239bba6

Signed-off-by: Manuel Saelices <[email protected]>

lsh requested changes Nov 22, 2024

Merge branch 'nightly' into int-from-and-to-bytes

21384ec

Signed-off-by: Manuel Saelices <[email protected]>

msaelices requested a review from lsh November 24, 2024 18:27

lsh self-assigned this Nov 25, 2024

martinvuyk reviewed Nov 25, 2024

msaelices added 2 commits November 26, 2024 16:48

Merge branch 'nightly' into int-from-and-to-bytes

bc22870

Infer the Span type based on DType

63fa5b4

Signed-off-by: Manuel Saelices <[email protected]>

msaelices added 2 commits November 28, 2024 10:13

Merge branch 'nightly' into int-from-and-to-bytes

5b41e1f

[stdlib] Revert part of the changes here as compiler does not support…

b5e09c5

… it [6d395a1c691000ac06fa1bcbd96305d6359d1784] Signed-off-by: Manuel Saelices <[email protected]>

msaelices added 2 commits November 28, 2024 13:45

[stdlib] Move the Bytes alias in the test closer to the scope we use it

7766ea3

Signed-off-by: Manuel Saelices <[email protected]>

[stdlib] Less usage of UnsafePointer in the Int.from_bytes|as_bytes

30027df

logic Signed-off-by: Manuel Saelices <[email protected]>

msaelices requested a review from martinvuyk November 28, 2024 13:14

msaelices added 3 commits November 29, 2024 14:35

More meaningful test function name

eed9463

Signed-off-by: Manuel Saelices <[email protected]>

Merge branch 'nightly' into int-from-and-to-bytes

7f142a8

Fix issue in the previous merge conflict, as we need the Span

3b93cd6

Signed-off-by: Manuel Saelices <[email protected]>

lsh requested changes Dec 4, 2024

Move the Int.as_bytes() logic to SIMD, so we can call it with any sca…

265b648

…lar type Signed-off-by: Manuel Saelices <[email protected]>

msaelices requested review from lsh and soraros December 16, 2024 09:15

msaelices added 3 commits December 16, 2024 10:15

Add reference ti new SIMD methods to the changelog

0e5d293

Signed-off-by: Manuel Saelices <[email protected]>

Tests for SIMD.from_bytes and SIMD.as_bytes

6e144d5

Signed-off-by: Manuel Saelices <[email protected]>

Remove uneeded import sentences in int.mojo

3211d24

Signed-off-by: Manuel Saelices <[email protected]>

lsh reviewed Dec 16, 2024

Try to not allocating memory by using InlineArray instead of List

8caad41

Signed-off-by: Manuel Saelices <[email protected]>

msaelices force-pushed the int-from-and-to-bytes branch from 047b471 to 8caad41 December 17, 2024 11:28

msaelices requested a review from lsh December 17, 2024 12:39

msaelices added 2 commits December 18, 2024 22:55

Fix the call expansion failed issue. Adapt unit tests

dd29814

Signed-off-by: Manuel Saelices <[email protected]>

Merge branch 'nightly' into int-from-and-to-bytes

694693a

lsh requested changes Dec 18, 2024

msaelices added 2 commits December 21, 2024 23:54

Remove the raises declaration from Int.from_bytes() and SIMD.from_byt…

40b9fc0

…es() Signed-off-by: Manuel Saelices <[email protected]>

Remove the DType parameter from Int.from_bytes() and Int.as_bytes()

98c29fc

Signed-off-by: Manuel Saelices <[email protected]>

msaelices requested review from jackos and a team as code owners December 22, 2024 00:00

msaelices force-pushed the int-from-and-to-bytes branch 2 times, most recently from 2fcb115 to 98c29fc December 22, 2024 00:19

msaelices added 2 commits December 22, 2024 01:24

Merge branch 'nightly' into int-from-and-to-bytes

8db9a83

Signed-off-by: Manuel Saelices <[email protected]>

No need for the Span here

3af72c5

Signed-off-by: Manuel Saelices <[email protected]>

modularbot added the imported-internally Signals that a given pull request has been imported internally. label Dec 22, 2024

modularbot added the merged-internally Indicates that this pull request has been merged internally label Jan 7, 2025

modularbot added the merged-externally Merged externally in public mojo repo label Jan 7, 2025

modularbot closed this Jan 7, 2025
