[stdlib] Implement `Int.from_bytes()` and `Int.as_bytes()` #3795
Similar to the Python int.from_bytes() one Signed-off-by: Manuel Saelices <[email protected]>
stdlib/src/builtin/int.mojo (Outdated)
@@ -1194,6 +1195,64 @@ struct Int(

        writer.write(self)

    @staticmethod
    fn from_bytes[
        type: DType, big_endian: Bool = False
I'm in favor of adding a `from_bytes` and `to_bytes` method, but I'm not a fan of using a boolean to represent the endianness here. If we wanted to be Pythonic, we could use `StringLiteral` in the parameter and constrain it to be `"little"` or `"big"`, or we could make an `Endian` wrapper struct which we can parameterize other methods on in the future. WDYT @JoeLoser?
I also don't think we want to add a `DType` parameter here. We can just get the `sizeof[Int]()`, or perhaps add a helper to get the `DType` equivalent to `Int`.
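For comparison, the two alternatives to a boolean can be sketched in Python, the API this PR mirrors. This is only an illustration, not proposed Mojo code: `Endian`, `from_bytes_literal`, and `from_bytes_endian` are hypothetical names, and the checks happen at run time here, whereas a Mojo parameter would be checked at compile time.

```python
from enum import Enum
from typing import Literal


class Endian(Enum):
    """Hypothetical wrapper type; only the name comes from the comment above."""

    LITTLE = "little"
    BIG = "big"


def from_bytes_literal(
    data: bytes, byteorder: Literal["little", "big"] = "little"
) -> int:
    # Option 1: a string constrained to two values, as Python itself does.
    if byteorder not in ("little", "big"):
        raise ValueError("byteorder must be 'little' or 'big'")
    return int.from_bytes(data, byteorder)


def from_bytes_endian(data: bytes, endian: Endian = Endian.LITTLE) -> int:
    # Option 2: a dedicated type that other APIs could reuse later.
    return int.from_bytes(data, endian.value)
```

Either form makes the call site self-describing (`Endian.BIG` rather than a bare `True`), which is the readability concern raised against `Bool`.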
I added these methods because they are very convenient for IO and networking protocols. For example, I need them for a pure Mojo implementation of WebSockets.
If we don't have the `DType` parameter, and we get an array of 80 bytes from some stream, how can we handle the following cases?
- I want to get 10 UInt64 integers (10x8=80)
- I want to get 10 Int64 integers
- I want to get 40 UInt16 integers (40x2=80)
- I want to get 40 Int16 integers
Python's `int.to_bytes()` takes `length` and `byteorder` arguments, and `int.from_bytes()` takes its width from the length of the input and only has the `byteorder` one, but both are run-time arguments, which is slower; in Mojo we can resolve them at compile time.
If we want to implement something IO-related, like WebSockets, we already know the endianness of the target we want to read from or write to, so it's faster if we use parameters.
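The use case described above, slicing one byte buffer into integers of a chosen width and signedness, looks roughly like this with Python's run-time API. A minimal sketch: `unpack_ints` is a hypothetical helper, not part of Python or of this PR, and in Mojo the width and signedness would come from the `DType` parameter instead.

```python
def unpack_ints(
    data: bytes, width: int, *, signed: bool, byteorder: str = "little"
) -> list[int]:
    """Split `data` into consecutive `width`-byte integers."""
    if width <= 0 or len(data) % width != 0:
        raise ValueError("buffer length must be a positive multiple of width")
    return [
        # int.from_bytes reads the whole slice, so the slice length sets the width.
        int.from_bytes(data[i : i + width], byteorder, signed=signed)
        for i in range(0, len(data), width)
    ]


buf = b"\xff" * 16  # 16 bytes: two 64-bit values or eight 16-bit values

as_uint64 = unpack_ints(buf, 8, signed=False)  # two values of 2**64 - 1
as_int64 = unpack_ints(buf, 8, signed=True)    # two values of -1
as_uint16 = unpack_ints(buf, 2, signed=False)  # eight values of 65535
as_int16 = unpack_ints(buf, 2, signed=True)    # eight values of -1
```

The same bytes decode to different values depending on width and signedness, which is the information the `DType` parameter carries.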
If anything, these comments make me think that `from_bytes` and `to_bytes` belong on `SIMD`, not `Int`.
> Python's `int.to_bytes()` takes `length` and `byteorder` arguments, and `int.from_bytes()` takes its width from the length of the input and only has the `byteorder` one, but both are run-time arguments, which is slower; in Mojo we can resolve them at compile time.
I agree :) That's why I still said it should be a parameter, but my comments were questioning what the parameter should look like, because I'm not sold on `Bool`.
Yeah, if we follow Python's lead, `bytes` is binary `Byte` and can be encoded in `UInt16` etc. We could also hold off on doing it like Python and let this function be inferred and have `bytes: Span[Scalar[D]]`.
I think `type` shouldn't be used as prolifically as it is for parameters, since it's the name of Python's `type` builtin function, which I think we'll eventually have in Mojo as well.
Suggested change:
- type: DType, big_endian: Bool = False
+ D: DType, big_endian: Bool = False
> questioning what the parameter should look like, because I'm not sold on Bool.
As for this, I personally never liked `sys.byteorder` returning `"big"` or `"little"` because there is no third option, so a boolean makes sense to me 🤷‍♂️. But compatibility may be a solid argument in favor of it.
Completely agree with @martinvuyk. We are already diverging from Python's `int.from_bytes()` and `int.to_bytes()`. As a Python developer, I personally don't mind making a slight adaptation (e.g., `"big"` -> `True` and `"little"` -> `False`). The real issue would be if there were no equivalent implementation in Mojo's stdlib; you would need to implement the logic yourself.
Hi @msaelices, cool to know someone else is also pushing things forward for binary serialization.
I have two main opinions:
- I think we should let the `DType` be inferred and have the `Span` be of that `DType`. I know Python treats everything like `bytes`, but I think it's something that will bite us in the long term. We have a statically typed language and we can make use of that. So IMO we should hold off on making everything revolve around `Byte` and having to `if-else` on every binary interface until we are absolutely sure we want to go that way.
- I think we can hold off on controlling endianness in the parameter until users need it.
stdlib/src/builtin/int.mojo (Outdated)
        var ptr: UnsafePointer[Byte] = UnsafePointer.address_of(bytes[0])
        var type_ptr: UnsafePointer[Scalar[type]] = ptr.bitcast[Scalar[type]]()
        var value = type_ptr[]

        @parameter
        if is_big_endian() and not big_endian:
            value = byte_swap(value)
        elif not is_big_endian() and big_endian:
            value = byte_swap(value)
        return int(value)
Suggested change:
- var ptr: UnsafePointer[Byte] = UnsafePointer.address_of(bytes[0])
- var type_ptr: UnsafePointer[Scalar[type]] = ptr.bitcast[Scalar[type]]()
- var value = type_ptr[]
- @parameter
- if is_big_endian() and not big_endian:
-     value = byte_swap(value)
- elif not is_big_endian() and big_endian:
-     value = byte_swap(value)
- return int(value)
+ var ptr = bytes.unsafe_ptr().bitcast[Scalar[type]]()
+ @parameter
+ if is_big_endian() and not big_endian:
+     return int(byte_swap(ptr[]))
+ elif not is_big_endian() and big_endian:
+     return int(byte_swap(ptr[]))
+ else:
+     return int(ptr[])
if is_big_endian() == big_endian:
    ...
else:
    ...
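The simplification suggested above relies on a single comparison: swap bytes only when the host byte order differs from the requested one. Below is a small Python sketch of that logic, assuming unsigned values. `byte_swap` and `decode_native_then_swap` are illustrative stand-ins for the Mojo helpers in the diff, not real library functions.

```python
import sys


def byte_swap(value: int, width: int) -> int:
    """Reverse the byte order of an unsigned `width`-byte integer."""
    return int.from_bytes(value.to_bytes(width, "little"), "big")


def decode_native_then_swap(data: bytes, big_endian: bool) -> int:
    host_is_big = sys.byteorder == "big"
    # Reading in host order mimics what a raw pointer load would return.
    value = int.from_bytes(data, sys.byteorder)
    if host_is_big == big_endian:
        return value  # orders agree, no swap needed
    return byte_swap(value, len(data))
```

Both original branches (`is_big_endian() and not big_endian`, and the reverse) collapse into the `else` of the equality test, so the result is the same on either kind of host.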
stdlib/src/builtin/int.mojo (Outdated)
        var ptr: UnsafePointer[Scalar[type]] = UnsafePointer.address_of(value)
        var byte_ptr: UnsafePointer[Byte] = ptr.bitcast[Byte]()
        var list = List[Byte](capacity=type_len)

        # TODO: Maybe this can be a List.extend(ptr, count) method
        memcpy(list.unsafe_ptr(), byte_ptr, type_len)
        list.size = type_len
I don't think we should add more `UnsafePointer` APIs than absolutely necessary.
Suggested change:
- var ptr: UnsafePointer[Scalar[type]] = UnsafePointer.address_of(value)
- var byte_ptr: UnsafePointer[Byte] = ptr.bitcast[Byte]()
- var list = List[Byte](capacity=type_len)
- # TODO: Maybe this can be a List.extend(ptr, count) method
- memcpy(list.unsafe_ptr(), byte_ptr, type_len)
- list.size = type_len
+ var ptr = UnsafePointer.address_of(value)
+ var list = List[Byte](capacity=type_len)
+ memcpy(list.unsafe_ptr(), ptr.bitcast[Byte](), type_len)
+ list.size = type_len
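For reference, the behavior this hunk is meant to mirror is Python's `int.to_bytes()`, which produces a fixed-width byte sequence in the chosen order. A minimal sketch follows, where `as_bytes` is a hypothetical wrapper using the boolean flag discussed in this PR rather than the actual Mojo signature.

```python
def as_bytes(
    value: int, width: int, *, big_endian: bool = False, signed: bool = False
) -> bytes:
    """Encode `value` into exactly `width` bytes."""
    byteorder = "big" if big_endian else "little"
    # Raises OverflowError if `value` does not fit in `width` bytes.
    return value.to_bytes(width, byteorder, signed=signed)


little = as_bytes(16, 2)                # b"\x10\x00"
big = as_bytes(16, 2, big_endian=True)  # b"\x00\x10"
negative = as_bytes(-1, 2, signed=True) # b"\xff\xff"

# Round trip: decoding with the same order and signedness restores the value.
restored = int.from_bytes(big, "big")
```

Unlike the raw `memcpy` in the diff, the Python call checks that the value fits in the requested width instead of copying whatever bits are in memory.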
Agreed. Changed here: msaelices@30027df
stdlib/test/builtin/test_int.mojo (Outdated)
@@ -19,6 +19,8 @@ from testing import assert_equal, assert_true, assert_false, assert_raises
from python import PythonObject
from memory import UnsafePointer

alias Bytes = List[Byte]
Suggestion: This definition has not been decided on yet. Could you move it into the scope of your test?
Sure! Done: msaelices@7766ea3
Great suggestion. I did it here: msaelices@63fa5b4. Unfortunately, the tests do not work because of this kind of error:
My concern is that we will have to complicate the already complicated signature by explicitly passing the
Hmm, yet another place where origins get annoying 🙄. This seems like a complete bug; it should work:

    struct Span:
        ...
        fn __init__(out self, ref [origin]list: List[T, *_]): ...

You could try something like forcing the casting of the origin:

    assert_equal(Int.from_bytes[DType.int16, big_endian=True](Span(Bytes(0, 16)).get_immutable()), 16)

but this should definitely not be necessary.
Maybe this change in the compiler will help here: 3c4f57c. I will probably check it today.
… it [6d395a1c691000ac06fa1bcbd96305d6359d1784] Signed-off-by: Manuel Saelices <[email protected]>
I could not get this working, even after the compiler added support for setting the implicit … I am afraid that we need to revert some of the changes from [6d395a1c691000ac06fa1bcbd96305d6359d1784] to make it work. Done here: msaelices@b5e09c5. @martinvuyk we can always revisit this in the future and make it more general.
@@ -1212,6 +1217,65 @@ struct Int(

        writer.write(self)

    @staticmethod
    fn from_bytes[
I don't think my comment before was ever properly addressed. I think this method should be on `SIMD`, not `Int`. That way the `DType` makes more sense. We can think more about what it would look like on `Int` later, and doing `int(UInt64.from_bytes(span))` doesn't seem like too much of a hit to me.
Sorry, I missed it. Done:
msaelices@776b4df
msaelices@265b648
Sorry for all the change requests, but to be clear, I am comfortable merging these changes on `SIMD`, but not on `Int`. I think the `SIMD` API makes a ton of sense, but the `int` API would need some more thought, since taking a `DType` feels wrong.
Agreed, I think we should/could land the uncontroversial part (`from/to_byte` on `SIMD`) first and iterate. WDYT @msaelices?
Yes, actually the whole point of this PR was to help Python developers who need to migrate Python code using `int.from_bytes()` and `int.to_bytes()`. So, to me, it's easier if we have it on the `Int` struct too, and iterate on it.
I disagree with having it on the `Int` struct too, because we would either:
- Break code users relied on
- Never get around to revising it and have a substandard API
I'm also not compelled by the idea that the goal is to mirror a Python API, considering your earlier argument in favor of breaking away from the Python API using a parameter. As I mentioned before, I think `int(UInt64.from_bytes(my_span))` is fine for now.
047b471 to 8caad41
2fcb115 to 98c29fc
!sync
!sync
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions.
Landed in ab7bfa5! Thank you for your contribution 🎉
… (#53303) [External] [stdlib] Implement `Int.from_bytes()` and `Int.as_bytes()` Similar to the Python's [int.from_bytes()](https://docs.python.org/3/library/stdtypes.html#int.from_bytes) and [int.to_bytes()](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) one. Co-authored-by: Manuel Saelices <[email protected]> Co-authored-by: Lukas Hermann <[email protected]> Closes #3795 MODULAR_ORIG_COMMIT_REV_ID: 1e78cc9dd643009a98c7f3de77c4c250e7c5ea3f