[stdlib] Implement Int.from_bytes() and Int.as_bytes() #3795

Closed
wants to merge 31 commits into from
Changes from 18 commits
fc608a7
Implement Int.from_bytes() function
msaelices Nov 12, 2024
cd91096
Add entry to the changelog about Int.from_bytes()
msaelices Nov 12, 2024
3eca741
Make the Int.from_bytes() accept an span of bytes instead of a list
msaelices Nov 12, 2024
b9f189d
[stdlib] Move the Int.from_bytes() big_endian arg to a comptime param
msaelices Nov 13, 2024
cc97ffe
[stdlib] Implement Int.as_bytes()
msaelices Nov 21, 2024
b580bc9
[stdlib] Update changelog to include Int.as_bytes()
msaelices Nov 21, 2024
569e6ff
[stdlib] Simplify is_big_endian import
msaelices Nov 22, 2024
239bba6
Optimize copying bytes into the array as we know it's a trivial type
msaelices Nov 22, 2024
21384ec
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Nov 24, 2024
bc22870
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Nov 26, 2024
63fa5b4
Infer the Span type based on DType
msaelices Nov 27, 2024
5b41e1f
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Nov 28, 2024
b5e09c5
[stdlib] Revert part of the changes here as compiler does not support…
msaelices Nov 28, 2024
7766ea3
[stdlib] Move the Bytes alias in the test closer to the scope we use it
msaelices Nov 28, 2024
30027df
[stdlib] Less usage of UnsafePointer in the Int.from_bytes|as_bytes
msaelices Nov 28, 2024
eed9463
More meaningful test function name
msaelices Nov 29, 2024
7f142a8
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Dec 4, 2024
3b93cd6
Fix issue in the previous merge conflict, as we need the Span
msaelices Dec 4, 2024
b0bc485
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Dec 16, 2024
776b4df
Move the logic to SIMD so we can call it with whatever scalar we want
msaelices Dec 16, 2024
265b648
Move the Int.as_bytes() logic to SIMD, so we can call it with any sca…
msaelices Dec 16, 2024
0e5d293
Add reference ti new SIMD methods to the changelog
msaelices Dec 16, 2024
6e144d5
Tests for SIMD.from_bytes and SIMD.as_bytes
msaelices Dec 16, 2024
3211d24
Remove uneeded import sentences in int.mojo
msaelices Dec 16, 2024
8caad41
Try to not allocating memory by using InlineArray instead of List
msaelices Dec 17, 2024
dd29814
Fix the call expansion failed issue. Adapt unit tests
msaelices Dec 18, 2024
694693a
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Dec 18, 2024
40b9fc0
Remove the raises declaration from Int.from_bytes() and SIMD.from_byt…
msaelices Dec 21, 2024
98c29fc
Remove the DType parameter from Int.from_bytes() and Int.as_bytes()
msaelices Dec 21, 2024
8db9a83
Merge branch 'nightly' into int-from-and-to-bytes
msaelices Dec 22, 2024
3af72c5
No need for the Span here
msaelices Dec 22, 2024
4 changes: 4 additions & 0 deletions docs/changelog.md
@@ -16,6 +16,10 @@ what we publish.

### ⭐️ New

- New `Int.from_bytes()` and `Int.as_bytes()` functions to convert a list of
  bytes to an integer and vice versa, accepting the endianness as an argument.
  Similar to Python's `int.from_bytes()` and `int.to_bytes()` functions.

- `StringRef` is now representable so `repr(StringRef("hello"))` will return
`StringRef('hello')`.
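
For reference, the Python behavior this changelog entry compares against can be sketched as follows. The snippet uses only Python's built-in `int.from_bytes()` and `int.to_bytes()`; it does not exercise the Mojo functions added in this PR, and the sample bytes are chosen purely for illustration.

```python
# Python's built-in counterparts; the byte order is passed as a string argument.
raw = bytes([0, 16])

big = int.from_bytes(raw, byteorder="big")        # reads the bytes as 0x0010
little = int.from_bytes(raw, byteorder="little")  # reads the bytes as 0x1000

# Going back to bytes requires an explicit width (2 bytes here).
round_trip = big.to_bytes(2, byteorder="big")

print(big, little, round_trip)
```

Note that Python takes the width as a runtime argument to `to_bytes()`, while the Mojo version in this diff derives it from a `DType` parameter.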

66 changes: 65 additions & 1 deletion stdlib/src/builtin/int.mojo
@@ -16,6 +16,8 @@ These are Mojo built-ins, so you don't need to import them.
"""

from collections import KeyElement

from bit import byte_swap
from collections.string import (
    _calc_initial_buffer_size_int32,
    _calc_initial_buffer_size_int64,
@@ -29,8 +31,11 @@ from builtin.io import _snprintf
from memory import UnsafePointer
from python import Python, PythonObject
from python._cpython import Py_ssize_t
from memory import memcpy, UnsafePointer

from sys import is_big_endian, bitwidthof

from utils import Writable, Writer
from utils import Span, Writable, Writer
from utils._select import _select_register_value as select
from utils._visualizers import lldb_formatter_wrapping_type

@@ -1212,6 +1217,65 @@ struct Int(

        writer.write(self)

    @staticmethod
    fn from_bytes[
Contributor

I don't think my comment before was ever properly addressed. I think this method should be on SIMD, not Int. That way the DType makes more sense. We can think more about what it would look like on Int later, and doing int(UInt64.from_bytes(span)) doesn't seem like too much of a hit to me.

Contributor Author

Sorry, I missed it. Done:
msaelices@776b4df
msaelices@265b648

Contributor

Sorry for all the change requests, but to be clear I am comfortable merging these changes on SIMD, but not on Int. I think the SIMD API makes a ton of sense, but the int API would need some more thought since taking a DType feels wrong.

Contributor @soraros, Dec 17, 2024

Agreed. I think we could land the uncontroversial part (`from_bytes`/`as_bytes` on SIMD) first and iterate. WDYT @msaelices?

Contributor Author

Yes, actually the whole point of this PR was to help Python developers who need to migrate Python code that uses int.from_bytes() and int.to_bytes(). So, to me, it's easier if we have it on the Int struct too, and iterate on it.

Contributor

I disagree with having it on the Int struct too because we would either:

  • Break code users relied on
  • Never get around to revising it and have a substandard API

I'm also not compelled by the idea that the goal is to mirror a Python API, considering your earlier argument in favor of breaking away from the Python API by using a parameter. As I mentioned before, I think int(UInt64.from_bytes(my_span)) is fine for now.
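
The shape the reviewer proposes, decoding through a fixed-width type first and only then widening to a general integer, can be illustrated with a rough Python analogy. This is a sketch under stated assumptions: `uint64_from_bytes` is a hypothetical helper, not part of any library, and the `struct` format code `Q` merely stands in for Mojo's `UInt64`.

```python
import struct


def uint64_from_bytes(data: bytes, big_endian: bool = False) -> int:
    """Hypothetical helper: decode exactly 8 bytes as an unsigned 64-bit value."""
    fmt = ">Q" if big_endian else "<Q"
    # struct.unpack raises struct.error when len(data) is not 8, so the
    # width check comes from the fixed-width type rather than a separate argument.
    (value,) = struct.unpack(fmt, data)
    return value


payload = bytes([1, 0, 0, 0, 0, 0, 0, 0])
little = uint64_from_bytes(payload)                 # lowest byte first
big = uint64_from_bytes(payload, big_endian=True)   # highest byte first
print(little, big)
```

The point of the analogy is that the width and signedness live in the type being decoded, so the general integer type needs no `DType`-like parameter of its own.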

lsh marked this conversation as resolved.
        D: DType, big_endian: Bool = False
    ](bytes: Span[Byte]) raises -> Self:
        """Converts a byte array to an integer.

        Args:
            bytes: The byte array to convert.

        Parameters:
            D: The type of the integer.
            big_endian: Whether the byte array is big-endian.

        Returns:
            The integer value.
        """
        if D.sizeof() != len(bytes):
            raise Error("Byte array size does not match the integer size.")

        var ptr: UnsafePointer[Scalar[D]] = bytes.unsafe_ptr().bitcast[
            Scalar[D]
        ]()
        var value = ptr[]

        @parameter
        if is_big_endian() and not big_endian:
            value = byte_swap(value)
        elif not is_big_endian() and big_endian:
            value = byte_swap(value)
        return int(value)

    fn as_bytes[D: DType, big_endian: Bool = False](self) -> List[Byte]:
        """Convert the integer to a byte array.

        Parameters:
            D: The type of the integer.
            big_endian: Whether the byte array should be big-endian.

        Returns:
            The byte array.
        """
        alias type_len = D.sizeof()
        var value = Scalar[D](self)

        @parameter
        if is_big_endian() and not big_endian:
            value = byte_swap(value)
        elif not is_big_endian() and big_endian:
            value = byte_swap(value)

        var ptr = UnsafePointer.address_of(value)
        var list = List[Byte](capacity=type_len)

        # TODO: Maybe this can be a List.extend(ptr, count) method
        memcpy(list.unsafe_ptr(), ptr.bitcast[Byte](), type_len)
        list.size = type_len

        return list^
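
The logic of the two functions above (check the width, reinterpret the bytes in host order, and swap only when the requested order differs from the host order) can be sketched in Python. This is an illustrative analogy, not the PR's implementation: the `FORMATS` table and both helper names are hypothetical, and `struct` with the `=` prefix stands in for the pointer bitcast.

```python
import struct
import sys

# Hypothetical mapping from a dtype name to a struct format code.
FORMATS = {"int16": "h", "uint16": "H", "int32": "i", "uint32": "I"}


def from_bytes(data: bytes, dtype: str, big_endian: bool = False) -> int:
    code = FORMATS[dtype]
    if struct.calcsize("=" + code) != len(data):
        raise ValueError("Byte array size does not match the integer size.")
    # Both branches in the Mojo code swap, so they reduce to a single test:
    # swap when the host order and the requested order disagree.
    if (sys.byteorder == "big") != big_endian:
        data = data[::-1]
    (value,) = struct.unpack("=" + code, data)  # "=" means host byte order
    return value


def as_bytes(value: int, dtype: str, big_endian: bool = False) -> bytes:
    data = struct.pack("=" + FORMATS[dtype], value)  # host byte order
    if (sys.byteorder == "big") != big_endian:
        data = data[::-1]
    return data


decoded = from_bytes(bytes([252, 0]), "int16", big_endian=True)
encoded = as_bytes(16, "int16", big_endian=True)
print(decoded, encoded)
```

One difference worth keeping in mind: `struct.pack` raises an error for values that do not fit the format, which may not match how `Scalar[D](self)` treats out-of-range values in Mojo.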

    @always_inline("nodebug")
    fn __mlir_index__(self) -> __mlir_type.index:
        """Convert to index.
53 changes: 53 additions & 0 deletions stdlib/test/builtin/test_int.mojo
@@ -245,6 +245,58 @@ def test_conversion_from_python():
    assert_equal(Int.try_from_python(PythonObject(-1)), -1)


def test_from_bytes_as_bytes():
    alias Bytes = List[Byte]

    assert_equal(Int.from_bytes[DType.int16, big_endian=True](Bytes(0, 16)), 16)
    assert_equal(
        Int.from_bytes[DType.int16, big_endian=False](Bytes(0, 16)), 4096
    )
    assert_equal(
        Int.from_bytes[DType.int16, big_endian=True](Bytes(252, 0)), -1024
    )
    assert_equal(
        Int.from_bytes[DType.uint16, big_endian=True](Bytes(252, 0)), 64512
    )
    assert_equal(
        Int.from_bytes[DType.int16, big_endian=False](Bytes(252, 0)), 252
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=True](Bytes(0, 0, 0, 1)), 1
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=False](Bytes(0, 0, 0, 1)),
        16777216,
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=True](Bytes(1, 0, 0, 0)),
        16777216,
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=True](Bytes(1, 0, 0, 1)),
        16777217,
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=False](Bytes(1, 0, 0, 1)),
        16777217,
    )
    assert_equal(
        Int.from_bytes[DType.int32, big_endian=True](Bytes(255, 0, 0, 0)),
        -16777216,
    )
    for x_ref in List[Int](10, 100, -12, 0, 1, -1, 1000, -1000):
        x = x_ref[]

        @parameter
        for b in range(2):
            assert_equal(
                Int.from_bytes[DType.int16, big_endian=b](
                    Int(x).as_bytes[DType.int16, big_endian=b]()
                ),
                x,
            )


def main():
    test_properties()
    test_add()
@@ -268,3 +320,4 @@ def main():
    test_int_uint()
    test_float_conversion()
    test_conversion_from_python()
    test_from_bytes_as_bytes()
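
The expected values in `test_from_bytes_as_bytes` can be cross-checked against Python's built-in `int.from_bytes()`, which the PR cites as its model. The sketch below assumes that the signed and unsigned `DType`s map onto Python's `signed=` flag; the case table is transcribed from the test above and nothing here calls the Mojo code.

```python
# Each case: (input bytes, signed?, byte order, value expected by the Mojo test).
cases = [
    ([0, 16], True, "big", 16),
    ([0, 16], True, "little", 4096),
    ([252, 0], True, "big", -1024),
    ([252, 0], False, "big", 64512),
    ([252, 0], True, "little", 252),
    ([0, 0, 0, 1], True, "big", 1),
    ([0, 0, 0, 1], True, "little", 16777216),
    ([1, 0, 0, 0], True, "big", 16777216),
    ([1, 0, 0, 1], True, "big", 16777217),
    ([1, 0, 0, 1], True, "little", 16777217),
    ([255, 0, 0, 0], True, "big", -16777216),
]

# Collect every case where Python disagrees with the expected value.
mismatches = [
    (data, order, expected)
    for data, signed, order, expected in cases
    if int.from_bytes(bytes(data), byteorder=order, signed=signed) != expected
]

# Round trip through a 2-byte signed encoding, as in the final loop of the test.
round_trip_ok = all(
    int.from_bytes(
        x.to_bytes(2, byteorder=order, signed=True), byteorder=order, signed=True
    )
    == x
    for x in (10, 100, -12, 0, 1, -1, 1000, -1000)
    for order in ("little", "big")
)

print(mismatches, round_trip_ok)
```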