[stdlib] Fix `String.split()` and start fixing `String.__len__()` #2960
Signed-off-by: martinvuyk <[email protected]>
`String.split()` & `String.isspace()` and start fixing `String.__len__()`
@laszlokindrat no idea what this error is. When calling
Signed-off-by: martinvuyk <[email protected]>
@martinvuyk Something (not sure where) is calling
Signed-off-by: martinvuyk <[email protected]>
@laszlokindrat ready for review :D
Thanks for the patch! I have a couple questions, but I like the direction you're going! Could you please rebase on the latest nightly?
Signed-off-by: martinvuyk <[email protected]>
Hey @laszlokindrat, I ended up touching a lot more places than I thought I'd have to. Changed I used a I also added some code in places where we should use Unicode code point indexing but currently assume ASCII; we still need issue #933 solved before we can switch to that. Also, the code is the first thing that came to me, so it is not necessarily efficient.
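To illustrate why ASCII-assuming indexing breaks on multi-byte text, here is a minimal Python sketch (not code from this patch). It maps a code point index to a byte offset in UTF-8 data. The helper name `codepoint_to_byte_offset` is hypothetical and was chosen only for this example.

```python
def codepoint_to_byte_offset(data: bytes, index: int) -> int:
    """Return the byte offset at which code point number `index` starts.

    Assumes `data` is valid UTF-8. Hypothetical helper, for illustration only.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Every byte that is NOT a continuation byte (0b10xxxxxx)
        # starts a new code point.
        if (byte & 0b1100_0000) != 0b1000_0000:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


raw = "héllo".encode("utf-8")  # 'é' takes two bytes, so len(raw) == 6
# Under an ASCII assumption, index 2 would be byte 2, which is the
# second byte of 'é'. The third code point ('l') really starts at byte 3.
print(codepoint_to_byte_offset(raw, 2))
```

The scan is linear in the number of bytes, which is one reason code point indexing cannot simply replace byte indexing everywhere without a cost.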
Thanks! Can we split up this patch then? The length related stuff will touch a lot more things internally, and it's also semantically separate.
I honestly don't know where to even split this since

```mojo
fn __len__(self) -> Int:
    """Nominally returns the _length in Unicode codepoints_ (not bytes!).

    Returns:
        The length in Unicode codepoints.
    """
    # FIXME(MSTDL-160):
    #   Actually perform UTF-8 decoding here to count the codepoints.
    return len(self._slice)
```

```mojo
fn __len__(self) -> Int:
    """Nominally returns the _length in Unicode codepoints_ (not bytes!).

    Returns:
        The length in Unicode codepoints.
    """
    var unicode_length = self.byte_length()
    for i in range(unicode_length):
        if _utf8_byte_type(self._slice[i]) == 1:
            unicode_length -= 1
    return unicode_length
```
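For readers less familiar with UTF-8, the counting idea in the second version can be sketched in Python as below. This assumes that `_utf8_byte_type(...) == 1` identifies a continuation byte (bit pattern `0b10xxxxxx`), which is my reading of the snippet and not something confirmed in this thread. The function name `count_codepoints` is made up for the example.

```python
def count_codepoints(data: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Starts from the byte length and subtracts one for every
    continuation byte, mirroring the loop in the snippet above.
    """
    length = len(data)
    for byte in data:
        # Continuation bytes look like 0b10xxxxxx and never start a code point.
        if (byte & 0b1100_0000) == 0b1000_0000:
            length -= 1
    return length


raw = "héllo".encode("utf-8")
print(len(raw), count_codepoints(raw))  # byte length vs. code point count
```

This only works for well-formed UTF-8. Invalid sequences would need real decoding, which is what the FIXME in the first version points at.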
That's what I was afraid of. For now the only breaking change is
Signed-off-by: martinvuyk <[email protected]>
Let's start with the
Signed-off-by: martinvuyk <[email protected]>
Title changed: `String.split()` & `String.isspace()` and start fixing `String.__len__()` → `String.split()` and start fixing `String.__len__()`
Signed-off-by: martinvuyk <[email protected]>
Thanks! Can you please add a changelog entry describing the changes?
Signed-off-by: martinvuyk <[email protected]>
Thank you!
!sync
✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions.
…en__()` (#43119)

Fix string split to take in NoneType as input and its body to use the `isspace(self)` method. Fixes #2880

Leave the fix for `lstrip` and `rstrip` to use `String.isspace()` as a `TODO` once LLVM intrinsics can be used at compile time.

Added a method `String.byte_length()` but left the builtin `String.__len__()` alone for now, since many other methods assume it returns byte length and changing it will require major refactoring of too many places for one PR. Added a deprecation warning to `_byte_length()`.

ORIGINAL_AUTHOR=martinvuyk <[email protected]>

Closes #2960

MODULAR_ORIG_COMMIT_REV_ID: 8ed4f3418bc1681d45252d2373227d65eeb31e04
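As a rough picture of what "split with no separator, using `isspace`" means, here is a Python sketch. It assumes the intent is to match Python's `str.split()` with no arguments, which this thread does not state explicitly. The name `split_whitespace` is hypothetical, and this is not the Mojo implementation.

```python
def split_whitespace(text: str) -> list[str]:
    """Split on runs of whitespace, dropping empty pieces.

    Illustrative only: uses `str.isspace()` per character to decide
    where one piece ends and the next begins.
    """
    parts: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isspace():
            # A whitespace run ends the current piece, if there is one.
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


sample = "  a  b\tc\n"
print(split_whitespace(sample))
print(sample.split())     # Python's built-in with no separator
print(sample.split(" "))  # explicit separator keeps empty strings
```

The contrast with `split(" ")` shows why a `None` separator needs its own code path: runs of mixed whitespace collapse, and leading or trailing whitespace produces no empty entries.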
Landed in cd65e1b! Thank you for your contribution 🎉