-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Move StringRef
find()
implementation to Span
#3548
Open
martinvuyk
wants to merge
66
commits into
modularml:nightly
Choose a base branch
from
martinvuyk:move-strref-find-impl-to-span
base: nightly
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 16 commits
Commits
Show all changes
66 commits
Select commit
Hold shift + click to select a range
45f9bec
move StrRef find() implementation to Span
martinvuyk 070ac79
fix details
martinvuyk 9950729
fix details
martinvuyk 5a1cf5d
fix details
martinvuyk c0dac09
fix details
martinvuyk de6e116
fix details
martinvuyk d03ded0
fix details
martinvuyk a4c2a57
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 77a333f
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk f9a2eed
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 5e8fb25
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 6ca281e
fix stringref find and add fixme for later
martinvuyk 6535e8b
fix detail
martinvuyk f6f9144
fix detail
martinvuyk 964f4d5
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 828490b
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 015e5d4
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk d84530e
fix detail
martinvuyk 2f37b51
fix detail
martinvuyk f1cb5d1
fix detail
martinvuyk 702cb00
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 5a105a2
remove fixme comment
martinvuyk e4b495e
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 650d63c
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk f9a7329
fix detail
martinvuyk c7638f4
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk c29ab9d
refactor memrchr and memrmem and add parameter to use memchr and memr…
martinvuyk 21c2e7a
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk ef6a1ec
fix a bunch of stuff
martinvuyk 9898d6c
fix detail
martinvuyk a57d446
fix details
martinvuyk 5f69209
fix details
martinvuyk feee988
fix details
martinvuyk ec78800
fix details
martinvuyk 1d27970
fix details
martinvuyk de85d7e
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk d311761
fix bugs in memrchr and memrmem
martinvuyk c5be607
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 3f89314
fix details
martinvuyk 09d92b7
update changelog
martinvuyk 646bb90
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 162332a
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 22857e2
update to use pack_bits
martinvuyk f9e8fda
add overloads
martinvuyk 8cd3b56
fix detail
martinvuyk a619944
fix detail
martinvuyk 7213788
use var again
martinvuyk cfe5a5f
use var again
martinvuyk c3c0cd3
fix detail
martinvuyk 8b88b38
fix detail
martinvuyk 3fd3529
fix detail
martinvuyk e323392
fix detail
martinvuyk dca7728
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 8c29f12
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk aa791cd
fix unsafe ptr constructor
martinvuyk de191aa
Merge branch 'nightly' into move-strref-find-impl-to-span
martinvuyk 6926293
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 68c29f5
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk ae73214
mojo format
martinvuyk 74617ea
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk ef863cf
fix after merge with nightly
martinvuyk 7b43af6
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 52a0c90
try fix markdownlint
martinvuyk 1a8f348
Merge remote-tracking branch 'upstream/nightly' into move-strref-find…
martinvuyk 183a607
fix after merge
martinvuyk 53557d2
fix after merge
martinvuyk File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question It seems a bit weird from an API design perspective to have this "find" function in
span
, can you help me understand why we'd want this? It feels more coupled to string algorithms, andSpan
isn't just for a view over string data.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be used if you want to find occurrences of a specific scalar value sequence. My main thought was that we'll lose the ability to do that once
StringSlice.find()
is fixed to use unicode codepoints and it introduces quite some overhead for algorithms that are faster using raw bytes. We could also make this private WDYT?Also, once PR's #3577
DType.get_dtype()
gets merged,List.index()
can also delegate to thisThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeLoser gentle ping and also to add something, since it seems like we are moving toward unifying Python's
bytes
intoSpan[UInt8]
in PR #3636, request #3634 and many other places. Python's bytes.find is actually a method, and many other things that we put inStringSlice
andStringRef
are actually operations frombytes
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great change for overall performance when manipulating slices. It opens a lot of possibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeLoser per discussion on discord,
Span[T].find
is intended to bememchr
ormemmem
as appropriate, but that the subspan of bytes must be aligned toalignof[T]()
. We can probably expand it to anyT: Trivial
once we have aTrivial
trait.The proper string version likely needs be specialized for UTF-8 comparisons. @martinvuyk and I have discussed an "AsciiString" which can more aggressively optimize for using byte-oriented intrinsics since UTF-8 is annoying to do with SIMD.