-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Move StringRef
find()
implementation to Span
#3548
base: nightly
Are you sure you want to change the base?
[stdlib] Move StringRef
find()
implementation to Span
#3548
Conversation
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
StrRef
find()
implementation to Span
StringRef
find()
implementation to Span
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
stdlib/src/utils/span.mojo
Outdated
@@ -335,3 +338,194 @@ struct Span[ | |||
return Span[T, _lit_mut_cast[lifetime, False].result]( | |||
unsafe_ptr=self._data, len=self._len | |||
) | |||
|
|||
fn find[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question It seems a bit weird from an API design perspective to have this "find" function in span
, can you help me understand why we'd want this? It feels more coupled to string algorithms, and Span
isn't just for a view over string data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be used if you want to find occurrences of a specific scalar value sequence. My main thought was that we'll lose the ability to do that once StringSlice.find()
is fixed to use unicode codepoints and it introduces quite some overhead for algorithms that are faster using raw bytes. We could also make this private WDYT?
Also, once PR's #3577 DType.get_dtype()
gets merged, List.index()
can also delegate to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeLoser gentle ping and also to add something, since it seems like we are moving toward unifying Python's bytes
into Span[UInt8]
in PR #3636, request #3634 and many other places. Python's bytes.find is actually a method, and many other things that we put in StringSlice
and StringRef
are actually operations from bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great change for overall performance when manipulating slices. It opens a lot of possibilities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeLoser per discussion on discord, Span[T].find
is intended to be memchr
or memmem
as appropriate, but that the subspan of bytes must be aligned to alignof[T]()
. We can probably expand it to any T: Trivial
once we have a Trivial
trait.
The proper string version likely needs be specialized for UTF-8 comparisons. @martinvuyk and I have discussed an "AsciiString" which can more aggressively optimize for using byte-oriented intrinsics since UTF-8 is annoying to do with SIMD.
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
@JoeLoser would it be possible to land this before the next stable release? I'd like to use this to be able to bypass string's
I would like to make the change as soon as possible because it will break a lot of projects, and many more are coming because the community is growing. |
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Move
StringRef
find()
implementation toSpan