feat: add PortId::from_bytes and rework identifier validation
The main motivating factor for this change is a scenario in which one has a slice of bytes and wants to parse it as a PortId. Since PortId implements FromStr and has a `new` method which takes a String, such a slice must first be converted into a string. The parsing code might look something like:

    fn port_id_from_bytes(bytes: &[u8]) -> Result<PortId, Error> {
        let id = core::str::from_utf8(bytes)?;
        let id = PortId::from_str(id)?;
        Ok(id)
    }

However, notice that in this situation the bytes are validated twice (in fact three times, see below). First `from_utf8` has to go through them to check that the identifier is valid UTF-8, and then `from_str` has to go through the bytes again. This by itself is wasteful, but what’s even worse is that identifiers cannot contain non-ASCII characters, so the full UTF-8 validity check is unnecessary.

With PortId::from_bytes, the code checks whether the bytes include any invalid characters. If they don’t, then it knows the entire slice consists of ASCII bytes and thus can be converted to a string without further validation.

To handle error cases, introduce the Error::InvalidUtf8 error, which is used if the bytes aren’t valid UTF-8.

----

With this change, validate_identifier_chars now works on a slice of bytes rather than on a str. This by itself is probably an optimisation, since iterating over bytes is simpler than iterating over the characters of a string. Since non-ASCII characters aren’t valid parts of identifiers, this doesn’t affect the observable behaviour of the code.

----

Furthermore, this change also refactors the validation code. Specifically, the old identifier validation code contained the following:

    if id.contains(PATH_SEPARATOR) {
        return Err(Error::ContainSeparator { id: id.into() });
    }
    if !id.chars().all(|c| /* ... */) {
        return Err(Error::InvalidCharacter { id: id.into() });
    }

This means that all identifiers had to be scanned twice: first to look for a slash and then to check that all characters are valid.

After the refactoring, the code is now the equivalent of:

    if !id.bytes().all(|c| /* ... */) {
        if id.as_bytes().contains(PATH_SEPARATOR) {
            return Err(Error::ContainSeparator { id: id.into() });
        } else {
            return Err(Error::InvalidCharacter { id: id.into() });
        }
    }

With this, correct identifiers are scanned only once; the second scan for the separator happens only on the error path.
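
The following is a minimal, self-contained sketch of how the pieces described above could fit together. It is not the code from this commit. The `Error` and `PortId` definitions are simplified stand-ins, the accepted character set in `is_valid_byte` is an assumption, the order in which the error cases are distinguished is a guess, and length checks are left out. The real implementation may also convert the validated bytes differently; this sketch builds the string byte by byte to stay in safe Rust without a second UTF-8 validation pass.

```rust
use core::str;

/// Simplified stand-in for the crate's error type (payloads are assumed).
#[derive(Debug, PartialEq)]
pub enum Error {
    ContainSeparator { id: String },
    InvalidCharacter { id: String },
    InvalidUtf8 { id: Vec<u8> },
}

/// Simplified stand-in for the real `PortId`.
#[derive(Debug, PartialEq)]
pub struct PortId(String);

const PATH_SEPARATOR: u8 = b'/';

/// Assumed set of allowed bytes: ASCII alphanumerics plus some punctuation.
fn is_valid_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric()
        || matches!(b, b'.' | b'_' | b'+' | b'-' | b'#' | b'[' | b']' | b'<' | b'>')
}

/// Valid identifiers are scanned exactly once. Only when the scan fails
/// does the code look again to decide which error to report.
fn validate_identifier_chars(id: &[u8]) -> Result<(), Error> {
    if id.iter().copied().all(is_valid_byte) {
        return Ok(());
    }
    match str::from_utf8(id) {
        // Not text at all: report the raw bytes.
        Err(_) => Err(Error::InvalidUtf8 { id: id.to_vec() }),
        // Valid text containing a slash.
        Ok(s) if id.contains(&PATH_SEPARATOR) => {
            Err(Error::ContainSeparator { id: s.into() })
        }
        // Valid text with some other disallowed character.
        Ok(s) => Err(Error::InvalidCharacter { id: s.into() }),
    }
}

impl PortId {
    /// Parses an identifier directly from bytes, skipping the
    /// `from_utf8` + `from_str` round trip on the success path.
    pub fn from_bytes(bytes: &[u8]) -> Result<Self, Error> {
        validate_identifier_chars(bytes)?;
        // Every byte passed the ASCII-only check, so each byte maps to
        // exactly one `char` and no UTF-8 validation is needed here.
        let id: String = bytes.iter().map(|&b| char::from(b)).collect();
        Ok(PortId(id))
    }
}
```

In this sketch, `PortId::from_bytes(b"transfer")` succeeds after a single pass over the input, while `b"a/b"` fails the first scan and is then classified as `ContainSeparator` by the error-path checks.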
Showing 3 changed files with 162 additions and 46 deletions.