Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lib): add forbid_unsafe feature to disable unsafe code #413

Merged
merged 29 commits into from
Sep 13, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
dc7b212
add default 'unsafe' feature, remove 'read_byte_unchecked'
davidkern Aug 18, 2024
ef7e5ba
feature flag 'slice_unchecked'
davidkern Aug 18, 2024
6ddaef9
feature flag remaining unsafety
davidkern Aug 18, 2024
c1ac617
fix misnamed feature flags
davidkern Aug 18, 2024
d4a6877
fix a few more feature flags
davidkern Aug 18, 2024
13cd8bc
lints and formatting
davidkern Aug 19, 2024
884d9ba
correct the auto-correct
davidkern Aug 19, 2024
42e43be
Rename feature to forbid_unsafe so unsafe code is the default
davidkern Aug 19, 2024
223e3b5
reintroduce 'read_byte_unchecked'
davidkern Aug 19, 2024
ac3d019
don't use --all-features as the default, as that removes unsafe code …
davidkern Aug 19, 2024
7cd591a
add safe path to codegen and fix regression on unsafe path
davidkern Aug 19, 2024
f97c2d4
enable 'forbid_unsafe' on logos-derive/codegen if enabled on 'logos' …
davidkern Aug 19, 2024
e2e5ed4
fmt
davidkern Aug 19, 2024
8c13adf
additionally run benchmarks against forbid_unsafe
davidkern Aug 21, 2024
b4a2ceb
add --features forbid_unsafe to the new benchmarks
davidkern Aug 21, 2024
40d2cab
handle case of base branch not supporting forbid_unsafe feature
davidkern Aug 21, 2024
0997c34
include comment if comparing against defaults instead of before
davidkern Aug 21, 2024
87ed40f
troubleshoot why the two benchmark runs seem to be the same
davidkern Aug 21, 2024
c888fc1
try renaming the baselines
davidkern Aug 21, 2024
c169614
seems baseline names may only include underscores
davidkern Aug 21, 2024
fc950bf
and display the correct results for forbid_unsafe
davidkern Aug 21, 2024
af82cc9
test default features and forbid unsafe in test matrix
davidkern Aug 21, 2024
a446f68
add forbid_unsafe to integration test
davidkern Aug 21, 2024
3a44e83
Feature `forbid_unsafe` tests and benchmarks
davidkern Aug 21, 2024
d2ff9f5
add safety note to source trait
davidkern Sep 9, 2024
0242426
add safety note to source trait
davidkern Sep 9, 2024
7fcd8c6
wordsmithing
davidkern Sep 11, 2024
35a24bf
include note on feature unification
davidkern Sep 11, 2024
45a3b68
fmt
davidkern Sep 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
+ [Using callbacks](./callbacks.md)
+ [Common regular expressions](./common-regex.md)
+ [Debugging](./debugging.md)
+ [Unsafe Code](./unsafe.md)
+ [Examples](./examples.md)
+ [Brainfuck interpreter](./examples/brainfuck.md)
+ [JSON parser](./examples/json.md)
Expand Down
32 changes: 32 additions & 0 deletions book/src/unsafe.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Unsafe Code

By default, **Logos** uses unsafe code to avoid unnecessary bounds checks while
accessing slices of the input `Source`.

This unsafe code also exists in the code generated by the `Logos` derive macro,
which generates a deterministic finite automata (DFA). Reasoning about the correctness
of this generated code can be difficult - if the derivation of the DFA in `Logos`
is correct, then this generated code will be correct and any mistakes in implementation
would be caught given sufficient fuzz testing.

Use of unsafe code is the default as this typically provides the fastest parser.

## Disabling Unsafe Code

However, for applications accepting untrusted input in a trusted context, this
may not be a sufficient correctness justification.

For those applications which cannot tolerate unsafe code, the feature `forbid-unsafe`
may be enabled. This replaces unchecked accesses in the `Logos` crate with safe,
checked alternatives which will panic on out-of-bounds access rather than cause
undefined behavior. Additionally, code generated by the macro will not use the
unsafe keyword, so generated code may be used in a crates using the
`#![forbid(unsafe_code)]` attribute.

Disabling unsafe code will generally result in a slower parser overall, though rarely
may end up slightly faster.
jeertmans marked this conversation as resolved.
Show resolved Hide resolved

There are too many variables to consider between compiler optimizations, the specific
grammar being parsed, and the target processor to make definitive statements about
performance of safe-only code. The automated benchmarks of this crate shows around a
10% slowdown in safe-only code as of the time of this writing.
jeertmans marked this conversation as resolved.
Show resolved Hide resolved
5 changes: 5 additions & 0 deletions src/source.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,11 @@ use core::ops::{Deref, Range};
/// Most notably this is implemented for `&str`. It is unlikely you will
/// ever want to use this Trait yourself, unless implementing a new `Source`
/// the `Lexer` can use.
///
/// SAFETY: Unless the unsafe functions of this trait are disabled with the `forbid_unsafe`
/// feature, the correctness of the unsafe functions of this trait depend on the correct
/// implementation of the `len` and `find_boundary` functions so generated code does not request
/// out-of-bounds access.
#[allow(clippy::len_without_is_empty)]
pub trait Source {
/// A type this `Source` can be sliced into.
Expand Down