0.7.8
y21 committed Jan 29, 2024
1 parent d6f0792 commit 604f644
Showing 4 changed files with 8 additions and 22 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
Changes annotated with `` are breaking.

+# 0.7.8
+- Fixes a build error if compiled with the `simd` feature flag. See [y21/tl#60]
+- Fixes MDN-related doc comments ([y21/tl#51])
+
# 0.7.7
- Fixes a bug in the query selector parser that made it fail to parse values containing `:`. See [y21/tl#46](https://github.com/y21/tl/issues/46) and [y21/tl#47] for more details.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "tl"
-version = "0.7.7"
+version = "0.7.8"
authors = ["y21"]
edition = "2021"
license = "MIT"
22 changes: 2 additions & 20 deletions README.md
@@ -4,7 +4,6 @@ tl is a fast HTML parser written in pure Rust. <br />
- [Usage](#usage)
- [Examples](#examples)
- [SIMD-accelerated parsing](#simd-accelerated-parsing)
-- [Benchmarks](#benchmarks)

This crate (currently) does *not* strictly follow the full specification of the HTML standard, however this usually is not a problem for most use cases. This crate generally attempts to support most "sane" HTML. Not being limited by a specification allows for more optimization opportunities.
If you need a parser that can (very quickly) parse the typical HTML document and you need a simple API to work with the DOM, give this crate a try.
@@ -15,10 +14,10 @@ If you need a parser that closely follows the standard, consider using [html5eve
Add `tl` to your dependencies.
```toml
[dependencies]
-tl = "0.7.7"
+tl = "0.7.8"
# or, with explicit SIMD support
# (requires a nightly compiler!)
-tl = { version = "0.7.7", features = ["simd"] }
+tl = { version = "0.7.8", features = ["simd"] }
```

The main function is `tl::parse()`. It accepts an HTML source code string and parses it. It is important to note that tl currently silently ignores tags that are invalid, sort of like browsers do. Sometimes, this means that large chunks of the HTML document do not appear in the resulting tree.
@@ -98,20 +97,3 @@ assert_eq!(attributes.get("href").flatten(), Some(&"http://localhost/about".into
This crate has utility functions used by the parser which make use of SIMD (e.g. finding a specific byte by looking at the next 16 bytes at once, instead of going through the string one by one). These are disabled by default and must be enabled explicitly by passing the `simd` feature flag due to the unstable feature `portable_simd`. This requires a **nightly** compiler!

If the `simd` feature is not enabled, it will fall back to stable alternatives that don't explicitly use SIMD intrinsics, but are still decently well optimized, using techniques such as manual loop unrolling to remove boundary checks and other branches by a factor of 16, which also helps LLVM further optimize the code and potentially generate SIMD instructions by itself.
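
The non-SIMD fallback described in the README paragraph above can be illustrated with a small standalone sketch. This is not tl's actual implementation: `find_byte_unrolled` is a hypothetical helper written only for illustration, and it assumes that walking the input in fixed 16-byte chunks is a reasonable stand-in for the manual unrolling the README mentions. It uses only the standard library and builds on a stable compiler.

```rust
/// Hypothetical helper (not taken from tl's source): returns the index of the
/// first occurrence of `needle` in `haystack`, scanning 16 bytes per iteration.
fn find_byte_unrolled(haystack: &[u8], needle: u8) -> Option<usize> {
    const LANES: usize = 16;
    let mut offset = 0;
    let mut chunks = haystack.chunks_exact(LANES);

    for chunk in &mut chunks {
        // Every chunk is exactly LANES bytes long, so this inner loop has a
        // fixed trip count that the compiler is free to unroll or vectorize.
        let mut mask: u16 = 0;
        for (i, &byte) in chunk.iter().enumerate() {
            // Record each comparison as one bit instead of branching per byte.
            mask |= ((byte == needle) as u16) << i;
        }
        if mask != 0 {
            // The lowest set bit marks the first match inside this chunk.
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += LANES;
    }

    // Fewer than LANES bytes remain: fall back to a plain scalar scan.
    chunks
        .remainder()
        .iter()
        .position(|&byte| byte == needle)
        .map(|i| offset + i)
}
```

Calling `find_byte_unrolled(b"<p>", b'>')` goes straight to the scalar tail because the input is shorter than one chunk. Whether the chunked loop actually compiles to SIMD instructions depends on the target and optimization level, so it is worth checking the generated assembly before relying on it.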
-
-## Benchmarks
-Results for parsing a ~320KB [HTML document](https://github.com/y21/rust-html-parser-benchmark/blob/c45c89871a34396d6818c73c51275241dee8ad34/data/wikipedia.html). Benchmarked using [criterion](https://crates.io/crates/criterion).
-
-**Note:** Some HTML parsers listed closely follow the specification while others don't, which greatly impacts performance as the specification limits what can and can't be done.
-Comparing the performance of a parser that doesn't follow the specification to one that does isn't fair and doesn't yield meaningful results, but it can be interesting to see what the theoretical difference is.
-
-```notrust
-            time        thrpt          follows spec
-tl¹         629.78 us   496.65 MiB/s   ❌
-lol_html    788.91 us   396.47 MiB/s   ✅
-htmlstream  2.2786 ms   137.48 MiB/s   ❌
-html5ever   6.2233 ms   50.276 MiB/s   ✅
-```
-¹ - `simd` feature enabled
-
-[Source](https://github.com/y21/rust-html-parser-benchmark/tree/c45c89871a34396d6818c73c51275241dee8ad34)
2 changes: 1 addition & 1 deletion src/queryselector/selector.rs
@@ -19,7 +19,7 @@ pub enum Selector<'a> {
Descendant(Box<Selector<'a>>, Box<Selector<'a>>),
/// Parent combinator: .foo > .bar
Parent(Box<Selector<'a>>, Box<Selector<'a>>),
-/// Attribute: [foo]
+/// Attribute: \[foo\]
Attribute(&'a [u8]),
/// Attribute with value: [foo=bar]
AttributeValue(&'a [u8], &'a [u8]),
