HTML API: Add custom text decoder.
Provide a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's `html_entity_decode()` function:

 - It isn't aware of 720 of the possible named character references in
   HTML, leaving many out that should be translated.

 - It isn't able to decode character references in data segments where
   the final semicolon is missing, or when there are ambiguous
   characters after the reference name but before the semicolon.
   The rules here are intricate; refer to the HTML5 specification for
   the details.
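To make the second point concrete, here is a minimal sketch (not taken from this commit) that calls `html_entity_decode()` on a reference with and without its final semicolon. The variable names are illustrative only. Per the HTML5 specification, a browser would decode `AT&ampT` to `AT&T` in text content, yet leave it untouched inside an attribute value because an alphanumeric character follows the reference name.

```php
<?php
// Flags chosen to get PHP's closest approximation of HTML5 decoding.
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5;

// With the final semicolon, PHP decodes the reference.
$with_semicolon = html_entity_decode( 'AT&amp;T', $flags, 'UTF-8' );

// Without it, PHP returns the input unchanged, regardless of context.
$without_semicolon = html_entity_decode( 'AT&ampT', $flags, 'UTF-8' );

var_dump( $with_semicolon );    // string(4) "AT&T"
var_dump( $without_semicolon ); // string(7) "AT&ampT"
```

Because `html_entity_decode()` has no notion of whether the input came from an attribute or from text content, it cannot apply the context-dependent rule at all.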

This decoder also provides some conveniences, such as making a
single-pass, interruptible decode operation possible. That opens up
a number of opportunities to optimize operations such as detecting
whether a decoded value starts with a given prefix or contains a
given substring.
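As a rough illustration of why an interruptible decode helps with prefix checks, the sketch below decodes lazily and stops as soon as the answer is known. It is not the implementation in this commit: `sketch_decode_chunks()` and `sketch_decoded_starts_with()` are hypothetical names, and the lookup table is assumed to cover only four semicolon-terminated references.

```php
<?php
/**
 * Hypothetical helper: yields decoded chunks one at a time, in a single pass.
 * Only knows four named references, and only with the trailing semicolon.
 */
function sketch_decode_chunks( string $text ): Generator {
	$names = array( 'amp;' => '&', 'lt;' => '<', 'gt;' => '>', 'quot;' => '"' );
	$at    = 0;
	$end   = strlen( $text );

	while ( $at < $end ) {
		$amp = strpos( $text, '&', $at );
		if ( false === $amp ) {
			yield substr( $text, $at ); // No more references: emit the rest.
			return;
		}
		if ( $amp > $at ) {
			yield substr( $text, $at, $amp - $at ); // Plain text before the "&".
		}

		// Default: an unrecognized "&" is passed through as a literal.
		$decoded = '&';
		$length  = 1;
		foreach ( $names as $name => $replacement ) {
			if ( substr( $text, $amp + 1, strlen( $name ) ) === $name ) {
				$decoded = $replacement;
				$length  = 1 + strlen( $name );
				break;
			}
		}

		yield $decoded;
		$at = $amp + $length;
	}
}

/** Hypothetical helper: does the decoded form of $encoded start with $prefix? */
function sketch_decoded_starts_with( string $encoded, string $prefix ): bool {
	$seen = '';
	foreach ( sketch_decode_chunks( $encoded ) as $chunk ) {
		$seen .= $chunk;
		$n     = min( strlen( $seen ), strlen( $prefix ) );
		if ( 0 !== strncmp( $seen, $prefix, $n ) ) {
			return false; // Mismatch found: stop decoding early.
		}
		if ( strlen( $seen ) >= strlen( $prefix ) ) {
			return true; // Whole prefix matched: the rest is never decoded.
		}
	}
	return strlen( $seen ) >= strlen( $prefix );
}

var_dump( sketch_decoded_starts_with( '&lt;script&gt;alert(1)', '<script' ) ); // bool(true)
var_dump( sketch_decoded_starts_with( 'java&amp;script', 'javascript' ) );     // bool(false)
```

The generator is what makes the operation interruptible: once the caller returns, the remainder of the input is never scanned or decoded.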
dmsnell committed May 15, 2024
1 parent 6f7cd05 commit 3ff78cc
Showing 4 changed files with 465 additions and 46 deletions.