HTML API: Add custom text decoder.
Provide a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's `html_entity_decode()` function:

 - It doesn't recognize 720 of the named character references defined
   by HTML, so those references are left undecoded.

 - It can't decode character references in data segments where the
   final semicolon is missing, or where ambiguous characters follow the
   reference name before the semicolon. The rules for these cases are
   intricate; the HTML5 specification defines them in full.
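
To make the missing-semicolon case concrete, here is a small sketch in
Python rather than PHP. It relies on Python's standard `html.unescape()`,
which is documented to follow the HTML5 rules for character references.
It is not the decoder added by this commit, and it does not model the
attribute-specific rules; the sample strings are chosen for illustration
only.

```python
import html

# Illustrative inputs (assumptions, not taken from the commit's tests).
examples = ["&notin;", "&not", "&notit;", "&amp"]

# html.unescape() applies the HTML5 character reference rules.
decoded = {raw: html.unescape(raw) for raw in examples}

for raw, text in decoded.items():
    print(f"{raw!r} -> {text!r}")

# "&notin;" is a complete reference and decodes to U+2209.
# "&not" is a legacy name that may omit its semicolon (U+00AC).
# "&notit;" has no exact match, so the longest known prefix "not"
# is decoded and the trailing "it;" stays as literal text.
```

A decoder that insists on the trailing semicolon would leave the second,
third, and fourth inputs untouched.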

This decoder will also provide some conveniences, such as making a
single-pass, interruptible decode operation possible. That opens up
opportunities to optimize detecting and decoding things like value
prefixes, and checking whether a value contains a given substring.
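
As a rough illustration of why an interruptible decode helps with prefix
checks, the Python sketch below decodes a value piece by piece and stops
as soon as the answer is known. The names `decoded_chunks` and
`decoded_starts_with` are hypothetical, the reference-matching pattern is
a simplification, and none of this reflects the PHP API introduced by
the commit.

```python
import html
import re

# Simplified shape of a character reference: decimal, hexadecimal, or
# named, with an optional trailing semicolon. (Assumption: good enough
# for a sketch; the real tokenizer rules are more detailed.)
_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);?")


def decoded_chunks(raw):
    """Yield the decoded text lazily, left to right, in one pass."""
    pos = 0
    for match in _REF.finditer(raw):
        if match.start() > pos:
            yield raw[pos:match.start()]  # literal text before the reference
        yield html.unescape(match.group())  # decode just this reference
        pos = match.end()
    if pos < len(raw):
        yield raw[pos:]  # trailing literal text


def decoded_starts_with(raw, prefix):
    """Check a decoded prefix without decoding the whole value."""
    seen = ""
    for chunk in decoded_chunks(raw):
        seen += chunk
        if len(seen) >= len(prefix):
            return seen.startswith(prefix)  # enough text: stop decoding
        if not prefix.startswith(seen):
            return False  # already diverged: stop decoding
    return seen.startswith(prefix)


print(decoded_starts_with("java&#115;cript:alert(1)", "javascript:"))
print(decoded_starts_with("https://example.com/?a=&amp;b", "javascript:"))
```

In the second call the first literal chunk already settles the answer,
so the `&amp;` later in the value is never decoded.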
dmsnell committed Apr 29, 2024
1 parent 2074392 commit ce888a0
Showing 6 changed files with 2,324 additions and 45 deletions.