Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate switching to Masterminds HTML5-PHP parser as opposed to built-in PHP DOM extension #3293

Closed
westonruter opened this issue Sep 18, 2019 · 1 comment
Labels
Enhancement New feature or improvement of an existing one Groomed Needs More Info Follow-up required in order to be actionable. P2 Low priority WS:Core Work stream for Plugin core

Comments

@westonruter
Copy link
Member

The AMP plugin uses PHP's built-in DOM extension which are powered by libxml. There are various idiosyncrasies with libxml which force us to do various workarounds in AMP_DOM_Utils, namely:

  1. Convert AMP-bind bracketed attribute syntax to data-amp-bind-* syntax using regular expression replacements, since PHP DOM doesn't understand the brackets. (The replacements should still be done, but preferably not via regex!)
  2. Force all self-closing tags to have closing tags since DOMDocument isn't fully aware of them.
  3. Replace noscript elements with placeholders since libxml<2.8 can parse them incorrectly.
  4. Add a pre-HTML5-style declaration of the encoding since libxml doesn't always recognize HTML5's meta charset.

We may be able to remove these workarounds and improve our parsing routines by switching to the Masterminds HTML5-PHP parser. This was actually chosen for the Drupal AMP module's underlying AMP PHP Library (which is no longer maintained, cf. #2315). This HTML5 parser did not exist at the time the AMP plugin was first created, do it made sense to use PHP's DOM extension. But now a better choice may be the Masterminds library. This would also reduce a dependency.

Something that makes me wary, however, is whether the Masterminds library would be slower than the PHP extension. This is something to be investigated.

@westonruter westonruter added the Enhancement New feature or improvement of an existing one label Sep 18, 2019
@swissspidy
Copy link
Collaborator

Something that makes me wary, however, is whether the Masterminds library would be slower than the PHP extension. This is something to be investigated.

A bit surprised that this seemingly popular library doesn't address this by having a small comparison table somewhere or at least a paragraph about it.

@kmyram kmyram added this to the v2.0 milestone Feb 18, 2020
@kmyram kmyram added the P2 Low priority label Feb 18, 2020
@amedina amedina removed this from the v2.0 milestone Mar 31, 2020
@kmyram kmyram added the WS:Core Work stream for Plugin core label Aug 5, 2020
@kmyram kmyram added Groomed Needs More Info Follow-up required in order to be actionable. labels Nov 24, 2020
@github-project-automation github-project-automation bot moved this to Backlog in Ongoing Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or improvement of an existing one Groomed Needs More Info Follow-up required in order to be actionable. P2 Low priority WS:Core Work stream for Plugin core
Projects
Archived in project
Development

No branches or pull requests

4 participants