HTML API: Make non-body fragment creation methods private.
The current implementation of `create_fragment` (and the underlying `create_fragment_at_current_node`) allows passing in a context that might result in a tree that cannot be represented by HTML. For example, a user might use `<p>` as context, and attempt to create a fragment that also consists of a paragraph element, `<p>like this`. This would result in a paragraph node nested inside another -- something that can never result from parsing HTML.

To prevent this, this changeset makes `create_fragment_at_current_node` private and limits `create_fragment` to only `<body>` as context, while a comprehensive solution to allow other contexts is being worked on.

Follow-up to [59444], [59467].
Props jonsurrell, dmsnell, bernhard-reiter.
Fixes #62584.

git-svn-id: https://develop.svn.wordpress.org/trunk@59469 602fd350-edb4-49c9-b593-d223f7449a82
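
For callers, the practical effect of the restriction described above is that `create_fragment()` returns `null` for any context other than `<body>`, so the return value needs checking. The following is a minimal sketch, assuming WordPress is loaded; `example_create_body_fragment()` is a hypothetical helper written for illustration and is not part of WordPress or of this changeset.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): requests a fragment processor
 * and passes through the null that `create_fragment()` returns when the
 * context or encoding is unsupported.
 *
 * @param string $html    Input HTML fragment.
 * @param string $context Context element; only `<body>` is supported after this changeset.
 * @return WP_HTML_Processor|null Processor on success, otherwise null.
 */
function example_create_body_fragment( string $html, string $context = '<body>' ) {
	// Assumption: WordPress is loaded. Without the HTML API there is nothing to create.
	if ( ! class_exists( 'WP_HTML_Processor' ) ) {
		return null;
	}

	// With this changeset applied, a non-`<body>` context yields null here.
	return WP_HTML_Processor::create_fragment( $html, $context );
}

$processor = example_create_body_fragment( '<p>like this' );

if ( null === $processor ) {
	// Unsupported context or encoding, HTML API unavailable, or creation failed.
	echo "No processor created.\n";
} else {
	// Walk the tags found in the fragment.
	while ( $processor->next_tag() ) {
		echo $processor->get_tag(), "\n";
	}
}
```

Passing a table context such as `'<table><tbody><tr>'` as the second argument goes through the same `null` branch, which is the case the removed docblock example used to cover.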
ockham committed Nov 28, 2024
1 parent 4b34369 commit e99d839
Showing 3 changed files with 18 additions and 264 deletions.
38 changes: 9 additions & 29 deletions src/wp-includes/html-api/class-wp-html-processor.php
@@ -279,44 +279,24 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor {
 	 * form is provided because a context element may have attributes that
 	 * impact the parse, such as with a SCRIPT tag and its `type` attribute.
 	 *
-	 * Example:
-	 *
-	 *     // Usually, snippets of HTML ought to be processed in the default `<body>` context:
-	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Hi</p>' );
-	 *
-	 *     // Some fragments should be processed in the correct context like this SVG:
-	 *     $processor = WP_HTML_Processor::create_fragment( '<rect width="10" height="10" />', '<svg>' );
-	 *
-	 *     // This fragment with TD tags should be processed in a TR context:
-	 *     $processor = WP_HTML_Processor::create_fragment(
-	 *         '<td>1<td>2<td>3',
-	 *         '<table><tbody><tr>'
-	 *     );
-	 *
-	 * In order to create a fragment processor at the correct location, the
-	 * provided fragment will be processed as part of a full HTML document.
-	 * The processor will search for the last opener tag in the document and
-	 * create a fragment processor at that location. The document will be
-	 * forced into "no-quirks" mode by including the HTML5 doctype.
-	 *
-	 * For advanced usage and precise control over the context element, use
-	 * `WP_HTML_Processor::create_full_processor()` and
-	 * `WP_HTML_Processor::create_fragment_at_current_node()`.
+	 * ## Current HTML Support
 	 *
-	 * UTF-8 is the only allowed encoding. If working with a document that
-	 * isn't UTF-8, first convert the document to UTF-8, then pass in the
-	 * converted HTML.
+	 * - The only supported context is `<body>`, which is the default value.
+	 * - The only supported document encoding is `UTF-8`, which is the default value.
 	 *
 	 * @since 6.4.0
 	 * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
-	 * @since 6.8.0 Can create fragments with any context element.
 	 *
 	 * @param string $html Input HTML fragment to process.
-	 * @param string $context Context element for the fragment. Defaults to `<body>`.
+	 * @param string $context Context element for the fragment, must be default of `<body>`.
 	 * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
 	 * @return static|null The created processor if successful, otherwise null.
 	 */
 	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
+		if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
+			return null;
+		}
+
 		$context_processor = static::create_full_parser( "<!DOCTYPE html>{$context}", $encoding );
 		if ( null === $context_processor ) {
 			return null;
@@ -475,7 +455,7 @@ function ( WP_HTML_Token $token ): void {
 	 * @param string $html Input HTML fragment to process.
 	 * @return static|null The created processor if successful, otherwise null.
 	 */
-	public function create_fragment_at_current_node( string $html ) {
+	private function create_fragment_at_current_node( string $html ) {
 		if ( $this->get_token_type() !== '#tag' || $this->is_tag_closer() ) {
 			_doing_it_wrong(
 				__METHOD__,
178 changes: 0 additions & 178 deletions tests/phpunit/tests/html-api/wpHtmlProcessorFragmentParsing.php

This file was deleted.

66 changes: 9 additions & 57 deletions tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php
@@ -138,6 +138,10 @@ public function data_external_html5lib_tests() {
 	 * @return bool True if the test case should be skipped. False otherwise.
 	 */
 	private static function should_skip_test( ?string $test_context_element, string $test_name ): bool {
+		if ( null !== $test_context_element && 'body' !== $test_context_element ) {
+			return true;
+		}
+
 		if ( array_key_exists( $test_name, self::SKIP_TESTS ) ) {
 			return true;
 		}
@@ -153,63 +157,11 @@ private static function should_skip_test( ?string $test_context_element, string
 	 * @return string|null Tree structure of parsed HTML, if supported, else null.
 	 */
 	private static function build_tree_representation( ?string $fragment_context, string $html ) {
-		if ( $fragment_context ) {
-			/*
-			 * If the string of characters starts with "svg ", the context
-			 * element is in the SVG namespace and the substring after
-			 * "svg " is the local name. If the string of characters starts
-			 * with "math ", the context element is in the MathML namespace
-			 * and the substring after "math " is the local name.
-			 * Otherwise, the context element is in the HTML namespace and
-			 * the string is the local name.
-			 */
-			if ( str_starts_with( $fragment_context, 'svg ' ) ) {
-				$tag_name = substr( $fragment_context, 4 );
-				if ( 'svg' === $tag_name ) {
-					$fragment_context_html = '<svg>';
-				} else {
-					$fragment_context_html = "<svg><{$tag_name}>";
-				}
-			} elseif ( str_starts_with( $fragment_context, 'math ' ) ) {
-				$tag_name = substr( $fragment_context, 5 );
-				if ( 'math' === $tag_name ) {
-					$fragment_context_html = '<math>';
-				} else {
-					$fragment_context_html = "<math><{$tag_name}>";
-				}
-			} else {
-				// Tags that only appear in tables need a special case.
-				if ( in_array(
-					$fragment_context,
-					array(
-						'caption',
-						'col',
-						'colgroup',
-						'tbody',
-						'td',
-						'tfoot',
-						'th',
-						'thead',
-						'tr',
-					),
-					true
-				) ) {
-					$fragment_context_html = "<table><{$fragment_context}>";
-				} else {
-					$fragment_context_html = "<{$fragment_context}>";
-				}
-			}
-
-			$processor = WP_HTML_Processor::create_fragment( $html, $fragment_context_html );
-
-			if ( null === $processor ) {
-				throw new WP_HTML_Unsupported_Exception( "Could not create a parser with the given fragment context: {$fragment_context}.", '', 0, '', array(), array() );
-			}
-		} else {
-			$processor = WP_HTML_Processor::create_full_parser( $html );
-			if ( null === $processor ) {
-				throw new Exception( 'Could not create a full parser.' );
-			}
+		$processor = $fragment_context
+			? WP_HTML_Processor::create_fragment( $html, "<{$fragment_context}>" )
+			: WP_HTML_Processor::create_full_parser( $html );
+		if ( null === $processor ) {
+			throw new WP_HTML_Unsupported_Exception( "Could not create a parser with the given fragment context: {$fragment_context}.", '', 0, '', array(), array() );
 		}
 
 		$output = '';