Add HTML schema for separator block #914

nylen · 2017-05-26T16:58:22Z

This PR is an initial step towards handling editor parsing as described in #391. It adds the concept of a "block schema" with which a block can define the markup shape(s) it understands how to handle.

This functionality is eventually intended to replace most of the attributes and save callbacks for static blocks, that is, blocks that do not need to fetch data from external sources other than post_content. As described in #391, these blocks should store their data inside semantically meaningful markup that does not change from render to render.

Goals of this PR

Here is what I hope to achieve in this initial PR:

The core/separator block defines a very simple schema (a single <hr /> element). This schema resembles the HTML markup that it matches.
If an HTML tag in between block delimiters matches a schema defined by a block, then this tag will be "upgraded" to the corresponding block type.
If a block defines a schema, then it will be used to validate the HTML content that appears inside block delimiters for that block type. If the schema does not validate against the HTML content, then the block will be "downgraded" to the core/freeform block.
Improve the build process of the phs library so that the packaged build contains a minimum of webpack-generated code.
Improve the way JSX element creation is handled for Schema tags - we shouldn't have to import the createSchemaElement variable and then tell eslint to ignore it.
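
The upgrade and downgrade rules above can be sketched roughly as follows. This is only an illustration under assumed data shapes: the node objects ({ name, children }), the schemas map, and the function names matchesSchema, upgradeLooseTag and validateBlock are all hypothetical and are not the actual phs or parser APIs.

```javascript
// Hypothetical shapes: a parsed HTML node is { name, children }, and a
// schema uses the same shape to describe the one element it accepts.
const schemas = {
	'core/separator': { name: 'hr', children: [] },
};

// A node matches a schema when tag names agree and children match pairwise.
function matchesSchema( schema, node ) {
	return (
		schema.name === node.name &&
		schema.children.length === node.children.length &&
		schema.children.every( ( child, i ) => matchesSchema( child, node.children[ i ] ) )
	);
}

// "Upgrade": a tag found in between block delimiters becomes a block of the
// first type whose schema matches; otherwise it stays freeform.
function upgradeLooseTag( node ) {
	const blockType = Object.keys( schemas ).find(
		( type ) => matchesSchema( schemas[ type ], node )
	);
	return { blockType: blockType || 'core/freeform', nodes: [ node ] };
}

// "Downgrade": content inside delimiters that fails its block's schema
// falls back to core/freeform. Blocks without a schema are left alone.
function validateBlock( blockType, nodes ) {
	const schema = schemas[ blockType ];
	const valid = ! schema || ( nodes.length === 1 && matchesSchema( schema, nodes[ 0 ] ) );
	return { blockType: valid ? blockType : 'core/freeform', nodes };
}
```

In this sketch a bare hr node passed to upgradeLooseTag comes back as core/separator, while a core/separator block wrapping anything else is returned as core/freeform.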

How it works

This PR introduces two new libraries. First, babel-plugin-transform-jsx-flexible (https://github.com/nylen/babel-plugin-transform-jsx-flexible), inspired by an article about alternative uses of JSX syntax.

This is a drop-in replacement for babel-plugin-transform-react-jsx, with the additional feature that multiple JSX handler functions can be used in the same file.

Second, phs (Parameterized HTML Schemas, https://github.com/nylen/phs).

This is a library that can be used to describe and validate chunks of HTML. It is inspired by RELAX NG, but with a few very different goals [accept placeholders for parameters; be more concise than RELAX NG; and resemble the HTML it describes].

Combining these two libraries, this PR uses a new JSX handler function (separate from React) to create a very simple schema that validates a single <hr /> tag. The schema is comprised of a single <hr /> element. As described in the phs readme, this is a shorthand syntax for <Element name="hr" /> with the benefit of resembling the HTML it describes.
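
To illustrate the shorthand, here is a minimal sketch of what a separate JSX handler function could look like once the JSX has been compiled to plain calls. createSchemaElement is the name mentioned in the goals above, but its signature, the Element function, and the returned object shape are assumptions for this example and may not match phs.

```javascript
// Hypothetical schema node constructor; the object shape is an assumption.
function Element( props ) {
	return { type: 'Element', name: props.name, children: props.children };
}

// Hypothetical JSX handler for schema tags. With a JSX transform pointed at
// this function, <hr /> would compile to createSchemaElement( 'hr', null ).
function createSchemaElement( tag, props, ...children ) {
	if ( typeof tag === 'string' ) {
		// Lowercase tag shorthand: <hr /> is treated as <Element name="hr" />.
		return Element( { ...props, name: tag, children } );
	}
	// Component-style tag such as <Element name="hr" />.
	return tag( { ...props, children } );
}

// Both forms produce the same schema node in this sketch.
const shorthand = createSchemaElement( 'hr', null );
const longhand = createSchemaElement( Element, { name: 'hr' } );
```

Because the handler is an ordinary function, it never touches React, which is why it can coexist with React's own JSX handler in the same file.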

Finally, this PR uses the newly added schema functionality to perform the tasks described above in "Goals of this PR".

I expect to add a lot more features to these schemas in following PRs, but as noted in the phs readme, the schema library currently contains the bare minimum that is at all useful.

(JSX is not strictly necessary for the task of creating schemas; however, I have not yet created an alternative, and I think this is a very logical and intuitive way to express criteria for validating trees of HTML content, much like RELAX NG but simpler to use for common cases.)

Goals for future PRs

A declarative, schema-based approach for parsing and serialization where the schemas also contain placeholders for serialization parameters has the potential to solve a lot of problems at once. In particular:

Define where block parameters are expected to appear in the markup, including their types and allowed values.
Store information about which branch of the schema was matched, and replace the save callback for blocks that are simple enough.
Using this information about how the schema was matched, preserve existing markup attributes like user-added CSS classes across deserialization and serialization.
Drop many of the block delimiters entirely. (Theoretically, we could now remove the block delimiters for the core/separator blocks, but this requires further discussion and is out of scope for this PR.)
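
As a rough sketch of how placeholders could drive both deserialization and serialization while preserving extra attributes, consider the following. Everything here is hypothetical: the param helper, the schema and node shapes, and the extractParams and serialize functions are invented for illustration and are not part of phs. Attribute values are not escaped in this sketch.

```javascript
// Hypothetical placeholder marker; not the phs API.
const param = ( name ) => ( { param: name } );

const imageSchema = {
	name: 'img',
	attributes: { src: param( 'url' ), alt: param( 'alt' ) },
};

// Deserialize: read parameters from where the schema says they live, and
// keep any attributes the schema does not mention (e.g. a user-added class).
function extractParams( schema, node ) {
	if ( schema.name !== node.name ) {
		return null;
	}
	const params = {};
	const extra = { ...node.attributes };
	for ( const [ attrName, placeholder ] of Object.entries( schema.attributes ) ) {
		if ( ! ( attrName in node.attributes ) ) {
			return null;
		}
		params[ placeholder.param ] = node.attributes[ attrName ];
		delete extra[ attrName ];
	}
	return { params, extra };
}

// Serialize: fill the placeholders back in, then append preserved attributes.
function serialize( schema, { params, extra } ) {
	const attrs = Object.entries( schema.attributes )
		.map( ( [ attrName, placeholder ] ) => [ attrName, params[ placeholder.param ] ] )
		.concat( Object.entries( extra ) )
		.map( ( [ key, value ] ) => ` ${ key }="${ value }"` )
		.join( '' );
	return `<${ schema.name }${ attrs } />`;
}
```

With a schema like this, a simple block would not need its own save callback: the same declaration that locates the parameters also says how to write them back.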

Caveats / Questions

This PR uses the TinyMCE-based parser because it returns a full tree of HTML tags along with the block delimiters. This is important for the functionality in this PR and for other reasons, such as detecting and recovering from invalid markup, and planning for future features like shortcode support. See also discussion at #844 (comment).

I would like to remove the TinyMCE parser, but first we need to enhance our grammar-based parser to return a full tree of HTML content like wp-post-grammar. For now, using the existing parser seems the simplest way forward.

As noted in several TODO comments in the new code introduced here, there will be some algorithmic complexity related to matching schemas against parsed HTML due to the large number of combinations. The phs schema library will also likely become pretty complicated and I will need some help developing it. For now, though, it is fairly simple and has excellent test coverage.

I am not sure of the best way to accomplish the last two tasks in this PR (better package phs for consumption by another webpack build; improve the way JSX element creation is handled for Schema tags). Any tips there?

- Enable multiple JSX tags per file - see: https://github.com/nylen/babel-plugin-transform-jsx-flexible
- Add the `phs` library - https://github.com/nylen/phs
- Make the TinyMCE parser try content between blocks against known schemas, and initialize the corresponding block if possible

nylen · 2017-05-26T17:49:01Z

Commit b732903 (add failing test fixtures with <hr> tags in between blocks) happened before commit c1455a2 (upgrade content in between blocks using schemas). I am not sure why GitHub is switching them around.

youknowriad · 2017-05-27T09:17:47Z

I've been thinking about this too. And I think we can't decide whether this is a good approach (simple enough to understand for block authors and addresses all our needs) unless we implement the complex use-cases. Do you have an idea, what the schema would look like for the image for example?

I'm personally afraid the JSX here will add more confusion than it solves, it's not easy to figure out that these are not React Elements when in the same file we use JSX for React Elements. What if someone, somewhere else in the code creates a Schema component?

I've been thinking we could write something like this for the image schema (example):

const schema = s( 'figure', {}, [
	s( 'img', { placeholders: { src: attr( 'src' ), alt: attr( 'alt' ) } } ),
	s( 'figcaption', { placeholders: { content: children() } } ),
] );

This uses hyperscript notation to avoid confusion with JSX. It's still concise enough IMO, and I think it would let us use simpler hpq handlers (dropping the query handler).
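
For illustration only, helpers like s, attr and children in the example above could be plain functions that build a schema tree. The object shapes and the listPlaceholders function below are assumptions made for this sketch; they are not the hpq API and not something this PR implements.

```javascript
// Hypothetical helpers: each returns a plain description object.
const attr = ( name ) => ( { source: 'attribute', name } );
const children = () => ( { source: 'children' } );
const s = ( name, options = {}, childSchemas = [] ) => ( {
	name,
	placeholders: options.placeholders || {},
	children: childSchemas,
} );

const schema = s( 'figure', {}, [
	s( 'img', { placeholders: { src: attr( 'src' ), alt: attr( 'alt' ) } } ),
	s( 'figcaption', { placeholders: { content: children() } } ),
] );

// Walk the tree depth-first and list every placeholder the schema declares.
function listPlaceholders( node ) {
	return Object.keys( node.placeholders ).concat(
		...node.children.map( ( child ) => listPlaceholders( child ) )
	);
}
```

Since the result is plain data, a matcher could walk it against parsed HTML without any JSX transform being involved.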

nylen · 2017-05-27T21:55:50Z

> Do you have an idea, what the schema would look like for the image for example?

I think it would look a lot like #391 (comment). Everything Andrew described there is possible, and I think it will work, but the implementation will get pretty complicated. So, before we spend a lot of time doing that, how can we decide whether this is something we want to do?

Worth noting that I looked for an existing solution for this, but didn't find anything that could handle what we need, particularly the parameter placeholders part.

> I'm personally afraid the JSX here will add more confusion than it solves, it's not easy to figure out that these are not React Elements when in the same file we use JSX for React Elements.

I don't think this matters as long as we document what this is and how it works, provide good error handling, etc. I still think JSX is the way to go here, because we're going to end up with a lot of nesting, and for this task, tags are much clearer than lots of } ).

> What if someone, somewhere else in the code creates a Schema component?

We can use a more specific name, for example <ParameterizedHTMLSchema>, or add a namespace, like <phs.Schema>. The latter is probably a better solution, though it will require some changes to the Babel plugin (nylen/babel-plugin-transform-jsx-flexible#2).

That said, we can certainly try a non-JSX syntax without too much trouble. It would need to be a bit more sophisticated than the example above, because there will be a lot more structures than simply s( 'elementname' ).

aduth · 2017-06-01T14:54:59Z

I want to revisit this after I've had some time to digest and weigh the direction. In the meantime I've left some commentary on my internal state of conflict at #886 (comment) where similar suggestions for server-side schemas have surfaced.

aduth · 2017-06-01T18:09:48Z

> The schema is comprised of a single <hr /> element. As described in the phs readme, this is a shorthand syntax for <Element name="hr" /> with the benefit of resembling the HTML it describes.

How do you envision the shorthand looking when matching attributes of a given element? Like this?

<Schema>
	<img>
		<Attribute name="src" as="url" />
	</img>
</Schema>

nylen · 2017-06-01T22:48:59Z

> How do you envision the shorthand looking when matching attributes of a given element? Like this?

Like that, or possibly:

<Schema>
	<img src={ <Param name="url" /> } />
</Schema>

nylen · 2017-06-01T23:01:26Z

Though maybe what this says (and you touched on this point in #391 (comment)) is that the shorthand isn't such a good idea after all.

nylen · 2017-08-04T18:51:09Z

This is out of date, and subsequent PRs like #1929 and #2210 have achieved most of what I wanted to do here in a different way.

nylen added 5 commits May 25, 2017 23:21

Add test fixtures with <hr> tags in between blocks

b732903

These should be upgraded to `` blocks.

Add a schema to the core/separator block

c1455a2

Add test fixtures for core/separator blocks containing invalid markup

5c7c67e

Validate content inside block delimiters against block schemas

d203323

nylen added the [Feature] Parsing label May 26, 2017

nylen requested review from youknowriad, mtias and aduth May 26, 2017 16:58

nylen mentioned this pull request May 26, 2017

Deleting a closing block comment in "Text" mode #844

Closed

mtias mentioned this pull request May 29, 2017

Attempting to load Redux devtools freezes the page #919

Closed

nylen mentioned this pull request May 31, 2017

Parser: Revert to PEG parser by default #947

Merged

aduth mentioned this pull request Jun 1, 2017

Plugin: PHP wrapper to add new blocks #886

Closed

aduth mentioned this pull request Jun 1, 2017

Block parsing and serialization/deserialization #391

Closed

aduth mentioned this pull request Jun 12, 2017

Gallery: Add simple columns slider #1135

Merged

nylen closed this Aug 4, 2017

nylen deleted the add/html-schemas branch August 4, 2017 18:51
