Editable: Treat value as HTML string, eliminate children source #2750

aduth · 2017-09-20T13:07:39Z

Previously: #421
Related: #2463, #2418

The children source matcher was introduced as a means of extracting and representing the value of rich text in content. It was not expected that a developer should need to interact with it, thus we were comfortable with it being an "opaque" object shape: in the original implementation, an array of React elements. Primarily, this helped avoid the need for dangerouslySetInnerHTML when implementing the save representation of a block, allowing the implementer to instead use the attribute value directly.

Example:

// Before:
<div dangerouslySetInnerHTML={ { __html: attributes.content } } />

// After
<div>{ attributes.content }</div>

There were a few downsides to this approach however:

When it was the case that the value would need to be manipulated, it was difficult to work with ([1], [2], [3])
It required us to convert to an element structure in both parsing and Editable content getters. In the latter case, the overhead in converting the DOM structure deters us from being more aggressive/immediate with onChange callbacks.
The presence of a separate children source increases knowledge necessary to become productive with blocks, as block implementers now need to understand difference between: text, html, children, node

Proposal:

To address the original need to avoid dangerouslySetInnerHTML, we could instead create a separate component which abstracts this implementation detail. Example:

<Editable.Value>{ attributes.content }</Editable.Value>

In this case, the underlying implementation may be to merely assign dangerouslySetInnerHTML, but it is not surfaced to the block implementer, thereby avoiding some verbosity and worries around its application.

This could then allow us to eliminate or alter the value of rich text to that of our choosing. In #2463 (more accurately the update/children-value branch), I tried an array structure.

After further thought, if possible, it would be desirable to eliminate the separate children source altogether, if it is possible to represent the value in raw HTML (leverage html source instead). This has the added benefit of requiring no post-processing in the Editable change handlers, instead merely passing the value of this.editor.getBody().innerHTML (or this.editor.getContent( { format: 'raw' } )). Finally, as a string, it can be easily concatenated, but may be difficult to work with if further manipulation of the DOM structure is desirable.

Open questions:

The default behavior for TinyMCE's getContent is to run a serializer over the body contents. We should consult with the Ephox team to determine what of this serialization behavior we might need, whether it can be avoided, or if there is a performance concern in frequently calling to the non-raw format content getter (particularly given that we can expect single TinyMCE instances not to be very large or complex documents)
How much are we opening the potential for self-XSS? The value of an HTML string should only ever be derived from either the saved post content, or TinyMCE, where we should like to assume that the content is already sanitized.
Where are we manipulating the value of an Editable currently, and in those cases would representing the value as a simple string be problematic? I might imagine that iterating over or extracting from the DOM structure would become more difficult, albeit not impossible (assigning into throwaway DOM wrapper, for instance).

The text was updated successfully, but these errors were encountered:

aduth · 2017-09-20T15:53:57Z

Related Slack discussion in #core-tinymce: https://wordpress.slack.com/archives/C0UCMQP0F/p1505913418000264

ellatrix · 2017-09-22T08:34:50Z

This has the added benefit of requiring no post-processing in the Editable change handlers, instead merely passing the value of this.editor.getBody().innerHTML (or this.editor.getContent( { format: 'raw' } )). Finally, as a string, it can be easily concatenated, but may be difficult to work with if further manipulation of the DOM structure is desirable.

We will still require some post processing, which is currently handled in the nodeToReact loop (cheap).

It required us to convert to an element structure in both parsing and Editable content getters. In the latter case, the overhead in converting the DOM structure deters us from being more aggressive/immediate with onChange callbacks.

I'm not sure I understand this concern. Could you elaborate? This is a simple loop that looks what each DOM node is and makes an object out of it, so it's a cheap call.

This could then allow us to eliminate or alter the value of rich text to that of our choosing. In #2463 (more accurately the update/children-value branch), I tried an array structure.

In this case (which I like much more than passing HTML), we'd retain the same logic, but instead of nodeToReact, we'll have something like nodeToArrayStructure.

<Editable.Value>{ attributes.content }</Editable.Value>

I like this idea in general. Would also be great for assigning keys so the block implementor does not need to concern themselves with that.

aduth · 2017-09-22T14:01:50Z

We will still require some post processing, which is currently handled in the nodeToReact loop (cheap).

Yep, we're on the same page now. This was discussed in the Slack conversation noted above, though not clearly referenced here.

I'm not sure I understand this concern. Could you elaborate? This is a simple loop that looks what each DOM node is and makes an object out of it, so it's a cheap call.

It's a concern as the document becomes more complex. In a single TinyMCE instance containing thousands of nodes, for example, this loop occurring on each keystroke could become noticeable for slower machines. In our case, I don't think it's as much of a problem, though of course the .innerHTML would still be obviously faster if the editor body were kept in a normalized/clean state at all times (it's not).

In this case (which I like much more than passing HTML), we'd retain the same logic, but instead of nodeToReact, we'll have something like nodeToArrayStructure.

A main desire here is to eliminate children, because I think it will cause confusion on how it differentiates from html (text, node, etc). I noted that this could make transforming the value more difficult, but (a) we've always wanted to discourage this and (b) there are still workarounds even if we do end up treating it as plain HTML. Do you specific reasons in mind for why the array structure would be a better fit?

ellatrix · 2017-09-22T14:15:38Z

Do you specific reasons in mind for why the array structure would be a better fit?

If we end of keeping a local or global state, it would also be easier for Editable to make use of it, mostly together with focus state (assuming we'll also have a focus/selection state).
I don't see how a focus/selection state would work with a content state in HTML...

aduth · 2017-09-22T18:25:48Z

If we end of keeping a local or global state, it would also be easier for Editable to make use of it

I think it could still be reasonable (or at least technically possible) for Editable to convert the string to its own internal representation of state. The more pertinent question to me is from the perspective of a plugin author implementing a block, should I be exposed to this separate content shape? Is it at all useful outside representing the value of Editable?

ellatrix · 2017-09-22T18:31:50Z

As a global state, I think so, yes. Things that come to mind are footnotes, (future) collaborative editing, maybe some more advanced things like spell checking and suggestions (not even unrealistic as @Yoast has been looking into that).

ellatrix · 2017-09-22T18:49:46Z

And if we are doing this, the block implementor will be familiar with the shape? :) It we're still making it agnostic that is...

youknowriad · 2017-09-22T18:56:27Z

leveraging hpq matchers in some advanced usecases is an "ok" tradeoff for me. It allows us to keep the simplicity of a string value, it fits well in most use-cases.

ellatrix · 2017-09-22T19:09:28Z

It allows us to keep the simplicity of a string value

Atm, we don't have a string value anywhere? As for hpq, if we have an array value for the rest, we could use this for the matchers/source as well, where the content is extracted form the arrays (maybe with some helper functions for easy use). With that, it would even all be unified under one structure. State (local and global), matchers/source and template agnostic framework.

youknowriad · 2017-09-22T19:21:50Z

Atm, we don't have a string value anywhere?

For now, we have react elements and no string value

if we have an array value for the rest, we could use this for the matchers/source as well

So, with an array value, we'll still need helpers (matchers) to manipulate this data. Granted these helpers may have a simpler implementation, but on the other side, with a string value no need for serializing/parsing HTML, innerHTML and editor.setContent are self-sufficient (simpler and better performance).

matchers/source and template agnostic framework.

template agnostic framework, means custom elements, so creating a new standard, I'm ok on relying on an existing approach familiar to a lot of developpers aka JSX or h (which are the exact same thing)

So, IMO, the only advantage of an array structure is having a simpler helpers/matchers implementation. Is this worth-it? I don't know.

we tried the React elements as a format, we found limits
IMO we should try the simplest alternative, a string value and see if it works well

ellatrix · 2017-09-22T19:28:32Z

~~editor.setContent~~ editor.getContent will have worse performance than element.childNodes with a loop to array.

So, IMO, the only advantage of an array structure is having a simpler helpers/matchers implementation. Is this worth-it? I don't know.

I didn't say this at all, the matchers only came up later, and it's extra goodness of the array structure. See #2750 (comment) and #2750 (comment).

youknowriad · 2017-09-22T19:31:41Z

editor.setContent is rarely used: at initialization and when the content changes externally (not modified in TinyMCE) which basically never happens (aside maybe splitting blocks)

ellatrix · 2017-09-22T19:32:54Z

Sorry, I meant editor.getContent.

ellatrix · 2017-09-22T19:37:11Z

I guess I should make a PR to make things clearer. @aduth did some interesting work here https://github.com/WordPress/gutenberg/pull/2463/files.

youknowriad · 2017-09-22T19:37:29Z

what about editor.body.innerHTML and nothing stops us from looping and concatening to do some cleanup, so same performance.

Not saying the array structure is a bad choice, just that I don't see any major downside of using strings which are simpler: don't require any serialization/parsing nor documenting.

ellatrix · 2017-09-22T19:53:12Z

Why destroy a proper nested structure (DOM => Array) by mashing it it all together in HTML if it's useful internally in Editable, for footnotes, collaborative editing, perhaps doing stuff with editable like spell check/suggestions, having the same structure for matchers (no need for block implementors to understand two concepts)? If we do the agnostic framework approach (which is not my main argument here), it could be the same concept as well (https://github.com/WordPress/gutenberg/pull/2463/files#diff-861daa55c62f4f58dc9b3c0cca54f80bR31).

youknowriad · 2017-09-22T19:58:35Z

I like the driven by use-cases approach, I think moving to a string is easy enough and will remove a lot of code. Let's see if the array structure solves the use-cases better than a simple string.

aduth · 2017-09-22T19:59:30Z

Why destroy a proper nested structure (DOM => Array) by mashing it it all together in HTML if it's useful internally in Editable

Because the developer-facing implications (distinguishing between types of attribute values) should take precedence, particularly if it's still technically feasible to satisfy all internal (or the rare block-specific) requirements. Documenting what and why children is needed has proven to be a challenge.

dmsnell · 2017-09-22T23:48:05Z

Coming in late to the party I can say that I don't understand the implications here. A string does seem a bit overkill and backwards but maybe we've had problems with trees?

I can understand a confusion with a mixture of strings and rich types, but have we looked at making everything a rich type such as an array or tree? Lots of earlier work was explored to serve a tree-representation of HTML as a POJO.

TinyMCE won't spit out this tree which is kind of a bummer. It all ties in with parser work and "how do we represent block data in the parse?" If everything is a tree it's easy. If everything is a string it's probably also easy.

Here's what a tree doesn't get us: an answer to why the serialized format is an HTML/JSON hybrid.
Here's what a string doesn't get us: any help when filtering and processing the content.

I do know that I've found great merit in representing text fundamentally as a plain-text string with attributes operating over ranges of the text. It has worked well with notifications (especially in multi-platform contexts) and it seems to work well in Draft-JS and other code editors. Nothing funny has to happen in order to move from a tree to a decorated view component.

If we make it all strings it seems like that could provide a path to figure out better what our needs are and reduce the code in play until we have a straightforward conception for the trees. My personal hope would be that in the end we're working with data in the language of the editor (JavaScript and JavaScript objects) vs data in another language (and a complicated one at that) - HTML.

ellatrix · 2017-09-23T14:41:51Z

TinyMCE won't spit out this tree which is kind of a bummer. It all ties in with parser work and "how do we represent block data in the parse?"

@dmsnell That's not true though... it does spit out a tree. It's in the document, so it's a tree, just do editor.getBody().childNodes and make an array from that. No need to parse. That's the behaviour right now, but with React instead of Arrays, which is unfortunate I guess.

I guess it's all fairly easy to adjust again, and I'm not going to argue against it further without any code. :) I agree with @youknowriad there.

dmsnell · 2017-09-24T01:40:31Z

Cool @iseulde! I didn't know that.

If we were to have a tree structure for Editable then the parse could produce that natural structure and we could get rid of the idea of having separate block children as a linear array. Instead, they would simply appear in place wherever the author puts them (in the tree of children).

How hard is it to feed TinyMCE a tree? How hard would it be to convert a block placeholder to something TinyMCE could represent?

These kinds of structures are easily processed with stupid-simple recursive-descent parsers. For example, converting to a React tree or a DOM tree or building a string are all basically done with the same abstraction: walk the tree in a depth-first search.

For some examples (lacking documentation or explanation of what they are doing) I have some code lying around from a prototype for the notifications panel at WordPress.com

https://github.com/dmsnell/wrnc/blob/master/src/api-poller/block-parser.js
In this file we're taking a structure much like described in #771 and turning it into a tree of nodes of a particular type. The base data is a plain string and an array of ranges attaching attributes to those ranges in the text: URLs, user objects, site objects, etc…

https://github.com/dmsnell/wrnc/blob/master/src/blocks/block-tree-parser.jsx
In this file we're converting that tree of nodes into React components. This is where the tree structure shines because you can see how little code it takes to traverse. In this code we're even allowing things like untagged strings to represent text nodes.

The commit which brought those two files into existence explains things in its comment. There's lots to ignore in that code as well though so please don't let that distract from the core idea. The original doodles for that code turned the tree back into a linear string and took a similar amount (but less) code as the toReact version.

ellatrix · 2017-09-24T10:14:16Z

How hard is it to feed TinyMCE a tree?

If the content is clean (a valid structure, etc.), you can map the JSON tree to a DOM tree and attach it to the TinyMCE editor root node. At the moment we run it through setContent to validate and adjust the content. When the content comes from TinyMCE after that, we know it's valid, so we set it without the validation (basically just rootNode.innerHTML = ...).

How hard would it be to convert a block placeholder to something TinyMCE could represent?

I'm not sure what you mean? Could you explain?

ellatrix · 2017-09-25T11:36:16Z

This is just a test with editable state as array tree instead of React tree: https://github.com/WordPress/gutenberg/compare/try/editable-state-tree
Very rough, without any optimisations and unit tests adjusted etc. Also noticed unit tests use a similar structure, so we could get rid of those conversions. Was thinking the build further on this for footnotes at some point.

atimmer · 2017-09-27T10:11:14Z

I find it hard to jump in without context, but here goes.

In general I would prefer any kind of data structure that is more rich than a tree to represent anything. In the content analysis we are using in Yoast SEO we are currently moving a way from string processing and moving towards building a tree form the content our users write. If we can somehow use the tree already present in Gutenberg we can speed up our content analysis by a lot.

If the concern is how easy it is to make a block, I think there are plenty of ways to make the API easy and still use a tree structure internally. It is a matter of high level APIs versus low level APIs. The low level APIs are exposed to trees and the higher level APIs could be exposed to strings. I would assume that the block API is a high level API.

As for specific use-cases. We are trying to mark specific pieces of text in the editor to highlight places where the user can improve their text. In current WordPress/tinyMCE they are completely removed once the user focuses the editor, because we cannot guarantee we won't corrupt the data otherwise. We would really like to have real-time marking of the text.

aduth · 2017-10-25T19:23:09Z

You may also be interested in #2541 which seeks to explore solutions to enabling block implementers to change their block markup without invalidating previously created blocks.

andreiglingeanu · 2017-10-25T19:29:29Z

Wow, that's mind blowing! But anyway, I can see a couple of cases where a complete server-side solution might be a better option.

What's the current strategy for core blocks? They all get invalidated on subsequent changes, right?

aduth · 2017-10-25T19:40:41Z

What's the current strategy for core blocks? They all get invalidated on subsequent changes, right?

Depending on how severe the change is, and notably for markup changes it tends to be more disruptive but, generally, yes a saved core block may simply become invalid if its markup is changed version to version.

aduth · 2018-03-07T14:06:34Z

After some unsuccessful iterations, I'm going to close this for reasons discussed in #5380.

Certainly can be revisited, but I don't think this is a priority item for now, and the tree shape may prove useful in the long-term.

aduth added [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable [Type] Task Issues or PRs that have been broken down into an individual action to take labels Sep 20, 2017

aduth mentioned this issue Sep 20, 2017

Editable: Invoke onChange on every input #2758

Closed

mtias added this to the Beta 1.2 milestone Sep 20, 2017

youknowriad self-assigned this Sep 20, 2017

youknowriad mentioned this issue Sep 25, 2017

Editable: Treat the value prop as a string #2786

Closed

iandunn mentioned this issue Oct 25, 2017

Poor UX after removing gutenberg_add_edit_link_for_post_type filter #3158

Closed

mtias modified the milestones: Beta 1.6, Beta 1.7 Oct 31, 2017

ellatrix modified the milestones: Beta 1.7, Beta 1.8 Nov 20, 2017

gziolo mentioned this issue Nov 30, 2017

Collaborative Editing using WebRTC (only 2 peers supported) #3741

Closed

18 tasks

ellatrix modified the milestones: Beta 1.8, 2.0.0 Dec 13, 2017

aduth mentioned this issue Dec 15, 2017

Try: Custom save serializer, Editable as array tree #4049

Closed

karmatosed modified the milestones: 2.0.0, 2.1 Jan 17, 2018

karmatosed modified the milestones: 2.1, 2.2 Jan 25, 2018

aduth mentioned this issue Jan 31, 2018

Element / Block API: Add RawHTML component, drop support for HTML string block save #4786

Merged

aduth modified the milestones: 2.2, 2.3 Feb 14, 2018

aduth modified the milestones: 2.3, 2.4 Mar 1, 2018

aduth mentioned this issue Mar 3, 2018

Try: Block API: Remove children source in favor of HTML string #5380

Closed

karmatosed modified the milestones: 2.4, 2.5, Merge Proposal Mar 7, 2018

aduth closed this as completed Mar 7, 2018

efuller mentioned this issue Mar 14, 2018

Add Deprecated Function for Existing Blocks WebDevStudios/wds-blocks#24

Closed

6 tasks

aduth mentioned this issue Jul 12, 2018

Rich Text: Formalize RichText children value abstraction #7934

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Editable: Treat value as HTML string, eliminate children source #2750

Editable: Treat value as HTML string, eliminate children source #2750

aduth commented Sep 20, 2017

aduth commented Sep 20, 2017

ellatrix commented Sep 22, 2017 •

edited

Loading

aduth commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 22, 2017

aduth commented Sep 22, 2017

ellatrix commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017 •

edited

Loading

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017 •

edited

Loading

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017 •

edited

Loading

aduth commented Sep 22, 2017

dmsnell commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 23, 2017

dmsnell commented Sep 24, 2017

ellatrix commented Sep 24, 2017

ellatrix commented Sep 25, 2017

atimmer commented Sep 27, 2017

aduth commented Oct 25, 2017

andreiglingeanu commented Oct 25, 2017

aduth commented Oct 25, 2017

aduth commented Mar 7, 2018

Editable: Treat value as HTML string, eliminate children source #2750

Editable: Treat value as HTML string, eliminate children source #2750

Comments

aduth commented Sep 20, 2017

aduth commented Sep 20, 2017

ellatrix commented Sep 22, 2017 • edited Loading

aduth commented Sep 22, 2017 • edited Loading

ellatrix commented Sep 22, 2017

aduth commented Sep 22, 2017

ellatrix commented Sep 22, 2017 • edited Loading

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017 • edited Loading

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017 • edited Loading

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017

ellatrix commented Sep 22, 2017

youknowriad commented Sep 22, 2017 • edited Loading

aduth commented Sep 22, 2017

dmsnell commented Sep 22, 2017 • edited Loading

ellatrix commented Sep 23, 2017

dmsnell commented Sep 24, 2017

ellatrix commented Sep 24, 2017

ellatrix commented Sep 25, 2017

atimmer commented Sep 27, 2017

aduth commented Oct 25, 2017

andreiglingeanu commented Oct 25, 2017

aduth commented Oct 25, 2017

aduth commented Mar 7, 2018

ellatrix commented Sep 22, 2017 •

edited

Loading

aduth commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 22, 2017 •

edited

Loading

ellatrix commented Sep 22, 2017 •

edited

Loading

youknowriad commented Sep 22, 2017 •

edited

Loading

dmsnell commented Sep 22, 2017 •

edited

Loading