Deleting a closing block comment in "Text" mode #844

mtias · 2017-05-19T13:46:05Z

If you delete a closing comment in "text" mode, we are converting block delimiters into divs.

youknowriad · 2017-05-19T13:51:35Z

I think if we switch the parsing mechanism back to the grammar parser, this should be fixed. Doesn't mean we shouldn't fix the DOM parser tough.

dmsnell · 2017-05-19T14:43:50Z

just tested @youknowriad's suggestion and with the grammar-based parser only the affected block decays - the other blocks are preserved

it's nice isolating issues 😄

aduth · 2017-05-19T16:56:47Z

Doesn't mean we shouldn't fix the DOM parser tough.

How long do we want to maintain two parsers?

dmsnell · 2017-05-25T20:09:06Z

@aduth it would not be my choice to ever maintain more than one official parser, though I would hope that we could build one with a spec well-enough defined so that others could make their own implementations that work just as well

@nylen are you at all opposed to switching back to the grammar-based parser? also, are there still things the TinyMCE parser does that the PEG doesn't handle?

nylen · 2017-05-25T22:28:20Z

Our parser needs to handle HTML tags as well - this is why I pushed to get the TinyMCE-based parser in place, because our current PEG doesn't handle this. Though we're not using this functionality yet. soon I promise 😭

wp-post-grammar and our TinyMCE parser support HTML tags, though I still think we need to go looking for examples of malformed markup in the wild and make sure we handle them reasonably well.

Ideally whatever parsing solution we pick will eventually handle shortcodes as well.

I agree we don't want to maintain two parsers long-term. However, our existing PEG in this repo cannot do anything except handle block delimiters and their attributes, and we should keep the TinyMCE version until our PEG supports extracting HTML tags at least.

dmsnell · 2017-05-26T02:17:51Z

needs to handle HTML tags as well

I keep hearing this but what about it doesn't "handle" HTML tags? currently it handles HTML tags better than TinyMCE does (except for the bit where that was removed from the parser) because it can accept any combination of HTML and preserve its structure even if its invalid.

just asking because it appears like now the TinyParser is exhibiting broken behavior and is ill-defined in its behavior and the grammar-based parser doesn't have these problems and is capable of expanding in a straightforward way

there is nothing a proper parser cannot do that TinyMCE can do

nylen · 2017-05-26T17:38:18Z

See #426 for previous, similar discussion. I'd like to replace the TinyMCE-based parser with something more robust as soon as possible - I don't like the way we are handling comment delimiters either.

More specifically: I want a parser that returns a tree of elements including block delimiters, HTML elements, and eventually shortcodes. In #914 I've started using the tree of HTML elements returned by the TinyMCE parser because it's available today.

However, the current TinyMCE-based approach also has the advantage that we aren't significantly changing parsing behavior from 10+ years of existing code, because content already runs through this parser. And I'm concerned that there are a lot of cases to handle there. For a couple of examples:

There are many different kinds of HTML elements that need to be parsed and opened/closed differently.
For example, hr tags cannot contain children. code
We should handle invalid nesting the same way as TinyMCE and/or browsers, by closing existing tags and starting over. code

because it can accept any combination of HTML and preserve its structure even if its invalid.

As currently structured, where HTML content inside block delimiters is simply treated as a string, this isn't a good thing! I expect a common failure case will be that our block delimiters get wrapped in extraneous tags like a div. @azaozz put together an example of this behavior, IIRC. We need to be able to detect and recover from this.

there is nothing a proper parser cannot do that TinyMCE can do

This is certainly true, but how much work is involved, and/or how hard do we want to try to match what WP has been doing here for years?

As far as the specific issue reported here, I'll look into it today.

dmsnell · 2017-05-26T17:48:17Z

@nylen thanks for the response.

because it's available today.

…and the post grammar had it available yesterday 😉 it was in the original until it was removed because it was supposedly not wanted in Gutenberg.

not a big deal. I think it's time to make the update back from wp-post-grammar and introduce the one escape-hatch flat: preserveRawContents or something like that. if that's the case we can skip parsing the contents of the block and pass the string into TinyMCE or whatever it is we want.

the reason for preserving HTML structure even if invalid is that it makes the parser transparent in the process. breaking ten years of habits is fine for most cases because we're building a new editor and new block types. if we want to work with a TinyMCE block then let's just shove the content in, but not welcome into new block types the sort of troubled HTML that has been around so long.

o hai bro. I heard you want to make a new block type. well you better make well-formed HTML

having the parser generate the tree is something I like because it means block code can deal with that tree specifically. if the HTML is roughly well-formed then interacting with that tree should be trivial and block authors won't need to think about parsing. they can return a tree too and have it serialize into HTML.

are you aware of anything besides the TinyMCE block which will need this?

This is certainly true, but how much work is involved

Almost certainly much simpler with a grammar because the structure is declarative and well-formed. It's too easy to build parsers by hand and imperatively and let tons of bugs and ill-defined behaviors sneak in. Changing big properties of the grammar is relatively straightforward because the parser is generated code.

how hard do we want to try to match what WP has been doing here for years?

my vote would be to spend almost no effort, hence the escape hatch. getting away from TinyMCE's behavioral legacy should be a priority. fix it, instead of enable it.

nylen · 2017-05-26T18:06:28Z

it was in the original until it was removed because it was supposedly not wanted in Gutenberg. I think it's time to make the update back from wp-post-grammar and introduce the one escape-hatch flat: preserveRawContents or something like that.

I missed where we removed this from the grammar parser. I thought it was never added because we haven't needed it yet. In any case I agree, let's bring this [using the grammar to parse a tree of HTML content] back.

are you aware of anything besides the TinyMCE block which will need this?

Ideally, no.

my vote would be to spend almost no effort, hence the escape hatch. getting away from TinyMCE's behavioral legacy should be a priority. fix it, instead of enable it.

👍 I like this approach, especially for new block types.

dmsnell · 2017-05-26T18:29:42Z

@nylen here's the point where the repo version differs from this one

of course, we have full control over what we want so it's not like we're stuck with what's already there. when I started on wp-post-grammar my goal was to follow RFC conventions and specs, which is why some of the HTML nomenclature isn't as colloquial as one might expect

nylen · 2017-05-31T16:55:59Z

@mtias I'm not seeing this issue. Can you share the markup you used?

I added a new test:

diff --git a/blocks/api/test/parser.js b/blocks/api/test/parser.js
index 889a347..9b1b06e 100644
--- a/blocks/api/test/parser.js
+++ b/blocks/api/test/parser.js
@@ -179,6 +179,15 @@ describe( 'block parser', () => {
 					expect( parsed ).to.eql( [] );
 				} );
 
+				it( 'should parse post content with a missing ending delimiter', () => {
+					registerBlock( 'core/test-block', {} );
+
+					const parsed = parse( '<!-- wp:core/test-block -->' );
+
+					expect( parsed ).to.have.lengthOf( 1 );
+					expect( parsed[ 0 ].blockType ).to.equal( 'core/test-block' );
+				} );
+
 				it( 'should parse the post content, ignoring unknown blocks', () => {
 					registerBlock( 'core/test-block', {
 						attributes: function( rawContent ) {

and when I run it, it passes using the TinyMCE parser, but not the grammar parser:

npm run test-unit -- --grep 'with a missing ending delimiter'

In fact this is more like what I would've expected, because the TinyMCE-based parser handles auto-closing badly nested tags.

dmsnell · 2017-05-31T18:34:16Z

As noted in Slack, I think that auto-closing missing block comments might cause more damage than it prevents. I would like to make sure that we don't "fix" them.

mtias · 2017-05-31T18:45:55Z

@nylen recorded here: https://cloudup.com/caocvWrIeXl

nylen · 2017-05-31T19:35:58Z

Thanks @mtias. This is certainly a shortcoming of the approach used in the TinyMCE-based parser (all comment delimiters are transformed into fake wp-block tags for the parse, and then only top-level ones have the reverse transformation applied). I don't see a clean way to solve this.

I think that auto-closing missing block comments might cause more damage than it prevents. I would like to make sure that we don't "fix" them.

As long as we pass the resulting content to the core/freeform block, rather than destroying it, I am totally fine with this.

In this particular case, using the grammar-based parser, would the entire post then be invalid? That seems unfortunate, but perhaps OK.

dmsnell · 2017-05-31T19:45:55Z

As long as we pass the resulting content to the core/freeform block, rather than destroying it, I am totally fine with this.

That seems very doable. I think in addition, unless the freeform block already indicates content outside of a block proper, that we can indicate how the content came through an invalid parse, that it was outside of any recognizable block.

In this particular case, using the grammar-based parser, would the entire post then be invalid? That seems unfortunate, but perhaps OK.

I don't think the intent was to preserve the valid blocks but that's what it appears to do. I would like to establish more in terms of error reporting from the grammar and I think we could imagine passing along a pair of [ parsedContent, parseMeta ] holding not only the blocks but also any errors or important information about the parse.

nylen · 2017-05-31T19:48:44Z

I don't think the intent was to preserve the valid blocks but that's what it appears to do. I would like to establish more in terms of error reporting from the grammar and I think we could imagine passing along a pair of [ parsedContent, parseMeta ] holding not only the blocks but also any errors or important information about the parse.

I think this is a great idea, including eventually surfacing these errors to the user.

We may also find that there are very common cases which are simple and well-defined enough to correct rather than just detect. Assuming no block nesting (which will not always be true), the recording above is an example of such a case.

I know this is not always possible, but I still think this sort of thing is worth looking out for during user testing.

dmsnell · 2017-05-31T19:56:05Z

We may also find that there are very common cases which are simple and well-defined enough to correct rather than just detect.

would love to consider some of these separately from the parser and in a separate processing stage. right now I'm very uncomfortable with the ideas presented and think they will do more harm than good because of how they mangle content and hide errors and make decisions on undecidable circumstances.

Assuming no block nesting (which will not always be true), the recording above is an example of such a case.

I'm not sure I understand what you are communicating here. Can you help me by rewording or expanding the explanation?

gravityrail · 2017-05-31T19:57:21Z

Some thoughts:

It seems that there are a couple of overlapping cases we're trying to handle here.

(1) that we straight-up have invalid HTML to start with.
(2) that the user externally edits the HTML generated by Gutenberg and it messes things up (e.g. by removing closing tags from wp-blocks).

I am not a parsing expert so I don't have an answer to (1), but for (2) I was thinking that we could include two additional comments in the HTML. A comment at the top which says something like:

<!-- wp-warning: This code was edited by Gutenberg. External changes may break your editing experience. You have been warned -->

And another one at the bottom:

<!-- wp-checksum: ab23ffee -->

Which is the checksum of the HTML with that comment removed, set by the Gutenberg JS before save.

Then users have been warned not to change it, and if they DO change it you can warn them again that stuff might be broken, and perhaps even give them a button to revert to the last version of the post where the checksum was correct.

Perhaps by doing these things, you can get away with the simpler, faster PEG grammar thing.

dmsnell · 2017-05-31T19:59:23Z

we could include two additional comments in the HTML

both great options @gravityrail and things we have discussed. granted, these are nice add-ins to iterate on. at the moment these seem unrelated to this PR but I would like to see us do both

nylen · 2017-05-31T19:59:26Z

would love to consider some of these separately from the parser and in a separate processing stage.

+1.

I'm not sure I understand what you are communicating here. Can you help me by rewording or expanding the explanation?

The example in the recording above is easy to fix by simply closing out the previous block tag when we encounter the new tag (). This assumption is only valid if we assume that blocks cannot be nested (or if we assume that a block cannot be nested with itself, which seems more reasonable).

nylen · 2017-05-31T20:06:05Z

I am extremely strongly against any solution that requires detecting changes to content, as we simply cannot assume that we know how people will end up editing post content that contains blocks. (We know some cases. A common case during the initial adoption of Gutenberg will be switching back and forth between wp-admin and Gutenberg, which is pretty badly broken at the moment.)

Instead I would prefer that we validate as much as possible, detect invalid data, and add fallbacks to more flexible editing behavior when we find something invalid.

For better or worse, WordPress is extremely interoperable with other tools, it allows plugins to do basically whatever they want, etc. I think these attributes are worth preserving to the extent that we can.

From https://make.wordpress.org/core/2017/01/17/editor-technical-overview/:

To ensure we keep a key component of WordPress’ strength intact, the source of truth for the content should remain in post_content, where the bulk of the post data needs to be present in a way that is accessible and portable, while still providing additional structure on top of HTML semantics for our editing tools. Data needs to be praised and respected. This additional structure would hopefully be invisible to the document’s display context, as it ensures the rendered content is viewable in situations that may not be aware of blocks at all. (Think of email clients, RSS, older editor versions, mobile editors, database dumps, etc.)

In particular, "older editor versions" and "mobile editors" should not have to know about Gutenberg to allow viewing and making changes to post content. It's great if they do, but I think we are throwing out a large part of the justification for using post_content if we say "all bets are off when you edit WordPress's standard content storage mechanism".

dmsnell · 2017-05-31T20:07:42Z

The example in the recording above is easy to fix by simply closing out the previous block tag when we encounter the new tag ()

makes sense, though I fail to see how we can be confident here on that. if we have nothing but valid blocks, and we have no content after the last block and before the current block opening, and we have content to the end without a closing comment, then I'd probably agree that it's safe. I'm more concerned about situations like this…

<!-- wp:quote -->
a
<!-- wp:quote -->
b
<!-- /wp:quote -->

or this

<!-- /wp:quote -->
a
<!-- /wp:quote -->

assume that blocks cannot be nested

well I'm still holding out on this assumption because I think we'll need nested blocks but have to figure out what that means from a UI perspective. thinking about tables and actual columns (two quarter-columns on the left, one half column on the right) or a block quote inside a block quote

dmsnell · 2017-05-31T20:09:48Z

In particular, "older editor versions" and "mobile editors" should not have to know about Gutenberg to allow viewing and making changes to post content.

this seems to be one point where I've heard different from both @mtias and @jasmussen - that preserving the edit functionality in the "old editor" is not a primary project goal and shouldn't hold us back. hopefully they can confirm or refute me.

gravityrail · 2017-05-31T20:12:03Z

I am extremely strongly against any solution that requires detecting changes to content ... Instead I would prefer that we validate as much as possible, detect invalid data, and add fallbacks to more flexible editing behavior when we find something invalid.

But isn't it nice to know that content was edited externally, so that if we do implement these fallbacks (which may not always be to the user's liking) AND we know that the content was edited externally, we can say to the user "we detected some invalid markup, and we did our best to fix it, but if you need to revert to the last correct version click here."

nylen · 2017-05-31T20:13:35Z

that preserving the edit functionality in the "old editor" is not a primary project goal and shouldn't hold us back. hopefully they can confirm or refute me.

I think this is one of the worst mistakes we can possibly make with this project.

dmsnell · 2017-05-31T20:14:30Z

I think this is one of the worst mistakes we can possibly make with this project.

we all understand that 😄

gravityrail · 2017-05-31T20:17:15Z

I think everyone here agrees that users should be free to use whatever editor that want, and we should do the best we can regardless. Besides, you have to solve that problem in order to work with legacy content in the first place.

It seems that where people disagree is on how to handle the consequences of that.

I happen to think that warning people who are using an external editor that they could break their Gutenberg experience is a good thing to do, since it's hard for me to imagine a performant, 100% solution to the terrible things they might to to your special tags.

jasmussen · 2017-06-01T06:47:41Z

In particular, "older editor versions" and "mobile editors" should not have to know about Gutenberg to allow viewing and making changes to post content.

this seems to be one point where I've heard different from both @mtias and @jasmussen - that preserving the edit functionality in the "old editor" is not a primary project goal and shouldn't hold us back. hopefully they can confirm or refute me.

First off: there are no stone tablets.

What we have to do, however, is decide what goes in the near term V1, and what has to be postponed to the longer term V2. And for V1, it is not a primary goal to optimize for the situations where multiple different editors edit the same blocks.

Very important, though: while not a goal for V1, this does not mean we can't figure out solutions for V2. A big reason for going with this post_content approach was exactly so that we could keep some backwards compatibility. We want this to mature, but we can't let this stop us from moving.

A V1 can take many shapes. Maybe it's opt-in initially, maybe it becomes opt-out after a while. Real usage in the plugin directory is also going to help surface problems, and help inform what shape the solutions might take. In the mean time, maybe there's a "preview" label on the editor. We have revisions and undo buttons to help mitigate damage.

We'll wade these waters, and I really appreciate that we are thinking about these problems, because they are difficult problems. We will get to them.

nylen · 2017-06-02T03:39:35Z

I think everyone here agrees that users should be free to use whatever editor that want, and we should do the best we can regardless. Besides, you have to solve that problem in order to work with legacy content in the first place.

Quite honestly, it seems like we are barely trying to solve this set of problems at all.

And for V1, it is not a primary goal to optimize for the situations where multiple different editors edit the same blocks.

There are several problems with this reasoning.

First, we're making major changes to software that millions of people use. This is a lot of responsibility; part of that responsibility is smoothing out the upgrade process from the old editor, including when people switch back and forth between editors to edit their posts. In fact, I'd argue that this is only really important for v1, when people are trying out the new editor for the first few times, and not so much for v2.

I'm concerned that our current approach will be perceived as follows: "Thanks for using WordPress. We made some major changes. Going forward, you had better use WordPress in exactly this specific way, or you will experience content deletion and other various kinds of undefined behavior." I am not even convinced that we will be able to document and communicate this appropriately. All of this seems extremely antithetical to the spirit of WordPress as a whole.

Second, whether intentionally or not, saying "for v2" is often a friendly way of punting entirely. In my 2 years and several months at Automattic, I have already been on at least 5 projects that are still awaiting these "post-launch features".

Finally:

A big reason for going with this post_content approach was exactly so that we could keep some backwards compatibility.

Backwards compatibility in what form? The current state seems like the worst of both worlds: we are using an existing data storage mechanism (which in software terms is ancient), but pretending that it will not be used in any way other than to store data from our new project.

mtias added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Type] Bug An existing feature does not function as intended labels May 19, 2017

nylen mentioned this issue May 26, 2017

Add HTML schema for separator block #914

Closed

5 tasks

nylen self-assigned this May 26, 2017

This was referenced May 31, 2017

Parser: Revert to PEG parser by default #947

Merged

Automated Testing Guidelines for Gutenberg #939

Closed

nylen mentioned this issue Jun 12, 2017

Refactor serializer: rearrange code for clarity and introspection #1148

Merged

mtias closed this as completed Jun 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Deleting a closing block comment in "Text" mode #844

Deleting a closing block comment in "Text" mode #844

mtias commented May 19, 2017

youknowriad commented May 19, 2017 •

edited

Loading

dmsnell commented May 19, 2017 •

edited

Loading

aduth commented May 19, 2017

dmsnell commented May 25, 2017

nylen commented May 25, 2017

dmsnell commented May 26, 2017

nylen commented May 26, 2017 •

edited

Loading

dmsnell commented May 26, 2017

nylen commented May 26, 2017

dmsnell commented May 26, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

mtias commented May 31, 2017

nylen commented May 31, 2017 •

edited

Loading

dmsnell commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

dmsnell commented May 31, 2017

nylen commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

jasmussen commented Jun 1, 2017

nylen commented Jun 2, 2017

Deleting a closing block comment in "Text" mode #844

Deleting a closing block comment in "Text" mode #844

Comments

mtias commented May 19, 2017

youknowriad commented May 19, 2017 • edited Loading

dmsnell commented May 19, 2017 • edited Loading

aduth commented May 19, 2017

dmsnell commented May 25, 2017

nylen commented May 25, 2017

dmsnell commented May 26, 2017

nylen commented May 26, 2017 • edited Loading

dmsnell commented May 26, 2017

nylen commented May 26, 2017

dmsnell commented May 26, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

mtias commented May 31, 2017

nylen commented May 31, 2017 • edited Loading

dmsnell commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

dmsnell commented May 31, 2017

nylen commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

nylen commented May 31, 2017

dmsnell commented May 31, 2017

gravityrail commented May 31, 2017

jasmussen commented Jun 1, 2017

nylen commented Jun 2, 2017

youknowriad commented May 19, 2017 •

edited

Loading

dmsnell commented May 19, 2017 •

edited

Loading

nylen commented May 26, 2017 •

edited

Loading

nylen commented May 31, 2017 •

edited

Loading