Deleting a closing block comment in "Text" mode #844

If you delete a closing comment in "text" mode, we are converting block delimiters into divs.
I think if we switch the parsing mechanism back to the grammar parser, this should be fixed. Doesn't mean we shouldn't fix the DOM parser though. |
just tested @youknowriad's suggestion and with the grammar-based parser only the affected block decays - the other blocks are preserved. it's nice isolating issues 😄 |
How long do we want to maintain two parsers? |
@aduth it would not be my choice to ever maintain more than one official parser, though I would hope that we could build one with a spec well-enough defined that others could make their own implementations that work just as well. @nylen are you at all opposed to switching back to the grammar-based parser? also, are there still things the TinyMCE parser does that the PEG doesn't handle? |
Our parser needs to handle HTML tags as well - this is why I pushed to get the TinyMCE-based parser in place, because our current PEG doesn't handle this. Though we're not using this functionality yet. Soon, I promise 😭
Ideally whatever parsing solution we pick will eventually handle shortcodes as well. I agree we don't want to maintain two parsers long-term. However, our existing PEG in this repo cannot do anything except handle block delimiters and their attributes, and we should keep the TinyMCE version until our PEG supports extracting HTML tags at least. |
I keep hearing this but what about it doesn't "handle" HTML tags? currently it handles HTML tags better than TinyMCE does (except for the bit where that was removed from the parser) because it can accept any combination of HTML and preserve its structure even if it's invalid. just asking because it appears like the TinyParser is now exhibiting broken and ill-defined behavior, while the grammar-based parser doesn't have these problems and is capable of expanding in a straightforward way. there is nothing a proper parser cannot do that TinyMCE can do |
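For concreteness, a minimal sketch of the delimiter-level result under discussion. The `blockType`/`rawContent` property names follow the test that appears later in this thread; the exact attribute handling is an assumption:

```js
// Hypothetical input whose inner HTML is invalid; a delimiter-only grammar
// can still treat it as an opaque string and preserve it byte-for-byte.
const input =
	'<!-- wp:core/quote -->\n' +
	'<p>unclosed <em>markup\n' +
	'<!-- /wp:core/quote -->';

// parse( input ) would yield something shaped like:
// [ {
//   blockType: 'core/quote',
//   attrs: {},
//   rawContent: '\n<p>unclosed <em>markup\n',
// } ]
```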
See #426 for previous, similar discussion. I'd like to replace the TinyMCE-based parser with something more robust as soon as possible - I don't like the way we are handling comment delimiters either. More specifically: I want a parser that returns a tree of elements including block delimiters, HTML elements, and eventually shortcodes. In #914 I've started using the tree of HTML elements returned by the TinyMCE parser because it's available today.

However, the current TinyMCE-based approach also has the advantage that we aren't significantly changing parsing behavior from 10+ years of existing code, because content already runs through this parser. And I'm concerned that there are a lot of cases to handle there. For a couple of examples: |
As currently structured, where HTML content inside block delimiters is simply treated as a string, this isn't a good thing! I expect a common failure case will be that our block delimiters get wrapped in extraneous tags like a `<p>`. |
This is certainly true, but how much work is involved, and/or how hard do we want to try to match what WP has been doing here for years? As for the specific issue reported here, I'll look into it today. |
@nylen thanks for the response.
…and the post grammar had it available yesterday 😉 it was in the original until it was removed because it was supposedly not wanted in Gutenberg. not a big deal. I think it's time to make the update back from wp-post-grammar and introduce the one escape hatch. the reason for preserving HTML structure even if invalid is that it makes the parser transparent in the process. breaking ten years of habits is fine for most cases because we're building a new editor and new block types. if we want to work with a TinyMCE block then let's just shove the content in, but let's not welcome into new block types the sort of troubled HTML that has been around for so long.
having the parser generate the tree is something I like because it means block code can deal with that tree specifically. if the HTML is roughly well-formed then interacting with that tree should be trivial and block authors won't need to think about parsing. they can return a tree too and have it serialize into HTML. are you aware of anything besides the TinyMCE block which will need this?
Almost certainly much simpler with a grammar, because the structure is declarative and well-formed. It's too easy to build parsers by hand and imperatively, and to let tons of bugs and ill-defined behaviors sneak in. Changing big properties of the grammar is relatively straightforward because the parser is generated code.
my vote would be to spend almost no effort here, hence the escape hatch. getting away from TinyMCE's behavioral legacy should be a priority. fix it, instead of enabling it. |
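To make the declarative-grammar point above concrete, here is a toy pegjs sketch. This is not the project's actual post grammar; the rule names, output shape, and API usage are illustrative only:

```js
const peg = require( 'pegjs' ); // peg.generate() in pegjs 0.10.x

// Toy grammar: the delimiter structure is stated declaratively, so changing
// it means editing rules, not rewriting imperative parsing code. Note that
// a missing closing delimiter is simply a parse failure here.
const parser = peg.generate( `
	Document = Block*
	Block    = "<!-- wp:" name:Name " -->" content:$( !Closer . )* Closer
	           { return { blockType: name, rawContent: content }; }
	Closer   = "<!-- /wp:" Name " -->"
	Name     = $[a-z0-9/-]+
` );

parser.parse( '<!-- wp:core/quote --><p>a</p><!-- /wp:core/quote -->' );
// => [ { blockType: 'core/quote', rawContent: '<p>a</p>' } ]
```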
I missed where we removed this from the grammar parser. I thought it was never added because we haven't needed it yet. In any case I agree, let's bring this [using the grammar to parse a tree of HTML content] back.
Ideally, no.
👍 I like this approach, especially for new block types. |
@nylen here's the point where the repo version differs from this one. of course, we have full control over what we want, so it's not like we're stuck with what's already there. when I started on |
@mtias I'm not seeing this issue. Can you share the markup you used? I added a new test:

```diff
diff --git a/blocks/api/test/parser.js b/blocks/api/test/parser.js
index 889a347..9b1b06e 100644
--- a/blocks/api/test/parser.js
+++ b/blocks/api/test/parser.js
@@ -179,6 +179,15 @@ describe( 'block parser', () => {
 		expect( parsed ).to.eql( [] );
 	} );
 
+	it( 'should parse post content with a missing ending delimiter', () => {
+		registerBlock( 'core/test-block', {} );
+
+		const parsed = parse( '<!-- wp:core/test-block -->' );
+
+		expect( parsed ).to.have.lengthOf( 1 );
+		expect( parsed[ 0 ].blockType ).to.equal( 'core/test-block' );
+	} );
+
 	it( 'should parse the post content, ignoring unknown blocks', () => {
 		registerBlock( 'core/test-block', {
 			attributes: function( rawContent ) {
```

and when I run it, it passes using the TinyMCE parser, but not the grammar parser:

```
npm run test-unit -- --grep 'with a missing ending delimiter'
```

In fact this is more like what I would've expected, because the TinyMCE-based parser handles auto-closing of badly nested tags. |
As noted in Slack, I think that auto-closing missing block comments might cause more damage than it prevents. I would like to make sure that we don't "fix" them. |
@nylen recorded here: https://cloudup.com/caocvWrIeXl |
Thanks @mtias. This is certainly a shortcoming of the approach used in the TinyMCE-based parser (all comment delimiters are transformed into fake elements so that they survive the DOM parse).
As long as we pass the resulting content to the freeform block, this seems workable. In this particular case, using the grammar-based parser, would the entire post then be invalid? That seems unfortunate, but perhaps OK. |
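A hedged sketch of the delimiter-to-element transformation mentioned above; the element name, attribute, and regexes are stand-ins, not the actual TinyMCE-based implementation:

```js
// Turn block comment delimiters into elements a DOM parser will keep
// (block attributes elided for brevity).
function commentsToFakeElements( html ) {
	return html
		.replace( /<!--\s*wp:([a-z0-9/-]+)[\s\S]*?-->/g, '<wp-block data-type="$1">' )
		.replace( /<!--\s*\/wp:[a-z0-9/-]+\s*-->/g, '</wp-block>' );
}

// With a deleted closing comment, the DOM parser is left holding an unclosed
// <wp-block>; its auto-closing/repair rules, not the block format, then
// decide what the content becomes - which is how delimiters end up as divs.
```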
That seems very doable. In addition, unless the freeform block already indicates content outside of a block proper, I think we can indicate that the content came through an invalid parse - that it was outside of any recognizable block.
I don't think the intent was to preserve the valid blocks, but that's what it appears to do. I would like to establish more in terms of error reporting from the grammar, and I think we could imagine passing along a pair of results: the blocks the parser recognized plus a list of the errors it hit along the way. |
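A sketch of what that pair could look like, assuming a strict pegjs-style parser such as the toy one above (all names here are hypothetical):

```js
// Return both the parse result and structured errors instead of silently
// repairing content the grammar could not make sense of.
function parseWithErrors( parser, content ) {
	try {
		return { blocks: parser.parse( content ), errors: [] };
	} catch ( error ) {
		return {
			// salvage nothing here; a real version might fall back to a
			// single freeform block wrapping the original content
			blocks: [],
			errors: [ { message: error.message, location: error.location } ],
		};
	}
}
```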
I think this is a great idea, including eventually surfacing these errors to the user. We may also find that there are very common cases which are simple and well-defined enough to correct rather than just detect. Assuming no block nesting (which will not always be true), the recording above is an example of such a case. I know this is not always possible, but I still think this sort of thing is worth looking out for during user testing. |
would love to consider some of these separately from the parser and in a separate processing stage. right now I'm very uncomfortable with the ideas presented and think they will do more harm than good, because of how they mangle content, hide errors, and make decisions in undecidable circumstances.
I'm not sure I understand what you are communicating here. Can you help me by rewording or expanding the explanation? |
Some thoughts: It seems that there are a couple of overlapping cases we're trying to handle here: (1) that we straight-up have invalid HTML to start with, and (2) that previously valid block content gets modified outside the editor. I am not a parsing expert so I don't have an answer to (1), but for (2) I was thinking that we could include two additional comments in the HTML. A comment at the top which says something like:
And another one at the bottom:
Which is the checksum of the HTML with that comment removed, set by the Gutenberg JS before save. Then users have been warned not to change it, and if they DO change it you can warn them again that stuff might be broken, and perhaps even give them a button to revert to the last version of the post where the checksum was correct. Perhaps by doing these things, you can get away with the simpler, faster PEG grammar thing. |
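A minimal sketch of that scheme; the comment names and the hash are illustrative stand-ins (any stable digest would do), not an existing WordPress mechanism:

```js
// Trivial djb2-style hash, good enough to illustrate the idea.
function checksum( text ) {
	let hash = 5381;
	for ( let i = 0; i < text.length; i++ ) {
		hash = ( ( hash * 33 ) ^ text.charCodeAt( i ) ) >>> 0;
	}
	return hash.toString( 16 );
}

// Wrap content in a warning comment plus a trailing checksum comment,
// as the editor would do before save.
function withIntegrityComments( html ) {
	const body = '<!-- wp-integrity: edit below at your own risk -->\n' + html;
	return body + '\n<!-- wp-checksum: ' + checksum( body ) + ' -->';
}

// On load, detect whether anything changed outside the editor.
function verifyIntegrity( content ) {
	const match = content.match( /\n<!-- wp-checksum: ([0-9a-f]+) -->$/ );
	if ( ! match ) {
		return false; // no checksum; treat as externally produced content
	}
	return checksum( content.slice( 0, match.index ) ) === match[ 1 ];
}
```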
both great options @gravityrail, and things we have discussed. granted, these are nice add-ons to iterate on. at the moment these seem unrelated to this PR, but I would like to see us do both |
+1.
The example in the recording above is easy to fix by simply closing out the previous block tag when we encounter the new tag (the next opening `<!-- wp:… -->` delimiter). |
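A hedged sketch of that fix, over an assumed token stream and ignoring nesting (which, per the discussion below, is exactly the assumption in question):

```js
// Close any still-open block when the next opener appears, and at EOF.
// Token shape ({ type, name }) is assumed for illustration.
function autoCloseBlocks( tokens ) {
	const fixed = [];
	let open = null;
	for ( const token of tokens ) {
		if ( token.type === 'block-open' ) {
			if ( open ) {
				// missing closer: synthesize one before the new opener
				fixed.push( { type: 'block-close', name: open.name } );
			}
			open = token;
		} else if ( token.type === 'block-close' ) {
			open = null;
		}
		fixed.push( token );
	}
	if ( open ) {
		fixed.push( { type: 'block-close', name: open.name } );
	}
	return fixed;
}
```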
I am extremely strongly against any solution that requires detecting changes to content, as we simply cannot assume that we know how people will end up editing post content that contains blocks. (We know some cases. A common case during the initial adoption of Gutenberg will be switching back and forth between wp-admin and Gutenberg, which is pretty badly broken at the moment.) Instead I would prefer that we validate as much as possible, detect invalid data, and add fallbacks to more flexible editing behavior when we find something invalid.

For better or worse, WordPress is extremely interoperable with other tools, it allows plugins to do basically whatever they want, etc. I think these attributes are worth preserving to the extent that we can. From https://make.wordpress.org/core/2017/01/17/editor-technical-overview/:

In particular, "older editor versions" and "mobile editors" should not have to know about Gutenberg to allow viewing and making changes to post content. It's great if they do, but I think we are throwing out a large part of the justification for using HTML comments as the storage mechanism if we assume otherwise. |
makes sense, though I fail to see how we can be confident about that. if we have nothing but valid blocks, and we have no content after the last block and before the current block opening, and we have content to the end without a closing comment, then I'd probably agree that it's safe. I'm more concerned about situations like this…

```html
<!-- wp:quote -->
a
<!-- wp:quote -->
b
<!-- /wp:quote -->
```

or this

```html
<!-- /wp:quote -->
a
<!-- /wp:quote -->
```

well I'm still holding out on this assumption because I think we'll need nested blocks but have to figure out what that means from a UI perspective. thinking about tables and actual columns (two quarter-columns on the left, one half column on the right) or a block quote inside a block quote |
this seems to be one point where I've heard differently from both @mtias and @jasmussen - that preserving the edit functionality in the "old editor" is not a primary project goal and shouldn't hold us back. hopefully they can confirm or refute me. |
But isn't it nice to know that content was edited externally, so that if we do implement these fallbacks (which may not always be to the user's liking) AND we know that the content was edited externally, we can say to the user "we detected some invalid markup, and we did our best to fix it, but if you need to revert to the last correct version click here." |
I think this is one of the worst mistakes we can possibly make with this project. |
we all understand that 😄 |
I think everyone here agrees that users should be free to use whatever editor they want, and we should do the best we can regardless. Besides, you have to solve that problem in order to work with legacy content in the first place. It seems that where people disagree is on how to handle the consequences of that. I happen to think that warning people who are using an external editor that they could break their Gutenberg experience is a good thing to do, since it's hard for me to imagine a performant, 100% solution to the terrible things they might do to your special tags. |
First off: there are no stone tablets. What we have to do, however, is decide what goes in the near term V1, and what has to be postponed to the longer term V2. And for V1, it is not a primary goal to optimize for the situations where multiple different editors edit the same blocks. Very important, though: while not a goal for V1, this does not mean we can't figure out solutions for V2. A big reason for going with this format in the first place is backwards compatibility.

A V1 can take many shapes. Maybe it's opt-in initially, maybe it becomes opt-out after a while. Real usage in the plugin directory is also going to help surface problems, and help inform what shape the solutions might take. In the meantime, maybe there's a "preview" label on the editor. We have revisions and undo buttons to help mitigate damage.

We'll wade these waters, and I really appreciate that we are thinking about these problems, because they are difficult problems. We will get to them. |
Quite honestly, it seems like we are barely trying to solve this set of problems at all.
There are several problems with this reasoning. First, we're making major changes to software that millions of people use. This is a lot of responsibility; part of that responsibility is smoothing out the upgrade process from the old editor, including when people switch back and forth between editors to edit their posts. In fact, I'd argue that this is only really important for v1, when people are trying out the new editor for the first few times, and not so much for v2.

I'm concerned that our current approach will be perceived as follows: "Thanks for using WordPress. We made some major changes. Going forward, you had better use WordPress in exactly this specific way, or you will experience content deletion and other various kinds of undefined behavior." I am not even convinced that we will be able to document and communicate this appropriately. All of this seems extremely antithetical to the spirit of WordPress as a whole.

Second, whether intentionally or not, saying "for v2" is often a friendly way of punting entirely. In my 2 years and several months at Automattic, I have already been on at least 5 projects that are still awaiting these "post-launch features". Finally:
Backwards compatibility in what form? The current state seems like the worst of both worlds: we are using an existing data storage mechanism (which in software terms is ancient), but pretending that it will not be used in any way other than to store data from our new project. |