Discussion (Data): Strongly-couple data model and UI #1175

Closed
dmsnell opened this issue Jun 15, 2017 · 9 comments
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable

Comments

@dmsnell
Member

dmsnell commented Jun 15, 2017

One of the lessons Simplenote has been teaching is that (despite the more common scenarios) an editor and its data model are inherently wound together. What I mean to imply is that we might want to start thinking about making the data source an active participant through the editing flow.

You said what?

You open up a post in Gutenberg and load a block. You don't make any changes to that block. You save the document. Why is the new serialization different than the old serialization?

You make a one-letter typo update and your book has to re-parse and re-render. Why so much for such a simple edit?

You make a mistake and mangle a block's structure. The editor loads a freeform block. You don't change that block. Why do you lose the rest of that structure when you save?

These are some leading questions to motivate the reasons (performance among them) why coupling the model tightly with its editor can be a win for us.

We will always need to think about performance, since we're working with document parsing and serialization. Being able to rely on immutable updates to enable shallow comparison should be a super goal. What happens, however, when one little letter means we have to clone the entire document in memory?

This is a stub issue to raise the discussion.

Disconnected ideas

  • Structurally-sharing immutable data types
  • Tree data structure
  • Edit the tree, not the string
  • Create an editing API and abstract edit operations to give flexibility to the data structure
  • OO model of post document with interaction via methods
  • Coupling between actual blocks/trees and specific DOM nodes (block-as-VDOM)
@dmsnell dmsnell added [Feature] Block API API that allows to express the block paradigm. [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels Jun 15, 2017
@BE-Webdesign
Contributor

BE-Webdesign commented Jun 15, 2017

Here are some thoughts to throw into the discussion, which I am glad has been opened.

> This is a stub issue to raise the discussion.

I tried looking up stub issue to find nothing. So if my comments don't align with this kind of discussion let me know.

Not sure I fully follow the discussion topic, but I am gonna throw out some things on my mind, as I think they are related, or hope so 🙃. I think having a more data driven design would be best. I am not fully sold on the whole parse/serialize store our data in post content model as well. I know that is not totally related to this topic, and would be clunky to model in a DB like MySQL, but I think it is worth thinking about more. I also probably do not understand fully why storing our data in HTML is necessary. I saw source of truth being used as the main point to have everything in post_content, but I think a strong block model could equally be used as a source of truth. The ultimate goal of Gutenberg to me is to provide an intuitive new block based web content building experience, and I think the limitations of HTML are starting to smear that vision. To me it seems like we are creating a new spec of HTML, and it is probably the most confusing part of the project to me.

As far as porting legacy post_content into block format, I wonder if it should be done at all. I think having the freeform block act like the current editor would be a good solution. It could act as a catch-all and feature the switch between Visual (TinyMCE) and Text. Any new Gutenberg blocks, by contrast, couldn't be text-edited directly in the editor. They are structured, so allowing people to break them does not make sense. We could set the default block to be the freeform block, and if no blocks are associated with a post (legacy post_content), we would bring the post_content in as a freeform block to start the editing experience. This would alleviate a whole set of problems around figuring out how to port legacy content.

This does, however, introduce a divergence from interoperability with other editors, and a divergence from WordPress's current ways. I no longer see this as an entirely bad thing, as I previously did. We will still output HTML for good interop, and realistically I am not sure that saving the data in our HTML, or creating complex HTML schema acceptances, saves us from this either. HTML is not a good structured data format, and I am not sold on why we should bind ourselves to it as one. By creating a data model that can be interacted with and is non HTML related, we also open up new possibilities like interacting with content via a CLI, or other non HTML related interfaces, because we have taken out that initial parse text step. Copy pasting from other sources into WordPress would most likely just go into a freeform block by default. The freeform block could serve as this base block which is just text of some kind or another, like the current WP editor.

> What happens, however, when one little letter means we have to clone the entire document in memory?

You only need to create a new reference to the changed nodes, not necessarily copy the entire tree in memory. I am assuming you are alluding to structural sharing. Can't tell if this is a rhetorical question or not. Rhetorical questions on the internet always confuse me to be honest, because I have no idea whether a question is being asked or if I am supposed to understand something.
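To make the structural-sharing point concrete, here is a minimal sketch in Python (chosen only for brevity; nothing here is Gutenberg code). The `Node` type and the `set_text` helper are hypothetical names invented for this illustration. Fixing a typo in one block re-creates only the nodes on the path to that block and reuses every other subtree by reference.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Immutable tree node: a block with some text and child blocks."""
    name: str
    text: str = ""
    children: Tuple["Node", ...] = ()


def set_text(root: Node, path: Tuple[int, ...], text: str) -> Node:
    """Return a new root in which the node at `path` has new text.

    Only the nodes along `path` are re-created; all sibling subtrees
    are reused by reference (structural sharing).
    """
    if not path:
        return replace(root, text=text)
    index, rest = path[0], path[1:]
    new_child = set_text(root.children[index], rest, text)
    children = root.children[:index] + (new_child,) + root.children[index + 1:]
    return replace(root, children=children)


doc = Node("post", children=(
    Node("text", "Hello wrold"),
    Node("gallery", children=(Node("image", "a.jpg"), Node("image", "b.jpg"))),
))

# Fix the typo in the first block; the gallery subtree is not copied.
fixed = set_text(doc, (0,), "Hello world")
```

Because `fixed.children[1] is doc.children[1]`, a UI layer that compares references can skip the gallery entirely, which is the kind of shallow comparison the issue asks for.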

These are all just questions and comments, and pretty scatterbrained on top of that. I have no idea what the best path forward is, but I definitely have concerns over the binding between HTML and our data model. The ideas presented above don't really diverge from the work that has been done already either, so most of Gutenberg could be used in its current form working off of some of these ideas instead.

@dmsnell
Member Author

dmsnell commented Jun 15, 2017

> I tried looking up stub issue to find nothing.

this just means it was late and I was tired but didn't want to entirely forget what I wanted to write so I posted a small snippet to revise later

> I am not fully sold on the whole parse/serialize store our data in post content model as well.

whether we end up staying on this trail or not, this is a basis on which the project has settled. even if we had JSON we'd likely still have a parse step. for now, this discussion is only intended to revolve around the point after loading the data model into memory so parsing is tangential.

> interacting with content via a CLI, or other non HTML related interfaces,

yes! also related is the playground I proposed in #1178. this kind of interaction depends on having a clear data model and API to interface it.

> Can't tell if this is a rhetorical question or not.

it's rhetorical to prompt the thought about performance. cloning an entire document is subpar. the purpose of this discussion is to ask how we can braid together the UI with the underlying model so that they can move in harmony, optimized in ways they can't if we isolate one concern from the other.


thanks for the feedback!

@BE-Webdesign
Contributor

> whether we end up staying on this trail or not, this is a basis on which the project has settled. even if we had JSON we'd likely still have a parse step. for now, this discussion is only intended to revolve around the point after loading the data model into memory so parsing is tangential.

Yup, not trying to derail from the path, but I thought it was at least worth bringing up some of these concerns. Even though we are talking about the context of after the data is parsed, the way we are persisting our data might impact the shape of whatever data structure we are going to interact with in the editor.

@dmsnell
Member Author

dmsnell commented Jun 16, 2017

> the way we are persisting our data might impact the shape of whatever data structure we are going to interact with in the editor.

hopefully not. if we do it right the editor won't have to care/won't care how the data is stored. in fact, hopefully someone will make their own parser/printer pair and save the data in some other new form via a plugin.
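One way to picture a swappable parser/printer pair is a small interface that the editor talks to, with the stored form hidden behind it. The sketch below is an assumption-laden illustration in Python: `Codec`, `JsonCodec`, `LineCodec`, and `append_block` are hypothetical names, and the `type|content` line format is made up purely for contrast. None of this is an actual Gutenberg interface.

```python
import json
from typing import List, Protocol, Tuple

Block = Tuple[str, str]  # (block type, content); a deliberately tiny model


class Codec(Protocol):
    """Hypothetical parser/printer pair for one storage format."""
    def parse(self, stored: str) -> List[Block]: ...
    def serialize(self, blocks: List[Block]) -> str: ...


class JsonCodec:
    """Stores blocks as a JSON array of objects."""
    def parse(self, stored: str) -> List[Block]:
        return [(b["type"], b["content"]) for b in json.loads(stored)]

    def serialize(self, blocks: List[Block]) -> str:
        return json.dumps([{"type": t, "content": c} for t, c in blocks])


class LineCodec:
    """Stores one block per line in a made-up 'type|content' format."""
    def parse(self, stored: str) -> List[Block]:
        blocks = []
        for line in stored.splitlines():
            block_type, _, content = line.partition("|")
            blocks.append((block_type, content))
        return blocks

    def serialize(self, blocks: List[Block]) -> str:
        return "\n".join(f"{t}|{c}" for t, c in blocks)


def append_block(stored: str, codec: Codec, block: Block) -> str:
    """Editor-side operation: works on the model, never on the stored form."""
    blocks = codec.parse(stored)
    blocks.append(block)
    return codec.serialize(blocks)
```

`append_block` behaves the same with either codec, which is the sense in which the editor "won't care" how the data is stored.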

@JJJ
Contributor

JJJ commented Jun 23, 2017

I feel very strongly that inserting HTML comments into post_content is a decision that would be deeply regrettable later. It's extremely clever, and neat, but I am afraid of what problems plugins, themes, and core formatting functions will uncover.

I'm late to the "page builder" pro-plugin game (so maybe what I'm about to propose has already been done) but I've always imagined when WordPress finally adopted this approach in core, that it would look something like the following:

Post post_type = 'post', ID = 1, post_type_supports = gutenberg
|-- Title post_title
|-- Blocks post_parent = 1
|--|-- Text post_type = 'block_text'
|--|-- Quote post_type = 'block_quote'
|--|-- Gallery post_type = 'block_gallery'
|--|-- Text post_type = 'block_text'

This terrible rendition of a relationship tree tries to poorly depict several architectural advantages of what's already inside of WordPress:

  • wp_posts:menu_order determines the order of blocks
  • wp_posts:post_type is indexed, so a LIKE query on the block_ prefix limited to post_parent would perform OK (doesn't have to be like this obviously, just the easiest way)
  • Blocks would inherently have their own optional meta-data, status, revision history, etc...
  • gutenberg could be used as a post_type_supports flag, restricting child-block queries to only the types that support them
  • Blocks could be hierarchical, paving the way for things like fieldsets, fields, and other creative uses
  • Blocks could be related across posts with taxonomies (footnotes, cross-posts, etc...)
  • Blocks could easily be copied from post to post
  • wp_posts:post_content for the primary post ID = 1 would just be empty, which hints that child blocks may exist and need to be queried for, and leaves both Visual & Text editors without a gnarly mess to present to users or browsers

The other, perhaps less important reason – vanity. HTML comments for this is reminiscent of the Microsoft Word XML soup we all hated 20 years ago. We'd be deciding that the very-best we can do is invest heavily in a bespoke data format with a loose set of conventions chock-full of compromises due to legacy schema restrictions, and we know how that song goes – it's not good for anyone, and we'd have a hard time being convinced that it is a good idea if it weren't our idea.

This isn't to say that marrying it directly to the wp_posts database schema is ultimately what's best – that's arguably less portable and more bespoke to not-WordPress – only that historically, having mark-up decisions embedded into post_content has not made anything easier, so doubling down on what we know we don't like isn't something I'd imagined, and I wouldn't be surprised if I'm alone.

With my bad tree above, we could even reconcile the tree and compile it back into or out-of post_content to match the current HTML comment results. At least then there's a workable data layer, and not everything is tied to the same text blob.
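As a rough sketch of the tree and the compile step described above, the Python below models a few wp_posts columns in memory, selects a post's child blocks in menu_order, and flattens them into one string. `PostRow`, `child_blocks`, and `compile_post_content` are hypothetical names, and the comment delimiters are a simplified, assumed syntax rather than whatever Gutenberg actually writes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PostRow:
    """Subset of wp_posts columns used by the tree sketched above."""
    ID: int
    post_type: str
    post_parent: int
    menu_order: int
    post_content: str


def child_blocks(rows: List[PostRow], parent_id: int) -> List[PostRow]:
    """Blocks of one post: children with a block_ type, in menu_order."""
    blocks = [
        r for r in rows
        if r.post_parent == parent_id and r.post_type.startswith("block_")
    ]
    return sorted(blocks, key=lambda r: r.menu_order)


def compile_post_content(rows: List[PostRow], parent_id: int) -> str:
    """Flatten child blocks into a single post_content-style string.

    The <!-- wp:name --> delimiters are an assumed, simplified syntax.
    """
    parts = []
    for block in child_blocks(rows, parent_id):
        name = block.post_type[len("block_"):]
        parts.append(f"<!-- wp:{name} -->{block.post_content}<!-- /wp:{name} -->")
    return "\n".join(parts)


rows = [
    PostRow(1, "post", 0, 0, ""),
    PostRow(4, "block_gallery", 1, 2, "<ul></ul>"),
    PostRow(2, "block_text", 1, 0, "<p>Intro</p>"),
    PostRow(3, "block_quote", 1, 1, "<blockquote>Hi</blockquote>"),
    PostRow(9, "block_text", 7, 0, "<p>Other post</p>"),
]
```

Here `child_blocks(rows, 1)` yields the rows with IDs 2, 3, and 4 in that order, and `compile_post_content(rows, 1)` produces one delimited line per block.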

Getting more people on board with the current approach will, I think, take some marketing & championing, with very clear & evident pro vs. con comparisons.

@benhuson

benhuson commented Jun 23, 2017

@JJJ I like the idea of managing via a separate schema and compiling it back to the post_content using some sort of comment markup. That way you'd get all the benefits of revisions and meta data like you mention, but compiling back into post content simplifies the issues that would arise from separate schema.

I have a plugin (definitely not extensive enough to be classified as a page builder) which manages additional content blocks as a separate post type, with a parent ID to assign them to a post, ordered by menu_order etc. I am happy with the way it works, managing blocks as a separate post type and allowing each block to be assigned separate templates, which you currently define through the theme in a similar way to page templates. It's a framework for creating block-like content but without the visual admin editor. It's just posts with templates, but it obviously allows for meta data, featured images etc. The main issues I have found with this approach are:

  1. Search: Manipulating search queries to also search this content and return links to the parent page without duplicate results. It's possible but a bit of a pain when used in conjunction with other plugins that manipulate search queries. My fudge so far has been to compile content into a meta field of the parent and make that searchable, or hook into one of the alternative search plugins like Relevanssi to include the child block content when it builds its index.

  2. More queries: It's not too bad if you are building a standard long page, where the children can be fetched with just one additional query, but I have experimented with nested blocks stored as post children and grandchildren etc. and the queries can start to mount up. For that reason I quite like the idea of a schema to store the data, but compiling it in some way for efficiency.

@westonruter
Member

@JJJ Here are the reasons that come to mind as to why HTML comments are used for serializing blocks inline into post content:

  1. Shortcodes were extensively used in unanticipated places, including HTML attributes, because the bracket notation didn't enforce any limitations. By using HTML comments, however, there is a guarantee that the tags will only ever appear outside of HTML elements and thus can be reliably parsed.
  2. There are existing HTML comments that are already used in post content in WordPress, including <!--more--> and <!--noteaser-->. So blocks are following that pattern.
  3. Plugins can introduce their own blocks. When these plugins are disabled, any of the plugin's blocks will then just be hidden, or rather the underlying fallback content contained inside the block will then be displayed. This is in stark contrast to shortcodes which then show up everywhere when the shortcode is no longer recognized.
  4. HTML comments are resilient when post content is manipulated by other editors, including the classic WP editor or editors in other apps. They won't get stripped out, though they also will be invisible.
  5. The post content remains the source of truth, and changes to blocks will be contained in revisions.
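Points 1 and 3 can be illustrated with a small sketch. It assumes a simplified delimiter syntax of the form `<!-- wp:name -->inner<!-- /wp:name -->`; that syntax, the `render` helper, and the `acme/map` block name are all hypothetical and only approximate what the project serializes. Known blocks go through a renderer, while a block whose plugin is gone falls back to the plain HTML it wraps. In a browser no stripping is even needed, since the comments are simply invisible.

```python
import re
from typing import Callable, Dict

# Assumed, simplified delimiter syntax: <!-- wp:name -->inner<!-- /wp:name -->
BLOCK_RE = re.compile(r"<!-- wp:([\w/-]+) -->(.*?)<!-- /wp:\1 -->", re.DOTALL)


def render(post_content: str, renderers: Dict[str, Callable[[str], str]]) -> str:
    """Render known blocks; keep the fallback HTML of unknown ones as-is."""
    def replace_block(match: "re.Match[str]") -> str:
        name, inner = match.group(1), match.group(2)
        renderer = renderers.get(name)
        return renderer(inner) if renderer else inner

    return BLOCK_RE.sub(replace_block, post_content)


content = (
    "<!-- wp:text --><p>Hello</p><!-- /wp:text -->"
    "<!-- wp:acme/map --><p>Map of Paris</p><!-- /wp:acme/map -->"
)

# "acme/map" stands in for a block from a plugin that has been disabled,
# so no renderer is registered for it.
html = render(content, {"text": lambda inner: f'<div class="text">{inner}</div>'})
```

Contrast this with an unrecognized shortcode, which would show up verbatim in the rendered page.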

That being said, I don't think it's an either/or but a both/and proposition in terms of whether or not blocks are stored inline with post content or externally in a custom post type.

As I tweeted earlier today, the post content is not the only place that blocks will be stored. This is just one serialization option. Block attributes could be stored elsewhere instead. For example, an Excerpt block would store data in the post_excerpt field (see #1288 (comment)), whereas a Featured Image block would store data in postmeta. Blocks should be able to be stored externally in a block custom post type for re-use across multiple posts and also in “dynamic sidebars”. See also #1224 (comment).

@dmsnell
Member Author

dmsnell commented Jun 27, 2017

Today while working on the <!--more--> tag I was reminded of why the coupling can be useful. We currently lack a correspondence between input post_content and output post_content. Instead, we get a parsed input, then forget entirely about the input, then generate a new output. It would be nice if we could preserve certain things, such as the original block if we failed-over to a fallback block.
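A minimal sketch of what such a correspondence could look like, assuming each parsed block keeps the exact text it came from: untouched blocks are written back byte for byte, and only edited blocks are regenerated. The `Block` type, the toy `name: content` line format, and the `parse`, `edit`, and `serialize` helpers are all hypothetical and written in Python only for brevity.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """Parsed block that remembers the exact text it was parsed from."""
    name: str
    content: str
    source: Optional[str] = None  # original serialization; None once edited


def parse(post_content: str) -> List[Block]:
    """Toy parser: one block per line in a made-up 'name: content' form."""
    blocks = []
    for line in post_content.split("\n"):
        name, _, content = line.partition(":")
        blocks.append(Block(name.strip(), content.strip(), source=line))
    return blocks


def edit(block: Block, content: str) -> Block:
    """Editing drops the remembered source, so the block is re-serialized."""
    return replace(block, content=content, source=None)


def serialize(blocks: List[Block]) -> str:
    """Reuse the original text for untouched blocks; regenerate the rest."""
    return "\n".join(
        b.source if b.source is not None else f"{b.name}: {b.content}"
        for b in blocks
    )


original = "text:Hello\nquote :   odd   spacing\n???broken block"
blocks = parse(original)
```

With this shape, `serialize(parse(original)) == original` holds even for the oddly spaced and the malformed line, and editing the first block changes only that block's output.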

@mtias
Member

mtias commented Jun 28, 2017

Repeating here some of the Twitter comments I left.

Our approach—as outlined in the technical overview introduction—was to augment the existing data format with the notion and intention of blocks in a way that didn't break the decade and a half fabric of content that WordPress has provided. In other terms, this optimizes for a format that prioritizes human readability (the html document of the web) and easy-to-render-anywhere over a machine convenient file (JSON in post-meta) that benefits the editing context primarily.

Like @westonruter mentions, this also gives us the flexibility to store those blocks that are inherently separate from the content stream (reusable pieces like widgets or small post type elements) elsewhere, and just keep token references for their placement.


8 participants