Parser: Start tokenizing HTML within the blocks #2286
Having some trouble testing this branch because of the PHP parser. Do we need to regenerate it or something?
It seems like the PHP parser isn't working with
You can regenerate all the test fixtures by
Thanks for letting me know! In case it's not clear, the entire purpose is to return the parsed data structure as well as the source text which led to the parse.
Thanks again!
This is something I've wanted to see for a while... the need is a bit less immediate due to #1929 and follow-up changes, but I expect this can pave the way forward for parsing nested blocks including shortcodes in a more well-defined manner. I worry that with 10+ years of history in We'll need a new release of
No, and I naively tried to run the command you provided, but it failed on account of something in
That should be covered under https://github.com/WordPress/gutenberg/blob/master/CONTRIBUTING.md#php-testing
related: @see #2210
This change adds a behavior which was present in the original parser but
was removed for some reason when it was pulled directly into Gutenberg.
That behavior is producing a tree of tokenized HTML inside the blocks so
that we can work with a simpler data structure instead of having to
repeatedly parse and validate HTML strings from within the components and
throughout the different paths in the editor.
This change adds a new field to the parse output: `children`, which
contains the tree as a list of objects, each child being able to reflect
the structure of its parent with the same tree data structure. HTML tags
appear as `type: HTML_Tag` with `name` equal to the tag name. Otherwise
the HTML tags mimic the structure of blocks at the outermost level.
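To make the shape concrete, here is a rough sketch of what the parse output for one block might look like. Only `children`, `type: HTML_Tag`, `name`, and `attrs` come from the description; the block-level field names (`blockName`, `rawContent`), the `Text` node shape, and the `tagNames` helper are assumptions made up for illustration, not the parser's actual output.

```javascript
// Hypothetical parse output for a single block. Only `children`,
// `type: 'HTML_Tag'`, `name`, and `attrs` are taken from the description;
// `blockName`, `rawContent`, and the `Text` node shape are assumed.
const block = {
	blockName: 'core/button',
	attrs: { align: 'center' },
	rawContent: '<a href="https://example.com"><strong>Read</strong> more</a>',
	children: [
		{
			type: 'HTML_Tag',
			name: 'a',
			attrs: { href: 'https://example.com' },
			children: [
				{
					type: 'HTML_Tag',
					name: 'strong',
					attrs: {},
					children: [ { type: 'Text', value: 'Read' } ],
				},
				{ type: 'Text', value: ' more' },
			],
		},
	],
};

// Illustrative helper: collect tag names depth-first. Every level uses the
// same node shape, so a single recursive function covers the whole tree.
function tagNames( nodes ) {
	return nodes.flatMap( ( node ) =>
		node.type === 'HTML_Tag'
			? [ node.name, ...tagNames( node.children || [] ) ]
			: []
	);
}

console.log( tagNames( block.children ) ); // [ 'a', 'strong' ]
```

Because each child mirrors its parent's structure, the same traversal would keep working if nested blocks later appear in `children`, as long as it branches on `type`.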
Future work will replicate this pattern to add in nested blocks (also in
the original parser but removed when it was brought into Gutenberg).
Nested blocks would appear in the `children` array and contain full
block structure, nested intuitively and without having an exceptional
structure or indication.
It's my hope that components, queries, and validation can take place in the
easier/faster tree data structure instead of via the parsed HTML strings.
This could mean that attribute access is achieved with expressions as
basic as `block.children[ 0 ].attrs.href` instead of writing the equivalent
query. Further, I hope that one day we can add children not by rendering
HTML strings, but by working with the data structure itself and serializing
back to the HTML string only when we need to. This would be a very good
step if we wanted to open up the ability to store on non-stringy backends
(store elsewhere than `post_content` via a plugin: Simperium,
JSON-in-`post_meta`, a database table directly, some custom external
service…). Should also pave the way for other editors conforming to the
Gutenberg spec which may not have as easy of a time parsing HTML as the
browser does.
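As a minimal sketch of that idea: read an attribute directly from the tree, add a child by editing the data structure, and serialize to HTML only at the end. The `Text` node shape and the `escapeHtml` and `serialize` helpers are hypothetical and not part of this PR; void elements such as `<img>` are deliberately left out to keep the example short.

```javascript
// Assumed node shapes: { type: 'HTML_Tag', name, attrs, children } follows
// the description; { type: 'Text', value } is a hypothetical text node.
const linkBlock = {
	children: [
		{
			type: 'HTML_Tag',
			name: 'a',
			attrs: { href: 'https://example.com' },
			children: [ { type: 'Text', value: 'Read more' } ],
		},
	],
};

// Attribute access is plain property access, no selector query required.
const href = linkBlock.children[ 0 ].attrs.href;

// Escape characters that are unsafe in text content or attribute values.
function escapeHtml( value ) {
	return String( value )
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' );
}

// Hypothetical serializer: walk the tree and emit an HTML string.
// It does not handle void elements or comments.
function serialize( nodes ) {
	return nodes
		.map( ( node ) => {
			if ( node.type !== 'HTML_Tag' ) {
				return escapeHtml( node.value );
			}
			const attrs = Object.entries( node.attrs || {} )
				.map( ( [ key, value ] ) => ` ${ key }="${ escapeHtml( value ) }"` )
				.join( '' );
			const inner = serialize( node.children || [] );
			return `<${ node.name }${ attrs }>${ inner }</${ node.name }>`;
		} )
		.join( '' );
}

// Add a child by editing the data structure rather than an HTML string.
linkBlock.children.push( {
	type: 'HTML_Tag',
	name: 'em',
	attrs: {},
	children: [ { type: 'Text', value: 'new & improved' } ],
} );

const html = serialize( linkBlock.children );
console.log( href ); // https://example.com
console.log( html ); // <a href="https://example.com">Read more</a><em>new &amp; improved</em>
```

Since the tree is plain data, the same structure could be passed through `JSON.stringify` for a non-string backend, with `serialize` called only where an HTML string is actually required.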
cc: @jleandroperez
Testing
Since so many of our tests check for identical equality of output against hand-written fixtures (instead of testing for the relevant properties of some behavior under test), this PR breaks a lot of tests. At the moment I don't really have the will or the time to fix them 🙃. Can review next week.
You can open the editor and work as usual. Since we're just adding a new `children` property to the parser output there should be no breakage once the data structure is in Gutenberg.