Support custom extensions "interrupting" built-in tokens #3435
The way we interrupt a paragraph is by clipping `src` when passing it into the tokenizer (Line 235 in 2124b5d).
We could do something similar with the other tokenizers, although I'm not sure that's needed if we just say built-in tokens take precedence over custom tokens. In well-formatted markdown, every block token should be separated by a blank line. The only reason `start` is actually needed is for inline tokens.
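The clipping idea described above can be sketched roughly like this (a hypothetical helper, not marked's actual internals; `clipForParagraph` and `startFns` are illustrative names): find the earliest index at which any extension's `start()` matches and cut `src` there, so the paragraph tokenizer cannot swallow the extension's token.

```javascript
// Hypothetical sketch of paragraph clipping (not marked's real code).
// startFns are extension start() functions: they return the index in src
// where their token might begin, or -1 / undefined if it doesn't appear.
function clipForParagraph(src, startFns) {
  let cut = Infinity;
  for (const start of startFns) {
    const idx = start(src);
    if (typeof idx === 'number' && idx >= 0 && idx < cut) {
      cut = idx; // earliest possible extension token wins
    }
  }
  // The paragraph tokenizer only ever sees the text before the cut.
  return cut === Infinity ? src : src.slice(0, cut);
}
```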
For example, the katex extension's block tokenizer does not have a `start` function, because we expect a blank line before it, so even a paragraph takes precedence: https://github.com/UziTech/marked-katex-extension/blob/main/src/index.js#L63
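A minimal sketch of that pattern (a hypothetical `$$ … $$` block extension for illustration, not the real katex tokenizer): omitting `start` means the token can never interrupt a paragraph, so it only matches when it begins a new block, i.e. after a blank line.

```javascript
// Hypothetical block-level extension with no start() function.
// Without start(), a paragraph is never clipped for this token; the
// tokenizer only fires when the token opens a fresh block.
const mathBlock = {
  name: 'mathBlock',
  level: 'block',
  // no start() here, on purpose
  tokenizer(src) {
    const match = /^\$\$\n([\s\S]+?)\n\$\$(?:\n+|$)/.exec(src);
    if (match) {
      return {
        type: 'mathBlock',
        raw: match[0],
        text: match[1].trim()
      };
    }
  },
  renderer(token) {
    return `<pre class="math">${token.text}</pre>\n`;
  }
};
```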
I remember. I wrote that. 😜
Pretty markdown might, but the specs still make it clear that it is valid to place certain block tokens directly against each other (see the demo example).
Remember, we have separate handling for paragraphs and inline text. Paragraphs are clipped by block tokens (Line 237 in 2124b5d, Line 436 in 2124b5d).
If we did, I think it would only need to be tables and blockquotes to keep with the GFM spec. The other block tokens either have a clear ending symbol (fences) or are allowed to just absorb the block tokens (lists). Maybe that's not too bad?
The block tokenizer `start` function is not needed if you don't need to interrupt a paragraph. Paragraphs are automatically interrupted by blank lines.
What pain point are you perceiving?
Not sure of the best way to describe this. So currently, custom extensions have the `start` property, which we use to interrupt the `paragraph` element. But there are other tokens that are interruptible according to the CommonMark/GFM spec: for example, GFM tables must end when they encounter another block-level token.

The difficulty comes with enforcing that rule for custom extensions. Say I make a new block-level token via custom extensions.
If this were placed immediately after a table, the table would just consume it, because it does not interact with the `start` property in the same way that `paragraph` does. You could roll your own table tokenizer that does nothing except add a few more characters to the rules regex, but this seems like a lot of effort just to make your extension compatible with the GFM rules.

Describe the solution you'd like
I really don't know how this would be implemented, but the desire would be a way for an extension to signal which tokens it can interrupt. Or, maybe better the other way around, allow a token to specify which types of other tokens can interrupt it.
One thing to consider is that each token is also a little different in terms of the points at which it can be interrupted. Blockquotes can only be interrupted during the "lazy continuation" step. Paragraphs can be interrupted at any time. Tables can only be interrupted if the line starts without `|`. Not every token can be interrupted by the same kinds of tokens.

I kind of hacked my way around this for tables using my own extension, marked-extended-tables, by allowing the user to input a "termination" regex that is appended to the tokenizer and causes the table to stop lexing on that line.
https://github.com/calculuschild/marked-extended-tables/blob/9e56b24598e07de71e225d6c50a50d40c366965f/src/index.js#L23-L25
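The "termination regex" idea linked above can be sketched like this (illustrative names only; this is a simplification of the linked code, not a copy of it): the user supplies patterns, and the table-style consumer stops taking lines as soon as one of them matches.

```javascript
// Hypothetical sketch of user-supplied termination regexes: rows are
// consumed until a line matches one of the patterns, at which point the
// table stops lexing and the interrupting token gets its chance.
function makeRowConsumer(terminationRegexes) {
  return function takeRows(lines) {
    const rows = [];
    for (const line of lines) {
      if (terminationRegexes.some(re => re.test(line))) {
        break; // a custom token interrupts the table here
      }
      rows.push(line);
    }
    return rows;
  };
}
```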
Not sure if this is the easiest way to go about it, but the trickiest part is somehow applying that to the built-in tokens without just ending up rewriting every tokenizer anyway.
Mostly I'm just kind of stumped on any better way to do this.