[lexical-markdown] Bug Fix: support link and inline code text formats #7004

Open
wants to merge 24 commits into
base: main

AlessioGr
Contributor

@AlessioGr AlessioGr commented Dec 30, 2024

Fixes #5148. Additionally, it fixes an issue where formatted inline code, e.g.

**`code`**

is not exported to markdown correctly after it has been imported.

This PR refactors the logic for applying text match and text format transformers to enable support for nested text formats within text match transformers, such as link nodes.

Unlike the previous attempt to fix it, this change does not include any node-specific logic. It fixes the root cause of the issue by ensuring that nested combinations of text match and text format transformers are applied outermost-first.

Before

CleanShot.2024-12-30.at.11.56.42.mp4

After

CleanShot.2024-12-30.at.11.56.03.mp4

The Problem

Previously, the import process for text transformers roughly followed this sequence:

ElementTransformers ($importBlocks) => TextFormatTransformers => (only if no text format was found) TextMatchTransformers

For link nodes containing formatted text, the process failed as follows:

  1. $importBlocks is executed.
  2. Code markdown is found within the link => the code text format transformer runs => a code text node is created.
  3. The input text is now split into 3 nodes: normal text, code text, and normal text. Because the link markdown is spread across several nodes, the link text match transformer can no longer recognize it and never creates a link node.
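To make the failure concrete, here is a minimal TypeScript sketch. The `Segment` type, the `LINK` pattern, and `splitOnInlineCode` are simplified stand-ins invented for illustration, not the actual `@lexical/markdown` implementation. They only mimic the old ordering, where the code format is applied before the link pattern is tested against each resulting node.

```typescript
// Simplified stand-in for a text node; this is NOT Lexical's TextNode.
type Segment = { text: string; format: 'none' | 'code' };

// Hypothetical link pattern, loosely modelled on a markdown link.
const LINK = /\[([^\[\]]+)\]\(([^()\s]+)\)/;

// Old order, step 2: the code format runs first and splits the text
// into plain / code / plain segments.
function splitOnInlineCode(text: string): Segment[] {
  const match = /`([^`]+)`/.exec(text);
  if (match === null) {
    return [{ text, format: 'none' }];
  }
  const before = text.slice(0, match.index);
  const after = text.slice(match.index + match[0].length);
  const segments: Segment[] = [
    { text: before, format: 'none' },
    { text: match[1], format: 'code' },
    { text: after, format: 'none' },
  ];
  // Drop empty segments, e.g. when the code span starts the string.
  return segments.filter((s) => s.text.length > 0);
}

const input = 'See [`text`](https://lexical.dev) here';
const segments = splitOnInlineCode(input);

// Old order, step 3: the link pattern is tested per segment, so it
// never sees the complete "[...](...)" sequence.
const linkFoundAfterSplit = segments.some((s) => LINK.test(s.text));
const linkFoundInOriginal = LINK.test(input);

console.log({ linkFoundAfterSplit, linkFoundInOriginal });
```

In this toy model the link pattern matches the original string but none of the three fragments, which is the same shape of failure as described in the list above.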

Initially, I attempted to solve this issue by adjusting the sequence to prioritize text match transformers:

ElementTransformers ($importBlocks) => TextMatchTransformers => TextFormatTransformers

While this resolved the issue with nested formats, it introduced a new problem in scenarios where links were wrapped by text format markdown, like this:

Text **boldstart [text](https://lexical.dev) boldend** text

Now, the link is created first and we get normal text, link, normal text. However, the bold text transformer could no longer identify and apply formatting to the entire outer bold range.

The Solution

Text format transformers already include logic to identify the outermost match, allowing them to handle scenarios like:

One **two _three_ four**

In this case, the bold transformer runs first, followed by the italic transformer, ensuring proper formatting.

However, text match transformers currently lack similar logic. Consider this example:

Text **boldstart [`text`](https://lexical.dev) boldend** text

The existing sequence processes it as:

  1. Bold text
  2. Code text
  3. Link

However, to achieve correct results, the sequence should be:

  1. Bold text
  2. Link
  3. Code text

To address this, the PR introduces logic for identifying the outermost match across both text match and text format transformers.

With this change, text match and text format transformers are treated as equals in priority, allowing their results to be compared directly. This ensures that the outermost match—whether from a text match or a text format transformer—is correctly identified and then applied, enabling seamless handling of nested text transformations.

Implementation details

  1. Find (not apply) outermost text match
  2. Find (not apply) outermost text format
  3. Determine if found text match or text format is the outermost match
  4. Apply said text match or text format
  5. Repeat this for the matched node and, if the text was split, for the nodes before and after it, until no more text matches or text formats are found
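The steps above can be sketched on plain strings. This is a toy model under stated assumptions, not the code in this PR: `OutNode`, `Candidate`, `findCandidates`, `pickOutermost`, and `transform` are hypothetical names, only bold, inline code, and links are handled, and "outermost" is approximated as "earliest start, then longest match".

```typescript
// Simplified output tree; these are NOT Lexical node classes.
type OutNode =
  | { type: 'text'; text: string; formats: string[] }
  | { type: 'link'; url: string; children: OutNode[] };

// One candidate match, from either kind of transformer.
interface Candidate {
  start: number; // index of the first matched character
  end: number; // index just past the match
  // Builds the nodes for the matched range, given inherited formats.
  build: (formats: string[]) => OutNode[];
}

// Steps 1 and 2: find (but do not apply) the first match of each kind.
function findCandidates(text: string): Candidate[] {
  const found: Candidate[] = [];
  const bold = /\*\*(.+?)\*\*/.exec(text);
  if (bold !== null) {
    const inner = bold[1];
    found.push({
      start: bold.index,
      end: bold.index + bold[0].length,
      build: (formats) => transform(inner, [...formats, 'bold']),
    });
  }
  const code = /`([^`]+)`/.exec(text);
  if (code !== null) {
    const inner = code[1];
    found.push({
      start: code.index,
      end: code.index + code[0].length,
      // Code content is a leaf: it is not transformed any further.
      build: (formats) => [
        { type: 'text', text: inner, formats: [...formats, 'code'] },
      ],
    });
  }
  const link = /\[([^\[\]]+)\]\(([^()\s]+)\)/.exec(text);
  if (link !== null) {
    const label = link[1];
    const url = link[2];
    found.push({
      start: link.index,
      end: link.index + link[0].length,
      build: (formats) => [
        { type: 'link', url, children: transform(label, formats) },
      ],
    });
  }
  return found;
}

// Step 3: text matches and text formats compete on equal terms.
function pickOutermost(candidates: Candidate[]): Candidate | null {
  let best: Candidate | null = null;
  for (const c of candidates) {
    if (
      best === null ||
      c.start < best.start ||
      (c.start === best.start && c.end > best.end)
    ) {
      best = c;
    }
  }
  return best;
}

// Steps 4 and 5: apply the winner, then repeat on the text before it,
// inside it (via build), and after it.
function transform(text: string, formats: string[] = []): OutNode[] {
  if (text.length === 0) {
    return [];
  }
  const outer = pickOutermost(findCandidates(text));
  if (outer === null) {
    return [{ type: 'text', text, formats }];
  }
  return [
    ...transform(text.slice(0, outer.start), formats),
    ...outer.build(formats),
    ...transform(text.slice(outer.end), formats),
  ];
}

const result = transform(
  'Text **boldstart [`text`](https://lexical.dev) boldend** text',
);
console.log(JSON.stringify(result, null, 2));
```

For the example string, this sketch applies bold first, then the link, then code: the link node ends up between two bold text nodes, and its single child is a text node carrying both the bold and the code format.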


vercel bot commented Dec 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
lexical ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 28, 2025 6:14am
lexical-playground ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 28, 2025 6:14am

@facebook-github-bot added the CLA Signed label Dec 30, 2024

github-actions bot commented Dec 30, 2024

size-limit report 📦

Path Size
lexical - cjs 29.07 KB (0%)
lexical - esm 28.86 KB (0%)
@lexical/rich-text - cjs 38.04 KB (0%)
@lexical/rich-text - esm 30.92 KB (0%)
@lexical/plain-text - cjs 36.55 KB (0%)
@lexical/plain-text - esm 28.22 KB (0%)
@lexical/react - cjs 39.85 KB (0%)
@lexical/react - esm 32.28 KB (0%)

@AlessioGr
Contributor Author

Just squeezed in another fix for formatted inline code markdown, e.g.

**`code`**

Didn't want to open a separate PR, as this one is a relatively large refactor and I wanted to make sure this fix is compatible

@AlessioGr changed the title from "[lexical-markdown] Bug Fix: support link text formats" to "[lexical-markdown] Bug Fix: support link and inline code text formats" Jan 7, 2025
etrepum
etrepum previously approved these changes Jan 13, 2025
Collaborator

@etrepum etrepum left a comment

I didn't look too carefully at the logic in the while loop other than to confirm that it should still terminate. I think the unit test coverage is probably sufficient to show that it's not more wrong than the status quo.

packages/lexical-markdown/src/MarkdownExport.ts (outdated)
} else {
return linkContent;
}
? `[${textContent}](${node.getURL()} "${title}")`
Collaborator

Presumably this title should be escaped because there could be embedded " and/or )?

Contributor Author

Done - I escaped this with some simple regex.

Should we consider a library like dompurify to protect against XSS attacks here? While it's just markdown, XSS could still be an issue if (and depending on how) this is rendered to HTML.
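For illustration, the kind of title escaping discussed here might look like the sketch below. `escapeLinkTitle` and `exportLink` are hypothetical helpers, not functions from `@lexical/markdown`, and this is not the regex that was actually committed. It assumes a double-quoted link title, where a backslash-escaped `"` is allowed and a `)` needs no escaping. Note that the thread goes on to say the escaping change was reverted, because escaped markdown was not un-escaped on import.

```typescript
// Hypothetical helper, not part of @lexical/markdown.
// Backslash-escapes backslashes and double quotes so the title cannot
// terminate the surrounding "..." early.
function escapeLinkTitle(title: string): string {
  return title.replace(/[\\"]/g, '\\$&');
}

// Hypothetical export helper mirroring the shape of the reviewed line.
function exportLink(text: string, url: string, title: string | null): string {
  return title !== null && title.length > 0
    ? `[${text}](${url} "${escapeLinkTitle(title)}")`
    : `[${text}](${url})`;
}

console.log(exportLink('docs', 'https://lexical.dev', 'say "hi" (now)'));
```

Escaping on export only round-trips if the import side removes the backslashes again, which is the gap described later in this thread.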

Contributor Author

@AlessioGr AlessioGr Jan 21, 2025

I have reverted the escaping change. This broke a unit test and did not work when the markdown was re-imported to lexical.

The latter is a separate issue I have experienced, where escaped markdown is not un-escaped properly when imported. E.g. \*text\* is imported as <span>\*text\*</span> instead of <span>*text*</span>

I think fixing this properly may not be trivial and is out of scope for this PR anyway, since this PR did not introduce the unescaped title output.

Comment on lines 67 to 69
result.nodeAfter &&
$isTextNode(result.nodeAfter) &&
!result.nodeAfter.hasFormat('code')
Collaborator

With how often this expression is repeated it's probably worth making a function to cover $isTextNode(node) && !node.hasFormat('code'). Checking for non-null/undefined is redundant because $isTextNode already does that

Contributor Author

done! I went for canContainTransformableMarkdown as that conveys the intent and makes the code where it's used more readable
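A rough sketch of what such a helper could look like, based only on the expression quoted in the review comment. `FakeTextNode` and `isFakeTextNode` are stand-ins so the snippet runs on its own; the real helper would presumably use `TextNode` and `$isTextNode` from `lexical`, and its exact signature is an assumption here.

```typescript
// Minimal stand-in for Lexical's TextNode, for illustration only.
class FakeTextNode {
  readonly formats: ReadonlySet<string>;

  constructor(formats: string[]) {
    this.formats = new Set(formats);
  }

  hasFormat(format: string): boolean {
    return this.formats.has(format);
  }
}

// Stand-in for $isTextNode: also rejects null and undefined, so callers
// do not need a separate non-null check.
function isFakeTextNode(node: unknown): node is FakeTextNode {
  return node instanceof FakeTextNode;
}

// Text inside a code-formatted node must stay literal, so only non-code
// text nodes are candidates for further markdown transforms.
function canContainTransformableMarkdown(node: unknown): node is FakeTextNode {
  return isFakeTextNode(node) && !node.hasFormat('code');
}

console.log(canContainTransformableMarkdown(new FakeTextNode(['bold'])));
console.log(canContainTransformableMarkdown(new FakeTextNode(['code'])));
console.log(canContainTransformableMarkdown(undefined));
```

With a helper like this, the repeated three-line condition from the review collapses into a single call such as `canContainTransformableMarkdown(result.nodeAfter)`.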

Contributor Author

While doing this, found another area that I was able to optimize:

CleanShot 2025-01-20 at 23 06 22@2x

I think we can move the remaining logic up to the $transform call out of the editor.update() call as well - what do you think?

Labels
CLA Signed, extended-tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Markdown import/export of formatted link text
3 participants