Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Math is not rendered in Library Card previews #1508

Open
Tracked by #1511 ...
ormsbee opened this issue Nov 18, 2024 · 8 comments
Open
Tracked by #1511 ...

Math is not rendered in Library Card previews #1508

ormsbee opened this issue Nov 18, 2024 · 8 comments
Labels
bug Report of or fix for something that isn't working as intended

Comments

@ormsbee
Copy link
Contributor

ormsbee commented Nov 18, 2024

Problems and HTML blocks can both take advantage of MathJax to render math. This works in the preview sidebar, but the raw markup is presented in the preview description on the card:

Screenshot 2024-11-18 at 11 13 23 AM

This makes the preview much less useful.

Suggested Approach: MathML conversion

MathJax is slow to load, and has long been a performance pain point in the courseware. While it has some very nice features, we don't need that much in the preview cards. Meanwhile, MathML has now become broadly supported by modern browsers.

Markup for MathJax in platform is supposed to use the following guidelines:

  • Inline equations use \( equation \) or [mathjaxinline] equation [/mathjaxinline] notation.
  • standalone equations use \[ equation \] or [mathjax] equation [/mathjax] notation.

The input text format for equation can be LaTeX or AsciiMath. Python libraries exist to convert from those formats to MathML:

Using that, we should be able to convert the content to MathML when search indexing happens for a document in edx-platform. So something like:

pattern = re.compile(r'\[mathjaxinline\](.*?)\[\/mathjaxinline\]')

def repl(match):
    return convert(match.group(1))

block_data[Fields.description] = re.sub(pattern, repl, description)

... but generalized to handle the different formats and tag delimiters.

Acceptance criteria:

  1. Prioritize LaTeX over AsciiMath if there's any chance of ambiguity, since it seems to be the more commonly format.
  2. Must fail gracefully. If neither converter works for a particular piece of text, the search preview should return the original text, stripped of the mathjax delimiters. So [mathjaxinline]{unparseable}[/mathjaxinline] would return as {unparseable}. We never want to break search indexing, and showing the delimiters in the preview is unhelpful.
  3. Frontend must display the rendered text. The description is currently treated as plaintext.
  4. Search highlighting should still work. Search terms should still be highlighted in the preview card text.

It is likely that this won't support every edge case that MathJax handles, but it should hopefully give us the majority of the benefit with minimal overhead. We can also use information gathered from this in order to see where server-side handling of these things is incomplete, to inform any future steps we might take around doing backend conversion for rendering equations in courseware.

@ormsbee ormsbee added the bug Report of or fix for something that isn't working as intended label Nov 18, 2024
@ormsbee ormsbee moved this to Backlog in Libraries Overhaul Nov 18, 2024
@ormsbee ormsbee changed the title Math is not rendered in the Card preview Math is not rendered in Library Card previews Nov 18, 2024
@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Nov 18, 2024

@ormsbee I've been thinking along similar lines, but I would prefer we start with a much simpler conversion from mathjax syntax to plain text. Then we don't need to add support for rendering HTML into the cards.

For example, preview snippet: [mathjaxinline]X \sim f(x) = \frac{1}{2b} e^{i\pi}

Rendered in the MathJax view:
Screenshot 2024-11-18 at 10 49 29 AM

Rendered in the card as plain text: 𝑋 ∼ 𝑓⁡(𝑥) = (1/2⁢𝑏)𝑒^(𝑖⁢𝜋)

Screenshot 2024-11-18 at 10 55 29 AM

This can [hopefully?] be implemented with a series of simple re.sub() commands. [Though maybe some more complex parsing is required for nested expressions...] This doesn't need to support every LaTeX command or even most of them. Just enough to cover 80% of equations. Between that and stripping out the [mathjaxinline] tags, I think it will mostly solve the problem and in a much simpler way, without having to introduce the security risk of adding rich text support to the preview text.

@bradenmacdonald
Copy link
Contributor

Also, keep in mind that the snippets are truncated along word boundaries and in some cases that may result in invalid LaTeX code in the description. \frac{1} {2} is valid LaTeX, but if truncated to \frac{1} and you try to convert that to MathML, you'll get an error. Likewise if you do the conversion first, and then the description is truncated in the middle of the MathML code, it won't display at all.

That's why I think doing the conversion to plain text before even indexing it is great - it's simple, robust, and increases the information density of the description.

@ormsbee
Copy link
Contributor Author

ormsbee commented Nov 19, 2024

I realized that rendering the HTML would make it more vulnerable to injection attacks, but I had thought that it was okay because we strip tags before generating the searchable description–so this would be adding MathML to what was essentially plain text at that point.

I didn't realize that the snippets would get truncated on word boundaries like that and break formatting. I had hoped that we could output all the MathML and show a limited window into it. 😞

The one thing that gives me pause about the approach you're proposing is that it seems like it's getting very close to re-inventing AsciiMath.

# Plaintext in your example:
𝑋 ∼ 𝑓⁡(𝑥) (1/2⁢𝑏)𝑒^(𝑖⁢𝜋)

# Legal AsciiMath
X ~ f⁡(x)=(1/(2b))e^(iπ)⁢

So depending on how complex the re.sub() invocations get, it might be worthwhile to explore py-asciimath's TeX -> AsciiMath conversion facilities.

@bradenmacdonald
Copy link
Contributor

it might be worthwhile to explore py-asciimath's TeX -> AsciiMath conversion facilities.

Absolutely. No need to reinvent the wheel. Though you can get much nicer output by using Unicode math characters rather than just ASCII. There's also https://www.unicodeit.net/ / https://github.com/svenkreiss/unicodeit which is a python library to do exactly this, though it doesn't seem to support \frac properly.

@navinkarkera
Copy link
Contributor

@bradenmacdonald @ormsbee I have created a PR: openedx/edx-platform#36055 that makes use of unicodeit to render these equations in plain text. Also tried to handle some cases like \frac that are not supported by it.

@navinkarkera navinkarkera moved this from Backlog to Ready for AC testing in Libraries Overhaul Jan 20, 2025
@navinkarkera
Copy link
Contributor

@jmakowski1123 @lizc577 @sdaitzman @marcotuts @ormsbee This is ready for AC testing on the sandbox

@bradenmacdonald
Copy link
Contributor

@navinkarkera
Copy link
Contributor

navinkarkera commented Jan 21, 2025

@bradenmacdonald Yes, editing and saving the components updates the preview. I'll manually reindex the content.

Update: Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Report of or fix for something that isn't working as intended
Projects
Status: Ready for AC testing
Development

No branches or pull requests

3 participants