Provide .toMathML() rendering of expressions, possibly replacing .toHTML() ? #3327

gwhitney · 2024-11-21T17:32:02Z

gwhitney
Nov 21, 2024
Collaborator

The Plurimath project can convert arbitrary math formulas among numerous formats (LaTeX, MathML, AsciiMath, etc.). Providing a conversion from a mathjs parse tree to Plurimath could replace the toTex renderer and provide significant additional rendering capability.

gwhitney · 2024-11-21T17:50:54Z

gwhitney
Nov 21, 2024
Collaborator Author

Of course, it might add too much to the shipped mathjs bundle; since plurimath is translated from Ruby by Opal, I imagine it is a pretty heavy package.

1 reply

gwhitney Nov 22, 2024
Collaborator Author

Yes, my experimentation revealed that there is currently significant overhead each time you call plurimath to do a parse. So it's not at the moment suitable for use inside a library like mathjs. Hence, changing the title of this discussion to what was my real thrust when I posted it.

gwhitney · 2024-11-22T18:29:00Z

gwhitney
Nov 22, 2024
Collaborator Author

In general, now that MathML is supported across popular browsers, it appears that the best way for mathjs to provide the ability to display its expressions typeset attractively for the web is to provide a .toMathML() generation method on its expression tree Node types. This would be directly analogous to .toHTML(); one could insert the results of .toMathML() into the DOM of a web page and see the expression nicely laid out.

In fact, these days it seems to me that most or all of the use cases for .toHTML() would be as well or better served by .toMathML(), so one approach would be to replace the .toHTML() code with .toMathML() code (perhaps leaving .toHTML() as an alias for .toMathML() so as not to completely break existing code that may be calling .toHTML() -- after all, these days MathML essentially is a part of HTML, so it wouldn't really be a lie to call it .toHTML(), just a major change in its behavior).

Or the .toHTML() code could be left in with a declaration that it is deprecated and will not be maintained and will be pruned at a later time as it bitrots.

In any case, implementing .toMathML() should not be particularly difficult. It will be entirely analogous to .toHTML(), just with different particulars at each node type.

From an architectural point of view, though, and the main reason for having a design discussion here rather than diving in, I think one should be skeptical of simply replicating the .toHTML() machinery completely analogously. I am very suspicious of the proliferation of "renderer" methods on each Node type: .toString(), .toTex(), .toHTML() -- even .toJSON() is sort of in this category. Adding another one feels like breaking the camel's back. These are all very similar transformations of an expression tree. It seems like it might be better to try to unify them in a sort of visitor pattern: have a traverse() type of method that takes the "guts" of the operation as an argument, organized in as clear and simple a way as possible. It seems like this could clarify/simplify existing code as well, by reducing the structural redundancy between the existing .toString, .toTex, and .toHTML methods.

One big decision if pursuing this sort of refactoring is where the details for each operation should be collected. In other words, right now we have the pieces of how to convert to each sort of format (string, LaTeX, HTML) in each Node type, so that all the knowledge about a RelationalNode (say) is in one place, but all of the information about HTML (say) is scattered. So adding a new format, like MathML, involves adding a piece to every node type. Maybe this pattern has ended up cluttering the code for the syntax trees, and it would be better to just have information about how to perform traversals in the trees, and then collect all of the information about how to render to HTML in one place.

But I think either organization of where to put the information (collected with each Node type, or collected with each format type) should be possible with a refactor that prevents the need to add a new method to each Node type any time there is a need to add a new format, and reuses the traversal code rather than rewrites the traversal code for each format. But we might need to pick one way or the other for how to distribute the pieces. Since they are currently localized with the Nodes, we could keep it that way, but hopefully refactor so that rather than writing a new Node method for each format, you're just adding a quick "combiner function" for that format for that Node type and putting it in a record of formats that node supports.
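To illustrate the "collected with each format" organization, here is a minimal sketch, not actual mathjs code. The node factories, the traverse function, and the formats record are all hypothetical stand-ins; real mathjs nodes carry more information, and this sketch always parenthesizes in string mode rather than consulting operator precedence.

```javascript
// Hypothetical stand-ins for mathjs nodes: plain objects with a `type` field.
const sym = name => ({ type: 'SymbolNode', name })
const num = value => ({ type: 'ConstantNode', value })
const op = (symbol, ...args) => ({ type: 'OperatorNode', op: symbol, args })

// One shared bottom-up traversal: render the children first, then hand the
// node and its rendered children to the format's handler for that node type.
function traverse (node, handlers) {
  const handler = handlers[node.type]
  if (!handler) throw new Error(`No handler for ${node.type}`)
  const children = (node.args || []).map(child => traverse(child, handlers))
  return handler(node, children)
}

// Everything specific to a format lives in one record (names are illustrative).
// No escaping is done here, so operators such as "<" would need extra care.
const formats = {
  string: {
    SymbolNode: node => node.name,
    ConstantNode: node => String(node.value),
    OperatorNode: (node, kids) => `(${kids.join(` ${node.op} `)})`
  },
  MathML: {
    SymbolNode: node => `<mi>${node.name}</mi>`,
    ConstantNode: node => `<mn>${node.value}</mn>`,
    OperatorNode: (node, kids) =>
      `<mrow>${kids.join(`<mo>${node.op}</mo>`)}</mrow>`
  }
}

const tree = op('+', sym('x'), op('*', num(2), sym('y')))
console.log(traverse(tree, formats.string)) // (x + (2 * y))
console.log(traverse(tree, formats.MathML))
```

With this layout, adding a format means adding one record to formats and touching no node class; the trade-off is that everything known about a given node type is now spread across the format records.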

To make this more concrete, if you look at RelationalNode.js, you will see that the code for toString, toTex, and toHTML all have a lot of similarity/redundancy between them: they all have to compare their precedence with their children, get the rendered versions of the children, possibly parenthesize children depending on precedence and parenthesis mode, and then combine those renderings in some way. It would be a real improvement, I think, to put the code that checks precedence and gets the children's renderings, properly parenthesized, in a single place. In fact, at least at the moment, all of these methods consist of just interleaving the child renderings with some string based on each relation, so we could even combine that common code into a putative _render method for the Relational node type (that would take the output format as an argument), and put just the information of how to render a single relation into a record keyed by format, something like:

static relationRenderers = {
  string: condition => ` ${operatorMap[condition]} `,
  HTML: condition => '<span class="math-operator math-binary-operator math-explicit-binary-operator">' 
    + escape(operatorMap[condition]) + '</span>',
  Tex: condition => latexOperators[condition]
}

That's it; as far as I can see, other than how to parenthesize a subexpression in each format, which should also be coded just once somewhere, not in each Node class, those are the only bits of code that differ among these three renderers. We could lose a lot of lines of code by doing similar things for each node type. And then adding a MathML renderer would be as simple as adding one more key to this record, e.g.

  MathML: condition => `<mo>${entityMap[condition]}</mo>`

where entityMap gives the HTML entity (either as an &...; type string or just as the Unicode character) for each relational operator, so that we get ≤ for <=, for example.

Once this kind of refactor was in place, it would be close to trivial to implement further formats, such as a unicodeString format that takes advantage of Unicode to produce things like "a² + b² ≤ π" for the corresponding expression tree (there is an existing semi-standard UnicodeMath format, which I learned about while investigating the Plurimath project; we could try to produce results consistent with that). In this way, mathjs could become some portion of a JavaScript-native Plurimath analogue.
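As a rough idea of what such a unicodeString format might produce, here is a small self-contained sketch. The tree shape, the lookup tables, and toUnicodeString are all hypothetical; it makes no attempt to follow the UnicodeMath conventions, ignores precedence and parenthesization, and only turns plain non-negative integer exponents into superscript digits.

```javascript
// Hypothetical minimal tree; real mathjs nodes carry more information.
const sym = name => ({ kind: 'symbol', name })
const num = value => ({ kind: 'number', value })
const bin = (op, left, right) => ({ kind: 'binary', op, left, right })

// Illustrative lookup tables, deliberately incomplete.
const superscripts = {
  0: '⁰', 1: '¹', 2: '²', 3: '³', 4: '⁴', 5: '⁵', 6: '⁶', 7: '⁷', 8: '⁸', 9: '⁹'
}
const unicodeOperators = { '<=': '≤', '>=': '≥', '!=': '≠' }
const unicodeSymbols = { pi: 'π' }

function toUnicodeString (node) {
  switch (node.kind) {
    case 'symbol':
      return unicodeSymbols[node.name] || node.name
    case 'number':
      return String(node.value)
    case 'binary': {
      const left = toUnicodeString(node.left)
      const exponent = node.right.kind === 'number' ? String(node.right.value) : ''
      // A non-negative integer exponent becomes superscript digits.
      if (node.op === '^' && /^\d+$/.test(exponent)) {
        return left + [...exponent].map(digit => superscripts[digit]).join('')
      }
      const symbol = unicodeOperators[node.op] || node.op
      return `${left} ${symbol} ${toUnicodeString(node.right)}`
    }
    default:
      throw new Error(`Unknown node kind: ${node.kind}`)
  }
}

const tree = bin('<=',
  bin('+', bin('^', sym('a'), num(2)), bin('^', sym('b'), num(2))),
  sym('pi'))
console.log(toUnicodeString(tree)) // a² + b² ≤ π
```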

So just to make the final proposal clear: we would add a render(options) method to Node, in which the format would be one of the options, and each of toString, toHTML, and toTex would become a trivial wrapper for e.g. render({...options, format: 'string'}) allowing significant code collapse in the individual node types and easing the addition of a MathML format, which is the real instigating goal for the change.
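A minimal sketch of how that could look, under several assumptions: ExprNode stands in for the mathjs Node base class, the three lookup tables are illustrative rather than the ones in the mathjs source, and parenthesization and the HTML format are left out to keep it short.

```javascript
// Illustrative tables; the real ones in mathjs may differ.
const operatorMap = { smaller: '<', smallerEq: '<=', larger: '>', largerEq: '>=' }
// The trailing space keeps a TeX command separate from the symbol after it.
const latexOperators = { smaller: '<', smallerEq: '\\leq ', larger: '>', largerEq: '\\geq ' }
const entityMap = { smaller: '&lt;', smallerEq: '≤', larger: '&gt;', largerEq: '≥' }

// Stand-in for the mathjs Node base class (hypothetical).
class ExprNode {
  // Single entry point; the format is just one of the options.
  render (options = {}) {
    return this._render(options.format || 'string', options)
  }

  // The existing methods become trivial wrappers around render().
  toString (options) { return this.render({ ...options, format: 'string' }) }
  toTex (options) { return this.render({ ...options, format: 'Tex' }) }
  toMathML (options) { return this.render({ ...options, format: 'MathML' }) }
}

class SymbolNode extends ExprNode {
  constructor (name) {
    super()
    this.name = name
  }

  _render (format) {
    return format === 'MathML' ? `<mi>${this.name}</mi>` : this.name
  }
}

class RelationalNode extends ExprNode {
  constructor (conditionals, params) {
    super()
    this.conditionals = conditionals
    this.params = params
  }

  // The only per-format knowledge this node type needs.
  static relationRenderers = {
    string: condition => ` ${operatorMap[condition]} `,
    Tex: condition => latexOperators[condition],
    MathML: condition => `<mo>${entityMap[condition]}</mo>`
  }

  // Shared logic: render the children, then interleave the relations.
  _render (format, options) {
    const renderRelation = RelationalNode.relationRenderers[format]
    if (!renderRelation) throw new Error(`Unsupported format: ${format}`)
    const parts = this.params.map(param => param.render({ ...options, format }))
    return this.conditionals.reduce(
      (result, condition, i) => result + renderRelation(condition) + parts[i + 1],
      parts[0])
  }
}

const expr = new RelationalNode(
  ['smallerEq', 'smaller'],
  [new SymbolNode('a'), new SymbolNode('b'), new SymbolNode('c')])
console.log(expr.toString()) // a <= b < c
console.log(expr.toTex()) // a\leq b<c
console.log(expr.toMathML())
```

In this sketch, supporting MathML for relations costs one extra line in relationRenderers plus the toMathML wrapper on the base class.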

Thoughts?

3 replies

josdejong Nov 28, 2024
Maintainer

Great idea to implement support for MathML! Ideally, it should even be possible to convert MathML into a mathjs expression too 😎. I'd love to see that with TeX too, but the (La)TeX standard is too large and there are too many dialects to make that work.

When the number of renderers grows, it may be interesting to separate them from the Node classes, and put them in a separate, standalone function like toTex(node, options) instead of a method node.toTex(options).

If such an abstraction for renderers like Node.render({...options, format: 'string'}) is possible I would love to see that. I'm a bit in doubt whether we can really treat String, HTML, Tex, and a future MathML rendering in a unified way. There are quite a few very specific cases (see for example the implementations in OperatorNode) that do smart stuff depending on nested arguments, for example.

gwhitney Nov 28, 2024
Collaborator Author

> Ideally, it should even be possible to convert a MathML into a mathjs expression too 😎.

In presentation MathML, there's not enough information about the intended meaning of the expression to do this reliably, I believe. There is also "Content MathML" that explicitly does specify the mathematical meaning of the expression, but it's very little used at the moment. But that does bring up the point that perhaps for future proofing, the interface should be expandable to allow conversion to either presentation MathML or content MathML, with the former being the only thing implemented at the moment (since as far as I know, there aren't any browsers that support content MathML).
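To make the difference concrete, here is a small sketch of the two encodings of a ≤ b. The function relationToMathML and its flavor option are hypothetical names invented for illustration; nothing like them exists in mathjs today. It assumes the operands are plain identifiers that need no escaping.

```javascript
// Hypothetical helper: render "left <= right" in either MathML flavor.
function relationToMathML (left, right, { flavor = 'presentation' } = {}) {
  if (flavor === 'presentation') {
    // Layout only: an identifier, an operator glyph, an identifier.
    return `<math><mi>${left}</mi><mo>≤</mo><mi>${right}</mi></math>`
  }
  if (flavor === 'content') {
    // Meaning: apply the "less than or equal" relation to two identifiers.
    return `<math><apply><leq/><ci>${left}</ci><ci>${right}</ci></apply></math>`
  }
  throw new Error(`Unknown MathML flavor: ${flavor}`)
}

console.log(relationToMathML('a', 'b'))
console.log(relationToMathML('a', 'b', { flavor: 'content' }))
```

The presentation form only says that a ≤ glyph sits between two identifiers, while the content form names the relation being applied, which is why only the latter could be converted back into an expression tree reliably.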

josdejong Nov 29, 2024
Maintainer

👍
