html tags in text $ latex statements $
Closed, Resolved (Public)

Description

Author: mimamer

Description:
Occasionally, MediaWiki places HTML tags inside LaTeX statements in text-only render mode, as exemplified on http://en.wikipedia.org/wiki/Riemann_hypothesis. This prevents automatic parsing of LaTeX statements and produces odd formatting both when copying and pasting and in page display (the $ sign is indented on the first line, and the LaTeX statement follows unindented on the next line). I would very much appreciate it if MediaWiki could be updated in this regard. Thank you in advance.

As a side question related to automated parsing, would it be possible to place LaTeX statements inside HTML span tags with a class="tex" attribute, equivalent to what is done for img tags in PNG rendering?


Version: unspecified
Severity: enhancement

Details

Reference
bz23190

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:04 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz23190.
bzimport added a subscriber: Unknown Object (MLST).

Can you be more specific about where the problem can be found?

The only "HTML" I find there is something like <img class="tex" alt="\log g(n) &lt; \sqrt{\operatorname{Li}^{-1}(n)}" src="http://upload.wikimedia.org/math/5/e/d/5edf1bc3778b2456213d7857b1f82f80.png" />

The content of that alternate text is '\log g(n) < \sqrt{\operatorname{Li}^{-1}(n)}'. Some characters have to be written as entities in HTML, so you should decode the entities in any text you extract directly. Any HTML/XML parser will do that for you.
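For anyone extracting the formulas automatically, here is a minimal sketch of that advice in Python, using only the standard library. The class name `TexAltCollector` and the helper `extract_tex` are made up for illustration; the sample markup is the img tag quoted above. The parser decodes entities in attribute values on its own, so no manual unescaping is needed.

```python
from html.parser import HTMLParser


class TexAltCollector(HTMLParser):
    """Collects the alt text of every <img> carrying the class "tex"."""

    def __init__(self):
        super().__init__()
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here, because the
        # default handle_startendtag() delegates to handle_starttag().
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "tex" in classes and attributes.get("alt") is not None:
            # Attribute values arrive with entities already decoded.
            self.formulas.append(attributes["alt"])


def extract_tex(html_text):
    """Hypothetical helper: return the TeX source of all PNG-rendered formulas."""
    collector = TexAltCollector()
    collector.feed(html_text)
    collector.close()
    return collector.formulas


sample = (
    r'<img class="tex" alt="\log g(n) &lt; \sqrt{\operatorname{Li}^{-1}(n)}" '
    r'src="http://upload.wikimedia.org/math/5/e/d/5edf1bc3778b2456213d7857b1f82f80.png" />'
)
print(extract_tex(sample))
```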

I have no problem copying and pasting that, btw.

mimamer wrote:

I was referring to the text-only render mode, i.e., if you go to preferences and select the option "Leave it as TeX (for text browsers)" on Wikipedia.

You will see that the Riemann zeta function in the first section is written like this in HTML:

<dd>$</dd>
</dl>
<p>\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots. \! $

Issue confirmed.

The math tag in question contains a newline at the start. In Math.php this is output as:
return ('$ '.htmlspecialchars( $this->tex ).' $');

So the combination of the indentation, the "$ " followed by a newline and then the TeX, and our weird code for parsing indentation and lists creates this output.
This is related to bug 22818, which is the same problem in a different mode.
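To make the cause easier to see, here is a rough Python stand-in for the PHP expression quoted above. It is only a model, not MediaWiki code: the function names are invented, and `html.escape(..., quote=False)` only approximates `htmlspecialchars`. It shows that a leading newline in the TeX leaves "$ " alone on the first line, and that replacing newlines with spaces would keep the whole statement on one line.

```python
import html


def render_source_mode(tex: str) -> str:
    """Model of the current output: '$ ' + escaped TeX + ' $'."""
    return "$ " + html.escape(tex, quote=False) + " $"


def render_single_line(tex: str) -> str:
    """Same output, with newlines in the TeX replaced by spaces first."""
    return "$ " + html.escape(tex, quote=False).replace("\n", " ") + " $"


# TeX as it might appear in the wikitext, starting with a newline.
tex = "\n\\zeta(s) = \\sum_{n=1}^\\infty \\frac{1}{n^s}"

print(repr(render_source_mode(tex)))
print(repr(render_single_line(tex)))
```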

*** Bug 22818 has been marked as a duplicate of this bug. ***

mimamer wrote:

Awesome, thanks! I'm looking forward to that update on wikipedia.

Now, would it also be possible to mark TeX text with class="tex", equivalent to what is done for PNG rendering?

I suppose the PHP output line should then be:
return ('<span class="tex">$ ' . str_replace( "\n", " ", htmlspecialchars( $this->tex ) ) . ' $</span>');

Thanks again!
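As an illustration of why the class="tex" wrapper would help automated parsing, here is a minimal Python sketch of a client that reads such spans. It assumes the span markup proposed above is actually emitted, which is only a suggestion at this point; `TexSpanCollector` is a made-up name, and nested spans are not handled.

```python
from html.parser import HTMLParser


class TexSpanCollector(HTMLParser):
    """Collects the text of every <span class="tex">, without the $ delimiters."""

    def __init__(self):
        super().__init__()  # entities in text are decoded by default
        self.formulas = []
        self._buffer = None  # list of text chunks while inside a tex span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "tex" in classes and self._buffer is None:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            self._buffer = None
            # Drop the surrounding "$ ... $" if both delimiters are present.
            if len(text) >= 2 and text.startswith("$") and text.endswith("$"):
                text = text[1:-1].strip()
            self.formulas.append(text)


collector = TexSpanCollector()
collector.feed('<p>Text <span class="tex">$ a &lt; b $</span> more text</p>')
collector.close()
print(collector.formulas)
```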