
Keep original spacing when parsing <math> formulas
Closed, Resolved, Public

Description

Operators like \sin, \log, ... should be separated from their argument by a
space. TeX does this automatically, but texvc puts no space there in HTML mode:
<math>\sin x</math> is rendered as <span class="texhtml">sin<i>x</i></span>,
which looks wrong. It is impossible to work around this problem by inserting an
explicit space, because <math>\sin\,x</math> forces PNG.


Version: unspecified
Severity: normal
OS: Linux
Platform: PC
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=20902

Details

Reference
bz6722

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 9:16 PM
bzimport added a project: Math.
bzimport set Reference to bz6722.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

The problem is that inserting a space in each case is not the right thing, either.
<math>\sin(x)</math> should not be rendered as "sin (x)".

The fix would be to preserve the original spacing.

> Changing summary.

jeluf wrote:

*** Bug 9022 has been marked as a duplicate of this bug. ***

jeluf wrote:

Text of Bug 9022:

I'm trying to get the mhchem package working on my MediaWiki install, but I am running into a problem with whitespace. The mhchem package is set up correctly, and I can access it from the command line, but MediaWiki doesn't render anything correctly. After further investigation, I found that texvc strips the whitespace from each equation before rendering it.

So when I type:
<math>\ce{H+ + OH- <=>> H2O}</math>
on the wiki

The .tex file is:
\ce {H-+OH-<=>>H2O}

How can I preserve the whitespace so that my equations render correctly?
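The effect the reporter describes can be reproduced in a few lines. The Python sketch below is illustrative only: texvc itself is written in OCaml, and `strip_all_spaces`, `strip_spaces_except_ce` and `_matching_brace` are hypothetical names. The first function models the reported stripping. The second shows one possible workaround, assuming the fix is to pass the braced argument of `\ce` through verbatim, since mhchem (unlike TeX math mode) treats spaces as significant. Outside `\ce` it strips spaces exactly as the first function does.

```python
# Hypothetical illustration only: texvc is written in OCaml, and these
# function names are invented for this sketch.

def strip_all_spaces(formula: str) -> str:
    """Model of the reported behaviour: every whitespace character is dropped."""
    return "".join(ch for ch in formula if not ch.isspace())


def _matching_brace(s: str, open_idx: int) -> int:
    """Return the index just past the '}' that closes the '{' at open_idx."""
    depth = 0
    for k in range(open_idx, len(s)):
        if s[k] == "{":
            depth += 1
        elif s[k] == "}":
            depth -= 1
            if depth == 0:
                return k + 1
    raise ValueError("unbalanced braces")


def strip_spaces_except_ce(formula: str) -> str:
    """Like strip_all_spaces, but copy the braced argument of \\ce verbatim,
    because mhchem uses spaces to separate the parts of a reaction."""
    out = []
    i = 0
    while i < len(formula):
        # Match the control word \ce, but not a longer one such as \centerdot.
        is_ce = formula.startswith("\\ce", i) and not (
            i + 3 < len(formula) and formula[i + 3].isalpha()
        )
        if is_ce:
            j = i + 3
            while j < len(formula) and formula[j].isspace():
                j += 1
            if j < len(formula) and formula[j] == "{":
                end = _matching_brace(formula, j)
                out.append("\\ce" + formula[j:end])  # spaces inside are kept
                i = end
                continue
        if not formula[i].isspace():
            out.append(formula[i])
        i += 1
    return "".join(out)


src = r"\ce{H+ + OH- <=>> H2O}"
print(strip_all_spaces(src))        # \ce{H++OH-<=>>H2O}
print(strip_spaces_except_ce(src))  # \ce{H+ + OH- <=>> H2O}
```

Without the spaces, mhchem can no longer tell the charge in `H+` apart from the `+` that joins the two reactants.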

(In reply to comment #1)

> The problem is that inserting a space in each case is not the right thing, either.
> <math>\sin(x)</math> should not be rendered as "sin (x)".
>
> The fix would be to preserve the original spacing.
>
> Changing summary.

Is that wise? The problem is an incompatibility between texvc's HTML rendering and TeX's PNG rendering of the same formula. Making the texvc translation sensitive to whitespace would only create *another* incompatibility, and a serious one I suspect: many <math> tags in Wikipedia are likely to rely on the usual TeX rules for ignoring spaces. Arguably, <math>\sin(x)</math> *should* be rendered in HTML as "sin (x)", because that's what already happens in PNG. (Actually, thin spaces would be more appropriate in both cases.)

I do not understand in what sense this bug is related to 9022. AFAICS the issue there is that texvc strips whitespace even when the equation is passed to TeX (which is usually harmless, but here it makes a difference because some macro uses spaces to split its argument, or something similar). The presence or absence of spaces in the input has no effect whatsoever on TeX's processing of $\sin(x)$.

jeluf wrote:

<math>\sin(x)</math> is rendered as

<span class="texhtml">sin(<i>x</i>)</span>

There's no space in front of the (. That's how it should be.

<math>\sin x</math> is rendered as

<span class="texhtml">sin<i>x</i></span>

with no space between sin and x, which is wrong.

When the original spacing is kept, TeX produces the right output, as would HTML.

(In reply to comment #5)

> When the original spacing is kept, TeX produces the right output, as would HTML.

TeX produces the right output whether or not the original spacing is kept (with a few exceptions), because the TeX typesetting engine ignores spaces in math mode. The presence of an ASCII space character in <math>\sin x</math> is irrelevant to whether there should be a space in the output. There are plenty of cases where the expected output is the opposite of the \sin situation: e.g. <math>\forall x</math> should be (and is) rendered in HTML with no space, whereas <math>\alpha\le\beta</math> should be (but is not) rendered with two spaces.

People know that TeX behaves like this, so spaces in the source of <math> tags are not correlated with expected spaces in the output, and making the HTML translation suddenly preserve the spacing would produce a lot of bogus spaces in existing WP pages, and vice versa. Not to mention that the space in <math>\forall x</math> above is *required* for syntactic reasons, as <math>\forallx</math> is unparseable.
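The claim that input spaces only matter for splitting tokens can be checked with a rough tokenizer. This is a Python sketch under stated assumptions (standard catcodes, letters-only control words); `tex_math_tokens` is a hypothetical helper, not part of texvc. Different spacings of `\sin x` or `\sin(x)` give identical token lists, while `\forallx` collapses into one unknown control word.

```python
import re

# Rough model of how TeX reads math input (assumption: standard catcodes).
# A control word is a backslash plus letters, a control symbol is a backslash
# plus one other character, and every other non-space character is a token by
# itself. Spaces produce no token; their only effect is to end a control word.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tex_math_tokens(src: str) -> list[str]:
    return TOKEN.findall(src)


print(tex_math_tokens(r"\sin x"))     # ['\\sin', 'x']
print(tex_math_tokens(r"\sin(x)"))    # ['\\sin', '(', 'x', ')']
print(tex_math_tokens(r"\forall x"))  # ['\\forall', 'x']
print(tex_math_tokens(r"\forallx"))   # ['\\forallx']
```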

What really happens is this. The HTML translation tries to emulate TeX as far as possible. It ignores space characters, because TeX ignores space characters. Then it inserts spaces in some places based on the type of the elements, because TeX inserts spaces there: e.g., <math>x=y</math> is rendered as <i>x</i> = <i>y</i>. However, this part of the translation mechanism is *incomplete*: it misses some cases such as <math>x\le y</math> or <math>\sin x</math>. That is the bug, and making the HTML translation sensitive to input space characters is not going to solve it.
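The mechanism described above (ignore input spaces, then insert output spaces from the types of neighbouring elements) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not texvc's actual implementation: the token classes, the `SYMBOLS` entities and the mapping of TeX's thin/medium/thick spaces onto `&#8201;`/`&#160;` are hypothetical choices. The spacing values are a text-style subset of Knuth's inter-atom spacing table in The TeXbook. With the Op/Ord and Rel entries present, `\sin x` and `x\le y` get their spaces, while `\sin(x)` and `\forall x` stay tight.

```python
import re

# Hypothetical sketch, not texvc's code. Input spaces are ignored (as in TeX);
# output spaces come from the atom classes of neighbouring tokens, using a
# subset of Knuth's inter-atom spacing table (The TeXbook, chapter 18) for
# text style. Values: 0 = none, 1 = thin, 2 = medium, 3 = thick.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
ORD, OP, BIN, REL, OPEN, CLOSE, PUNCT = "Ord Op Bin Rel Open Close Punct".split()

SPACING = {
    (ORD, OP): 1, (ORD, BIN): 2, (ORD, REL): 3,
    (OP, ORD): 1, (OP, OP): 1, (OP, REL): 3,
    (BIN, ORD): 2, (BIN, OP): 2, (BIN, OPEN): 2,
    (REL, ORD): 3, (REL, OP): 3, (REL, OPEN): 3,
    (CLOSE, OP): 1, (CLOSE, BIN): 2, (CLOSE, REL): 3,
    (PUNCT, ORD): 1, (PUNCT, OP): 1, (PUNCT, REL): 1,
    (PUNCT, OPEN): 1, (PUNCT, CLOSE): 1, (PUNCT, PUNCT): 1,
}  # every pair not listed gets no space

# Illustrative, incomplete classification; anything unlisted is Ord.
CLASSES = {
    r"\sin": OP, r"\cos": OP, r"\log": OP,
    "+": BIN, "-": BIN, r"\times": BIN, r"\cdot": BIN,
    "=": REL, "<": REL, ">": REL, r"\le": REL, r"\ge": REL,
    "(": OPEN, ")": CLOSE, ",": PUNCT,
}
SYMBOLS = {
    r"\times": "&times;", r"\cdot": "&sdot;", r"\le": "&le;", r"\ge": "&ge;",
    r"\alpha": "&alpha;", r"\beta": "&beta;", r"\forall": "&forall;",
    "<": "&lt;", ">": "&gt;",
}
# Assumed HTML rendering of the three TeX space sizes.
SPACE = {0: "", 1: "&#8201;", 2: "&#160;", 3: "&#160;"}


def classify(tokens):
    classes = [CLASSES.get(t, ORD) for t in tokens]
    for k, cls in enumerate(classes):
        if cls != BIN:
            continue
        prev = classes[k - 1] if k > 0 else None
        nxt = classes[k + 1] if k + 1 < len(classes) else None
        # TeX treats a Bin that is not between two operands as Ord (x = -y).
        if prev in (None, OP, BIN, REL, OPEN, PUNCT) or nxt in (None, REL, CLOSE, PUNCT):
            classes[k] = ORD
    return classes


def glyph(tok, cls):
    if tok in SYMBOLS:
        return SYMBOLS[tok]
    if cls == OP:
        return tok[1:]          # \sin -> upright "sin"
    if tok.isalpha():
        return f"<i>{tok}</i>"  # variables in italics
    return tok


def render_html(src):
    # Braces are dropped; TeX would make a braced group one Ord atom,
    # which this sketch does not model.
    tokens = [t for t in TOKEN.findall(src) if t not in ("{", "}")]
    classes = classify(tokens)
    parts = []
    for k, (tok, cls) in enumerate(zip(tokens, classes)):
        if k > 0:
            parts.append(SPACE[SPACING.get((classes[k - 1], cls), 0)])
        parts.append(glyph(tok, cls))
    return "".join(parts)
```

For example, `render_html(r"\alpha\le\beta")` returns `&alpha;&#160;&le;&#160;&beta;`, the two spaces asked for above, and `render_html(r"\sin(x)")` returns `sin(<i>x</i>)`.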

(Caveat: what I say about TeX are facts, whereas what I say about any part of mediawiki is pure speculation based on its observed behaviour.)

giecrilj wrote:

My vote should not be regarded as support for preserving spaces; I fully agree with Emil.
Test case:
<math> A \times B </math>
Got:
''A''&times;''B'' (unreadable)
Should get:
''A''&nbsp;&times;&nbsp;''B''
Operators and predicate symbols in HTML output should get non-breaking spaces on both sides. Knuth gives a detailed spacing table for the cases where different kinds of atoms meet (The TeXbook, chapter 18); such precision is not needed for HTML output, but that table should be regarded as a guideline.
Exceptions:
\cdot => &sdot;
\suchthat => :&nbsp;
but
\colon => &nbsp;:&nbsp;

nmichalo wrote:

Some issues raised in this bug, in particular the way texvc handles the HTML spacing of \sin x and \sin(x), have been corrected in r86962.

r86962 has been provisionally reverted as there are no tests or even ad-hoc examples of what needs to be tested along with the commit.

Spacing in the HTML output *should* be testable in the parser test cases (mathParserTests.txt) but it might need a tweak to adjust the user math rendering preferences for HTML tests.

Looks like this only describes HTML spacing, not the tex output spacing (which is of course harder to compare in an automated way).

See bug 18912 comment 12: the patches for this seem to break existing usages such as \sin{x}. Needs more thorough testing.

That is resolved now that all the follow-ups are correctly applied; reapplied with tests on trunk in r97034.

majobug wrote:

I experience a related problem: when converting the formula
<math>\log_2 N</math>
to HTML, the result is
"log&#160;<sub>2</sub><i>N</i>"
i.e. there is a space between the "log" and the subscript 2 that should not be there, while the space between the 2 and the N is missing.
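One plausible reading of this output is that the operator space is inserted before the subscript is attached to `\log`. In TeX the subscript belongs to the Op atom, so the space goes after the whole atom. The Python sketch below illustrates that ordering under stated assumptions: only Op-before-letter spacing is modeled, braced scripts may not nest, and `atoms`, `render_html` and `OPERATOR_NAMES` are hypothetical names, not texvc's code. It uses `&#160;` to match the entity in the reported output, although a thin space would be closer to TeX.

```python
import re

# Hypothetical sketch, not texvc's code. Scripts are attached to their
# nucleus first; spacing is decided afterwards, between whole atoms.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
OPERATOR_NAMES = {r"\sin", r"\cos", r"\log"}  # illustrative subset


def atoms(src):
    """Return [nucleus, subscript, superscript] triples; braces must not nest."""
    tokens = TOKEN.findall(src)
    result = []
    k = 0
    while k < len(tokens):
        tok = tokens[k]
        if tok in ("_", "^") and result and k + 1 < len(tokens):
            slot = 1 if tok == "_" else 2
            if tokens[k + 1] == "{":
                end = tokens.index("}", k + 1)
                result[-1][slot] = "".join(tokens[k + 2:end])
                k = end + 1
            else:
                result[-1][slot] = tokens[k + 1]
                k += 2
            continue
        result.append([tok, None, None])
        k += 1
    return result


def render_html(src):
    parts = []
    prev_is_op = False
    for nucleus, sub, sup in atoms(src):
        is_op = nucleus in OPERATOR_NAMES
        if prev_is_op and nucleus.isalpha():
            parts.append("&#160;")  # space after the whole Op atom, scripts included
        if is_op:
            text = nucleus[1:]  # \log -> upright "log"
        elif nucleus.isalpha():
            text = f"<i>{nucleus}</i>"
        else:
            text = nucleus
        if sub:
            text += f"<sub>{sub}</sub>"
        if sup:
            text += f"<sup>{sup}</sup>"
        parts.append(text)
        prev_is_op = is_op
    return "".join(parts)


print(render_html(r"\log_2 N"))  # log<sub>2</sub>&#160;<i>N</i>
```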

physik wrote:

<math>\log_2 N</math> looks fine; HTML rendering is not used for that example.