
Provide BlahTex as an extension
Closed, Resolved, Public

Description

Author: lethe

Description:
According to recent discussion at http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Mathematics#MathML_.2F_improved_TeX_support, the developers
of Blahtex feel that the software is ready and the time is right to start taking steps towards committing Blahtex into a future version of MediaWiki and
deploying it on Wikimedia sites.

Let this bug report be a feature request for merging Blahtex into the MediaWiki tree. Let us discuss what we need to do to make sure the code is ready, and
then let's do it. There are many myths about why Blahtex cannot or should not be adopted; let us also dispel them.


Version: unspecified
Severity: enhancement
URL: http://gva.noekeon.org/blahtexml/

Details

Reference
bz6383

Event Timeline

bzimport raised the priority of this task to Low (Nov 21 2014, 9:21 PM).
bzimport added a project: Math.
bzimport set Reference to bz6383.
bzimport added a subscriber: Unknown Object (MLST).

CC'ing Aryeh Gregor who has a "MathML support" in his MediaWiki todo list.

The lack of support for MathML is the major issue in my daily use of Wikipedia, and I personally feel very uncomfortable reading articles with horrible PNG images as formulas. Hence I would definitely appreciate seeing this bug fixed, and hopefully the advent of HTML5 will make this feasible. I'm commenting on this bug in order to open a discussion about a future integration of MathML into Wikipedia.

First, as said in the first comment, the purpose was to merge the Blahtex extension into the MediaWiki tree. This extension, initially developed by David Harvey, can perform TeX-to-MathML or TeX-to-PNG conversion and could replace texvc:

http://www.mediawiki.org/wiki/Extension:Blahtex

Some efforts were made to integrate Blahtex. It seems that the idea was to use Blahtex as a fallback for texvc in TeX-to-PNG conversion, and to use Blahtex for TeX-to-MathML conversion when the user enables MathML rendering. It also required serving wiki pages as XHTML, but this won't be an issue any more with HTML5. Here is the instruction page:

http://www.mediawiki.org/wiki/Extension:Blahtex/Embedding_Blahtex_in_MediaWiki
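The division of labour described above (texvc first for PNG, Blahtex as PNG fallback, Blahtex for MathML only on request) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code on the embedding page: `texvc_png`, `blahtex_png` and `blahtex_mathml` are hypothetical stand-ins for the external converters, each assumed to return its output or `None` on failure.

```python
from typing import Callable, Optional

# Hypothetical converter signature: TeX source in, output (or None on failure) out.
Converter = Callable[[str], Optional[str]]


def render(tex: str,
           texvc_png: Converter,
           blahtex_png: Converter,
           blahtex_mathml: Converter,
           user_wants_mathml: bool) -> dict:
    """Sketch of the assumed split: texvc is the primary PNG renderer,
    Blahtex is only a PNG fallback, and Blahtex supplies MathML only
    when the user has enabled MathML rendering."""
    result = {"png": texvc_png(tex), "mathml": None}
    if result["png"] is None:
        # Fallback: try Blahtex's own TeX-to-PNG path.
        result["png"] = blahtex_png(tex)
    if user_wants_mathml:
        result["mathml"] = blahtex_mathml(tex)
    return result
```

One consequence of this design is that a given formula may end up rendered by either tool, so the PNGs are not guaranteed to look the same.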

I wonder whether this integration project has been discontinued. Jitse Niesen's wiki demo at http://wiki.blahtex.org/ is no longer available, and the last commit in http://cvs.berlios.de/cgi-bin/viewvc.cgi/blahtex/blahtex/ dates back to 2006/04/10.

The Blahtex extension is fortunately still maintained by Gilles Van Assche:
http://gva.noekeon.org/blahtexml/

ayg wrote:

This will be interesting to look at for the future, thanks. itex2mml was what I had been previously considering, but it looks like some of the Blahtex integration work is already done. I'd think that the best general approach would be to use texvc to generate a PNG; if that succeeds, use something to try generating MathML. If we have MathML, then (optionally) stick it in using some magic to hide it appropriately from browsers that can't see it, like

@namespace math url(http://www.w3.org/1998/Math/MathML);
math { display: none }
math|math { display: inline }
math|math + img { display: none }

and output <math>...</math><img ...>, relying on the fact that an HTML5 parser will put <math> in the MathML namespace, otherwise it will be in the HTML namespace. Or use JavaScript.

This would need to be hidden behind an off-by-default user preference for the indefinite future, of course, until bugs are worked out. In practice, MathML in browsers currently looks worse than LaTeX-rendered PNGs, IMO. Try http://www.mozilla.org/projects/mathml/demo/roots.xhtml, for example -- the height of the square roots is totally wrong in Firefox unless you have the correct fonts installed, and even if you do, it looks noticeably uglier than actual LaTeX output.
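A minimal Python sketch of the texvc-first approach described above, assuming two hypothetical hooks: `texvc_png_url` (returns the URL of a texvc-rendered PNG, or `None` if texvc rejects the input) and `to_mathml` (returns a `<math>...</math>` string, or `None`). The error markup is likewise made up for illustration. The MathML is emitted only behind an off-by-default preference, and always alongside the PNG so that the `@namespace` CSS above can hide whichever one the browser cannot use.

```python
import html
from typing import Callable, Optional

Converter = Callable[[str], Optional[str]]


def math_html(tex: str,
              texvc_png_url: Converter,
              to_mathml: Converter,
              mathml_pref: bool = False) -> str:
    """Return HTML for one formula (sketch; hooks and error markup are hypothetical)."""
    png = texvc_png_url(tex)
    if png is None:
        # texvc is the reference: if it fails, every reader gets the error,
        # whether or not MathML could have been produced.
        return '<strong class="error">Failed to parse: %s</strong>' % html.escape(tex)

    img = '<img src="%s" alt="%s">' % (html.escape(png), html.escape(tex))
    if not mathml_pref:
        # Off-by-default user preference: plain PNG for everyone else.
        return img

    mathml = to_mathml(tex)
    if mathml is None:
        return img
    # Emit <math>...</math><img ...>; an HTML5 parser puts <math> in the
    # MathML namespace, and the CSS hides the image only in that case.
    return mathml + img
```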

Created attachment 7378
screenshot sqrt

(In reply to comment #2)

> I'd think that the best general approach would be to use texvc to generate a PNG; if that succeeds, use something to try generating MathML. If we have MathML, then (optionally) stick it in using some magic to hide it appropriately from browsers that can't see it, like
>
> @namespace math url(http://www.w3.org/1998/Math/MathML); math { display: none } math|math { display: inline } math|math + img { display: none }
>
> and output <math>...</math><img ...>, relying on the fact that an HTML5 parser will put <math> in the MathML namespace, otherwise it will be in the HTML namespace. Or use JavaScript.
> This would need to be hidden behind an off-by-default user preference for the indefinite future, of course, until bugs are worked out.

I think the "ideal" approach would be to use the "altimg" attribute of the <math/> element:
http://www.w3.org/TR/MathML3/chapter2.html#interf.toplevel.atts

But for now, it's better to use a combination of math + img or another workaround of this kind, as you suggest.
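For reference, a small Python sketch of what the "ideal" markup would look like: it adds the MathML 3 top-level `altimg` and `alttext` attributes to an existing `<math>` element using the standard library's `xml.etree.ElementTree`. Using the TeX source as the `alttext` value is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", MATHML_NS)


def with_altimg(mathml: str, png_url: str, tex: str) -> str:
    """Add altimg (image fallback) and alttext (text fallback) to a <math> root."""
    root = ET.fromstring(mathml)
    root.set("altimg", png_url)
    # Assumption: the TeX source doubles as the text fallback.
    root.set("alttext", tex)
    return ET.tostring(root, encoding="unicode")
```

This only helps in browsers that understand MathML and honour `altimg`; it does nothing for browsers without MathML support.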
FYI, here is what the integration project page says:
http://www.mediawiki.org/wiki/Extension:Blahtex/Embedding_Blahtex_in_MediaWiki#Generating_MathML

> In practice, MathML in browsers currently looks worse than LaTeX-rendered PNGs, IMO. Try http://www.mozilla.org/projects/mathml/demo/roots.xhtml, for example -- the height of the square roots is totally wrong in Firefox unless you have the correct fonts installed, and even if you do, it looks noticeably uglier than actual LaTeX output.

I think this is a matter of preference (note that I often zoom when reading an article, and PNG images definitely look uglier when zoomed). I'm just curious... can you tell me which rendering you prefer in the screenshot given in the attachment, and why?

For the height of the square roots, it's clearly a Firefox bug when STIX fonts are not installed. Hopefully, that issue will be fixed in the near future, and other improvements to the rendering of stretchy operators will follow.

Attached:

sqrt.png (238×398 px, 5 KB)

ayg wrote:

(In reply to comment #3)

> I think the "ideal" approach would be to use the "altimg" attribute of the <math/> element:
> http://www.w3.org/TR/MathML3/chapter2.html#interf.toplevel.atts

That won't work unless the browser supports MathML to begin with. Do any browsers that support MathML actually use the altimg attribute if present? If not, I don't see any reason to provide it.

> FYI, here is what the integration project page says:
> http://www.mediawiki.org/wiki/Extension:Blahtex/Embedding_Blahtex_in_MediaWiki#Generating_MathML

I don't like the idea of using texvc to generate images sometimes, Blahtex other times. We'd have inconsistent rendering, unless they're guaranteed to generate identical images when passed the same input.

Also, if MathML can be generated but PNG cannot, for some reason, we should return an error unconditionally. That's the only sane thing to do if we can't assume MathML support in all clients -- returning errors to some clients but not others is a recipe for total confusion.

> I think this is a matter of preference (note that I often zoom when reading an article, and PNG images definitely look uglier when zoomed). I'm just curious... can you tell me which rendering you prefer in the screenshot given in the attachment, and why?

The first looks best to me. I'm not sure why the second doesn't look as nice to me -- too cramped, maybe? The last looks unbalanced compared to the first two. I'm sure it's subjective, but also probably because LaTeX is *the* standard math rendering language, so it's expected.

> For the height of the square roots, it's clearly a Firefox bug when STIX fonts are not installed. Hopefully, that issue will be fixed in the near future, and other improvements to the rendering of stretchy operators will follow.

Yes, hopefully. :) If you have the right fonts, MathML certainly should look better if you zoom, or for printing. Anyway, step one is to make it possible as an option, then we can talk about defaults.

> The first looks best to me. I'm not sure why the second doesn't look as nice to me -- too cramped, maybe? The last looks unbalanced compared to the first two. I'm sure it's subjective, but also probably because LaTeX is *the* standard math rendering language, so it's expected.

The first is generated with Wikimedia and the second rendered by Firefox (with STIX fonts). Honestly, I can't see how one can say that the first is better than the other, except if one is really used to TeX rendering. Both TeX and Firefox use glyph variants of different sizes, or composite characters, to stretch the radical vertically. This technique is not exact, which is why the radical is often a bit taller than its argument. However, if I had to write this formula by hand, I would naturally try to align the bottoms of the radicals. It seems very strange to me that a technical limitation should turn a particular rendering into the standard. The third is rendered by a patched version of Firefox, with "perfect" stretching, and that's the one I prefer. But obviously, it's really subjective...
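The overshoot described above follows from choosing among a few discrete sizes. A toy Python model, with made-up glyph heights (not taken from any real font) and a simplified assembly rule, assumed for illustration rather than taken from TeX or Firefox: pick the smallest predrawn variant at least as tall as the argument; if none is tall enough, extend the largest one with repeated pieces. The hypothetical `perfect_stretch` stands in for the patched Firefox's exact stretching.

```python
import math
from typing import List, Tuple

# Made-up heights (arbitrary units) of predrawn radical glyphs, smallest first.
VARIANTS: List[float] = [10.0, 14.0, 20.0, 28.0]
# Made-up height of one repeatable extender piece for assembled radicals.
EXTENDER = 5.0


def choose_radical(target: float) -> Tuple[str, float]:
    """Return (method, rendered height) for a radical covering `target`."""
    for h in VARIANTS:
        if h >= target:
            # Discrete sizes: the radical usually overshoots its argument.
            return ("variant", h)
    # Nothing big enough: extend the largest variant with whole pieces,
    # which can also overshoot by up to one piece.
    base = VARIANTS[-1]
    pieces = math.ceil((target - base) / EXTENDER)
    return ("assembled", base + pieces * EXTENDER)


def perfect_stretch(target: float) -> Tuple[str, float]:
    """Hypothetical exact stretching: the radical matches the argument height."""
    return ("scaled", target)
```

For an argument of height 12, the toy model draws a radical of height 14, while exact stretching gives 12.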

> That won't work unless the browser supports MathML to begin with.

I just wanted to mention the standard way to provide an alternate image/text. That's why I said "ideal", i.e. when all browsers support either high-quality MathML or the image fallback, but that's not likely to happen soon. Of course, I also prefer the pragmatic approach.

> I don't like the idea of using texvc to generate images sometimes, Blahtex other times. We'd have inconsistent rendering, unless they're guaranteed to generate identical images when passed the same input.
>
> Also, if MathML can be generated but PNG cannot, for some reason, we should return an error unconditionally. That's the only sane thing to do if we can't assume MathML support in all clients -- returning errors to some clients but not others is a recipe for total confusion.

Yes, it makes sense to take texvc as the reference for detecting input errors. If, as claimed, Blahtex supports a larger class of TeX commands, then MathML will be generated whenever the input is accepted by texvc.
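Whether texvc can safely act as the reference depends on the claim that Blahtex accepts everything texvc accepts. A minimal Python sketch of a corpus check for that claim, assuming hypothetical predicates `texvc_accepts` and `blahtex_accepts` that wrap the two tools; any formula it reports would get a PNG but no MathML.

```python
from typing import Callable, Iterable, List

Accepts = Callable[[str], bool]


def superset_violations(corpus: Iterable[str],
                        texvc_accepts: Accepts,
                        blahtex_accepts: Accepts) -> List[str]:
    """List inputs that texvc accepts but Blahtex rejects.

    An empty result is consistent with the superset claim on this corpus
    only; it is evidence, not a proof.
    """
    return [tex for tex in corpus
            if texvc_accepts(tex) and not blahtex_accepts(tex)]
```

Running it over the formulas already present in existing articles would be one way to gather such evidence.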

So I think we agree on how the integration should be done. I don't know the Wikimedia/Blahtex source code at all, but when I have time, or when I'm really fed up with PNG images, I'll have a look at this and try to help.

ayg wrote:

(In reply to comment #5)

> The first is generated with Wikimedia and the second rendered by Firefox (with STIX fonts). Honestly, I can't see how one can say that the first is better than the other, except if one is really used to TeX rendering.

Who isn't, who works in any mathematical field? :)

> So I think we agree on how the integration should be done. I don't know the Wikimedia/Blahtex source code at all, but when I have time, or when I'm really fed up with PNG images, I'll have a look at this and try to help.

Okay, great. I should be available this summer to review patches. If you don't get around to it, I might find the time at some point. I don't think it would be too hard.

http://www.mediawiki.org/wiki/Extension:Blahtex has been available as an extension since r15109 (28 June 2006). Requests for enabling it should be made in a new issue.