Many character sets don't work in texvc
Closed, Resolved, Public

Assigned To
Authored By
grin
Oct 28 2004, 8:29 PM
Referenced Files
F3112407: 1.PNG
Dec 17 2015, 2:05 PM
F3112409: 2.PNG
Dec 17 2015, 2:05 PM
F1363: patch.diff
Nov 21 2014, 7:02 PM
Tokens
"Love" token, awarded by He7d3r.

Description

The page (as of today, 22:18 CET) contains a math formula showing the above error.
The editor said he changed the formula several times and got different errors,
most of them probably syntax errors. This error message, however, complains not
about syntax problems but about installation problems, which is very weird.

jeronim_ checked and said:

22:20:34 <@jeronim_> i dunno what that math error is about
22:20:45 <@jeronim_> it's not just yongle, it's other machines
22:21:35 < grin> jeronim_: editor said he changed a word in the math
22:21:49 < grin> jeronim_: and the error come up. maybe page history shows it,
I try to check
22:22:03 <@jeronim_> sorry i can't really help beyond just seeing if the right
software is installed
22:22:11 <@jeronim_> you'd have to ask someone else


Version: unspecified
Severity: normal
URL: https://meta.wikimedia.org/wiki/Help:Displaying_a_formula?oldid=3698791#Rendering
See Also:
T38496
T50032

Details

Reference
bz798
Title | Reference | Author | Source Branch | Dest Branch
Update buster-based images to composer 2.1.8 | repos/releng/dev-images!7 | jforrester | composer-2 | main

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 7:02 PM
bzimport added projects: Math, I18n.
bzimport set Reference to bz798.

buggy math moved to talk page until fixed.

Bug:
*<math> \mbox{pá} - </math> - bad
*<math> \mbox{pa} - </math> - good

Seems to be something messed up about accented/utf8 chars and minus sign.

foenyx wrote:

*** Bug 1759 has been marked as a duplicate of this bug. ***

*** Bug 1799 has been marked as a duplicate of this bug. ***
*** Bug 3752 has been marked as a duplicate of this bug. ***

bggoldbg wrote:

IMHO bug 3752 is not a defect but missing functionality in TeX. For example, I
do not know whether Knuth's fonts have Cyrillic letters at all. Thus I
consider this a request for enhancement.

*** Bug 4199 has been marked as a duplicate of this bug. ***
*** Bug 4533 has been marked as a duplicate of this bug. ***

jan.kraljic wrote:

Is there any work going on to solve this bug?

There's not really anyone who's familiar with the TeX stuff who's been
active in the last couple years.

angus wrote:

The reason for this problem could simply be that TeX is trying to load the
package ucs.sty and dies when it does not find it. If that's the case, you
should either run

  # apt-get install latex-ucs

or apply the attached patch. (But installing the package is better because it
covers a large Unicode range.)

Index: texutil.ml
===================================================================
RCS file: /cvsroot/wikipedia/phase3/math/texutil.ml,v
retrieving revision 1.12
diff -u -r1.12 texutil.ml
--- texutil.ml 12 Jan 2006 20:38:31 -0000 1.12
+++ texutil.ml 18 Jan 2006 21:05:37 -0000
@@ -44,7 +44,7 @@
 let tex_mod_reset () = (modules_ams := false; modules_nonascii := false; modules_encoding := UTF8; modules_color := false)
 
 let get_encoding = function
- UTF8 -> "\\usepackage{ucs}\n\\usepackage[utf8]{inputenc}\n"
+ UTF8 -> "\\usepackage[utf8]{inputenc}\n"
 | LATIN1 -> "\\usepackage[latin1]{inputenc}\n"
 | LATIN2 -> "\\usepackage[latin2]{inputenc}\n"

max wrote:

The problem with the original LaTeX is that you have to switch font encodings
manually. E.g. for an English/Russian/Polish/Greek text three encodings should
be used: latin (T1), cyrillic (T2A) and greek (LGR). Something like this:

\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage[T2A,LGR,T1]{fontenc} % The last encoding is default
\newcommand\cyr[1]{\bgroup\fontencoding{T2A}\selectfont #1\egroup}
\newcommand\grk[1]{\bgroup\fontencoding{LGR}\selectfont #1\egroup}
\pagestyle{empty}
\begin{document}
$$ a=b\quad\mbox{if/\cyr{если}/jeśli/\grk{εἰ}}\quad c=d $$
\end{document}

It works, but it is quite ugly. And I have no idea how to deal with
right-to-left scripts and CJK.
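
The manual switching above could, in principle, be automated before the text reaches LaTeX. Below is a minimal Python sketch, assuming the \cyr and \grk macros from the preamble above. The helper names script_of and wrap_scripts are hypothetical and are not part of texvc. Characters are classified only by their Unicode character name, so combining marks and other scripts are passed through untouched.

```python
import unicodedata
from itertools import groupby

# Script keyword (as it appears at the start of the Unicode character name)
# mapped to the wrapper macro defined in the LaTeX preamble above.
MACROS = {"CYRILLIC": "cyr", "GREEK": "grk"}


def script_of(ch):
    """Return 'CYRILLIC', 'GREEK', or None, based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in MACROS:
        if name.startswith(script):
            return script
    return None


def wrap_scripts(text):
    """Wrap each maximal run of Cyrillic or Greek characters in its macro."""
    parts = []
    for script, chars in groupby(text, key=script_of):
        run = "".join(chars)
        if script is None:
            parts.append(run)  # Latin, punctuation, etc. stay as they are
        else:
            parts.append("\\%s{%s}" % (MACROS[script], run))
    return "".join(parts)
```

Applied to the text inside the \mbox above, this would produce the same \cyr{...} and \grk{...} wrappers that were written by hand. It does nothing for right-to-left scripts or CJK.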

branko.kokanovic wrote:

adds additional custom preamble to TeX code through texvc arguments

There's a new variable that should be set to whatever one wants appended to
the TeX preamble. Example:
$wgTeXPreambleAdditional="\usepackage[T2A]{fontenc}\nAnother line in preamble";

Attached:

valentin_st wrote:

Another example:
<math> C = BW \times \log_2 \left( 1+\frac{P_с}{P_ш} \right) </math>
http://bg.wikipedia.org/wiki/Беседа:Пропускателна_способност

*** Bug 6596 has been marked as a duplicate of this bug. ***

h-j.luecking wrote:

I get this error when setting $wgUseTeX = true; in localsettings:

There was a syntax error in the database query. The last database query
was:

(SQL query hidden)

from the function "MathRenderer::_recall". MySQL reported the error "1267:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and
(utf8_general_ci,COERCIBLE) for operation '=' (localhost)".

(In reply to comment #16)

This is not connected with this bug. The error is probably caused by a wrong table definition (together with the fact that you
use the UTF-8 character set): the math_inputhash column in the math table should have an explicit binary collation.

jutiphan wrote:

We are having this problem on Thai Wikipedia. Thai characters do not work properly inside math tags, and we need some help. Thanks to anyone who can shed light on
this.

*** Bug 8305 has been marked as a duplicate of this bug. ***
*** Bug 8316 has been marked as a duplicate of this bug. ***

don-cles wrote:

I added these two lines to page http://eo.wikipedia.org/wiki/Kemia_ekvilibro

:<math>\mbox{rapido de antauxena reakcio} = k_+ {A}^\alpha{B}^\beta \,\!</math>
:<math>\mbox{rapido de inversa reakcio} = k_{-} {S}^\sigma{T}^\tau \,\!</math>

The second line works okay; the first fails, apparently because of the ux combination which it is supposed to convert to ŭ.

happy.melon.wiki wrote:

(In reply to comment #21)

> I added these two lines to page http://eo.wikipedia.org/wiki/Kemia_ekvilibro
>
> :<math>\mbox{rapido de antauxena reakcio} = k_+ {A}^\alpha{B}^\beta \,\!</math>
> :<math>\mbox{rapido de inversa reakcio} = k_{-} {S}^\sigma{T}^\tau \,\!</math>
>
> The second line works okay; the first fails, apparently because of the ux
> combination which it is supposed to convert to ŭ.

The page now seems to render the ŭ character correctly. The test cases in comment #2 also display correctly. Assuming FIXED.

ragibhasan wrote:

Are you sure that this has been fixed? I just tried the following formula in :bn:, and it still shows a parse error:

:<math>\mbox{কখগ} = k_+ {A}^\alpha{B}^\beta \,\!</math>

The error message shows: পার্স করতে ব্যর্থ (PNG রূপান্তর ব্যর্থ; latex, dvips, gs, এবং convert ঠিকমত ইন্সটল হয়েছে কি না পরীক্ষা করুন): \mbox{কখগ} = k_+ {A}^\alpha{B}^\beta \,\!

The translation in English is: Failed to parse (Failed to convert to PNG; please check if latex, dvips, gs, and convert are installed correctly)

I also noticed that we cannot use Bengali numerals (in unicode UTF-8) inside latex formulas. That gives us the failure to parse error in bn.wikipedia.
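
Until non-Latin digits are accepted inside formulas, one conceivable workaround is to normalize them before the formula is rendered. Below is a minimal Python sketch, assuming it is acceptable to replace such digits with their ASCII equivalents. The helper name ascii_digits is hypothetical, and nothing like it is part of texvc.

```python
import unicodedata


def ascii_digits(formula):
    """Replace every Unicode decimal digit (e.g. Bengali digits) with its ASCII digit."""
    out = []
    for ch in formula:
        # decimal() only succeeds for true decimal digits (category Nd),
        # so superscripts such as U+00B2 are left untouched.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)
```

This only affects digits. Bengali letters inside \mbox{...} would still fail as described above.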

Well at least latin script unicode works (latin extended block), but see:

http://en.wikipedia.org/wiki/User:Grin/mathtest

Indeed apart from latin script it still fails.

fibonacci.prower wrote:

Not even for Latin script. <math>í</math> gets me the following error:
Failed to parse (lexing error): í

It seems that it will only work if the non-ASCII text is inside an mbox.
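
That observation suggests a simple pre-check: flag any non-ASCII character that is not inside an \mbox. Below is a rough Python sketch, assuming \mbox groups are not nested. The helper non_ascii_outside_mbox is hypothetical; texvc's real lexer is written in OCaml and works differently.

```python
import re

# Matches \mbox{...} with optional whitespace before the brace and
# no nested braces inside the group.
MBOX = re.compile(r"\\mbox\s*\{[^{}]*\}")


def non_ascii_outside_mbox(formula):
    """Return the non-ASCII characters found outside any \\mbox{...} group."""
    stripped = MBOX.sub("", formula)
    return [ch for ch in stripped if ord(ch) > 127]
```

An empty result means every non-ASCII character sits inside an \mbox. It does not mean the formula will render, since the mbox cases reported earlier in this bug fail for other reasons.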

sumanah wrote:

Branko, thank you for your patch. I am sorry it's been unreviewed for so long; I am 99% certain that it's been somewhat obsoleted since you wrote it. Is this bug still reproducible? If so, would you be interested in revisiting it?

(In reply to comment #27)

> Is this bug still reproducible?

Per [[meta:Help:Displaying_a_formula#Rendering]], \mbox{ð} and \mbox{þ} will give an error:

  • Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): \mbox {ð}
  • Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): \mbox {þ}

These error messages are still displayed on that metawiki page.

With MathJax this can be set to resolved. The MathJax site has a browser compatibility list here: http://www.mathjax.org/resources/browser-compatibility/

So basically it is supported on all browsers and platforms.

The PNG conversion is still failing on WMF wikis (just checked on the documentation page mentioned in comment 28).

Besides, until MathJax is enabled by default (bug 36496), it cannot be considered a fix for this bug, which still happens on Wikipedia.

OK, so the difference between mbox and text seems to have disappeared at some point. The problem is now that not all character sets are supported.

Unfortunately LaTeX doesn't support full Unicode. Perhaps we should consider switching to XeTeX?
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex
http://en.wikipedia.org/wiki/XeTeX

(In reply to comment #31)

> OK, so the difference between mbox and text seems to have disappeared at some
> point. The problem is now that not all character sets are supported.
>
> Unfortunately LaTeX doesn't support full Unicode. Perhaps we should consider
> switching to XeTeX?
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex
> http://en.wikipedia.org/wiki/XeTeX

It doesn't make sense to address this before resolving bug 34038 first. It'd probably be relatively easy to convert some Unicode input into something LaTeX renders correctly, but then the initial incentive to use LaTeX as a format -- i. e. freely transfer text between wiki and LaTeX documents -- gets lost completely.

I think the question should be directed the other way, lifted of any past efforts: If a British/French/Bengali/Arabian wiki author wants to enter a formula, what formats a) ease that work and b) are well established? If most authors will use a formula editor, the format choice can be guided mainly by technical considerations. If we expect most formulas to be entered manually by people unfamiliar with TeX, the latter would be an odd choice as its behaviour can be as surprising as MediaWiki's wiki parser and a format that would /define/ a formula instead of being /commands/ to a typesetter would clearly be preferable.

Even a change to use XeTeX should IMHO be reflected by the use of a new tag ("<math-xe>" or something similar), so that we don't cause more headaches than necessary.

*** Bug 54778 has been marked as a duplicate of this bug. ***

sodabottle wrote:

Hi Brion,

As a temporary fix for Tamil Wikipedia (Bug 54778), can MathJax be enabled as the default on Ta Wiki alone? I read bug 36496, which says it wasn't made the default on wiki projects because of slow loading times on low-end computers. I tested some math-heavy pages on some low-end machines (1 GB RAM, Win XP) and the time seems acceptable. Currently we face a choice between "fast page load with PNG rendering but with errors" and "default MathJax".

If I can obtain community consensus, would it be possible to make MathJax the default for Ta Wiki?

physik wrote:

I think we should wait until Math 2.0 is deployed. It applies the same filtering to the commands sent to MathJax as to those sent to LaTeX, which prevents the grammars from diverging.
Furthermore, Frederic Wang made major improvements to the MathJax loader that depend on MathJax 2.3. I would strongly recommend waiting until these changes are merged as well.

@SodaBottle: Can you open up another bug + start the community discussion for it as well? Thanks!

Just to throw it out there. If the math extension used MathJax on the backend, then it seems a lot of these problems would go away. MathJax's TeX-input is slightly more powerful than texvc, is designed for a web environment and would remove the need for sanitization.

physik wrote:

@Peter: I think we should not repeat the same mistake (using a poorly defined subset of LaTeX extended by some custom macros) by using the new MathJax language instead of texvc.
I think changing the language of math input that is shared between all languages should be a common process.

@Moritz I understand your concerns but would argue that MathJax consists of a well defined subset of TeX.

> I think changing the language of math input that is shared between all
> languages should be a common process.

I don't understand that part of your message :( Was something lost by an accidental edit?

physik wrote:

> I think changing the language of math input that is shared between all
> languages should be a common process.

I mean natural languages. At the moment all wiki installations use the same texvc input language, including commands such as \sen.

I just googled for texvc discussion and found some parts:
http://meta.wikimedia.org/wiki/Texvc

Maybe this becomes off-topic... However, it supports my argument that there should be a discussion about what the restricted set of input commands should look like. In particular, it should not be determined by the technical limitations of the x-Rendering-Software.

(In reply to comment #40)

> I just googled for texvc discussion and found some parts:
> http://meta.wikimedia.org/wiki/Texvc

Thanks. That's very interesting.

> Maybe this becomes off-topic...

Probably.

> However, it supports my argument that there
> should be a discussion how the restricted set of input commands should look
> like.

I agree with that but...

> Especially it should not be determined by the technical limitations of
> the x-Rendering-Software.

I find this too idealistic. In reality, there aren't many solutions for math on the web, and all of them have their own limitations and advantages in a MW setting. The critical question is: in what direction do MW and its community (in particular Wikipedia) want to take mathematical and scientific content? As suggested by WMF, I tried to start a discussion about this on wikitech-l but not much came out of it. So the answer seems to be: nobody cares.

Which is why I fully agree with you (but it makes me depressed).

Peter.

(In reply to comment #42)

> As suggested by WMF, I tried to start a discussion about this on wikitech-l but
> not much came out of it. So the answer seems to be: nobody cares.

Sorry to hijack this bug, but I'll just post once then hopefully it can move back to RFC and/or Wikitech.

I think some feedback from Wikitech (e.g. Flow and issues with certain languages) was helpful, but I agree the Wikitech thread basically finished.

For smaller stuff, the answer is Just Do It, and hash anything out in code review.

For bigger architectural things (what to store in the database [e.g. MathML not TeX], or having a single way of validating TeX/ANTLR grammar) where you want an answer before coding, it's probably time for an RFC (https://www.mediawiki.org/wiki/Requests_for_comment). We talked about if/when to do this before, but now is probably a good time. Pick a single issue, unless of course one decision clearly implies others, in which case you should include the related ones.

Above are just examples based on past discussions; the RFC can be whatever you think is appropriate. An example past implemented RFC (though probably a bit simpler) is https://www.mediawiki.org/wiki/Requests_for_comment/Reduce_math_rendering_preferences

PNG is the default rendering mode for the math tag.
If you try to use Cyrillic characters, you get:

1.PNG (152×1 px, 39 KB)

If you use MathML, it displays fine:
2.PNG (108×1 px, 23 KB)

Can the default PNG rendering be fixed?

You can set

$wgDefaultUserOptions['math']='mathml';

Debenben claimed this task.
Debenben subscribed.

The characters probably still look bad, but there should not be any errors anymore, since PNG rendering also uses MathJax now.