Page MenuHomePhabricator

<math>, <$ext> etc. and ISBN in thumbnails/frames is broken.
Closed, ResolvedPublic

Description

Author: avarab

Description:
The problem is that Parser::magicISBN() is called after Linker::makeThumbLinkObj
has been run on the text, which means that the magicISBN function will expand on
any ISBN number it recognizes, including those inside XHTML tags like alt="" or
title="", the same thing happens with <math>.

For example, "[[Image:Badimg.jpeg|thumb|<math>2+2</math>]]" is turned into:

<div class="thumb tright"><div style="width:202px;"><a
href="/mw/HEAD/wiki/Image:Badimg.jpeg" class="internal" title="<span
class="texhtml>2 + 2</span>"><img
src="/mw/HEAD/images/thumb/9/98/200px-Badimg.jpeg" alt="<span class="texhtml>2 +
2</span>" width="200" height="300" longdesc="/mw/HEAD/wiki/Image:Badimg.jpeg"
/></a> <div class="thumbcaption" ><div class="magnify" style="float:right"><a
href="/mw/HEAD/wiki/Image:Badimg.jpeg" class="internal" title="Enlarge"><img
src="/mw/HEAD/skins/common/images/magnify-clip.png" width="15" height="11"
alt="Enlarge" /></a></div><span class="texhtml">2 + 2</span></div></div></div>


Version: unspecified
Severity: major

Details

Reference
bz1887

Revisions and Commits

Event Timeline

bzimport raised the priority of this task from to High.Nov 21 2014, 8:17 PM
bzimport set Reference to bz1887.
bzimport added a subscriber: Unknown Object (MLST).

avarab wrote:

*** Bug 1442 has been marked as a duplicate of this bug. ***

avarab wrote:

*** Bug 834 has been marked as a duplicate of this bug. ***

dmaj007 wrote:

What should [[Image:Badimg.jpeg|thumb|<math>2+2</math>]] properly turn into?
That should make it easier to fix.

dmaj007 wrote:

Checked my own copy, I don't see this bug.

avarab wrote:

David Majnemer: The same as it does now, except things in title="" should always
be stripped of [<>"]

rowan.collins wrote:

Copying my comment from bug 1442, since you've marked the duplicates this way round:

The problem is that Parser::magicISBN() is called after Linker::makeThumbLinkObj
has been run on the text, which means that the magicISBN function will expand on
any ISBN number it recognizes, including those inside XHTML tags like alt="" or
title="".

In that case, it should be possible to do things the other way round, since the
image code should strip the HTML tags back out of the alt/title attributes. Kind
of ugly, but I think it might work.

dmaj007 wrote:

Ok, now that I know what should be done, I can fix it.

dmaj007 wrote:

The only way to get this stupid bug gone is we set it up so it omited the value.
That.. can be solved by restructuring some code (the proper way to do it) or the
hacky way of adding an omit to the unstip function. I will try both and give
patches so y'all got a choice ;)

dmaj007 wrote:

Patch for math attached. ;)

Patch for math attached. ;)
This patch is ONLY for math, I have not done ISBN yet, I will do so when
somebody tells me what is the correct behaviour for ISBN. :)

attachment Linker.php.patch ignored as obsolete

en.ABCD wrote:

For ISBN:
input:
[[Image:Example.jpg|thumb|ISBN 0123456789]]

current output:

<div class="thumb tright">
<div style="width: 182px;"><a href="/wiki/Image:Example.jpg" title="&lt;a href="
class="internal">ISBN
0123456789</a>"&gt;<img
src="http://upload.wikimedia.org/wikipedia/en/thumb/f/fd/180px-Example.jpg"
alt="&lt;a href=" class="internal">ISBN 0123456789" width="180" height="171"
longdesc="/wiki/Image:Example.jpg" /&gt;

<div class="thumbcaption">
<div class="magnify" style="float: right;"><a href="/wiki/Image:Example.jpg"
class="internal" title="Enlarge"><img
src="/skins/common/images/magnify-clip.png" alt="Enlarge" height="11"
width="15"></a></div>
<a href="/w/index.php?title=Special:Booksources&amp;isbn=0123456789"
class="internal">ISBN 0123456789</a></div>
</div>
</div>

correct output:

<div class="thumb tright">
<div style="width: 182px;"><a href="/wiki/Image:Example.jpg" class="internal"
title="ISBN 0123456789"><img
src="http://upload.wikimedia.org/wikipedia/en/thumb/f/fd/180px-Example.jpg"
alt="ISBN 0123456789" longdesc="/wiki/Image:Example.jpg" height="171"
width="180"></a>
<div class="thumbcaption">
<div class="magnify" style="float: right;"><a href="/wiki/Image:Example.jpg"
class="internal" title="Enlarge"><img
src="/skins/common/images/magnify-clip.png" alt="Enlarge" height="11"
width="15"></a></div>
<a href="/w/index.php?title=Special:Booksources&amp;isbn=0123456789"
class="internal">ISBN 0123456789</a></div>
</div>
</div>

rowan.collins wrote:

(In reply to comment #10)

This patch is ONLY for math, I have not done ISBN yet, I will do so when
somebody tells me what is the correct behaviour for ISBN. :)

Unfortunately, although only small, there are 2 errors in this patch: first,
note that there is both $label *and* $alt; we don't mind <math></math> in the
caption (i.e. $label), only in the attributes, so we only want to extract it in
$alt; secondly (and I don't blame you for missing this one), there is duplicated
code between makeImageLinkObj() and makeThumblinkObj() - really, code like this
should be in both, or rather, should be in a part of code that's not spread
across both. [I started rewriting that whole slab of code, but never quite finished]

The "correct behaviour for ISBN" is surely just nothing - that is, within the
alt/title attributes, the text "ISBN 0123456789X" should stay exactly as it is;
but obviously there's no real way of doing that, because it will be picked up by
the magicISBN() function later whether we like it or not. I guess you could use
a placeholder of some sort, but these have problems of their own.

In fact, at one stage the whole image was being stripped out, presumably causing
certain things not to happen even in the *caption* part, and causing bug 1217
and bug 1219

The other alternative, as I say, is just to parse such things before the
internalLink code has run, so that they will be HTML, and stripped from the
alt/title attributes anyway. I've stayed up too late, so my mind's whirling too
much to work out if that would cause any problems of its own or not...

dmaj007 wrote:

Rowan, thanks for the tips. However, I will fix the ISBN function :)

rowan.collins wrote:

(In reply to comment #14)

Rowan, thanks for the tips. However, I will fix the ISBN function :)

I'll be interested to see what approach you take.

dmaj007 wrote:

Pwned ISBN

FIXED!
ISBN is fixed with two simple regexs. :)

attachment Parser.php.patch ignored as obsolete

dmaj007 wrote:

Fixed my math patch

Fixed my math patch, to the best of my knowledge, the effects of this bug are
gone.

Attached:

dmaj007 wrote:

parserTest

This patches the parser test file. Have fun.

attachment parserTests.txt.patch ignored as obsolete

rowan.collins wrote:

(In reply to comment #16)

Created an attachment (id=443) [edit]
Pwned ISBN

FIXED!
ISBN is fixed with two simple regexs. :)

I might be wrong, but won't this only fix captions whose entire content is an
ISBN string? i.e. it looks to be matching things like <<alt="ISBN
0123456789X">>, but not things like <<alt="This is a book which has the ISBN
0123456789X; which is nice">> Unfortunately, my 1.5 test install is in a state
of development where it can't actually display images, so I've only got the code
to go from. It also seems to me that this could potentially trigger false
positives in the actual text, but I guess it's not a piece of text that people
are very likely to use (extending the regex to cover foo="phrase with an ISBN
etc in it" *might* run the risk of picking up too much, though, I'm not sure)

dmaj007 wrote:

ISBN AGAIN

Fixed ISBN again. It is totally lazy and works great. It is designed to be
smart enough to to mistake user input and an actual ISBN. Still two regexs.
This should fix the problem with other words.

attachment Parser.php.patch ignored as obsolete

dmaj007 wrote:

Rewrote the whole function

I rewrote the whole function, I thought the way it did things was silly and
over complicated so I wrote a new function.

attachment Parser.php.patch ignored as obsolete

avarab wrote:

Regarding the "parserTest" attachment:

  • You made some minor formatting change to "Interwiki link encoding conversion

(bug 1636)"

  • You damaged the "Inter-language links" test by changing "zh:食品" to "zh:??"

(two question marks), please make sure your editor doesn't make changes like
that to characters it doesn't recognize.

Regarding the "Rewrote the whole function" attachment:

  • Please don't make minor whitespace changes, it makes the diff hard to read.

dmaj007 wrote:

Fixed uglyness

It is nolonger ugly as hell.

attachment Parser.php.patch ignored as obsolete

dmaj007 wrote:

fixed my ugly test patch

Fixed my horrible patch, this one is nicer.

Attached:

dmaj007 wrote:

RAWR!

Fixed the stupid thing again! Damn you world!

attachment Parser.php.patch ignored as obsolete

avarab wrote:

It doesn't work for more than one ISBN link per line, the following fails:

"""
[[Image:Foo.png|thumb|ISBN 213 ISBN 213 ISBN 213 ... ]]
ISBN 123-213 ISBN 123-213 ISBN 123-213 ...
"""

avarab wrote:

*** Bug 2440 has been marked as a duplicate of this bug. ***

The problem occurs also when using external links starting with "mailto:" for mail
addresses.

avarab wrote:

(In reply to comment #24)

Created an attachment (id=461) [edit]
fixed my ugly test patch

Applied to HEAD.

avarab wrote:

Comment on attachment 462
RAWR!

Marking attachment 462 as obsolete since it won't work for the reasons I stated
in comment #26.

avarab wrote:

This problem also affects parser hooks, like <ref> from the Cite.php extension.

Currently looking into this- i've got a small regexp that fixes the ISBN and
RFC/PMID case, but it doesn't help the math/plugin one. Another case of missing
nesting state, and probably another case of adding a brittle hack.

I've fixed this for stripped things like math and extensions now, unstripping
the $alt text in the parser before constructing the thumb does the trick. All
html tags are removed right after the unstrip, leaving only the text.

This is not committed yet, i'm collecting a bunch of parser fixes for review
sometime next week.

epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:24 AM
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).
epriestley added a commit: Unknown Object (Diffusion Commit).