
wfEscapeWikiText is inadequate
Closed, Resolved (Public)

Description


wfEscapeWikiText does not escape enough characters, allowing undesirable formatting through in certain cases.

To reproduce, open the following URL. This is a search for
"__TOC__ OR<CR>;a<CR>ISBN<TAB>978-3-16-148410-0<CR> a".

https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=all&search=__TOC__%20OR%0D;a%0D:ISBN%09978-3-16-148410-0%0D%20a&fulltext=Search
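To see exactly which characters the search term contains, the query string can be decoded outside MediaWiki. The following is a minimal sketch in plain PHP; the variable names are illustrative and nothing here is MediaWiki code.

```php
<?php
// Query string copied from the reproduction URL above.
$query = 'title=Special%3ASearch&profile=all'
    . '&search=__TOC__%20OR%0D;a%0D:ISBN%09978-3-16-148410-0%0D%20a'
    . '&fulltext=Search';

// parse_str() URL-decodes each value, much as PHP does when filling $_GET.
parse_str( $query, $params );
$search = $params['search'];

// Replace the control characters with visible placeholders for inspection.
$visible = strtr( $search, [ "\r" => '<CR>', "\t" => '<TAB>' ] );
echo $visible, "\n";
```

The decoded value contains a double-underscore magic word, three carriage returns, and a tab between "ISBN" and the number.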

Expected Result:

The text "Results 1–6 of 6 for " (from message 'showingresultsheader') is followed by "__TOC__ OR ;a ISBN 978-3-16-148410-0 a", with no special formatting or linking beyond the bolding applied by the message text.

Actual Result:

__TOC__ disappears. The first "a" appears on the next line. The ISBN is indented (as a definition in a definition list) and linked to Special:BookSources. The second "a" appears as monospaced text inside a pre element.


Version: 1.22.0
Severity: normal
URL: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=all&search=%0D;a%0D:ISBN%09978-3-16-148410-0%0D%20a&fulltext=Search

Attached:

wfEscapeWikiText.png (585×853 px, 64 KB)

Details

Reference
bz53658

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 2:11 AM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz53658.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

To reproduce, open the following URL. This is a search for
"__TOC__ OR<CR>;a<CR>ISBN<TAB>978-3-16-148410-0<CR> a".

Actually for "__TOC__ OR<CR>;a<CR>:ISBN<TAB>978-3-16-148410-0<CR> a"

To be clear, things that need to be handled here are:

  1. Double underscore magic words
  2. Magic links using a non-space whitespace
  3. Newlines using CR instead of LF
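The three cases listed above could be handled roughly as follows. This is a sketch only: `escapeSketch()` is a hypothetical standalone helper, not `wfEscapeWikiText()` and not the change that was eventually merged. It also assumes that space and tab are the only whitespace characters that matter after a magic-link keyword.

```php
<?php
/**
 * Hypothetical helper for illustration; not MediaWiki's wfEscapeWikiText().
 */
function escapeSketch( string $text ): string {
    // Case 1: break up "__" so double-underscore magic words are not seen.
    // Case 3: encode CR so it cannot act as a line break.
    $text = strtr( $text, [
        '__' => '_&#95;',
        "\r" => '&#13;',
    ] );

    // Case 2: encode the space or tab that follows a magic-link keyword,
    // so "ISBN<TAB>978..." is no longer recognised as a magic link.
    return preg_replace_callback(
        '/\b(ISBN|RFC|PMID)([ \t])/',
        static function ( array $m ): string {
            return $m[1] . '&#' . ord( $m[2] ) . ';';
        },
        $text
    );
}

echo escapeSketch( "__TOC__ OR\r;a\r:ISBN\t978-3-16-148410-0\r a" ), "\n";
```

Because each carriage return becomes a character reference, the ";", ":" and leading space that followed it no longer sit at the start of a line in the escaped text.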

Found some others as well:

https://en.wikipedia.org/wiki/Special:Search/PMID_1
https://en.wikipedia.org/wiki/Special:Search/urn:foo

Grepping the code reveals that Sanitizer::safeEncodeAttribute does
handle the former, though not some of the other things wfEscapeWikiText
is supposed to.

Change 82460 had a related patch set uploaded by Anomie:
Improve wfEscapeWikiText

https://gerrit.wikimedia.org/r/82460

Change 82462 had a related patch set uploaded by Anomie:
Improve mw.text.nowiki

https://gerrit.wikimedia.org/r/82462

Change 82462 merged by jenkins-bot:
Improve mw.text.nowiki

https://gerrit.wikimedia.org/r/82462

Change 82460 merged by jenkins-bot:
Improve wfEscapeWikiText

https://gerrit.wikimedia.org/r/82460

Changes merged. They should be deployed to WMF wikis with 1.22wmf16, see https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap for the schedule.

What about two or more consecutive newlines? Should all newlines be escaped (not just those preceding #, *, etc.)?

For example:

$m = new RawMessage( '$1' ); var_dump( $m->params( wfEscapeWikiText( "foo\n\n\nbar" ) )->parse() );

As of a86240a37aa729494bd4d7c7935afff4e5b62b22 I get:

string(21) "foo\n</p><p><br />\nbar"

I would expect this to be:

string(21) "foo&#10;&#10;&#10;bar"
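The expected value above corresponds to encoding every newline unconditionally, rather than only those that precede list or heading markup. A minimal standalone sketch of that idea, assuming a plain `strtr()` mapping (this is not the patched `wfEscapeWikiText()` and does not call the parser):

```php
<?php
// Encode every CR and LF as a numeric character reference,
// regardless of which character follows it.
$escaped = strtr( "foo\n\n\nbar", [
    "\r" => '&#13;',
    "\n" => '&#10;',
] );

var_dump( $escaped );
```

With no literal newlines left in the escaped string, there is nothing for the parser to turn into a paragraph break.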

Change 85233 had a related patch set uploaded by Anomie:
Improve wfEscapeWikiText, part 2

https://gerrit.wikimedia.org/r/85233

Change 85234 had a related patch set uploaded by Anomie:
Improve mw.text.nowiki, part 2

https://gerrit.wikimedia.org/r/85234

Change 85233 merged by jenkins-bot:
Improve wfEscapeWikiText, part 2

https://gerrit.wikimedia.org/r/85233

Change 85234 merged by jenkins-bot:
Improve mw.text.nowiki, part 2

https://gerrit.wikimedia.org/r/85234

Changes merged. They should be deployed to WMF wikis with 1.22wmf19, see https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap for the schedule.

Change 95420 had a related patch set uploaded by MarkAHershberger:
Improve mw.text.nowiki

https://gerrit.wikimedia.org/r/95420

Change 95421 had a related patch set uploaded by MarkAHershberger:
Improve mw.text.nowiki, part 2

https://gerrit.wikimedia.org/r/95421

Change 95421 abandoned by MarkAHershberger:
Improve mw.text.nowiki, part 2

https://gerrit.wikimedia.org/r/95421

Change 95420 abandoned by MarkAHershberger:
Improve mw.text.nowiki

https://gerrit.wikimedia.org/r/95420

No open patches to review here (the backport patches were abandoned), hence resetting status to RESOLVED FIXED. Backport_to_Stable flag might be set to "-" by hexmode.