Page MenuHomePhabricator

VisualEditor: MWExtensionNode preview (<math>, <syntaxhighlight>, …) double-escapes HTML (angle brackets, ampersands)
Closed, ResolvedPublic

Description

Repro:

  1. Create a page with text below:

<math>
\begin{matrix}
x & y \\
z & v
\end{matrix}
</math>

  1. Default rendering, in PHP parser and Parsoid (VE initial state):

x y
z v

  1. VE edit this matrix, change v to uv.
  1. Edited block becomes:

x amp; y
z amp; uv

(After saving it works fine)


Version: unspecified
Severity: minor
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=54577

Details

Reference
bz57429

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:36 AM
bzimport set Reference to bz57429.
  • Bug 60113 has been marked as a duplicate of this bug. ***

This problem is actually common for everything that uses MWExtensionNode. The problem is that ve.ce.MWExtensionNode.prototype.generateContents makes some bold assumptions about wikitext syntax that are not true.

The fix from bug 54577 didn't actually fix much, it just changed what the problem is.

  • Bug 60332 has been marked as a duplicate of this bug. ***

Basically, ve.ce.MWExtensionNode.prototype.generateContents builds an XML element object for given tag name (for example 'math'), given attributes (none in this case) and given contents (for example 'a < b'), and then serializes it to a string to ensure that everything is correctly escaped.

The problem is that MediaWiki XML-like tags don't expect their contents to be escaped. Serializing such a node generates "<math>a &lt; b</math>", while wikitext expects "<math>a < b</math>".

(An interesting corollary is that a MediaWiki XML-like tag 'foo' may not contain the string '</foo>' in any form nor encoded in any way, unless the code of the extension handling the tag specifically parses nested <nowiki/> tags or something. You can cheat – {{#tag:foo|</foo>}} – but then you still can't include the string '<foo></foo>' because it gets double-parsed.)

(Fun fact: the above behavior is the reason why <source/> was renamed to <syntaxhighlight/> some years ago.)

We should not escape the contents at all and instead do something clever if the input text contains '</foo>'. VE and wikitext are both designed to never disallow the user from doing stupid things, so I'm not sure what can be done if we don't just prevent the user from saving that. Maybe selectively HTML-escape /<\/\s*foo\s*>/ when closing the inspector?

Just for "fun", some informative examples that show what happens when you try too hard with {{#tag:}}. Two of these actually cause UNIQ markers to appear in output, which is rather bad in general. The behavior may differ depending on the tag used.

<syntaxhighlight lang=php>$a;</syntaxhighlight>

<syntaxhighlight lang=php></syntaxhighlight></syntaxhighlight>

<syntaxhighlight lang=php>&lt;/syntaxhighlight></syntaxhighlight>

{{#tag:syntaxhighlight|</syntaxhighlight>|lang=php}}

<syntaxhighlight lang=php><syntaxhighlight lang=php>$a;</syntaxhighlight></syntaxhighlight>

{{#tag:syntaxhighlight|<syntaxhighlight lang=php>$a;</syntaxhighlight>|lang=php}}

{{#tag:syntaxhighlight|&lt;syntaxhighlight lang=php>$a;</syntaxhighlight>|lang=php}}

{{#tag:syntaxhighlight|<nowiki><syntaxhighlight lang=php>$a;</syntaxhighlight></nowiki>|lang=php}}

Change 144157 had a related patch set uploaded by Bartosz Dziewoński:
ve.ui.MWExtensionInspector: Prevent from setting impossible content

https://gerrit.wikimedia.org/r/144157

Change 144158 had a related patch set uploaded by Bartosz Dziewoński:
ve.ce.MWExtensionNode: Don't escape content of wikitext tags on preview

https://gerrit.wikimedia.org/r/144158

Change 144157 merged by jenkins-bot:
ve.ui.MWExtensionInspector: Prevent from setting impossible content

https://gerrit.wikimedia.org/r/144157

Change 144158 merged by jenkins-bot:
ve.ce.MWExtensionNode: Don't escape content of wikitext tags on preview

https://gerrit.wikimedia.org/r/144158

Should be fixed.

Like in wikitext, it will now be impossible to input '</tag>' in the inspector in VisualEditor: it will be converted to '&lt;/tag>' when the inspector is closed.