Page MenuHomePhabricator

Flickr disclaimer message escaped, showing raw HTML
Closed, ResolvedPublic

Description

Some time in the last couple weeks something changed that caused the Flickr disclaimer message in UploadWizard to become escaped. In other words, it is now displaying the HTML tags as text rather than parsing them as HTML.

I haven't had a chance to debug yet, but the message is 'mwe-upwiz-flickr-disclaimer' and it is output around line 329 of /resources/mw.UploadWizard.js.

Right now, this is only happening locally, not on Commons yet.


Version: 1.21.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=29542

Details

Reference
bz44525

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:24 AM
bzimport set Reference to bz44525.

I tried mw.message('mwe-upwiz-flickr-disclaimer').plain(); in the console and the output had <br /> escaped. "<a>" isn't escaped though. Rewinding core to few days back gives the right message.

Moving to core-i18n

(In reply to comment #2)

Is this a dupe of bug 44459?

I think they are different.

Yes, I think they are different as well.

Looks like this bug was trigger by the change to the concat function in https://gerrit.wikimedia.org/r/#/c/44843/

CCing Timo.

Looks like that change is going to get picked up by the deployment train on Monday.

Actually, I think those bugs might be related. The one you refer looks like it was implicitly fixed in clean up I did in Ifbeae7e9.

The jqueryMsg parser only supports very basic wikitext (external links, local links and some other basic syntax) and i18n parser functions (plural, gender, grammer). Everything else is not understood and as such considered text nodes and must NOT be used in interface messages parsed in jqueryMsg.

Before Ifbeae7e9, if the text node contained "<" it was parsed as arbitrary HTML. Which broke some special characters and hyperlinks that contained &amp;. Hence I fixed that.

However, It was actually much worse than the above. jqueryMsg didn't check for "<" (that would've been a sloppy implementation with shady intentions), it was just passing it to jQuery's .append() with comment "// strings and numbers". jqueryMsg was intending to appending as text nodes, no html parsing of any kind.

However in jQuery append> domManip> buildFragment> test> rhtml

rhtml = /<|&#?\w+;/

There, buildFragment decides to parse html or append text node based on some regex...

Messages are not allowed to contain arbitrary html. Everything is plain text unless wikitext supports it (and in this case, the subset of that supported in jqueryMsg). Right that subset is i18n parser functions and wikilinks.

Ifbeae7e9 removed the insecure and unreliable coincidence factor that would sometimes parse any arbitrary html – unfiltered.

If you need certain html tags such as <br> (i.e. the ones allowed in wikitext), they'll have to be properly supported and implemented in jqueryMsg. Otherwise, avoid those kind of tags in those messages.

Suggesting to close as wontfix, invalid or worksforme.

"jqueryMsg was intending to appending as text nodes, no html parsing of any
kind."

On what basis do you say that? In https://bugzilla.wikimedia.org/show_bug.cgi?id=29542 says, Neil (one of the main devs on it) says:

"(HTML which is fully outside of or fully inside of another element is parsed
correctly.)"

In the actual prior code:

// strings, integers, anything else
$span.append( node );

I take "anything else" at face value.

"Messages are not allowed to contain arbitrary html."

Not true. Some are. https://en.wikipedia.org/wiki/MediaWiki:Stub-threshold has a HTML link, which is used as HTML in preferences.

If you try the same thing as wikitext, it will not render. https://en.wikipedia.org/wiki/User:Superm401/stub_html

Messages are special, and one of the distinguishing factors is that they can use HTML in some cases.

I believe we should allow HTML as HTML, like before, as a temporary solution.

After that, we should consider an HTML mode, since I think that (HTML vs. wikitext, by context), reflects how it is on the server.

It sounds like the html detection was breaking other things so we may need to come up with a more specific solution. The idea of an explicit HTML mode, perhaps even as a separate function, sounds workable. Specifically detecting HTML tags that are supported in Wikitext would also be nice, but it wouldn't solve all the cases (and would add some bloat).

In the meantime, we have to decide what to do with Ifbeae7e9 before Monday. @Krinkle: How widespread was the breakage caused by the HTML detection? Would it be less breakage than will be caused by removing the detection? If we're not going to revert Ifbeae7e9, I would favor at least adding explicit support for <br> tags (and perhaps anchor tags) before Monday.

It looks like change Ifbeae7e9 also breaks some messages in PageTraige.

Ryan, which message(s)? We should be careful, since it could be bug 44459 (which I know affects PageTriage). One of my changes introduced bug 44459, but I also have a fix pending (https://gerrit.wikimedia.org/r/#/c/47044/)

They're both jqueryMsg, but otherwise basically unrelated.

It's where HTML entities are being escaped instead of displayed as characters. Pretty sure it's the same as this bug rather than bug 44459.

  • Bug 44885 has been marked as a duplicate of this bug. ***

Note, the elements that are allowed in wikitext are listed here (https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext#Permitted_HTML). The implementation is in Sanitizer.php.

As it says, that's just a workaround. This isn't really fixed until jqueryMsg at least supports the elements that are permitted and commonly used, like b, i, etc.

Reopening.

@Matt: Agreed. Do you want to take it, or should I?

If you have cycles, I'll let you take it.

Otherwise, I'll get around to it at some point.

No, that's a different bug, but obviously affected by change Ifbeae7e9.

However, making supported HTML tags a fully (recursively) parsed construct should fix bug 29542 too.

Ryan Kaldari and I talked. I'm going to take the first part of this. He'll help generalize it later.

Okay, this is at https://gerrit.wikimedia.org/r/#/c/53017/ . I implemented a general framework for any whitelisted elements and attributes. If a tag isn't allowed (due to bad element, bad attribute, or mismatched), the tag itself will be treated as text. The contents are still HTML if applicable, which matches the server's behavior.

Only <i> and the common attributes are implemented. I tested that works (including some nesting tests (links inside) and that invalid (not whitelisted) HTML (<script>, onclick) is treated as text.

I'll try to do code review on this tomorrow.

Yes, the root cause is fixed (whitelisted HTML is now allowed). Specific extensions can now be changed to use that.

Additional tags already allowed on the server-side can be added to mediawiki.jqueryMsg.js as needed.

Related URL: https://gerrit.wikimedia.org/r/59602 (Gerrit Change Ie142dc57359121866b603e65d0f7941cf1539d22)