Page MenuHomePhabricator

Descriptions and titles from Flickr should be unescaped
Open, MediumPublic

Description

Right now special characters in Flickr descriptions are imported in their escaped format. In other words, you see things like " in the description field in UploadWizard. This isn't too bad since those get rendered as the correct characters on the File page, but it would be better to have the raw characters used.


Version: master
Severity: normal

Details

Reference
bz42813

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 12:55 AM
bzimport added a project: UploadWizard.
bzimport set Reference to bz42813.

Unfortunately unescaping HTML entities in Javascript isn't easy without introducing an XSS vulnerability. We may just want to do some regexes for the most common HTML entities in the meantime.

Looks like the only HTML special chars we actually have to worry about are " and &. Flickr leaves apostrophes unescaped and strips out tag characters.

I don't think this is a good idea. Input

"

I supposed to end up being displayed as

"

With your change (or even my suggested inversal of the replacements, see my comment on the gerrit change), it'll end up being displayed as

"

In Flinfo I've never found it necessary to do this. Flinfo does, however, replace <strong></strong> and <b></b> by ''' and <em> or <i> by '', and escapes the special characters {, }, | to &#x7B;, &#x7C;, &#x7D; to avoid that Flickr descriptions containing these can break the enclosing {{information}} template. Flinfo also transforms <a>-links to wikilinks.

Why would we want &quot;? The idea of the patch is to convert &quot; to ", as explained in the initial bug comment. This bug is "Descriptions from Flickr should be UNescaped", not "Descriptions from Flickr should be escaped".

You should file a separate bug for the things that need to be escaped, such as {, }, |, etc.

This bug may be a bit confusing since escaping is normally a good thing. The issue in this case, however, is we are putting the text in a textarea that the user can edit (not in HTML), so we actually need to UNescape certain characters (specifically &quot; and &amp;).

Looking at your comment in the patch, I think I understand the confusion. You were thinking that this bug was about converting double-escaped strings into single-escaped strings, but it's actually about converting single-escaped strings into non-escaped strings. If we happen to convert double-escaped strings into non-escaped strings as well, that's OK.

I see, the double-decoding handles a Flickr bug.

BTW, you probably want to do that then for the title, too. See e.g. the output of this:

http://api.flickr.com/services/rest/?method=flickr.photos.getInfo&api_key=2beda6083bcdc84da6f8a5071abcda18&photo_id=423565224&format=json&nojsoncallback=1

(From their API explorer; non-free image, though.)

The photo should be entitled

4" and counting

but Flickr returns

4&amp;quot; and counting

as title.

Ah, good catch. I forgot about the title!

If this is done specifically for flickr shouldn't this be done in mw.FlickrChecker.js ?

I think it makes more sense to unescape right before the data is inserted into the text fields. Ideally, we don't want any escaped text in the description or title fields. Escaped text (and html) could theoretically come from any source. It just happens that Flickr is the only one we have to worry about right now.

Ryan: This issue is assigned to you. Are still working (or still plan to work) on this issue? Only in case you do not plan to work on this issue anymore, should the assignee be set back to default and the bug status changed from ASSIGNED to NEW/UNCONFIRMED? Thanks.

Also removing the "gci2013" whiteboard entry. If this is a good self-contained task for a new contributor, please add "easy" to the "Keywords" field. Thanks.

kaldari set Security to None.