Weaken DISPLAYTITLE restrictions
Closed, ResolvedPublic

Description

At the moment, we have a {{DISPLAYTITLE}} magic word that lets us modify the displayed page title, but only within limits: the DISPLAYTITLE'd title must normalize to the actual page title. Sometimes this is not enough. For example, it cannot be used for titles that should contain disallowed characters such as | or #. There may also be a need [1] to display some formatting. And that is not to mention all the user pages that use ugly hacks to change their title to something other than the dull User:XXX.

The present solution on some wikis is various kinds of JavaScript that, besides being weird and tricky to implement, have compatibility problems with skins. My proposal is to:

  1. Leave everything as it is if DISPLAYTITLE argument normalizes to the current page title.
  2. Make behavior the same for DISPLAYTITLE arguments that normalize to the current page title after formatting has been stripped (maybe with the same algorithm that is used for TOCs?).
  3. Display a small irremovable subtitle (something like "The internal title of this page is {{PAGENAME}}") if neither of the two conditions above is met.
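
The three-step proposal above can be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: normalize_title and strip_formatting are hypothetical, heavily simplified stand-ins for real title normalization and for whatever stripping algorithm would actually be used, and the returned labels are made up for this example.

```python
import re


def normalize_title(text):
    """Simplified stand-in for title normalization (assumption, not MediaWiki's rules):
    underscores become spaces, whitespace is collapsed, first letter is uppercased."""
    text = text.replace("_", " ")
    text = re.sub(r"\s+", " ", text).strip()
    return text[:1].upper() + text[1:]


def strip_formatting(text):
    """Illustrative only: drop wikitext emphasis quotes and anything shaped like a tag."""
    text = re.sub(r"'{2,}", "", text)
    return re.sub(r"<[^<>]*>", "", text)


def classify_display_title(requested, page_title):
    """Return which of the three proposed cases applies (labels are hypothetical)."""
    target = normalize_title(page_title)
    if normalize_title(requested) == target:
        return "accept"                # case 1: behaves as today
    if normalize_title(strip_formatting(requested)) == target:
        return "accept-formatted"      # case 2: same title once formatting is stripped
    return "accept-with-notice"        # case 3: show "internal title" subtitle


print(classify_display_title("main_Page", "Main Page"))
print(classify_display_title("''Venator''-class Star Destroyer",
                             "Venator-class Star Destroyer"))
print(classify_display_title("Signaling System #7", "Signaling System 7"))
```

The three calls exercise cases 1, 2 and 3 respectively.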

[1] http://ru.wikipedia.org/wiki/Паровоз_Су


Version: 1.12.x
Severity: enhancement

Details

Reference
bz12998

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:05 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz12998.

I'd agree... There is also another case to consider.

I've seen a few wikis that use the Title hack because they need part of the title italicized, such as http://starwars.wikia.com/wiki/Venator-class_Star_Destroyer, in which "Venator" is italicized in the title.

Of course, if someone goes and uses {{DISPLAYTITLE:''Venator''-class Star Destroyer}} inside of the page, it is just ignored.

However, after minimal parsing (only lightweight things like emphasis, with perhaps CSS classes and style tags as a later addition), the resulting title is actually still valid: while in markup you may see [[''Venator''-class Star Destroyer]], once parsed the text is actually [[Venator-class Star Destroyer]], which is still a valid link to the same article.
(Oh wait, that's your point 2.)

Though I don't know about irremovable. Stuck there strongly by default, yes. However, there are some private wikis that don't feel a need for that and can deal with it themselves.

There's actually another case to think about: redirects from other text forms which would actually be more valid, but have URL issues.

This is not the case on the English Wikipedia, where titles like "MÄR" are used for the actual title (though it might be for other things; I couldn't find info on it). However, the actual URL rendering of this is "M%C3%84R". Other wikis may well enforce use of the title "MAR" instead to avoid that issue. While such a wiki prefers "MAR" in the URL, it still wishes to have "MÄR" in the title. That is not something it can do, but it can at least redirect the "MÄR" article to "MAR".
An addition might be to allow the titles of pages that redirect to the page as display titles. So one could place a redirect to "MAR" at "MÄR", and then use {{DISPLAYTITLE:MÄR}} on the "MAR" article. As a result, the URL will show "MAR" and be readable in the address bar, while "MÄR" is displayed as the title of the article.
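
As a rough sketch of the redirect idea (Python, illustrative only): the redirects mapping below is hypothetical sample data standing in for a real redirect lookup, and title normalization is left out entirely to keep the example short.

```python
def is_allowed_display_title(requested, page_title, redirects):
    """Allow the page's own title, or the title of any page redirecting to it.

    `redirects` maps a redirect page's title to the title it points at
    (hypothetical data; a real wiki would query its redirect table).
    """
    if requested == page_title:
        return True
    return redirects.get(requested) == page_title


redirects = {"MÄR": "MAR"}  # a redirect placed at "MÄR" pointing to "MAR"
print(is_allowed_display_title("MÄR", "MAR", redirects))
print(is_allowed_display_title("Something else", "MAR", redirects))
```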

*** Bug 13545 has been marked as a duplicate of this bug. ***

I thought I had already noted it, but in the title rewrite I am working on (the one using the real title field to store the non-normalized title alongside the normalized one, so we can save [[_main_Page]] and have it display that way even though the title is the same as [[Main Page]]) I also intend to do some work on DISPLAYTITLE.

One of the two primary focuses of the title rewrite is improving the extensibility of the title system. The extent to which titles themselves will be extended will actually remove the need to use DISPLAYTITLE for the reasons it currently exists. However, because it's currently in use, I didn't want to break compatibility by removing it. So I intend to change it.

I am going to change DISPLAYTITLE from a big parser hack into a separate system for manipulating the title displayed in the page. (Do note, though, that changing that title will no longer change the title in the address bar; the title in the address bar will be the real, non-normalized title.)

This will have a few effects.
Extensions will now be able to alter the display title. So you could actually create an extension to change Food/Fruit/Apple/Spartan into Food > Fruit > Apple > Spartan, where everything except Spartan is a link to the corresponding page, which could turn a subpage structure into a directory-type structure. Or have an extension show Template:Foo as {{Foo}} in the title bar instead.
Additionally, while the DISPLAYTITLE magic word will default to working almost exactly as it does now (sans displaying in the browser title, though), what it does will become extensible. So yes, if you decide you want some markup to become valid in the title, then by using an extension to the display title system you can modify DISPLAYTITLE's behavior and make that markup valid and displayed inside the title header.

In r39552 I have added the option $wgRestrictDisplayTitle, which lets you disable all restrictions on the display title. This is not primarily intended for use on MediaWiki sites, but caters to popular demand on small private wikis. It just seemed silly to insist on the hard-coded restrictions.

So, let's see if it gets reverted or stays :)

rememberthedot wrote:

Proposed patch v1

Here is a preliminary patch that should help resolve the problem. It uses
Sanitizer::removeHTMLtags, so it allows tags allowed in wikitext (like <sup>
and <sub>) but not tags not allowed in wikitext (like <script>). This is very
similar to what the English Wikipedia's JavaScript implementation already does
(see [[MediaWiki:Common.js]]). I tested this patch on all skins and it appears
to work OK.

Unlike the previous patch, this patch differentiates between the HTML title
(what will go into <h1>) and the plain text title (what will go into <title>).
This avoids problems with tags finding their way into <title> when <title> is
not supposed to have any tags inside of it.

One of the limitations of this patch is that it doesn't process templates. It'd
be nice if we could say {{DISPLAYTITLE:{{Unicode|unusual characters}}}},
including a template designed to improve browser compatibility with unusual
characters. But this is a minor concern since I believe all the compatibility
templates like this can be expressed as <span class="Unicode"> instead.

And of course, if nobody finds any major bugs with this patch, we could just
implement it for now and worry about tweaking the code to be more permissive
later.

attachment Proposed patch v1 ignored as obsolete

rememberthedot wrote:

*** Bug 14226 has been marked as a duplicate of this bug. ***

I'm a little worried about this:

+ $titleText = trim(DOMDocument::loadXML('<title>' . $titleHTML . '</title>')->textContent);

a) how does it perform and

b) will it trigger a fatal error if there's an imbalanced tag that the sanitizer misses?

Further, is it really necessary? How does it compare to Sanitizer::stripAllTags() ?
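
To illustrate the worry in (b), here is a small Python analogy, not the PHP from the patch. It compares a strict XML parse with a plain tag strip. The function names are hypothetical, the regex is only a crude stand-in for a tag stripper (it does not model what Sanitizer::stripAllTags really does), and how PHP's DOMDocument reacts to malformed input is not shown here.

```python
import re
import xml.etree.ElementTree as ET


def text_via_xml(title_html):
    """Strict approach: parse as XML and collect the text; raises on malformed input."""
    root = ET.fromstring("<title>" + title_html + "</title>")
    return "".join(root.itertext()).strip()


def text_via_strip(title_html):
    """Lenient approach: drop anything shaped like a tag, never raises."""
    return re.sub(r"<[^<>]*>", "", title_html).strip()


balanced = "<i>Main Page</i>"
unbalanced = "<i>Main Page"  # the closing </i> is missing

print(text_via_xml(balanced), "|", text_via_strip(balanced))
try:
    text_via_xml(unbalanced)
except ET.ParseError as err:
    print("strict parse failed:", err)
print(text_via_strip(unbalanced))
```

With balanced input both approaches give the same text; with the unbalanced tag only the strict parser fails, which is the case a sanitizer miss would expose.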

I would also recommend adding some kind of test suite containing examples of titles that should and shouldn't make it through the checks, and what the result is.

Don't we already have a sanitizer function for removing dom tags?

rememberthedot wrote:

Proposed patch v2

Revised patch to use Sanitizer::stripAllTags, which works great! Now, my goal in coding this patch was to completely eliminate the need for JavaScript hacks to set the values of the <title> and <h1> elements. The <h1> element _must_ be copy-pasteable (in other words, when you copy the <h1> text you should be able to make a link to the article just by pasting what you have). However, we permit non-normalizing titles in the <title> element because the user can't easily select the contents of <title> to copy it.

We have some pretty weird titles on the English Wikipedia. It wouldn't be unreasonable to have an article about something that had both a special character and a superscript, say "Abc#d<sup>e</sup>f". In order to be as accurate as possible, the DISPLAYTITLE magic word needs to be able to put "Abc#def" in <title> and "Abcd<sup>e</sup>f" in <h1>. So, I've updated the DISPLAYTITLE word to take two parameters. The syntax is now {{DISPLAYTITLE:requestedDisplayTitleH1|requestedDisplayTitleTitle}}. If requestedDisplayTitleTitle is not specified then a stripped version of requestedDisplayTitleH1 is used in <title> instead.

I ran into problems trying to make an automated test suite for this; however, I did test it. Here are the major tests that I did manually on Main Page:

{{DISPLAYTITLE:<span style="text-decoration:underline">Main Page</span>}}
<title>Main Page - {{SITENAME}}</title>
<h1><span style="text-decoration:underline">Main Page</span> - {{SITENAME}}</h1>

{{DISPLAYTITLE:<i>Main Page}}
<title>Main Page - {{SITENAME}}</title>
<h1><i>Main Page</i> - {{SITENAME}}</h1>

{{DISPLAYTITLE:Main#Page}}
<title>Main#Page - {{SITENAME}}</title>
<h1>Main Page - {{SITENAME}}</h1>

{{DISPLAYTITLE:<script>Main Page</script>}}
<title>&lt;script&gt;Main Page&lt;/script&gt; - {{SITENAME}}</title>
<h1>Main Page - {{SITENAME}}</h1>

{{DISPLAYTITLE:Main_P<sup>age</sup>|#Main_Page}}
<title>#Main_Page - {{SITENAME}}</title>
<h1>Main_P<sup>age</sup></h1>

{{DISPLAYTITLE:Main_P<sup>age</sup>|#<script>Main_Page</script>}}
<title>#&lt;script&gt;Main_Page&lt;/script&gt; - {{SITENAME}}</title>
<h1>Main_P<sup>age</sup></h1>

So, what do you think? What else needs to be done before this can go into MediaWiki?

attachment Proposed patch v2 ignored as obsolete

OK, some quick notes:

Renaming 'setDisplayTitle' breaks any extensions that may use it. I'd keep it as it was.

Not sure I like the naming of '$requestedDisplayTitleH1'. Why does it have 'request'?

'MediaWiki escapes this automatically before it is seved out' should be *served

rememberthedot wrote:

Proposed patch v3

I've made the requested changes, what do you think?

attachment Proposed patch v3 ignored as obsolete

rememberthedot wrote:

Wow, that was fast, thank you!

Behavior seems a bit hard to predict, as far as what's going to go in the header and what in the browser window etc. Pulling it back for further testing and discussion.

Reverted in r44432

rememberthedot wrote:

Could you be any more specific about what's wrong? I can't fix it if I don't know what the problem is.

Basic problem on two minutes of testing was that it seemed difficult to tell what was going to happen. Sometimes I'd see some pretty formatting in the <h1>, but I'd see a bunch of ugly markup in the <title>, or vice-versa.

This seems consistent with the test cases listed in comment #9, but I'd have to say those are pretty undesirable... If I see "Main Page" in the <h1> I should see "Main Page" in the <title> too.

rememberthedot wrote:

I think I see what you're saying. Here are some actual article titles that we have to accommodate and what the <title> and <h1> ought to read for each:

MacLife
<title>Mac|Life</title>
<h1>MacLife</h1>

HardOCP
<title>[H]ard|OCP</title>
<h1>HardOCP</h1>

Dweeb (band)
<title>[dweeb] (band)</title>
<h1>dweeb (band)</h1>

E=MC2 (song)
<title>E=MC² (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>

Signaling System 7
<title>Signaling System #7</title>
<h1>Signaling System 7</h1>

So you see, we cannot guarantee a 1:1 relationship between the <h1> and the <title>. Nevertheless, I still don't know exactly what problems you ran into - could you please post the specific calls to DISPLAYTITLE that produced undesirable behavior for you?

All of the above examples are examples of problems. There's no clear reason for them to be different in the described ways. Certainly it makes no sense for the <title> portion, which must be plaintext, to have additional markup or characters that are not in the <h1>, which is much less restrictive by being HTML.

rememberthedot wrote:

Thanks for your reply. <h1> is copy-pasteable, <title> is not. The user expects to be able to copy the contents of <h1> and paste them to make a link to the article, whereas the user cannot select the contents of the title bar to copy it. This is why the <h1> is more restrictive than the <title>. Does that at least make sense?

IMHO if it's not in the <h1> it shouldn't be in the <title>.

The mixed approach has some problems; since the <title> contents are less obviously visible, most of the time nobody will notice the "fancier" characters in the title bar. At a minimum, I think people will be much more interested in making the <h1> look nice, and don't have as much interest in the <title>.

At worst it means that a bogus attempt to use DISPLAYTITLE will not have any effect on the very visible <h1> while dumping broken, ugly markup into the <title>, which may be forgotten and left around.

I'd prefer to avoid "multiplying elements" per Occam's razor... :)

rememberthedot wrote:

Proposed patch v4

Dropped support for freeform <title>s. I see your point, the inconsistency could very well confuse the end user. A few test cases for the new patch:

Dweeb (band)
{{DISPLAYTITLE:[dweeb] (band)}}
<title>Dweeb (band)</title>
<h1>Dweeb (band)</h1>

Dweeb (band)
{{DISPLAYTITLE:dweeb (band)}}
<title>dweeb (band)</title>
<h1>dweeb (band)</h1>

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)|E=MC² (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC2 (song)</h1>

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>

Signaling System 7
{{DISPLAYTITLE:Signaling System #7}}
<title>Signaling System 7</title>
<h1>Signaling System 7</h1>

attachment Proposed patch v4 ignored as obsolete

rememberthedot wrote:

Ah sorry, the third test case should have been:

E=MC2 (song)
{{DISPLAYTITLE:E=MC<sup>2</sup> (song)|E=MC² (song)}}
<title>E=MC2 (song)</title>
<h1>E=MC<sup>2</sup> (song)</h1>

*** Bug 496 has been marked as a duplicate of this bug. ***

Done in r45181, assuming no other conceptual issues. Looks fine.

rememberthedot wrote:

Thank you! Just reply back here if there are any more problems.

(In reply to comment #25)

> Thank you! Just reply back here if there are any more problems.

See http://www.mediawiki.org/wiki/Special:Code/MediaWiki/45181#c1046

rememberthedot wrote:

Proposed patch v5

Much cleaner patch that makes OutputPage::setPageTitle escape bad tags like <script> but leave good ones like <i> in <h1>. setPageTitle then strips out all remaining tags and places the result into <title>. All CoreParserFunctions::displaytitle has to do is make sure that what will wind up in <title> is consistent with the actual page title.
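
The flow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patch itself: ALLOWED_TAGS is a tiny made-up whitelist, attributes are passed through unfiltered (a real sanitizer would check them), and normalize_title is a hypothetical stand-in for real title normalization.

```python
import html
import re

ALLOWED_TAGS = {"i", "b", "sup", "sub", "span"}  # assumption: illustrative whitelist only
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def heading_html(requested):
    """Text for <h1>: keep whitelisted tags, escape every other tag-like sequence."""
    def keep_or_escape(match):
        if match.group(2).lower() in ALLOWED_TAGS:
            return match.group(0)
        return html.escape(match.group(0))
    return TAG_RE.sub(keep_or_escape, requested)


def title_text(heading):
    """Text for <title>: strip the remaining tags, then decode entities to plain text."""
    return html.unescape(re.sub(r"<[^<>]*>", "", heading))


def normalize_title(text):
    """Hypothetical, simplified normalization: underscores to spaces, first letter uppercased."""
    text = text.replace("_", " ").strip()
    return text[:1].upper() + text[1:]


def is_consistent(requested, page_title):
    """Accept the display title only if its plain-text form matches the page title."""
    plain = title_text(heading_html(requested))
    return normalize_title(plain) == normalize_title(page_title)


print(heading_html("E=MC<sup>2</sup> (song)"), "->", title_text("E=MC<sup>2</sup> (song)"))
print(is_consistent("E=MC<sup>2</sup> (song)", "E=MC2 (song)"))
print(is_consistent("<script>Main Page</script>", "Main Page"))
```

In this sketch an escaped <script> tag stays in the plain text as literal characters, so the consistency check rejects it instead of silently dropping it.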

I removed the highly unsafe $wgRestrictDisplayTitle "feature", since with loosened displaytitle restrictions this is unnecessary. As an added bonus, I also did some stylistic cleanup in GlobalFunctions.php.

Assigning to myself for review.

rememberthedot wrote:

Patch committed in r49330.

Wiki.Melancholie wrote:

Just a minor note:
It is possible to vandalize a title, using <span style="display: none;">...</span>
Test case (if page is "Shipment"):
{{DISPLAYTITLE:shi<span style="display:none">pmen</span>t}} makes *shit* out of "shipment" e.g. ;-)

Manipulation of the title's main text characters should either be forbidden entirely or allowed completely.
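
A small Python sketch of why a tag-stripping check does not catch this: the stripped text still equals the page title, while the text a reader actually sees does not. Illustrative only. VisibleText is a hypothetical helper that only looks at inline display:none styles and assumes every opened element is closed (void elements such as <br> are not handled), and it ignores CSS classes and stylesheets.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text, skipping anything nested inside an element styled display:none."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one flag per open element: True if it hides its content
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        self.stack.append(re.search(r"display\s*:\s*none", style, re.I) is not None)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if not any(self.stack):
            self.parts.append(data)


def visible_text(markup):
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


def stripped_text(markup):
    """What a plain tag strip would compare against the page title."""
    return re.sub(r"<[^<>]*>", "", markup)


requested = 'shi<span style="display:none">pmen</span>t'
print(stripped_text(requested))  # text used for the consistency check
print(visible_text(requested))   # text the reader actually sees
```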