Allow saving pages with LRM and RLM in titles, showing a warning and requiring a user right
Open, Medium, Public

Description

Bug 3696 asked for RLM and LRM to be restricted; it was resolved and now RLM and LRM are not just restricted, but completely forbidden in article titles.

However, RLM and LRM are still useful when used correctly. For example, using them would make it possible to solve Bug 28411. Creating pages with LRM and RLM in titles and moving pages to such titles can be *restricted* to accounts with a certain user right, and a warning can be displayed to anyone who does it.

Someone who knows what LRM and RLM are and who has been given the right to save articles with them probably won't do a lot of damage.
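
The proposal above could be sketched roughly as follows. This is only an illustration in Python (MediaWiki itself is PHP); the right name `bidi-marks-in-title` and the function `check_title` are hypothetical and do not exist in MediaWiki.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Hypothetical right name, used only for this sketch.
REQUIRED_RIGHT = "bidi-marks-in-title"


def check_title(title, user_rights):
    """Return (allowed, warning) for creating or moving a page to `title`."""
    if LRM not in title and RLM not in title:
        # Ordinary title: nothing to restrict, nothing to warn about.
        return True, None
    if REQUIRED_RIGHT not in user_rights:
        return False, "Titles containing LRM or RLM require the '%s' right." % REQUIRED_RIGHT
    # The user holds the right: allow the save, but still show a warning.
    return True, "This title contains an invisible LRM or RLM character."
```

A caller would refuse the save when `allowed` is false and otherwise display `warning`, if there is one, before confirming.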


Version: unspecified
Severity: normal

Details

Reference
bz28428

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:33 PM
bzimport set Reference to bz28428.
bzimport added a subscriber: Unknown Object (MLST).

How are other users going to be able to type titles like that in the url or various fields we have?

What fields, for example? I think that redirects without the special characters can cover it all.

Since I opened this bug, however, I have learned about the upcoming HTML5 features, which can fix *most* of these problems without having to use the LRM character. Titles like http://he.wikipedia.org/wiki/%28You_Drive_Me%29_Crazy can be fixed with these features, without the need to add LRM (this title appears correctly in the article thanks to a template, but it appears incorrectly in the category listing). I can imagine other edge cases, but I've never actually seen titles that would need it.

So, this bug does need to be fixed, but the upcoming HTML5 bidi model makes it less urgent.
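
As a rough illustration of how that could look in a category listing: the comment does not name the specific features, so this sketch assumes they include the HTML5 `<bdi>` element. The function `category_item` is hypothetical.

```python
from html import escape


def category_item(title, url):
    """Render one category-listing entry with the title bidi-isolated."""
    # <bdi> isolates the title from the surrounding text, so the leading
    # "(" is laid out by the title's own direction, not the page's.
    return '<li><a href="%s"><bdi>%s</bdi></a></li>' % (
        escape(url, quote=True),
        escape(title),
    )
```

No LRM is added to the title itself; the isolation is done entirely in the generated markup.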

Not LATER, because in some cases these characters are the only imaginable choice. I have never seen such cases, but they are conceivable. Since they are so rare, the priority can be low.

(In reply to comment #2)
Say, Special:UserRights or Special:Block, where you have to type the username. Or Special:Log, where you have to type the article name.

What about making links in automated lists (or even in just the linker::link method) use the displaytitle for the text of the link (if nothing else specified), but still link to the actual page name? Not sure if that'd actually be feasible or not.
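
A minimal sketch of that idea, assuming display titles are available as a plain mapping from page name to display text. The function `make_link` and the `display_titles` dict are hypothetical and are not MediaWiki's actual linker API.

```python
from html import escape
from urllib.parse import quote


def make_link(page_name, display_titles, text=None):
    """Link to the real page name, showing its display title by default."""
    if text is None:
        # No explicit link text: fall back to the display title, and to
        # the page name itself if no display title is set.
        text = display_titles.get(page_name, page_name)
    href = "/wiki/" + quote(page_name.replace(" ", "_"))
    return '<a href="%s">%s</a>' % (href, escape(text))
```

The `href` is always built from the actual page name, so users never have to type the special characters; only the visible text can carry them.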

Oh. That is a bit of a problem. What Bawolff says is probably the right direction, but I'll have to think about it.

(As a side note, it's possible to type LRM and RLM on Hebrew keyboards, but very few people actually do it even though it's quite useful.)

I think Bawolff's suggestion needs a schema change. However, that would fix many other issues too, not just RTL-related ones.

*** Bug 30422 has been marked as a duplicate of this bug. ***

And now we get various RLM and LRM controls inserted everywhere in pages or in templates without the user seeing them.

They have polluted existing templates: LRM controls were inserted at the ends of various lines (or sometimes within the tested value of a {{#if:{{{1|}}}|...|...}}) when pages were saved, causing these templates to no longer work as expected.

If you allow LRM or RLM marks to be inserted, then the wiki code editor MUST make these controls visible by converting them to the character entities &amp;rlm; or &amp;lrm; (that is, the literal text "&rlm;" or "&lrm;").

For example, the parser function #language now returns an LRM at the end of the autonym for be-x-old, causing rendering bugs.

See for yourself at the end of {{#language:be-x-old}}.

This LRM was inserted in the database file that contains the list of language names, which was edited with the MediaWiki editor. Frequently these damned LRMs are inserted magically at the ends of lines or before closing punctuation like '}' or separators like '|'.

These controls are difficult to find. The user never entered them. We need to clean up code in various places on various wikis, including Meta and Wikipedia, and we find strange bugs where edited templates no longer work in places where NO edit was actually made by the user.
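
For the cleanup, a small scanner along these lines could help locate the hidden marks. It is a sketch in Python, not an existing tool; the function name `find_hidden_marks` is made up for illustration.

```python
import unicodedata

# The two marks discussed in this bug.
HIDDEN_MARKS = {"\u200e", "\u200f"}  # LRM, RLM


def find_hidden_marks(text):
    """Yield (line, column, name) for every LRM or RLM found in `text`."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in HIDDEN_MARKS:
                # Line and column are 1-based, as in most editors.
                yield line_no, col, unicodedata.name(ch)
```

Running it over a template's wikitext lists each mark with its position, so a mark sitting before a '|' or '}' can be found without a hex editor.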

(In reply to comment #10)

Huh? We never did this (allow RLM in titles). If parser functions are randomly returning RLMs, that's probably a separate bug. If RLMs are appearing in wikitext, that's probably somebody's browser or maybe VE (?).

The idea of making RLMs in wikitext auto-convert to entities is probably a good one, but it is only tangentially (at best) related to this bug. Please file a separate bug about that.

Did I say that there was an RLM in a title? No.

I said that since mid-October I have seen RLMs appearing randomly on various pages that were saved at that time, without any warning.

The result of {{#language:be-x-old}} is reproducible by you. It's definitely not anywhere in the user-editable wiki code; it is in the code implementing the parser function, which must have been edited with the same wiki code editor that currently inserts RLMs at random when saving pages, even if the user never typed them (they are not present on the keyboard).

They probably come from the internal code of some JavaScript supporting the code editors, but I did not investigate them for long.

Using named (or numbered) character entities in the code editor for all invisible format controls (ZWSP, WJ, Bidi controls) is certainly a good solution, even if the server later (or the JavaScript that validates the form before sending it to the server) stores them as single characters.

At least we'll see them in the wiki code, which is not the WYSIWYG form you'll see in the preview or in the generated HTML when viewing pages.
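
A rough sketch of that round trip, in Python for illustration only. The function names `show_controls` and `store_controls` are made up, and the table covers just LRM, RLM, ZWSP and WJ; other Bidi controls would be added the same way.

```python
# Invisible controls mapped to visible references. &lrm; and &rlm; are
# named HTML entities; the other two use numeric references here.
TO_ENTITY = {
    "\u200e": "&lrm;",
    "\u200f": "&rlm;",
    "\u200b": "&#x200B;",  # ZWSP
    "\u2060": "&#x2060;",  # WJ
}
FROM_ENTITY = {entity: ch for ch, entity in TO_ENTITY.items()}


def show_controls(wikitext):
    """Replace invisible controls with entities for display in the editor."""
    return "".join(TO_ENTITY.get(ch, ch) for ch in wikitext)


def store_controls(wikitext):
    """Inverse of show_controls: turn the entities back into characters."""
    for entity, ch in FROM_ENTITY.items():
        wikitext = wikitext.replace(entity, ch)
    return wikitext
```

Note that `store_controls` also converts entities the user typed deliberately; a real implementation would have to decide whether that is acceptable or whether the stored text should keep the entities.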

For the VisualEditor, it may require some work to choose how to render their presence in the editable preview.

(In reply to comment #12)

Did I say that there was an RLM in a title? No.

Which is why I said you were off topic and asked you to file a separate bug.

The title of this bug is "Allow saving pages with LRM and RLM in titles, showing a warning and requiring a user right". If your comment isn't about titles with RLMs in them, you are doing something wrong in commenting on this bug.

There are a lot of bugs in Bugzilla. If new bugs are filed as off-topic comments on existing bugs, they are significantly less likely to be fixed.

This was the only current bug I found that discusses allowing RLM/LRM. I demonstrate here that they can cause trouble not just in article titles, but anywhere: in the wiki code (of pages, but more seriously of templates), in the PHP code, or in a Lua module, if they are left invisible there (or inserted by some magic or bugs in the wiki editors using one of these unexpectedly altered PHP or Lua modules).

such titles can be *restricted* to accounts with a certain user right and a warning can be displayed to someone who is doing it.

Well, the restriction part can probably be done with a simple titleblacklist entry? Though I am not sure how expensive it would be.
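
On the cost question, the pattern itself would be about as cheap as a regex gets. The sketch below uses Python's `re` module purely to illustrate the match; it is not the blacklist's actual entry syntax, and `is_blacklisted` is a hypothetical name.

```python
import re

# A single character class with no backtracking, compiled once, so each
# title check is one linear scan of the title.
BIDI_MARKS = re.compile("[\u200e\u200f]")


def is_blacklisted(title):
    """True if the title contains an LRM or RLM anywhere."""
    return BIDI_MARKS.search(title) is not None
```

The overall cost would then depend mostly on how often titles are checked, not on this pattern.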