Malayalam language characters don't work well with mediawiki
Closed, Resolved (Public)

Description

I asked the local bureaucrat to carry out the following task (three username renames, amounting to a swap):

  1. ഉപയോക്താവ്:കമ്പ്യൂട്ടര് -> ഉപയോക്താവ്:കമ്പ്യൂട്ടര്‍ temp
  2. ഉപയോക്താവ്:WOPR -> ഉപയോക്താവ്:കമ്പ്യൂട്ടര്‍
  3. ഉപയോക്താവ്:കമ്പ്യൂട്ടര്‍ temp -> ഉപയോക്താവ്:WOPR

Step 1 fails with the error: The username "കമ്പ്യൂട്ടര്‍ temp" is invalid

Related local wiki discussion:

http://ml.wikipedia.org/wiki/%E0%B4%89%E0%B4%AA%E0%B4%AF%E0%B5%8B%E0%B4%95%E0%B5%8D%E0%B4%A4%E0%B4%BE%E0%B4%B5%E0%B4%BF%E0%B4%A8%E0%B5%8D%E0%B4%B1%E0%B5%86_%E0%B4%B8%E0%B4%82%E0%B4%B5%E0%B4%BE%E0%B4%A6%E0%B4%82:Vssun#My_bot


Version: unspecified
Severity: normal

Details

Reference
bz11162

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:48 PM
bzimport set Reference to bz11162.
bzimport added a subscriber: Unknown Object (MLST).

sadik.khalid wrote:

chillu (Malayalam)

Malayalam language characters don't work well with mediawiki (chillu problem)

Attached:

chillu.png (221×637 px, 18 KB)

sadik.khalid wrote:

Look at the attachment above: the last character causes the problem.

The username in comment #3 has a U+200D Zero-Width Joiner control character at the end. This is in a blacklisted control character range and is not currently allowed in usernames. It also looks incorrect: it comes at the end of a name, which would not really be a valid place for one even if it were allowed.

Remove that last character and it should be accepted (confirmed on a local installation).
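
For anyone who wants to check a name for this invisible character, here is a minimal Python sketch. It is not MediaWiki code; the helper names `describe`, `zwj_positions` and `strip_trailing_zwj` are made up for illustration, and the sample string (RA + VIRAMA + ZWJ) is a short stand-in for the ending of the username discussed above.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, the invisible character in question


def describe(name):
    """Return (code point, Unicode name) pairs for every character in name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in name]


def zwj_positions(name):
    """Return the indexes at which a zero-width joiner occurs."""
    return [i for i, ch in enumerate(name) if ch == ZWJ]


def strip_trailing_zwj(name):
    """Drop any zero-width joiners at the very end of the name."""
    return name.rstrip(ZWJ)


# Illustrative sample only: MALAYALAM LETTER RA + SIGN VIRAMA + ZWJ
sample = "\u0d30\u0d4d\u200d"

for code_point, char_name in describe(sample):
    print(code_point, char_name)
print(zwj_positions(sample))
print(len(sample), len(strip_trailing_zwj(sample)))
```

Printing the code points makes the joiner visible even though it does not render, which helps when comparing two names that look identical on screen.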

jacob.jose wrote:

Could you elaborate on why the "U+200D Zero-Width Joiner control character at the end" is blacklisted? Many Malayalam words need it at the end to carry their meaning.

I can't see how we can say the MediaWiki software is fully Unicode compliant without this support, so I would very much like to hear about the technical concerns.

Well, the fact that it's invisible and hard to cut-n-paste makes it a bit tricky to manage. :P

We've generally forbidden most magic invisible chars from usernames for security (spoofing etc) purposes.

Right, and that's good practice for all usernames that are not in Malayalam. In the case of Malayalam, it is a different story.

Perhaps a solution is to let bureaucrats rename users to names containing these 'invisible' characters while still preventing users from creating accounts with them. That way vandals won't be able to abuse this, and good users would benefit from it.

I believe the validity check (whether a username is valid or not) used by new account creation and by username rename is performed by the same block of code.
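
To make the proposal concrete, here is a rough Python sketch of how one shared check could serve both paths with different strictness. This is purely hypothetical and not how MediaWiki is implemented: the function names, the `allow_joiners` flag, and the use of Unicode categories Cc/Cf as a stand-in for the real blacklist are all assumptions.

```python
import unicodedata

# Assumption: only these two joiners would be exempted for renames.
JOINERS = {"\u200c", "\u200d"}  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER


def is_valid_username(name, allow_joiners=False):
    """Hypothetical shared validity check used by both creation and rename."""
    if not name or name != name.strip():
        return False
    for ch in name:
        if allow_joiners and ch in JOINERS:
            continue  # exempted only when the caller asks for it
        # Stand-in blacklist: control (Cc) and invisible format (Cf) characters
        if unicodedata.category(ch) in ("Cc", "Cf"):
            return False
    return True


def can_create_account(name):
    """Self-service account creation: joiners stay forbidden."""
    return is_valid_username(name, allow_joiners=False)


def can_rename_to(name):
    """Bureaucrat rename: joiners are tolerated."""
    return is_valid_username(name, allow_joiners=True)


# Illustrative name: RA + VIRAMA + ZWJ followed by " temp"
candidate = "\u0d30\u0d4d\u200d temp"
print(can_create_account(candidate), can_rename_to(candidate))
```

The point of the sketch is only that a single code path does not rule out different rules for the two operations; whether relaxing the rule for renames is safe against spoofing is a separate question.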

jacob.jose wrote:

Please hold off on making any fixes. This seems to be part of a much bigger issue with Malayalam Unicode. I am withdrawing my vote for now.