Request Sorting Thai Wikipedia (and sister projects) with UCA
Closed, ResolvedPublic

Description

Pages in categories are currently sorted incorrectly:
they are grouped only by their very first character.
Thai, however, has leading vowels (เ แ โ ใ ไ) that are written before the initial consonant.
A word such as "เดียว" should therefore be listed under "ด", not under "เ".

Additionally, tone marks are treated as primary sort keys,
which breaks the ordering.
For example, "ด้วย" is sorted after "ดิน", which is incorrect.

Some time ago we adopted a workaround for this problem using DEFAULTSORT.
However, it has to be added by hand to every page we create,
which is a burden on our users.

Since MediaWiki 1.17 there has been a proper way to sort these pages:
$wgCategoryCollation was introduced to solve this problem.

UCA (the Unicode Collation Algorithm) collates Thai characters well (and other scripts too):
the leading vowels are sorted after their consonants,
and the tone marks are weighted at the secondary level.
http://www.unicode.org/charts/uca/
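
To illustrate the two rules above, here is a minimal Python sketch. It is not the UCA algorithm and not what MediaWiki runs; it only approximates the behaviour for the example words in this request by moving each leading vowel behind the consonant that follows it and pushing tone marks into a secondary key. The names `thai_sort_key`, `LEADING_VOWELS` and `TONE_MARKS` are hypothetical, and comparing the remaining characters by code point is an assumption that happens to work for these words.

```python
# Simplified illustration only; this is NOT the Unicode Collation Algorithm.
LEADING_VOWELS = set("เแโใไ")                   # U+0E40..U+0E44
TONE_MARKS = set("\u0E48\u0E49\u0E4A\u0E4B")    # the four Thai tone marks


def thai_sort_key(word):
    """Return a (primary, secondary) key approximating Thai dictionary order."""
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            # Move the leading vowel behind the character that follows it.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    # Tone marks are left out of the primary key and act only as a tie-breaker.
    primary = "".join(c for c in chars if c not in TONE_MARKS)
    secondary = "".join(c for c in chars if c in TONE_MARKS)
    return (primary, secondary)


words = ["เดียว", "ดิน", "ด้วย"]

# Plain code point order: the tone mark pushes "ด้วย" behind "ดิน".
print(sorted(words))                     # ['ดิน', 'ด้วย', 'เดียว']

# Sketch key: tone mark is secondary, leading vowel follows its consonant.
print(sorted(words, key=thai_sort_key))  # ['ด้วย', 'ดิน', 'เดียว']

# The first character of the primary key is the letter the page is grouped under.
print(thai_sort_key("เดียว")[0][0])       # ด
```

The real collation assigns weights from the UCA tables instead of comparing code points, so this sketch should only be read as a picture of the expected result.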

I would like to request that this setting be applied to Thai Wikipedia (and the other Thai sister projects):

$wgCategoryCollation = 'uca-default';

This will solve a long-standing issue for us.
(Please don't forget to run the update script.)

I have tested this on my localhost and it works well.
"ด้วย" → "ดิน" → "เดียว" are sorted correctly under "ด" without DEFAULTSORT.


Version: wmf-deployment
Severity: enhancement

Details

Reference
bz48097

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:39 AM
bzimport set Reference to bz48097.
bzimport added a subscriber: Unknown Object (MLST).

Sorting test on my localhost (please ignore the category name)

Attached:

category-sorting.PNG (237×281 px, 4 KB)

prior incorrect sorting result

Attached:

category-sorting-wrong.PNG (233×259 px, 3 KB)

Related URL: https://gerrit.wikimedia.org/r/63661 (Gerrit Change I850aac8ca4da89b5e3c27178b26c3d98da02b235)