
Proper collation support in categories for Ukrainian wikis
Closed, ResolvedPublic

Description

This problem is reproducible on the Ukrainian Wikipedia.
Some Ukrainian page names do not appear in the correct order when listed in any category. The problem concerns the letters "Єє", "Іі", "Ґґ" and probably some others. Right now pages whose names start with "Є" and "І" appear before the rest of the alphabet, and those that start with "Ґґ" appear after the rest of the alphabet. The correct order should be the following:
А а
Б б
В в
Г г
Ґ ґ
Д д
Е е
Є є
Ж ж
З з
И и
І і
Ї ї
Й й
К к
Л л
М м
Н н
О о
П п
Р р
С с
Т т
У у
Ф ф
Х х
Ц ц
Ч ч
Ш ш
Щ щ
Ь ь
Ю ю
Я я


Version: unspecified
Severity: enhancement

Details

Reference
bz41040

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:49 AM
bzimport set Reference to bz41040.

We currently have uk wiki sorting based on codepoint order.
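
To illustrate why codepoint order produces the symptoms in the description, here is a minimal Python sketch. The page titles are made-up examples, and Python's built-in `sorted` (which compares strings by Unicode code point) is only assumed to approximate what the wiki currently does.

```python
# Hypothetical page titles; chosen only to cover the problematic letters.
titles = ["Яблуко", "Єнот", "Ґанок", "Індик", "Абетка", "Дім"]

# sorted() on str values compares Unicode code points, not alphabet positions.
by_codepoint = sorted(titles)

# Print the code point of each first letter next to the title.
for title in by_codepoint:
    print(f"U+{ord(title[0]):04X} {title}")
```

Є (U+0404) and І (U+0406) sit below А (U+0410) in Unicode, while Ґ (U+0490) sits above Я (U+042F), so the first two sort before the rest of the alphabet and Ґ sorts after it.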

There is some support in MediaWiki for sorting based on language order. It's still a little experimental, and currently only active on the Portuguese Wikipedia. (I'm also unsure whether the uca-default collation would be sufficient for Ukrainian or whether a custom collation would be needed.)

[There is some information about this feature at http://www.mediawiki.org/wiki/Manual:$wgCategoryCollation ]

Could you test whether uca-default works for Ukrainian?

Hmm, seems wrong, but in different ways.

Have a look at the page https://pt.wikipedia.org/wiki/Categoria:Alfabeto_cir%C3%ADlico which should have most of those letters and sorted using the uca code

Actually it works perfectly. Additionally I created pages with all letters of Ukrainian alphabet in my user space and put it into some temporary category: https://pt.wikipedia.org/wiki/Categoria:Temp_category

When would it be possible to change the collation of Ukrainian Wikipedia to uca-default? Should we wait for the next mediawiki deployment cycle or is it possible to do sooner?

(In reply to comment #4)

Actually it works perfectly. Additionally I created pages with all letters of
Ukrainian alphabet in my user space and put it into some temporary category:
https://pt.wikipedia.org/wiki/Categoria:Temp_category

My apologies. I didn't realize that there were {{DEFAULTSORT}}'s on the category I was looking at which messed up the order.

Looking at some collation charts, it seems like the main (only?) difference between Ukrainian and the default UCA collation is the treatment of Ґ/ґ. I think (from what I'm reading; you would know better than I, though) that Ґ should be considered a "different" letter from Г (to be technical, they should have different primary weights). In uca-default that is not the case, which means Ґ doesn't get its own header in the category (and might sort more like a case difference rather than a letter difference). I've added some examples with Ґ/ґ to your test category. Please check whether the result is what you expect and find acceptable.

(I'm not sure how serious that is).

When would it be possible to change the collation of Ukrainian Wikipedia to
uca-default? Should we wait for the next mediawiki deployment cycle or is it
possible to do sooner?

Such config changes are generally unrelated to MediaWiki deployment cycles, so they can be done at any time.

Ґ is a completely separate letter which goes after Г and before Д in the Ukrainian alphabet (see http://en.wikipedia.org/wiki/Ukrainian_alphabet#Alphabet)
Currently not only sort order of "Ґґ" is a problem but also "Єє", "Іі" and "Її" are not sorted correctly in Ukrainian Wikipedia.

uca-default suits perfectly as far as I can see from the Portuguese Wikipedia, so using this collation for the Ukrainian Wikipedia should solve the problem there too.

Sorry, I haven't got what you said in the first place. Now I see what you meant. Still it is better to have at least correct sort order even without separate category header.

(In reply to comment #7)

Sorry, I haven't got what you said in the first place. Now I see what you
meant. Still it is better to have at least correct sort order even without
separate category header.

Note it's not just the category header that's off. The actual sorting for that letter will also be off in words with multiple letters. See the examples I added to your test category.

(In reply to comment #8)

(In reply to comment #7)

Sorry, I haven't got what you said in the first place. Now I see what you
meant. Still it is better to have at least correct sort order even without
separate category header.

Note it's not just the category header that's off. The actual sorting for
that letter will also be off in words with multiple letters. See the examples I
added to your test category.

Sorry, earlier I was typing on my phone and couldn't type non-english letters. To be more explicit, if you have the following pages: Г, Ґ, ГА, ҐА, ГЦ, ҐЦ

The expected order is (I believe, correct me if I'm wrong): Г, ГА, ГЦ, Ґ, ҐА, ҐЦ

But uca-default orders them as: Г, Ґ, ГА, ҐА, ГЦ, ҐЦ
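
The difference between the two orderings can be sketched in Python. This is a toy model of the behaviour described in this thread, not MediaWiki's or ICU's actual implementation: the names `tailored_key` and `default_like_key` are hypothetical, and the "secondary" level here is a simplified stand-in for UCA secondary weights.

```python
# Ukrainian alphabet (uppercase) in the order given in the task description.
UK_ALPHABET = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
RANK = {ch: i for i, ch in enumerate(UK_ALPHABET)}

def tailored_key(title):
    """Every letter, including Ґ, gets its own primary weight."""
    return [RANK[ch] for ch in title]

def default_like_key(title):
    """Toy model: Ґ shares Г's primary weight and differs only at a
    secondary level, which matters only when all primaries tie."""
    primary = [RANK["Г"] if ch == "Ґ" else RANK[ch] for ch in title]
    secondary = [1 if ch == "Ґ" else 0 for ch in title]
    return (primary, secondary)

pages = ["Г", "Ґ", "ГА", "ҐА", "ГЦ", "ҐЦ"]
tailored_order = sorted(pages, key=tailored_key)
default_like_order = sorted(pages, key=default_like_key)
print(tailored_order)
print(default_like_order)
```

With distinct primary weights, all Г titles come before all Ґ titles. When Ґ differs from Г only at the secondary level, the second letter is compared first, so Г and Ґ titles interleave as in the uca-default result above.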

You are right, sorry for my incompetent comments. I personally believe that while uca-default is not really a solution, it is still a better option than the current collation.

Is there any chance of solving this problem in general (I guess it is not only a problem for the Ukrainian Wikipedia) by adding more collations? This problem has been in MediaWiki for many years, and I guess it could be time now to finally solve it.

Change I838484b9 should fix it.

Merged - marking as fixed.

This ability is now available in the software. I created bug 45444 to discuss and implement deploying it on the Ukrainian Wikipedia.