
Sorting keys for Russian Wikimedia projects
Closed, Resolved, Public

Description

Author: dubrow

Description:
Please add the following entries to the $wgCategoryCollation setting in InitialiseSettings.php:
'ruwiki' => 'uca-ru',
'ruwikibooks' => 'uca-ru',
'ruwikinews' => 'uca-ru',
'ruwikiquote' => 'uca-ru',
'ruwikisource' => 'uca-ru',
'ruwikivoyage' => 'uca-ru',
'ruwiktionary' => 'uca-ru',

This will fix the incorrect sorting of the letter "Ё". The change was proposed by ru.wiktionary.org administrator DonRumata in
http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%97%D0%9A%D0%A2%D0%90#.D0.A1.D0.BE.D1.80.D1.82.D0.B8.D1.80.D0.BE.D0.B2.D0.BA.D0.B0_.C2.AB.D0.81.C2.BB

then discussed and accepted on
http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%A4%D0%BE%D1%80%D1%83%D0%BC/%D0%A2%D0%B5%D1%85%D0%BD%D0%B8%D1%87%D0%B5%D1%81%D0%BA%D0%B8%D0%B9#.D0.A1.D0.BE.D1.80.D1.82.D0.B8.D1.80.D0.BE.D0.B2.D0.BA.D0.B0_.C2.AB.D0.81.C2.BB:_.D0.BA_.D0.BF.D0.BE.D1.85.D0.BE.D0.B4.D1.83_.D0.BD.D0.B0_.D0.B1.D0.B0.D0.B3.D0.B7.D0.B8.D0.BB.D0.BB.D1.83

A similar problem was resolved for Ukrainian wikis; see bug 45776.


Version: unspecified
Severity: enhancement

Details

Reference
bz52997

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:55 AM
bzimport set Reference to bz52997.
bzimport added a subscriber: Unknown Object (MLST).

Change 79770 had a related patch set uploaded by Andrey Kiselev:
(bug 52997) $wgCategoryCollation to 'uca-ru' on all Russian-language

https://gerrit.wikimedia.org/r/79770

Why hasn't the letter "Ё" been added to $tailoringFirstLetters for Russian, as was done for Belarusian? I think it should be done for Russian too.

This change should fix not only the sorting of pages within sections, but also the sorting of the section headings themselves. For example:

А

  • Аааа
  • Аббб

...

Е

  • Ееее
  • Еёёё

Ё

  • Ёааа
  • Ёббб
  • Ёеее
  • Ёёёё

I don't know whether the $tailoringFirstLetters variable needs to be changed for that.
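A minimal Python sketch of the ordering requested above, assuming Ё is ranked as its own letter immediately after Е and that section headings are simply the uppercased first letter. The alphabet table and the names `separate_letter_key` and `sections` are hypothetical illustrations; this is not MediaWiki's collation code or its $tailoringFirstLetters mechanism.

```python
# Hypothetical sketch: Ё as a separate letter placed right after Е.
# Not MediaWiki code; case is ignored when ordering.
from itertools import groupby

# Russian alphabet with Ё as its own letter between Е and Ж.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def separate_letter_key(word):
    # Map each character to its alphabet position; characters outside the
    # alphabet sort after all Russian letters, ordered by code point.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in word.lower()]


def sections(words):
    # Sort, then group by the uppercased first letter, so Ё gets its own heading.
    ordered = sorted(words, key=separate_letter_key)
    return [(letter, list(group))
            for letter, group in groupby(ordered, key=lambda w: w[0].upper())]


for letter, items in sections(["Ёббб", "Аааа", "Еёёё", "Ёааа",
                               "Ееее", "Аббб", "Ёёёё", "Ёеее"]):
    print(letter, items)
```

Run on the words from the example, this groups them under the headings А, Е and Ё in the order shown in the comment.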

The uca-ru collation does not appear to treat 'Ё' (U+0401 CYRILLIC CAPITAL LETTER IO) as a letter distinct from 'Е' (U+0415 CYRILLIC CAPITAL LETTER IE). Ё is considered an accented version of Е. This means it does not get its own section, and it compares as equal to Е unless the strings are otherwise identical (a tie).

For example, Еb, Еd, Ёd, Еk and Ёz would be sorted like this:

Е

  • Еb
  • Еd
  • Ёd
  • Еk
  • Ёz

Note how Е and Ё are treated as the same letter and sort under the Е heading, except in a tie such as 'Ёd' vs 'Еd', where 'Е' sorts first.
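The tie-breaking behaviour described above can be approximated with a two-level sort key. This is a simplified Python sketch, assuming only a primary level (letters compared case-insensitively, with Ё folded into Е) and a secondary level (Ё vs Е); it omits UCA's case level and is not the real uca-ru collation data. The names `fold`, `uca_like_key` and `heading` are hypothetical.

```python
# Hypothetical two-level approximation of the uca-ru behaviour described above.
# Not the actual UCA/CLDR tables; the tertiary (case) level is omitted.


def fold(ch):
    # Primary level: lowercase, with ё folded into е.
    ch = ch.lower()
    return "е" if ch == "ё" else ch


def uca_like_key(word):
    # Primary weights are compared first over the whole string.
    primary = [fold(ch) for ch in word]
    # Secondary weights only break ties: 1 where the letter was Ё/ё, else 0.
    secondary = [1 if ch in "Ёё" else 0 for ch in word]
    return (primary, secondary)


def heading(word):
    # The section heading comes from the primary-level first letter.
    return fold(word[0]).upper()


words = ["Ёz", "Еk", "Ёd", "Еd", "Еb"]
print(sorted(words, key=uca_like_key))
print({heading(w) for w in words})
```

Because the Ё/Е difference sits at the secondary level, it only decides the order of 'Еd' and 'Ёd', and every word lands under the Е heading.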

This appears to be different from what you are expecting. Is this still considered OK behaviour?

dubrow wrote:

Yes, that's what we need, and that was discussed. This behaviour is consistent with the academic rules for sorting words in dictionaries, unlike the current state, which is totally unacceptable.

(In reply to comment #5)

OK, just double-checking, because that's different from what I thought you were talking about in comment 3.

In this case, $tailoringFirstLetters does not need to be modified.

Change 79770 had a related patch set uploaded by TTO:
(bug 52997) $wgCategoryCollation to 'uca-ru' on all Russian-language

https://gerrit.wikimedia.org/r/79770

(In reply to comment #6)

No, 'Ё' isn't an accented version of 'Е'; it is a separate letter in Russian. Pages whose titles begin with a capital 'Ё' should get their own section in dictionaries, as I described in comment 3. The sorting algorithm should contain rules stating that 'Ё' > 'Е' and 'ё' > 'е'.

See [[w:Russian alphabet]] and [[Yo (Cyrillic)]].

dubrow wrote:

In ru-Wikipedia we discussed the "accented" version, and a couple of our users even voted against this proposal because they wanted "separate" sorting. I don't know why the version accepted by academic dictionaries does not satisfy ru-Wiktionary. Well, if it can be done for ru-Wikipedia only, please just do it and let the other projects choose their own method.

Can we make different rules for ru-Wikipedia and the other Russian-language wiki projects?

(In reply to comment #10)

Any project can choose whatever method it wants, provided there is agreement on the project in question and the people on the project know what the discussion is about.

(In reply to comment #11)

Rules are set per project, not per language. However, we can't easily make custom sorting orders that aren't part of the CLDR collations.
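Because the collation is chosen per project, each wiki can be given its own value regardless of its language. The following Python model is hypothetical: the dictionary loosely mirrors the per-wiki array shape of InitialiseSettings.php and assumes a fallback of 'uppercase' (MediaWiki's stock default for $wgCategoryCollation). It does not reproduce how Wikimedia's configuration loader actually resolves settings, and `collation_for` and "examplewiki" are illustrative names only.

```python
# Hypothetical model of a per-wiki setting with a default fallback.
# Keys are project database names, not language codes.
CATEGORY_COLLATION = {
    "default": "uppercase",   # assumed fallback for wikis not listed
    "ruwiki": "uca-ru",
    "ruwiktionary": "uca-ru",
}


def collation_for(dbname):
    # A wiki listed by name gets its own value; anything else gets the default.
    return CATEGORY_COLLATION.get(dbname, CATEGORY_COLLATION["default"])


print(collation_for("ruwiki"))
print(collation_for("examplewiki"))
```

In this model, changing the "ruwiktionary" entry would leave "ruwiki" untouched, which is the sense in which the rules are per project.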

Change 79770 merged by jenkins-bot:
$wgCategoryCollation to 'uca-ru' on all Russian-language wikis

https://gerrit.wikimedia.org/r/79770

Should be done now. Please reopen this bug if it is not working as expected.

(In reply to comment #14)

Not exactly. The migration script for ruwiki is still running; the other projects are complete.

Ah, forgot about that. See, I should just leave it to those who know what they're doing :)

It seems unclear whether there was actually consensus for all Russian projects, per comment 10 (?)

dubrow wrote:

(In reply to comment #17)

We're absolutely happy, many thanks!