
Set $wgCategoryCollation to 'uca-sv' on Swedish Wikipedia and rebuild category sort keys
Closed, Resolved (Public)

Description

Set $wgCategoryCollation to 'uca-sv' on Swedish Wikipedia and rebuild category sort keys

Needs community notification and discussion.

Split off from bug 29788.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=29788

Details

Reference
bz45446

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 1:28 AM
bzimport set Reference to bz45446.

Assuming this change will sort ABC...XYZÅÄÖ correctly: there have been multiple community discussions on Swedish Wikipedia agreeing that the old sort order was a bug that needed fixing. We are just waiting for the bug to be fixed (first bug 164, then bug 29788, and now this bug). Better sorting of diacritics other than ÅÄÖ (ÁÀÂ... as variants of A, ÇČ... as variants of C, and so on) is a bonus, but that is already worked around in many cases with DEFAULTSORT. Unless there are very strange changes in the sorting of other characters, there should be no opposition to changing this setting.
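To make the expected order concrete, here is a toy sketch in Python. It is not how MediaWiki or ICU implement 'uca-sv'; the alphabet table and the `swedish_key` helper are made up for illustration. It only shows why plain code-point ordering puts Ä before Å, while the Swedish alphabet ends in Å, Ä, Ö.

```python
# Toy illustration only: a hand-written alphabet table, not the real UCA/ICU collation.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    """Hypothetical sort key: rank each letter by its position in the Swedish alphabet.

    Characters outside the table sort after all known letters, ordered by code point.
    """
    return [(RANK.get(ch, len(RANK)), ch) for ch in word.lower()]


words = ["Östersund", "Ängelholm", "Åmål", "Zinkgruvan"]

# Code-point order: Ä (U+00C4) comes before Å (U+00C5), which is the reported bug.
by_code_point = sorted(words)

# Swedish order: ... Z, Å, Ä, Ö.
by_swedish = sorted(words, key=swedish_key)

print(by_code_point)
print(by_swedish)
```

The first list comes out as Z, Ä, Å, Ö and the second as Z, Å, Ä, Ö, which is the order the community discussions ask for.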

I created a testwiki for you at http://users.v-lo.krakow.pl/~matmarex/testwiki-sv/ to verify that the behavior is indeed correct. Feel free to use it for testing and link in on-wiki, but be aware that I'll probably kill it once the change is performed.

Please link some of the discussions (preferably ones with results clearly indicated with yes/no icons :) ) so I can suggest the change with a clear conscience ;) This will have to wait at least for the deployment of 1.21wmf11 anyway: https://www.mediawiki.org/wiki/MediaWiki_1.21/Roadmap

And yes, that's exactly what this change will do.

And if there are no such discussions, it would be nice to hold one, even if it's just a formality. I am not a WMF employee, but their policy is clear – a configuration change (especially one that is this disruptive) can only be made if there's obvious consensus.

There's no hurry, especially since this change can only be made after MW 1.21wmf11 is deployed on March 13.

Here's a very similar voting/discussion I created on pl.wikipedia, regarding the same change, but for Polish: short explanation, voting and comment with yes/no icons. You can link the testwiki I created there.

https://pl.wikipedia.org/wiki/Wikipedia:PR#Zmiana_konfiguracji_.E2.80.93_w.C5.82.C4.85czenie_poprawnego_sortowania_artyku.C5.82.C3.B3w_na_stronach_kategorii

The sorting looks good at the test wiki. Thanks for making that available.

The discussions on Swedish Wikipedia are mostly someone asking "Why are Å and Ä in the wrong order?" and someone else answering "It's a bug", then maybe followed by a discussion of whether anything can be done, or is being done, to fix it. Fixing an obvious bug is not the kind of discussion that would get long lists of supporters (also, Swedish Wikipedia generally avoids votes with icons). One such discussion is [[sv:Wikipedia:Wikipediafrågor/Arkiv/2011#ABC...ÄÅÖ]], which ended with submitting bug 29788 "Sort Swedish letters ÅÄÖ correctly on Swedish Wikipedia". How the servers are set up technically to achieve this is better decided by Wikimedia technical staff than by a Wikipedia user vote. But if a vote is needed, I am sure it can be arranged.

I think this can proceed without another vote; the community has made it pretty clear they do want it :) Ib357adba.

Community was notified and agrees to this change at the local Village Pump: [[sv:Wikipedia:Bybrunnen#Svensk_sorteringsordning_i_kategorier_.28.C3.85.C3.84.C3.96.2C_inte_.C3.84.C3.85.C3.96.29]]

Unfortunately, there is a problem with words starting between "Th" and "Tö". They are sorted in the right order, but they appear under the letter "Þ" heading and not under the letter "T" heading, where words between "T" and "Tg" go.

I think sorting letter "Þ" as "th" is fine, but then it should go under a letter "T" heading.

This would probably not happen if bug 43740 were fixed. (Thorn is expanded to "th", which ideally would get removed during the prune-primary-collision step, but isn't.)

In the meantime, we should probably have a system for removing certain elements from the big list of first letters for certain collations (the opposite of the current $firstLetters array, which adds elements to the big list).
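A toy model of the problem and of the proposed removal list, sketched in Python. This is not MediaWiki code: `toy_key`, `heading_for` and `without_headings` are hypothetical names, the sort key is a crude stand-in for the real collation, and the rule that a page goes under the last heading whose key is not greater than the page's key is an assumption made for the illustration.

```python
import bisect


def toy_key(text):
    """Crude stand-in for a collation sort key: lowercase, with thorn expanded to 'th'."""
    return text.lower().replace("þ", "th")


def heading_for(title, headings):
    """Pick the last heading whose key is <= the title's key (assumed rule)."""
    ordered = sorted(headings, key=toy_key)
    keys = [toy_key(h) for h in ordered]
    index = bisect.bisect_right(keys, toy_key(title)) - 1
    return ordered[max(index, 0)]


def without_headings(headings, removals):
    """Hypothetical per-collation removal list: drop unwanted first letters."""
    removed = set(removals)
    return [h for h in headings if h not in removed]


HEADINGS = ["S", "T", "Þ", "U"]

print(heading_for("Tage", HEADINGS))    # lands under T
print(heading_for("Thomas", HEADINGS))  # lands under Þ, because Þ sorts as "th"
print(heading_for("Thomas", without_headings(HEADINGS, ["Þ"])))  # back under T
```

In this model the page order is unchanged; only the choice of heading differs once "Þ" is dropped from the list of first letters.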

Submitted I57e07a20 to fix this. Deployed on my testwiki, where it seems to solve the issue.

Once it's merged and deployed on sv.wiki, the following has to be done to fix the categories:

  • remove the entry for first-letters from the object cache
  • re-run updateCollation.php with a --force argument

Only purging first-letters:sv (or, I suppose, the full key would be svwiki:first-letters:sv) from memcache after merging this change is necessary. Re-running the script should not be needed.

mysql:wikiadmin@db1034 [svwiki]> select count(cl_collation), cl_collation from categorylinks group by cl_collation ;
+---------------------+--------------+
| count(cl_collation) | cl_collation |
+---------------------+--------------+
|             4556745 | uca-sv       |
+---------------------+--------------+
1 row in set (11.04 sec)

(In reply to comment #8)

Once it's merged and deployed on sv.wiki, the following has to be done to fix
the categories:

  • remove the entry for first-letters from the object cache
  • re-run updateCollation.php with a --force argument

For the latter, I think we might want to hold off (if possible). Tim is/was going to do some server-side ICU upgrades, which will then require updateCollation.php to be re-run with --force on all the wikis using uca-* collations.

Reedy tried to do some cache purging on IRC today and failed. I have no idea what is going on, and it seems neither has he. :)

Worst case, we'll just have to wait a week for the cache to expire and hopefully it'll start working properly by itself. Sorry about the mess.

Example category page with thorn ('Þ') visible: https://sv.wikipedia.org/wiki/Kategori:Svenska_kokboksförfattare?action=purge

Maybe CACHE_ANYTHING goes to a different cache than the one that was being purged(?)

So it seems like we finally managed to purge the right servers. I'm marking this as RESOLVED FIXED.

(If any category pages are still looking funny, they just need action=purge.)