
Swedish-language wikis should use Swedish-locale sorting (ie. ÅÄÖ should sort correctly)
Closed, Resolved (Public)

Description

MediaWiki sorts the Swedish letters Å, Ä and Ö in the order ÄÅÖ, but in the [[Swedish alphabet]] they are in the order ÅÄÖ.
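The reported order is what a plain code-point comparison produces, because Ä (U+00C4) has a lower code point than Å (U+00C5). A minimal Python sketch of the symptom follows; it only illustrates the ordering and is not MediaWiki's actual sorting code:

```python
# Code points: A-ring = U+00C5, A-diaeresis = U+00C4, O-diaeresis = U+00D6.
A_RING, A_UMLAUT, O_UMLAUT = "\u00c5", "\u00c4", "\u00d6"

# Order of the three letters in the Swedish alphabet.
swedish_order = [A_RING, A_UMLAUT, O_UMLAUT]

# Python compares strings by code point, so A-diaeresis comes first.
code_point_order = sorted(swedish_order)

print(code_point_order)
print(code_point_order == swedish_order)
```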

Questions about this have been asked several times at Swedish Wikipedia. Usually the answer has been that this is bug 164 in MediaWiki. But that bug is now marked fixed and says other sorting orders are possible.

It would be nice if Swedish Wikipedia (and maybe also other Swedish projects) used a sorting order where those letters are in Swedish alphabetical order.

(A current question about this is at [[sv:Wikipedia:WF#ABC....C3.84.C3.85.C3.96]])


Version: unspecified
Severity: minor
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=45446

Details

Reference
bz29788

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 11:34 PM)
bzimport set Reference to bz29788.
bzimport added a subscriber: Unknown Object (MLST).

I..... _think_ Collation::factory() would need to be extended to accept new collation names like 'uca-sv' (?) which would then return an IcuCollation with a different base locale ('sv_SE'?)

And then .... $wgCategoryCollation is a per-site setting. Ugh! This all feels awfully awkward.

Of course this will still only apply to category sorting, not to alpha sorting of other page lists, user lists, etc.
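The idea of letting the factory accept locale-specific names can be sketched roughly as below. This is a hypothetical Python model, not the PHP in Collation::factory(); the function name `resolve_collation`, the returned tuple, and the choice of a bare language code ('sv' rather than 'sv_SE') are all assumptions made for illustration:

```python
import re

def resolve_collation(name):
    """Map a collation name to a (kind, locale) pair.

    Hypothetical model: 'uppercase' needs no locale, 'uca-default' uses the
    root locale, and 'uca-<lang>' selects a language-specific locale.
    """
    if name == "uppercase":
        return ("uppercase", None)
    if name == "uca-default":
        return ("icu", "root")
    match = re.fullmatch(r"uca-([a-z]+)", name)
    if match:
        return ("icu", match.group(1))
    raise ValueError(f"unknown collation: {name!r}")

print(resolve_collation("uca-sv"))
```

With a pattern like this, adding another language would not require a new branch per locale, only a name of the same shape.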

(In reply to comment #1)

I..... _think_ Collation::factory() would need to be extended to accept new
collation names like 'uca-sv' (?) which would then return an IcuCollation with
a different base locale ('sv_SE'?)

I think that's pretty much what Aryeh intended when he built this system. I do know for sure that he intended creation of e.g. Swedish collations to be possible.

And then .... $wgCategoryCollation is a per-site setting. Ugh! This all feels
awfully awkward.

How is it awkward that this is per-site? We can just set it on all Swedish-language wikis. I guess ideally you'd be able to set it per-category, I've heard that request before.
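As a rough picture of "set it on all Swedish-language wikis", the per-site choice could look like the Python sketch below. The wiki identifiers and the function are illustrative assumptions, 'uca-sv' is the tentative name from the comment above, and real Wikimedia configuration is not written this way:

```python
# Default collation for wikis that have no override.
DEFAULT_COLLATION = "uppercase"

# Illustrative set of Swedish-language wiki identifiers (assumed, not a full list).
SWEDISH_WIKIS = {"svwiki", "svwiktionary", "svwikisource"}

def category_collation(wiki_id):
    """Return the collation name one wiki would use for category sorting."""
    return "uca-sv" if wiki_id in SWEDISH_WIKIS else DEFAULT_COLLATION

print(category_collation("svwiki"))
print(category_collation("enwiki"))
```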

Of course this will still only apply to category sorting, not to alpha sorting
of other page lists, user lists, etc.

Yup, that's right.

But now that bug is marked fixed and says other sorting orders are possible.

It's "fixed", but the fix is not turned on ;)

(In reply to comment #1)

I..... _think_ Collation::factory() would need to be extended to accept new
collation names like 'uca-sv' (?) which would then return an IcuCollation with
a different base locale ('sv_SE'?)

That's not even necessary - the default uca-default collation sorts things fine for Swedish (or at least in my test A Å Ä Ö was sorted in the order of AÅÄÖ). So once Wikimedia wikis switch to uca-default from uppercase collation, this issue should disappear.

per-category

That's bug 28397 for reference.

(In reply to comment #3)

That's not even necessary - the default uca-default collation sorts things fine
for Swedish (or at least in my test A Å Ä Ö was sorted in the order of AÅÄÖ).
So once Wikimedia wikis switch to uca-default from uppercase collation, this
issue should disappear.

The Swedish alphabet is A B C ... X Y Z Å Ä Ö. Does it sort correctly with that in mind?

Whoops. No it doesn't, so some specific sv collation would have to be created.
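The difference can be approximated in Python. In the sketch below, `base_letter_key` is only a rough stand-in for a non-tailored collation (accents are ignored first and used only as a tie-breaker), and `swedish_key` ranks Å, Ä, Ö after Z. Neither function is ICU or MediaWiki code, and the word list is made up for the demonstration:

```python
import unicodedata

def base_letter_key(word):
    """Approximate an untailored sort: compare base letters, then accents."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (stripped, decomposed)

# Swedish alphabet: a-z followed by a-ring, a-diaeresis, o-diaeresis.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
RANK = {letter: i for i, letter in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Rank each letter by its alphabet position; unknown characters go last."""
    composed = unicodedata.normalize("NFC", word.casefold())
    return [RANK.get(c, len(RANK)) for c in composed]

words = ["Zebra", "\u00c5re", "\u00c4rla", "\u00d6rebro", "Adam", "Olof"]

untailored = sorted(words, key=base_letter_key)
swedish = sorted(words, key=swedish_key)

print(untailored)
print(swedish)
```

In the first result the Å and Ä words land next to "Adam" and the Ö word next to "Olof"; in the second they follow "Zebra", which is the order the Swedish alphabet calls for.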

Adding bug 164 as a dependency. Reopened it since there's no way to set a Swedish locale currently without writing additional code, which seems rather nonsensical.

lowering priority since this isn't going to be fixed THIS WEEK.

mats wrote:

This is ridiculous! How can such a bug as this be present?
Imagine if x was sorted after z, how could ANYONE treat this as anything but top prio?

(In reply to comment #8)

This is ridiculous! How can such a bug as this be present?

Very likely because nobody has fixed the problem yet. Contributing patches is very welcome to speed up the process. See http://www.mediawiki.org/wiki/Developer_access for more information.

Also, an example link would be great so everybody can see the problem in practice.

Imagine if x was sorted after z, how could ANYONE treat this as anything but
top prio?

Because it is not the same category as "Swedish Wikipedia completely down" or "All content on Swedish Wikipedia is scrambled" or "No images load anymore".

mats wrote:

A suggested fix was posted here http://sv.wikipedia.org/wiki/Wikipedia:Wikipediafr%C3%A5gor#Sortering.2C_finns_det_n.C3.A5n_workaround.3F but unfortunately it doesn't work in my installation. Any suggestions are appreciated.

My test site: http://privat.mohsart.se/mats/tbwiki/index.php?title=Kategori:%C3%85%C3%A4%C3%B6

In a perfect world, á, à etc would be sorted as a etc. and the last Swedish characters as åäö after xyz.

(The suggested fix for tables works btw)

Or the same as "sorting is broken on all wikis". Plus there is a workaround of specifying fake sortkeys, albeit an annoying workaround that doesn't work that great.

I know it can be frustrating waiting for a fix for an issue that's important to you, but everyone has an issue that they think is super important. They can't all be top priority.

If it makes you feel better, this is an issue that I would really like to see fixed and will work on in some mythical future when I have less to do.
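One way a fake sortkey could be built, assuming the keys are compared by code point after uppercasing: since only Å and Ä are in the wrong relative order, swapping them in the key is enough. This Python sketch is an illustration of the idea, not necessarily the workaround used on Swedish Wikipedia, and `fake_sortkey` is a made-up name:

```python
# Translation table that swaps A-ring and A-diaeresis in both cases.
SWAP = str.maketrans("\u00c5\u00c4\u00e5\u00e4", "\u00c4\u00c5\u00e4\u00e5")

def fake_sortkey(title):
    """Build a key whose code-point order matches the Swedish alphabet."""
    return title.translate(SWAP).upper()

titles = ["\u00c4rla", "Zebra", "\u00d6rebro", "\u00c5re"]
ordered = sorted(titles, key=fake_sortkey)

print(ordered)
```

Note that the key for a title starting with Å now starts with Ä, so the key no longer matches the title's first letter, which may be part of why such workarounds are awkward in practice.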

(In reply to comment #10)

A suggested fix was posted here
http://sv.wikipedia.org/wiki/Wikipedia:Wikipediafr%C3%A5gor#Sortering.2C_finns_det_n.C3.A5n_workaround.3F
but unfortunately it doesn't work in my installation, any suggestions are
appreciated.

My test site:
http://privat.mohsart.se/mats/tbwiki/index.php?title=Kategori:%C3%85%C3%A4%C3%B6

In a perfect world, á, à etc would be sorted as a etc. and the last Swedish
characters as åäö after xyz.

(The suggested fix for tables works btw)

That looks like it would work. In an ideal world we would use something based on UCA, but there is no point waiting for an ideal world.

On your test wiki you may have to run updateCollation.php.
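The reason a rerun can be needed is that sort keys are stored when a page is categorized, so switching collation does not by itself change keys that were already saved. A simplified Python model of that recomputation step is sketched below; the row fields, the collation name "swedish-demo", and the key function are assumptions for illustration and do not mirror the real script:

```python
# Swap A-ring and A-diaeresis so code-point order matches Swedish order.
SWAP = str.maketrans("\u00c5\u00c4\u00e5\u00e4", "\u00c4\u00c5\u00e4\u00e5")

def demo_sortkey(title):
    """Hypothetical key function for the new collation."""
    return title.translate(SWAP).upper()

def update_collation(rows, current_name, make_sortkey):
    """Recompute keys for rows stored under a different collation name."""
    updated = 0
    for row in rows:
        if row["collation"] != current_name:
            row["sortkey"] = make_sortkey(row["title"])
            row["collation"] = current_name
            updated += 1
    return updated

# One stale row (old collation) and one row that is already up to date.
rows = [
    {"title": "\u00c5re", "sortkey": "\u00c5RE", "collation": "uppercase"},
    {"title": "Zebra", "sortkey": "ZEBRA", "collation": "swedish-demo"},
]

changed = update_collation(rows, "swedish-demo", demo_sortkey)
print(changed, rows[0]["sortkey"])
```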

mats wrote:

Thanks, updateCollation.php did not help, if this http://www.mediawiki.org/wiki/Manual:Upgrading#Web_updater is the correct way to run it.

(In reply to comment #13)

Thanks, updateCollation.php did not help, if this
http://www.mediawiki.org/wiki/Manual:Upgrading#Web_updater is the correct way
to run it.

I believe the web updater should work.

I'll try to poke at the code either later today or tomorrow.

(In reply to comment #14)

(In reply to comment #13)

Thanks, updateCollation.php did not help, if this
http://www.mediawiki.org/wiki/Manual:Upgrading#Web_updater is the correct way
to run it.

I believe the web updater should work.

I'll try to poke at the code either later today or tomorrow.

Works well on my test install. Sorting order is ...Z Å Ä Ö

The next step to getting this code deployed to svwiki would be to put it in the Wikimedia version control. Who is the author of the extension (for the credit line) and what copyright license is it under? (GPL is a good choice if you don't care which one.)

I am the author and I am credited as "Lejonel" for my other contributions to MediaWiki. This code is mostly based on UppercaseCollation in MediaWiki core, so I think the code has to use the same GPL license.

(Thanks for testing this and helping make it a usable extension for svwiki.)

(In reply to comment #17)

I am the author and I am credited as "Lejonel" for my other contributions to
MediaWiki. This code is mostly based on UppercaseCollation in MediaWiki core,
so I think the code has to use the same GPL license.

(Thanks for testing this and helping make it a usable extension for
svwiki.)

I requested a git repository be created to put the extension in. ( https://www.mediawiki.org/wiki/Git/New_repositories/Requests )

Initial commit of extension in gerrit change 43372. Next step is to get a senior developer to review the extension and deploy it.

(In reply to comment #19)

Initial commit of extension in Gerrit change #43372. Next step is to get a
senior developer to review the extension and deploy it.

This is now Gerrit change Id39406c3, as a core change.

mats wrote:

(In reply to comment #21)

Could someone familiar with this change help this user:
https://www.mediawiki.org/wiki/Thread:Project:Support_desk/I_suspect_a_problem_with_my_installation...

:-D
I asked there and not here, because I thought this was the wrong forum for my problems.

I've been exchanging emails with the user. I have no idea why he is experiencing the behavior he is describing.

mats wrote:

Did someone verify this on MediaWiki 1.20.2?
If so, could that person please email me all relevant files (except for LocalSettings.php of course), i.e. extensions/SwedishCollation.php, includes/Collation.php, and any other files that could affect the behaviour?

Thanks,

mats@mohsart.se

Tim merged the change into core. Next step is to get it enabled on svwiki.

(In reply to comment #25)

Tim merged the change into core. Next step is to get it enabled on svwiki.

Sounds like the MediaWiki part of this is done then.

(In reply to comment #26)

(In reply to comment #25)

Tim merged the change into core. Next step is to get it enabled on svwiki.

Sounds like the MediaWiki part of this is done then.

Well, no one has switched the config yet. But I suppose that could be a separate bug.

I submitted I838484b9 to fix this "properly".

(Removing bug 31235 tracker, the idea of making this an extension was dropped.)

Reopening. This needs cleaning up now that I838484b9 is merged.

I propose removing the 'uppercase-sv' collation entirely, and setting sv.wikipedia to 'uca-sv' collation. I created bug 45446 to track this.

(In reply to comment #30)

I propose removing the 'uppercase-sv' collation entirely

Doing this in I2cd22ad8.

Merged, so I'm finally marking this bug as properly RESOLVED FIXED.

Merged, so I'm finally marking this bug as properly RESOLVED FIXED.

The original version of this bug, the Wikimedia bug "Sort Swedish letters ÅÄÖ correctly on Swedish Wikipedia", is not fixed. But I assume that is now the same (more or less) as bug 45446.

Someone changed this to the MediaWiki bug "Swedish-language wikis should use Swedish-locale sorting (ie. ÅÄÖ should sort correctly)". Is this really fixed? It is now possible to use Swedish sorting, but I think a newly installed Swedish-language wiki will still use the old non-Swedish sort order.

I split off that bug mostly for clarity, as most of the discussion here is no longer relevant.

And yes, the default is still the 'uppercase' collation. You inspired me to create bug 45611 to discuss changing this, as I'm not sure if this is a good idea, and I have no idea how to go about this.