
Interwiki lists sort in phonetic, site-defined order
Open, Low, Public, Feature

Description

Many site admins have talked about making the bot framework order interwiki
lists according to each site's requirements. I think this should be a feature
of MediaWiki instead. A site admin should be able to specify that, for the
Scandinavian site that she manages, all other Scandinavian links are
listed first, after which the rest are ordered alphabetically. (See
http://en.wikipedia.org/wiki/User_talk:Yurik#Nynorsk_Wikipedia )

On top of the default configuration, the user should be able to override those
settings. For example, when I browse the Russian Wikipedia, I would like English
and French listed first.


Version: unspecified
Severity: enhancement
URL: http://meta.wikimedia.org/wiki/Interwiki_sorting_order

Details

Reference
bz2867

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:42 PM
bzimport set Reference to bz2867.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 740
Patch to allow sorting customization (per language)

Attached are the proposed changes to skin.php, language.php, and names.php. With
this patch (tested on my own MediaWiki installation), each language can be
customized to make any languages appear first in the interwiki list.

utne wrote:

Great work, Yuri!

Will the interwiki links chosen to be on top be repeated in the general list, or are they “used up”? Is that (to repeat or not to
repeat) configurable in your solution? In either case, this seems to be a solid step forward! -Olve
( http://nn.wikipedia.org/wiki/Brukar:Olve )

First, why this patch was created (from an IRC discussion):

  1. Interwikis should be sorted in a "phonetic" order - this way the Finnish language, suomi (code 'fi'), goes after Russian ('ru').
  2. Some languages are closely related, such as the Slavic or Scandinavian ones. They want their sister sites to be listed first.
  3. Some sites may prefer to have English as their first choice simply because it is a lingua franca.
  4. This opens the path for user-level sorting - I, as a user, would like to keep the languages that I know at the top, instead of looking through 50 different links.

The iw links are not repeated - if you want 'en' to appear at the top of the
list, it will not be included a second time in the regular phonetic order.
Otherwise, imagine a page that has only a few common links: all of a sudden they
would all be duplicated.
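
The "top of the list, but never repeated" behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the attached patch (which changes MediaWiki's PHP files); the function name `order_interwiki` and the sample codes are assumptions made for the example, and the remaining links are sorted by plain language code rather than phonetically.

```python
def order_interwiki(codes, preferred, sort_key=None):
    """Return the page's language codes with preferred ones first.

    Preferred codes keep their configured order and are "used up":
    they are not repeated in the sorted remainder.
    """
    present = set(codes)
    # dict.fromkeys() drops duplicate entries in the preference list
    # while keeping their order.
    top = [c for c in dict.fromkeys(preferred) if c in present]
    used = set(top)
    rest = sorted((c for c in present if c not in used), key=sort_key)
    return top + rest


links = ["ru", "fi", "en", "de", "fr"]
# 'ja' is preferred but absent from the page, so it is simply skipped.
print(order_interwiki(links, preferred=["en", "fr", "ja"]))
# ['en', 'fr', 'de', 'fi', 'ru']
```

A page with only a few links then shows each link exactly once, whichever of them happen to be preferred.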

utne wrote:

(In reply to comment #3)
OK -- the question arose locally on nn:, so I had to ask... :)
It seems to me that the advantages of this new solution
outweigh the disadvantages by far, so I am all for it.

-Olve

The patch should be fixed to allow site administrators to change local ordering
without asking devs to change the language_XX.php file, similar to the way
localization and other items can be done.

utne wrote:

Twenty votes for this bug and no reaction from any developer except the one who posted the bug... (Thank you, Yuri!)
I hope that something can start happening here soon!

wikipedia wrote:

I think they are ignoring it because they probably think this results in a
performance decrease.
Many interwiki links are submitted using the Python Wikipedia Robot Framework
http://sourceforge.net/projects/pywikipediabot/
We should probably fix this there, in interwiki.py. Also, we could create a
machine-readable convention page, Wikipedia:interwikiconv, to describe
those conventions.

Two popular conventions are detailed in the comma-separated string format at
[[m:Interwiki sorting order]].
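
As a rough sketch of how a bot or the software could consume such a comma-separated convention string: the `ORDER` value below is a short, made-up fragment for illustration and not one of the real lists on Meta, and the function names are likewise hypothetical.

```python
def parse_order(order_string):
    """Map each language code in a comma-separated string to its rank."""
    codes = [c.strip() for c in order_string.split(",") if c.strip()]
    return {code: rank for rank, code in enumerate(codes)}


def sort_by_convention(links, order_string):
    """Sort links by their position in the convention string."""
    rank = parse_order(order_string)
    # Codes missing from the convention go last, ordered by code.
    return sorted(links, key=lambda c: (rank.get(c, len(rank)), c))


# Hypothetical fragment of a convention string; not the real list.
ORDER = "de, en, es, fr, it, ru, fi, sv"
print(sort_by_convention(["sv", "fi", "en", "xx", "ru"], ORDER))
# ['en', 'ru', 'fi', 'sv', 'xx']
```

Here 'fi' follows 'ru' because the string lists it there, which is how a "phonetic" convention (suomi under s) would be expressed in this format.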

  1. Performance - sorting a small array of strings is not very expensive compared with database and bandwidth limitations. With the current state of CPU power, it's really negligible. Other processing, such as the many regular expressions and the parsing of dates to change time zones, is substantially more CPU-intensive.
  2. Interwiki bot - there is already code in there that sorts interwikis. The missing part is the ability for individual site admins to easily alter the list.
  3. Implementing this in MediaWiki will allow per-user customizations (I would like to see the languages I know first!).
  4. I think the reason for no activity is two-fold:
    A) The existing patch does not allow for dynamic customizations using the Special:AllMessages page (that needs to be rewritten).
    B) (Speculating) The English version tends to have much higher priority than other sites, and this is clearly an internationalization issue.
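
The performance point can be checked with a quick measurement. The sketch below times sorting 250 random two-letter codes in Python; the list size and run count are arbitrary assumptions, and the figure says nothing precise about PHP inside MediaWiki or about caching effects, only about the order of magnitude of sorting a list this small.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so the input list is reproducible
# 250 made-up two-letter codes, roughly the size of a long interwiki list.
codes = ["".join(random.choices(string.ascii_lowercase, k=2)) for _ in range(250)]

runs = 1000
seconds = timeit.timeit(lambda: sorted(codes), number=runs)
print(f"{seconds / runs * 1e6:.1f} microseconds per sort of {len(codes)} codes")
```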

So, will this patch be added? It would be extremely useful (and I know a lot of
Wikipedias are eager to get this).

It's too low on the developers' priority list :(, so more complaining has to be done to get the vote count up, as well as asking them about
it on the IRC channel at irc://irc.freenode.net/mediawiki .

robchur wrote:

Alright, hold up. First of all; yes, we have priorities. No, we don't ignore
patches. I'm going to review it right now, in fact. Performance MIGHT be an
issue, as might the effects on caching, so those will need to be taken into
consideration. Plus a user above mentioned something about localisation, so I
need to check that's been done properly. But don't make blanket, "they don't
care" statements, because it's patently false.

Rob, no one is accusing the developers of not caring - the fact that Wikipedia is still up and improving fast is a testament to
that! What is at issue is bringing what's important to the various users of Wikipedia to the developers' attention, to show which
features are of higher value and which might be delayed. That is why vote counts, and telling site admins about a possible
solution that may (or may not) be helpful, are important.

The localization I mentioned above is not about localization as such, but about the process of changing settings through the
Special:AllMessages page rather than by modifying the language.php file (the way the patch currently does it). The code should be
changed so that the Special:AllMessages method is truly usable.

Thank you for your help with the issue!
--Yuri

Didn't I wontfix this already? Order should be consistent across all languages to
aid in navigation, *not* "site-specific".

j-ha-s wrote:

Any list of 200+ language names in 200+ different languages is *not* aiding
navigation. The list should be user configurable.

utne wrote:

Brion:

Yes, I do believe you "wontfix"ed this one. Errare humanum est... ;-)

As many people have already pointed out here, it is important for many
wikipedias (especially in smaller and/or localised languages) that the display
order is one that draws out those languages that are actually of use to the
"average" reader of that language.

An additional reason for having this system is that it will make sorting
independent of code order on the edit page. In practice, this means:

  1. The input order can be strictly alphabetical by code. This is a vast advantage for interwiki fixers, since they don't have to know the specific policy of each wikipedia.
  2. The displayed interwikis can be arranged according to local needs:
    1. Alphabetically by language name. This order is convenient for wikipedias in the largest international languages, such as the English, Spanish, Portuguese, Chinese, French and Arabic ones.
    2. Local or related languages first, then others alphabetically by language name. This is extremely practical for closely related language clusters like Swedish/Danish/Bokmål/Nynorsk, Serbian/Croatian/Bosnian, Czech/Slovakian, Hindi/Urdu/Panjabi, as well as for many of the smaller languages within, ''e.g.'', the Germanic and Romance language groups.
    3. The locally best-understood international language/s first (''e.g.'' English, others; French, Spanish, others; or French, Portuguese, others). This order is particularly helpful for wikipedias in languages which are in their initial stage of building a (or any) encyclopaedia and which have stronger links to this/these international language/s than to local/related languages.

Please take this request seriously (no "wontfix", in other words) even though
the matter seems unimportant to the wikipedias you work on. There are plenty of
other wikipedias that would benefit greatly from such a project!

Respectfully,

Olve

bjarte wrote:

Seconded,

Bjarte Sorensen

trond.trosterud wrote:

Yes, it is important (I am active on the nn, se and fi wikipedias, and the average reader of these wikipedias is capable of reading 5 other
wikipedias. We thus want these neighbouring ones to be listed first). Trond.

stajohns wrote:

Given the significant growth of smaller languages, the length of the interlanguage listings is fast becoming less
user-friendly. The intended 'at-a-glance' functionality disappears when I have to browse through a lengthy list
of languages to find one I know I can read and understand.

Best regards
Ståle Johnsen

arp.kruithof wrote:

By pure coincidence, a discussion on nl was just (re)started yesterday about the
weird sort orders by language name due to using the language codes for sorting.
It seems to me that it would be a very useful feature to have, especially if
configurable at user level so users can bump their preferred languages up. Of
course, if that would cause significant performance/caching problems, it would
still be a very neat first step to have it configurable per wikipedia.

Cheers
Arp Kruithof

Ulf.Lunde wrote:

Here's a voice in favor of the change
from me, too.

Even a per wikipedia generic customizability of the interwiki
order would be a *very* useful feature, and one which I hope
will be given top priority now that all the more serious
shortcomings (that I know of) have been fixed.

If, in addition, each user could (in some simple way) choose
to hide languages which she is not interested in, this could
be used to make the lists shorter and thus more user friendly.

Both "levels" of change should be weighed against the
performance issue; a change which would degrade the wiki's
speed, becomes less desirable.

Verdlanco

yonatanh wrote:

"Didn't I wontfix this already? Order should be consistent across all languages to
aid in navigation, *not* "site-specific"."

There's already more than one project that has a different order from the
default one. For example, the Hebrew Wikipedia has English come first, and
afterwards it goes by order of the language prefix (Finnish goes after 'en' and
before 'it' (under f) rather than right before 'sv' (under s, for Suomi)). Maybe it
should be unified across all wikis, but if so, the bots should be modified for
this, and if not, the option should probably be included in the software.

*** Bug 15990 has been marked as a duplicate of this bug. ***

The most logical default sorting is not phonetic, but Unicode.

Let me explain.

It doesn't actually make too much sense that Finnish (Suomi) would come after Russian (Русский). It does make a little sense, because Cyrillic is somewhat related to Latin - both have a letter for "R", although it looks different. But what if there was a language which is written in the Cyrillic alphabet, and its name begins with a "Ж"? It is transliterated into Latin as "ZH", but a speaker of that language would find it odd if it appeared at the end of the list, because in Cyrillic this letter is close to the beginning. So sorting Русский near the Latin R's happens to make some sense, but it is a lucky coincidence.

This problem occurs with Yiddish (יידיש): it is sorted near the end. Why? Because Y is near the end of the Latin alphabet? But the Hebrew letter י is near the beginning of the Hebrew alphabet, in which Yiddish is written.

It makes even less sense that Hebrew (עברית) would come after Italian. It is suggested that Hebrew should come after Italian because a simple non-scientific transliteration of עברית is "Ivrit". The reality, however, is that Hebrew speakers don't think that the first letter of their language's name is an "I", but an "ע" (Ayin), which has no analog in the Latin alphabet; hence, there is no clever way to put עברית in a "phonetic order".

These are just a few of the problems with languages with which i am familiar. I don't know, for example, how convenient it is for a Japanese speaker to find his language at N (for Nihongo, i presume).

The only solution to this is to make the default language names adhere to the order of the scripts in Unicode. This means that language names will be grouped by script: Latin (French, Ban-lam-gu, Estonian), Cyrillic (Russian, Mongol, Sakha), Arabic (Arabic, Farsi, Urdu), Hebrew (Hebrew, Yiddish), Chinese (Mandarin, Cantonese, Yue), Devanagari (Hindi, Nepali) etc. These groups will appear in the order in which they appear in the Unicode standard. It is technocratic, but it is the most neutral way i can think of. Certainly better than putting עברית under I, which is not useful for Hebrew speakers.
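
A minimal sketch of the Unicode-order idea: Python compares strings code point by code point, so sorting the native language names directly already groups them by script block. The sample names and the function name `unicode_order` are chosen for illustration. Raw code point order is only an approximation of script order and is not a proper collation within a script (accented Latin letters, for example, sort after "z").

```python
def unicode_order(names):
    """Sort language names by the code points of their case-folded form."""
    return sorted(names, key=str.casefold)


NAMES = [
    "Suomi",
    "עברית",
    "日本語",
    "Русский",
    "Français",
    "יידיש",
    "हिन्दी",
]

for name in unicode_order(NAMES):
    # Show the code point of the first character next to each name.
    print(f"U+{ord(name[0]):04X}  {name}")
```

With this sample, the Latin names come first, then Cyrillic, then the two Hebrew-script names (Yiddish before Hebrew, since י precedes ע), then Devanagari and Japanese.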

And for the record - i support the option to have a language project define languages that will appear at the top. De facto, for Norwegian it's Swedish, Danish et al., for Hungarian and Hebrew it is English etc., and nothing is wrong about it. It makes Wikipedia convenient.

Removing need-review, the patch is out of date. Also, it could be beneficial if:

  1. Local communities were able to define sort order themselves, via a system message.
  2. The sorter attempted to find a reasonable place for unknown langcodes instead of throwing them to the bottom in undefined order.
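
One possible reading of the second point, sketched under assumptions: known codes are sorted by a configured sort name, and an unknown code falls back to the code itself as its sort name, so it is merged into the sequence instead of being dumped at the bottom. The `SORT_NAMES` table is a tiny hypothetical sample, and this is not how the patch under review works.

```python
# Hypothetical sort names (what each code is filed under); not a real table.
SORT_NAMES = {
    "en": "english",
    "fi": "suomi",
    "pt": "portugues",
    "ru": "russkij",
    "sv": "svenska",
}


def sort_key(code):
    # Unknown codes fall back to the code itself, so they land near
    # names starting with the same letters rather than at the end.
    return (SORT_NAMES.get(code, code), code)


print(sorted(["sv", "pt", "fi", "ru", "en", "nap"], key=sort_key))
# ['en', 'nap', 'pt', 'ru', 'fi', 'sv']
```

The unknown code 'nap' ends up between 'en' and 'pt', and the result is the same for any input order.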

The problem is pretty important, by the way.

utne wrote:

2006-03-18 16:33:53 UTC, Rob Church wrote:

No, we don't ignore patches.
I'm going to review it right now, in fact.

So -- any result yet? ;-)

*** Bug 28156 has been marked as a duplicate of this bug. ***

sumanah wrote:

I'm adding the "reviewed" keyword since the patch has been reviewed and, sadly, the passage of time has obsoleted it, per comment 25. Thank you for the bug report and the patch, Yuri.

I'm also marking this for the internationalization/localization team to look at, by adding the "i18n" keyword.

sbharti wrote:

Interestingly, we have similar suggestions for Indian wikipedias (Hindi, Bengali, Kannada, Telugu etc.). It would be nice to list Indian languages first on Indian wikis, with two options: 1) in general, 2) user-specific. It increases the usability and popularity of, and access to, smaller wikipedias.

Adding Denny on CC, as this may be an issue that would be very prominent for the adoption/acceptance of Wikidata-based interwiki links.

sumanah wrote:

Mailing list discussion on wikimediaindia-l: http://lists.wikimedia.org/pipermail/wikimediaindia-l/2012-February/thread.html#6755 starting with http://lists.wikimedia.org/pipermail/wikimediaindia-l/2012-February/006755.html

Indian community member's request: "automatically sort all the languages according to the language preferences" since "For Malayalam, a list starting from English, Hindi, Tamil, Kannada, Sanskrit, etc. [would be more useful] to many users than providing a list starting [with obscure] languages." (original: http://lists.wikimedia.org/pipermail/wikimediaindia-l/2012-February/006768.html )

I have submitted an updated patch as gerrit change 24211. This allows the sorting order to be set per wiki by the system message "interwiki config-sorting order". I didn't implement the whole shebang at [[m:Interwiki sorting order]] because I think that the patch will be more than adequate in 99.9 % of all cases.

KaewWiki wrote:

There are additional comments in Thai at the link above that may benefit this thread.

The sorting order for non-registered users may also be defined by the user's browser setting (e.g. language), cookies, IP address (=location).
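
For the browser-setting variant, a rough sketch: the browser's Accept-Language header is reduced to an ordered list of base language codes, which are then moved to the top of the page's links. The function names are hypothetical, the parser is deliberately simplified (it ignores wildcards and maps "en-GB" to "en"), and a real deployment would also have to consider what per-visitor output does to page caching.

```python
def parse_accept_language(header):
    """Return base language codes from an Accept-Language header,
    most preferred first (simplified parsing)."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # default quality value when none is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            prefs.append((-q, position, tag.split("-")[0]))
    ordered = []
    for _, _, code in sorted(prefs):
        if code not in ordered:
            ordered.append(code)
    return ordered


def order_for_visitor(links, header):
    """Put the visitor's languages first, then the rest by code."""
    top = [c for c in parse_accept_language(header) if c in links]
    return top + sorted(c for c in links if c not in top)


print(order_for_visitor(["de", "en", "fi", "fr", "ru"], "ru,en-GB;q=0.8,fr;q=0.6"))
# ['ru', 'en', 'fr', 'de', 'fi']
```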

Wondering what is needed to get that rotting patch on its way again...

The internationalization related bits work as expected, so it seems to be more about using that for a customized sort.

Change 24211 abandoned by Siebrand:
(bug 2867) Sort interlanguage links.

Reason:
I'm abandoning this as this change hasn't had any love in a long time. There are open comments and it doesn't merge any more. Can be restored if author wants to work on it again.

There's a ULS compact links beta feature now that may replace this.

https://gerrit.wikimedia.org/r/24211

The ULS-CompactLinks beta feature being developed is described at:

https://www.mediawiki.org/wiki/Universal_Language_Selector/Design/Interlanguage_links

IMO there is still a need for site-defined orders, but with Wikidata and Compact Links going to land in Wikimedia wikis, it is unlikely this feature is going to make it into core.
Perhaps it would be better as an extension, so small wikis can use it, even on older versions of MediaWiki?

@jayvdb Now that we have CompactLinks on Wikimedia wikis, who is still asking for this feature? If the answer is no one, the bug should be closed.

This is a MediaWiki core bug. CompactLinks is an extension. Furthermore CompactLinks is not enabled by default on all wikis.

Also #ULS-CompactLinks doesn't provide this functionality yet.

Even the search results in ULS interlanguage panel are not phonetic. When I search for 'Bas', in the Asia group I see a few 'Basa ..' first, and then 'Baso Minangkabau' and then another 'Basa ..' (Basa Sunda). I'd love to know what order that is, because it is far from expected.

Hmm, I'm not sure what you are referring to.

If I go to True Jesus Church, click "251 more" and type "bas", I get:

  • Basa Banyumasan
  • Basa Jawa
  • Baso Minangkabau
  • Basa Sunda

It's pretty simple: All the language names that begin with "Bas", sorted alphabetically. Isn't that sensible?

And generally, yes—this bug should probably be closed.

  • Interlanguage bots are not relevant any longer.
  • Wikidata can sort languages according to the site's request.
  • The ULS-CompactLinks feature sorts the languages according to what is most likely to be relevant to the user. It's beta now, and should go out of beta in a couple of months or so.

Basa, Basa, Baso, Basa = not sorted alphabetically.

Facepalm :)

I'll check that. It should be alphabetical. But that would be a UniversalLanguageSelector issue.
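
The ordering of the four names quoted above can be checked mechanically. This small sketch compares the list as displayed with a plain case-insensitive alphabetical sort; the helper name `is_sorted` is made up for the example, and it says nothing about which collation ULS actually applies.

```python
def is_sorted(items, key=str.casefold):
    """True if items are already in non-decreasing order under key."""
    keys = [key(item) for item in items]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Order as shown in the search results quoted above.
shown = ["Basa Banyumasan", "Basa Jawa", "Baso Minangkabau", "Basa Sunda"]

print(is_sorted(shown))
# False
print(sorted(shown, key=str.casefold))
# ['Basa Banyumasan', 'Basa Jawa', 'Basa Sunda', 'Baso Minangkabau']
```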

And generally, yes—this bug should probably be closed.

  • Interlanguage bots are not relevant any longer.
  • Wikidata can sort languages according to the site's request.

This is true only where Wikibase is used, which has config item 'wmgWikibaseClientSettings'.
Which is not Wiktionary (and others: T109579: [Epic] Give more sister projects access to Wikidata), and a lot of non-Wikimedia sites.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM