
[Compact links] Default languages shown for an Italian user can be improved
Closed, ResolvedPublic

Description

0) Move to Italy and more precisely Milan
I. https://bits.wikimedia.org/geoiplookup says Geo = {"city":"Milan","country":"IT",[...]}

  1. Enable the beta feature for compact interlanguage links at https://test.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures
  2. Visit the main page and check the interlanguage links.

II. Observed:

In other languages

Edit links
Complete list
català
Deutsch
Ελληνικά
français
furlan
hrvatski
italiano
Nnapulitano
sicilianu
slovenščina
…
277 more languages

III. Expected: it's near impossible that I care about català, furlan, Nnapulitano or sicilianu especially as I'm not in any of those regions; and rather rare that I could care for Ελληνικά, hrvatski or slovenščina. On the other hand:
a) I surely want to see all the languages most studied in Italy, i.e. English [when I'm not on an en wiki], French, Spanish, Latin and maybe Russian (in this order);
b) I probably want the (romance) languages which are most closely related to Italian and likely to be useful for a superficial reading, like Portuguese and Romanian in addition to Spanish and French;
c) I may want the other languages historically taught in schools under law 482/1999, i.e. sq, ca, fr, hr, fur, el, oc, frp, sl, de, srd, lld, BUT only in the regions where they are relevant, if we're able to make such a distinction;
d) the native languages of the biggest immigrant populations (over 100k inhabitants), i.e. the languages of Romania, Albania, Morocco, China, Ukraine, Philippines, Moldova, Poland, perhaps only in centre-north Italy, where 85% of them live;
e) *perhaps* the regional italic languages, but only where in active use if we're able to make such a geolocation.

Sources: http://www.insegnareonline.com/istanze/interlinguismo/insegnamento-ls-scuola-italia for a-c, http://ssai.interno.it/download/allegati1/amelio_dati-popolazione_straniera_residente_in_italia_-_22_set_2011_-_testo_integrale%5B1%5D.pdf for d.

In https://www.mediawiki.org/wiki/Universal_Language_Selector/Design/Interlanguage_links#Issues_found I read "All the (main) languages of the same *family* should be shown in it". This would solve (b) and most of (a). I also read "Include minor languages from the user region", but this is not currently possible because we only consider the *country*.

Note how Nnapulitano and sicilianu are not in any of a-c, so they can be safely removed. Ελληνικά, català, furlan, hrvatski and slovenščina are in (c), but are only relevant in very limited territories, even if we include the margin of error in geolocation for Slovene and Croatian (I doubt Greek??) users, so they should probably not be displayed. Spanish, Romanian and Albanian are at the top of several points, so they can safely be added.


Version: master
Severity: enhancement
See Also:
http://unicode.org/cldr/trac/ticket/7099
http://unicode.org/cldr/trac/ticket/7100
http://unicode.org/cldr/trac/ticket/7101
http://unicode.org/cldr/trac/ticket/7102
https://github.com/wikimedia/jquery.uls/issues/134
http://unicode.org/cldr/trac/ticket/7947
http://unicode.org/cldr/trac/ticket/7946

Details

Reference
bz62346

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:06 AM
bzimport set Reference to bz62346.

Off-topic, but is there a way I can be automatically added to the CC list for such bugs? Or would it be better to create a separate component?

(In reply to Nemo from comment #0)

  1. Enable the beta feature for compact interlanguage links at https://test.wikipedia.org/wiki/Special:Preferences#mw-prefsection-betafeatures
  2. Visit the main page and check the interlanguage links.

II. Observed: [...]

Note that this no longer works on this wiki because the feature has been disabled in the meantime; it can currently be tested at http://en.wikipedia.beta.wmflabs.org (where I see different languages, but I have no idea how preferences and labs influence that; I'll paste some more results when this lands on Italian projects).

(In reply to Nemo from comment #0)

II. Observed:

Same in production now, [[:it:Milano]]:

In altre lingue

Modifica link
català
Deutsch
Ελληνικά
English
français
furlan
hrvatski
Nnapulitano
sicilianu
slovenščina
…
Altre 123 lingue

jquery.uls.data.js (https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FUniversalLanguageSelector/bfeaebdf635a2e8fb1f366c76b4876fdd390e36f/lib%2Fjquery.uls%2Fsrc%2Fjquery.uls.data.js) has: "IT":["it","en","nap","scn","fur","de","fr","sl","ca","el","hr"]

which in turn comes from http://unicode.org/repos/cldr/trunk/common/supplemental/supplementalData.xml

<territory type="IT" gdp="1805000000000" literacyPercent="99" population="61482300">
<!--Italy-->
<languagePopulation type="it" populationPercent="95" officialStatus="official"/>
<!--Italian-->
<languagePopulation type="en" populationPercent="24"/>
<!--English-->
<languagePopulation type="nap" writingPercent="5" populationPercent="12" references="R1148"/>
<!--Neapolitan-->
<languagePopulation type="scn" writingPercent="5" populationPercent="8.4" references="R1149"/>
<!--Sicilian-->
<languagePopulation type="fur" writingPercent="5" populationPercent="1.4" references="R1150"/>
<!--Friulian-->
<languagePopulation type="de" populationPercent="0.39" references="R1251"/>
<!--German-->
<languagePopulation type="fr" populationPercent="0.17" officialStatus="official_regional" references="R1252"/>
<!--French-->
<languagePopulation type="sl" populationPercent="0.17" references="R1253"/>
<!--Slovenian-->
<languagePopulation type="ca" populationPercent="0.035"/>
<!--Catalan-->
<languagePopulation type="el" populationPercent="0.035"/>
<!--Greek-->
<languagePopulation type="hr" populationPercent="0.0057"/>
<!--Croatian-->
</territory>
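
For what it's worth, the order in jquery.uls.data.js matches the CLDR entries sorted by descending populationPercent. The following is only a minimal Python sketch of that derivation, not the actual script used to generate jquery.uls.data.js (which I have not checked); the function name `languages_by_population` is made up for illustration, and the XML is the IT territory from above with the comments removed.

```python
import xml.etree.ElementTree as ET

# The IT territory as quoted above, comments stripped for brevity.
TERRITORY_XML = """
<territory type="IT" gdp="1805000000000" literacyPercent="99" population="61482300">
  <languagePopulation type="it" populationPercent="95" officialStatus="official"/>
  <languagePopulation type="en" populationPercent="24"/>
  <languagePopulation type="nap" writingPercent="5" populationPercent="12" references="R1148"/>
  <languagePopulation type="scn" writingPercent="5" populationPercent="8.4" references="R1149"/>
  <languagePopulation type="fur" writingPercent="5" populationPercent="1.4" references="R1150"/>
  <languagePopulation type="de" populationPercent="0.39" references="R1251"/>
  <languagePopulation type="fr" populationPercent="0.17" officialStatus="official_regional" references="R1252"/>
  <languagePopulation type="sl" populationPercent="0.17" references="R1253"/>
  <languagePopulation type="ca" populationPercent="0.035"/>
  <languagePopulation type="el" populationPercent="0.035"/>
  <languagePopulation type="hr" populationPercent="0.0057"/>
</territory>
"""


def languages_by_population(territory_xml: str) -> list[str]:
    """Return language codes ordered by descending populationPercent.

    Assumption: ties keep their document order (Python's sort is stable).
    """
    territory = ET.fromstring(territory_xml)
    entries = territory.findall("languagePopulation")
    ranked = sorted(
        entries,
        key=lambda entry: float(entry.get("populationPercent")),
        reverse=True,
    )
    return [entry.get("type") for entry in ranked]


if __name__ == "__main__":
    print(languages_by_population(TERRITORY_XML))
```

Under this reading, the only way to change what Italian users see is to change the percentages (or the set of entries) upstream in CLDR.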

Sigh.

There are some good ideas here.

I guess that Nemo figured this out already, but the displayed languages are based on languages spoken naturally in a country according to CLDR data. CLDR, in turn, bases its data on information received from the UN and possibly national statistics agencies. It definitely has mistakes - for example, Israel doesn't include Yiddish, and last time I checked, USA didn't have any variant of Chinese (I might be wrong about this last point).

And yes, it gets data for the whole country and not for a region. Neapolitan would, of course, be more relevant in Southern Italy, and Catalan would be more relevant in Sardinia. Similarly, it would be much smarter to show Tatar, Adyghe, Sakha, etc. only in the relevant regions in Russia. In a country as massively multilingual as India this problem is even more acute. Unfortunately, the geo info we have is only per country. I'd love to have it more fine-grained.
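
To make the per-country limitation concrete, here is a rough Python sketch of such a lookup. It is not the ULS code: `TERRITORY_LANGUAGES` and `suggested_languages` are hypothetical names, the Naples record is an invented second example, and dropping the language of the page being read is an assumption inferred from the lists pasted earlier (no "italiano" on it.wikipedia, no English on test.wikipedia).

```python
# Territory -> languages table, using the IT entry quoted from jquery.uls.data.js.
TERRITORY_LANGUAGES = {
    "IT": ["it", "en", "nap", "scn", "fur", "de", "fr", "sl", "ca", "el", "hr"],
}


def suggested_languages(geo: dict, page_language: str) -> list[str]:
    """Pick suggestions from the country code only; the city is never consulted."""
    languages = TERRITORY_LANGUAGES.get(geo.get("country"), [])
    # Assumption: the language of the current page is not suggested again.
    return [code for code in languages if code != page_language]


if __name__ == "__main__":
    milan = {"city": "Milan", "country": "IT"}
    naples = {"city": "Naples", "country": "IT"}  # hypothetical second user
    # Both users get the same list, Neapolitan included.
    print(suggested_languages(milan, "it"))
    print(suggested_languages(naples, "it"))
```

With only the country as a key, a reader in Milan and one in Naples cannot be told apart, which is exactly the problem described above.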

Slovenian and Croatian are not completely out of place, because there are people who speak these languages in Italy.

Adding languages that are widely studied in a country is a valid use case. Because the number of languages is limited to 16, it may sometimes be preferable to show a widely studied language rather than a local language spoken by a few people (as tragic as it is). We just need to figure out how to collect and store this data. Maybe getting relevant feedback from the communities is acceptable as long as we don't have a comprehensive data source like CLDR for this.
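
One possible shape for this, purely as a sketch: keep the CLDR territory list, but reserve the last slots for a separately maintained list of widely studied languages. Nothing like this exists in ULS as far as this thread shows; `merge_suggestions` and the `studied` list are hypothetical, and the studied languages are simply taken from point (a) of the description.

```python
def merge_suggestions(territory_langs, studied_langs, limit=16):
    """Combine two ordered lists of language codes into at most `limit` entries.

    Studied languages missing from the territory list take the last slots,
    pushing out the least spoken territory languages if there is no room.
    """
    territory = list(dict.fromkeys(territory_langs))  # de-duplicate, keep order
    extra = [code for code in dict.fromkeys(studied_langs) if code not in territory]
    extra = extra[:limit]
    kept = territory[: limit - len(extra)]
    return kept + extra


if __name__ == "__main__":
    cldr_it = ["it", "en", "nap", "scn", "fur", "de", "fr", "sl", "ca", "el", "hr"]
    studied = ["en", "fr", "es", "la", "ru"]  # point (a) of the description
    print(merge_suggestions(cldr_it, studied))            # everything fits under 16
    print(merge_suggestions(cldr_it, studied, limit=10))  # least spoken entries drop out
```

Whether studied languages should displace local ones at all, and how many slots they deserve, is the policy question raised above; the sketch only shows that the merge itself is cheap once the data exists.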

Submitting bug reports to CLDR may work well enough, except for the regional variance which is a complex matter (at some point we might "just" ask them for more locales, or something). But we'll see how my reports are received.

What's really not covered by CLDR is something quite common on our wikis (and online in general?), i.e. reading an article in your best language and then also skimming half a dozen other languages you don't know but which are sufficiently closely related to extract some basic info from, like "OMG that French article has five sections on person X which we don't even mention in Italian".

Update: I got a response yesterday from the CLDR committee on all four tickets. I had to submit some more data for three of them, and I have just done so for all of them, though immigrants are going to be tricky.
On one thing they've been clear: they don't handle Latin and Ancient Greek. Is there any way to edit those in our own version?

Please clarify what "they don't handle Latin and Ancient Greek" actually means in the context of this bug.

(In reply to Niklas Laxström from comment #9)

Please clarify what "they don't handle Latin and Ancient Greek" actually
means in the context of this bug.

This:

(from comment #0)

a) I surely want to see all the languages most studied in Italy, i.e.
[...] Latin

As far as I know, we do not currently have a mechanism to override or amend the list of suggested languages per country. That can be a separate feature request.

It's taking some effort but all the dependencies filed upstream are progressing, except https://github.com/wikimedia/jquery.uls/issues/134 ; then there's still the issue of how to deal with Latin.

All CLDR tickets other than the request for immigrants' languages have been fixed.
http://unicode.org/cldr/trac/changeset/10650
http://unicode.org/cldr/trac/changeset/10629
http://unicode.org/cldr/trac/changeset/10626
No replies from jquery.uls at https://github.com/wikimedia/jquery.uls/issues/134.

I don't know how to make the two open issues move forward, but let's please update jquery.uls and ULS as soon as CLDR 26 is released.

(In reply to Nemo from comment #13)

All CLDR tickets other than the request for immigrants' languages have been
fixed.
http://unicode.org/cldr/trac/changeset/10650
http://unicode.org/cldr/trac/changeset/10629
http://unicode.org/cldr/trac/changeset/10626
No replies from jquery.uls at
https://github.com/wikimedia/jquery.uls/issues/134.

let's please update jquery.uls and ULS as soon as CLDR 26 is released.

We are still waiting to adopt CLDR 25, I think (bug 62861)... :-/

CLDR extension can be updated separately from the plural rules in core, so that is not an issue.

Change 162626 had a related patch set uploaded by Santhosh:
Update jquery.uls to upstream at a8afed3

https://gerrit.wikimedia.org/r/162626

Change 162626 merged by jenkins-bot:
Update jquery.uls to upstream at a8afed3

https://gerrit.wikimedia.org/r/162626

Great!

(from comment #4)

"IT":["it","en","nap","scn","fur","de","fr","sl","ca","el","hr"]

became "IT":["it","en","fr","lmo","pms","sc","de","vec","nap","lij","scn","sl","sdc","fur","egl","ca","el","hr","rgn"]

"lmo" and "pms" before "sc" are an eyesore ("un pugno in un occhio"), but the list is now much better and most points are solved. I added two more CLDR tickets to avoid forgetting, but this can be closed.
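
For the record, a quick way to see what changed between the two lists quoted above (a throwaway Python check, nothing more; the variable names are mine):

```python
# IT entry before and after the jquery.uls update, as quoted in this task.
OLD_IT = ["it", "en", "nap", "scn", "fur", "de", "fr", "sl", "ca", "el", "hr"]
NEW_IT = ["it", "en", "fr", "lmo", "pms", "sc", "de", "vec", "nap", "lij",
          "scn", "sl", "sdc", "fur", "egl", "ca", "el", "hr", "rgn"]

# Codes present only in the new list, in their new order.
added = [code for code in NEW_IT if code not in OLD_IT]
# Codes that disappeared from the list.
removed = [code for code in OLD_IT if code not in NEW_IT]
# Position of every code in the new list, to spot reordering.
new_rank = {code: position for position, code in enumerate(NEW_IT)}

if __name__ == "__main__":
    print("added:", added)
    print("removed:", removed)
    print("fr moved from position", OLD_IT.index("fr"), "to", new_rank["fr"])
```

Nothing was removed; the change consists of eight added regional languages plus a reordering (French moves up to third place).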

Suggestions for immigrants etc. are still very bad, I've commented upstream:
https://unicode-org.atlassian.net/browse/CLDR-7102