Auto-detect interface language for anonymous users on Wikimedia projects
Open, Medium, Public

Assigned To
None
Authored By
daniel
Oct 9 2005, 5:38 PM
Referenced Files
F2385: detectlang.diff
Nov 21 2014, 8:52 PM

Description

Browsers send a list of acceptable languages (English, French, etc.) in decreasing order of preference. By default, MediaWiki on Commons and other multilingual
sites should use the first language in that list that it supports.

Francophone users have complained that the Commons user interface is in English even when they follow links from fr.

The UniversalLanguageSelector extension can do this, but it is not enabled for anonymous users due to caching issues. See T58464 (Allow anonymous users to change interface language on Wikimedia wikis with ULS) as an interim step towards this.


We don't actually want to blindly apply Accept-Language and nothing else; we need something smarter. See the original description:

I propose to include a feature that auto-detects the interface language for
anonymous users. This would be especially helpful for multilingual projects like
Commons. The language can be set in the user interface, but one needs to
understand the default language in order to even create an account or find the
right setting.

The detection has three modes, controlled by $wgDetectLanguage:

  • LANG_USE_CONTENT: use the content language for anonymous users, i.e. don't use auto-detection. This is the default, and behaves the same as without this patch.
  • LANG_PREFER_CONTENT: use the content language if it is present in the Accept-Language list. Otherwise, behave like LANG_PREFER_CLIENT.
  • LANG_PREFER_CLIENT: use the first language in the Accept-Language list that is supported by the wiki.
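
The three modes can be sketched roughly as follows. This is not the attached patch (which is PHP); it is a minimal Python illustration. The numeric constant values, the function names, and the exact-match comparison of language codes are assumptions, and it assumes that LANG_PREFER_CONTENT falls back to the LANG_PREFER_CLIENT behaviour when the content language is not in the list.

```python
from typing import List, Set

# Hypothetical constant values; only the names come from the description.
LANG_USE_CONTENT = 0
LANG_PREFER_CONTENT = 1
LANG_PREFER_CLIENT = 2


def parse_accept_language(header: str) -> List[str]:
    """Return language codes in listed order, dropping any ;q= weights."""
    codes = []
    for part in header.split(","):
        code = part.split(";", 1)[0].strip().lower()
        if code and code != "*":
            codes.append(code)
    return codes


def detect_language(mode: int, header: str, content_lang: str,
                    supported: Set[str]) -> str:
    """Pick the interface language for an anonymous request."""
    if mode == LANG_USE_CONTENT:
        return content_lang
    accepted = parse_accept_language(header)
    if mode == LANG_PREFER_CONTENT and content_lang in accepted:
        return content_lang
    # LANG_PREFER_CLIENT, or LANG_PREFER_CONTENT without a content-language match:
    # take the first listed language the wiki supports (exact match only).
    for code in accepted:
        if code in supported:
            return code
    return content_lang
```

For example, with a content language of "en" and supported languages {"en", "fr", "de"}, the header "fr, en;q=0.5" yields "en" under LANG_PREFER_CONTENT and "fr" under LANG_PREFER_CLIENT.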

Caveats:

  • The Accept-Language field is often not configured correctly in the browser.
  • The Accept-Language field would affect caching. The appropriate changes to the Vary: header are made automatically, but this reduces cache efficiency.
  • To decide which languages are supported, this relies on $wgContLang->getLanguageNames(). It does not actually check that the files exist, as this would be pretty slow and the detection is performed for every page request.
  • The languages in Accept-Language are handled as being given in order of preference. Any weight modifiers are ignored.
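
To illustrate the last caveat: ignoring the weights only matters when a client lists languages out of weight order. The sketch below (Python for illustration, function names hypothetical, not part of the patch) compares listed order with an ordering that honours the q-values.

```python
from typing import List


def by_listed_order(header: str) -> List[str]:
    """Language codes in the order they appear, weights ignored."""
    codes = []
    for part in header.split(","):
        code = part.split(";", 1)[0].strip().lower()
        if code:
            codes.append(code)
    return codes


def by_weight(header: str) -> List[str]:
    """Language codes sorted by descending q-value; ties keep listed order."""
    entries = []
    for index, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        code = pieces[0].lower()
        if not code:
            continue
        q = 1.0  # a missing weight counts as 1
        for param in pieces[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not acceptable"
        if q > 0:
            entries.append((-q, index, code))
    return [code for _, _, code in sorted(entries)]
```

For the header "en;q=0.5, de, fr;q=0.8", by_listed_order gives en, de, fr, while by_weight gives de, fr, en.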

Patch to follow in a minute.

See Also:
T3135: Interface language on multilingual Wikimedia projects should default to browser language preferences
T58464: Allow anonymous users to change interface language on Wikimedia wikis with ULS

Details

Reference
bz3665

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:52 PM
bzimport set Reference to bz3665.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 963
patch against HEAD for Defines.php, DefaultSettings.php, OutputPage.php, and User.php

I worry this is too hostile to caching; without aggressive caching of anon views our
entire infrastructure will collapse into a little pile of rubble. Adding a Vary on
accept-language will splinter things a lot -- even where the same language gets selected
there can be big variance in what's in the header.
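
The splintering concern can be made concrete with a small sketch. The header samples and the supported-language set below are hypothetical, and the function name is an assumption; the point is only that a cache keyed on the raw header (which is what Vary: Accept-Language amounts to) stores far more variants than there are distinct selected languages.

```python
from typing import Set


def negotiated_language(header: str, supported: Set[str], default: str = "en") -> str:
    """First listed language (or its primary subtag) that the wiki supports."""
    for part in header.split(","):
        code = part.split(";", 1)[0].strip().lower()
        if code in supported:
            return code
        primary = code.split("-", 1)[0]
        if primary in supported:
            return primary
    return default


# Hypothetical Accept-Language values from different browsers and locales.
SAMPLE_HEADERS = [
    "fr",
    "fr-FR,fr;q=0.9,en;q=0.8",
    "fr-CA, fr;q=0.8, en-US;q=0.6",
    "fr,en-US;q=0.7,en;q=0.3",
    "de-DE,de;q=0.9,en;q=0.5",
]
SUPPORTED = {"en", "fr", "de"}

# One cached copy per distinct raw header value...
raw_variants = set(SAMPLE_HEADERS)
# ...versus one per language that actually gets selected.
language_variants = {negotiated_language(h, SUPPORTED) for h in SAMPLE_HEADERS}
```

Here the five headers produce five raw variants but only two selected languages (fr and de).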

I agree that this might be a problem. It might be quite useful as an option
for smaller, non-Wikimedia projects, though.

The only Wikimedia project I would suggest enabling this for is Commons.
There, the main traffic is image data anyway, which can be cached independently
of user language, even for logged-in users. It may be worth a try.

Also, it would be interesting to find out how many different Accept-Language
headers we are actually seeing. Most people never change it by hand, and the
default setup of the popular browsers doesn't vary too much, I guess.

All this being said, one important point would be to provide a localized
interface for creating an account to people who do not speak English. Maybe it
would be enough to have a language selection on the login page that would be
used while creating an account. The default choice could be based on
Accept-Language. The language selected during account creation should then also
become the preset language for the new account. But that would be a separate
feature request, I guess.

arnomane wrote:

I also think that most of the data cached for Wikimedia Commons is images, not page text.

First, Wikimedia Commons has never been prominently cited by the media (AFAIK), despite its central role (and its
potential to be a serious competitor to traditional stock image archives). Luckily, all the media looked at Wikinews
from day one. :p

Second, Commons does not have an intuitive URL like Wikipedia's: wikipedia.org vs. commons.wikimedia.org.

Most "outsiders" (people not involved in any Wikimedia wiki) who come to Wikimedia Commons get there via Wikipedia
image descriptions or Google Images. Neither route is used by the masses.

So I am quite confident that this patch, enabled for Wikimedia Commons only, would help us a lot in several areas
without harming our caching architecture:

  • People coming from local wikis and creating an account would be able to get the main page and the account creation pages in their native language if Special:Userlogin is linked from the local project (as long as single-site login is not possible). This would help us reduce prejudice against Wikimedia Commons a lot (sadly, we currently have to live with these "English only" prejudices against Commons, although we are working hard to support many languages properly).
  • Outside people who want to reuse Wikimedia Commons content would get information in their local language, which would help us a lot, as people often ask about Wikimedia Commons conditions (and are apparently a little confused by the English-only interface, apart from the problem that the Commons help pages weren't the best until recently).
  • We could reduce the English bias on Commons. Currently people are softly forced into English and thus do not realise that their local language is supported too (changing preferences is not something the masses do, even after login). As a result, the non-English languages get somewhat neglected: a self-reinforcing effect that mainly supports English only. Many problems on Commons are caused by the lack of local language support. I am personally doing some work on supporting help pages in several languages properly, but the more people you get from the beginning, the better the result (and I only speak German, English and French).

So I think these patches would be a great thing for Wikimedia Commons.

robchur wrote:

The whole point is that if this great thing kills the site, it's not such a
great thing, is it?

arnomane wrote:

Well, Rob, could you give us some serious figures? I outlined in some detail, with a rational
analysis from my perspective, why a negative impact on the servers is not going to happen if this
patch is applied to the Wikimedia Commons wiki *only*. Of course I could be wrong, so in order to have a
rational discussion of the issue we need the following figures:

  • How much image traffic is caused by anonymous page visits to Wikimedia Commons per month?
  • How much (page) text traffic is caused by anonymous page visits to Wikimedia Commons per month?
  • How much page visit traffic is caused per month by logged-in users on Wikimedia Commons?

I'd appreciate it if you could spare the time to extract these numbers, as I personally invest quite
some time as well in providing you with decent bug reports in order to reduce the amount of work you need to
resolve them.

I think the best thing for now would be to add a language selector to the
account creation page, as described above. The selection could be preset based
on browser preference, but that's not even necessary. The important thing is
that people joining Commons should not have to know English to create an
account. Whether Commons is internationalized enough yet to be useful to
them without a basic level of English is another question... I at least hope it
will soon be usable without knowing English.

So, shall I open a separate feature request for that?

robchur wrote:

(In reply to comment #7)

> So, shall I open a separate feature request for that?

An optional language selector would be cool.

Side note: this could be combined with bug 5638 to make multilingual projects
like Commons more useful to people who do not speak English, even if not logged in.

Perhaps using the browser's language setting is not a good idea. Maybe it would
be better to offer a drop-down menu and a "set language" button that would set a
cookie.

*** Bug 7761 has been marked as a duplicate of this bug. ***
*** Bug 26506 has been marked as a duplicate of this bug. ***

dohnp5a1 wrote:

It's very disappointing that nothing has been improved regarding this bug in six years.

Many people around me really hate English and would never contribute to a project that appears entirely in this language without an easy and fast way to switch it.

Maybe the browser language settings are sometimes incorrect. Nevertheless, the
present default setting is almost always incorrect, displaying everything in
English to everyone. Most users also don't know how to change the language
settings on Meta or Commons: it is much easier to get to know one's own browser once
than the particular settings of every website visited.

I suppose the browser setting is either left at a default of English (often
incorrect) or set to the system or browser localisation language (probably nobody
uses those in a language they cannot understand, so there's no problem). All of these possibilities are better than always defaulting to English.

(In reply to comment #12)

> It's very disappointing that nothing has been improved regarding this bug in six
> years.

Although I agree with your argument that English always is not necessarily nice, there's a technical reason we haven't done anything in six years, presented in comment #2: Squid caching would suffer severely. The "pile of rubble" part may not be as accurate today as it was in 2005 (we gained some capacity since then), but please understand this is not an easy change at all. Back in 2005 our servers really did rely on every anonymous user seeing the same thing at the same URL for the servers not to melt down; and I'm not so sure Accept-Language detection for anonymous users would be feasible in 2010/2011 either.

> Many people around me really hate English and would never contribute to
> a project that appears entirely in this language without an easy and fast way
> to switch it.

"an easy and fast way to switch it" might just be what we *can* do. We could use JavaScript to obtain the user's Accept-Language preferences from the API or something (which wouldn't go through Squid cache, but that's OK: it's just a language list, not an entire wiki page) and use that information to display a link with the native language name (i.e. 'Deutsch' for German, 'Français' for French, etc.) that would then lead to the account creation form in that language or maybe trigger persistent uselang (language selection for anonymous users, basically) if and when we have that.

In fact, I once wrote some proof-of-concept code that obtained the user's Accept-Language settings from the API, stored it in a cookie (to avoid repetitive API requests) and used it to reorder the "In other languages" links in the sidebar. We never ended up using it but it's still lying around somewhere.
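
The reordering step described above could look roughly like this. The original proof of concept was JavaScript and is not shown in this thread; this is a Python sketch for illustration only, with links represented as plain language codes and the function name invented.

```python
from typing import List, Sequence


def reorder_language_links(links: Sequence[str], preferred: Sequence[str]) -> List[str]:
    """Move links for the user's preferred languages to the front.

    Preferred languages come first, in preference order; all other links
    keep their original relative order (sorted() is stable).
    """
    rank = {code: position for position, code in enumerate(preferred)}
    return sorted(links, key=lambda code: rank.get(code, len(rank)))
```

For example, reorder_language_links(["de", "en", "fr", "cs", "ja"], ["cs", "fr"]) puts cs and fr first and leaves de, en, ja in their existing order.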

tl;dr: Automatically showing wiki pages in the browser language for anonymous users is probably not gonna happen, but a feature offering to switch languages based on the browser language isn't hard to do.

If WMF cannot do it, that doesn't mean MediaWiki cannot do it. In fact, the LanguageSelector extension does it already. In my opinion it would be nice to move its automatic language detection code into core (with an option to disable it for WMF and other cached sites, of course).

Let's not mix two issues in this bug.

dohnp5a1 wrote:

Well, should I create a new issue, requesting a language switcher for Commons (ideally accessible on every page, not only on the main one), that would trigger persistent uselang, so that the interface language could stay the same even after clicking links?

(In reply to comment #15)

> Well, should I create a new issue, requesting a language switcher for Commons
> (ideally accessible on every page, not only on the main one), that would
> trigger persistent uselang, so that the interface language could stay the same
> even after clicking links?

There already is one, setting “persistent” uselang, used when coming from another Wikimedia project. Try going from e.g. http://cs.wikipedia.org/wiki/File:Example.jpg (not logged in) to the image page on Commons, you should get uselang=cs automatically. See http://commons.wikimedia.org/wiki/MediaWiki:PersistentUselang.js
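
As a rough illustration of that behaviour (not the actual PersistentUselang.js code, which is JavaScript and is not reproduced here), the language can be derived from the referring host and carried along as a uselang parameter. The project list, function names, and host-matching rule below are assumptions made for this Python sketch.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of projects whose hostnames start with a language code.
LANGUAGE_PROJECTS = {
    "wikipedia", "wiktionary", "wikibooks", "wikinews",
    "wikiquote", "wikisource", "wikiversity",
}


def lang_from_referrer(referrer: str) -> Optional[str]:
    """Return 'cs' for http://cs.wikipedia.org/..., or None if no language label."""
    host = urlsplit(referrer).hostname or ""
    labels = host.split(".")
    if len(labels) == 3 and labels[1] in LANGUAGE_PROJECTS and labels[0] != "www":
        return labels[0]
    return None


def with_uselang(url: str, lang: str) -> str:
    """Return url with its uselang query parameter set to lang."""
    parts = urlsplit(url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "uselang"]
    query.append(("uselang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With the example above, a referrer of http://cs.wikipedia.org/wiki/File:Example.jpg gives "cs", and the Commons link becomes http://commons.wikimedia.org/wiki/File:Example.jpg?uselang=cs.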

dohnp5a1 wrote:

It's nice, but we need a language switcher for logged-out users that sets such a persistent uselang, on every page (or at least on the main one). Where should this be sorted out? Here, or directly somewhere on Commons?

dohnp5a1 wrote:

The switcher now exists but is not fully persistent: it always disappears after searching for a string in the search field.

(In reply to comment #18)

> The switcher now exists but is not fully persistent: it always disappears after
> searching for a string in the search field.

Please report any bugs at:
http://commons.wikimedia.org/wiki/MediaWiki_talk:AnonymousI18N.js

The script can be seen at:
http://commons.wikimedia.org/ (logged out)
The source is at:
http://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js

This has been done both in JavaScript on the front end (see previous comment) and in PHP on the server side, in the following extension:
http://www.mediawiki.org/wiki/Extension:LanguageSelector

Knowing that extension is in use on TranslateWiki and is doing pretty well I'd recommend closing this bug and directing further questions to either that extension or to a new bug (eg. "Fix bug X in Extension:LanguageSelector" or "Merge Extension:LanguageSelector in core (disableable)").

dohnp5a1 wrote:

The switcher exists and its uselang is permanent at Commons, it is nice.

Nonetheless, I do not understand why Commons couldn't detect the browser's default language and set the interface accordingly for non-registered users as well, if the switcher hasn't been used. What's the problem?

  • Many users haven't set a language preference in their browser. As far as I know, the default value there is English, so they would receive the Commons interface in English, the same as now.
  • For users who have set it, the interface would be in their preferred language.

It would be worse for nobody, just better for some of the users. Why not?

(In reply to comment #22)

> The switcher exists and its uselang is permanent at Commons, it is nice.
>
> Nonetheless, I do not understand why Commons couldn't detect the browser's default
> language and set the interface accordingly for non-registered users as
> well, if the switcher hasn't been used. What's the problem?
>
>   • Many users haven't set a language preference in their browser. As far as I know, the default value there is English, so they would receive the Commons interface in English, the same as now.
>   • For users who have set it, the interface would be in their preferred language.
>
> It would be worse for nobody, just better for some of the users. Why not?

This could be done for logged-in users, I guess, but it definitely can't be done for anonymous users due to Squid caching. The browser language headers can't be detected client-side, only server-side.

Also, many people have their browser languages misconfigured. Since those settings are (generally) hard to find, it's often very unclear to the user why they are getting language x rather than language y. Any use of browser headers should come with clear ways in the interface to change the auto-detected defaults.

As for detecting the language client-side, you can always make an AJAX request such as http://en.wikipedia.org/w/api.php?action=query&meta=userinfo&uiprop=acceptlang
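
A sketch of building that request and reading the answer, in Python rather than browser JavaScript. The added format=json parameter and the shape of the sample response are assumptions for illustration (the helper accepts either a "code" or a "*" key per entry); check a live response before relying on either. The function names are hypothetical and no network call is made.

```python
import json
from typing import List
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def acceptlang_url(endpoint: str = API_ENDPOINT) -> str:
    """Build the userinfo query URL from the comment above, asking for JSON."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "acceptlang",
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def languages_from_response(body: str) -> List[str]:
    """Extract language codes from a response body, in the order returned."""
    data = json.loads(body)
    entries = data.get("query", {}).get("userinfo", {}).get("acceptlang", [])
    codes = []
    for entry in entries:
        code = entry.get("code") or entry.get("*")
        if code:
            codes.append(code)
    return codes


# Assumed response shape, for illustration only.
SAMPLE_RESPONSE = (
    '{"query": {"userinfo": {"id": 0, "name": "192.0.2.1", "anon": "", '
    '"acceptlang": [{"q": 1, "*": "cs"}, {"q": 0.8, "*": "en"}]}}}'
)
```

Because the request goes to the API rather than to a cached page view, the per-user answer does not affect how article pages are cached.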

dohnp5a1 wrote:

Yes, they may have them misconfigured, but in fact that means "not configured", i.e. the default; in other words, they have English in first place there. So nothing would change for them, as they get the interface in English now as well. For users who have configured their browser it would be better: why configure the browser language setting if websites ignore it?

There is another thing I really do not understand now. Not logged in, with the cache refreshed, browsing anonymously with Mozilla, with Slovak in first place in the language settings, and located in Portugal, I get a Czech notification on Commons: "Wikimedia Commons is available in Czech". Where does the site take the language information from? I thought it was the language setting, but obviously not, since I now prefer Slovak there and nothing has changed on Commons; it still offers Czech (and unfortunately only offers it, it doesn't display the interface in that language).

It is unclear whether this bug is about having this feature in MediaWiki (exists in an extension) or in the Wikimedia projects (not done). Assuming the first since this bug is categorized as MediaWiki bug.

dohnp5a1 wrote:

In my understanding, the bug is about having this feature in Wikimedia Commons.

(In reply to comment #25)

> There is another thing I really do not understand now. Not logged in, with the
> cache refreshed, browsing anonymously with Mozilla, with Slovak in first place
> in the language settings, and located in Portugal, I get a Czech notification
> on Commons: "Wikimedia Commons is available in Czech". Where does the site take
> the language information from? I thought it was the language setting, but
> obviously not, since I now prefer Slovak there and nothing has changed on
> Commons; it still offers Czech (and unfortunately only offers it, it doesn't
> display the interface in that language).

Once again, see http://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js and its talk and discuss that script _there_. If you read that page, you would learn the user language should be selected using the following priorities:

  1. Cookie (previous user preference)
  2. According to the previous (referring) page (e.g. when you click on a Commons link on the Czech Wiktionary, you’ll get Commons in Czech)
  3. Browser language
  4. Fallback to the default language
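
The priority order above amounts to a simple first-match chain. The following Python sketch is only an illustration of that order, not the AnonymousI18N.js implementation; the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Set


def choose_language(cookie_lang: Optional[str],
                    referrer_lang: Optional[str],
                    browser_langs: Sequence[str],
                    supported: Set[str],
                    default: str = "en") -> str:
    """Return the first supported candidate: cookie, referrer, browser, default."""
    candidates = [cookie_lang, referrer_lang, *browser_langs]
    for code in candidates:
        if code and code in supported:
            return code
    return default
```

This would also explain the situation described earlier in the thread: if a cookie or the referring page already says Czech, a browser preference for Slovak is never consulted.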

sumanah wrote:

The Indic language community is interested in this feature. I do not have time to summarize it and am not sure I would summarize adequately. The thread, for anyone who wants to read through it:

http://lists.wikimedia.org/pipermail/wikimediaindia-l/2011-December/thread.html#5890

I've asked them to come here and detail what they want.

(In reply to comment #29)

> The Indic language community is interested in this feature. I do not have time
> to summarize it and am not sure I would summarize adequately. The thread, for
> anyone who wants to read through it:
>
> http://lists.wikimedia.org/pipermail/wikimediaindia-l/2011-December/thread.html#5890
>
> I've asked them to come here and detail what they want.

My impression of the thread is that they want a big site banner, "View Wikipedia in language X", with X auto-detected via either geolocation or Accept-Language headers (i.e. your web browser's language preferences). That isn't really this bug. On the other hand, it is more likely to be implemented than this bug, since it can be done in pure JS with few caching issues, and most of the work is already done: we can already get the Accept-Language headers from JS ( http://www.mediawiki.org/w/api.php?action=query&meta=userinfo&uiprop=acceptlang ), and geolocation is also already set up for JS as a side effect of geo-targeted central notices.

He7d3r set Security to None.
He7d3r added a subscriber: Lechatjaune.
Nikerabbit claimed this task.

ULS can do this. Please refer to the tasks mentioned in See also for implementing this for anonymous users on Wikimedia wikis.

Please improve the title and description to clarify the scope.

Nemo_bis renamed this task from Auto-detect interface language for anonymous users to Auto-detect interface language for anonymous users on Wikimedia projects. Jul 12 2016, 11:29 AM

> Please improve the title and description to clarify the scope.

Better now?

Nikerabbit updated the task description.
Nikerabbit removed a subscriber: wikibugs-l-list.
Ivanhercaz rescinded a token.
Ivanhercaz awarded a token.
Ivanhercaz subscribed.

Currently this appears to have been resolved by accident (T246071). Could you keep Accept-Language detection enabled for some multilingual wikis like Commons and Wikidata? It seems some work has been done in the meantime towards alleviating caching issues (see T203179).