Allow anonymous users to change interface language on Wikimedia wikis with ULS
Open, Low, Public

Description

Since the previous bugs that I thought would address this issue were closed (T35677#382858 and T44186#474966), I'm opening one specifically for the request to allow interface language selection by unregistered users (fix T5665) via UniversalLanguageSelector.

Context

On our biggest multilingual wikis, Commons and Wikidata, the (monolingual) cache gets bypassed by using a gadget which makes everyone use the uselang URL parameter. Hence, as discussed in T114662, there are no significant performance problems with making the cache more permissive.

Using Commons' AnonymousI18N.js globally is not feasible until we can have central gadgets to deploy it (T31272: Implement Gadgets 2.0).

Such a feature would help, for example, in this discussion: w:pt:Wikipédia:Esplanada/propostas/Uso do português de Portugal, pt-PT (4mar2012)

The problem

Interface language selection is only available to registered, logged-in users on Wikimedia projects. This is especially a problem for multilingual projects, including Wikisource, Wikidata and Commons.

Some wikis use workarounds to simulate this feature, but those do not scale and increase our technical debt.

Language selection (both manual and automatic) has been implemented in UniversalLanguageSelector and is already in use by both registered and unregistered users on many third-party wikis, since it is enabled by default.

This issue has been discussed before, but it has stalled because of unclear ownership (language? editing? reading?) and unclear status of the blockers. The main issue seems to be making some trade-offs to work within the current caching infrastructure.

There are two directions that allow gradually going towards the ideal end state.

By functionality:

  • Enable manual language selection
  • Enable automatic language selection

By scope:

  • Enable for multilingual wikis
  • Enable for all wikis

Draft proposal

  1. Add support for varying caches by value of language cookie.
  2. Enable manual language selection for multilingual wikis
  3. Evaluate feasibility of extending based on increase in resource usage and discuss again
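As a rough illustration of step 1, the sketch below shows what "varying the cache by the value of the language cookie" could mean for a cache lookup key. It is a toy model in Python, not Varnish configuration: the cookie name `language`, the `SUPPORTED_LANGUAGES` set and the `cache_key` helper are all hypothetical names chosen for this example. Folding unknown values back to a default is an extra assumption, added so that arbitrary cookie values cannot create unbounded cache variants.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical set of interface languages the wiki is willing to cache separately.
SUPPORTED_LANGUAGES: FrozenSet[str] = frozenset({"en", "fi", "de", "fr", "ja"})


def cache_key(
    url: str,
    cookies: Dict[str, str],
    supported: FrozenSet[str] = SUPPORTED_LANGUAGES,
    default: str = "en",
) -> Tuple[str, str]:
    """Build a cache key that varies on the (hypothetical) language cookie.

    Unknown or missing values fall back to the default language, so the
    number of variants per URL is bounded by len(supported).
    """
    language = cookies.get("language", default)
    if language not in supported:
        language = default
    return (url, language)
```

With this key, a visitor with `language=fi` and a visitor without the cookie are looked up under different entries for the same URL, which is the behaviour the proposal asks for.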

The way forward from here could be

  1. Stop expanding
  2. Expand to non-multilingual wikis
  3. Try out automatic language selection on a small multilingual wiki [1]

[1] Possibly using a different approach than reading the Accept-Language request header on the PHP side. For example, JavaScript could suggest a language change when we are confident about the user's preference; accepting the suggestion would then set the language cookie and work like manual language selection.
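The matching logic behind such a suggestion might look like the sketch below. In practice this would run in the browser as JavaScript; Python is used here only to illustrate the logic. The function name `suggest_language`, its parameters, and the rule "take the first preferred language that the wiki supports" are assumptions for this example, not ULS behaviour.

```python
from typing import Iterable, Optional, Set


def suggest_language(
    browser_languages: Iterable[str],
    supported: Set[str],
    current: str,
) -> Optional[str]:
    """Return a language to suggest, or None if no suggestion should be made.

    browser_languages is the user's ordered preference list (most preferred
    first). Each tag is tried as-is and then reduced to its primary subtag,
    e.g. "fr-CA" falls back to "fr". Codes in `supported` are assumed lowercase.
    """
    for tag in browser_languages:
        normalized = tag.lower()
        for candidate in (normalized, normalized.split("-")[0]):
            if candidate in supported:
                # The top supported preference is already active: stay quiet.
                return None if candidate == current else candidate
    return None
```

For example, a visitor whose browser prefers `fr-CA` and then `en`, on a wiki currently shown in `en` that supports `fr`, would be offered `fr`. A visitor whose top supported preference is already the current language gets no suggestion.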


See Also:
Similar feature requests:

Discussion of the technical background, Varnish caching: T233609: [SPIKE 4hrs] What is technically feasible in terms of logged-in/logged-out users?


Event Timeline

This task has been referenced in a thread at the wikisource-l mailing list: http://thread.gmane.org/gmane.org.wikimedia.wikisource/2635

The detailed feasibility analysis, mainly from the Ops perspective, is at T43451: ULS causes pages to be cached with random user language. That was in 2012. If the infrastructure has since changed in a way that is favorable, we should reassess.

Since 2012, we have migrated from Squid to Varnish, so the situation might need to be reevaluated.

By design, testwiki responses are never cached in varnish, so it's not a great wiki to target if you're hoping to learn anything about how enabling $wgULSAnonCanChangeLanguage would interact with the caching layer. You could change your patch to target test2wiki instead, but even then I don't expect we would learn much that we don't already know.

@Nikerabbit succinctly explained the crux of the problem in T58464#994780. I'll try to expand on that comment a little.

The WMF's Varnish layer handles the Cookie header in the following way: unless the cookie header sent by the client includes a session token, the cookie header is not treated as significant for the purposes of a cache lookup.

Thus the following request is a cache miss:

$ curl --silent --head --cookie "session=$(date +%s)" "https://de.wikipedia.org/wiki/Wikipedia:Hauptseite" | grep X-Cache
X-Cache: cp1065 miss (0), cp2016 miss (0), cp2016 frontend miss (0)

Whereas this request is not:

$ curl --silent --head --cookie "xyz=$(date +%s)" "https://de.wikipedia.org/wiki/Wikipedia:Hauptseite" | grep X-Cache
X-Cache: cp1065 hit (2), cp2016 hit (2), cp2016 frontend hit (54)

Even though in both cases, the cookie value is novel.
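The rule described above can be modelled in a few lines. This is a simplified sketch in Python, not the actual VCL: the helper names are hypothetical, and matching on the substring "session" in the cookie name is an assumption based only on the two curl examples.

```python
from typing import Dict

# Assumption: a cookie whose name contains this marker counts as a session token.
SESSION_COOKIE_MARKER = "session"


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Parse a Cookie request header into a name -> value mapping."""
    cookies: Dict[str, str] = {}
    for part in header.split(";"):
        name, separator, value = part.strip().partition("=")
        if separator and name:
            cookies[name] = value
    return cookies


def may_use_shared_cache(cookie_header: str) -> bool:
    """Return True if the request can be answered from the shared cache.

    Cookies are ignored for the lookup unless one of them looks like a
    session token, in which case the request bypasses the cache.
    """
    cookies = parse_cookie_header(cookie_header)
    return not any(SESSION_COOKIE_MARKER in name.lower() for name in cookies)
```

Under this model `may_use_shared_cache("xyz=1449000000")` is true and `may_use_shared_cache("session=1449000000")` is false, matching the hit and miss seen in the two requests.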

Unless Varnish is configured to treat the language cookie set by ULS as cache-relevant, on every request, one of the following two scenarios will happen: either the page will be served from the cache, in which case its content will be oblivious to the visitor's language preference, or (if the page is not in the cache) it will be cached with HTML specific to the visitor's language preference and then served to subsequent visitors, regardless of whether or not they have the same language preference set. This is what would happen if you simply set $wgULSAnonCanChangeLanguage to true (for any wiki other than testwiki) and took no further action. (Edit: it is what happened; see T43451.)
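To make the second scenario concrete, here is a toy simulation in Python of a cache that is keyed on the URL only while the backend renders according to the visitor's language. The class and function names are made up for this illustration.

```python
from typing import Callable, Dict


def render_page(url: str, language: str) -> str:
    """Stand-in for the application server: output depends on the language."""
    return f"{url} [interface: {language}]"


class UrlOnlyCache:
    """A cache that ignores the language preference when looking up pages."""

    def __init__(self, backend: Callable[[str, str], str]) -> None:
        self.backend = backend
        self.store: Dict[str, str] = {}

    def get(self, url: str, language: str) -> str:
        if url not in self.store:
            # Cache miss: the response is stored in the first visitor's language.
            self.store[url] = self.backend(url, language)
        # Cache hit: the stored copy is served regardless of `language`.
        return self.store[url]


cache = UrlOnlyCache(render_page)
first_response = cache.get("/wiki/Main_Page", "fi")   # Finnish visitor, cache miss
second_response = cache.get("/wiki/Main_Page", "de")  # German visitor, cache hit
```

Here `second_response` is identical to `first_response`: the German-speaking visitor receives the Finnish interface, which is the failure mode seen in T43451.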

So, for this feature to work, we would need to configure Varnish to vary the cache based on the language preference. Increasing the number of variants that are cached for each page would increase the demand for memory on the Varnish hosts, causing Varnish to be more aggressive in evicting older cache entries, leading to a decrease in the cache hit-rate and a consequent increase in load on the application servers and in latency. It would represent a significant change to the WMF's site architecture, and it would require rigorous capacity planning, informed by metrics data from production and from experiments which simulate substantial load. Testing this by enabling it on test2wiki would be a bit like testing the capacity factor of the local power station by keeping your night-lamp on all night.
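The direction of this effect (not its magnitude) can be shown with a small LRU simulation in Python. Every number here is invented for illustration: uniform random traffic, a fixed-size cache counted in entries rather than bytes, and a function name `simulate_hit_rate` that is purely hypothetical. Real traffic is heavily skewed, so this says nothing about production hit rates.

```python
import random
from collections import OrderedDict


def simulate_hit_rate(
    pages: int, variants: int, capacity: int, requests: int, seed: int = 0
) -> float:
    """Simulate an LRU cache and return the fraction of requests that hit.

    Each request picks a page and a language variant uniformly at random;
    the cache holds at most `capacity` (page, variant) entries.
    """
    rng = random.Random(seed)
    cache: "OrderedDict[tuple, bool]" = OrderedDict()
    hits = 0
    for _ in range(requests):
        key = (rng.randrange(pages), rng.randrange(variants))
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / requests


single_variant = simulate_hit_rate(pages=100, variants=1, capacity=100, requests=20000)
five_variants = simulate_hit_rate(pages=100, variants=5, capacity=100, requests=20000)
```

With one variant per page, all 100 objects fit in the cache and nearly every request hits after warm-up. With five variants the same cache has to cover 500 objects, so entries are evicted constantly and the hit rate drops sharply.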

The description doesn't define whether the ultimate goal is to have this feature enabled in all Wikimedia wikis or only in the multilingual ones.

If the priority is the multilingual ones, we could start there. Somewhere we have data about how many anonymous users visit mediawiki.org, Meta, Commons... The impact on server use of changing this configuration on, e.g., mediawiki.org is probably a lot smaller than on the big wikis. In fact, why not start testing this change on mediawiki.org, ready to revert if it proves problematic?

I started writing my response before Quim added a link to the thread on wikisource-l, so I did not know the context. It may be feasible to turn this on if it were scoped to wikisource.

Change 255953 abandoned by TTO:
Enable $wgULSAnonCanChangeLanguage on testwiki

Reason:

testwiki responses are not cached in varnish

Is that documented anywhere?

In any case, I don't think this is going to be very useful.

https://gerrit.wikimedia.org/r/255953

So suddenly this is again an operations request? ;)

Increasing the number of variants that are cached for each page would increase the demand for memory on the Varnish hosts

If it were just a matter of RAM, who can decide to buy more? :)

Storing page content and user chrome separately and combining them either on the client (as recently proposed in T106099: RFC: Page composition using service workers and server-side JS fall-back) or on edge caching servers would be ideal, but it is at least two years away from full deployment on all platforms (in my estimation; others may disagree).

It may be feasible to turn this on if it were scoped to wikisource.

What is the difference in Wikisource? What does it take to start enabling language choice in Wikisource?

FWIW, excepting Commons (for obvious reasons), most wikis where this feature would be desirable or necessary seem to me to be the smaller wikis, and it's not immediately clear that the performance impact would be all that catastrophic if the caches did vary on a language selection cookie for them.

My use case is the Wikimania2017 wiki, where it is very important that visitors be able to switch to (at least) French for the interface and contents without having to register an account (and, in an ideal world, according to browser language preferences, if only to set the cookie initially).

As a user spending some time translating help content on Wikidata, I would much prefer that new users could actually find the translated content.

It’s also kinda hard to promote Wikidata as a multilingual site, when it’s not so until you register.

Nikerabbit renamed this task from Anonymous users can't pick language on WMF wikis with ULS ($wgULSAnonCanChangeLanguage is set to false) to Allow anonymous users to change interface language on Wikimedia wikis with ULS. Jul 12 2016, 9:17 AM
Nikerabbit edited projects, added Commons; removed Patch-For-Review.
Nikerabbit updated the task description. (Show Details)

This task is not blocked by T58292: Make ULS more lightweight. ULS is already deployed to all users; some features are just not enabled for anonymous users.

Commons has its own custom language selector enabled for anonymous users. It also comes with a banner at the top advising the user to change their language, depending on their browser language.

After you choose a language other than English, ?uselang=foo is automatically added to the URL of any links you click.

(source code: https://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js)

And all page views with ?uselang= are uncached. So, at least for Commons, I think just enabling this (and making pages with the ULS cookie set not cached) would be just fine. Those views already bypass the cache today, so I don't think we can do any worse. :)
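For readers who have not looked at the gadget, the link rewriting it performs amounts to something like the sketch below. This is Python rather than the gadget's JavaScript, and `with_uselang` is a hypothetical helper written for this illustration, not code taken from AnonymousI18N.js.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_uselang(url: str, language: str) -> str:
    """Return `url` with its uselang query parameter set to `language`.

    Any existing uselang value is replaced; other query parameters and the
    fragment are preserved.
    """
    parts = urlsplit(url)
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "uselang"
    ]
    query.append(("uselang", language))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_uselang("https://commons.wikimedia.org/wiki/Main_Page", "ja")` returns `https://commons.wikimedia.org/wiki/Main_Page?uselang=ja`. Because every rewritten URL is distinct from the plain one, these views never share the cached copy served to other anonymous readers.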

So, at least for Commons, I think just enabling this (and making pages with the ULS cookie set not cached) would be just fine.

Sounds reasonable, then we can remove the ja hack from commons.

The situation at Commons has been mostly discussed on the parent task T5665.