
Specify default language on a per-page basis
Closed, ResolvedPublic

Description

Author: psychonaut

Description:
Some wikis, such as meta.wikimedia.org, are multilingual (that is, they don't
use separate language-specific wikis in separate directories or subdomains).
Generally, each page is in a single language. However, there are two problems
with this:

1. The interface texts are always in the default language.
For example, meta.wikimedia.org is in English by default. However, when one visits a non-English page, such as the German-language http://meta.wikimedia.org/wiki/Hauptseite, the portlets and action tabs are still in English. This is sensible if the user is logged in and has specified English as his preferred interface language, but for anonymous users this is inappropriate.

2. The XHTML source code incorrectly specifies that all pages are in the default language.
For example, meta.wikimedia.org is in English by default, so the XHTML container is specified as follows:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">…</html>

This is incorrect for pages in other languages, and has important implications for a variety of applications. For example, a search engine may be misled by the incorrect language and inappropriately index the page as if it were English, or skip indexing the page altogether because it is looking for pages in another language. Another example: a web browser with text-to-speech capability (for the blind, for example) may select the wrong pronunciation settings and try to read the text as if it were English. (Editors can currently work around this problem by wrapping the whole wikitext in <div lang="xx">…</div> tags, though this is not optimal as some search/indexing applications won't bother reading past the opening <html> tag to determine the page's language if the <html> tag already specifies it.)

A solution for both these problems is to allow the user to somehow tag a page to indicate that it is in a particular language other than the default language. Perhaps this could be done with some sort of directive in the wikitext (e.g., [[lang:de]] or __LANG:DE__). However, it might be easier to implement a system whereby wiki system administrators could simply specify that an entire namespace, or prefix, or page title matching a particular pattern, is in a particular language. For example, one could specify that all pages with the prefix "de/" are in German.
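
The administrator-configured variant could be sketched roughly as follows. This is a minimal Python illustration, not MediaWiki code: the rule table `LANGUAGE_RULES`, the function `page_language`, and the default language "en" are hypothetical names and values chosen for the example.

```python
import re

# Hypothetical administrator-defined rules: (title pattern, language code).
# Rules are checked in order and the first match wins.
LANGUAGE_RULES = [
    (re.compile(r"^de/"), "de"),
    (re.compile(r"^fr/"), "fr"),
]

# Assumed wiki-wide default language.
DEFAULT_LANGUAGE = "en"


def page_language(title: str) -> str:
    """Return the language code for a page title, or the wiki default."""
    for pattern, code in LANGUAGE_RULES:
        if pattern.search(title):
            return code
    return DEFAULT_LANGUAGE
```

With these rules, `page_language("de/Hauptseite")` gives "de", while a title that matches no rule falls back to the default.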

See Also:
T12736

Details

Reference
bz9360

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 9:36 PM
bzimport set Reference to bz9360.

Is there a problem with the cache for anonymous users? The cache seems
to be a big challenge where they are concerned.

robchur wrote:

The Multilingual MediaWiki project, wherever it's dematerialised to, was working
on a broad solution for this.

Are you requesting a change to the interface language, to the language in which the page content is parsed, or to both?

At the least, it has to be about the language in which a page is parsed.

It can generally be specified inside the page by wrapping the entire page content in a <div lang="..." xml:lang="..." dir="..."> ... </div> container, but usually isn't.

It can be taken from the last subpage part of the page title, such as /bat-smg, for some pages on Meta, for the entire MediaWiki: namespace, and for several similar cases, *except for English pages*.
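
A rough Python sketch of the subpage convention, for illustration only. The set `KNOWN_LANGUAGE_CODES` and the function `language_from_subpage` are hypothetical, and the code list is a tiny illustrative subset. English pages carry no suffix here, so they simply fall through to the default.

```python
# Illustrative subset of recognized language codes (hypothetical list).
KNOWN_LANGUAGE_CODES = {"de", "fr", "bat-smg", "kk-cyrl"}


def language_from_subpage(title: str, default: str = "en") -> str:
    """Use the last subpage part as the language code if it is recognized."""
    _, separator, last_part = title.rpartition("/")
    candidate = last_part.lower()
    if separator and candidate in KNOWN_LANGUAGE_CODES:
        return candidate
    # No subpage part, or the part is not a language code (e.g. "/Archive").
    return default
```

Checking the suffix against a known list avoids treating ordinary subpages as language variants.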

For some non-WMF wikis, e.g. Creative Commons, it can be taken from a prefix string in the page title, such as de:

In the Multilingual MediaWiki proposal, it is stored in the database. We must take care to keep full locales (en-scouse, kk-Cyrl, ku-Arab, ku-TR-Latn, etc.) since this may be relevant for directionalities, GENDER, GRAMMAR, sorting orders, etc. We must implement a way to alter/reset locales inside a page so as to accommodate citations across languages, true multilingual pages such as language courses, etc. We must make sure that parser hooks and templates receive both the page language and the current language at the place they are called, in addition to wiki and user languages, for the same reasons.

Message handling must return the true language used. Assuming we marked a language switch inside a text with lang:xzz and the switch back to the previous language with lang:*, and there was no localization for the message xyzzy, then:

This is in French: lang:fr un {{int:xyzzy}} fiche lang:* which means...

should render something like:

This is in French: <span lang="fr" xml:lang="fr" dir="ltr"> un <span lang="en" xml:lang="en" dir="ltr">paper</span> fiche</span> which means...
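
The behaviour described above, where message lookup reports the language it actually used so that a fallback can be wrapped in its own span, could be sketched like this in Python. Everything here is hypothetical and for illustration only: the `MESSAGES` store, the English fallback, and the helper names are assumptions, not MediaWiki's message API.

```python
from html import escape

# Hypothetical message store: message key -> {language code: text}.
MESSAGES = {"xyzzy": {"en": "paper"}}
FALLBACK_LANGUAGE = "en"  # assumed fallback language
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}  # illustrative subset


def direction(lang: str) -> str:
    """Simplistic: checks the primary subtag only, ignoring scripts such as ku-Arab."""
    return "rtl" if lang.split("-")[0] in RTL_LANGUAGES else "ltr"


def get_message(key: str, lang: str) -> tuple[str, str]:
    """Return (text, language actually used) for a message key."""
    translations = MESSAGES[key]
    if lang in translations:
        return translations[lang], lang
    return translations[FALLBACK_LANGUAGE], FALLBACK_LANGUAGE


def span(lang: str, inner_html: str) -> str:
    """Wrap already-escaped HTML in a span carrying language and direction."""
    return (
        f'<span lang="{lang}" xml:lang="{lang}" dir="{direction(lang)}">'
        f"{inner_html}</span>"
    )


def render_message(key: str, current_lang: str) -> str:
    """Render a message, marking it up only if a fallback language was used."""
    text, used_lang = get_message(key, current_lang)
    html = escape(text)
    return html if used_lang == current_lang else span(used_lang, html)


# The French passage with an untranslated message inside it.
fragment = span("fr", " un " + render_message("xyzzy", "fr") + " fiche")
```

Here `fragment` has the same nested shape as the example above: an outer French span containing an inner English span around the fallback text.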

There could or should imho be an option - for multilingual wikis at least - making the interface language go with the page language by default, but that should be deselectable by users.

For the part of specifying the page content language, this bug overlaps with bug 28970

See http://www.mediawiki.org/wiki/Language_in_MediaWiki for the recent improvements.

Since MW 1.18 the html tag gives the lang & dir attributes of the user interface language. The page content got a lang & dir attribute of the "page content language", which is by default in most cases the wiki content language, but can be changed by extensions through a hook.
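
As a rough illustration of that split (not MediaWiki's actual code or hook signature), the following Python sketch puts the user interface language on the html element and the page content language on the content div, with hooks allowed to override the latter. The function names, the hook calling convention, and the RTL list are all assumptions made for the example.

```python
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}  # illustrative subset


def text_direction(lang: str) -> str:
    """Simplistic direction lookup based on the primary language subtag."""
    return "rtl" if lang.split("-")[0] in RTL_LANGUAGES else "ltr"


def page_content_language(title, wiki_lang, hooks=()):
    """Start from the wiki content language and let each hook override it.

    A hook is any callable taking (title, current_lang) and returning a
    language code, or None to leave the language unchanged.
    """
    lang = wiki_lang
    for hook in hooks:
        result = hook(title, lang)
        if result is not None:
            lang = result
    return lang


def render(title, ui_lang, wiki_lang, body_html, hooks=()):
    """Build a skeleton page: UI language on <html>, page language on content."""
    content_lang = page_content_language(title, wiki_lang, hooks)
    return (
        f'<html lang="{ui_lang}" dir="{text_direction(ui_lang)}">'
        f'<div id="mw-content-text" lang="{content_lang}" '
        f'dir="{text_direction(content_lang)}">{body_html}</div>'
        f"</html>"
    )


def german_prefix_hook(title, current_lang):
    """Example hook: treat titles under "de/" as German."""
    return "de" if title.startswith("de/") else None


page = render("de/Hauptseite", "en", "en", "Hallo", hooks=[german_prefix_hook])
```

In this sketch an English-interface user viewing de/Hauptseite gets lang="en" on the html element and lang="de" on the content div.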

[[mw:Talk:ContentHandler#Language]] proposes a solution.

(In reply to comment #6)

See http://www.mediawiki.org/wiki/Language_in_MediaWiki for the recent
improvements.

Since MW 1.18 the html tag gives the lang & dir attributes of the user
interface language. The page content got a lang & dir attribute of the "page
content language", which is by default in most cases the wiki content language,
but can be changed by extensions through a hook.

From what I understand, this means it would be trivial to make an extension that added a special page allowing people to change the "page language" of a page.

(In reply to comment #8)

From what I understand, this means it would be trivial to make an extension
that added a special page allowing people to change the "page language" of a
page.

Hmm, I don't think a special page is a good way to handle this. Imho ideally each page should have a selector where you can change the page language, which is then stored in the database as a "property" of the page, possibly using [[mw:ContentHandler]] which Nemo_bis linked above.

Title::getPageLanguage() would read that data and that would set everything correctly (i.e. what I've worked on step by step the last year or so).

p.selitskas wrote:

Setpagelanguage magic word added in I25f211e9.

Related URL: https://gerrit.wikimedia.org/r/62306 (Gerrit Change Id63573a7f88686caaf71083c07f851256f6b6590)

I0ff707d5f04218bef5721e6fc162c6359bb7538a adds the concept of page language, that might make this hook easier.

p.selitskas wrote:

Id63573a7f introduces page content language as an integral part of a Page entity.

(In reply to comment #13)

Id63573a7f introduces page content language as an integral part of a Page
entity.

The patch needs rebase and it's still labeled "PoC/DON'T MERGE"; it seems it's blocked on https://gerrit.wikimedia.org/r/#/c/62227/1 getting a solution about qqq and qqx being removed from $wgDummyLanguageCodes.

(In reply to comment #14)

(In reply to comment #13)

Id63573a7f introduces page content language as an integral part of a Page
entity.

The patch needs rebase and it's still labeled "PoC/DON'T MERGE"; it seems it's
blocked on https://gerrit.wikimedia.org/r/#/c/62227/1 getting a solution about
qqq and qqx being removed from $wgDummyLanguageCodes.

Given that the changes don't seem related, it's not that hard to rebase it so that it no longer depends on the other change.

(In reply to comment #9)

Hmm, I don't think a special page is a good way to handle this. Imho ideally
each page should have a selector where you can change the page language, which
is then stored in the database as a "property" of the page, possible using
[[mw:ContentHandler]] which Nemo_bis linked above.

Not sure what's wanted now, but does my extension https://www.mediawiki.org/wiki/Extension:PageLanguage resolve the issue?

psychonaut wrote:

(In reply to comment #16)

Not sure what's wanted now, but does my extension
https://www.mediawiki.org/wiki/Extension:PageLanguage resolve the issue?

No, that extension doesn't work, at least not for the two problems originally reported. All it does is to set lang="xx" for the mw-content-text div. The language setting doesn't apply to the rest of the content on the page, such as the article title (i.e., the content of the "firstHeading" h1 element). It also doesn't apply to the top-level html element -- that still has the default xml:lang and lang attributes. The web server still returns the default language in the Content-language HTTP header. And of course the MediaWiki user interface language isn't changed.

*** Bug 10736 has been marked as a duplicate of this bug. ***

This is tentatively fixed by https://gerrit.wikimedia.org/r/136623 , pending further polishing. We can close this when the feature is enabled by default, I guess (or is that not desired? I'm not sure if bug 22985 requires it even without Translate). Then we can see at bug 35489 what else Translate needs.

Next step: schema change and configuration patch for translatewiki.net, to test more.

I meant https://gerrit.wikimedia.org/r/#/c/135312/ .
Can be tested at http://dev.translatewiki.net/wiki/Special:PageLanguage already.

One improvement possible: save the user the effort of doing action=purge on the page.

Nikerabbit, 27. Jun 21:13:

Would be nice to purge the article when changing language, so that language
dependent rendering is immediately visible.

kunalgrover05: Do you plan to further work on this?

(In reply to Nemo from comment #20)

This is tentatively fixed by https://gerrit.wikimedia.org/r/136623

Patch was abandoned. Hence also removing Target Milestone which was set when linking to this patchset.

admin wrote:

This was brought up on the mailing list earlier today. I am pleased to hear that this is near completion. The Affiliations Committee has committed to helping Wikimedia movement affiliates utilize this extension to help increase transparency via Meta-Wiki. The absence of this feature is the remaining barrier for numerous affiliate pages.

*** Bug 53756 has been marked as a duplicate of this bug. ***
*** Bug 49588 has been marked as a duplicate of this bug. ***
Qgil added a subscriber: Aklapper.

kunalgrover05: Do you plan to further work on this?

No reply, sadly. Changing the assigned field, assuming that nobody is working on this task right now.

Sorry, couldn't find much time to finish the pending patch. This task is a bit too complex for GCI. Will try to complete it when I get some time.

Jdforrester-WMF set Security to None.

Removing from MW-1.24-release because this isn't going to be fixed in the next few months' worth of security point releases.

As for MediaWiki core + Translate, this works well. https://phabricator.wikimedia.org/T69223#2026417
Only WMF has to catch up... https://phabricator.wikimedia.org/T69223#2028322

Hi @jcrespo, @Nemo_bis suggested that I ping you on this ticket, following my T149595 report, so here I am.

@Psychoslave: Ping about what exactly? Contentless pings aren't too helpful. :)

If this is about the state of the ticket T69223, its state is clearly given in its last comment: https://phabricator.wikimedia.org/T69223#2685860. What further clarifications do you need?

There is no need for constant pinging when there is no extra information or comment to provide; it will not make things happen more quickly (pings may in fact be distracting and slow things down). Helping with other existing tasks, however, can make tasks happen more quickly: https://phabricator.wikimedia.org/tag/dba/