Having both lang and xml:lang attributes not identical to $wgContLanguageCode
Closed, Invalid (Public)

Assigned To
None
Authored By
bzimport
May 2 2006, 1:38 PM
Referenced Files
F2795: LanguageTags.patch
Nov 21 2014, 9:13 PM
F2794: LanguageTags.patch
Nov 21 2014, 9:13 PM
F2793: mw_language_tags.png
Nov 21 2014, 9:13 PM
F2792: LanguageTags.txt
Nov 21 2014, 9:13 PM
F2791: wikipedia_zh_font_ff.png
Nov 21 2014, 9:13 PM
F2789: wikipedia_zh_font_ie.png
Nov 21 2014, 9:13 PM

Description

Author: public.wiki

Description:
Hi, I am a user from Chinese Wikipedia. I found that every page starts with the tag "<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh" dir="ltr">".
As you know, Chinese has two charsets, Traditional and Simplified, and their default fonts are
also different. Because all pages are specified as "zh", and by default "zh" means Simplified
Chinese (zh-cn), users of Traditional Chinese (zh-tw & zh-hk) cannot use their default font
to display pages.
Simply put, can the attributes "xml:lang="zh" lang="zh"" change with the user's system default
charset? For example, they would become "xml:lang="zh-hk" lang="zh-hk"" if the user is from Hong
Kong or the system's charset is zh-hk.


Version: unspecified
Severity: normal

Details

Reference
bz5790

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:13 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz5790.
bzimport added a subscriber: Unknown Object (MLST).

Text rendered in zh Wikipedia in IE

This shows how text is rendered under different language tags, using IE for
Windows.

Attached:

wikipedia_zh_font_ie.png (371×667 px, 36 KB)

Text rendered in zh Wikipedia in Firefox

This shows how text is rendered under different language tags, using Firefox for
Windows.

Attached:

wikipedia_zh_font_ff.png (354×646 px, 90 KB)

This problem was posted to wikitech-l in March 2006:

http://mail.wikipedia.org/pipermail/wikitech-l/2006-March/034397.html

No one seems to have replied there, and it was suggested that the issue be resolved here.

We could probably have it change the code based on the selected
variant conversion. Is this what you mean?

I think it can be done with a similar technique that varies the value xxx used in
[xml:lang="xxx" lang="xxx"], instead of using $wgContLangCode directly.

This can be done by adding a piece of code to OutputPage.php, or by including another
file that handles the displayed language code. I suggest that the displayed
language code be based on both $wgContLangCode and a series of checks, in
these steps:

1. For logged-in users, detect the interface language and return a value
according to the table below;
2.1. For anonymous users, first check the HTTP_ACCEPT_LANGUAGE value and return a value
according to the table below;
2.2. If step 2.1 fails, just return the $wgContLangCode value.

Note: the value returned by the function depends on _both_
$wgContLangCode and the user's interface language, for example:
* If (($wgContLangCode == en) && (user interface language == zh-tw)) => return en
($wgContLangCode)
* If (($wgContLangCode == zh) && (user interface language == zh-tw)) => return zh-tw

The table (or array) below gives the value to return for each combination of
$wgContLangCode and interface language (currently listed for zh only):

  • $wgContLangCode == zh
    • if interface language == zh returns $wgContLangCode
    • if interface language == zh-cn returns zh-cn
    • if interface language == zh-tw returns zh-tw
    • if interface language == zh-hk returns zh-tw (for browser compatibility issue)
    • if interface language == zh-mo returns zh-tw (for browser compatibility issue)
    • if interface language == zh-sg returns zh-cn
  • but while $wgContLangCode != zh
    • if interface language == en returns $wgContLangCode
    • if interface language == de returns $wgContLangCode
    • if interface language == fr returns $wgContLangCode
    • if interface language == ja returns $wgContLangCode
    • if interface language == ko returns $wgContLangCode

The second group in the table above deliberately does _not_ apply this technique, even though both values are checked:
* If (($wgContLangCode == en) && (user interface language == zh-tw)) => return en

This is intended _not_ to affect the displayed language code on other sites,
such as the en, de, fr, ja, ko, ... wikis.

This trick also applies when the $wgContLangCode tag is not supported by the browser:
* If (($wgContLangCode == zh-min-nan) && (user interface language == zh-min-nan))
=> return en (for compatibility, since browsers including IE6/7 and Firefox do
not support the zh-min-nan tag).

The term [interface language] above means the language currently used by the
logged-in user (i.e. the [user language]).
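
For readers who want to see the table as code, here is a minimal sketch in Python (not the PHP from the attached LanguageTags files, which are not reproduced in this thread). The names ZH_TAGS, UNSUPPORTED_FALLBACK and html_lang_tag are hypothetical. Two points are assumptions rather than something the comment states: interface languages missing from the zh table fall back to the content language, and the zh-min-nan fallback is applied whatever the interface language is.

```python
# Mapping for a zh content language, copied from the table in the comment.
ZH_TAGS = {
    "zh": "zh",
    "zh-cn": "zh-cn",
    "zh-tw": "zh-tw",
    "zh-hk": "zh-tw",  # browser compatibility, per the table
    "zh-mo": "zh-tw",  # browser compatibility, per the table
    "zh-sg": "zh-cn",
}

# Content languages that the comment says browsers do not recognise.
UNSUPPORTED_FALLBACK = {"zh-min-nan": "en"}


def html_lang_tag(content_lang, interface_lang):
    """Return the value for lang / xml:lang on <html> (hypothetical helper)."""
    if content_lang == "zh":
        # Assumption: unlisted interface languages keep the content language.
        return ZH_TAGS.get(interface_lang, content_lang)
    # Any other wiki keeps its own code, unless it is in the fallback list.
    return UNSUPPORTED_FALLBACK.get(content_lang, content_lang)
```

With this sketch, html_lang_tag("zh", "zh-hk") gives "zh-tw", while html_lang_tag("en", "zh-tw") stays "en", matching the two cases in the note above.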

(In reply to comment #4)

We could probably have it change the code based on the selected
variant conversion. Is this what you mean?

Nope, the interface language and the variant conversion are different things, and
this issue is not related to the variant conversion.
This issue can be resolved based on the user interface language.

This is a draft version of how to detect and change the value in the <html> tag. (Please note that some code cleanup is required before committing.)

I've posted a very rough draft of how to detect and change the <html> tag based on
various options. Some code cleanup is needed _before_ it is committed to
trunk, since the code has not been tested yet.

attachment LanguageTags.txt ignored as obsolete

A bit of cleanup for the prototype code

attachment LanguageTags.txt ignored as obsolete

Further cleanup of the prototype code

attachment LanguageTags.txt ignored as obsolete

A fine-tuned function prototype

This is the fine-tuned function prototype; it works by calling the function.
However, it still needs to be fine-tuned in conjunction with the code in
OutputPage.php.

Attached:

This is the patch that enables setting an assigned language code in the lang attributes

This patch corrects the association between language tags and the
assigned font. It needs a new file called
"includes/LanguageTags.php" to work. :)

attachment LanguageTags.patch ignored as obsolete

A LanguageTags.php file used with this patch.

This is the file that the patch needs in order to run.

attachment LanguageTags.php ignored as obsolete

Finally, the patch is here. I hope it serves as a workaround for the
language and font problem on various MediaWiki sites. It is not designed only for
zh sites; other languages such as als, ang, ast, bat-smg, simple, sr, etc. can
also use this solution to address the same problem.

As mentioned above we can't use something that relies on the
Accept-Language header as it would break our caching system. Patch cannot
be accepted.
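
One way to picture the caching objection is the toy model below, written in Python. It assumes that pages for anonymous readers are served from a cache keyed only by URL; that is an assumption made for illustration, not a description of the actual setup. The names page_cache, render and fetch are hypothetical.

```python
# Toy front-end cache keyed by URL only (assumption for illustration).
page_cache = {}


def render(url, accept_language):
    """Hypothetical renderer that picks the lang attribute from the header."""
    tag = "zh-tw" if accept_language.lower().startswith("zh-tw") else "zh"
    return '<html lang="%s">' % tag


def fetch(url, accept_language):
    """Serve from the cache when possible; render and store otherwise."""
    if url not in page_cache:
        page_cache[url] = render(url, accept_language)
    return page_cache[url]


first = fetch("/wiki/Example", "zh-tw")   # rendered, then stored in the cache
second = fetch("/wiki/Example", "zh-cn")  # served from the cache unchanged
```

In this model the second reader receives the page that was rendered for the first reader's header, because the header is not part of the cache key.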

A flow chart explaining how to determine the language code to be displayed

First, I think I should post a flow chart so that others can check whether my concept is
correct; then, based on the suggestions received, I will write code to resolve this
problem. :)

Attached:

mw_language_tags.png (999×612 px, 63 KB)

Modified patch file based on the previous patch.

Anyway, I have uploaded a patch on top of my previous patch to resolve the main
problem, a state issue that occurs in some cases (for example, using a zh-tw interface on
a zh-yue site).

Attached:

An updated LanguageTags.php file to make this code work

This is the updated LanguageTags.php file needed to make the new patch work.

Attached:

(In reply to comment #15)

As mentioned above we can't use something that relies on the
Accept-Language header as it would break our caching system. Patch cannot
be accepted.

The Accept-Language header is used only when the browser supports it and has it
enabled; if this method fails, the code takes $wgContLanguageCode directly.

But I have no idea why this would break the cache system. Or is it that my patched
code is placed in an unsuitable location in those files? :)

The Accept-Language header check applies only to anonymous users; if that method fails,
the code takes $wgLanguageCode directly.
For logged-in users, it takes the interface language from the user preferences to
determine the language tag.
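
The selection order described here could be sketched as follows, in Python rather than the PHP of the patch. The function names are hypothetical, and the header parsing is a simplified reading of the Accept-Language format (comma-separated tags with optional q weights), not the code in the attachment.

```python
from typing import List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language value, best first."""
    weighted = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a tag without a q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as unacceptable
        if quality > 0:
            weighted.append((quality, tag))
    # sorted() is stable, so tags with equal weight keep their header order.
    return [tag for _, tag in sorted(weighted, key=lambda pair: -pair[0])]


def requested_language(user_pref: Optional[str],
                       accept_language: Optional[str],
                       content_lang: str) -> str:
    """Logged-in preference first, then the header, then the site default."""
    if user_pref:
        return user_pref
    tags = parse_accept_language(accept_language or "")
    return tags[0] if tags else content_lang
```

The result would still have to be mapped through the table from the earlier comment before being written into the <html> tag.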

The patch seems to be trying to do something totally different
from what's described in the summary, and by changing the
output based on unsafe headers it would break caching. I'm
marking this INVALID; please replace with a more directed
issue.

I've brought this issue to the wikitech-l mailing list for further discussion until
it is resolved.

Gmane discussion direct link:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23533

As mentioned before, I've changed the summary title to suit the situation we're
having. Also, an unsuitable patch != an invalid bug report. Hence, I've REOPENED
the bug to get this issue resolved.

By the way, I've been conducting a survey of users on
the local wiki (http://zh.wikipedia.org/wiki/User:Shinjiman/LanguageTags),
asking which user interface language and which language variant they use.
Judging from it, it seems impossible to solve this issue according to the language
variants.

See also the mails on wikitech-l for more detailed information regarding this
issue:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23542 and
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23573

public.wiki wrote:

(In reply to comment #21)

The patch seems to be trying to do something totally different from what's described in the
summary, and by changing the output based on unsafe headers it would break caching. I'm
marking this INVALID; please replace with a more directed issue.

Sorry, Brion, I don't understand why they are unsafe headers. Could you describe it more clearly?
And could you advise a safe way to achieve our goal? Thank you.

Cache. Cache. Cache. And, cache.

Shinjiman, your mail to wikitech-l makes even less sense.
Please see my reply there.

As far as I have been able to tell: the lang attribute on <html> is set to the content language. Nothing in my testing indicates this isn't working 100% as intended. xml:lang attributes are unneeded in HTML5 anyway, which is what we're moving towards.

Resolving INVALID.

How about the lang attribute in HTML 5?
I think the lang attribute is still needed in HTML 5, according to
http://dev.w3.org/html5/markup/common-attributes.html#common-attributes .

*** Bug 20387 has been marked as a duplicate of this bug. ***