
Add domain aliases, or redirects, for ISO 639-2/3 language codes
Closed, Invalid, Public

Description

Each two-letter language code of ISO 639-1 has an equivalent three-letter code in ISO 639-2 and ISO 639-3.

Since WMF servers use a mix of ISO 639-1 and ISO 639-2/3 (and some made-up) codes for their language subdomains, it would help those who want to link to them automatically from multilingual environments, and those who cannot memorize "the other" set of abbreviations, if the servers were reachable under codes from any of these sets. There are no conflicts.

This could be done in two ways:

  1. At the DNS level, the three-letter-code domain could be made an alias of (a CNAME to) the two-letter-code domain.
  2. At the HTTP level, all requests reaching a server via the three-letter-code domain could be redirected to the corresponding two-letter-code domain with the HTTP "301 Moved Permanently" status code.
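A minimal sketch of the second option, using Python's standard `http.server`, might look like this. The code table is a small hypothetical subset, and the `wikipedia.org` project domain and `https` scheme are assumptions for illustration; it only shows the 301 logic, not how WMF's servers are actually configured.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical subset: ISO 639-2/T and ISO 639-2/B codes mapped to ISO 639-1.
THREE_TO_TWO = {
    "deu": "de", "ger": "de",
    "fra": "fr", "fre": "fr",
    "nld": "nl", "dut": "nl",
    "eng": "en",
}

# Assumed project domain and scheme, for illustration only.
PROJECT_DOMAIN = "wikipedia.org"
SCHEME = "https"


def redirect_target(host, path):
    """Return the canonical URL for a three-letter-code host, or None."""
    # Drop any port and normalise case, then split "deu.wikipedia.org"
    # into the label "deu" and the rest "wikipedia.org".
    hostname = host.split(":", 1)[0].lower()
    label, _, rest = hostname.partition(".")
    if rest != PROJECT_DOMAIN:
        return None
    two = THREE_TO_TWO.get(label)
    if two is None:
        return None
    return f"{SCHEME}://{two}.{PROJECT_DOMAIN}{path}"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 Moved Permanently: clients and caches may remember the new location.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()


def serve(port=8080):
    """Run the redirector until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

For example, `redirect_target("ger.wikipedia.org", "/wiki/Berlin")` returns `"https://de.wikipedia.org/wiki/Berlin"`, while a host that already uses a two-letter code yields `None` and is not redirected.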

IMHO the latter is preferable because it creates less net traffic and server load, while still allowing people who set links to use either code set.

I don't know how many codes are affected. ISO 639-2 codes generally coincide with ISO 639-3 codes, but for a few languages ISO 639-2 offers additional codes (its bibliographic, or "B", codes such as "ger" alongside "deu") which are neither present nor used in ISO 639-3. These, too, could and should be redirected to their ISO 639-1 equivalents.

The code list at http://www.loc.gov/standards/iso639-2/php/code_list.php maps both code sets to the two-letter codes.
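To build a full redirect table from such a list, something like the following could work. It assumes the list has been saved locally in a pipe-delimited layout, `bibliographic|terminological|alpha2|English name|French name`, one language per line, with the terminological field left empty when it equals the bibliographic one. `build_mapping` is a hypothetical helper and the sample lines are illustrative.

```python
def build_mapping(lines):
    """Map each three-letter code (bibliographic and terminological) to its ISO 639-1 code."""
    mapping = {}
    for line in lines:
        # Tolerate a UTF-8 byte-order mark and surrounding whitespace.
        line = line.lstrip("\ufeff").strip()
        if not line:
            continue
        bib, term, alpha2 = line.split("|")[:3]
        if not alpha2:
            continue  # no two-letter code, so nothing to redirect to
        mapping[bib] = alpha2
        # Assumed: the terminological field is empty when it equals the bibliographic one.
        if term:
            mapping[term] = alpha2
    return mapping


# Illustrative lines in the assumed layout.
SAMPLE = """\
ger|deu|de|German|allemand
fre|fra|fr|French|français
eng||en|English|anglais
ang|||English, Old (ca.450-1100)|anglo-saxon (ca.450-1100)
"""

MAPPING = build_mapping(SAMPLE.splitlines())
```

Both "ger" and "deu" end up pointing at "de", while a language with no two-letter code (here Old English, "ang") is simply left out, since there is no ISO 639-1 target to redirect to.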


Version: unspecified
Severity: enhancement

Details

Reference
bz14010

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:09 PM
bzimport set Reference to bz14010.

Should probably have a look over the updated RFCs; originally we attempted to more or less follow RFC 3066, which has since been superseded...

http://www.w3.org/blog/International/2006/09/11/rfc_3066_tags_for_identifying_languages_

http://www.rfc-editor.org/rfc/rfc4646.txt

http://www.rfc-editor.org/rfc/rfc4647.txt

Assigned: Brion, as he should probably have a look over the updated RFCs :)

RFC 4646, section 2.2.1:

"Note: For languages that have both an ISO 639-1 two-character code and an ISO 639-2 three-character code, only the ISO 639-1 two-character code is defined in the IANA registry."

This indicates pretty clearly to me that we should *not* use the three-letter codes for those languages for which two-letter codes are registered, since they wouldn't be considered valid RFC 4646 language codes for web usage. If we're not using them, and there's no past usage to drive traffic, there's little or no reason to go adding redirects.
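The rule quoted above amounts to a check on a tag's primary subtag. The sketch below uses a hypothetical `check_primary_subtag` helper and a tiny hand-picked table to flag a three-letter primary subtag whose language also has an ISO 639-1 code and to suggest the two-letter form; a real check would consult the IANA Language Subtag Registry instead.

```python
# Hypothetical subset: three-letter codes for languages that also have an ISO 639-1 code.
HAS_TWO_LETTER = {"deu": "de", "ger": "de", "fra": "fr", "fre": "fr", "eng": "en"}


def check_primary_subtag(tag):
    """Return (is_ok, suggested_tag) for the primary language subtag of `tag`."""
    primary, sep, rest = tag.partition("-")
    # Language tags are case-insensitive, so compare in lower case.
    two = HAS_TWO_LETTER.get(primary.lower())
    if two is None:
        return True, tag  # no two-letter equivalent known, leave the tag alone
    return False, two + sep + rest
```

For example, `check_primary_subtag("deu-CH")` returns `(False, "de-CH")`, while `"nds"` (Low German, which has no two-letter code) passes unchanged.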

Resolving as INVALID, as the request was made on the incorrect basis that we use "a mix of ISO 639-1 and ISO 639-2/3". Rather than the implied unordered mix (in which case trying to redirect everything to everything else could make a lot of sense), we have always attempted to simply abide by RFC 3066 (superseded by RFC 4646), which is quite clear about the method in which it draws from those sources. Since there is no ambiguity, there's no need to provide alternates.