
Handling of parentheses for Korean, Chinese and Japanese
Closed, Resolved (Public)

Description

Author: yes0song

Description:

Summary

Introduce new features for East Asian languages. Examples are:

  • When "[[Foo(bar)|]]" is written on the edit page and the page is saved, it is replaced with "[[Foo(bar)|Foo]]". (half-width parentheses without a space)
  • When "[[Foo（bar）|]]" is written on the edit page and the page is saved, it is replaced with "[[Foo（bar）|Foo]]". (full-width parentheses without a space)

Explanation

Currently, the expression below is automatically changed to provide convenience in editing.

  • [[Android (operating system)|]] -> [[Android (operating system)|Android]]

However, in East Asian languages, this isn't convenient.

In Chinese and Japanese, full-width parentheses （） are usually used (and without a space), such as "戦国時代（中国）".

In Korean, half-width parentheses () are overwhelmingly used, as in English, but without a space, such as "허브(식물)".

Because of a technical limitation in MediaWiki, the notations above can't be used in Wikimedia projects in those languages. Titles in those projects are written like "戦国時代 (中国)" or "허브 (식물)" (half-width parentheses with a half-width space). This is not convenient for speakers of those languages.

Therefore, I suggest adding the handling below:

  • [[Foo(bar)|]] -> [[Foo(bar)|Foo]] (for Korean)
  • [[Foo（bar）|]] -> [[Foo（bar）|Foo]] (for Chinese and Japanese)
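
The requested behaviour can be sketched roughly as follows. This is an illustrative Python sketch only, not MediaWiki's parser code: the names PIPE_TRICK and expand_pipe_trick are hypothetical, and namespaces and legal-title-character rules are ignored. It assumes the qualifier is the last parenthesised part of the title, written in either half-width () or full-width （） parentheses, optionally preceded by one half-width space.

```python
import re

# Hypothetical sketch of the proposed rule; not taken from MediaWiki.
# base:   the shortest leading part of the title (lazy match)
# " ?":   an optional single half-width space before the qualifier
# then a qualifier in half-width () or full-width （） parentheses,
# which must be followed directly by the empty pipe "|]]".
PIPE_TRICK = re.compile(
    r"\[\[(?P<target>(?P<base>[^\[\]|]+?) ?"
    r"(?:\([^\[\]|()]+\)|（[^\[\]|（）]+）))\|\]\]"
)


def expand_pipe_trick(wikitext: str) -> str:
    """Fill in the empty link text with the title minus its qualifier."""
    return PIPE_TRICK.sub(
        lambda m: f"[[{m.group('target')}|{m.group('base')}]]", wikitext
    )


if __name__ == "__main__":
    for link in (
        "[[Android (operating system)|]]",
        "[[허브(식물)|]]",
        "[[戦国時代（中国）|]]",
    ):
        print(link, "->", expand_pipe_trick(link))
```

Links that already have link text, such as [[Foo(bar)|baz]], do not end in an empty pipe and are left untouched by this sketch.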

PS

Similarly, MediaWiki supports this handling for a comma. Examples are below:

  • [[Albany, New York|]] -> [[Albany, New York|Albany]]
  • [[Francis II, Holy Roman Emperor|]] -> [[Francis II, Holy Roman Emperor|Francis II]]

The comma in "Albany, New York" is used to separate parts of a geographical reference, and the comma in "Francis II, Holy Roman Emperor" is used to indicate identity.

There are equivalents of the comma in East Asian languages (， and 、), but in those languages they aren't used in the ways described above. Thus, no special treatment of East Asian commas is needed.
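
For comparison, the comma handling described above can be sketched like this. Again this is only an illustrative Python sketch with hypothetical names (COMMA_TRICK, expand_comma_trick), not MediaWiki's implementation. It assumes the rule is "drop everything from the first half-width comma followed by a space", and it deliberately ignores the East Asian commas.

```python
import re

# Hypothetical sketch; not taken from MediaWiki.
# base: everything before the first half-width comma.
# Only ", " (half-width comma plus space) is treated as a separator.
COMMA_TRICK = re.compile(
    r"\[\[(?P<target>(?P<base>[^\[\]|,]+), [^\[\]|]+)\|\]\]"
)


def expand_comma_trick(wikitext: str) -> str:
    """Fill in the empty link text with the part before the comma."""
    return COMMA_TRICK.sub(
        lambda m: f"[[{m.group('target')}|{m.group('base')}]]", wikitext
    )


if __name__ == "__main__":
    for link in (
        "[[Albany, New York|]]",
        "[[Francis II, Holy Roman Emperor|]]",
        "[[Foo、bar|]]",  # East Asian comma: not matched by this sketch
    ):
        print(link, "->", expand_comma_trick(link))
```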


Version: 1.20.x
Severity: enhancement

Details

Reference
bz30149

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 21 2014, 11:58 PM)
bzimport set Reference to bz30149.
bzimport added a subscriber: Unknown Object (MLST).

EN.WP.ST47 wrote:

A patch to make the space before the parens in the pipe trick optional. Parsertests included, patch against /phase3/

attachment bug30149.patch ignored as obsolete

EN.WP.ST47 wrote:

"[[Foo(bar)|]]" appears to correctly parse to "[[Foo(bar)|Foo]]" in latest SVN. However, "[[Foo（bar）|]]" parses to "[[Foo（bar）|Foo（bar）]]". I have attached a patch which should correct that, but the machine I wrote it on isn't actually running a wiki, so I have to pop over to Linux to test it. This patch also adds two parsertests, one to test that the full-width parens work, and one to test that standard parens with no space will also work. I have not added anything regarding the comma, since as the reporter said, they are used differently in those languages. I wasn't sure what to do with the [[Foo (bar), baz|]] option, should [[Foo(bar), baz|]] be allowed?

EN.WP.ST47 wrote:

First one had unicode mangled, trying from a different PC

For some reason my patch got mangled on upload. Trying again. Also, fixing some spacing in the parsertests, and making it legal to have [[Foo （bar）|]] (full-width parens with a space). This is tested: the parsertests pass and it looks good on my test wiki.

Attached:

yes0song wrote:

I'm not a developer, I can't test it. Sorry. Somebody test it, please.

I think [[Foo(bar), baz|]] -> [[Foo(bar), baz|Foo]] is not needed in East Asian languages as I wrote above.

(In reply to comment #2)

"[[Foo(bar)|]]" appears to correctly parse to "[[Foo(bar)|Foo]]" in latest SVN.
However, "[[Foo（bar）|]]" parses to "[[Foo（bar）|Foo（bar）]]". I have attached a
patch which should correct that, but the machine I wrote it on isn't actually
running a wiki, so I have to pop over to linux to test it. This patch also adds
two parsertests, one to test that the full width parens work, and one to test
that standard parens with no space will also work. I have not added anything
regarding the comma, since as the reporter said, they are used differently in
those languages. I wasn't sure what to do with the [[Foo (bar), baz|]] option,
should [[Foo(bar), baz|]] be allowed?