
zh-tw still gives simplified Chinese in link titles on Facebook and Google
Closed, Declined · Public

Description

Others will have to fill in the bug details, as all this is over my head.
All I know is that
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/56069
says:

How sad that the first answer here is a "Not our problem :-)!"...

so it must be our problem.


Version: unspecified
Severity: normal
URL: http://news.gmane.org/group/gmane.science.linguistics.wikipedia.technical/thread=56032
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=26115

Details

Reference
bz31838

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:57 PM
bzimport set Reference to bz31838.
bzimport added a subscriber: Unknown Object (MLST).

Resolving invalid until someone points out on this bug what, specifically, we are supposed to do.

The suggestion in the thread was to not include rel="canonical" on zh-tw or zh-hk pages since they don't really fit the definitions listed here: https://www.google.com/support/webmasters/bin/answer.py?answer=189077. Specifically, M. Williamson said:

Umm what the link actually says is this:

"This is recommended in the following scenarios:

  • You translate only the template of your page, such as the navigation and footer, and keep the bulk of your content in a single language. This is common on pages that feature user-generated content.
  • Your page targets users in multiple regions (for example, en-us, en-uk, and en-ie), but each regional version differs only in small details, such as the currency used."

    Neither of these is true: the entire contents of the whole page are different (so the first scenario does not apply), and Simplified vs. Traditional is a non-trivial difference, not at all analogous to "small details such as the currency used" (so the second scenario does not apply either).

Those are "recommendations", pure guidelines, and they are not an exhaustive list of precisely when that should be used.
The intention of that list is essentially to tell you that it's incorrect to use the hreflang pattern if the entire contents are human translated. In other words, they're saying that it's incorrect to use this pattern to point rel=canonical to en.wp, hreflang=de to de.wp, hreflang=ja to ja.wp, etc... because the content on various pages is not guaranteed to actually be the same thing because it's written by different communities and because manual updates can be desynced. Google does not want to send users two different pages when one is manually translated and may be out of date.
Conversion of the script used by the whole content is NOT mentioned on that page, so nothing can be inferred about it without asking Google directly.

All I know is that showing Taiwanese users Simplified Chinese previews on Facebook links, no matter how hard one tries to avoid creating them, is probably just as bad as showing Indian Hindi users Pakistani Urdu previews. Even though the languages might sound the same, when users see the wrong "alphabet" they think, "I'm not going to click on that; that's meant for people who live in a different region." They will not take more than one second to decide, and will click elsewhere.

node.ue wrote:

The point is that any link to the "canonical" version of the page has a relatively good chance of being partly unintelligible to readers. The entire contents of the page are different; as I stated in my e-mail, these are non-trivial differences. Imagine if linking to an English Wikipedia article displayed a link to a page in French...

At least make sure the two countries involved have established diplomatic relations.

Otherwise, all the effort the smaller country puts into editing Wikipedia is wiped out if their hard work is presented as representing the larger country.

I should make something clear.
Doing a search for links from other sites to zh.wp gives me various forms of links:

If rel=canonical is removed, search engines will begin to treat the same article as a different article depending on its link. That includes the /wiki/ URL being considered a different article from the same article in the default variant.

This can have a very negative effect on zh.wp in searches. Because /wiki/ and the default variant would be treated as separate pages, zh.wp may be penalized for what appears to be duplicate content, and users may find multiple, unintuitive results for the same article. zh.wp pages may also rank lower in search engines: as people link to different variants and paths, the incoming ranking for a single page is split among multiple versions of itself, giving each of them a lower rank.

This will also have a negative effect on the results users see. Instead of being based on the language a user has selected, search results will be based purely on how pages rank. In other words, if most people link to a /zh/ page, a user visiting Google with zh-tw will see results linking to /zh/, because the /zh-tw/ page doesn't have as much ranking.
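To make the ranking-split argument concrete, here is a minimal Python sketch. It is a toy model, not how any search engine actually scores pages: it assumes a crawler simply counts inbound links per URL, and it assumes zh.wp's rel=canonical maps every variant path (/wiki/, /zh/, /zh-tw/, ...) to the /wiki/ URL. The prefix list, `canonical_url`, `link_scores`, and the link counts are all hypothetical.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit

# Hypothetical list of path prefixes under which zh.wp serves the same article.
VARIANT_PREFIXES = ("/wiki/", "/zh/", "/zh-hans/", "/zh-hant/",
                    "/zh-cn/", "/zh-tw/", "/zh-hk/")


def canonical_url(url: str) -> str:
    """Map a variant URL to the /wiki/ URL (assumed canonical target).

    Query strings and fragments are dropped; URLs outside the known
    prefixes are returned unchanged.
    """
    parts = urlsplit(url)
    path = unquote(parts.path)
    for prefix in VARIANT_PREFIXES:
        if path.startswith(prefix):
            title = path[len(prefix):]
            return f"{parts.scheme}://{parts.netloc}/wiki/{title}"
    return url


def link_scores(inbound_links: list[str], use_canonical: bool) -> Counter:
    """Toy ranking signal: count inbound links, keyed by raw or canonical URL."""
    if use_canonical:
        return Counter(canonical_url(u) for u in inbound_links)
    return Counter(inbound_links)
```

With 6 made-up links to /zh/, 3 to /zh-tw/ and 1 to /wiki/, keying on the raw URL yields three separate entries (6, 3 and 1), while pooling under the canonical URL yields a single entry of 10. That pooled count is what removing rel=canonical would give up.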

Some possible actions:

  • Remove rel=canonical and accept the issues this will cause for zh.wp. - This will require a discussion on zh.wp and community consensus. As this change is already possible with MW, this bug will be closed as WORKSFORME, and if the zh.wp community reaches consensus, a shell bug to change the setting of $wgCanonicalLanguageLinks on zh.wp can be opened.
  • Facebook, search engines, and other consumers of rel=canonical could implement support for rel=alternate hreflang= so that they always serve URLs relevant to the visitor's language. - This bug will be closed as INVALID, and someone can open a bug report with the relevant vendor.
  • We could implement Content-Language based variant redirection on the /wiki/ path so that visitors are automatically redirected to their variant. - There may, however, be technical reasons this is not possible. Users following links from sites that specify a particular variant may still end up on a variant they don't want.
  • Implementing og:url pointing to the current variant's URL may be possible; Facebook and Google's +1 button both seem to support it. - This, however, will interfere with Open Graph and have an effect similar to the search engine one on the number of shares, likes and +1s a page has (e.g. if 25 people like one article on zh.wp, 3 of them on the zh-tw page and the rest on the zh page, a user who can see how many likes a page got will see 3 likes on the zh-tw page when in reality the article got 25 likes). It will also not fix any of the search engine issues.
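The third option (redirecting /wiki/ to the reader's variant) could look roughly like the following Python sketch. It assumes the choice is driven by the browser's Accept-Language request header; the supported-variant set, the `zh` fallback, and the function names are hypothetical, and real header parsing has more edge cases (wildcards, `zh-Hant-TW`-style tags) than this handles.

```python
# Hypothetical set of variants a /wiki/ redirect would be allowed to choose.
SUPPORTED_VARIANTS = {"zh", "zh-hans", "zh-hant", "zh-cn", "zh-tw", "zh-hk", "zh-sg"}


def parse_accept_language(header: str) -> list[str]:
    """Return lowercase language tags, highest q-value first; q=0 tags are dropped."""
    weighted = []
    for index, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat malformed weights as "not acceptable"
        if q > 0:
            # Negative q sorts highest first; index keeps header order on ties.
            weighted.append((-q, index, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_variant(accept_language: str, default: str = "zh") -> str:
    """Choose the first acceptable tag that is a supported variant."""
    for tag in parse_accept_language(accept_language):
        if tag in SUPPORTED_VARIANTS:
            return tag
    return default


def redirect_target(title: str, accept_language: str) -> str:
    """Path a /wiki/<title> request would be redirected to (hypothetical)."""
    return f"/{pick_variant(accept_language)}/{title}"
```

For example, a header of `zh-TW,zh;q=0.8,en-US;q=0.5` selects `zh-tw`, while `en-US,en;q=0.9` matches nothing and falls back to `zh`. As the bullet notes, this does not help readers who arrive through a link that already names a variant.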

By the way, rel=canonical was added in r75617 as an alternative because there seemed to be issues with implementing conditional redirection. Relevant bug: 21672.

I even selected Google's "Only Traditional Chinese Results".

And what do I get? Wikipedia's Simplified Chinese title, but with Traditional Chinese content.

So fix those atrocious titles!

http://www.google.com.tw/search?q=趙少康&hl=zh-TW&ie=UTF-8&oe=UTF-8&prmd=ivns&source=lnt&tbs=lr:lang_1zh-TW&lr=lang_zh-TW&sa=X&ei=iDWiTuyIMKfgmAX75e2fCQ&ved=0CAkQpwUoAg

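+ 網路 (Web)
+ 所有中文網頁 (All Chinese pages)

--> + 繁體中文網頁 (Traditional Chinese pages)

+ 台灣的網頁 (Pages from Taiwan)
+ 外文網頁翻譯版 (Translated foreign pages)

繁體中文網頁 (Traditional Chinese pages)

搜尋結果 (Search results)

  1. 趙少康- 维基百科,自由的百科全书 (趙少康 - Wikipedia, the free encyclopedia)

    趙少康(1950年11月16日-),生於台灣,中華民國前政治人物,在政壇上有「政治金童」之稱。曾代表新黨參選1994年台北市長選舉。 現為著名媒體人。父親為河南涉縣( ...
    (趙少康 (born November 16, 1950) was born in Taiwan and is a former Republic of China politician, known in politics as the "golden boy". He ran in the 1994 Taipei mayoral election as the New Party candidate. He is now a well-known media figure. His father was from She County, Henan (...)

    生平 - 家庭 - 相關條目 - 参考資料 (Biography - Family - Related articles - References) zh.wikipedia.org/zh-tw/趙少康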

Side note: the relevant bug explicitly asking for rel=alternate hreflang=* appears to be bug 27362.
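For reference, the rel=alternate hreflang markup that bug asks for might be generated along these lines. This is a hypothetical Python sketch, not MediaWiki code: the variant-to-hreflang mapping, the choice of /wiki/ as both the canonical and the `x-default` target, and the `head_links` name are assumptions made for illustration.

```python
from html import escape
from urllib.parse import quote

# Hypothetical mapping from zh.wp variant path to a BCP 47 hreflang value.
HREFLANG_FOR_VARIANT = {
    "zh-hans": "zh-Hans",
    "zh-hant": "zh-Hant",
    "zh-cn": "zh-CN",
    "zh-tw": "zh-TW",
    "zh-hk": "zh-HK",
}


def head_links(host: str, title: str) -> list[str]:
    """Build canonical + rel=alternate hreflang <link> tags for one article."""
    encoded = quote(title.replace(" ", "_"))  # MediaWiki-style underscores, percent-encoded
    canonical = f"https://{host}/wiki/{encoded}"
    tags = [f'<link rel="canonical" href="{escape(canonical)}">']
    for variant, hreflang in HREFLANG_FOR_VARIANT.items():
        href = f"https://{host}/{variant}/{encoded}"
        tags.append(f'<link rel="alternate" hreflang="{hreflang}" href="{escape(href)}">')
    # x-default tells consumers which URL to use when no variant matches.
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(canonical)}">')
    return tags


if __name__ == "__main__":
    for tag in head_links("zh.wikipedia.org", "趙少康"):
        print(tag)
```

Each variant page would carry the same set of tags, so a consumer that honours hreflang could pick the zh-TW URL (and its Traditional Chinese title) for a Taiwanese reader instead of the canonical one.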

The link title is only ever zh-hans or zh-hant, nothing else. This is not a problem.