Word-ending links do not work in or/hi, update linktrail rules
Closed, Resolved, Public

Description

Author: ansumang

Description:
Hello, word-ending links do not work on or.wikipedia.
They only work for English characters, e.g. [[ଓଡିଆ]]n, but not for Odia, e.g. [[ଓଡିଆ]]ପ.

The same happens on hn.wikipedia with Hindi characters: [[चेन्नई]]क, [[चेन्नई]]अ, [[चेन्नई]]कि, ...


Version: unspecified
Severity: enhancement
URL: http://or.wikipedia.org

Details

Reference
bz36966

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:23 AM
bzimport set Reference to bz36966.

ansumang wrote:

Sorry, it's hi.wikipedia, not hn.wikipedia.

I was able to fix this on my local MediaWiki installation. I have tried for _hours_ now to get git/gerrit properly set up to make a patch or whatever, but I get a cavalcade of error messages and am now officially giving up.

The fixes I have made are these:

In languages/messages/MessagesHi.php, change Line 176 to this:

$linkTrail = "/^([a-zऀ-ॿ]+)(.*)$/sDu";

In languages/messages/MessagesOr.php, add this below Line 33:

$linkTrail = "/^([a-zଁ-୷]+)(.*)$/sDu";

These fixes may not account for punctuation and/or other symbols that should not be included in the linktrail. For Hindi I used the characters in the standard Devanagari Unicode block, and the same for Oriya. I am not sure which symbols are punctuation, so this may not be correct. But it's better than nothing, I think (hope).
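A minimal sketch of what the proposed Hindi pattern does to the text right after a link, using the examples from the description. It assumes, as the $linkTrail format implies, that capture group 1 is glued onto the link label and group 2 stays plain text. splitTrail is a hypothetical helper, not MediaWiki parser code.

```php
<?php
// Proposed Hindi linktrail from the comment above (Latin a-z plus U+0900..U+097F).
$linkTrail = "/^([a-zऀ-ॿ]+)(.*)$/sDu";

// Hypothetical helper: returns [trail glued to the link, remaining plain text].
function splitTrail(string $pattern, string $after): array
{
    // No match means no trail: everything after the link stays plain text.
    if (!preg_match($pattern, $after, $m)) {
        return ['', $after];
    }
    return [$m[1], $m[2]];
}

foreach (['क', 'अ', 'कि बाद में'] as $after) {
    [$trail, $rest] = splitTrail($linkTrail, $after);
    echo "[[चेन्नई]]$after => link label 'चेन्नई$trail', rest '$rest'\n";
}
```

The trail stops at the first character outside the class, here the space, so only the attached suffix joins the link.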

Clarified summary.

As a coincidence, I've just updated docs on linktrail at [[m:Help:Links]] and [[mw:Help:Links#linktrail]], where did you/would you look for this sort of info to understand how it works and how it can be localised? This feature is very poorly documented...

(In reply to comment #3)

It is a line in every languages/messages/MessagesXx.php file, starting with $linkTrail. It is defined as a simple regex, like in the fixes I posted above. For languages using non-Latin Unicode characters, you have to add the lowercase "u" (UTF-8) modifier at the end of the regex.
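A small sketch of why the "u" modifier matters, using the Devanagari class from comment 2. Without "u", PCRE reads the pattern as raw UTF-8 bytes. The class then contains the byte range \x80-\xE0 (last byte of U+0900 to first byte of U+097F), which happens to cover every byte of an unrelated Odia letter.

```php
<?php
// Same character class, compiled with and without UTF-8 mode.
$withU    = "/^([a-zऀ-ॿ]+)(.*)$/sDu";
$withoutU = "/^([a-zऀ-ॿ]+)(.*)$/sD";

$odia = "ପ"; // ORIYA LETTER PA, U+0B2A (UTF-8 bytes E0 AC AA), outside Devanagari

// With "u": the class is the code-point range U+0900..U+097F, so Odia is rejected.
var_dump(preg_match($withU, $odia));    // int(0)

// Without "u": the class is byte-based and wrongly absorbs the Odia letter.
var_dump(preg_match($withoutU, $odia)); // int(1)
```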

(In reply to comment #4)

It is a line in every languages/messages/MessagesXx.php file, starting with
$linkTrail. It is defined as a simple regex, like in the fixes I posted above.
For unicode languages, you have to add the minuscule "u" at the end of the
regex.

Yes, I know this, but I'd like ansuman to give us some suggestions on where to put such info so that users find it.

(In reply to comment #2)

I was able to fix this on my local MediaWiki installation. I have tried for
_hours_ now to get git/gerrit properly set up to make a patch or whatever, but
get a cavalcade of various error messages and am now officially giving up.

The fixes I have made are these:

In languages/messages/MessagesHi.php, change Line 176 to this:

$linkTrail = "/^([a-zऀ-ॿ]+)(.*)$/sDu";

In languages/messages/MessagesOr.php, add this below Line 33:

$linkTrail = "/^([a-zଁ-୷]+)(.*)$/sDu";

These fixes may not account for punctuation and/or other symbols that should
not be included in the linktrail. For Hindi I used the characters in the
standard Devanagari Unicode block, and the same for Oriya. I am not sure which
symbols are punctuation, so this may not be correct. But it's better than
nothing, I think (hope).

These fixes did not work for me. Is there anything else that needs to be done?

(In reply to comment #5)

(In reply to comment #4)

It is a line in every languages/messages/MessagesXx.php file, starting with
$linkTrail. It is defined as a simple regex, like in the fixes I posted above.
For unicode languages, you have to add the minuscule "u" at the end of the
regex.

Yes, I know this, but I'd like ansuman to give us some suggestions as where to
put such info so that users find it.

Niklas says this is intentional: linktrails are kept as they are and only changed when people request it, so that we don't break things.

So I guess this needs a volunteer to take the fix from comment 2 and put it into Gerrit. See http://www.mediawiki.org/wiki/Developer_access for anybody interested.

ansumang wrote:

(In reply to comment #5)

Yes, I know this, but I'd like ansuman to give us some suggestions as where
to
put such info so that users find it.

I am not sure what kind of suggestions you want, so let me explain what I think you mean.

It's simple. In Odia we often attach a "verb", "adverb" or "preposition" to a noun and write them together as one word, e.g. [[ଓଡ଼ିଆ]]ରେ, [[ଭାଷା]]ଗୁଡିକୁ. The latter part doesn't get linked together with the first part, so the word looks odd, shown in two different colors.

Is this what you wanted to know, Nemo?
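A sketch applying the Odia linktrail proposed in comment 2 (a-z plus U+0B01..U+0B77) to the suffixes in these examples. As before, it assumes that group 1 joins the link label. The trailing "." in the last case is only there to show where the trail stops.

```php
<?php
// Proposed Odia linktrail from comment 2.
$linkTrail = "/^([a-zଁ-୷]+)(.*)$/sDu";

$cases = [
    ['ଓଡ଼ିଆ', 'ରେ'],
    ['ଭାଷା', 'ଗୁଡିକୁ'],
    ['ଓଡିଆ', 'ପ.'],
];
foreach ($cases as [$target, $after]) {
    if (preg_match($linkTrail, $after, $m)) {
        [$trail, $rest] = [$m[1], $m[2]];
    } else {
        [$trail, $rest] = ['', $after];
    }
    // The whole suffix renders in the link color; $rest stays plain text.
    echo "[[$target]]$after -> link label '$target$trail', plain text '$rest'\n";
}
```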

Related URL: https://gerrit.wikimedia.org/r/65653 (Gerrit Change Ib1b233d227f33e77c212e67eee2aea64357e55ba)

(In reply to comment #9)

Related URL: https://gerrit.wikimedia.org/r/65653 (Gerrit Change
Ib1b233d227f33e77c212e67eee2aea64357e55ba)

The patch adds to linktrail *all* the characters listed in:
http://www.unicode.org/charts/PDF/U0900.pdf
http://www.unicode.org/charts/PDF/UA8E0.pdf
This includes, for instance, "। DEVANAGARI DANDA" and "॥ DEVANAGARI DOUBLE DANDA" ("Generic punctuation for scripts of India").

ansuman, is it OK? Please check those two PDFs; otherwise we'll assume it's fine.
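A sketch of a class spanning both charts (U+0900..U+097F and U+A8E0..U+A8FF), written with \x{...} escapes, which PCRE accepts only in "u" mode. This is a hypothetical reconstruction to show the danda problem, not the literal text of Gerrit change 65653.

```php
<?php
// Whole Devanagari + Devanagari Extended blocks, danda included.
$fullBlocks = '/^([a-z\x{0900}-\x{097F}\x{A8E0}-\x{A8FF}]+)(.*)$/sDu';

// DANDA (U+0964) lies inside U+0900..U+097F, so a sentence-ending danda
// right after a link would be absorbed into the link label.
preg_match($fullBlocks, "। अगला वाक्य", $m);
echo "trail: '{$m[1]}', rest: '{$m[2]}'\n"; // trail: '।', rest: ' अगला वाक्य'
```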

(In reply to comment #10)

This includes, for instance, "। DEVANAGARI DANDA" and "॥ DEVANAGARI DOUBLE
DANDA" ("Generic punctuation for scripts of India").

Excluded danda characters in latest patchset.
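One way to carve out the two danda characters is to split the range around U+0964..U+0965. This is a sketch only; the actual patchset may express the ranges differently. Devanagari digits (U+0966..U+096F) stay in the class.

```php
<?php
// Same blocks with DANDA (U+0964) and DOUBLE DANDA (U+0965) excluded.
$noDanda = '/^([a-z\x{0900}-\x{0963}\x{0966}-\x{097F}\x{A8E0}-\x{A8FF}]+)(.*)$/sDu';

foreach (["कि बाद में", "। अगला वाक्य", "॥"] as $after) {
    $trail = preg_match($noDanda, $after, $m) ? $m[1] : '';
    echo "'$after' -> trail '$trail'\n";
}
// "कि" still joins the link; neither danda does.
```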

ansumang wrote:

Hi, apologies for the late response. Yes, it's working on Odia Wikipedia now, though I haven't checked all the characters or the other languages. Thanks a lot Nemo, Santhosh T., Srikanth L., Jon, Andre. :)

(In reply to comment #10)

(In reply to comment #9)

Related URL: https://gerrit.wikimedia.org/r/65653 (Gerrit Change
Ib1b233d227f33e77c212e67eee2aea64357e55ba)

The patch adds to linktrail *all* the characters listed in:
http://www.unicode.org/charts/PDF/U0900.pdf
http://www.unicode.org/charts/PDF/UA8E0.pdf
This includes, for instance, "। DEVANAGARI DANDA" and "॥ DEVANAGARI DOUBLE
DANDA" ("Generic punctuation for scripts of India").

ansuman, is it ok? Please check those two PDF, otherwise we'll assume it's
fine.

Those PDFs contain only the Devanagari script; anyway, it's working on Odia Wikipedia. Thanks.