automatic unicode conversion for Malayalam makes it difficult to link to external sites using old unicode sequences; want a tag to suppress conversion
Closed, Declined (Public)

Description

The Unicode equivalence fix for Malayalam (https://bugzilla.wikimedia.org/show_bug.cgi?id=22371) helps a lot with internal linking and searching, but it causes problems with external linking. For example, the Kerala Government's encyclopedia project "Sarvavijnjana kosham" uses mixed versions of Unicode, so links to it sometimes do not work properly. If possible, please include a feature similar to <nowiki> to suppress the automatic Unicode version conversion.


Version: unspecified
Severity: enhancement

Details

Reference
bz25623

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:12 PM
bzimport set Reference to bz25623.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

This is not a fix, but as a workaround you can try percent-encoding your URLs. For example, a URL that contains the old way of writing (hopefully I get these characters right) MALAYALAM LETTER CHILLU LL (U+0D33 U+0D4D U+200D), such as http://example.com/ള്‍, can be written in the wiki as http://example.com/%E0%B4%B3%E0%B5%8D%E2%80%8D, and that should not be automatically converted to the new form (MALAYALAM LETTER CHILLU LL, U+0D7E, whose percent-encoded form is http://example.com/%E0%B5%BE).
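
For anyone who wants to generate the percent-encoded form rather than type it by hand, here is a minimal Python sketch. The helper name `encode_path_segment` is made up for illustration, and example.com is just the placeholder host from the comment above. It only shows how the UTF-8 bytes of each code point map to %XX escapes; it does not reproduce or test the wiki's own conversion.

```python
from urllib.parse import quote


def encode_path_segment(segment: str) -> str:
    """Percent-encode one URL path segment as UTF-8 (hypothetical helper)."""
    # safe="" means every reserved or non-ASCII character is escaped.
    return quote(segment, safe="")


# Old sequence for chillu LL: LETTER LLA + VIRAMA + ZERO WIDTH JOINER.
# Written with escapes so the invisible joiner cannot be lost in copy/paste.
OLD_CHILLU_LL = "\u0d33\u0d4d\u200d"

# Atomic code point for the same letter.
NEW_CHILLU_LL = "\u0d7e"

old_url = "http://example.com/" + encode_path_segment(OLD_CHILLU_LL)
new_url = "http://example.com/" + encode_path_segment(NEW_CHILLU_LL)

print(old_url)
print(new_url)
```

Pasting the first printed URL into the wiki text keeps the old sequence as plain ASCII escapes, which is the point of the workaround.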

Note, I'm changing the component from Database to internationalization as I believe that internationalization is a better component for unicode normalization issues.

I recommend using the URL-encoding workaround, which works universally.

A fix for this bug would be very complicated and would even have security implications.