update support to Unicode 5.1.0
Closed, Resolved (Public)

Description

Unicode 5.1.0 is out: http://www.unicode.org/versions/Unicode5.1.0/
Please update Unicode support to that version.

For example, the character Ɑ <U+2C6D> has been added as the uppercase form of ɑ <U+0251>, which would be useful for automatic titling.


Version: unspecified
Severity: enhancement

Details

Reference
bz13615

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 10:04 PM)
bzimport set Reference to bz13615.
bzimport added a subscriber: Unknown Object (MLST).

r34417: Normalization data files updated to Unicode 5.1.0; passes the automated tests.

Seem to have long since lost the script I originally used to generate the Utf8Case.php mapping file, which appears not to have been updated since 2002 or so. :)
Made a new one and moved it into the UtfNormal sub-library.

Note a couple of limitations:

  • Case mapping (still) uses only the 1:1 simple mappings. Any full or locale-specific mappings are ignored.
  • These case mappings are not used anyway when the PHP mbstring extension is available; mbstring's case conversion functions are used instead, with whatever version of Unicode support and whatever complex mapping support they may or may not have.
  • The generated Utf8Case.php file is not used directly -- you must also regenerate the serialized version in the 'serialized' directory after updating it to a new Unicode version.
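
To illustrate the first limitation, here is a minimal sketch in Python (not the PHP generator from r34417) of how a 1:1 case table can be derived from records laid out like UnicodeData.txt. The sample records, the field positions as used here, and the names `build_simple_case_maps` and `simple_upper` are assumptions made for this illustration; the real script may well work differently.

```python
# Sample records in the UnicodeData.txt layout: 15 semicolon-separated fields.
# Assumption for this sketch: fields 12 and 13 (0-based) carry the simple
# uppercase and simple lowercase mappings as hexadecimal code points.
SAMPLE_UNICODE_DATA = """\
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
0251;LATIN SMALL LETTER ALPHA;Ll;0;L;;;;;N;LATIN SMALL LETTER SCRIPT A;;2C6D;;2C6D
2C6D;LATIN CAPITAL LETTER ALPHA;Lu;0;L;;;;;N;;;;0251;
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;German;;;
"""


def build_simple_case_maps(unicode_data):
    """Return (to_upper, to_lower) dicts holding only 1:1 mappings."""
    to_upper, to_lower = {}, {}
    for line in unicode_data.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        char = chr(int(fields[0], 16))
        if fields[12]:  # simple uppercase mapping, if any
            to_upper[char] = chr(int(fields[12], 16))
        if fields[13]:  # simple lowercase mapping, if any
            to_lower[char] = chr(int(fields[13], 16))
    return to_upper, to_lower


def simple_upper(text, to_upper):
    """Uppercase character by character; unmapped characters pass through."""
    return "".join(to_upper.get(ch, ch) for ch in text)


TO_UPPER, TO_LOWER = build_simple_case_maps(SAMPLE_UNICODE_DATA)

# U+0251 gains an uppercase form once the table includes U+2C6D.
print(simple_upper("\u0251", TO_UPPER) == "\u2c6d")

# U+00DF has no 1:1 uppercase, so a simple table leaves it unchanged,
# whereas Python's built-in str.upper() applies the full mapping to "SS".
print(simple_upper("\u00df", TO_UPPER), "\u00df".upper())
```

With a table like this, a character whose uppercase form is longer than one code point simply passes through unchanged, which is the behaviour the first bullet describes.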