Scribunto needs sane Unicode string support
Closed, ResolvedPublic

Description

Scribunto's built-in string module works with bytestrings. So "string.len('hüllo')" returns 6 rather than 5, because the "ü" is two bytes in UTF-8, and "string.reverse('hüllo')" returns "oll��h", because the two bytes of the "ü" are swapped into an invalid sequence.
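
To make the difference concrete, here is a minimal pure-Lua sketch of character-aware length and reverse. The helper names (utf8_chars, utf8_len, utf8_reverse) are made up for illustration and are not the ustring API; the code assumes valid UTF-8 input and does no validation.

```lua
-- Hypothetical helpers for illustration only; not the ustring module's API.
-- Assumes the input is valid UTF-8.

-- Split a string into UTF-8 characters: one non-continuation byte
-- followed by any number of continuation bytes (0x80-0xBF).
local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

-- Number of characters rather than bytes.
local function utf8_len(s)
  return #utf8_chars(s)
end

-- Reverse character by character, keeping multi-byte sequences intact.
local function utf8_reverse(s)
  local chars = utf8_chars(s)
  local out = {}
  for i = #chars, 1, -1 do
    out[#out + 1] = chars[i]
  end
  return table.concat(out)
end

-- "hüllo" written with byte escapes so the example does not depend on
-- the encoding of the source file; ü is 0xC3 0xBC in UTF-8.
local word = "h\195\188llo"

print(string.len(word))                      -- 6 (bytes)
print(utf8_len(word))                        -- 5 (characters)
print(utf8_reverse(word) == "oll\195\188h")  -- true
```

The stock string.reverse(word) yields the bytes "oll\188\195h" instead, which is where the two broken characters come from.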

This is fine for a programming language, I guess, but particularly for a case like Scribunto (where template programmers are being targeted and there's Unicode everywhere), sane Unicode string handling _must_ come with the extension.

Victor Vasiliev has done some work on this already, I'm told, as a ustring module. There's a C part and a Lua part. I've no idea where the code is, but I'm told it's publicly available somewhere.


Version: unspecified
Severity: enhancement
URL: https://www.mediawiki.org/wiki/Extension:Scribunto/API_specification#ustring_API

Details

Reference
bz39646

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:09 AM
bzimport added a project: Scribunto.
bzimport set Reference to bz39646.
bzimport added a subscriber: Unknown Object (MLST).

The C code is in SVN, right in the luasandbox module. The Lua code is in Gerrit, but it needs more fixes.

Two points:

(1) http://scribunto.wmflabs.org has some version of a ustring module right now. I'm not sure how or why, and its function names are painfully abbreviated.

(2) Fran McCrory makes some very interesting points at https://www.mediawiki.org/w/index.php?diff=575869&oldid=575863 about using u'foo' syntax and whether it might make sense to do away with bytestrings altogether.

The code Victor wrote had a completely different API to the stock Lua string functions, and it wasn't possible to simulate it in pure Lua. So I disabled it before I deployed it. It's better for the functionality to be temporarily missing than to be stuck with a bad interface forever.

(In reply to comment #4)
> The code Victor wrote had a completely different API to the stock Lua string
> functions, and it wasn't possible to simulate it in pure Lua. So I disabled it
> before I deployed it. It's better for the functionality to be temporarily
> missing than to be stuck with a bad interface forever.

Thank you for explaining. That's fine and I completely agree. But it would have saved me a ton of confusion if this had been made clearer (cf. bug 39655).