
Use sort collation config in JavaScript (jquery.tablesorter)
Closed, ResolvedPublic

Description

Split off from T2164, this is a tracking bug for areas that need improved client-side sorting.

Details

Reference
bz30674

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:51 PM
bzimport set Reference to bz30674.
bzimport added a subscriber: Unknown Object (MLST).
  • Bug 8732 has been marked as a duplicate of this bug.

Collation can already be adapted using: mw.config.set('tableSorterCollation',{'Ä':'A','Ö':'O','Ü':'U','ä':'a','ö':'o','ü':'u','ß':'ss'});
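To illustrate what such a mapping does, here is a minimal sketch, assuming the sorter rewrites each mapped character before doing a plain string comparison. The helper names `buildCollationReplacer` and `compareWithCollation` are hypothetical and are not the tablesorter's real internals.

```javascript
// Hypothetical helpers; not the actual jquery.tablesorter implementation.
const collation = { 'Ä': 'A', 'Ö': 'O', 'Ü': 'U', 'ä': 'a', 'ö': 'o', 'ü': 'u', 'ß': 'ss' };

// Build a function that rewrites every mapped character to its replacement.
function buildCollationReplacer( table ) {
	const keys = Object.keys( table );
	if ( keys.length === 0 ) {
		return ( text ) => text;
	}
	// Escape regex metacharacters so arbitrary keys are matched literally.
	const escaped = keys.map( ( k ) => k.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' ) );
	const pattern = new RegExp( escaped.join( '|' ), 'g' );
	return ( text ) => text.replace( pattern, ( ch ) => table[ ch ] );
}

const normalize = buildCollationReplacer( collation );

// Compare two strings after applying the collation table.
function compareWithCollation( a, b ) {
	const x = normalize( a );
	const y = normalize( b );
	return x < y ? -1 : ( x > y ? 1 : 0 );
}

const words = [ 'Zebra', 'Äpfel', 'Apfel', 'Öl' ];
const sorted = words.slice().sort( compareWithCollation );
```

With the mapping applied, 'Öl' sorts before 'Zebra'; without it, every word starting with an umlaut lands after 'Zebra'.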

The question is how to do this automatically. Create a JS version of the Collation class?

The easiest approach would probably be to do it in the parser: read through the table, generate binary sortkeys, turn them into some non-binary form, and put that in a data attribute.

The Collation class (or rather the third-party ICU library used by it) is rather complex. I'm doubtful we could re-create it in JavaScript sanely. For example, it needs to do sorting on three different levels, be able to dynamically insert new "in-between" values, etc. We also don't even know what rules are being used at runtime (as the PHP bindings don't expose that, and it changes with the version).

OTOH, I suppose it doesn't need to be exactly the same. Fixing just the really bad mismatches in sorting behaviour might be good enough.

This is an actual problem for various languages and various writing systems. In my opinion, this needs to use CLDR instead of custom tables and other client-side hacks.

https://phabricator.wikimedia.org/source/mediawiki/browse/master/resources/src/jquery/jquery.tablesorter.js currently uses functions that compare strings directly by their UTF-16 code units:

function sortText( a, b ) {
  return ( ( a < b ) ? -1 : ( ( a > b ) ? 1 : 0 ) );
}

function sortTextDesc( a, b ) {
  return ( ( b < a ) ? -1 : ( ( b > a ) ? 1 : 0 ) );
}

In my opinion, these (or the code around them) should be replaced with calls that use CLDR data to build a sorting index.

This was successfully resolved for categories (see T162823), and the same ordering should be used in table sorting.

PS: Maybe suitable methods exist in https://github.com/rxaviers/cldrjs or https://github.com/globalizejs/globalize

The data required for collation is comparatively huge, somewhere on the order of megabytes I think. Even if there was a ready-to-use JavaScript library that implements the Unicode Collation Algorithm (which to my knowledge there isn't, but I'd love to be proved wrong), we couldn't reasonably ship it to the browser.

The alternative solution would be to precompute the sortkeys in PHP code, like @Bawolff already suggested above (and then we can compare the sortkeys using the naive method and get correct results). This would approximately double the amount of data sent to the user, which is not great but probably better than shipping the collation data in most cases. But the PHP parser doesn't currently know whether it's dealing with a sortable table or a regular boring one (and I'm not sure if it even knows the contents of a table cell while rendering its attributes, which would be required here). Overcoming these problems is surely possible, but it would be a non-trivial undertaking.
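The client side of that idea could be sketched as follows, assuming the server emits a lowercase, hex-encoded sortkey for each cell (for example in a data attribute). The `sortKey` field and the key values below are invented for illustration; real keys would come from the PHP collation code.

```javascript
// Hypothetical row data: `sortKey` stands in for a server-generated,
// hex-encoded binary sortkey. The values are invented for this example.
const rows = [
	{ text: 'Zebra', sortKey: '5a01' },
	{ text: 'Äpfel', sortKey: '2a02' },
	{ text: 'Apfel', sortKey: '2a01' }
];

// Lowercase hex strings with two digits per byte compare in the same
// order as the underlying bytes, so a naive comparison is enough.
function compareBySortKey( a, b ) {
	return a.sortKey < b.sortKey ? -1 : ( a.sortKey > b.sortKey ? 1 : 0 );
}

const sortedRows = rows.slice().sort( compareBySortKey ).map( ( r ) => r.text );
```

The browser needs no collation data in this scheme; the cost is the extra key shipped with every cell.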

I think the current mw.config.set( 'tableSorterCollation', ... ) workaround is sufficient for most cases. For example, we've been using it on Polish Wikipedia for years (search in https://pl.wikipedia.org/wiki/MediaWiki:Common.js), and it's also feasible for more complicated cases like Serbian (https://sr.wikipedia.org/wiki/Медијавики:Common.js).

Heh, this may be a good idea for monolingual projects, but not for multilingual sites such as Meta-Wiki, Wikimedia Commons or Wikidata (nor for small chapter wikis such as [[wmru:]]).
There should be another solution.

Besides this, the functions 'sortText' and 'sortTextDesc' are written incorrectly and should be changed in any case. Comparison based on UTF-16 code units is incorrect for this purpose.

We can override this with https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare on platforms where it is available.

It should be noted however that this function can be significantly slower.
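A minimal sketch of that approach, assuming the environment provides `Intl.Collator` and falling back to the code-unit comparison otherwise. The function name `makeComparator` and the sample locale are illustrative, not taken from the tablesorter. Reusing one collator object instead of calling `localeCompare` for every pair is a common way to reduce the performance cost noted above.

```javascript
// Illustrative helper: returns a locale-aware comparator when Intl.Collator
// exists, otherwise the plain code-unit comparison.
function makeComparator( locale ) {
	if ( typeof Intl !== 'undefined' && typeof Intl.Collator === 'function' ) {
		// Create the collator once and reuse it for every comparison.
		const collator = new Intl.Collator( locale );
		return ( a, b ) => collator.compare( a, b );
	}
	return ( a, b ) => ( a < b ? -1 : ( a > b ? 1 : 0 ) );
}

const compare = makeComparator( 'de' );
const names = [ 'Zebra', 'Äpfel', 'apfel' ];

// Locale-aware order versus the default code-unit order.
const localeSorted = names.slice().sort( compare );
const codeUnitSorted = names.slice().sort();
```

In the code-unit order 'Zebra' comes first, because uppercase 'Z' has a lower code unit than lowercase 'a' or 'Ä'; the collator places it last.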

Not sure what needs to be done here, but... We use and have set Unicode collation for categories (uca-cs for Czech Wikipedia). This is already configured for the majority of Wikipedias. Why don't we use the same here?

@Dvorapa that makes use of MySQL logic for the actual implementation of that setting. We cannot make use of that in JavaScript (well, localeCompare in theory does, but most browsers haven't really implemented it and just fall back to a standard collation for most languages).

Change 517266 had a related patch set uploaded (by TheDJ; owner: TheDJ):
[mediawiki/core@master] Tablesorter: Use localeCompare

https://gerrit.wikimedia.org/r/517266

Change 517266 merged by jenkins-bot:
[mediawiki/core@master] Tablesorter: Use localeCompare

https://gerrit.wikimedia.org/r/517266

TheDJ claimed this task.
TheDJ removed a project: Patch-For-Review.

This should now be fixed if your browser supports localeCompare for the locale used in the content.

> @Dvorapa that makes use of MySQL logic for the actual implementation of that setting. We cannot make use of that in JavaScript (well, localeCompare in theory does, but most browsers haven't really implemented it and just fall back to a standard collation for most languages).

The logic is actually on the PHP side (PHP creates a binary sortkey that can be compared using normal string comparison). In principle, the same logic could be used to create a sortkey added as a hidden attribute, supposing the value of the table cell is known to PHP at parse time. But that would probably increase the HTML size a bit, and it might be difficult to determine a table cell's value in PHP.

Edit: err, I kind of already said that. I haven't read the bug recently and didn't realize that I was partially repeating my previous comment.