
Implement central locale-specific, or tailored, sorting framework (tracking)
Open, Medium, Public

Details

Reference
bz30673
Title | Reference | Author | Source Branch | Dest Branch
Source data should be read from files. | repos/data-engineering/eventutilities-python!14 | gmodena | T326731-fix-ds-from-data | main
Add Mediawiki Event Enrichment to projects.json. | repos/releng/gitlab-trusted-runner!14 | gmodena | add-mediawiki-stream-enrichment | main
Add initial blubber pipeline. | repos/data-engineering/mediawiki-event-enrichment!3 | gmodena | T326731-dockerize-enrichment-app | main
pyflink should not be an install_requires dependency | repos/data-engineering/eventutilities-python!13 | gmodena | T326731-drop-pyflink-run-deps | main

Related Objects

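Status | Subtype | Assigned | Task
Open | | None |
Open | | None |
Open | | None |
Open | | None |
Declined | | None |
Resolved | | None |
Open | | None |
Open | Feature | None |
Open | | None |
Open | Feature | None |
Open | Feature | None |
Resolved | | None |
Resolved | | None |
Resolved | | matmarex |
Open | | None |
Resolved | | None |
Resolved | | tstarling |
Resolved | | None |
Resolved | | matmarex |
Open | Feature | None |
Resolved | | matmarex |
Open | | None |
Declined | | None |
Open | | None |
Open | | None |
Resolved | | None |
Declined | | None |
Resolved | | matmarex |
Resolved | | matmarex |
Open | Feature | None |
Open | Feature | None |
Duplicate | | Amire80 |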

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:51 PM)
bzimport set Reference to bz30673.
bzimport added a subscriber: Unknown Object (MLST).

We probably need this before we can start introducing locale-specific sorting in various other places.

Not before supporting the CLDR version of the DUCET in the "root" locale, because the tailorings for each language are now defined relative to this CLDR DUCET, instead of the standard DUCET as before.

In any case, Unicode has said that the default DUCET will later reintegrate some of the changes made in the CLDR version, but NOT all of them. Notably, it will not take the pseudo-collation elements and a few other things, because legacy implementations of UCA built on the past DUCET would choke on these pseudo-elements, either because of their syntax or because they reference code points assigned to noncharacters, which have to be treated specially in a UCA implementation.

Note that UCA remains an updatable specification, as well as LDML and the CLDR data based on LDML (there's still an ongoing change in these specifications, currently in public review, plus at least 4 pending changes that are being discussed and may be added later).

Sorting of what? If this is about category (member) sorting, the bug summary should say so. If it's about another type of sorting, it should say that instead.

@MZMcBride: no, this is not specific to category (member) sorting; it is more generally about how to implement and support tailorings. The other bugs concern the integration of UCA with the base "root" CLDR version (which should be evolved, in a separate bug, to use the alternate DUCET now defined in the current version of CLDR, because it is different and will remain different in some documented aspects).
The bug description looks correct to me.

(In reply to comment #3)

Sorting of what? If this is about category (member) sorting, the bug summary
should say so. If it's about another type of sorting, it should say that
instead.

(Correct me if I'm wrong) - I think what this bug is about is adding localized sorting methods (so the Swedish Wikipedia would use a different sort order than, say, the English one, because sorting conventions differ between languages).
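To make the Swedish versus English difference concrete, here is a minimal sketch in Python. It is a toy, not UCA or CLDR: the `EXTRA_LETTERS` table and the `sort_key` function are hypothetical names invented for this illustration, and only primary (letter-level) differences are modelled. It assumes that Swedish treats å, ä, ö as separate letters after z, while English files accented letters with their base letter.

```python
import unicodedata

# Hypothetical, heavily simplified tailoring table (not real CLDR data):
# letters that a locale treats as letters of their own, placed after "z".
EXTRA_LETTERS = {
    "sv": ["\u00e5", "\u00e4", "\u00f6"],  # Swedish: ... x y z å ä ö
    "en": [],                              # English: no extra letters
}

def sort_key(text, locale):
    """Return a toy primary-level sort key (a list of ints) for text."""
    extras = EXTRA_LETTERS.get(locale, [])
    key = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in extras:
            # Tailored letter: rank after "z", in the locale's own order.
            key.append(ord("z") + 1 + extras.index(ch))
        else:
            # Untailored letter: drop accents and rank by the base letter.
            base = unicodedata.normalize("NFD", ch)[0]
            key.append(ord(base))
    return key

words = ["zebra", "äpple", "apa", "öl", "ost"]
print(sorted(words, key=lambda w: sort_key(w, "en")))
# ['apa', 'äpple', 'öl', 'ost', 'zebra']
print(sorted(words, key=lambda w: sort_key(w, "sv")))
# ['apa', 'ost', 'zebra', 'äpple', 'öl']
```

The same list comes out in two different orders depending only on the locale argument, which is the behaviour this bug asks to make available centrally.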

(In reply to comment #5)

(Correct me if I'm wrong) - I think what this bug is about is adding localized
sorting methods (so the Swedish Wikipedia would use a different sort order
than, say, the English one, because sorting conventions differ between languages).

I still don't really understand this. Sorting methods of what? Of pages at Special:AllPages? Of items in a table using JavaScript sorting code? Of category members on category description pages? Is this about adding a {{#sort:}} wikitext function? All of the above? None of the above? I think this bug still has an unclear scope. If it's split off from bug 164, it's probably (primarily) about sorting category members. If it's not, that's fine, but it'd be nice to figure out what's actually desired here (both code-wise and feature-wise).

(In reply to comment #6)

(In reply to comment #5)

(Correct me if I'm wrong) - I think what this bug is about is adding localized
sorting methods (so the Swedish Wikipedia would use a different sort order
than, say, the English one, because sorting conventions differ between languages).

I still don't really understand this. Sorting methods of what? Of pages at
Special:AllPages? Of items in a table using JavaScript sorting code? Of
category members on category description pages? Is this about adding a
{{#sort:}} wikitext function? All of the above? None of the above?

Both all and none of the above, or neither :p

In all seriousness though, afaik this ticket requests the implementation of a central sorting mechanism that is bound to a Language. This can then be used in various places and exported to different formats where needed (e.g. something in SQL/PHP to sort AllPages and category members, or perhaps a JSON export for jquery.tablesorter). Anything is possible, but the applications are not in the scope of this bug.

(In reply to comment #3)

Sorting of what? If this is about category (member) sorting, the bug summary
should say so. If it's about another type of sorting, it should say that
instead.

My understanding when I opened this bug is that we need some mechanism to sort *anything* according to locale.

(In reply to comment #6)

If it's split off from bug 164, it's probably (primarily) about sorting
category members.

Sorry for the confusion. Bug 164 had branched off into subjects besides category sorting and this bug is part of the effort to have one issue per bug.

Bug 164 *started* on subjects besides category sorting -- "This problem is most visible in the rise of Categories, but present in every automatic list, like Allpages or the list of registered editors." It was later narrowed for some reason when only category sorting got implemented.

Since bug 164 has a billion comments and is just too noisy, this bug seems to be taking on its original role, recast explicitly as an overarching tracking bug.

Much of this core infrastructure is implemented by the Collation class, already committed for bug 164, but only hooked up to categories for actual use and storage, and with some limitations on what locales are available from a straight configuration.

Key things to do:

  • actually allow use of locale-specific collations _without writing additional code_ when the collation is already provided by our libraries (currently there is no way to request a locale collation from the intl extension, as only "root" can be selected without writing an extension to process a different key name)
  • (probably) make reasonably sanely automatic use of content-language locales, so site administrators don't have to set both a language *and* a collation for most default cases
  • make sure that non-category things that need to do locale sorting have access to it and make use of it (seems to be tracking bug 30672?)
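The first two items above (selecting a locale collation purely by configuration, and defaulting it from the content language) can be sketched as follows. This is only an illustration in Python under stated assumptions: `AVAILABLE_COLLATIONS`, `locale_fallback_chain` and `resolve_collation` are hypothetical names, not existing MediaWiki configuration keys or functions, and the set of available collations is made up.

```python
# Hypothetical registry of collations the libraries already provide.
# "root" stands for the untailored default; the names are illustrative.
AVAILABLE_COLLATIONS = {"root", "sv", "de", "fi"}

def locale_fallback_chain(locale):
    """Yield fallbacks such as 'sv-FI' -> 'sv' -> 'root' for a locale code."""
    parts = locale.replace("_", "-").split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield "root"

def resolve_collation(content_language, configured_collation=None):
    """Pick a collation name from site configuration.

    An explicit setting wins. Otherwise the content language decides,
    falling back towards "root" when no tailoring is available.
    """
    if configured_collation is not None:
        if configured_collation not in AVAILABLE_COLLATIONS:
            raise ValueError(f"unknown collation: {configured_collation}")
        return configured_collation
    for candidate in locale_fallback_chain(content_language):
        if candidate in AVAILABLE_COLLATIONS:
            return candidate
    return "root"

print(resolve_collation("sv-FI"))     # sv
print(resolve_collation("en"))        # root
print(resolve_collation("en", "de"))  # de
```

With something along these lines, a site administrator would only set a collation explicitly when the content-language default is not what they want.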

Yes, I agree that this bug should remain focused on what is in the server-side MediaWiki software itself (including its default front end, i.e. the part that generates HTML/JS/CSS, and the default SQL backend, or at least its interface if MySQL is just one possible backend provided with MediaWiki, not excluding other backends such as PostgreSQL).

Outside of this bug, we also need to find a client-side collation with an installable set of JavaScript files, to perform client-side sorting where needed. The main application is sorting tables locally on the client without necessarily having to use a server-side query, but that could still be an option, with an additional server API that services sort requests for clients using an XML request or (probably better) a JSON request.
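As a rough sketch of what such a JSON sort request could look like on the server side (in Python for brevity; the request shape, `handle_sort_request` and `placeholder_key_for_locale` are all assumptions made up for this example, not an existing MediaWiki API):

```python
import json

def handle_sort_request(request_body, key_for_locale):
    """Answer a JSON sort request of the assumed form
    {"locale": "sv", "items": ["b", "a"]} with the items in sorted order.

    key_for_locale(locale) must return a key function for that locale.
    The collation itself stays on the server, so the client needs no tables.
    """
    request = json.loads(request_body)
    locale = request.get("locale", "root")
    ordered = sorted(request["items"], key=key_for_locale(locale))
    return json.dumps({"locale": locale, "items": ordered}, ensure_ascii=False)

def placeholder_key_for_locale(locale):
    # Placeholder only: case-insensitive code point order for every locale.
    # A real service would return a tailored collator's key function here.
    return str.casefold

response = handle_sort_request(
    '{"locale": "sv", "items": ["pear", "Apple", "banana"]}',
    placeholder_key_for_locale,
)
print(response)
# {"locale": "sv", "items": ["Apple", "banana", "pear"]}
```

The key function is passed in on purpose: the request handling does not care which collation implementation sits behind it.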

For now there is no viable client-side JavaScript library that can perform tailorable collation, so developing an additional server-side API would be helpful (it could be integrated as a MediaWiki extension, or initially run as a service deployed separately on some server with its own API). Such server-side helpers exist today, and it's not a big deal to adapt one with some JavaScript. It could even be tried out using the client's local PC as the server, by developing a browser plugin to service the request (such a plugin is easy to develop on top of CLDR), with the JavaScript detecting the presence of this API on the PC.

With this JavaScript initially using the external "server", we can experiment with how to deploy it in MediaWiki (it can even be tried out by storing the JavaScript on user pages, without any modification to the server-side MediaWiki software or installation).

In other words, there are many ways to support this client-side mechanism without immediately having to change MediaWiki (even if MediaWiki later integrates one of the proposed methods, making it part of its distribution).

But the central part is the server side: we still need to integrate such collation in MediaWiki, using a PHP Collation class used by the software and also exposed to MediaWiki pages through a parser function (notably one that computes a collation key from a given string and a locale identifier).
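To illustrate what "a collation key from a given string and a locale identifier" means, here is a toy sketch in Python (the actual class would be PHP). `ALPHABETS` and `collation_key` are hypothetical names, and the alphabets are simplified illustrations, not CLDR data; the "et" entry roughly follows the Estonian alphabet, where z is placed between s and t. The point is that the returned key can be compared as a plain string, so it can be stored and sorted by code that knows nothing about locales.

```python
# Hypothetical per-locale alphabets in sort order (not real CLDR data).
ALPHABETS = {
    "root": "abcdefghijklmnopqrstuvwxyz",
    # Simplified Estonian: a b d e ... r s š z ž t u v õ ä ö ü
    "et": "abdefghijklmnoprs\u0161z\u017etuv\u00f5\u00e4\u00f6\u00fc",
}

def collation_key(text, locale="root"):
    """Return a hex string whose plain string order follows the locale.

    Each known letter becomes its two-digit position in the alphabet
    (counting from 01). Anything else becomes "ff", so it sorts after
    all known letters (a simplification of this toy).
    """
    alphabet = ALPHABETS.get(locale, ALPHABETS["root"])
    weights = []
    for ch in text.lower():
        position = alphabet.find(ch)
        weights.append(f"{position + 1:02x}" if position >= 0 else "ff")
    return "".join(weights)

print(collation_key("tee", "root"), collation_key("zoo", "root"))
# 140505 1a0f0f   ("tee" sorts before "zoo")
print(collation_key("tee", "et"), collation_key("zoo", "et"))
# 150404 130e0e   ("zoo" sorts before "tee")
```

A real key would carry several levels of weights (accents, case), but the contract is the same: same string, different locale, different binary-comparable key.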

With this tool we can then develop various things to support a coherent collation with all server-side usages (including for servicing remote JSON requests made by clients in Javascript, for clients that won't want any plugin installed in their browser or any preconfiguration of their user pages to store some javascript sort helpers).