Expose method in Lua/Scribunto to fetch page titles from the prefix index
Open, High, Public, Feature

Assigned To

None

Authored By

	• MZMcBride
	Apr 11 2013, 8:41 PM

Description

Looking at https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual, I don't see a way currently to generate a list of pages based on prefix. For example, I wanted to write a module that would take each page listed at https://meta.wikimedia.org/wiki/Special:PrefixIndex/Global_message_delivery/Targets/ and generate output based on iterating over this generated list.

Rather than using a generated list, I was forced to specify each page title. This isn't great, as pages may be added or deleted and I don't want to update such a list by hand.

An equivalent to [[Special:PrefixIndex]] (or the MediaWiki API's list=allpages&apprefix=) inside Scribunto/Lua would be wonderful.

Details

Reference: bz47137

Related Objects

		Status	Subtype	Assigned	Task
		Open	Feature	None	T50176 Requested Scribunto/Lua built-in methods/functions (tracking)
		Open	Feature	None	T49137 Expose method in Lua/Scribunto to fetch page titles from the prefix index

Event Timeline

• bzimport raised the priority of this task from to Low. Nov 22 2014, 1:30 AM

• bzimport added a project: Scribunto.

• bzimport set Reference to bz47137.

• bzimport added a subscriber: Unknown Object (MLST).

• MZMcBride created this task. Apr 11 2013, 8:41 PM

darklama wrote:

I've provided an iterator solution, but it has the same limitations as
the prefixindex special page:

https//meta.wikimedia.org/wiki/Module:Subpages

I think an equivalent to the MediaWiki API's list=allpages&apprefix=
in iterator form inside Scribunto/Lua would be better though.
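As a rough illustration of the iterator form, the sketch below wraps a batched, continuation-based listing (loosely modelled on how list=allpages pages through results with a continue value) in a Lua generic-for iterator. `fetchBatch` is a hypothetical callback standing in for whatever backend call Scribunto might expose; no such function exists today.

```lua
-- Sketch only: turns a hypothetical batched listing into a generic-for iterator.
-- fetchBatch(prefix, continueFrom) is an ASSUMED callback returning
-- (titles, nextContinue), where nextContinue is nil after the last batch.
local function prefixTitles(prefix, fetchBatch)
  local batch, index, continueFrom = {}, 0, nil
  local finished = false
  return function()
    index = index + 1
    -- Refill the buffer whenever the current batch is used up
    while index > #batch do
      if finished then
        return nil -- ends the for loop
      end
      batch, continueFrom = fetchBatch(prefix, continueFrom)
      index = 1
      if continueFrom == nil then
        finished = true
      end
    end
    return batch[index]
  end
end
```

A module would then write `for title in prefixTitles('Global message delivery/Targets/', fetchBatch) do ... end`, and the backend would decide how many titles each batch holds.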

gryllida wrote:

Correct URL: https://meta.wikimedia.org/wiki/Module:Subpages

• MZMcBride mentioned this in T76434: Scribunto needs access to equivalent of Special:AllPages and Special:PrefixIndex. Dec 2 2014, 4:25 AM

Bumped priority up now that the way users did this before (unstrip) doesn't work anymore.

In T49137#800351, @Jackmcbarn wrote:

Bumped priority up now that the way users did this before (unstrip) doesn't work anymore.

I really don't care at all what the priority of this (or any) task is.

That said, it feels a bit strange for the sudden absence of this functionality to be considered high priority. The previous implementation (using transclusion) was pretty clearly a giant fragile hack. I think everyone involved knew that this hack was almost certainly going to break at some point as Special page transclusion was never considered a stable programmatic interface.

Rillke subscribed. Dec 10 2014, 7:03 PM

In T49137#800431, @MZMcBride wrote:

Special page transclusion was never considered a stable programmatic interface.

It was the only viable option for achieving a goal in terms of maintenance work, performance and functionality. At Commons, we used it for listing language subpages (/af, /de, /nl, ...), so if it is easier to implement this specific functionality, I'd be happy with that. Perhaps I should open a ticket for this specific request?

And talking about stable interfaces, I don't know how often I had to change my scripts that use API queries because something changed in incompatible ways. Sometimes I had the impression that gadgets doing screen scraping needed updating less often.

onei subscribed. Mar 10 2015, 2:08 PM

Rical subscribed. Apr 10 2015, 10:07 AM

Danny_B subscribed. Jun 5 2015, 9:44 PM

In a multilingual module, I put translations of argument names, categories and error messages in the submodule "module_name/I18N". The main module can then change without changing the translations in any language on any wiki. But without a way to get the module_name itself, I cannot automate that for all modules.

The present change could solve that, returning both "module_name/I18N" and "module_name". It could also accept a parameter to select only some subpage titles, for example those containing "I18N" in my case.

I could also request a separate change, "Get the module_name itself". But the present change is more general and can be used for a group of submodules and their data.

Perhaps, each new sub-titled page could record itself in a dedicated table in the "mother page". That could easily help to solve any question about trees of pages. For existing pages, a bot could build these tables once.

In T49137#1343123, @Rical wrote:

Perhaps, each new sub-titled page could record itself in a dedicated table in the "mother page".

That sounds like the old problem from T17071: Wikibooks/Wikisource needs means to associate separate pages with books.

Sorry, I was not explicit enough. My proposal was only about subpages such as those of Module: pages or User: pages. As for the pages of books on Wikisource, in the Page: namespace, I don't know if Wikisource users are interested. These pages are managed by Extension:Proofread_Page https://www.mediawiki.org/wiki/Extension:Proofread_Page, which shows the text of each book page side by side with its scanned image.

I'm wondering about how this feature would work with the current system of page protection, link tables and the expensive function count.

Every time this new prefixIndex function was used, we would have to have some way of tracking when a page with the prefix was created or deleted. When such a creation or deletion occurred, we would have to update all the transclusions of the page (probably a template) that used it, so presumably every page with the given prefix would have to count the template as a transclusion in the link table.
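To make the tracking problem concrete, here is a minimal sketch assuming a hypothetical in-memory registry that maps each prefix used by a prefixIndex-style call to the set of pages that used it (MediaWiki itself would presumably keep this in its link tables). Creating or deleting a title then means finding every registered prefix the title starts with and collecting the dependent pages that would have to be re-rendered.

```lua
-- HYPOTHETICAL registry: prefix -> set of pages whose rendering used that prefix listing.
local registry = {}

-- Record that `page` called a prefix listing for `prefix`.
local function recordUse(prefix, page)
  registry[prefix] = registry[prefix] or {}
  registry[prefix][page] = true
end

-- Return a sorted list of pages to re-render when `title` is created or deleted.
-- The linear scan over all registered prefixes is for illustration only.
local function pagesToInvalidate(title)
  local found, result = {}, {}
  for prefix, pages in pairs(registry) do
    if title:sub(1, #prefix) == prefix then
      for page in pairs(pages) do
        if not found[page] then
          found[page] = true
          result[#result + 1] = page
        end
      end
    end
  end
  table.sort(result) -- deterministic order for callers
  return result
end
```

For a widely used template, the set stored under a single prefix is exactly the list of millions of pages described in the next paragraph.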

Now let's say this is a template with millions of transclusions. In this case, anyone creating or deleting a page that has the right prefix would trigger a re-rendering of all of these millions of pages. As things stand, there would be no kind of page protection preventing this, so the person doing the creating or deleting might not have any idea that their action was so expensive. It could also be used maliciously to put unnecessary strain on a site's servers. And while deletion is limited to admins, creation could potentially be done by anonymous users.

The previous workaround forced transclusions to update by simply disabling caching, but that's not an option, as it's even worse from a site-stability perspective. If we did that on a widely-transcluded template, it might actually bring the site down, as the pages would all have to be re-rendered on every page view.

Also, with this function, it would be possible to see whether a given page existed or not. If we treat this like the #ifexist parser function, then we would need to make it an expensive function. In fact, as you can check the existence of many pages at once, presumably we would need to make one prefixIndex call count as many expensive function calls. (As many as there are possible results that could be returned?)
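One possible accounting rule for this question, sketched under the assumption that a listing is charged in proportion to the number of titles it returns (as if each title were an #ifexist check), against a per-page budget. The names and the limit of 500 are illustrative assumptions, not MediaWiki's actual implementation.

```lua
-- HYPOTHETICAL per-parse budget of expensive calls; 500 is an illustrative value.
local EXPENSIVE_LIMIT = 500

local function newBudget(limit)
  return { used = 0, limit = limit or EXPENSIVE_LIMIT }
end

-- Charge one unit per returned title; raise an error (and charge nothing)
-- if the listing would exceed the budget.
local function chargeListing(budget, titleCount)
  local newTotal = budget.used + titleCount
  if newTotal > budget.limit then
    error(string.format("expensive function limit exceeded (%d > %d)",
      newTotal, budget.limit), 2)
  end
  budget.used = newTotal
  return budget.used
end
```

Charging per possible result instead (the alternative raised above) would only change the `titleCount` passed in, for example the requested limit rather than the actual number of matches.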

I'm as keen as anyone else to see this feature implemented, but we need to think about how to deal with these questions first.

Restricted Application added a subscriber: Aklapper. Jul 17 2015, 3:53 AM

I'm a bit confused about the concerns you have here.

We already allow transclusion of Special:PrefixIndex.
We have a unique index on (page_namespace, page_title), so listing by prefix is cheap.
Scribunto/Lua modules are treated very similarly to templates already and rely on the same cache invalidation infrastructure (links updates, etc.), as I understand it.
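Why a sorted index makes listing by prefix cheap can be shown with a sorted array standing in for the index of one namespace: a binary search finds the first title not less than the prefix, and a forward scan stops at the first title that no longer starts with it, so the cost grows with log n plus the number of matches, not with the size of the namespace. This is a sketch of the general technique, not MediaWiki's query code, and it assumes plain byte-wise ordering, under which all titles sharing a prefix are contiguous.

```lua
-- Index of the first element of the sorted array `titles` that is >= `key`.
-- Assumes `titles` was sorted with the same '<' used here (byte order).
local function lowerBound(titles, key)
  local lo, hi = 1, #titles + 1
  while lo < hi do
    local mid = math.floor((lo + hi) / 2)
    if titles[mid] < key then
      lo = mid + 1
    else
      hi = mid
    end
  end
  return lo
end

-- Collect at most `limit` titles starting with `prefix` (limit nil = no limit).
local function prefixRange(titles, prefix, limit)
  limit = limit or math.huge
  local result = {}
  local i = lowerBound(titles, prefix)
  while titles[i] and titles[i]:sub(1, #prefix) == prefix and #result < limit do
    result[#result + 1] = titles[i]
    i = i + 1
  end
  return result
end
```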

Long-term, it would be great if we could rely less on caching. The adoption of Scribunto modules over ParserFunctions templates, the deployment of HHVM, and other changes should get us closer to this goal eventually, I hope.

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree-table is updated for all pages up and down in the tree, and each of these pages records the part of the table for all of its subpages.
Later, a Scribunto function gives the module the tree-table of the page, at no cost. Only accessing any of these pages is expensive.
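One possible reading of this proposal, sketched as a plain Lua table: each page keeps the set of its direct subpages, and only creating or deleting a page touches the table (a rename would be a delete followed by a create). The structure and function names are hypothetical; nothing like this exists in MediaWiki.

```lua
-- HYPOTHETICAL tree-table: children[page] is the set of that page's direct subpages.
local children = {}

-- Parent of "A/B/C" is "A/B"; a title without "/" has no parent (returns nil).
local function parentOf(title)
  return title:match("^(.*)/[^/]*$")
end

-- Called when a page is created: register it under its parent.
local function onCreate(title)
  local parent = parentOf(title)
  if parent then
    children[parent] = children[parent] or {}
    children[parent][title] = true
  end
end

-- Called when a page is deleted: remove it from its parent's set.
local function onDelete(title)
  local parent = parentOf(title)
  if parent and children[parent] then
    children[parent][title] = nil
  end
end

-- What a module would read: a sorted list of direct subpages, with no page loads.
local function subpagesOf(title)
  local list = {}
  for child in pairs(children[title] or {}) do
    list[#list + 1] = child
  end
  table.sort(list)
  return list
end
```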

In T49137#1463102, @MZMcBride wrote:

We already allow transclusion of Special:PrefixIndex.

Although Special:PrefixIndex can be transcluded, its contents are stripped, meaning that modules can't parse it.

In T49137#1463321, @Rical wrote:

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree-table is updated for all pages up and down in the tree, and each of these pages records the part of the table for all of its subpages.
Later, a Scribunto function gives the module the tree-table of the page, at no cost. Only accessing any of these pages is expensive.

I don't understand what you mean here. Can you clarify this comment?

In T49137#1463522, @Jackmcbarn wrote:

In T49137#1463102, @MZMcBride wrote:

We already allow transclusion of Special:PrefixIndex.

Although Special:PrefixIndex can be transcluded, its contents are stripped, meaning that modules can't parse it.

Right. I was speaking generally here. That is, users can transclude {{Special:PrefixIndex/Foo}} into wiki pages and subsequent page deletions and creations don't cause the servers to explode. In general, listing pages by prefix is pretty cheap, so I'm not sure there would be a huge problem with the performance of Scribunto/Lua modules if this functionality existed.

The difference is that in Lua someone can try to write a loop instead of having to be satisfied with getting just the 200 pages {{Special:PrefixIndex/Foo}} will give you.

Danny_B added a subscriber: ori. Jul 19 2015, 5:19 PM

In T49137#1463321, I tried to describe a cheaper implementation, but I am not sure about it because I'm not a systems programmer.

Ricordisamoa subscribed. Dec 18 2015, 7:32 PM

Alkamid subscribed. Feb 20 2016, 3:19 PM

In T49137#1463321, @Rical wrote:

In my understanding, this function needs to work only when a page is created, renamed or deleted. Then a tree-table is updated for all pages up and down in the tree, and each of these pages records the part of the table for all of its subpages.
Later, a Scribunto function gives the module the tree-table of the page, at no cost. Only accessing any of these pages is expensive.

In fact this is more complex than that, because the same saved page can generate dozens or hundreds of distinct variants (based on the current user name or the current language, and all of them can also change with the current time if magic words like {{CURRENTMINUTE}} are used). Such magic words force the server-side caches (not just the browser caches) to use a shorter expiration time, meaning that these pages will be parsed and generated again. MediaWiki sets a minimum expiration time for all pages (to avoid resource attacks), but does not limit the number of languages.

Multiply all these variants by the number of *source* subpages to iterate, and this page can generate a huge load on the server, with thousands or tens of thousands of pages being flushed from the cache. If a remote user then attempts to load all these pages (without even needing to load them completely and wait for their completion), the server load will suddenly explode (in terms of CPU, not so much disk I/O since the wiki source pages are all the same, but still a lot of I/O on the frontend cache).

But I agree: we could still allow a Scribunto module to get a list of a limited number of subpages (e.g. 200) within a range (just like when transcluding Special:PrefixIndex). This would allow creating pages with navigation buttons to get the next or previous range, over which a script could loop.
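A minimal sketch of the range idea, assuming the titles are available as a sorted array (a stand-in for whatever the server would query): each call returns at most `limit` titles strictly after `from`, plus the `from` value for the next range, which a navigation link or a module loop could pass back in. The function name and parameters are hypothetical.

```lua
-- Return up to `limit` titles from the sorted array `titles` that start with
-- `prefix` and sort strictly after `from` (nil = from the beginning),
-- plus the value to pass as `from` for the next range (nil when done).
-- A linear scan keeps the sketch short; a real index would seek to `from`.
local function nextRange(titles, prefix, from, limit)
  local result = {}
  for _, title in ipairs(titles) do
    if title:sub(1, #prefix) == prefix and (from == nil or title > from) then
      if #result == limit then
        -- At least one more match exists beyond this range
        return result, result[#result]
      end
      result[#result + 1] = title
    end
  end
  return result, nil
end
```

A module can loop until the returned `from` is nil, which keeps each individual call bounded to `limit` titles.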

In my current application, the bindmodules() function tries to require("Module:Author/I18N") through pcall, so that it does not fail, for all modules and libraries and their alternative versions (like Module:MathRoman02) and their I18N submodules containing i18n tables.
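The pcall-based loading described here can be sketched as a small helper that tries to load an optional /I18N submodule and falls back to an empty table when it does not exist. The `loader` parameter (defaulting to `require`) is an assumption added only so the sketch can run outside Scribunto; `bindmodules()` itself is not shown.

```lua
-- Try to load an optional submodule; return its value, or `fallback` if loading fails.
-- `loader` defaults to require and is a parameter only so the sketch can be tested.
local function tryRequire(name, fallback, loader)
  loader = loader or require
  local ok, result = pcall(loader, name)
  if ok then
    return result
  end
  return fallback
end

-- Load the I18N table of each module, using an empty table when none exists.
local function loadI18n(moduleNames, loader)
  local tables = {}
  for _, name in ipairs(moduleNames) do
    tables[name] = tryRequire(name .. "/I18N", {}, loader)
  end
  return tables
end
```

Note that pcall also hides genuine errors inside an existing /I18N submodule, which is one reason a real subpage listing would be preferable to probing every candidate name.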

This already has an answer in Module:Central.
Another MediaWiki answer is useful only if it is not expensive.

As I understand it, MediaWiki does not organize subpages like the folder tree of a classic PC file browser. Could such a structure be a MediaWiki answer?
Then a module could ask for the existing subpages of a given page, just 1 level below, then select some pages, then ask for another level ... recursively, but under the control of the module.
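A sketch of the level-by-level traversal suggested here, assuming a hypothetical `listChildren(title)` function that returns a title's direct subpages. The module's own `shouldVisit` callback decides which children to descend into, and `maxDepth` caps the recursion, so the module stays in control of how much is fetched.

```lua
-- Walk a subpage tree one level at a time (depth-first).
-- listChildren(title) -> array of direct subpages (HYPOTHETICAL backend call).
-- shouldVisit(title, depth) -> true to keep this child and descend into it.
local function walk(root, listChildren, shouldVisit, maxDepth)
  local visited = {}
  local function visit(title, depth)
    if depth > maxDepth then
      return
    end
    for _, child in ipairs(listChildren(title)) do
      if shouldVisit(child, depth) then
        visited[#visited + 1] = child
        visit(child, depth + 1)
      end
    end
  end
  visit(root, 1)
  return visited
end
```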

hoo subscribed. Jun 22 2016, 10:27 AM

Justinrleung subscribed. Dec 26 2017, 8:18 AM

Erutuon subscribed. Mar 9 2018, 1:12 AM

Wonnral subscribed. Oct 13 2018, 8:15 PM

Sebastian_Berlin-WMSE awarded a token. Mar 18 2019, 7:24 AM

Sebastian_Berlin-WMSE subscribed.

Zache subscribed. Nov 8 2019, 2:07 PM

Anomie merged a task: T237862: There should be a way to get a list of page's subpages in Scribunto. Nov 12 2019, 2:51 PM

Anomie added a subscriber: Dvorapa.

Dvorapa awarded a token. Nov 12 2019, 7:12 PM

MarioGom awarded a token. Apr 28 2020, 12:29 PM

MarioGom subscribed.

Aklapper removed subscribers: Anomie, wikibugs-l-list. Oct 16 2020, 5:01 PM

Note that as of today (Saturday 2020-10-18), any attempt to sort the native language names returned by fetchLanguageNames() (after copying them into a new sequence table with table.insert(t, ...)) fails when calling table.sort(t, compare). This looks like a new Scribunto bug that changed the interface of table, which no longer accepts a standard comparison function as a parameter (one taking two native language names, which should be strings). Apparently fetchLanguageNames() no longer returns standard strings: they seem to be protected, including their references, and do not match the signature expected by the comparison function (this causes an internal error inside sort(), where the comparison function is called with the wrong parameters, producing an invalid-type error).

A possible workaround I tried was to force the conversion of each returned native language name into a normal string (e.g. by concatenating an empty string), but this did not work. What fails is really table.sort(t, compare) in the version adapted by Scribunto, which no longer supports the table.sort method with *any* compare(a, b) function parameter. So table.sort appears to be incorrectly bound in the PHP code of Scribunto: it should be able to call a standard Lua function, but fails. Maybe the bug is in the format of the object representing the sequence of strings returned by fetchLanguageNames() (maybe this table is static and read-only, so it is no longer sortable since sorting would change the numeric keys; it may also be related to the open bug T49104 "Provide a method to create a non read-only copy of a mw.loadData result").

For now the workaround is to use local ok, err = pcall(function() table.sort(t, compare) end) so that the error is caught without (fully) sorting the table, but this can cause serious issues in various modules that depend on sorting.
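This workaround can be wrapped in a small helper that sorts a copy of the list inside pcall and, on failure, returns the copy together with the error message so the caller can decide what to do. The helper name is hypothetical. If the comparison fails midway, the copy may be partially reordered, which matches the "not always fully sorted" result described below.

```lua
-- Sort a copy of `list` with `compare` (nil = default '<'); never raises.
-- Returns the copy (sorted on success, possibly partially sorted on failure)
-- and the error message, or nil if sorting succeeded.
local function safeSortedCopy(list, compare)
  local copy = {}
  for i = 1, #list do
    copy[i] = list[i]
  end
  local ok, err = pcall(table.sort, copy, compare)
  if ok then
    return copy, nil
  end
  return copy, err
end
```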

This is not critical if the sort function is just used for generating the UI or display, as elements will just show in an unsorted order, but it may be critical for modules that depend on sort to group together rows containing a column for native language name, e.g. to create aggregates (sums, means...).

So something changed recently in Scribunto. I was pinged to solve this bug on Commons, where Module:Language/List could not process the internal list of languages returned by MediaWiki. I used pcall to catch the failing sort, but then the returned table is not always fully sorted as it should be (though it's roughly OK and still consistent on Commons, where the language list is unified and preprocessed with this module, which also caches the result, because this is a simple but widely used function for various purposes where users expect to see a consistent ordering of languages everywhere to facilitate their navigation).

Lepticed7 awarded a token. Nov 5 2020, 6:48 AM

Lepticed7 rescinded a token.

Lepticed7 awarded a token.

Lepticed7 subscribed.

To limit the need for cache invalidations, maybe this function could be limited to subpages rather than arbitrary prefixes. For example, the equivalent of Special:PrefixIndex/Global message delivery/Targets/ would be something like mw.title.new( 'Global message delivery/Targets' ).subpages. This way, when Global message delivery/Targets/bar/foo is created or deleted, only the subpage list queries for Global message delivery/Targets/bar, Global message delivery/Targets and Global message delivery would need to be invalidated, whereas with special pages, Special:PrefixIndex/Global message delivery/Targets/bar/fo also lists it.
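The invalidation rule in this proposal is easy to compute: a created or deleted page only affects the subpage lists of its ancestors. A minimal sketch (the helper name is hypothetical, and it ignores the per-namespace setting that controls whether "/" creates subpages at all):

```lua
-- Return the ancestors of a title, nearest first:
-- "A/B/C" -> { "A/B", "A" }; a title without "/" has none.
local function ancestorsOf(title)
  local result = {}
  local parent = title:match("^(.*)/[^/]*$")
  while parent do
    result[#result + 1] = parent
    parent = parent:match("^(.*)/[^/]*$")
  end
  return result
end
```

The number of affected lists is therefore bounded by the depth of the title, whereas with arbitrary prefixes every leading substring of the title is a potential query to invalidate.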

Maybe a (probably optional) depth limit could also be introduced, so that if one only wants to get translations (the main use case on Commons), the subpages of translations aren't listed (and, more importantly, the query doesn't need to be invalidated when such a subpage is created or deleted). For example, mw.title.new( 'Global message delivery/Targets' ).subpages( 1 ) would return Global message delivery/Targets/bar, but not its subpage Global message delivery/Targets/bar/foo.
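The depth limit can be illustrated with a filter over a flat list of titles: a title qualifies if it lies under the base title and has at most `depth` extra path segments. The flat list and the function name are stand-ins for illustration; the `subpages` property or method discussed above does not exist in Scribunto.

```lua
-- Return titles under `base` that are at most `depth` levels below it
-- (depth = nil means unlimited).
local function subpagesUpTo(titles, base, depth)
  local prefix = base .. "/"
  local result = {}
  for _, title in ipairs(titles) do
    if title:sub(1, #prefix) == prefix then
      local rest = title:sub(#prefix + 1)
      -- Count the "/" separators left after the base: 0 means a direct subpage
      local _, slashes = rest:gsub("/", "")
      if depth == nil or slashes < depth then
        result[#result + 1] = title
      end
    end
  end
  return result
end
```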

Alkamid unsubscribed. Feb 28 2021, 3:11 PM

lucamauri awarded a token. Mar 21 2021, 8:40 PM

lucamauri subscribed.

Xiplus subscribed. Mar 28 2021, 5:59 AM

Krinkle renamed this task from "Add ability to generate a list of pages based on prefix to Scribunto/Lua" to "Expose method in Lua/Scribunto to fetch page titles from the prefix index". Feb 23 2022, 1:15 PM

Krinkle awarded a token.

Krinkle updated the task description. Feb 23 2022, 1:17 PM

Yodin subscribed. Mar 22 2022, 12:37 PM

Lectrician1 subscribed. Apr 21 2023, 12:37 PM

Nardog subscribed. Aug 18 2023, 12:37 AM

Frostly subscribed. Aug 19 2023, 10:01 AM

I have just discovered that table.sort(t) on a simple table of UTF-8 strings does NOT sort the table correctly (it produces a nearly random order).
In fact the bug is NOT in table.sort itself, but in the local implementation of the binary operator '<' that compares two strings.
According to the Lua documentation:

comparing two strings is performed in a locale-dependent way by the implementation;
the operator '<' does NOT call any metamethod when the values are *both* strings or *both* numbers.
However, in Scribunto it looks like "some" strings are actual Lua strings and others are "metastrings" (if their value originates from the MediaWiki parser): when comparing strings, if one of them is a "metastring", the internal '<' operator is NOT always called; instead a metamethod is tried on the 2nd value if it is not a native Lua string, or on the 1st value otherwise. This gives inconsistent results if Scribunto implements this comparison differently from the Lua-internal comparison of native Lua strings.
But even when comparing pure Lua strings (whose values are generated from inside Lua), we get an incorrect order.

Currently a < b works correctly only if a and b are strings containing only ASCII bytes. If there are any non-ASCII bytes, it seems that in some cases they are treated as if they were negative, or they get offset by some internal constant value, and others are re-encoded (e.g. as if they were in ISO 8859-1 and then transformed into UTF-8, changing the number of bytes).

Just try comparing any string with 'ÿ' (Latin small letter y with diaeresis: 255 in ISO 8859-1, treated as if it had the value -1, but encoded as two bytes in UTF-8). Some non-ASCII characters encoded in UTF-8 that use the same leading or trailing byte values, but in different combinations, are treated differently.
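To see the byte sequences involved, a small helper can print each byte of a string in hexadecimal using `string.byte`. For example 'ÿ' (U+00FF) is the two bytes C3 BF in UTF-8; both are above 127, so whether they sort before or after ASCII bytes depends on whether bytes are compared as signed or unsigned. The helper name is hypothetical; in Scribunto its result could be passed to mw.log.

```lua
-- Format the bytes of a string as space-separated hexadecimal, e.g. "C3 BF".
local function hexBytes(s)
  local parts = {}
  for i = 1, #s do
    parts[#parts + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(parts, " ")
end
```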

Apparently the string comparison used by the running Lua engine does not use any predictable locale. It is definitely not a signed 8-bit binary comparison, nor an unsigned 8-bit one. Most probably the server runs the Lua engine with an uninitialized locale (it should be set to "LC_ALL=en.UTF-8" to use an unsigned binary 8-bit order, not "LC_ALL=C"), but that locale is not properly set up on the server: it does not even work correctly for characters in the Unicode BMP, and is not even correct for characters below U+3FF, which are encoded on 1 or 2 bytes in UTF-8.

Even if full collation (based on the Unicode DUCET) is not implemented, we should at least have a consistent ordering working on opaque 8-bit bytes (either signed or unsigned; unsigned would be preferable for the UTF-8 strings used everywhere in MediaWiki).

Initially I thought that table.sort() had a bug, but what table.sort() returns is fully consistent with what the '<' operator returns for two native Lua strings (there are additional bugs and inconsistencies when comparing with non Lua-native strings exposed by MediaWiki, which are treated differently and use their own different ordering exposed by the Scribunto interface).

This is a very strange bug. I've tried to decipher the logic used for native Lua strings, but found no usable logic. My opinion is that the Lua engine as currently compiled is severely broken in its internal C code, possibly because of C macros not working correctly with an unspecified or improperly initialized locale. For now, string comparison ONLY works with pure ASCII, so sorting tables of strings is unusable if there are any non-ASCII bytes in them.

Note that this is NOT a problem of "sort stability": the same inconsistent order exists whether or not there are duplicate string values. But it may be sensitive to whether these string values are "interned" or not (short strings are interned, but not all strings: 'a' is interned as it is encoded in UTF-8 as 1 byte, but not 'ÿ', which is encoded as two bytes). There is apparently an incorrect assumption trying to "optimize" the case where two strings are "equal" (apparently to avoid scanning their byte content), which fails if one of them is interned but not the other.

As I cannot access the internal environment in which Lua runs (notably, I don't know which locale the Lua engine runs in, or whether that locale is properly installed on the server for the standard C libraries), I cannot make further progress. Developers with access to these details on the server should investigate and debug it. You don't need to test table.sort(): first test pure internal strings that you can build in Lua, and check the results of the '<' operator.

In these tests, you can use a basic set of strings, notably the set of predefined HTML5 entities (encoded as UTF-8), that currently cannot be compared correctly: this set (encoded in UTF-8 and Unicode NFC form) is enough to detect this string comparison bug: it contains various ASCII characters, many extended Latin letters coded on 2 or 3 bytes, some Greek letters encoded on 3 bytes, some maths symbols encoded on 3 bytes. This set also contains some C0 and C1 controls, a few non-spacing diacritics (not associated in pairs with a base character), and some entities encoded as 2 code points for a total of 4 to 6 bytes.

If you want, I can provide such a table of test strings: it is small enough, but contains many cases of practical use (notably in HTML5, where it is standardized and stable). table.sort() is unable to sort them correctly, and even if you implement your own sort in Lua (using the '<' operator in your own algorithm) you get the same bad result. You can only fix it by NOT using the '<' operator and scanning the string bytes in your own loop instead (this fix works but is considerably slower than the built-in '<' operator, which is fast but definitely broken in Scribunto on Wikimedia servers).

I've done the same thing on my own standalone installation of Lua on my PC, and I never see this problem. So the bug is in the Lua engine that Scribunto uses on Wikimedia servers: it was not properly compiled or not properly installed.

In T49137#9165140, @Verdy_p wrote:

I have just discovered that table.sort(t) on a simple table of UTF-8 strings does NOT sort the table correctly (it produces a nearly random order).

This is apparently caused by a bug in the strcoll C function in the C.UTF-8 locale in the old version of glibc available on the Wikimedia servers, as described in T193096#4161287. Lua's table.sort uses the less-than (<) operator if no comparison function is provided (https://www.lua.org/source/5.1/ltablib.c.html#sort_comp), and the less-than operator uses strcoll when comparing two strings (https://www.lua.org/source/5.1/lvm.c.html#l_strcmp). In English Wiktionary, a module that regularly encounters non-BMP characters (> U+FFFF) uses byte-by-byte comparison, which is equivalent to code point comparison for UTF-8.
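The claim that byte-by-byte comparison matches code point order for UTF-8 can be checked with a short sketch that decodes code points explicitly and compares both orderings on sample strings. The decoder assumes valid UTF-8 and does no error checking; all function names are illustrative.

```lua
-- Decode a valid UTF-8 string into an array of code points (no validation).
local function codepoints(s)
  local cps, i = {}, 1
  while i <= #s do
    local c = s:byte(i)
    local len, cp
    if c < 0x80 then len, cp = 1, c
    elseif c < 0xE0 then len, cp = 2, c - 0xC0
    elseif c < 0xF0 then len, cp = 3, c - 0xE0
    else len, cp = 4, c - 0xF0 end
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) - 0x80) -- 6 payload bits per continuation byte
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

-- Lexicographic comparison of two arrays of numbers.
local function lessArrays(x, y)
  for k = 1, math.min(#x, #y) do
    if x[k] ~= y[k] then return x[k] < y[k] end
  end
  return #x < #y
end

-- Byte-wise comparison, independent of the C locale
-- (unpacking all bytes at once is fine for short strings).
local function lessBytes(a, b)
  return lessArrays({ a:byte(1, -1) }, { b:byte(1, -1) })
end

local function lessCodepoints(a, b)
  return lessArrays(codepoints(a), codepoints(b))
end
```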

In T49137#9165140, @Verdy_p wrote:

there are additional bugs and inconsistencies when comparing with non Lua-native strings exposed by MediaWiki, which are treated differently and use their own different ordering exposed by the Scribunto interface

What differences have you found? There is only one string type in vanilla Lua with one behavior of the < operator (assuming C locale is not changed during the running of the code). The only way to have a non-vanilla string type is to use full userdata with a metatable, and the Scribunto manual states that no userdata is used.

Do we need to compile Lua with an implementation of the '<' operator that calls strcoll(), in an attempt to use a collation that never works correctly in any locale on Wikimedia servers? Can it be forced to use binary comparison (comparing only unsigned bytes)? This would change the results, sorting all ASCII before all non-ASCII or the reverse, but that does not matter much, even if unsigned char would be preferable, so that all the non-ASCII bytes (i.e. all the rest of the BMP encoded in UTF-8, and the other planes) sort after ASCII.

I mean, if recompiling is not possible, can Lua run in a locale where strcoll() performs an 8-bit binary comparison only? That would still be suitable for UTF-8: it does not try to be smart, but at least gives consistent results. For now, strcoll() in UTF-8 with this buggy version of the C libraries (or an incorrect installation of the system support) is pure nonsense. There should exist a basic "C" locale in which strcoll() works (I don't know whether your environment uses signed or unsigned "char").

Whether or not we pass a Lua function to table.sort() does not change anything if that function still uses '<' to compare strings (so what table.sort does is correct).

At some point in the past (still at the beginning of 2021), table.sort() had a bug and we could not even pass a Lua function to it (it threw various errors that we needed to catch in Lua with pcall() to at least get the unsorted table), but apparently this has been fixed.

But having to compare strings without '<', using a Lua loop over their bytes() instead, is really too slow. A small optimization is to use a pattern with find() to check whether both strings are plain ASCII, so that we can use '<'; otherwise we need to run the loop over their bytes().
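The optimization mentioned here might look like the sketch below: use the built-in `<` only when both strings are pure ASCII (checked with a Lua pattern via `string.find`), and fall back to a byte loop otherwise. It assumes, as the comment above does, that `<` orders pure-ASCII strings bytewise on the servers in question; in a locale that collates ASCII linguistically (e.g. ignoring case at the first level) that assumption would not hold and the two paths would disagree.

```lua
local byte = string.byte

-- True if s contains only ASCII bytes (0-127).
local function isAscii(s)
  return not s:find("[\128-\255]")
end

-- Compare two strings in byte (= UTF-8 code point) order, using the
-- built-in '<' as a fast path when both strings are pure ASCII.
local function lessThan(a, b)
  if isAscii(a) and isAscii(b) then
    return a < b
  end
  local i = 1
  while true do
    local u, v = byte(a, i), byte(b, i)
    if u == nil or v == nil then
      -- One string is a prefix of the other: the shorter one sorts first
      return u == nil and v ~= nil
    elseif u ~= v then
      return u < v
    end
    i = i + 1
  end
end
```

Usage: `table.sort(t, lessThan)`.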

To do that, all that would be needed is to set a single environment variable, LC_ALL, in the user shell environment from which the Lua engine is run. On servers, from a shell, you can get the list of supported locales with "localectl list-locales" (if your Linux is based on systemd) or otherwise with one of:

cat /usr/share/i18n/SUPPORTED
locale -a
nlsinfo
ls /usr/lib/nls/loc
ls /usr/lib/locale
ls /usr/lib/nls
ls /usr/share/locale

I'm not sure which of these system paths the C libraries compiled into your Lua engine use. Avoid the "en.UTF-8" locale, which does not work for strcoll(): change that locale, or fix it with a newer installation of the C libraries and recompile Lua to use it!

In T49137#2207582, @Verdy_p wrote:

But I agree: we could still allow a Scribunto module to get a list of a limited number of subpages (e.g. 200) within a range (just like when transcluding Special:PrefixIndex). This would allow creating pages with navigation buttons to get the next or previous range, over which a script could loop.

One of the most common uses for iterating over subpages is to get the list of translations. If pages are translated using the Translate extension, it generates an index that is already retrieved and displayed by the "<languages/>" tag, but that tag only generates a list for a page with the same base page name as the list to display (we cannot pass another base name as a parameter of this tag), and we cannot control the format of the generated navbox. The other problem is that lists of languages can now include more than 500 entries, and we already hit the limit on the number of costly parser function calls (notably on Commons and Wikidata, which are multilingual and have to support many languages).

For now the only transitional solution I see is to use Wikidata to store the index of all translated pages, and then use Scribunto to make an external query via the Wikidata API. But that is an external request: it is slow and needs a cache that must be refreshed (we already have performance problems with Wikidata infoboxes, and various pages where the infobox cannot complete its many queries). The bad thing is that this requires multiple edits on Wikidata in addition to the translated pages, and Wikidata may also get out of sync.

As time passes, more and more pages are listed as having exhausted the parser function call limit, or get out of sync (which is solved by using costly bots to force refreshes of long lists of pages, performing huge numbers of queries on multiple wiki servers): Commons and Wikidata tend to become quite slow (and I don't know what will happen next when supporting Wikifunctions in the context of Abstract Wikipedia, to cover even more languages on many more topics).

The current management of what is considered "costly" is based on old limitations that could be lifted with better implementations. "Costly" parser function calls use an overly basic single metric; limits should be better tuned with fine-grained measurements. But for now the server does not collect usable statistics for effective measurement of real costs, or for identifying the hotspots in the implementation that need to be fixed to reduce actual costs. Hard-wired limits based on old assumptions, with no later maintenance or global project tracking, are quite bad: these limits stay in effect when the real costs are no longer where they were (and as time passes, the various tricky workarounds used tend to become even more costly for the servers than what the old limits prevented; such limits should not have such a long lifetime, but should be re-evaluated or replaced by other, more effective limits).

One test to perform in Scribunto on Commons:

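local p = require('Module:HTMLEntities')

local byte = string.byte

-- Compare two strings byte by byte (unsigned), independently of the C locale.
-- For valid UTF-8 this is the same as comparing by code point.
local function lowerthan(a, b)
  if type(a) ~= 'string' or type(b) ~= 'string' then
    return a < b -- non-strings: defer to '<' (raises the appropriate error on mismatched types)
  end
  local i = 1
  while true do
    local u, v = byte(a, i), byte(b, i)
    if u == nil or v == nil then
      -- End of at least one string: a < b only if a is a proper prefix of b
      return u == nil and v ~= nil
    elseif u ~= v then
      return u < v
    end
    i = i + 1
  end
end

-- Copy the entity values into a new sequence so that they can be sorted
local function entityList()
  local t = {}
  for _, v in pairs(p.entities) do
    table.insert(t, v)
  end
  return t
end

-- 1. Default sort (uses the built-in '<')
local t = entityList()
table.sort(t)
mw.logObject(table.concat(t, ', '))

-- 2. Explicit comparison function that still uses '<'
t = entityList()
table.sort(t, function(a, b) return a < b end)
mw.logObject(table.concat(t, ', '))

-- 3. Byte-wise comparison
t = entityList()
table.sort(t, lowerthan)
mw.logObject(table.concat(t, ', '))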

The first two logged results are identical, but their order is completely incoherent. This shows that the effective comparison is NOT binary, but based on a re-encoding of bytes using an unknown or uninitialized locale, where a few non-ASCII bytes get mapped to negative values and others are given seemingly random positive values >= 128, leaving ASCII bytes in the middle. UTF-8 is definitely not used as the effective locale, and the order is also not signed or unsigned 8-bit binary, and not an ISO 8859-1 locale. Running this test on different wiki servers gives a different sort order. Currently on Commons I get this:

"𝒫, 𝓁, 𝓂, 𝒽, 𝒹, 𝒰, 𝓇, 𝓊, 𝓏, 𝓉, 𝓎, 𝓋, 𝔄, 𝒯, 𝒥, 𝚲, 𝒪, 𝔭, 𝔬, 𝒬, 𝛘, 𝛖, 𝛆, 𝒟, 𝛒, 𝔲, 𝔇, 𝔉, 𝔣, 𝔥, 𝔟, 𝔞, 𝟋, 𝔪, 𝔮, 𝔼, 𝔫, 𝔳, 𝔱, 𝔈, 𝔜, 𝔘, 𝔏, 𝕥, 𝔍, 𝔊, 𝔚, 𝔑, 𝔔, 𝔒, 𝔗, 𝔖, 𝔾, 𝔰, 𝔹, 𝛗, 𝕐, 𝕓, 𝕒, 𝕗, 𝕏, 𝛡, 𝔅, 𝟊, 𝕪, 𝛂, 𝕫, 𝕕, 𝕚, 𝕡, 𝛍, 𝕧, 𝕨, 𝕛, 𝕣, 𝕝, 𝕞, 𝕟, 𝕠, 𝔯, 𝛅, 𝔛, 𝔨, 𝔩, 𝕂, 𝔧, 𝕆, 𝔻, 𝔽, 𝔤, 𝔦, 𝔸, 𝔙, 𝕃, 𝔶, 𝔢, 𝕋, 𝔡, 𝔠, 𝕄, 𝔎, 𝔷, 𝔓, 𝔵, 𝔴, 𝕀, 𝔐, 𝕦, 𝒦, 𝛜, 𝛇, 𝒢, 𝒩, 𝒞, 𝒜, 𝓌, 𝛏, 𝓍, 𝛌, 𝕁, 𝛔, 𝛎, 𝛋, 𝛑, 𝛕, 𝛞, 𝛙, 𝚷, 𝛟, 𝛊, 𝛚, 𝓆, 𝒳, 𝒲, 𝒻, 𝒸, 𝒴, 𝒱, 𝒵, 𝛠, 𝒮, 𝓅, 𝒷, 𝛀, 𝒾, 𝛝, 𝓈, 𝒶, 𝒿, 𝚼, 𝓀, 𝛉, 𝓃, 𝛄, 𝛃, 𝕢, 𝕘, 𝕙, 𝕖, 𝚫, 𝚿, 𝚽, 𝕤, 𝚪, 𝕜, 𝕩, 𝚺, 𝕔, 𝚯, 𝛈, 𝚵, 𝕍, 𝕌, 𝛓, 𝕎, 𝕊, 	, \
, !, \", \", #, $, %, &, &, ', (, ), *, *, +, ,, ., /, :, ;, <, <, <⃒, =, =⃥, >, >, >⃒, ?, @, [, [, \\, ], ], ^, _, _, `, `, fj, {, {, |, |, |, }, },  ,  , ¡, ¢, £, ¤, ¥, ¦, §, ¨, ¨, ¨, ¨, ©, ©, ª, «, ¬, , ®, ®, ®, ¯, ¯, °, ±, ±, ±, ², ³, ´, ´, µ, ¶, ·, ·, ·, ¸, ¸, ¹, º, », ¼, ½, ½, ¾, ¿, À, Á, Â, Ã, Ä, Å, Å, Æ, Ç, È, É, Ê, Ë, Ì, Í, Î, Ï, Ð, Ñ, Ò, Ó, Ô, Õ, Ö, ×, Ø, Ù, Ú, Û, Ü, Ý, Þ, ß, à, á, â, ã, ä, å, æ, ç, è, é, ê, ë, ì, í, î, ï, ð, ñ, ò, ó, ô, õ, ö, ÷, ÷, ø, ù, ú, û, ü, ý, þ, Ā, ā, Ă, ă, Ą, ą, Ć, ć, Ĉ, ĉ, Ċ, ċ, Č, č, Ď, ď, Đ, đ, Ē, ē, Ė, ė, Ę, ę, Ě, ě, Ĝ, ĝ, Ğ, ğ, Ġ, ġ, Ģ, Ĥ, ĥ, Ħ, ħ, Ĩ, ĩ, Ī, ī, Į, į, İ, ı, ı, Ĳ, ĳ, Ĵ, ĵ, Ķ, ķ, ĸ, Ĺ, ĺ, Ļ, ļ, Ľ, ľ, Ŀ, ŀ, Ł, ł, Ń, ń, Ņ, ņ, Ň, ň, ŉ, Ŋ, ŋ, Ō, ō, Ő, ő, Œ, œ, Ŕ, ŕ, Ŗ, ŗ, Ř, ř, Ś, ś, Ŝ, ŝ, Ş, ş, Š, š, Ţ, ţ, Ť, ť, Ŧ, ŧ, Ũ, ũ, Ū, ū, Ŭ, ŭ, Ů, ů, Ű, ű, Ų, ų, Ŵ, ŵ, Ŷ, ŷ, Ÿ, Ź, ź, Ż, ż, Ž, ž, ƒ, Ƶ, ǵ, ȷ, ˆ, ˇ, ˇ, ˘, ˘, ˙, ˙, ˚, ˛, ˜, ˜, ˝, ˝, ̑, Ά, Έ, Ή, Ί, Ό, Ύ, Ώ, ΐ, Α, Α, Β, Β, Γ, Γ, Δ, Δ, Ε, Ε, Ζ, Ζ, Η, Η, Θ, Θ, Ι, Ι, Κ, Κ, Λ, Λ, Μ, Μ, Ν, Ν, Ξ, Ξ, Ο, Ο, Π, Π, Ρ, Ρ, Σ, Σ, Τ, Τ, Υ, Υ, Φ, Φ, Χ, Χ, Ψ, Ψ, Ω, Ω, Ω, Ϊ, Ϋ, ά, έ, ή, ί, ΰ, α, α, β, β, γ, γ, δ, δ, ε, ε, ε, ζ, ζ, η, η, θ, θ, ι, ι, κ, κ, λ, λ, μ, μ, ν, ν, ξ, ξ, ο, ο, π, π, ρ, ρ, ς, ς, ς, ς, σ, σ, τ, τ, υ, υ, υ, φ, φ, χ, χ, ψ, ψ, ω, ω, ϊ, ϋ, ό, ύ, ώ, ϑ, ϑ, ϑ, ϒ, ϒ, ϕ, ϕ, ϕ, ϖ, ϖ, Ϝ, ϝ, ϝ, ϰ, ϰ, ϱ, ϱ, ϵ, ϵ, ϵ, ϶, ϶, Ё, Ђ, Ѓ, Є, Ѕ, І, Ї, Ј, Љ, Њ, Ћ, Ќ, Ў, Џ, А, Б, В, Г, Д, Е, Ж, З, И, Й, К, Л, М, Н, О, П, Р, С, Т, У, Ф, Х, Ц, Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я, а, б, в, г, д, е, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ы, ь, э, ю, я, ё, ђ, ѓ, є, ѕ, і, ї, ј, љ, њ, ћ, ќ, ў, џ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  , , , , , , ‌, ‍, ‎, ‏, ‐, ‐, –, —, ―, ‖, ‖, ‘, ‘, ’, ’, ’, ‚, ‚, “, “, ”, ”, ”, „, „, †, ‡, ‡, •, •, ‥, …, …, ‰, ‱, ′, ″, ‴, ‵, ‵, ‹, ›, ‾, ‾, ⁁, ⁃, ⁄, ⁏, ⁗,  ,   , ⁠, ⁡, ⁡, ⁢, ⁢, ⁣, ⁣, €, ⃛, ⃛, ⃜, ℂ, ℂ, ℅, ℊ, ℋ, ℋ, ℋ, ℌ, ℌ, ℍ, ℍ, ℎ, ℏ, ℏ, ℏ, ℏ, ℐ, ℐ, ℑ, ℑ, ℑ, ℑ, ℒ, ℒ, ℒ, ℓ, ℕ, ℕ, №, ℗, ℘, ℘, ℙ, ℙ, ℚ, ℚ, ℛ, ℛ, ℜ, ℜ, ℜ, ℜ, ℝ, ℝ, ℞, ™, ™, ℤ, ℤ, ℧, ℨ, ℨ, ℩, 
ℬ, ℬ, ℬ, ℭ, ℭ, ℯ, ℰ, ℰ, ℱ, ℱ, ℳ, ℳ, ℳ, ℴ, ℴ, ℴ, ℵ, ℵ, ℶ, ℷ, ℸ, ⅅ, ⅅ, ⅆ, ⅆ, ⅇ, ⅇ, ⅇ, ⅈ, ⅈ, ⅓, ⅔, ⅕, ⅖, ⅗, ⅘, ⅙, ⅚, ⅛, ⅜, ⅝, ⅞, ←, ←, ←, ←, ←, ↑, ↑, ↑, ↑, →, →, →, →, →, ↓, ↓, ↓, ↓, ↔, ↔, ↔, ↕, ↕, ↕, ↖, ↖, ↖, ↗, ↗, ↗, ↘, ↘, ↘, ↙, ↙, ↙, ↚, ↚, ↛, ↛, ↝, ↝, ↝̸, ↞, ↞, ↟, ↠, ↠, ↡, ↢, ↢, ↣, ↣, ↤, ↤, ↥, ↥, ↦, ↦, ↦, ↧, ↧, ↩, ↩, ↪, ↪, ↫, ↫, ↬, ↬, ↭, ↭, ↮, ↮, ↰, ↰, ↱, ↱, ↲, ↳, ↵, ↶, ↶, ↷, ↷, ↺, ↺, ↻, ↻, ↼, ↼, ↼, ↽, ↽, ↽, ↾, ↾, ↾, ↿, ↿, ↿, ⇀, ⇀, ⇀, ⇁, ⇁, ⇁, ⇂, ⇂, ⇂, ⇃, ⇃, ⇃, ⇄, ⇄, ⇄, ⇅, ⇅, ⇆, ⇆, ⇆, ⇇, ⇇, ⇈, ⇈, ⇉, ⇉, ⇊, ⇊, ⇋, ⇋, ⇋, ⇌, ⇌, ⇌, ⇍, ⇍, ⇎, ⇎, ⇏, ⇏, ⇐, ⇐, ⇐, ⇑, ⇑, ⇑, ⇒, ⇒, ⇒, ⇒, ⇓, ⇓, ⇓, ⇔, ⇔, ⇔, ⇔, ⇕, ⇕, ⇕, ⇖, ⇗, ⇘, ⇙, ⇚, ⇚, ⇛, ⇛, ⇝, ⇤, ⇤, ⇥, ⇥, ⇵, ⇵, ⇽, ⇾, ∀, ∀, ∁, ∁, ∂, ∂, ∂̸, ∃, ∃, ∄, ∄, ∄, ∅, ∅, ∅, ∅, ∇, ∇, ∈, ∈, ∈, ∈, ∉, ∉, ∉, ∋, ∋, ∋, ∋, ∌, ∌, ∌, ∏, ∏, ∐, ∐, ∑, ∑, −, ∓, ∓, ∓, ∔, ∔, ∖, ∖, ∖, ∖, ∖, ∗, ∘, ∘, √, √, ∝, ∝, ∝, ∝, ∝, ∞, ∟, ∠, ∠, ∠⃒, ∡, ∡, ∢, ∣, ∣, ∣, ∣, ∤, ∤, ∤, ∤, ∥, ∥, ∥, ∥, ∥, ∦, ∦, ∦, ∦, ∦, ∧, ∧, ∨, ∨, ∩, ∩︀, ∪, ∪︀, ∫, ∫, ∬, ∭, ∭, ∮, ∮, ∮, ∯, ∯, ∰, ∱, ∲, ∲, ∳, ∳, ∴, ∴, ∴, ∵, ∵, ∵, ∶, ∷, ∷, ∸, ∸, ∺, ∻, ∼, ∼, ∼, ∼, ∼⃒, ∽, ∽, ∽̱, ∾, ∾, ∾̳, ∿, ≀, ≀, ≀, ≁, ≁, ≂, ≂, ≂, ≂̸, ≂̸, ≃, ≃, ≃, ≄, ≄, ≄, ≅, ≅, ≆, ≇, ≇, ≈, ≈, ≈, ≈, ≈, ≈, ≉, ≉, ≉, ≊, ≊, ≋, ≋̸, ≌, ≌, ≍, ≍, ≍⃒, ≎, ≎, ≎, ≎̸, ≎̸, ≏, ≏, ≏, ≏̸, ≏̸, ≐, ≐, ≐, ≐̸, ≑, ≑, ≒, ≒, ≓, ≓, ≔, ≔, ≔, ≕, ≕, ≖, ≖, ≗, ≗, ≙, ≚, ≜, ≜, ≟, ≟, ≠, ≠, ≡, ≡, ≡⃥, ≢, ≢, ≤, ≤, ≤⃒, ≥, ≥, ≥, ≥⃒, ≦, ≦, ≦, ≦̸, ≦̸, ≧, ≧, ≧, ≧̸, ≧̸, ≧̸, ≨, ≨, ≨︀, ≨︀, ≩, ≩, ≩︀, ≩︀, ≪, ≪, ≪, ≪̸, ≪̸, ≪⃒, ≫, ≫, ≫, ≫̸, ≫̸, ≫⃒, ≬, ≬, ≭, ≮, ≮, ≮, ≯, ≯, ≯, ≰, ≰, ≰, ≱, ≱, ≱, ≲, ≲, ≲, ≳, ≳, ≳, ≴, ≴, ≵, ≵, ≶, ≶, ≶, ≷, ≷, ≷, ≸, ≸, ≹, ≹, ≺, ≺, ≺, ≻, ≻, ≻, ≼, ≼, ≼, ≽, ≽, ≽, ≾, ≾, ≾, ≿, ≿, ≿, ≿̸, ⊀, ⊀, ⊀, ⊁, ⊁, ⊁, ⊂, ⊂, ⊂⃒, ⊂⃒, ⊂⃒, ⊃, ⊃, ⊃, ⊃⃒, ⊃⃒, ⊃⃒, ⊄, ⊅, ⊆, ⊆, ⊆, ⊇, ⊇, ⊇, ⊈, ⊈, ⊈, ⊉, ⊉, ⊉, ⊊, ⊊, ⊊︀, ⊊︀, ⊋, ⊋, ⊋︀, ⊋︀, ⊍, ⊎, ⊎, ⊏, ⊏, ⊏, ⊏̸, ⊐, ⊐, ⊐, ⊐̸, ⊑, ⊑, ⊑, ⊒, ⊒, ⊒, ⊓, ⊓, ⊓︀, ⊔, ⊔, ⊔︀, ⊕, ⊕, ⊖, ⊖, ⊗, ⊗, ⊘, ⊙, ⊙, ⊚, ⊚, ⊛, ⊛, ⊝, ⊝, ⊞, ⊞, ⊟, ⊟, ⊠, ⊠, ⊡, ⊡, ⊢, ⊢, ⊣, ⊣, ⊤, ⊤, ⊥, ⊥, ⊥, ⊥, ⊧, ⊨, ⊨, ⊩, ⊪, ⊫, ⊬, ⊭, ⊮, ⊯, ⊰, ⊲, ⊲, ⊲, ⊳, ⊳, ⊳, ⊴, ⊴, ⊴, ⊴⃒, ⊵, ⊵, ⊵, ⊵⃒, ⊶, ⊷, 
⊸, ⊸, ⊹, ⊺, ⊺, ⊻, ⊽, ⊾, ⊿, ⋀, ⋀, ⋀, ⋁, ⋁, ⋁, ⋂, ⋂, ⋂, ⋃, ⋃, ⋃, ⋄, ⋄, ⋄, ⋅, ⋆, ⋆, ⋇, ⋇, ⋈, ⋉, ⋊, ⋋, ⋋, ⋌, ⋌, ⋍, ⋍, ⋎, ⋎, ⋏, ⋏, ⋐, ⋐, ⋑, ⋑, ⋒, ⋓, ⋔, ⋔, ⋕, ⋖, ⋖, ⋗, ⋗, ⋘, ⋘̸, ⋙, ⋙, ⋙̸, ⋚, ⋚, ⋚, ⋚︀, ⋛, ⋛, ⋛, ⋛︀, ⋞, ⋞, ⋟, ⋟, ⋠, ⋠, ⋡, ⋡, ⋢, ⋢, ⋣, ⋣, ⋦, ⋧, ⋨, ⋨, ⋩, ⋩, ⋪, ⋪, ⋪, ⋫, ⋫, ⋫, ⋬, ⋬, ⋬, ⋭, ⋭, ⋭, ⋮, ⋯, ⋰, ⋱, ⋲, ⋳, ⋴, ⋵, ⋵̸, ⋶, ⋷, ⋹, ⋹̸, ⋺, ⋻, ⋼, ⋽, ⋾, ⌅, ⌅, ⌆, ⌆, ⌈, ⌈, ⌉, ⌉, ⌊, ⌊, ⌋, ⌋, ⌌, ⌍, ⌎, ⌏, ⌐, ⌒, ⌓, ⌕, ⌖, ⌜, ⌜, ⌝, ⌝, ⌞, ⌞, ⌟, ⌟, ⌢, ⌢, ⌣, ⌣, ⌭, ⌮, ⌶, ⌽, ⌿, ⍼, ⎰, ⎰, ⎱, ⎱, ⎴, ⎴, ⎵, ⎵, ⎶, ⏜, ⏝, ⏞, ⏟, ⏢, ⏧, ␣, Ⓢ, Ⓢ, ─, ─, │, ┌, ┐, └, ┘, ├, ┤, ┬, ┴, ┼, ═, ║, ╒, ╓, ╔, ╕, ╖, ╗, ╘, ╙, ╚, ╛, ╜, ╝, ╞, ╟, ╠, ╡, ╢, ╣, ╤, ╥, ╦, ╧, ╨, ╩, ╪, ╫, ╬, ▀, ▄, █, ░, ▒, ▓, □, □, □, ▪, ▪, ▪, ▪, ▫, ▭, ▮, ▱, △, △, ▴, ▴, ▵, ▵, ▸, ▸, ▹, ▹, ▽, ▽, ▾, ▾, ▿, ▿, ◂, ◂, ◃, ◃, ◊, ◊, ○, ◬, ◯, ◯, ◸, ◹, ◺, ◻, ◼, ★, ★, ☆, ☎, ♀, ♂, ♠, ♠, ♣, ♣, ♥, ♥, ♦, ♦, ♪, ♭, ♮, ♮, ♯, ✓, ✓, ✗, ✠, ✠, ✶, ❘, ❲, ❳, ⟈, ⟉, ⟦, ⟦, ⟧, ⟧, ⟨, ⟨, ⟨, ⟩, ⟩, ⟩, ⟪, ⟫, ⟬, ⟭, ⟵, ⟵, ⟵, ⟶, ⟶, ⟶, ⟷, ⟷, ⟷, ⟸, ⟸, ⟸, ⟹, ⟹, ⟹, ⟺, ⟺, ⟺, ⟼, ⟼, ⤂, ⤃, ⤄, ⤅, ⤌, ⤍, ⤍, ⤎, ⤏, ⤏, ⤐, ⤐, ⤑, ⤒, ⤓, ⤖, ⤙, ⤚, ⤛, ⤜, ⤝, ⤞, ⤟, ⤠, ⤣, ⤤, ⤥, ⤥, ⤦, ⤦, ⤧, ⤨, ⤨, ⤩, ⤩, ⤪, ⤳, ⤳̸, ⤵, ⤶, ⤷, ⤸, ⤹, ⤼, ⤽, ⥅, ⥈, ⥉, ⥊, ⥋, ⥎, ⥏, ⥐, ⥑, ⥒, ⥓, ⥔, ⥕, ⥖, ⥗, ⥘, ⥙, ⥚, ⥛, ⥜, ⥝, ⥞, ⥟, ⥠, ⥡, ⥢, ⥣, ⥤, ⥥, ⥦, ⥧, ⥨, ⥩, ⥪, ⥫, ⥬, ⥭, ⥮, ⥮, ⥯, ⥯, ⥰, ⥱, ⥲, ⥳, ⥴, ⥵, ⥶, ⥸, ⥹, ⥻, ⥼, ⥽, ⥾, ⦅, ⦆, ⦋, ⦌, ⦍, ⦎, ⦏, ⦐, ⦑, ⦒, ⦓, ⦔, ⦕, ⦖, ⦚, ⦜, ⦝, ⦤, ⦥, ⦦, ⦧, ⦨, ⦩, ⦪, ⦫, ⦬, ⦭, ⦮, ⦯, ⦰, ⦱, ⦲, ⦳, ⦴, ⦵, ⦶, ⦷, ⦹, ⦻, ⦼, ⦾, ⦿, ⧀, ⧁, ⧂, ⧃, ⧄, ⧅, ⧉, ⧍, ⧎, ⧏, ⧏̸, ⧐, ⧐̸, ⧜, ⧝, ⧞, ⧣, ⧤, ⧥, ⧫, ⧫, ⧴, ⧶, ⨀, ⨀, ⨁, ⨁, ⨂, ⨂, ⨄, ⨄, ⨆, ⨆, ⨌, ⨌, ⨍, ⨐, ⨑, ⨒, ⨓, ⨔, ⨕, ⨖, ⨗, ⨢, ⨣, ⨤, ⨥, ⨦, ⨧, ⨩, ⨪, ⨭, ⨮, ⨯, ⨰, ⨱, ⨳, ⨴, ⨵, ⨶, ⨷, ⨸, ⨹, ⨺, ⨻, ⨼, ⨼, ⨿, ⩀, ⩂, ⩃, ⩄, ⩅, ⩆, ⩇, ⩈, ⩉, ⩊, ⩋, ⩌, ⩍, ⩐, ⩓, ⩔, ⩕, ⩖, ⩗, ⩘, ⩚, ⩛, ⩜, ⩝, ⩟, ⩦, ⩪, ⩭, ⩭̸, ⩮, ⩯, ⩰, ⩰̸, ⩱, ⩲, ⩳, ⩴, ⩵, ⩷, ⩷, ⩸, ⩹, ⩺, ⩻, ⩼, ⩽, ⩽, ⩽, ⩽̸, ⩽̸, ⩽̸, ⩾, ⩾, ⩾, ⩾̸, ⩾̸, ⩾̸, ⩿, ⪀, ⪁, ⪂, ⪃, ⪄, ⪅, ⪅, ⪆, ⪆, ⪇, ⪇, ⪈, ⪈, ⪉, ⪉, ⪊, ⪊, ⪋, ⪋, ⪌, ⪌, ⪍, ⪎, ⪏, ⪐, ⪑, ⪒, ⪓, ⪔, ⪕, ⪕, ⪖, ⪖, ⪗, ⪘, ⪙, ⪚, ⪝, ⪞, ⪟, ⪠, ⪡, ⪡̸, ⪢, ⪢̸, ⪤, ⪥, ⪦, ⪧, ⪨, ⪩, ⪪, ⪫, ⪬, ⪬︀, ⪭, ⪭︀, ⪮, ⪯, ⪯, ⪯, ⪯̸, ⪯̸, ⪯̸, 
⪰, ⪰, ⪰, ⪰̸, ⪰̸, ⪰̸, ⪳, ⪴, ⪵, ⪵, ⪶, ⪶, ⪷, ⪷, ⪸, ⪸, ⪹, ⪹, ⪺, ⪺, ⪻, ⪼, ⪽, ⪾, ⪿, ⫀, ⫁, ⫂, ⫃, ⫄, ⫅, ⫅, ⫅̸, ⫅̸, ⫆, ⫆, ⫆̸, ⫆̸, ⫇, ⫈, ⫋, ⫋, ⫋︀, ⫋︀, ⫌, ⫌, ⫌︀, ⫌︀, ⫏, ⫐, ⫑, ⫒, ⫓, ⫔, ⫕, ⫖, ⫗, ⫘, ⫙, ⫚, ⫛, ⫤, ⫤, ⫦, ⫧, ⫨, ⫩, ⫫, ⫬, ⫭, ⫮, ⫯, ⫰, ⫱, ⫲, ⫳, ⫽, ⫽⃥, ﬀ, ﬁ, ﬂ, ﬃ, ﬄ, ⥿, ⟿, ⇿, ÿ"

The third result, using the 'lowerthan' function, works around the broken default implementation of the '<' operator on strings by comparing bytes as unsigned values. It gives the correct unsigned 8-bit sort order (for the UTF-8 strings used on Wikimedia wikis, this is equivalent to sorting by numeric code point, so the list starts with the tab and the newline):

"	, \
, !, \", \", #, $, %, &, &, ', (, ), *, *, +, ,, ., /, :, ;, <, <, <⃒, =, =⃥, >, >, >⃒, ?, @, [, [, \\, ], ], ^, _, _, `, `, fj, {, {, |, |, |, }, },  ,  , ¡, ¢, £, ¤, ¥, ¦, §, ¨, ¨, ¨, ¨, ©, ©, ª, «, ¬, , ®, ®, ®, ¯, ¯, °, ±, ±, ±, ², ³, ´, ´, µ, ¶, ·, ·, ·, ¸, ¸, ¹, º, », ¼, ½, ½, ¾, ¿, À, Á, Â, Ã, Ä, Å, Å, Æ, Ç, È, É, Ê, Ë, Ì, Í, Î, Ï, Ð, Ñ, Ò, Ó, Ô, Õ, Ö, ×, Ø, Ù, Ú, Û, Ü, Ý, Þ, ß, à, á, â, ã, ä, å, æ, ç, è, é, ê, ë, ì, í, î, ï, ð, ñ, ò, ó, ô, õ, ö, ÷, ÷, ø, ù, ú, û, ü, ý, þ, ÿ, Ā, ā, Ă, ă, Ą, ą, Ć, ć, Ĉ, ĉ, Ċ, ċ, Č, č, Ď, ď, Đ, đ, Ē, ē, Ė, ė, Ę, ę, Ě, ě, Ĝ, ĝ, Ğ, ğ, Ġ, ġ, Ģ, Ĥ, ĥ, Ħ, ħ, Ĩ, ĩ, Ī, ī, Į, į, İ, ı, ı, Ĳ, ĳ, Ĵ, ĵ, Ķ, ķ, ĸ, Ĺ, ĺ, Ļ, ļ, Ľ, ľ, Ŀ, ŀ, Ł, ł, Ń, ń, Ņ, ņ, Ň, ň, ŉ, Ŋ, ŋ, Ō, ō, Ő, ő, Œ, œ, Ŕ, ŕ, Ŗ, ŗ, Ř, ř, Ś, ś, Ŝ, ŝ, Ş, ş, Š, š, Ţ, ţ, Ť, ť, Ŧ, ŧ, Ũ, ũ, Ū, ū, Ŭ, ŭ, Ů, ů, Ű, ű, Ų, ų, Ŵ, ŵ, Ŷ, ŷ, Ÿ, Ź, ź, Ż, ż, Ž, ž, ƒ, Ƶ, ǵ, ȷ, ˆ, ˇ, ˇ, ˘, ˘, ˙, ˙, ˚, ˛, ˜, ˜, ˝, ˝, ̑, Ά, Έ, Ή, Ί, Ό, Ύ, Ώ, ΐ, Α, Α, Β, Β, Γ, Γ, Δ, Δ, Ε, Ε, Ζ, Ζ, Η, Η, Θ, Θ, Ι, Ι, Κ, Κ, Λ, Λ, Μ, Μ, Ν, Ν, Ξ, Ξ, Ο, Ο, Π, Π, Ρ, Ρ, Σ, Σ, Τ, Τ, Υ, Υ, Φ, Φ, Χ, Χ, Ψ, Ψ, Ω, Ω, Ω, Ϊ, Ϋ, ά, έ, ή, ί, ΰ, α, α, β, β, γ, γ, δ, δ, ε, ε, ε, ζ, ζ, η, η, θ, θ, ι, ι, κ, κ, λ, λ, μ, μ, ν, ν, ξ, ξ, ο, ο, π, π, ρ, ρ, ς, ς, ς, ς, σ, σ, τ, τ, υ, υ, υ, φ, φ, χ, χ, ψ, ψ, ω, ω, ϊ, ϋ, ό, ύ, ώ, ϑ, ϑ, ϑ, ϒ, ϒ, ϕ, ϕ, ϕ, ϖ, ϖ, Ϝ, ϝ, ϝ, ϰ, ϰ, ϱ, ϱ, ϵ, ϵ, ϵ, ϶, ϶, Ё, Ђ, Ѓ, Є, Ѕ, І, Ї, Ј, Љ, Њ, Ћ, Ќ, Ў, Џ, А, Б, В, Г, Д, Е, Ж, З, И, Й, К, Л, М, Н, О, П, Р, С, Т, У, Ф, Х, Ц, Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я, а, б, в, г, д, е, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ы, ь, э, ю, я, ё, ђ, ѓ, є, ѕ, і, ї, ј, љ, њ, ћ, ќ, ў, џ,  ,  ,  ,  ,  ,  ,  ,  ,  ,  , , , , , , ‌, ‍, ‎, ‏, ‐, ‐, –, —, ―, ‖, ‖, ‘, ‘, ’, ’, ’, ‚, ‚, “, “, ”, ”, ”, „, „, †, ‡, ‡, •, •, ‥, …, …, ‰, ‱, ′, ″, ‴, ‵, ‵, ‹, ›, ‾, ‾, ⁁, ⁃, ⁄, ⁏, ⁗,  ,   , ⁠, ⁡, ⁡, ⁢, ⁢, ⁣, ⁣, €, ⃛, ⃛, ⃜, ℂ, ℂ, ℅, ℊ, ℋ, ℋ, ℋ, ℌ, ℌ, ℍ, ℍ, ℎ, ℏ, ℏ, ℏ, ℏ, ℐ, ℐ, ℑ, ℑ, ℑ, ℑ, ℒ, ℒ, ℒ, ℓ, ℕ, ℕ, №, ℗, ℘, ℘, ℙ, ℙ, ℚ, ℚ, ℛ, ℛ, ℜ, ℜ, ℜ, ℜ, ℝ, ℝ, ℞, ™, ™, ℤ, ℤ, ℧, ℨ, ℨ, 
℩, ℬ, ℬ, ℬ, ℭ, ℭ, ℯ, ℰ, ℰ, ℱ, ℱ, ℳ, ℳ, ℳ, ℴ, ℴ, ℴ, ℵ, ℵ, ℶ, ℷ, ℸ, ⅅ, ⅅ, ⅆ, ⅆ, ⅇ, ⅇ, ⅇ, ⅈ, ⅈ, ⅓, ⅔, ⅕, ⅖, ⅗, ⅘, ⅙, ⅚, ⅛, ⅜, ⅝, ⅞, ←, ←, ←, ←, ←, ↑, ↑, ↑, ↑, →, →, →, →, →, ↓, ↓, ↓, ↓, ↔, ↔, ↔, ↕, ↕, ↕, ↖, ↖, ↖, ↗, ↗, ↗, ↘, ↘, ↘, ↙, ↙, ↙, ↚, ↚, ↛, ↛, ↝, ↝, ↝̸, ↞, ↞, ↟, ↠, ↠, ↡, ↢, ↢, ↣, ↣, ↤, ↤, ↥, ↥, ↦, ↦, ↦, ↧, ↧, ↩, ↩, ↪, ↪, ↫, ↫, ↬, ↬, ↭, ↭, ↮, ↮, ↰, ↰, ↱, ↱, ↲, ↳, ↵, ↶, ↶, ↷, ↷, ↺, ↺, ↻, ↻, ↼, ↼, ↼, ↽, ↽, ↽, ↾, ↾, ↾, ↿, ↿, ↿, ⇀, ⇀, ⇀, ⇁, ⇁, ⇁, ⇂, ⇂, ⇂, ⇃, ⇃, ⇃, ⇄, ⇄, ⇄, ⇅, ⇅, ⇆, ⇆, ⇆, ⇇, ⇇, ⇈, ⇈, ⇉, ⇉, ⇊, ⇊, ⇋, ⇋, ⇋, ⇌, ⇌, ⇌, ⇍, ⇍, ⇎, ⇎, ⇏, ⇏, ⇐, ⇐, ⇐, ⇑, ⇑, ⇑, ⇒, ⇒, ⇒, ⇒, ⇓, ⇓, ⇓, ⇔, ⇔, ⇔, ⇔, ⇕, ⇕, ⇕, ⇖, ⇗, ⇘, ⇙, ⇚, ⇚, ⇛, ⇛, ⇝, ⇤, ⇤, ⇥, ⇥, ⇵, ⇵, ⇽, ⇾, ⇿, ∀, ∀, ∁, ∁, ∂, ∂, ∂̸, ∃, ∃, ∄, ∄, ∄, ∅, ∅, ∅, ∅, ∇, ∇, ∈, ∈, ∈, ∈, ∉, ∉, ∉, ∋, ∋, ∋, ∋, ∌, ∌, ∌, ∏, ∏, ∐, ∐, ∑, ∑, −, ∓, ∓, ∓, ∔, ∔, ∖, ∖, ∖, ∖, ∖, ∗, ∘, ∘, √, √, ∝, ∝, ∝, ∝, ∝, ∞, ∟, ∠, ∠, ∠⃒, ∡, ∡, ∢, ∣, ∣, ∣, ∣, ∤, ∤, ∤, ∤, ∥, ∥, ∥, ∥, ∥, ∦, ∦, ∦, ∦, ∦, ∧, ∧, ∨, ∨, ∩, ∩︀, ∪, ∪︀, ∫, ∫, ∬, ∭, ∭, ∮, ∮, ∮, ∯, ∯, ∰, ∱, ∲, ∲, ∳, ∳, ∴, ∴, ∴, ∵, ∵, ∵, ∶, ∷, ∷, ∸, ∸, ∺, ∻, ∼, ∼, ∼, ∼, ∼⃒, ∽, ∽, ∽̱, ∾, ∾, ∾̳, ∿, ≀, ≀, ≀, ≁, ≁, ≂, ≂, ≂, ≂̸, ≂̸, ≃, ≃, ≃, ≄, ≄, ≄, ≅, ≅, ≆, ≇, ≇, ≈, ≈, ≈, ≈, ≈, ≈, ≉, ≉, ≉, ≊, ≊, ≋, ≋̸, ≌, ≌, ≍, ≍, ≍⃒, ≎, ≎, ≎, ≎̸, ≎̸, ≏, ≏, ≏, ≏̸, ≏̸, ≐, ≐, ≐, ≐̸, ≑, ≑, ≒, ≒, ≓, ≓, ≔, ≔, ≔, ≕, ≕, ≖, ≖, ≗, ≗, ≙, ≚, ≜, ≜, ≟, ≟, ≠, ≠, ≡, ≡, ≡⃥, ≢, ≢, ≤, ≤, ≤⃒, ≥, ≥, ≥, ≥⃒, ≦, ≦, ≦, ≦̸, ≦̸, ≧, ≧, ≧, ≧̸, ≧̸, ≧̸, ≨, ≨, ≨︀, ≨︀, ≩, ≩, ≩︀, ≩︀, ≪, ≪, ≪, ≪̸, ≪̸, ≪⃒, ≫, ≫, ≫, ≫̸, ≫̸, ≫⃒, ≬, ≬, ≭, ≮, ≮, ≮, ≯, ≯, ≯, ≰, ≰, ≰, ≱, ≱, ≱, ≲, ≲, ≲, ≳, ≳, ≳, ≴, ≴, ≵, ≵, ≶, ≶, ≶, ≷, ≷, ≷, ≸, ≸, ≹, ≹, ≺, ≺, ≺, ≻, ≻, ≻, ≼, ≼, ≼, ≽, ≽, ≽, ≾, ≾, ≾, ≿, ≿, ≿, ≿̸, ⊀, ⊀, ⊀, ⊁, ⊁, ⊁, ⊂, ⊂, ⊂⃒, ⊂⃒, ⊂⃒, ⊃, ⊃, ⊃, ⊃⃒, ⊃⃒, ⊃⃒, ⊄, ⊅, ⊆, ⊆, ⊆, ⊇, ⊇, ⊇, ⊈, ⊈, ⊈, ⊉, ⊉, ⊉, ⊊, ⊊, ⊊︀, ⊊︀, ⊋, ⊋, ⊋︀, ⊋︀, ⊍, ⊎, ⊎, ⊏, ⊏, ⊏, ⊏̸, ⊐, ⊐, ⊐, ⊐̸, ⊑, ⊑, ⊑, ⊒, ⊒, ⊒, ⊓, ⊓, ⊓︀, ⊔, ⊔, ⊔︀, ⊕, ⊕, ⊖, ⊖, ⊗, ⊗, ⊘, ⊙, ⊙, ⊚, ⊚, ⊛, ⊛, ⊝, ⊝, ⊞, ⊞, ⊟, ⊟, ⊠, ⊠, ⊡, ⊡, ⊢, ⊢, ⊣, ⊣, ⊤, ⊤, ⊥, ⊥, ⊥, ⊥, ⊧, ⊨, ⊨, ⊩, ⊪, ⊫, ⊬, ⊭, ⊮, ⊯, ⊰, ⊲, ⊲, ⊲, ⊳, ⊳, ⊳, ⊴, ⊴, ⊴, ⊴⃒, ⊵, ⊵, ⊵, ⊵⃒, 
⊶, ⊷, ⊸, ⊸, ⊹, ⊺, ⊺, ⊻, ⊽, ⊾, ⊿, ⋀, ⋀, ⋀, ⋁, ⋁, ⋁, ⋂, ⋂, ⋂, ⋃, ⋃, ⋃, ⋄, ⋄, ⋄, ⋅, ⋆, ⋆, ⋇, ⋇, ⋈, ⋉, ⋊, ⋋, ⋋, ⋌, ⋌, ⋍, ⋍, ⋎, ⋎, ⋏, ⋏, ⋐, ⋐, ⋑, ⋑, ⋒, ⋓, ⋔, ⋔, ⋕, ⋖, ⋖, ⋗, ⋗, ⋘, ⋘̸, ⋙, ⋙, ⋙̸, ⋚, ⋚, ⋚, ⋚︀, ⋛, ⋛, ⋛, ⋛︀, ⋞, ⋞, ⋟, ⋟, ⋠, ⋠, ⋡, ⋡, ⋢, ⋢, ⋣, ⋣, ⋦, ⋧, ⋨, ⋨, ⋩, ⋩, ⋪, ⋪, ⋪, ⋫, ⋫, ⋫, ⋬, ⋬, ⋬, ⋭, ⋭, ⋭, ⋮, ⋯, ⋰, ⋱, ⋲, ⋳, ⋴, ⋵, ⋵̸, ⋶, ⋷, ⋹, ⋹̸, ⋺, ⋻, ⋼, ⋽, ⋾, ⌅, ⌅, ⌆, ⌆, ⌈, ⌈, ⌉, ⌉, ⌊, ⌊, ⌋, ⌋, ⌌, ⌍, ⌎, ⌏, ⌐, ⌒, ⌓, ⌕, ⌖, ⌜, ⌜, ⌝, ⌝, ⌞, ⌞, ⌟, ⌟, ⌢, ⌢, ⌣, ⌣, ⌭, ⌮, ⌶, ⌽, ⌿, ⍼, ⎰, ⎰, ⎱, ⎱, ⎴, ⎴, ⎵, ⎵, ⎶, ⏜, ⏝, ⏞, ⏟, ⏢, ⏧, ␣, Ⓢ, Ⓢ, ─, ─, │, ┌, ┐, └, ┘, ├, ┤, ┬, ┴, ┼, ═, ║, ╒, ╓, ╔, ╕, ╖, ╗, ╘, ╙, ╚, ╛, ╜, ╝, ╞, ╟, ╠, ╡, ╢, ╣, ╤, ╥, ╦, ╧, ╨, ╩, ╪, ╫, ╬, ▀, ▄, █, ░, ▒, ▓, □, □, □, ▪, ▪, ▪, ▪, ▫, ▭, ▮, ▱, △, △, ▴, ▴, ▵, ▵, ▸, ▸, ▹, ▹, ▽, ▽, ▾, ▾, ▿, ▿, ◂, ◂, ◃, ◃, ◊, ◊, ○, ◬, ◯, ◯, ◸, ◹, ◺, ◻, ◼, ★, ★, ☆, ☎, ♀, ♂, ♠, ♠, ♣, ♣, ♥, ♥, ♦, ♦, ♪, ♭, ♮, ♮, ♯, ✓, ✓, ✗, ✠, ✠, ✶, ❘, ❲, ❳, ⟈, ⟉, ⟦, ⟦, ⟧, ⟧, ⟨, ⟨, ⟨, ⟩, ⟩, ⟩, ⟪, ⟫, ⟬, ⟭, ⟵, ⟵, ⟵, ⟶, ⟶, ⟶, ⟷, ⟷, ⟷, ⟸, ⟸, ⟸, ⟹, ⟹, ⟹, ⟺, ⟺, ⟺, ⟼, ⟼, ⟿, ⤂, ⤃, ⤄, ⤅, ⤌, ⤍, ⤍, ⤎, ⤏, ⤏, ⤐, ⤐, ⤑, ⤒, ⤓, ⤖, ⤙, ⤚, ⤛, ⤜, ⤝, ⤞, ⤟, ⤠, ⤣, ⤤, ⤥, ⤥, ⤦, ⤦, ⤧, ⤨, ⤨, ⤩, ⤩, ⤪, ⤳, ⤳̸, ⤵, ⤶, ⤷, ⤸, ⤹, ⤼, ⤽, ⥅, ⥈, ⥉, ⥊, ⥋, ⥎, ⥏, ⥐, ⥑, ⥒, ⥓, ⥔, ⥕, ⥖, ⥗, ⥘, ⥙, ⥚, ⥛, ⥜, ⥝, ⥞, ⥟, ⥠, ⥡, ⥢, ⥣, ⥤, ⥥, ⥦, ⥧, ⥨, ⥩, ⥪, ⥫, ⥬, ⥭, ⥮, ⥮, ⥯, ⥯, ⥰, ⥱, ⥲, ⥳, ⥴, ⥵, ⥶, ⥸, ⥹, ⥻, ⥼, ⥽, ⥾, ⥿, ⦅, ⦆, ⦋, ⦌, ⦍, ⦎, ⦏, ⦐, ⦑, ⦒, ⦓, ⦔, ⦕, ⦖, ⦚, ⦜, ⦝, ⦤, ⦥, ⦦, ⦧, ⦨, ⦩, ⦪, ⦫, ⦬, ⦭, ⦮, ⦯, ⦰, ⦱, ⦲, ⦳, ⦴, ⦵, ⦶, ⦷, ⦹, ⦻, ⦼, ⦾, ⦿, ⧀, ⧁, ⧂, ⧃, ⧄, ⧅, ⧉, ⧍, ⧎, ⧏, ⧏̸, ⧐, ⧐̸, ⧜, ⧝, ⧞, ⧣, ⧤, ⧥, ⧫, ⧫, ⧴, ⧶, ⨀, ⨀, ⨁, ⨁, ⨂, ⨂, ⨄, ⨄, ⨆, ⨆, ⨌, ⨌, ⨍, ⨐, ⨑, ⨒, ⨓, ⨔, ⨕, ⨖, ⨗, ⨢, ⨣, ⨤, ⨥, ⨦, ⨧, ⨩, ⨪, ⨭, ⨮, ⨯, ⨰, ⨱, ⨳, ⨴, ⨵, ⨶, ⨷, ⨸, ⨹, ⨺, ⨻, ⨼, ⨼, ⨿, ⩀, ⩂, ⩃, ⩄, ⩅, ⩆, ⩇, ⩈, ⩉, ⩊, ⩋, ⩌, ⩍, ⩐, ⩓, ⩔, ⩕, ⩖, ⩗, ⩘, ⩚, ⩛, ⩜, ⩝, ⩟, ⩦, ⩪, ⩭, ⩭̸, ⩮, ⩯, ⩰, ⩰̸, ⩱, ⩲, ⩳, ⩴, ⩵, ⩷, ⩷, ⩸, ⩹, ⩺, ⩻, ⩼, ⩽, ⩽, ⩽, ⩽̸, ⩽̸, ⩽̸, ⩾, ⩾, ⩾, ⩾̸, ⩾̸, ⩾̸, ⩿, ⪀, ⪁, ⪂, ⪃, ⪄, ⪅, ⪅, ⪆, ⪆, ⪇, ⪇, ⪈, ⪈, ⪉, ⪉, ⪊, ⪊, ⪋, ⪋, ⪌, ⪌, ⪍, ⪎, ⪏, ⪐, ⪑, ⪒, ⪓, ⪔, ⪕, ⪕, ⪖, ⪖, ⪗, ⪘, ⪙, ⪚, ⪝, ⪞, ⪟, ⪠, ⪡, ⪡̸, ⪢, ⪢̸, ⪤, ⪥, ⪦, ⪧, ⪨, ⪩, ⪪, ⪫, ⪬, ⪬︀, ⪭, ⪭︀, ⪮, ⪯, ⪯, ⪯, 
⪯̸, ⪯̸, ⪯̸, ⪰, ⪰, ⪰, ⪰̸, ⪰̸, ⪰̸, ⪳, ⪴, ⪵, ⪵, ⪶, ⪶, ⪷, ⪷, ⪸, ⪸, ⪹, ⪹, ⪺, ⪺, ⪻, ⪼, ⪽, ⪾, ⪿, ⫀, ⫁, ⫂, ⫃, ⫄, ⫅, ⫅, ⫅̸, ⫅̸, ⫆, ⫆, ⫆̸, ⫆̸, ⫇, ⫈, ⫋, ⫋, ⫋︀, ⫋︀, ⫌, ⫌, ⫌︀, ⫌︀, ⫏, ⫐, ⫑, ⫒, ⫓, ⫔, ⫕, ⫖, ⫗, ⫘, ⫙, ⫚, ⫛, ⫤, ⫤, ⫦, ⫧, ⫨, ⫩, ⫫, ⫬, ⫭, ⫮, ⫯, ⫰, ⫱, ⫲, ⫳, ⫽, ⫽⃥, ﬀ, ﬁ, ﬂ, ﬃ, ﬄ, 𝒜, 𝒞, 𝒟, 𝒢, 𝒥, 𝒦, 𝒩, 𝒪, 𝒫, 𝒬, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵, 𝒶, 𝒷, 𝒸, 𝒹, 𝒻, 𝒽, 𝒾, 𝒿, 𝓀, 𝓁, 𝓂, 𝓃, 𝓅, 𝓆, 𝓇, 𝓈, 𝓉, 𝓊, 𝓋, 𝓌, 𝓍, 𝓎, 𝓏, 𝔄, 𝔅, 𝔇, 𝔈, 𝔉, 𝔊, 𝔍, 𝔎, 𝔏, 𝔐, 𝔑, 𝔒, 𝔓, 𝔔, 𝔖, 𝔗, 𝔘, 𝔙, 𝔚, 𝔛, 𝔜, 𝔞, 𝔟, 𝔠, 𝔡, 𝔢, 𝔣, 𝔤, 𝔥, 𝔦, 𝔧, 𝔨, 𝔩, 𝔪, 𝔫, 𝔬, 𝔭, 𝔮, 𝔯, 𝔰, 𝔱, 𝔲, 𝔳, 𝔴, 𝔵, 𝔶, 𝔷, 𝔸, 𝔹, 𝔻, 𝔼, 𝔽, 𝔾, 𝕀, 𝕁, 𝕂, 𝕃, 𝕄, 𝕆, 𝕊, 𝕋, 𝕌, 𝕍, 𝕎, 𝕏, 𝕐, 𝕒, 𝕓, 𝕔, 𝕕, 𝕖, 𝕗, 𝕘, 𝕙, 𝕚, 𝕛, 𝕜, 𝕝, 𝕞, 𝕟, 𝕠, 𝕡, 𝕢, 𝕣, 𝕤, 𝕥, 𝕦, 𝕧, 𝕨, 𝕩, 𝕪, 𝕫, 𝚪, 𝚫, 𝚯, 𝚲, 𝚵, 𝚷, 𝚺, 𝚼, 𝚽, 𝚿, 𝛀, 𝛂, 𝛃, 𝛄, 𝛅, 𝛆, 𝛇, 𝛈, 𝛉, 𝛊, 𝛋, 𝛌, 𝛍, 𝛎, 𝛏, 𝛑, 𝛒, 𝛓, 𝛔, 𝛕, 𝛖, 𝛗, 𝛘, 𝛙, 𝛚, 𝛜, 𝛝, 𝛞, 𝛟, 𝛠, 𝛡, 𝟊, 𝟋"

(Note that there are duplicate strings; this is expected in this simplified test. The table is the reverse index of HTML5 entities, and several equivalent named entities (not shown here) map to the same string. I could have removed duplicates before sorting, but this does not matter here and the test is still useful: it shows that table.sort() is not the cause of the problem, since it sorts the table correctly even with duplicates. The only cause is the incorrect implementation of the builtin '<' operator on plain Lua strings.)
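Removing the duplicates before sorting, as the note mentions, can be sketched as follows. This assumes the input is a table like `p.entities` whose values are the strings to sort; the helper name `distinct_values` is hypothetical and not part of Module:HTMLEntities.

```lua
-- Hypothetical helper: return the distinct values of a table (for example a
-- reverse entity index, where several keys map to the same string) as a new
-- array. The order of the result follows pairs() and is therefore unspecified.
local function distinct_values(map)
  local seen, out = {}, {}
  for _, v in pairs(map) do
    if not seen[v] then
      seen[v] = true      -- remember this value so later copies are skipped
      out[#out + 1] = v
    end
  end
  return out
end
```

In the test, one would then write `local t = distinct_values(p.entities)` followed by `table.sort(t, lowerthan)`, so each string is sorted and logged only once.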

Unfortunately, this cannot be fixed from Lua by setting an __lt metamethod on the string metatable, because that metamethod is never called when both operands are plain strings. The only possible fixes are in C: either set the correct locale for strcoll(), or recompile Lua with a macro so that strcoll() is replaced by strcmp(), which compares the bytes as unsigned char.
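The claim that a string metamethod cannot help can be checked with a short experiment. This is a sketch for a standalone Lua 5.1 interpreter, and it assumes the shared string metatable is writable there; Scribunto's sandbox may protect it, in which case the assignment may fail.

```lua
-- Assumption: getmetatable('') returns the real, writable string metatable
-- (true for a standalone interpreter; Scribunto's sandbox may differ).
local mt = getmetatable('')
local called = false

-- Experimental __lt that records whether it is ever consulted.
mt.__lt = function(a, b)
  called = true
  return false
end

-- Both operands are plain strings, so the VM compares them directly
-- (via strcoll) without looking up __lt.
local result = ('a' < 'b')

mt.__lt = nil  -- remove the experimental metamethod again
print('result:', result, 'metamethod called:', called)
```

Since `called` stays false, overriding '<' for strings has to happen in the C code of the interpreter, not in Lua.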

Related old bugs of strcoll() in glibc (before version 2.35, which may eventually fix it: "LC_COLLATE=C.UTF-8", like any collation for UTF-8-based locales, never worked correctly before early 2022; only the "C" locale was coherent):

See also this old CVE-tracked bug, open since 2012. In summary: DON'T USE strcoll() WITH GLIBC! Provide a safe and stable alternative, like what Oracle provides in its database engines with its "binary8" locale, or use strcmp() instead of strcoll(). The current glibc patch only partly fixes this, for the new builtin "C.UTF-8" locale and no other, and it is unstable: a stable UTF-8 collation cannot be created while Unicode and CLDR versions keep evolving, and we don't know which versions of Unicode and CLDR/DUCET it is based on. It may not even match what the wiki database engines use, and neither Lua nor Scribunto provides any control over locales and their collation for the builtin '<' operator on Lua strings:

https://bst.cisco.com/quickview/bug/CSCuc22096 [D9902] GNU glibc strcoll() Function Remote Buffer Overflow
https://sourceware.org/bugzilla/show_bug.cgi?id=14547 Bug 14547 (CVE-2012-4412) - strcoll integer / buffer overflow (CVE-2012-4412, CVE-2012-4424)

If we ever need a locale-sensitive collation, we should use ICU rather than glibc, and not for the default implementation of the Lua '<' operator on strings (which should be locale-neutral and stable in Scribunto on all wiki servers), but only as a separate function (possibly integrated into the "mw.ustring" or "mw.text" modules, specifically tuned for full and stable UTF-8 support).
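Because the default '<' should be locale-neutral and stable on every wiki server, a quick way to compare servers without reading the whole entity list is to probe whether '<' orders the 256 single-byte strings by unsigned byte value. This is only a rough sketch (the function name `builtin_lt_is_bytewise` is hypothetical): it checks single bytes only, so a true result does not prove that multi-byte UTF-8 strings are compared byte-wise. In a standalone interpreter running in the default "C" locale, it should report true.

```lua
-- Hypothetical probe: does the builtin '<' order every pair of adjacent
-- single-byte strings by unsigned byte value, as strcmp() would?
-- Only single bytes are tested; multi-byte strings are not covered.
local function builtin_lt_is_bytewise()
  for i = 0, 254 do
    if not (string.char(i) < string.char(i + 1)) then
      return false, i  -- first byte value where the expected order breaks
    end
  end
  return true
end

local ok, at = builtin_lt_is_bytewise()
print(ok and 'bytewise' or ('not bytewise at byte ' .. at))
```

On a server whose collation is not binary, the second return value points at the first byte value where the order breaks, which can then be compared across wikis.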

Lepticed7 unsubscribed.Sep 14 2023, 4:47 PM

mirror-kt subscribed.Dec 28 2023, 12:12 AM

onei unsubscribed.Feb 11 2024, 8:11 PM
