Page MenuHomePhabricator

Introduce Special:Suffixindex
Open, LowPublicFeature

Description

Author: eep

Description:
Is this possible? Some requests/discussions about it (and the problems caused by NOT having it):

http://meta.wikimedia.org/wiki/MediaWiki_Ideal#Suffixindex
http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_%28disambiguation_pages%29/Archive_29#All_pages_beginning_with...

Of course, wildcard search (such as implemented in http://www.mediawiki.org/wiki/Extension:Wildcard_search ) would probably work too, but it would be nice if MediaWiki had that functionality by default.


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Wikipedia_talk:Special:PrefixIndex#SuffixIndex

Details

Reference
bz10808

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 9:52 PM
bzimport set Reference to bz10808.
bzimport added a subscriber: Unknown Object (MLST).

No DB support for doing this efficiently.

eep wrote:

Well, how about changing the DB then? :)

eep wrote:

Using Dynamic Page List (http://www.mediawiki.org/wiki/Extension:DynamicPageList ), a simple single-line query can do this (and MUCH more): {{#dpl:namespace=|titlematch=%{{PAGENAME}}}}

Why can't MediaWiki do this by default? Doesn't seem like it would be that hard to do...

Because it's very expensive, requiring a scan of up to several million page entries. It's only efficient on a very small wiki.

eep wrote:

I just don't get how a prefix index can somehow be magically OK yet a suffix index is soo expensive? Limit the results to x per query or something if it's THAT expensive. There are most definitely (and certainly) ways of limiting database queries so they're not so "expensive"...

A prefix index is just an index. Because of the way indexes work in the database, doing an exact-prefix match comes "for free".

We didn't have to do *anything* to get the prefix index. It was already there.

Limiting results would not have any useful effect -- to be efficient, you have to limit the amount of data you search through. And that requires having a sensible index for it.

To do a _suffix_ index, we'd have to:

  1. Create a second indexed field with a *reversed* copy of the content
  1. Populate that field throughout the entire database -- an expensive one-time setup operation which will require a few hours' work, careful coordination, and some downtime for the site.
  1. Keep that field around forever.

While it's possible to do it, there's not a lot of interest in it and it's a minor feature at best for a big change, so you'll probably not see people tripping over themselves to implement it right away.

eep wrote:

On the way to a proper and decent search engine then?

robchur wrote:

*** Bug 11395 has been marked as a duplicate of this bug. ***

Re-opening this. There are legitimate use cases for this (like Wiktionary, cf. bug 17999). Nothing's really holding this up except a developer willing to implement it (i.e., no reason to leave this as LATER).

public.wiki wrote:

*** Bug 21864 has been marked as a duplicate of this bug. ***

public.wiki wrote:

*** Bug 21864 has been marked as a duplicate of this bug. ***

From an outsider's point of view it looks like an oversight to only have [[Special:PrefixIndex]].

On it you could put mention the reason for no complimentary [[Special:SuffixIndex]] there for people to see...

Tons of times when I wanted to use it...

"Let's find all the "... Agency"s on my wiki", etc.

bugs wrote:

(In reply to comment #12)

"Let's find all the "... Agency"s on my wiki", etc.

...or "pages in Chinese (PAGE/zh)". :-)

I wonder if the page_props table could be used for this. The current schema has the following unique index:

CREATE UNIQUE INDEX /*i*/pp_page_propname ON /*_*/page_props (pp_page,pp_propname);

If a second index were added on (pp_propname, pp_value), you could store the reversed page title with a "suffixindex" key fairly easily, couldn't you? Except that pp_value is a blob... bleh.

Because someone (like me) could look for this feature here, I leave here a link to a Toolserver tool that can be used to get such titles (thank you MZMcBride), for instance http://toolserver.org/~mzmcbride/yanker/?db=enwiktionary_p&list=pages&category=&page=&pattern=.%2Bps%24&namespace=0&limit=5000&sort=&wrapper=%241

So say we want a list of All things ending with "ed" - would you want that list in alphabetical order? (If the answer is yes this would be really difficult to do, if no then less difficult). Since if you want it in alphabetical order we'd basically need an index sorting it in both directions since first we jump to everything ending in "ed", and then we need to sort from the other end (normal direction) of the word. Otherwise in our suffix search (for say the suffix "ed") we'd have sorting like "yielded, zigzagged, recovered, edited, amortized" (Note how the third last letter of each word is alphabetically ordered)

I suppose one could create a table that for each page name had a partially reversed page name, and to what point its reversed, for all possible points-

For example for the page Fred (with page_id 1) we'd have:

page_id | page_reversed_point | page_reversed
--------+---------------------+--------------
1 | 1 | dFre
1 | 2 | edFr
1 | 3 | redF
1 | 4 | Fred
--------+---------------------+--------------

So each page would have up to 255 entries depending on how long the title is. (because the user could search for a suffix between 1 to 255 letters long). However, having such a table would be INSANE ;)

Thus unless SuffixIndex would be useful with results in non-alphabetical order (which seems unlikely) this should be wontfixed as unfeasible imo.

(In reply to comment #16)

Thus unless SuffixIndex would be useful with results in non-alphabetical order
(which seems unlikely) this should be wontfixed as unfeasible imo.

Alphabetical order doesn't seem necessary at all to me.

(In reply to comment #16)

Thus unless SuffixIndex would be useful with results in non-alphabetical order
(which seems unlikely) this should be wontfixed as unfeasible imo.

I don't think it's that infeasible to add a page.page_title_reversed column and index it (possibly in a page_title_reversed,page_title pair).

(In reply to comment #18)

(In reply to comment #16)

Thus unless SuffixIndex would be useful with results in non-alphabetical order
(which seems unlikely) this should be wontfixed as unfeasible imo.

I don't think it's that infeasible to add a page.page_title_reversed column and
index it (possibly in a page_title_reversed,page_title pair).

My point was that would result in a non-alphabeticaly ordered suffix list. If that's ok then it is potentially feasible to do, if a non-alphabetical list is not acceptable, it would not be possible to reasonably do this (As far as I can tell).

Just add an SQL "ORDER BY" clause at the end.

(In reply to comment #20)

Just add an SQL "ORDER BY" clause at the end.

And it will be sorted by reverse title. Not quite useful.

*** Bug 2168 has been marked as a duplicate of this bug. ***

There is at least some DB support for doing this efficiently.

In PostgreSQL, since 9.1, there is a built-in function for reversing strings: http://www.postgresql.org/docs/9.2/static/functions-string.html

In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)

It is also possible to build indexes on expressions at least since 7.4: http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't find any older documentation)

(In reply to comment #23)

There is at least some DB support for doing this efficiently.

In PostgreSQL, since 9.1, there is a built-in function for reversing strings:
http://www.postgresql.org/docs/9.2/static/functions-string.html

In 9.0 and earlier, the function can be built in PL/pgSQL (search for it :)

It is also possible to build indexes on expressions at least since 7.4:
http://www.postgresql.org/docs/7.4/static/indexes-expressional.html (didn't
find any older documentation)

We use mysql (technically mariadb) not postgress

(In reply to comment #24)

We use mysql (technically mariadb) not postgress

It would be nice if PostgreSQL support also had some things *more* in MediaWiki, isn't it possible to add a feature depending on the used DB?

(In reply to comment #25)

(In reply to comment #24)

We use mysql (technically mariadb) not postgress

It would be nice if PostgreSQL support also had some things *more* in
MediaWiki, isn't it possible to add a feature depending on the used DB?

Its possible, but something we don't generally do (in core anyways. Extensions can do what they want).

Typically our db compatibility policy is that it must work on mysql, as mysql is our primary supported db platform.


Adding a fancy index like that in postgress, is probably not that much different from adding an extra field in mysql with the title reversed.

Tgr subscribed.

This would make more sense as a CirrusSearch feature request. ElasticSearch supports wildcards at any position in the search term, and CirrusSearch already supports them, so the only step missing is correctly interpreting combined conditions such as intitle:*foo, as far as I can see.

In T12808#1218828, @Tgr wrote:

This would make more sense as a CirrusSearch feature request. ElasticSearch supports wildcards at any position in the search term, and CirrusSearch already supports them, so the only step missing is correctly interpreting combined conditions such as intitle:*foo, as far as I can see.

Regex works now in intitle.

Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 4 2022, 11:01 AM