Option to sort search results by size, number of words and date in advanced
Closed, ResolvedPublic

Description

Since the search results give the size and date of latest modification (at the time of the indexing), I suppose the results can be sorted by size and date. So it's a request to add those options for advanced searches, below Search by namespaces:

Sort by:
relevance (the default) . size . date of latest modification

Both would be helpful in particular for certain maintenance tasks, sorting by size for stubs and short pages, and sorting by date of latest modification to detect pages not edited for a long time for example.


Version: unspecified
Severity: enhancement

Details

Reference
bz21139

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:56 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz21139.
bzimport added a subscriber: Unknown Object (MLST).

Robert, is that feasible for the Lucene index?

To do it in the MySQL backend we'd need to add those fields to searchindex (or else build a temporary table with the results -- anything to avoid joining searchindex and page, which is the Performance Kiss Of Death), and I'm still not sure how efficiently that would run.

rd232 wrote:

Any update on feasibility of sorting search results?

nilesh wrote:

Can Lucene 3.4.0 (at least) be used with MediaWiki? Are there any problems/limitations?

It has support for index-time joins, which denormalize relational database tables so that we can search over those tables in real time without any joining overhead. Please check the v3.6.0 docs here: http://lucene.apache.org/core/3_6_0/api/contrib-join/org/apache/lucene/search/join/package-summary.html

What do you think about using it?

(In reply to comment #3)

Can Lucene 3.4.0 (at least) be used with MediaWiki? Are there any problems/limitations?

https://wikitech.wikimedia.org/wiki/Search has info about the search infra and what's used with what. Let's please keep this bug report specific and focused.

Cenarium renamed this task from Option to sort search results by size and date in advanced to Option to sort search results by size, number of words and date in advanced. Dec 14 2014, 2:31 AM
Cenarium added a project: CirrusSearch.
Cenarium set Security to None.

Added the CirrusSearch project since it's the new search engine used by Wikimedia wikis. Also added number of words as a criterion, since it is also shown in search results.

Just noticed T11519, I'll file a separate bug for date of latest modification, and another for multimedia search options.

Oops, not a duplicate, this is a request for sorting the results, not for filtering them. Note that for multimedia searches, size should mean file size.

We've added some of this sorting capability since this was filed, but page size is not among those updates. Since that's captured in another ticket, we'll close this one.

See the sort parameter to the search api, added in the last few months: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=search&srsearch=qqq&srinterwiki=1&srsort=relevance

Available sorts include last edited date, incoming link counts and relevance. Sorting by creation date is coming soon. We could plausibly add sorting by word count, but we would need to expand on the use case.
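
For anyone who wants to try the sort parameter outside the sandbox, here is a minimal sketch that builds the same kind of request URL as the link above. The endpoint path, the `srlimit` parameter and the `last_edit_desc` value are assumptions on my part (only `srsort=relevance` appears in the link), and `build_search_url` is a hypothetical helper name, so check the search module's API help for the values your wiki actually accepts.

```python
from urllib.parse import urlencode

# Assumed endpoint: the api.php entry point behind Special:ApiSandbox on en.wikipedia.org.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, sort="relevance", limit=10):
    """Build a list=search request URL with an explicit srsort value.

    Hypothetical helper for illustration; it only builds the URL and
    does not send any request.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": query,
        "srsort": sort,    # "relevance" is the value used in the sandbox link
        "srlimit": limit,  # assumed parameter: maximum number of results
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Assumed value for "most recently edited first"; verify it against the API help.
print(build_search_url("qqq", sort="last_edit_desc"))
```

Fetching the printed URL should return the results in the requested order, provided the wiki's search backend supports that sort.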

EBjune claimed this task.