
Improving search for templates
Closed, Declined, Public

Description

Author: andy

Description:
A search for, say, "{{Authority control}}" should be treated as a search for "Template:Authority control" (or at least return that template page ahead of other search results).

Similarly, a search for "{{Authority" should return all templates whose titles begin with "Template:Authority".
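To make the requested behaviour concrete, here is a minimal sketch of the query rewrite in Python. It is only an illustration: `rewrite_template_query` and its return values are hypothetical names, not part of MediaWiki or any search extension, and the first-letter capitalisation assumes a wiki where titles are case-insensitive in their first character.

```python
import re

# Hypothetical pattern: "{{" + a template name + an optional closing "}}".
_TEMPLATE_QUERY = re.compile(r"^\{\{\s*([^{}|]+?)\s*(\}\})?$")


def rewrite_template_query(query, namespace="Template"):
    """Map a '{{...' search string to (mode, title).

    mode is 'exact' when the braces are closed, 'prefix' when they are
    left open, and 'fulltext' when the query is not a template query.
    """
    match = _TEMPLATE_QUERY.match(query.strip())
    if match is None:
        return ("fulltext", query)
    name = match.group(1).strip()
    if not name:
        return ("fulltext", query)
    # Assumption: the wiki capitalises the first letter of page titles.
    name = name[0].upper() + name[1:]
    mode = "exact" if match.group(2) else "prefix"
    return (mode, f"{namespace}:{name}")


print(rewrite_template_query("{{Authority control}}"))
print(rewrite_template_query("{{Authority"))
```

An 'exact' result would then be looked up as a page title, and a 'prefix' result would be run as a title-prefix search restricted to the template namespace.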


Version: unspecified
Severity: enhancement

Details

Reference
bz32655

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 12:00 AM
bzimport set Reference to bz32655.
bzimport added a subscriber: Unknown Object (MLST).

orenbochman wrote:

Thanks for the suggestion. I am looking into smarter indexing of wikisource and will consider this feature in the design.

To give reliable results for such queries in the search engine:

  1. The indexer would need to carry out template expansion of MediaWiki source.

1.1 Access to all the templates.
1.2 Re-implementation of magic words and of parser-function math and logic operators.
1.3 Reindexing all dependent pages whenever a template changes.

  2. The parser would then have to further analyse the wiki source to tokenize templates.
  3. A good information architecture for storing template annotations, so that they are indexed in the correct order without creating gaps in the surrounding text.

3.1 Use a position increment of 0 for wiki text, or
3.2 Store template information in a separate field, or
3.3 A GATE-type multi-document source (a unified diff of wiki source and HTML output), or
3.4 Replace the document term position vector with a DAWG term tree, or
3.5 Template annotation tokens would need to be amenable to prefix queries.

  4. A Lucene query to retrieve templates (exact|prefix|category).
  5. A sensible ranking mechanism for such results.
  6. A UI modification to allow exact source search.
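As a rough illustration of the "separate field" option and of exact/prefix retrieval, here is a small in-memory sketch in Python. It is not Lucene and not how lucene-search2 works: `template_names` and `TemplateFieldIndex` are hypothetical names, the regex only finds template call names without expanding anything, and parser functions, names containing a colon, and `{{{parameters}}}` are simply skipped.

```python
import bisect
import re
from collections import defaultdict

# Name of a template call: text after "{{" up to "|" or "}}".
# Simplification: anything starting with "#" or containing ":" is skipped,
# and triple-brace parameters are excluded by the look-around checks.
_TEMPLATE_NAME = re.compile(
    r"(?<!\{)\{\{(?!\{)\s*([^{}|#:]+?)\s*(?=\||\}\})"
)


def template_names(wikitext):
    """Return normalised template names used in a piece of wikitext."""
    names = []
    for match in _TEMPLATE_NAME.finditer(wikitext):
        name = " ".join(match.group(1).split())
        if name:
            # Assumption: first letter of a title is case-insensitive.
            names.append(name[0].upper() + name[1:])
    return names


class TemplateFieldIndex:
    """Toy stand-in for a dedicated 'templates' field in the index."""

    def __init__(self):
        self._pages_by_template = defaultdict(set)
        self._sorted_names = []  # kept sorted so prefix lookups can bisect

    def add_page(self, title, wikitext):
        for name in template_names(wikitext):
            if name not in self._pages_by_template:
                bisect.insort(self._sorted_names, name)
            self._pages_by_template[name].add(title)

    def exact(self, name):
        """Pages that use exactly this template."""
        return sorted(self._pages_by_template.get(name, ()))

    def prefix(self, prefix):
        """Template names starting with the given prefix."""
        start = bisect.bisect_left(self._sorted_names, prefix)
        hits = []
        for name in self._sorted_names[start:]:
            if not name.startswith(prefix):
                break
            hits.append(name)
        return hits


index = TemplateFieldIndex()
index.add_page("Page A", "Intro {{Authority control}} {{Cite book|title=X}}")
index.add_page("Page B", "{{authority control}}{{Authority file|x}}")
print(index.exact("Authority control"))
print(index.prefix("Authority"))
```

Keeping template names in their own field leaves the term positions of the body text untouched, which is the concern raised in item 3; ranking and category retrieval are left out of this sketch.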

rainman wrote:

This seems like a massive effort that will need lots of maintenance.

Why not store the expanded wikitext somewhere in the database (or keep a reference to one of the caching layers) and then query that, instead of the normal wikitext, via OAI?

[Merging "MediaWiki extensions/Lucene Search" into "Wikimedia/lucene-search2", see bug 46542. You can filter bugmail for: search-component-merge-20130326 ]

Someone who types "{{Authority control}}" as a search query is likely to be a very experienced user. They'll know what the template namespace is. I don't think it really makes sense to add specific, custom functionality for this when we already have an advanced search option to search template namespace.

Also, Lucene is reaching the end of its life and I'm in the process of clearing out old bugs like this. Any future feature requests for search should be filed in MediaWiki Extensions -> CirrusSearch.

Changing to RESOLVED WONTFIX.