
CirrusSearch doesn't find terms that appear in wikitext but not in rendered text
Closed, Resolved, Public

Description

+++ This bug was initially created as a clone of Bug #54502 +++

<p858snake|l> ^d: search isn't looking inside templates anymore? https://www.mediawiki.org/w/index.php?search=AllowImageTag&button=&title=Special%3ASearch (used to find a result)
<p858snake|l> also it doesn't find the page, unless you search its exact title either... https://www.mediawiki.org/w/index.php?search=wgAllowImageTag&button=&title=Special%3ASearch
<^d> Hmm, should. Let's have a looksee.
<p858snake|l> it should find https://www.mediawiki.org/wiki/Manual:$wgAllowExternalImages#See_also at least
<^d> Right. File a bug, that looks...wrong.


Version: unspecified
Severity: normal

Details

Reference
bz54503

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 22 2014, 2:13 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz54503.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #1)

Works for me:
Searching for
https://www.mediawiki.org/w/index.php?search=wgAllowImageTag&button=&title=Special%3ASearch
finds https://www.mediawiki.org/wiki/Manual:$wgAllowImageTag as expected.

Please reopen if it stops working again.

Well, of course it does if you search its full name minus the namespace.

Try:

https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=AllowImageTag&fulltext=Search

Sorry, I had trouble parsing the IRC conversation into a bug. So if you search with Lucene search (https://www.mediawiki.org/w/index.php?search=AllowImageTag&button=&title=Special%3ASearch&srbackend=LuceneSearch) you get two results:
Manual:$wgAllowExternalImages and Manual:$wgAllowImageTag. With CirrusSearch you get no results.

LuceneSearch finds both pages because the term AllowImageTag exists inside a template. In both cases AllowImageTag is a parameter passed to the wg template. CirrusSearch doesn't find either one because when the template is expanded it doesn't contain the string AllowImageTag. This is the expected behaviour for both extensions.
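A toy illustration of why the rendered text misses the term. This is a sketch under stated assumptions, not CirrusSearch code: `expand_wg` is a hypothetical stand-in that assumes `{{wg|AllowImageTag}}` renders as `$wgAllowImageTag`, and `tokenize` is a crude lowercase-and-split analyzer rather than the real analysis chain.

```python
import re

def tokenize(text: str) -> list[str]:
    # Crude analyzer: split on runs of non-alphanumeric characters, then lowercase.
    # A rough stand-in for a word tokenizer, NOT CirrusSearch's actual analysis chain.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def expand_wg(wikitext: str) -> str:
    # Hypothetical template expansion: ASSUMES {{wg|Name}} renders as "$wgName".
    return re.sub(r"\{\{wg\|([^}|]+)\}\}", r"$wg\1", wikitext)

WIKITEXT = "See also {{wg|AllowImageTag}}."
RENDERED = expand_wg(WIKITEXT)  # "See also $wgAllowImageTag."

# Indexing wikitext (LuceneSearch-style) yields a standalone "allowimagetag" term...
wikitext_terms = tokenize(WIKITEXT)
# ...while indexing rendered text only yields the fused "wgallowimagetag".
rendered_terms = tokenize(RENDERED)

if __name__ == "__main__":
    print(wikitext_terms)
    print(rendered_terms)
```

In the wikitext the template parameter is its own term; after expansion it is glued to the `$wg` prefix, so a query for `AllowImageTag` has nothing to match against.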

I think the part of the IRC conversation about CirrusSearch not finding pages with an exact title is really describing this behaviour.

I'm not really sure how to prioritize this given that searching the rendered version of the article rather than the wikitext that built it is an explicit design goal and one that lots of folks are excited about. CirrusSearch certainly won't switch that off.

We could solve this bug in a bunch of ways, none of which I'm particularly happy with:

  1. Parse camelCase text specially, splitting terms on case change. This would fix this particular problem and may even be useful for MW.org but doesn't solve all template contents searching problems and doesn't make sense outside of MW.org.
  2. Index wikitext in another field and query against it when querying against text. Only show wikitext highlights if there aren't any matches in the rendered text. This has a ton of overhead because we'd be doubling up on the largest field, and it would create confusing highlighting for readers, though only in cases where the search wouldn't have found anything otherwise.
  3. Index template text next to its expanded version in the page text. This would make highlighting look hideous, would be really hard to implement, and would bloat the page text mightily.
  4. Allow term-fragmenting hints, which aren't rendered but are passed to search, to be inserted into wikitext. This creates a bunch of work for the community and a bunch of development work, but provides a general solution to the problem. It certainly could be abused, though.
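Option 1 could look roughly like the following hypothetical Python analyzer; it is a sketch, not anything CirrusSearch ships. `split_camel` breaks a word at case changes (keeping an acronym such as `HTML` together), and `camel_tokens` indexes every contiguous run of the pieces, so `AllowImageTag` becomes a term inside `$wgAllowImageTag`. Both function names are made up for illustration.

```python
import re

# Split points: a lowercase letter or digit followed by an uppercase letter
# ("wg|Allow"), or an acronym followed by a capitalized word ("HTML|Parser").
_CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

def split_camel(word: str) -> list[str]:
    # Zero-width splits need Python 3.7+.
    return [p for p in _CASE_BOUNDARY.split(word) if p]

def camel_tokens(text: str) -> set[str]:
    tokens: set[str] = set()
    for word in re.findall(r"[0-9A-Za-z]+", text):
        parts = [p.lower() for p in split_camel(word)]
        # Emit every contiguous run of parts: "wgallowimagetag" (the original
        # word), "allowimagetag", "imagetag", "allow", "tag", and so on.
        for i in range(len(parts)):
            for j in range(i + 1, len(parts) + 1):
                tokens.add("".join(parts[i:j]))
    return tokens

if __name__ == "__main__":
    print(split_camel("wgAllowImageTag"))
    print(sorted(camel_tokens("$wgAllowImageTag")))
```

The cost is visible in the output: fragments like `image` and `tag` also become index terms, so a plain search for `image` would match every `$wg...Image...` setting. That noise is part of why this only makes sense on MW.org. Elasticsearch's `word_delimiter` token filter (`split_on_case_change`) covers the splitting half of this without custom code.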

Moving to low, along with the other feature requests. I know this is a parity thing, but it is the opposite of the core "we index the expanded templates" feature of CirrusSearch.

I'm going to dupe this to bug 43652 for allowing free querying of the search index (which will also contain unparsed wikitext).

*** This bug has been marked as a duplicate of bug 43652 ***