Page MenuHomePhabricator

Postgres - Searches that include a number as a term always return no results
Closed, DeclinedPublic

Description

Author: reed

Description:
I couldn't find anything about this in the MediaWiki manual, or in Bugzilla, in the Release Notes for 1.13, or in Google:

If I include a number (a numeral) in a search query, or a word that starts with a number, I always get 0 results, even though I know I have pages that have that number in them. For example, if I have a wiki page about 64-bit PC architecture, I can search for "bit" and "pc" and "bit pc" and "pc bit" and see the page in the results, but if I search for "64 bit" or "64bit" or "64bit PC" or "64 bit architecture" or "pc 64 bit architecture" I always get 0 results.

Thanks

  • MediaWiki: 1.9.1
  • PHP: 5.2.3 (apache)
  • PostgreSQL: PostgreSQL 8.2.4 on x86_64-unknown-openbsd4.2, compiled by GCC cc (GCC) 3.3.5 (propolice)

Extensions

Parser hooks
ParserFunctions (version 1.1.1) Enhance parser with logical functions Tim Starling

Other
ConfirmEdit Simple captcha implementation Brion Vibber

Extension functions
wfSetupParserFunctions, ceSetup and efReCaptcha

Parser extension tags
<pre>

Parser function hooks
if, ifeq, switch, ifexist, ifexpr, iferror, expr, time, timel, rel2abs, titleparts, int, ns, urlencode, lcfirst, ucfirst, lc, uc, localurl, localurle, fullurl, fullurle, formatnum, grammar, plural, numberofpages, numberofusers, numberofarticles, numberoffiles, numberofadmins, language, padleft, padright, anchorencode, special, defaultsort and displaytitle

Hooks
Hook name Subscribed by
AbortLogin (ReCaptcha, confirmUserLogin)
AbortNewAccount (ReCaptcha, confirmUserCreate)
EditFilter (ReCaptcha, confirmEdit)
LanguageGetMagic wfAddCustomVariableLang and wfParserFunctionsLanguageGetMagic
LoginAuthenticateAudit (ReCaptcha, triggerUserLogin)
MagicWordMagicWords wfAddCustomVariable
MagicWordwgVariableIDs wfAddCustomVariableID
ParserClearState (ExtParserFunctions, clearState)
ParserGetVariableValueSwitch wfGetCustomVariable
UserCreateForm (ReCaptcha, injectUserCreate)
UserLoginForm (ReCaptcha, injectUserLogin)


Version: 1.9.x
Severity: minor
OS: OpenBSD
Platform: PC

Details

Reference
bz15229

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:17 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz15229.
bzimport added a subscriber: Unknown Object (MLST).
  • This bug has been marked as a duplicate of bug 42 ***

Un-duping since this is actually a Postgres issue (different search engine backend).

reed wrote:

Looks like Wikipedia uses Extension:MWSearch and Lucene right? So that would be a workaround for this bug (and bug 42?). Are there any other search engines available other than Lucene and the basic SQL search?

Works fine for me. Is there anything showing up in the Postgres error logs? You could also try running the search commands manually by looking at SearchPostgres.php or changing a query extracted from the Postgres logs. As an aside, you should upgrade to 8.2.9 (it's a quick binary drop-in).

Actually, the best debugging may be to set log_statement = 'all' in your postgresql.conf, HUP, and grab the exact query it's running when you try one of the no-result queries, then run it via psql to see what's going on. Try PostgreSQL on freenode if that doesn't yield anything obvious.

reed wrote:

I don't see anything that jumps out at me in the log file after setting log_statement = 'all', unless the to_tsquery procedure is doing something weird with the number. The server admin is going to upgrade everything (MediaWiki, PostgreSQL, even new hardware I think) and I'll see what happens then.

reed wrote:

Here's another clue:

I can use ">64" or "<64" in my query and it works (where pages contain the number 64)! Do ">" or "<" have special meaning in a search?

reed wrote:

By the way, here's an excerpt from the PostgreSQL log when we use log_statement = 'all' and I search for "64 bit"

LOG: statement: SELECT /* User::loadFromDatabase */ * FROM mwuser WHERE user_id = '4' LIMIT 1
LOG: statement: SELECT /* User::loadFromDatabase */ ug_group FROM user_groups WHERE ug_user = '4'
LOG: statement: SELECT /* MediaWikiBagOStuff::_doquery */ value,exptime FROM objectcache WHERE keyname='wikidb:messages-hash'
LOG: statement: SELECT /* MediaWikiBagOStuff::_doquery */ value,exptime FROM objectcache WHERE keyname='wikidb:messages'
LOG: statement: SELECT /* Parser::replaceLinkHolders */ page_id, page_namespace, page_title FROM page WHERE (page_namespace=0 AND page_title IN('64_bit'))
LOG: statement: SELECT /* Parser::replaceLinkHolders */ page_id, page_namespace, page_title FROM page WHERE (page_namespace=0 AND page_title IN('Contact_Us', 'Categories'))
LOG: statement: SELECT page_id, page_namespace, page_title, old_text AS page_text, rank(titlevector, to_tsquery('default','+64&+bit')) AS rnk FROM page p, revision r, pagecontent c WHERE p.page_latest = r.rev_id AND r.rev_text_id = c.old_id AND titlevector @@ to_tsquery('default','+64&+bit') AND page_is_redirect = 0 AND page_namespace IN (0) ORDER BY rnk DESC, page_id DESC LIMIT 20 OFFSET 0
LOG: statement: SELECT page_id, page_namespace, page_title, old_text AS page_text, rank(titlevector, to_tsquery('default','+64&+bit')) AS rnk FROM page p, revision r, pagecontent c WHERE p.page_latest = r.rev_id AND r.rev_text_id = c.old_id AND textvector @@ to_tsquery('default','+64&+bit') AND page_is_redirect = 0 AND page_namespace IN (0) ORDER BY rnk DESC, page_id DESC LIMIT 20 OFFSET 0
LOG: statement: SELECT /* Job::pop */ * FROM job ORDER BY job_id LIMIT 1
LOG: statement: SELECT /* LinkBatch::doQuery */ page_id, page_namespace, page_title FROM page WHERE (page_namespace=2 AND page_title IN ('ReedHedges')) OR (page_namespace=3 AND page_title IN ('ReedHedges')) OR (page_namespace=0 AND page_title IN ('Search'))
LOG: statement: SELECT /* User::checkNewtalk */ user_id FROM user_newtalk WHERE user_id = '4' LIMIT 1

And when I search for ">64 bit":

LOG: statement: SELECT /* User::loadFromDatabase */ * FROM mwuser WHERE user_id = '4' LIMIT 1
LOG: statement: SELECT /* User::loadFromDatabase */ ug_group FROM user_groups WHERE ug_user = '4'
LOG: statement: SELECT /* MediaWikiBagOStuff::_doquery */ value,exptime FROM objectcache WHERE keyname='wikidb:messages-hash'
LOG: statement: SELECT /* MediaWikiBagOStuff::_doquery */ value,exptime FROM objectcache WHERE keyname='wikidb:messages'
LOG: statement: SELECT /* Parser::replaceLinkHolders */ page_id, page_namespace, page_title FROM page WHERE (page_namespace=0 AND page_title IN('Contact_Us', 'Categories'))
LOG: statement: SELECT page_id, page_namespace, page_title, old_text AS page_text, rank(titlevector, to_tsquery('default','>64&+bit')) AS rnk FROM page p, revision r, pagecontent c WHERE p.page_latest = r.rev_id AND r.rev_text_id = c.old_id AND titlevector @@ to_tsquery('default','>64&+bit') AND page_is_redirect = 0 AND page_namespace IN (0) ORDER BY rnk DESC, page_id DESC LIMIT 20 OFFSET 0
LOG: statement: SELECT page_id, page_namespace, page_title, old_text AS page_text, rank(titlevector, to_tsquery('default','>64&+bit')) AS rnk FROM page p, revision r, pagecontent c WHERE p.page_latest = r.rev_id AND r.rev_text_id = c.old_id AND textvector @@ to_tsquery('default','>64&+bit') AND page_is_redirect = 0 AND page_namespace IN (0) ORDER BY rnk DESC, page_id DESC LIMIT 20 OFFSET 0
LOG: statement: SELECT /* Revision::fetchRow */ page_namespace,page_title,page_latest,rev_id,rev_page,rev_text_id,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_timestamp,rev_deleted FROM page,revision WHERE (rev_id=page_latest) AND (page_id=rev_page) AND page_namespace = '0' AND page_title = 'Installing_Debian_Packages_on_64-bit_OS' LIMIT 1
LOG: statement: SELECT /* Revision::loadText */ old_text,old_flags FROM pagecontent WHERE old_id = '1020' LIMIT 1
LOG: statement: SELECT /* Revision::fetchRow */ page_namespace,page_title,page_latest,rev_id,rev_page,rev_text_id,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_timestamp,rev_deleted FROM page,revision WHERE (rev_id=page_latest) AND (page_id=rev_page) AND page_namespace = '0' AND page_title = 'Installing_Debian_Packages_on_64-bit_OS' LIMIT 1
LOG: statement: SELECT /* Revision::loadText */ old_text,old_flags FROM pagecontent WHERE old_id = '1020' LIMIT 1
LOG: statement: SELECT /* Revision::fetchRow */ page_namespace,page_title,page_latest,rev_id,rev_page,rev_text_id,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_timestamp,rev_deleted FROM page,revision WHERE (rev_id=page_latest) AND (page_id=rev_page) AND page_namespace = '0' AND page_title = 'RB_(SAY)_example_code' LIMIT 1
LOG: statement: SELECT /* Revision::loadText */ old_text,old_flags FROM pagecontent WHERE old_id = '430' LIMIT 1
LOG: statement: SELECT /* Job::pop */ * FROM job ORDER BY job_id LIMIT 1
LOG: statement: SELECT /* LinkBatch::doQuery */ page_id, page_namespace, page_title FROM page WHERE (page_namespace=2 AND page_title IN ('ReedHedges')) OR (page_namespace=3 AND page_title IN ('ReedHedges')) OR (page_namespace=0 AND page_title IN ('Search'))
LOG: statement: SELECT /* User::checkNewtalk */ user_id FROM user_newtalk WHERE user_id = '4' LIMIT 1

reed wrote:

Note the different arguments to to_tsquery(): to_tsquery('default','+64&+bit') vs. to_tsquery('default','>64&+bit').

reed wrote:

I upgraded to MediaWiki 1.13.1 and the problem seems to have gone away. (Curious though what fix between 1.9 and 1.13 did it though, couldn't find anything in the release notes for any of the intervening versions...)

reed wrote:

I am closing this report with WORKSFORME, since it *appears* to be fixed, though it would be good to find out what changed between MediaWiki 1.9 and 1.13.1 that would have fixed this problem, and include it in the release notes/change logs (or clarify the wording there if it is included).

Jdforrester-WMF subscribed.

Migrating from the old tracking task to a tag for PostgreSQL-related tasks.