
German Wikipedia API search: zero results
Closed, ResolvedPublic

Description

Compare https://de.wikipedia.org/w/api.php?action=query&list=search&srsearch=wikiproject+spam and https://de.wikipedia.org/w/index.php?title=Spezial%3ASuche&search=wikiproject+spam&go=Go : the API search returns zero results (promising three) while the GUI search returns two.

There's also a spurious sroffset query continuation (not sure if this is a separate bug).

I'm guessing this is a WMF-specific issue, as it works exactly as expected on en.wp: compare https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=wikiproject+spam and https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=wikiproject+spam&go=Go.


Version: unspecified
Severity: normal

Details

Reference
bz32256

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 12:05 AM
bzimport set Reference to bz32256.
bzimport added a subscriber: Unknown Object (MLST).

Damned odd, but I can confirm the results. API is giving:

<?xml version="1.0"?>
<api>

<query>
  <searchinfo totalhits="3" suggestion="wiki project spam" />
  <search />
</query>
<query-continue>
  <search sroffset="10" />
</query-continue>

</api>

whereas the search UI page returns two results when searching namespace 0.

Maybe there's a bogus result also in there and the UI and API search front-ends are handling it differently (UI by skipping one item, API by breaking on them all)?

Probably needs some lower-level debugging to see the actual results from the Lucene server.

Actual looping mechanisms in ApiQuerySearch and SpecialSearch look similar (while loop around SearchResultSet::next()) and I _think_ shouldn't tromp on each other.
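The inconsistency in the API response quoted above (a nonzero totalhits, an empty result list, and a continuation offset) can be checked mechanically. The following is only a minimal Python sketch: the XML is copied from this report, while the helper name summarize is a hypothetical one introduced for illustration.

```python
import xml.etree.ElementTree as ET

# API response as quoted in this report (blank lines removed).
RESPONSE = """<?xml version="1.0"?>
<api>
<query>
  <searchinfo totalhits="3" suggestion="wiki project spam" />
  <search />
</query>
<query-continue>
  <search sroffset="10" />
</query-continue>
</api>"""


def summarize(xml_text):
    """Hypothetical helper: extract the three values that disagree."""
    root = ET.fromstring(xml_text)

    info = root.find("query/searchinfo")
    total_hits = int(info.get("totalhits", "0")) if info is not None else 0

    # Each returned hit would be a child element of <search>.
    search = root.find("query/search")
    returned = len(list(search)) if search is not None else 0

    cont = root.find("query-continue/search")
    offset = cont.get("sroffset") if cont is not None else None

    return {"totalhits": total_hits, "returned": returned, "sroffset": offset}


summary = summarize(RESPONSE)
print(summary)
```

A response that reports hits and offers a continuation while returning no rows is the symptom described in this task.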

rainman wrote:

curl http://search6:8123/search/dewiki/wikiproject+spam?version=2

3
0.03831976 0 Benutzerin_Diskussion%3AMrsMyer%2FArchiv%2F2008
0.02220609 0 OpenStreetMap
0.018330451 0 Kritik_an_Wikipedia

Thus it seems the first result is a false positive, since it is from the wrong namespace. This page is at its latest version (2011-10-23T00:46:30Z) but is wrongly placed in the main namespace (the format above is: score namespace pagename).

I believe this is because the Lucene backend doesn't know about feminine namespace names, as they are not listed in the XML dump header files.
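To make the "score namespace pagename" format concrete, here is a small Python sketch that parses the backend output quoted above and flags namespace-0 hits whose title still carries a prefix. The function names and the "contains a colon" heuristic are assumptions made for illustration; they are not how the Lucene backend or MediaWiki detect namespaces.

```python
from urllib.parse import unquote

# Raw backend output as quoted in this report: hit count, then one hit per line.
RAW = """3
0.03831976 0 Benutzerin_Diskussion%3AMrsMyer%2FArchiv%2F2008
0.02220609 0 OpenStreetMap
0.018330451 0 Kritik_an_Wikipedia"""


def parse_results(raw):
    """Parse 'score namespace pagename' lines into (score, ns, title) tuples."""
    lines = raw.strip().splitlines()
    total = int(lines[0])
    hits = []
    for line in lines[1:]:
        score, ns, title = line.split(" ", 2)
        # Page names are URL-encoded in the output (%3A is ':', %2F is '/').
        hits.append((float(score), int(ns), unquote(title)))
    return total, hits


def suspicious(hits):
    """Heuristic (an assumption): a namespace-0 title that contains a
    'Prefix:' part may belong to a namespace the backend did not recognize."""
    return [title for _, ns, title in hits if ns == 0 and ":" in title]


total, hits = parse_results(RAW)
print(total, suspicious(hits))
```

On the quoted output, only the Benutzerin_Diskussion page is flagged, which matches the false positive identified in this comment.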

I think the immediate cause for the different search results is line 132 in http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/api/ApiQuerySearch.php?view=markup : $result should be advanced before continuing, otherwise the API will just loop $limit times in vain.

Add

$result = $matches->next();

before that "continue" and the API and Special:Search should again return the same results.
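The effect of the missing advance can be illustrated with a simplified Python model. This is not the MediaWiki code: the ResultSet class, the collect function and the skip flag on each hit are hypothetical stand-ins, and the loop shape (a counter checked against a limit, plus a "continue" for skipped hits) is an assumption based on the description above.

```python
class ResultSet:
    """Hypothetical stand-in for a search result set with a next() cursor."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next(self):
        # Return the next hit, or None when the set is exhausted.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


def collect(matches, limit, advance_on_skip):
    """Walk the result set; return (titles shown, whether the limit was hit)."""
    shown = []
    hit_limit = False
    count = 0
    result = matches.next()
    while result is not None:
        count += 1
        if count > limit:
            hit_limit = True
            break
        title, skip = result
        if skip:
            if advance_on_skip:
                result = matches.next()  # the proposed one-line fix
            continue  # without the advance, the same hit is seen again
        shown.append(title)
        result = matches.next()
    return shown, hit_limit


# Each hit is (title, skip); the first one models the false positive.
HITS = [
    ("Benutzerin_Diskussion:MrsMyer/Archiv/2008", True),
    ("OpenStreetMap", False),
    ("Kritik_an_Wikipedia", False),
]

buggy = collect(ResultSet(HITS), limit=10, advance_on_skip=False)
fixed = collect(ResultSet(HITS), limit=10, advance_on_skip=True)
print(buggy)
print(fixed)
```

In this model the version without the advance re-examines the first hit until the limit is exhausted and returns nothing, while the fixed version returns the two remaining hits. That the model also reaches the limit would be consistent with the spurious sroffset continuation, but that is an inference from the sketch and is not confirmed in this report.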

(In reply to comment #4)

Well spotted. Fixed in r102537, deployment underway.

(In reply to comment #5)

Deployed now, and https://de.wikipedia.org/w/api.php?action=query&list=search&srsearch=wikiproject+spam works (other than the fact that totalhits still reports 3, because the user talk page with the feminine namespace is counted but not shown). Yay!