
Search results incorrect for <4-letter words after update from <1.13
Closed, Declined | Public

Description

On my MW 1.14 (without Lucene), the search now returns only crap when the query includes words of 4 or fewer letters. In 1.13 all was fine. Examples:

Searching "the word" will return only talk pages where "word" is found.

Searching "the" will find "the" (since when are 3-letter words indexed?) and return a few articles, but mostly talk pages again.

Maybe there is more critical stuff I just didn't find yet.


Version: 1.14.x
Severity: minor

Details

Reference
bz17733

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 10:36 PM
bzimport added a project: MediaWiki-Search.
bzimport set Reference to bz17733.
bzimport added a subscriber: Unknown Object (MLST).

I simply guess the new search wasn't tested without Lucene.

The second error at least is expected behaviour.

We want to index three-letter words (see bug 7726).

I'm not sure about the first. Is it that quotes aren't working as expected? Can you link to a test case? I'm not sure how to interpret the test case you provided.

Must be something about category and article names. Searching page titles in other namespaces works. Maybe a wrong setting? Buggy anyway then, imo. With every new version MW seems to care less about backward compatibility.

About the 3 letters indexed: This was not mentioned in the release notes of 1.14 http://svn.wikimedia.org/svnroot/mediawiki/tags/REL1_14_0/phase3/RELEASE-NOTES ? Some wikis use a note that you can't search 3 or less letter words...

ok it has some odd behaviour. ran updateSearchIndex.php yesterday. today it finds the examples from above :)

ayg wrote:

(In reply to comment #6)

About the 3 letters indexed: This was not mentioned in the release notes of 1.14 http://svn.wikimedia.org/svnroot/mediawiki/tags/REL1_14_0/phase3/RELEASE-NOTES ?

  • (bug 7726) Searches for words less than 4 characters now work without requiring customization of MySQL server settings

ayg wrote:

If rebuilding the search index is necessary for search to work correctly, this should be done by update.php. Reopening.

Well, it was like this: When I searched "club fg" I only got Talk page results. I ran the rebuild script and it indexed 95% talk pages. I defined start and end to cover all times of my wiki and it indexed the whole bunch. Nothing seemed to have changed in the results till I checked again today.

Now "club fg" finds the category but not the pages.

Still http://www.mixesdb.com/db/index.php/Special:Search?search=Centro+Fly&fulltext=Search doesn't find http://www.mixesdb.com/db/index.php/Category:Centro_Fly while trying "Centro" finds it (seems that the "fly" is breaking the search).

I use Extension:GoToCategory so don't bother trying my search using "Go".

The ultimate test is "in the mix": http://www.mixesdb.com/db/index.php/Special:Search?search=in+the+mix&fulltext=Search should find tons of pages from http://www.mixesdb.com/db/index.php/Category:Centro_Fly

Would be good to have other 1.14 sites without Lucene to check if it's only me.

I think the problem is 99.9% on my side. Please keep it closed till I have finished checking :)

ayg wrote:

If you have to manually run the search index rebuild script, that's not a problem on your side, that's a problem with the stated upgrade procedure.

I had some restriction errors when I tried to run the rebuild script in my new 1.14 directory.
Smart as I am, I copied the 2 rebuild script files from the 1.14 to the 1.13 maintenance directory and ran them from there. By my logic, the indexing into the table should still be correct? Guess not:
After running the 1.14 script in my 1.13 directory the index is updated and "club fg" is not found. When I change the Club FG article page it is found. This tells me the way I ran the rebuild script was not correct. I need to get rid of the restriction errors first to properly run it in my 1.14 directory. THEN I can tell if it works ;)

**When I change the Club FG category page

ayg wrote:

You should not have to be doing this. It's a bug in the 1.14 release that should be fixed in the next point release if possible. This should all happen automatically when you run update.php. Reopening.

For the record: I was not able to run the maintenance scripts because the command line scripts now use realpath(), which requires safe_mode to be turned off for the command line.

The search seems to work now after I ran maintenance/rebuildtextindex.php instead of only updateSearchIndex.php.

The given link seems to work fine at present. A rebuild would only be needed for newly indexed things; without reindexing, behavior would be exactly as before (ignored words would remain ignored).

Reopening -- further consideration after a post on mailing list reminds me that in fact some words will behave differently.

If padding keeps the word off the ignore list, then it becomes a required word -- which won't be found in old pages that had the word but not the padded word. Upgrade procedure possibly should be updated, or at least a reference to rebuildtextindex slipped into UPGRADE.
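The effect described above can be illustrated with a small sketch. This is a toy model, not MediaWiki's or MySQL's actual code: the minimum length `MIN_LEN`, the padding marker `PAD`, the ignore list `STOPWORDS`, and all function names are assumptions made up for illustration. It only shows why a padded short word in the query becomes a required term that an index built before the upgrade cannot satisfy.

```python
import re

MIN_LEN = 4                 # assumed minimum indexed word length
PAD = "xpad"                # hypothetical padding marker, not MediaWiki's real one
STOPWORDS = {"the", "in"}   # tiny illustrative ignore list


def words(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def normalize_old(text):
    """Pre-upgrade normalization: words are stored as they are."""
    return words(text)


def normalize_new(text):
    """Post-upgrade normalization: short words get the padding marker appended."""
    return [w + PAD if len(w) < MIN_LEN else w for w in words(text)]


def engine_keeps(word):
    """The toy full-text engine drops short words and ignore-list words."""
    return len(word) >= MIN_LEN and word not in STOPWORDS


def build_index(pages, normalize):
    """Map each page title to the set of words the engine would keep."""
    return {title: {w for w in normalize(text) if engine_keeps(w)}
            for title, text in pages.items()}


def search(index, query):
    """Post-upgrade search: every kept query word is required."""
    required = [w for w in normalize_new(query) if engine_keeps(w)]
    return sorted(t for t, kept in index.items()
                  if all(w in kept for w in required))


pages = {"Category:Centro Fly": "Centro Fly", "Talk:Mixes": "centro notes"}
old_index = build_index(pages, normalize_old)   # index left over from before the upgrade
new_index = build_index(pages, normalize_new)   # index after a full rebuild

print(search(old_index, "Centro"))      # both pages match
print(search(old_index, "Centro Fly"))  # no match: padded "fly" is missing from the old index
print(search(new_index, "Centro Fly"))  # the category page matches after the rebuild
```

In this model the stale index still answers "Centro" as before, but "Centro Fly" returns nothing until the index is rebuilt, which is consistent with the behavior reported earlier in this thread.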

Aklapper changed the task status from Open to Stalled. Feb 7 2022, 11:30 AM
Aklapper added a project: TestMe.
Aklapper edited subscribers, added: Aklapper; removed: wikibugs-l-list.

Does anyone know if this is still an issue, as T9726 is resolved? Proposing to close nowadays.

MPhamWMF subscribed.

If this is still an issue on the latest MediaWiki version, this ticket can be reopened.