CirrusSearch renders every page in the search results probably just to tell the user how many bytes are in it
Closed, ResolvedPublic

Assigned To

Authored By

	• Manybubbles
	Oct 10 2013, 9:07 PM

Description

+++ This bug was initially created as a clone of Bug #55590 +++

Bug 55590 was "discovered" by CirrusSearch's overzealous rendering, so I'm cloning it to make this one. That bug causes a crash, which is bad, but the crash also happens without CirrusSearch. It's just that CirrusSearch casts a wide net (due to this bug) and 55590 throws a bomb in the net, so the search results page blows up.

The part of the backtrace that matters here:
#17 /usr/local/apache/common-local/php-1.22wmf21/includes/search/SearchEngine.php(868): CirrusSearch->getTextFromContent(Object(Title), Object(WikitextContent))
#18 /usr/local/apache/common-local/php-1.22wmf21/includes/search/SearchEngine.php(954): SearchResult->initText()
#19 /usr/local/apache/common-local/php-1.22wmf21/includes/specials/SpecialSearch.php(651): SearchResult->getByteSize()
#20 /usr/local/apache/common-local/php-1.22wmf21/includes/specials/SpecialSearch.php(543): SpecialSearch->showHit(Object(CirrusSearchResult), Array)

Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=55750

Details

Reference: bz55592

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		• Manybubbles	T57592 CirrusSearch renders every page in the search results probably just to tell the user how many bytes are in it
		Resolved		None	T57590 Searching for 'key metrics' causes internal error on mediawiki

Event Timeline

• bzimport raised the priority of this task from to High. Nov 22 2014, 2:26 AM

• bzimport added a project: CirrusSearch.

• bzimport set Reference to bz55592.

• Manybubbles created this task. Oct 10 2013, 9:07 PM

This makes showing results really really really slow.

I got started on this but have to stop for the night. We already have the number of bytes in the article in Elasticsearch (called textLen), but it isn't stored, so it has to be retrieved from the source, which slows down queries. I'd like to store both the number of bytes and the number of words directly in Elasticsearch. I think it is worth overriding these methods to stop the rendering and return textLen for both, then deprecating textLen and replacing it with text_bytes and text_words. The next step would be to reindex. After that, stop using textLen and stop writing it, so that on the following reindex it won't be recreated.
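A rough sketch of the read side of that plan, in Python rather than the extension's PHP, purely for illustration. The class name `StoredSizeResult` and the `fields` mapping are hypothetical; only the field names textLen, text_bytes and text_words come from the comment above. The point is that both values come from stored fields and nothing is rendered.

```python
from typing import Any, Mapping


class StoredSizeResult:
    """Hypothetical search hit that reports sizes from stored index fields."""

    def __init__(self, fields: Mapping[str, Any]) -> None:
        # `fields` stands in for the stored fields returned with one hit.
        self._fields = fields

    def _stored(self, preferred: str) -> int:
        # Prefer the new field; fall back to the deprecated textLen for
        # documents written before the reindex. No page text is loaded
        # or rendered on this path.
        value = self._fields.get(preferred)
        if value is None:
            value = self._fields.get("textLen", 0)
        return int(value)

    def get_byte_size(self) -> int:
        return self._stored("text_bytes")

    def get_word_count(self) -> int:
        return self._stored("text_words")
```

The fallback mirrors the interim step of returning textLen for both values; once every document has been reindexed with the new fields, the fallback could be dropped.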

Ultimately I'd like to let Elasticsearch figure out the word count on its own, but I'm not sure how to do that at this point. str_word_count will have to do for now.
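For illustration only, the write side might look something like the Python sketch below. The helper `build_size_fields` is hypothetical, the text is assumed to be UTF-8, and whitespace splitting is a stand-in for PHP's str_word_count, which applies different rules for what counts as a word.

```python
def build_size_fields(text: str) -> dict:
    """Compute the size fields to store alongside a document at index time."""
    return {
        # Byte length of the text, assuming it is encoded as UTF-8.
        "text_bytes": len(text.encode("utf-8")),
        # Whitespace-delimited token count; only an approximation of
        # what PHP's str_word_count would report.
        "text_words": len(text.split()),
    }
```

Computing both values once at index time means the search results page only has to read two stored numbers per hit.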

Change 89832 had a related patch set uploaded by Manybubbles:
Include wordCount and byteSize in result

https://gerrit.wikimedia.org/r/89832

Change 89832 merged by jenkins-bot:
Include wordCount and byteSize in result

https://gerrit.wikimedia.org/r/89832

• Deskana moved this task from Inbox to Resolved/Invalid/Declined/Legacy on the CirrusSearch board. Apr 20 2015, 4:11 AM
