We think CirrusSearch is doing two page parses on a page update and don't like that. We should do something about that but we're not sure what yet.
Version: unspecified
Severity: normal
We think CirrusSearch is doing two page parses on a page update and don't like that. We should do something about that but we're not sure what yet.
Version: unspecified
Severity: normal
So I think I more or less tracked it down. It can happen in two places:
To fix this once and for all, I think I've got a solution in mind: I'm going to write a class that handles grabbing parser output and the parser cache so Cirrus won't have to care.
At the very least, we'd only end up with one parse per page in a given process.
I like the idea. I think it'll be less important when we get Bug 572316 because there will be fewer paths to take into the update process.
So, the situation is far better than it was now that bug 57316 is fixed. There's tons of refactoring for that and other things have improved this.
We pass ParserOutput around everywhere now, and don't stampede the cache like we used to. We still toss the EditPage's output like I said in comment 1 (1), but we don't care anymore since we're using the job queue (and passing the output through the job queue would be bad).
Comment 1 (2) is fixed now
Other than bulk indexing jobs (where we bump the batch size to 10-ish), we shouldn't be doing any more than 1 parse per job now, which was the goal.
Marking this FIXED since we don't have a MOSTLY-FIXED-BUT-DONT-CARE :)
(In reply to comment #4)
Comment 1 (2) is fixed now
Pressed enter too soon. Meant to say:
Comment 1 (2) is fixed now with the job queue as well. We still index recursively, but since it's in the job queue we're not doing multiple in-process parses.