
Increase the cached set size for query pages
Closed, Declined · Public

Details

Reference
bz3149

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 8:46 PM
bzimport set Reference to bz3149.
bzimport added a subscriber: Unknown Object (MLST).

See also bug 2415 (http://bugzilla.wikipedia.org/show_bug.cgi?id=2415), regarding Lonelypages. The same is also true for Ancientpages. Static dumps of the full lists, along with a count of the number of matches for each of those queries, would be very helpful; the counts could go on the Special:Statistics page.

kellen wrote:

On the lonelypages bug, Ashar says that the 1000-page limit is "set to make it
faster", which doesn't make any sense. How is retrieving records 1-1000 from a
single table somehow faster than retrieving records 1001-2000? And even if it is
faster, I am willing to wait around for a special page if it helps me get some
actual work done.

Also, for Wikibooks it's not as simple as for Wikipedia. On Wikibooks you work
on a specific subject area or book, and you should not be categorizing other
books' pages without knowing their conventions. So if I want to find the
_cookbook_ pages that are uncategorized, I simply can't, because they are
further down the list than 1000.

kellen wrote:

Is anybody going to take this one on or at least comment on the bug? This seems
like something that would be easy to fix.

robchur wrote:

The issue is about raising the default limit on cached special page queries in
order to increase the size of the cached set. While that's trivial to tweak, the
questions are whether we want to, and how much of a performance hit we would be
looking at; remember that the queries have to be re-run periodically, and they
take time. Someone involved in Wikimedia server administration has to make the
decision.
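
For context, the cached special pages are filled by running queries along the following lines. This is only a rough sketch of the kind of query behind Special:Uncategorizedpages, written against the core page and categorylinks tables; the exact SQL MediaWiki generates may differ:

  -- Rough sketch of the query behind Special:Uncategorizedpages: an
  -- anti-join that has to look at every content page to find the ones
  -- with no categorylinks row, regardless of how many rows are kept.
  SELECT page_namespace, page_title
  FROM page
  LEFT JOIN categorylinks ON cl_from = page_id
  WHERE cl_from IS NULL
    AND page_namespace = 0
    AND page_is_redirect = 0
  ORDER BY page_title
  LIMIT 1000;

The LIMIT 1000 here is the cached set size being discussed in this task.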

kellen wrote:

For Wikibooks, could we just turn off the limiting completely? The special pages
aren't heavily used and we only have ~15,000 modules. Alternatively, could we
turn off caching? With the relatively small number of modules and categories, I
doubt it would be a huge performance hit. Right now, Uncategorizedpages is
basically useless for Wikibooks.

robchur wrote:

*** Bug 8450 has been marked as a duplicate of this bug. ***

cohesion wrote:

This is somewhat important for images as well: since the toolserver is down, we
have no way of knowing which images are completely untagged, and the
uncategorized list can usually act as a proxy for that. We haven't really been
keeping up with it, and the backlog is probably pretty high now, seeing as 1000
entries barely gets through the B's. For this application, though, we wouldn't
need the list biweekly; bimonthly or even monthly would be fine.

shunpiker wrote:

I submitted a patch for Bug 2415, which could also fix this problem if the LIMIT
is turned off for this query.

The LIMIT may not substantially reduce the cost of the cache-building queries,
since it is only applied after the "heavy lifting" (full table scans, joins,
etc.) is done. This can be confirmed by comparing the query stats after running
the queries with and without the LIMITs. (Note that the cache-building query's
LIMIT does indirectly make reads on the querycache cheaper, because it keeps
that table small.)
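
As an illustration of that point, here is a hypothetical comparison reusing the uncategorized-pages sketch from the earlier comment; actual row counts depend on the schema, indexes and optimizer:

  -- Hypothetical comparison: if the plan has to complete the anti-join
  -- (and any sort) before the LIMIT is applied, the rows examined are
  -- roughly the same either way; only the rows returned differ.
  EXPLAIN SELECT page_namespace, page_title
  FROM page LEFT JOIN categorylinks ON cl_from = page_id
  WHERE cl_from IS NULL AND page_namespace = 0 AND page_is_redirect = 0
  LIMIT 1000;

  EXPLAIN SELECT page_namespace, page_title
  FROM page LEFT JOIN categorylinks ON cl_from = page_id
  WHERE cl_from IS NULL AND page_namespace = 0 AND page_is_redirect = 0
  LIMIT 5000;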

(See also Bug 4699 for a discussion of the problems of using LIMIT.)

jeluf wrote:

No, the size can't be increased. It's not only the query that's expensive; the insert is expensive, too. Use the toolserver for such requests.
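
The insert cost refers to the cache refresh itself: each run rewrites the rows for that query type in the querycache table, so the write volume grows with the cached set size, for every expensive query page on every wiki. Below is a rough sketch of one refresh cycle, assuming the standard querycache columns (qc_type, qc_namespace, qc_title, qc_value); filling the value column with page_id is purely for illustration:

  -- Sketch of one cache refresh for a single query page: clear the old
  -- cached rows, then re-insert up to LIMIT fresh ones. Raising the limit
  -- multiplies the rows written here on every refresh.
  DELETE FROM querycache WHERE qc_type = 'Uncategorizedpages';

  INSERT INTO querycache (qc_type, qc_namespace, qc_title, qc_value)
  SELECT 'Uncategorizedpages', page_namespace, page_title, page_id
  FROM page LEFT JOIN categorylinks ON cl_from = page_id
  WHERE cl_from IS NULL AND page_namespace = 0 AND page_is_redirect = 0
  LIMIT 1000;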