Page MenuHomePhabricator

ApiQueryRandom allows slow queries for nonexistant page_namespace
Closed, ResolvedPublic

Description

Variations on the following query get sniped for running too long on enwiki slaves:

SELECT /* RandomPage::selectRandomPageFromDB */ page_title, page_namespace FROM page WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1;

The above is nice and fast if the page_namespace value exists, but turns into a massive index scan if not (like 446!).

ApiQueryRandom should check for a valid page_namespace first.


Version: 1.21.x
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=58324

Details

Reference
bz57285

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:26 AM
bzimport set Reference to bz57285.
bzimport added a subscriber: Unknown Object (MLST).

Actually, I picked on ApiQueryRandom because it *seems* to be the source of the queries.

SpecialRandompage apparently already limits page_namespace values using MWNamespace::getContentNamespaces(), and nothing else in mediawiki-core seems to fit the profile. Perhaps a Developer can say more...

The example query above had a FORCE INDEX (page_random) removed during testing:

mysql wmf db1043 root enwiki> explain SELECT page_title, page_namespace FROM page force index (page_random) WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: page type: range

possible_keys: page_random

    key: page_random
key_len: 8
    ref: NULL
   rows: 15821483
  Extra: Using index condition; Using where

1 row in set (0.25 sec)

Simply removing the force allows invalid namespaces to be optimized away:

mysql wmf db1043 root enwiki> explain SELECT page_title, page_namespace FROM page WHERE page_namespace = '446' AND page_is_redirect = '0' AND (page_random >= 0) ORDER BY page_random LIMIT 1\G

  • 1. row ******* id: 1 select_type: SIMPLE table: page type: ref

possible_keys: name_title,page_random,page_redirect_namespace_len

    key: name_title
key_len: 4
    ref: const
   rows: 1
  Extra: Using where; Using filesort

1 row in set (0.25 sec)

This is actually a problem with both RandomPage::selectRandomPageFromDB and ApiQueryRandom::runQuery; both are showing up in slow logs with large index scans.

Seems like the FORCE INDEX (page_random) isn't necessary for the mediawiki pseudo-random approach to function. The second example above allows a filesort but this is much lower impact than the massive index scan. Consider removing the force so that more appropriate indexes can be chosen (or added).

(In reply to comment #0)

ApiQueryRandom should check for a valid page_namespace first.

Note that 446 is a valid page_namespace, it's "Education Program" (there's a handy chart on [[WP:NS]]). I wonder if this is related to bug 58324?

Ah ok, thanks. I went on distinct enwiki page_namespace values. That makes removing the forced index the only solution.

Gerrit change 102880 fixed the "empty namespace" issue, so this should be closed unless you want to keep it open to address this comment left on that change:

page_random may need some reindexing too, either (page_namespace, page_random)
for best effect or (page_random, page_namespace) for lesser effect while still
avoiding filesort and reusing exiting page_random index in case anything else
still forces it.