Page MenuHomePhabricator

ApiQueryAllUsers-related queries causing lag on Wikidata
Closed, DeclinedPublic

Description

We just had an issue where hung queries from the Web API were causing massive lag on one of the Wikidata servers. Example query snippet:


SELECT /* ApiQueryAllUsers::execute */ ipb_deleted,COUNT(*) AS recentedits,user_name

Lag as reported by https://www.wikidata.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= was high:


<?xml version="1.0"?>
<api>

<query>
  <dbrepllag>
    <db host="db1058" lag="0" />
    <db host="db1005" lag="0" />
    <db host="db1026" lag="26432" />
    <db host="db1021" lag="0" />
  </dbrepllag>
</query>

</api>

Server log entry about killing the queries: https://wikitech.wikimedia.org/w/index.php?title=Server_Admin_Log&diff=80744&oldid=80742&diffonly=1.

As soon as the queries were killed, lag decreased.

ApiQueryAllUsers needs investigation.


Version: 1.22.0
Severity: normal

Details

Reference
bz52979

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:54 AM
bzimport set Reference to bz52979.
bzimport added a subscriber: Unknown Object (MLST).

Do we actually know what the problem query string was? The COUNT(*) AS recentedits looks slightly suspicious

Depending on how bad this is, it might be worth disabling the module on wikidata for the time being.

For now I have a pt-kill job running to snipe these queries on s5 slaves after 10 mins. Will gather some examples.

No more queries sniped yet.

However, the example wikidata query in the comment 3 gerrit changeset showed MySQL changing execution plans between wikis due to quite different index cardinality values. ANALYZE TABLE helped there, so this might be a maintenance issue.

When did you do the ANALYZE TABLE? I was already seeing the sane results yesterday on wikidatawiki, using sql.php from terbium.

ANALYZE ran ~1h before comment 4.

Were you connecting to the wikidatawiki master? Or one of the slaves?

Master, apparently, since I didn't specify the "slave" option to sql.php (in fact, I didn't know about that option until just now when I checked the code to find out what it did; I'll have to remember to use that in the future).

The query itself is listed in https://gerrit.wikimedia.org/r/#/c/79761/ but that patch is abandoned.

Is there anything further to be done in this ticket?
Or is this WORKSFORME now?

Is there anything further to be done in this ticket, as the patch is abandoned? Or is this WORKSFORME now?

Let's WORKSFORME this one and chalk it up to a bad query plan per comment 4. People can reopen if it's still a problem.