
Select of revisions for stub history files does not explicitly order revisions
Closed, ResolvedPublic

Description

In trunk, the query run in Export.php in dumpFrom() (used for generating stub history files) is

SELECT * FROM page INNER JOIN revision ON ((page_id=rev_page)) WHERE page_id >= 1157 AND page_id < 1158 ORDER BY page_id ASC;

Revisions don't get explicitly ordered. This results in the order changing from one dump to another.

Example:

In the el.wiktionary dumps, for the page υγεία (page id 1157), the revision with revid 1432 and timestamp 2005-02-27T15:34:30Z either appears first among the revisions listed in the stubs-meta-history file (it has the earliest timestamp) or 4th (its position when revisions are sorted by revid).

Smallest revid for that page is actually 1153 with timestamp 2005-02-27T15:34:45Z.

In fact the order seems to be chosen randomly, depending on when the query is run:

elwiktionary-20100401-stub-meta-history.xml.gz -- revid 1432 is first
elwiktionary-20100505-stub-meta-history.xml.gz -- revid 1432 is 4th, 1153 is first
elwiktionary-20110123-stub-meta-history.xml.gz -- revid 1432 is first

Need to go through the code and make sure every such query has an explicit order for revisions.
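
(For illustration, a minimal sketch of the kind of explicit ordering meant here, using the same page/revision join as the query above; the exact query MediaWiki builds may differ, and the performance of such an ordering is discussed in the comments below:)

SELECT * FROM page
INNER JOIN revision ON page_id = rev_page
WHERE page_id >= 1157 AND page_id < 1158
-- order revisions deterministically within each page instead of
-- leaving the revision order up to the query optimizer
ORDER BY page_id ASC, rev_id ASC;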

Also... need to find out why a bigger revid has an earlier timestamp (since in theory revids get assigned in order as they are used).


Version: unspecified
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:23 PM
bzimport set Reference to bz27112.

Note that mismatched ordering might really confuse the dump prefetching code, which IIRC assumes a consistent order of (page_id, rev_id).

There are many perfectly 100% legit reasons for a later rev_id to have an earlier timestamp, chief among them:

  • import
  • restore from deletion (for things deleted prior to rev_id being stored in archive table, at least)

as well as mundane things such as server clock skew.

So does anyone on this bug have any objections if I explicitly ORDER BY rev_id ASC at the end of that query? Is that going to kill performance or break some subtle thing in the dumps as they are now?

the explain doesn't look good, yuck:

explain SELECT * FROM page INNER JOIN revision ON ((page_id=rev_page)) WHERE page_id >= 1157 AND page_id < 1158 ORDER BY page_id ASC, revision.rev_id ASC;
+----+-------------+----------+-------+------------------------+---------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys          | key     | key_len | ref                 | rows | Extra                                        |
+----+-------------+----------+-------+------------------------+---------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | page     | range | PRIMARY                | PRIMARY | 4       | NULL                |    1 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | revision | ref   | PRIMARY,page_timestamp | PRIMARY | 4       | elwiki.page.page_id |    9 |                                              |
+----+-------------+----------+-------+------------------------+---------+---------+---------------------+------+----------------------------------------------+

Prolly need to specify which key or something in here...

I assume the 'yuck' refers to temporary/filesort. Since the ORDER BY clause contains columns from both tables, MySQL can't use an index for the ordering.

The revision table has an index on (rev_page, rev_id), so doing both the WHERE and the ORDER BY on rev_* fields gives:

explain SELECT * FROM revision JOIN page ON rev_page=page_id WHERE rev_page >= 1157 and rev_page < 1158 ORDER BY rev_page, rev_id\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: range
possible_keys: PRIMARY,page_timestamp
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra:

Will the page_id range always be so small?

Using a larger range causes the query optimizer to switch table join order and fall back on temporary/filesort again. We could force it with STRAIGHT_JOIN but at the risk of hitting far more rows...
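
(For illustration, a sketch of what forcing the join order could look like on the small range above; STRAIGHT_JOIN is a standard MySQL/MariaDB hint, and whether it pays off depends on how many revision rows the range actually covers:)

SELECT STRAIGHT_JOIN *   -- join tables in the order listed: revision first
FROM revision
JOIN page ON rev_page = page_id
WHERE rev_page >= 1157 AND rev_page < 1158
ORDER BY rev_page, rev_id;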

It's possible to request a dump by arbitrary page range (and in fact that feature is used), so no, we can't guarantee that the range will be small, unfortunately.

We really only need this for the stub dumps; maybe I can tailor the queries in that special case (yuck but less yuck than temporary/filesort).

Nemo_bis renamed this task from select of revisions for stub history files does not explicitly order revisions to Select of revisions for stub history files does not explicitly order revisions.Apr 9 2015, 7:31 AM
Nemo_bis lowered the priority of this task from Medium to Low.
Nemo_bis set Security to None.
ArielGlenn raised the priority of this task from Low to High.Jun 16 2016, 12:22 PM

This is now a high priority bug: the revid order in the wikidata stub dumps is by now pretty close to random, which breaks any notion of prefetching and causes most (over 90%!) of revision content to be retrieved directly from the database. That's clearly untenable and must be fixed before the next full run (beginning of July).

Stubs are currently dumped by page range, between 100k and 500k pages at once depending on the size of the wiki. This can be adjusted as needed, but we can't make it ridiculously small (e.g. 1000 pages at once would mean a ridiculous number of queries to get through all of en wiki). I suppose the 'right value' depends on the number of revisions contained in a given set of pages.

@jcrespo can you give me an estimated upper bound that avoids the filesort? Or can you give me hints on how to figure it out myself?

jcrespo moved this task from Triage to In progress on the DBA board.

I think we can get acceptable performance numbers if we sell "our souls" a little, i.e. by introducing some complexity that will require more maintenance.

@Springle mentioned this above: MariaDB is not choosing the right query plan for the inner join and index selection (this is NOT an isolated event; it has happened several times for our queries). To be fair, it is not an easy choice whether it is better to use indexes for ordering or for filtering.

However, while using special keywords is a really bad idea for interactive queries, because you can suddenly end up with horrible performance, I think it is more acceptable for batch queries like this one: if they suddenly become very slow, we can just cancel the job and do some of the maintenance I mentioned before.

In a nutshell, I think that if we transform the original query:

MariaDB db1053.eqiad.wmnet enwiki > EXPLAIN SELECT * FROM revision JOIN page ON rev_page=page_id WHERE rev_page >= 6000000 and rev_page < 6001000\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 647
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ref
possible_keys: rev_page_id,page_timestamp
          key: rev_page_id
      key_len: 4
          ref: enwiki.page.page_id
         rows: 8
        Extra: 
2 rows in set (0.00 sec)

MariaDB db1053.eqiad.wmnet enwiki > pager cat > /dev/null; SELECT * FROM revision JOIN page ON rev_page=page_id WHERE rev_page >= 6000000 and rev_page < 6001000\G
PAGER set to 'cat > /dev/null'
21272 rows in set (59.40 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 650   |
| Handler_read_last          | 0     |
| Handler_read_next          | 21921 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

Into:

MariaDB db1053.eqiad.wmnet enwiki > pager cat > /dev/null; SELECT STRAIGHT_JOIN * FROM revision IGNORE INDEX(page_timestamp) JOIN page ON rev_page=page_id WHERE rev_page >= 5000000 and rev_page < 5001000 ORDER BY rev_page, rev_id\G
PAGER set to 'cat > /dev/null'
19736 rows in set (1 min 5.61 sec)

MariaDB db1053.eqiad.wmnet enwiki > EXPLAIN SELECT STRAIGHT_JOIN * FROM revision IGNORE INDEX(page_timestamp) JOIN page ON rev_page=page_id WHERE rev_page >= 5000000 and rev_page < 5001000 ORDER BY rev_page, rev_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: range
possible_keys: rev_page_id
          key: rev_page_id
      key_len: 4
          ref: NULL
         rows: 39062
        Extra: Using index condition
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra:
2 rows in set (0.00 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 21288 |
| Handler_icp_match          | 21288 |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 666   |
| Handler_read_last          | 0     |
| Handler_read_next          | 21288 |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

By adding the STRAIGHT_JOIN and ignoring the wrong index, we can get acceptable results: in order, and with acceptable overhead in rows read (but no filesort).

I would change the code to allow this (it is allowed by the MediaWiki ORM), then test it informally on a few representative wikis, with low and high numbers of pages, to confirm it is working.

Regarding performance, it took me 1 minute to check 10000 pages (20000 revisions returned) or 7 minutes to return 100000 pages (175000 revisions returned). I expect these numbers to improve soon with T131363; revision is the slowest table to navigate, as it will be mostly on disk.

Thanks a lot for this.

Because the number of revisions per page varies, with e.g. the first 30k pages in en wikipedia having 24 million revisions, I'll have to do some preliminary queries to make good guesses (SELECT COUNT(...)) first. Let me put some code together and see how it looks.
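
(A sketch of the kind of preliminary count meant here, assuming a simple range over rev_page; the actual guesses may be computed differently:)

-- roughly how many revisions does a candidate page range contain?
SELECT COUNT(*) AS revs
FROM revision
WHERE rev_page >= 1 AND rev_page < 30000;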

I've done a bunch of timing tests and explains (still shaking my head at the query optimizer) but still more to go; these were on en wp but I need to do wikidata and make some guesses about growth. I'm working up a MW patch which I expect to test and stuff into gerrit later today.

Are those tests positive or negative?

They mean that I need to be very careful about page ranges; it's a sort of sliding scale. I don't have to worry about that in the MediaWiki code but the caller most definitely does, and that would be my python scripts.

Change 295387 had a related patch set uploaded (by ArielGlenn):
add option to XML dump stubs of page ranges with explicit rev_id ordering

https://gerrit.wikimedia.org/r/295387

The above patchset is now ready for review. Any takers?

Change 295387 merged by jenkins-bot:
add option to XML dump stubs of page ranges with explicit rev_id ordering

https://gerrit.wikimedia.org/r/295387

Change 296614 had a related patch set uploaded (by ArielGlenn):
add option to XML dump stubs of page ranges with explicit rev_id ordering

https://gerrit.wikimedia.org/r/296614

Change 296614 merged by jenkins-bot:
add option to XML dump stubs of page ranges with explicit rev_id ordering

https://gerrit.wikimedia.org/r/296614

Thanks to MaxSem and Chad, this is now deployed for 1.28.0-wmf.7 and 1.28.0-wmf.8. As a safety factor, it *should* impact only dumps of stubs with specified page ranges. I'll leave this open for now until we see how everything pans out.

I need to go through the logs for the page content dumps for the last run and see which projects have low prefetch ratios; these would benefit from applying this change, if possible.

List of wikis with more than 10 million revs and low prefetch rates below. Need to make sure the indices on these wikis support the ordered query before changing the config.

*** 2018-03-07 04:11:54: cswiki (ID 8630) 1076616 pages (7.9|264957.6/sec all|curr), 15061805 revs (110.7|198.1/sec all|curr), 26.8%|26.8% prefetched (all|curr), ETA 2018-03-07 06:20:41 [max 15917202]
*** 2018-03-08 04:39:53: huwiki (ID 56636) 1240699 pages (5.8|2231505.6/sec all|curr), 18605142 revs (87.3|255.4/sec all|curr), 26.6%|26.6% prefetched (all|curr), ETA 2018-03-08 08:09:56 [max 19705391]
*** 2018-03-09 06:15:55: enwiktionary (ID 60480) 6068340 pages (27.9|3347174.4/sec all|curr), 47894667 revs (220.0|367.9/sec all|curr), 31.9%|31.9% prefetched (all|curr), ETA 2018-03-09 07:47:48 [max 49107678]
*** 2018-03-07 07:51:54: fiwiki (ID 37993) 1167695 pages (8.2|580511.3/sec all|curr), 16377485 revs (114.8|241.1/sec all|curr), 31.9%|31.9% prefetched (all|curr), ETA 2018-03-07 09:41:20 [max 17131078]
*** 2018-03-06 23:38:43: rowiki (ID 17014) 2008233 pages (22.7|825979.9/sec all|curr), 11239659 revs (127.0|271.0/sec all|curr), 38.4%|38.4% prefetched (all|curr), ETA 2018-03-07 01:15:35 [max 11977617]
*** 2018-03-07 10:34:16: trwiki (ID 62285) 1549319 pages (10.6|3934359.6/sec all|curr), 17456077 revs (119.1|195.5/sec all|curr), 38.9%|38.9% prefetched (all|curr), ETA 2018-03-07 15:31:01 [max 19576489]
*** 2018-03-09 03:46:33: ukwiki (ID 19740) 2331245 pages (10.0|549833.9/sec all|curr), 21229789 revs (90.8|186.1/sec all|curr), 34.9%|34.9% prefetched (all|curr), ETA 2018-03-09 06:42:40 [max 22189668]
*** 2018-03-10 04:40:20: svwiki (ID 34459) 7646681 pages (31.5|2657353.7/sec all|curr), 41298920 revs (170.1|319.7/sec all|curr), 46.2%|46.2% prefetched (all|curr), ETA 2018-03-10 06:44:13 [max 42562868]
*** 2018-03-07 02:05:22: idwiki (ID 33510) 2305917 pages (28.8|581109.1/sec all|curr), 13032875 revs (163.0|220.5/sec all|curr), 51.2%|51.2% prefetched (all|curr), ETA 2018-03-07 03:18:23 [max 13746788]

Made a custom dblist file with the above wikis, ran the below, and grepped the output files:

python ./onallwikis.py -c /home/ariel/wikidump.conf.atg -f "{w}-rev_schema.gz" -o '/mnt/dumpsdata2/crontest/schemas-{d}' --verbose -q "'show create table revision;'"

All of these wikis have the rev_page_id index. I'll do a test of the page content dump for some small number of pages, for each of these wikis with the order by setting set in the config, to be sure I didn't miss something.
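
(For reference, a sketch of a more direct way to check for that index on a given wiki using MariaDB's SHOW INDEX; the check above used SHOW CREATE TABLE via the script instead:)

-- does this wiki's revision table have the rev_page_id (rev_page, rev_id) index?
SHOW INDEX FROM revision WHERE Key_name = 'rev_page_id';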

Change 431533 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] dump stubs with ascending rev ids per page for certain wikis

https://gerrit.wikimedia.org/r/431533

Tested:

for wikiname in $list; do
> /usr/bin/php7.0 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=$wikiname --full --stub --report=1000 --output=file:"/mnt/dumpsdata2/crontest/temp/${wikiname}-stubs-stuff" --orderrevs --start=1  --end 10
> done

Output looks good, no whines, and the above commands were generated by dryruns against the config with orderrevs set.

Change 431533 merged by ArielGlenn:
[operations/puppet@production] dump stubs with ascending rev ids per page for certain wikis

https://gerrit.wikimedia.org/r/431533

Change 444838 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] make stub dumps on a couple more wikis ordered by revisions within pages

https://gerrit.wikimedia.org/r/444838

Change 444838 merged by ArielGlenn:
[operations/puppet@production] make stub dumps on a couple more wikis ordered by revisions within pages

https://gerrit.wikimedia.org/r/444838

Change 444841 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn):
[operations/puppet@production] one more wiki with order by revs for stubs dump

https://gerrit.wikimedia.org/r/444841

Change 444841 merged by ArielGlenn:
[operations/puppet@production] one more wiki with order by revs for stubs dump

https://gerrit.wikimedia.org/r/444841

Welp. Due to https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/459885/ which unilaterally changed this everywhere, probably unintentionally, this can be closed.