Page MenuHomePhabricator

CodeReview doesn't show ancient revisions for a path
Closed, ResolvedPublic

Description

See example link, there were more commits under that path. Reproduceable on other paths, too.


Version: unspecified
Severity: major
URL: http://www.mediawiki.org/w/index.php?path=%2Ftrunk%2Fextensions%2FCite%2Fcite_text&title=Special%3ACode%2FMediaWiki%2Fpath

Details

Reference
bz23720

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:02 PM
bzimport set Reference to bz23720.
bzimport added a subscriber: Unknown Object (MLST).

As it turned out, it's on purpose, see CodeRevision::getPathConds():

			// performance
			'cp_rev_id > ' . ( $this->mRepo->getLastStoredRev() - 20000 ),

Marking this as dependent on bug 24479 which should address DB performance.

mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM mw_code_paths INNER JOIN mw_code_rev ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN mw_code_comment ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id)) WHERE cp_repo_id = '3' AND (cp_path LIKE '/trunk/extensions/Cite/cite\_text%' ) AND (cp_rev_id > 58352) GROUP BY cp_rev_id ORDER BY cp_rev_id;
+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+

1SIMPLEmw_code_revrangePRIMARY,cr_repo_id,cr_repo_author,cr_idPRIMARY8NULL21804Using where; Using temporary; Using filesort
1SIMPLEmw_code_pathsrefPRIMARYPRIMARY8const,wikidb.mw_code_rev.cr_id2Using where; Using index
1SIMPLEmw_code_commentrefcc_repo_id,cc_repo_timecc_repo_id8wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id1Using index

+----+-------------+-----------------+-------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+----------------------------------------------+
3 rows in set (0.00 sec)

mysql> describe SELECT cp_rev_id,cr_id,cr_repo_id,cr_status,COUNT(DISTINCT cc_id) AS comments,cr_path,cr_message,cr_author,cr_timestamp FROM mw_code_paths INNER JOIN mw_code_rev ON ((cr_repo_id = cp_repo_id AND cr_id = cp_rev_id)) LEFT JOIN mw_code_comment ON ((cc_repo_id = cp_repo_id AND cc_rev_id = cp_rev_id)) WHERE cp_repo_id = '3' AND (cp_path LIKE '/trunk/extensions/Cite/cite\_text%' ) GROUP BY cp_rev_id ORDER BY cp_rev_id;
+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+

1SIMPLEmw_code_revrefPRIMARY,cr_repo_id,cr_repo_author,cr_idPRIMARY4const49838Using temporary; Using filesort
1SIMPLEmw_code_pathsrefPRIMARYPRIMARY8const,wikidb.mw_code_rev.cr_id2Using where; Using index
1SIMPLEmw_code_commentrefcc_repo_id,cc_repo_timecc_repo_id8wikidb.mw_code_paths.cp_repo_id,wikidb.mw_code_rev.cr_id1Using index

+----+-------------+-----------------+------+-----------------------------------------+------------+---------+----------------------------------------------------------+-------+---------------------------------+
3 rows in set (0.00 sec)

ayg wrote:

As Roan suggests, this should be fixed by storing paths more redundantly, so the LIKE can be made into an equality check. Otherwise there's no way to get it to play nice with the grouping/ordering.

(In reply to comment #3)

As Roan suggests, this should be fixed by storing paths more redundantly, so
the LIKE can be made into an equality check. Otherwise there's no way to get
it to play nice with the grouping/ordering.

Elaborating on my suggestion on IRC:

Currently we only store the path of each changed file in the paths table. If instead we also stored the paths of all of their ancestors, we could simply do WHERE cp_path='/trunk/phase3' instead of WHERE cp_path LIKE '/trunk/phase3%' , which will search directories recursively.

For example, for a commit that touches ApiQueryBase.php and MessagesEn.php, we'd store the following paths:

/trunk/phase3/includes/api/ApiQueryBase.php
/trunk/phase3/includes/api
/trunk/phase3/includes
/trunk/phase3/languages/messages/MessagesEn.php
/trunk/phase3/languages/messages
/trunk/phase3/languages
/trunk/phase3
/trunk

This would improve query performance at the expense of table size.

Interesting. We'd also need to presumably back populate this for the old revs...

No database changes, just some more coding changes to split stuff down...

(In reply to comment #5)

Interesting. We'd also need to presumably back populate this for the old
revs...

Yes. I could do that locally and measure the size increase that way.

No database changes, just some more coding changes to split stuff down...

Exactly.

r79379, r79381, r79382, r79384...

Before:
mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000;
+----------+

count(*)

+----------+

2580

+----------+
1 row in set (0.02 sec)

After:

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3 AND cp_rev_id > 1 AND cp_rev_id < 1000;
+----------+

count(*)

+----------+

6600

+----------+
1 row in set (0.00 sec)

After the first 1000 done

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3;
+----------+

count(*)

+----------+

516313

+----------+
1 row in set (0.26 sec)

After the whole repo (78352 locally)

mysql> select count(*) from mw_code_paths WHERE cp_repo_id = 3;
+----------+

count(*)

+----------+

988825

+----------+
1 row in set (1.93 sec)

Not really fast, but hardly slow either. Took a few minutes to execute