
User edit count (user.user_editcount field) is often wrong
Open, Low, Public

Description

Example user: [[User:Joao]]. On the Toolserver's copy of enwiki_p:

mysql> SELECT user.user_editcount FROM user WHERE user_name="Joao"\G
*************************** 1. row ***************************
user_editcount: 266
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM revision WHERE rev_user_text = "Joao" GROUP BY rev_user_text\G
*************************** 1. row ***************************
COUNT(*): 265
1 row in set (0.03 sec)

mysql> SELECT COUNT(*) FROM archive WHERE ar_user_text = "Joao" GROUP BY ar_user_text\G
*************************** 1. row ***************************
COUNT(*): 35
1 row in set (0.01 sec)

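This isn't an anomaly. Many users, especially those with higher edit counts, have inaccurate values stored. The stored value matches neither the number of live contributions nor the total of live and deleted contributions (for Joao: 266 stored, 265 live, 300 live plus deleted).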
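To check how widespread the mismatch is, the three queries above can be combined into one pass over all users. The following is only a sketch: it uses Python's sqlite3 module with a toy schema that contains just the columns queried above, and the sample rows are invented to mirror the numbers reported for User:Joao. The helper name find_mismatches is hypothetical.

```python
import sqlite3

# Toy schema: only the columns used in the queries above, not the real MediaWiki schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_name TEXT PRIMARY KEY, user_editcount INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user_text TEXT);
    CREATE TABLE archive (ar_id INTEGER PRIMARY KEY, ar_user_text TEXT);
""")

# Invented sample data mirroring the counts reported for User:Joao.
conn.execute("INSERT INTO user VALUES ('Joao', 266)")
conn.executemany("INSERT INTO revision (rev_user_text) VALUES (?)", [("Joao",)] * 265)
conn.executemany("INSERT INTO archive (ar_user_text) VALUES (?)", [("Joao",)] * 35)


def find_mismatches(conn):
    """Return (name, stored, live, deleted) for every user whose stored count
    equals neither the live count nor live + deleted."""
    rows = conn.execute("""
        SELECT u.user_name,
               u.user_editcount,
               (SELECT COUNT(*) FROM revision r WHERE r.rev_user_text = u.user_name),
               (SELECT COUNT(*) FROM archive a WHERE a.ar_user_text = u.user_name)
        FROM user u
    """).fetchall()
    return [row for row in rows if row[1] not in (row[2], row[2] + row[3])]


mismatches = find_mismatches(conn)
print(mismatches)
```

On a real wiki the correlated subqueries would be expensive for heavy editors, so this is suited to a replica or an offline check rather than production.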

Part of the problem seems to stem from the fact that the initEditCount.php maintenance script doesn't account for deleted contributions.

We're currently advertising an edit count (in Special:Preferences and elsewhere) that isn't accurate.

Details

Reference
bz19311

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:44 PM
bzimport set Reference to bz19311.
bzimport added a subscriber: Unknown Object (MLST).

I think at least two problems cause the difference between the "normal" edit counters on the Toolserver and the user_editcount field. The first is the handling of deleted edits. As the reporter of this bug writes, initEditCount.php doesn't account for deleted contributions. This is the correct behavior, since none of the edit counters count them. But after user_editcount has been initialized, only incEditCount() in User.php seems to be called, which increments user_editcount. When a page is deleted, user_editcount is not decremented. So user_editcount is the number of all edits a user has made (deleted and not deleted) minus the edits that had already been deleted when initEditCount.php was run.
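The drift described in this comment can be reproduced with a small model. This is a sketch in plain Python, not MediaWiki code: the event names and the function simulate_counter are made up for illustration, and the only behavior assumed is what the comment states (edits increment the counter, deletions leave it untouched, initialization resets it to the live count).

```python
def simulate_counter(events):
    """Replay events and return (stored, live, deleted).

    "edit"   : a new edit; the stored counter is incremented.
    "delete" : one live edit is archived; the stored counter is NOT touched.
    "init"   : the stored counter is reset to the current number of live edits.
    """
    stored = live = deleted = 0
    for event in events:
        if event == "edit":
            live += 1
            stored += 1
        elif event == "delete":
            live -= 1
            deleted += 1
        elif event == "init":
            stored = live
        else:
            raise ValueError(f"unknown event: {event}")
    return stored, live, deleted


# 5 edits, 2 of them deleted before initialization,
# then 3 more edits and 1 deletion after initialization.
events = ["edit"] * 5 + ["delete"] * 2 + ["init"] + ["edit"] * 3 + ["delete"]
stored, live, deleted = simulate_counter(events)
print(stored, live, deleted)  # 6 5 3
```

Here the stored value (6) equals live plus deleted (8) minus the 2 edits that were already deleted at initialization, so it matches neither the live count nor the total.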

The second is an older bug that left deleted revisions in the revision table when they should be in the archive table. For that reason, all edit counters check whether the rev_page ID exists in the page table (this example is from de.wikipedia):

SELECT count(*) FROM revision WHERE rev_user=10276;
-> 39702
SELECT count(*) FROM revision, page WHERE rev_user=10276 AND rev_page=page_id;
-> 39688

The 14 edits missing from the second count are from 2005/2006:
SELECT * FROM revision WHERE rev_user=10276 AND rev_page NOT IN(SELECT page_id FROM page);

I don't know whether this bug still exists, but it doesn't seem so: the most recent affected revision of mine is from March 2006. These were newly created redirects (mostly created by moving a page) that were deleted later, but the revision carrying the move message wasn't moved to the archive table. Assuming the bug is fixed, a maintenance script would be useful that moves every revision whose rev_page ID is not in the page table to the archive table.
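A cleanup along these lines might look like the sketch below. It is not an existing MediaWiki maintenance script: it runs against an in-memory SQLite database with a heavily simplified schema (the real archive table has many more columns), the sample rows are invented, and the function name move_orphans_to_archive is hypothetical.

```python
import sqlite3

# Simplified toy tables; real MediaWiki tables carry many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user INTEGER);
    CREATE TABLE archive (ar_rev_id INTEGER PRIMARY KEY, ar_page_id INTEGER, ar_user INTEGER);
    INSERT INTO page VALUES (1), (2);
    -- Revision 12 points at page 99, which no longer exists (an orphan).
    INSERT INTO revision VALUES (10, 1, 10276), (11, 2, 10276), (12, 99, 10276);
""")


def move_orphans_to_archive(conn):
    """Move revisions whose rev_page has no matching page row into archive.

    Returns the number of revisions moved. Runs in a single transaction.
    """
    with conn:
        orphans = conn.execute("""
            SELECT rev_id, rev_page, rev_user
            FROM revision
            LEFT JOIN page ON rev_page = page_id
            WHERE page_id IS NULL
        """).fetchall()
        conn.executemany("INSERT INTO archive VALUES (?, ?, ?)", orphans)
        conn.executemany("DELETE FROM revision WHERE rev_id = ?",
                         [(rev_id,) for rev_id, _, _ in orphans])
    return len(orphans)


moved = move_orphans_to_archive(conn)
print(moved)
```

The LEFT JOIN with an IS NULL filter selects the same rows as the NOT IN subquery above; after the move, the two counts in the earlier queries would agree.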

soxred93 wrote:

It may be quite possible to...

a) create a maintenance script that replaces every user_editcount field with the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text = 'Example';
b) set the function in the User class which gets the edit count to just do that SQL query.

However, for users with a large number of edits, this is very slow. This may be out of our reach. Might this be possible?

(In reply to comment #2)

a) create a maintenance script that replaces every user_editcount field with
the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text =
'Example';

This is essentially what initEditCount.php does: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/initEditCount.php?view=markup

b) set the function in the User class which gets the edit count to just do that
SQL query.

Way too expensive. Even with the index on rev_user_text, you're talking about millions of rows for some of these users. The value must be stored so that it can be retrieved cheaply for things like deciding whether to show the 'edit' links (autoconfirm checks this field). There might be other creative ways of updating it, though, such as every time a user logs in.

RESOLVED INVALID?

[[mw:Manual:User table]]:

user_editcount

Count of edits and edit-like actions.
*NOT* intended to be an accurate copy of COUNT(*) WHERE rev_user=user_id. May contain NULL for old accounts if batch-update scripts haven't been run, as well as listing deleted edits and other myriad ways it could be out of sync. Execute the script initEditCount.php to update this table column.
Meant primarily for heuristic checks to give an impression of whether the account has been used much.

(In reply to comment #4)

I don't think this is invalid. Just because it's not perfect now doesn't mean we can't do better.

But first of all, perhaps we should add "approximately" to the edit counter in preferences.

Here is a plan:

1 Add a column user_deletedcount
2 Modify the delete code to add to user_deletedcount and subtract from user_editcount
3 Modify the undelete code similarly
4 Re-initialise both columns
5 Modify edit count display code to display the numbers we want it to
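Steps 1 to 3 of this plan can be sketched on a toy table as follows. This is an illustration only, written in Python with sqlite3: user_deletedcount is the column proposed above and does not exist in MediaWiki, the hook names on_delete and on_undelete are hypothetical, and the starting count of 10 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        user_editcount INTEGER NOT NULL DEFAULT 0,
        user_deletedcount INTEGER NOT NULL DEFAULT 0  -- proposed column (step 1)
    );
    INSERT INTO user (user_id, user_editcount) VALUES (1, 10);
""")


def on_delete(conn, user_id, n):
    """Step 2: n of the user's live edits were deleted."""
    with conn:
        conn.execute(
            "UPDATE user SET user_editcount = user_editcount - ?, "
            "user_deletedcount = user_deletedcount + ? WHERE user_id = ?",
            (n, n, user_id),
        )


def on_undelete(conn, user_id, n):
    """Step 3: n of the user's deleted edits were restored."""
    with conn:
        conn.execute(
            "UPDATE user SET user_editcount = user_editcount + ?, "
            "user_deletedcount = user_deletedcount - ? WHERE user_id = ?",
            (n, n, user_id),
        )


def counts(conn, user_id):
    """Return (user_editcount, user_deletedcount) for one user."""
    return conn.execute(
        "SELECT user_editcount, user_deletedcount FROM user WHERE user_id = ?",
        (user_id,),
    ).fetchone()


on_delete(conn, 1, 3)
after_delete = counts(conn, 1)
on_undelete(conn, 1, 1)
after_undelete = counts(conn, 1)
print(after_delete, after_undelete)  # (7, 3) (8, 2)
```

Because both columns change in a single UPDATE, their sum stays constant across deletion and undeletion, which gives the display code in step 5 either number, or the total, without recounting rows.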