Page MenuHomePhabricator

Mistake in old user contributions diff size
Closed, ResolvedPublic

Description

Author: daniel_money

Description:
There appears to be a bug in reporting the size of diffs in old user contributions. Both http://en.wikipedia.org/w/index.php?title=Special:Contributions/Dpmuk&dir=prev&target=Dpmuk and http://en.wikipedia.org/w/index.php?title=Special:Contributions/Goodvac&dir=prev&target=Goodvac show this (pretty much any large change is an error, which should be obvious by clicking on the diff). It would appear that showing a single diff in the page history has a similar problem, e.g. http://en.wikipedia.org/w/index.php?title=House_of_Burgesses&dir=prev&offset=20071119010057&limit=1&action=history but along with other versions it's fine, e.g. http://en.wikipedia.org/w/index.php?title=House_of_Burgesses&dir=prev&offset=20071117205851&limit=5&action=history . This only appears to affect old contributions.


Version: unspecified
Severity: minor

Details

Reference
bz34922

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 12:20 AM
bzimport set Reference to bz34922.
bzimport added a subscriber: Unknown Object (MLST).

Special:Contributions is getting the size from the parent rev by using the rev_parent_id field in the database. Maybe in this case the parent_id is not populate.

(In reply to comment #1)

Special:Contributions is getting the size from the parent rev by using the
rev_parent_id field in the database. Maybe in this case the parent_id is not
populate.

Seems to be bang on the money. rev_parent_id is not fully populated, and each of these bad diff sizes correlates to NULL. Page creations (presumably what this is conflating NULL with) in fact have 0 as their parent_id.

As for the fix, I think a good pro-tem measure would be to not display the erroneous character changes, and then to look at properly populating the parent_id field. Trying to seek backwards to do generate the parent_id on the fly seems very inefficient. (Incidentally, the watchlist and RC look at the RC table, which has old_bytes and new_bytes. Unfortunately we shan't be able to replicate that behaviour here.)

Actually upon further examination on a test wiki, merely change rev_parent_id to NULL does not seem to artificially create the problem. So it seems that correlation does not imply causation...

Per the wrong diff size on https://en.wikipedia.org/w/index.php?title=Milton_Keynes&action=history&year=&month=-1&tagfilter=possible+vandalism (reported on en.wp's VPT), this seems to be a wider problem, where the history page simply does not try hard enough to establish the previous revision.

(In reply to comment #4)

Per the wrong diff size on
https://en.wikipedia.org/w/index.php?title=Milton_Keynes&action=history&year=&month=-1&tagfilter=possible+vandalism
(reported on en.wp's VPT), this seems to be a wider problem, where the history
page simply does not try hard enough to establish the previous revision.

This is probably a separate bug. (Related to r100722)

(In reply to comment #5)

This is probably a separate bug. (Related to r100722)

[edit conflict note: agreed :) ]

Okay, so there's two different effects at work here. The one that prompted this
thread is simple enough: line 795 of SpecialContributions.php reads:

$parentLen = isset( $this->mParentLens[$row->rev_parent_id] ) ?
$this->mParentLens[$row->rev_parent_id] : 0;

Thus turning a NULL ("never recorded") into a 0 ("page creation").

Comment #2 by me outlines what I think should happen here. The other, related
issue (which I mentioned in comment #4), I have split off into bug #34978.

(In reply to comment #3)

Actually upon further examination on a test wiki, merely change rev_parent_id
to NULL does not seem to artificially create the problem. So it seems that
correlation does not imply causation...

Oh, apologies for the bugspam, but I really should retract that, since I was looking at the wrong page to test. This bug *is* perfectly reproducible using rev_parent_id, just to confirm.

Made the code more tolerant of a null rev_parent_id in r112995.

Marking bug fixed. I think Jarry1250 is writing another bug requesting db cleanup for rev_parent_id.