
Add revision.rev_length column to track revision sizes
Closed, Resolved (Public)

Description

As discussed on IRC, I would like to propose two extra fields for the database
schema:

table revision: rev_len unsigned integer

This field would contain the length of the revision's raw text, the same as
page_len in the page table. Having this field would tremendously help
vandal-fighting bots, as it would allow simple queries for page blanking and
bulk imports (fairly common forms of vandalism). It would also reduce the load
on the server from such tools, because the raw text would not be needed in many
cases. The length would also, potentially, allow much more sophisticated
analysis than the rc_change field proposed next.
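
As a rough illustration of the kind of query this would make possible, here is a minimal sketch using Python's sqlite3 module and an in-memory table. The table layout is heavily simplified and is an assumption, not the real MediaWiki schema; only rev_len comes from this proposal, and find_blankings, demo_connection and the max_len threshold are hypothetical names made up for the example.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for the revision table (simplified, assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_page INTEGER NOT NULL,"
        " rev_len INTEGER"  # proposed column; NULL means 'length not recorded'
        ")"
    )
    conn.executemany(
        "INSERT INTO revision (rev_id, rev_page, rev_len) VALUES (?, ?, ?)",
        [(1, 10, 5200), (2, 10, 0), (3, 11, 340), (4, 11, None)],
    )
    return conn


def find_blankings(conn, max_len=0):
    """Return (rev_id, rev_page) for revisions whose stored length is <= max_len.

    Rows with an unknown (NULL) length are skipped rather than guessed at.
    """
    cur = conn.execute(
        "SELECT rev_id, rev_page FROM revision"
        " WHERE rev_len IS NOT NULL AND rev_len <= ?"
        " ORDER BY rev_id",
        (max_len,),
    )
    return cur.fetchall()
```

The point of the sketch is only that the candidate list comes from the length column alone, so a bot would not need to fetch any raw text to build it.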

table recentchanges: rc_change signed integer

This field would contain the size of the change (delta) between two revisions
(either positive or negative). This change would also allow for quick vandalism
lookups.
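
To make the delta idea concrete, here is a small Python sketch of how a tool might compute and use such a value. The function names and both thresholds are hypothetical and picked arbitrarily for illustration; nothing here is taken from MediaWiki itself.

```python
from typing import Optional


def size_delta(old_len: Optional[int], new_len: Optional[int]) -> Optional[int]:
    """Signed size change between two revisions, or None if either length is unknown."""
    if old_len is None or new_len is None:
        return None
    return new_len - old_len


def looks_suspicious(
    delta: Optional[int],
    blank_threshold: int = -2000,  # assumed cutoff for "large removal"
    dump_threshold: int = 20000,   # assumed cutoff for "large bulk import"
) -> bool:
    """Flag large removals or large additions; unknown deltas are never flagged."""
    if delta is None:
        return False
    return delta <= blank_threshold or delta >= dump_threshold
```

For example, a page going from 5200 to 0 bytes gives a delta of -5200, which this sketch would flag, while a 120-byte copyedit would not be flagged.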


Version: unspecified
Severity: enhancement

Details

Reference
bz6277

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 9:20 PM)
bzimport set Reference to bz6277.
bzimport added a subscriber: Unknown Object (MLST).

robchur wrote:

Adding revision.rev_len is going to require us to run a script to update the
field for all revisions...

rc_change is a fairly easy change, and probably should have been filed as a
separate request. It would immediately help with detecting any
blanking/dumping vandalism.

If we do this, it'll have to be after 1.7 branch and we'll
need to schedule downtime to upgrade the tables.

(In reply to comment #3)

> If we do this, it'll have to be after 1.7 branch and we'll
> need to schedule downtime to upgrade the tables.

Is there a meta page that puts together all such requests so that when an update
is scheduled, all changes can be done at once?

robchur wrote:

Let's keep discussion and stuff about MediaWiki on MediaWiki.org, eh?

robchur wrote:

recentchanges.rc_old_len and recentchanges.rc_new_len have been added.

titoxd.wikimedia wrote:

I recommend against this, as just in the English Wikipedia, it would require
updating 96 million revisions. That is some major processing time. Besides,
judging by the replies to adding a similar visible feature to
[[Special:Watchlist]], it's going to annoy some people anyways.

Note that NULL values could be left on old rows to minimize conversion
requirements. It's still a table change, but we have a good handle on how to do
that now.
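
One way the NULL-on-old-rows idea could work in practice is to fill in a length only when somebody first asks for it. The following Python sketch assumes a simplified revision table with just rev_id and rev_len, a hypothetical load_text callable that fetches the raw text, and a length measured as UTF-8 bytes; all of those are assumptions for illustration, not how MediaWiki does it.

```python
import sqlite3


def rev_len_or_backfill(conn, rev_id, load_text):
    """Return the stored rev_len, computing and saving it if the row still has NULL.

    load_text is a hypothetical callable: rev_id -> raw revision text (str).
    """
    row = conn.execute(
        "SELECT rev_len FROM revision WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    if row is None:
        raise KeyError(rev_id)
    if row[0] is not None:
        return row[0]  # already recorded, no text fetch needed

    # Old row: fetch the text once, measure it, and remember the result.
    length = len(load_text(rev_id).encode("utf-8"))
    conn.execute(
        "UPDATE revision SET rev_len = ? WHERE rev_id = ?", (length, rev_id)
    )
    return length
```

With this approach no bulk conversion script is needed up front; old rows are converted gradually, at the cost of one text fetch the first time each is looked up.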

Note also that keeping the data is a separate issue from displaying googly
colored thingies on history lists.

robchur wrote:

*** This bug has been marked as a duplicate of 1723 ***