Author: erikzachte
Description:
There is a growing audience for revert stats. Nimisz Gautam and Erik Zachte both made scripts that generate revert stats by comparing revisions in the dumps via MD5 sums. Rob Lanphier expects that MD5 sums can be used for even fancier processing.
Right now the only way to harvest MD5s is to parse the full archive dumps, which takes forever.
The proposal is to store an MD5 sum in the stub dumps for every revision. This would allow a monthly refresh of the revert stats (see URL below) and regular publication of revert data files for researchers.
For example:
<page>
  <title>United States Declaration of Independence</title>
  <id>19</id>
  <revision>
    <id>1926607</id>
    <timestamp>2010-06-15T22:06:14Z</timestamp>
    <contributor>
      <username>Innotata</username>
      <id>172490</id>
    </contributor>
    <text id="1894246" />
    <md5>eff7d5dba32b4da32d9a67a519434d3f</md5>
  </revision>
</page>
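To illustrate why a per-revision MD5 in the stub dumps would be enough for revert stats, here is a minimal sketch in Python. It is not the script used by either author. It assumes the proposed `<md5>` element exists as a direct child of `<revision>`, and it treats a revision as a revert when its checksum matches an earlier, non-adjacent revision of the same page. The function names `md5_hex`, `find_reverts`, and `make_sample_page` are hypothetical, and the sample page is synthetic.

```python
import hashlib
import xml.etree.ElementTree as ET


def md5_hex(text):
    """MD5 hex digest of a revision text (what the dump would store)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_reverts(page_xml):
    """Return (reverting_rev_id, restored_rev_id) pairs for one <page>.

    Assumption: every <revision> carries the proposed <md5> child element.
    """
    page = ET.fromstring(page_xml)
    first_seen = {}   # md5 -> id of the earliest revision with that checksum
    prev_md5 = None   # checksum of the immediately preceding revision
    reverts = []
    for rev in page.iter("revision"):
        rev_id = rev.findtext("id")
        md5 = rev.findtext("md5")
        if md5 is None:
            continue  # stub without a checksum: nothing to compare
        # Same checksum as the previous revision is a null edit, not a revert.
        if md5 in first_seen and md5 != prev_md5:
            reverts.append((rev_id, first_seen[md5]))
        first_seen.setdefault(md5, rev_id)
        prev_md5 = md5
    return reverts


def make_sample_page():
    """Build a synthetic three-revision page: original, changed, restored."""
    texts = ["original text", "vandalised text", "original text"]
    revisions = "".join(
        "<revision><id>{}</id><md5>{}</md5></revision>".format(i, md5_hex(t))
        for i, t in enumerate(texts, start=1)
    )
    return "<page><title>Sample</title><id>1</id>{}</page>".format(revisions)


if __name__ == "__main__":
    print(find_reverts(make_sample_page()))  # [('3', '1')]
```

Because only revision ids and checksums are read, a pass like this would run over the stub dumps without touching any revision text, which is the point of the request.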
Version: unspecified
Severity: enhancement
URL: http://stats.wikimedia.org/EN/EditsRevertsEN.htm