
New revisions occasionally created with wrong text (but correct rev_len)
Closed, ResolvedPublic

Description

It appears that some recent edits on the English Wikipedia (possibly elsewhere too?) have resulted in revisions that are blank or contain text from other, unrelated pages. Oddly, the byte count reported in the page history (based on the rev_len field), as well as the corresponding information in the recentchanges table, matches the content that _should've_ been there.

For example, the revision http://en.wikipedia.org/w/index.php?title=Talk:Pikachu&oldid=227969847 is blank, even though the page history reports its length as 22,396 bytes. See also discussion at:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Bug:_revisions.2Fpagesizes.2Fpagerendering.2Fwikisource_not_matching_up.2C_resulting_in_blanking_or_page_replacements
http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#SYSTEM_BUG:_rollback_replaced_a_page_by_an_irrelevant_page_instead_of_reverting

I'm marking this as critical in case this is a symptom of more serious database corruption. Feel free to downgrade if it turns out to be something more benign.
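The symptom described above (a stored rev_len that does not match the text actually served) can be checked mechanically. The following is only a minimal sketch: it assumes rev_len is the UTF-8 byte length of the revision text, and the `Revision` record and `find_length_mismatches` helper are hypothetical names working on in-memory data, not on the real database.

```python
from typing import Iterable, List, NamedTuple


class Revision(NamedTuple):
    """Hypothetical in-memory record; not a real MediaWiki class."""
    rev_id: int
    rev_len: int  # byte length recorded when the edit was saved
    text: str     # text actually returned for this revision


def find_length_mismatches(revisions: Iterable[Revision]) -> List[int]:
    """Return the ids of revisions whose text does not match rev_len.

    Assumption: rev_len counts bytes of the UTF-8 encoded text.
    """
    return [
        rev.rev_id
        for rev in revisions
        if len(rev.text.encode("utf-8")) != rev.rev_len
    ]


# The blank Talk:Pikachu revision from this report, plus a made-up
# consistent revision (id 1) for contrast.
sample = [
    Revision(rev_id=227969847, rev_len=22396, text=""),
    Revision(rev_id=1, rev_len=5, text="hello"),
]
suspect_ids = find_length_mismatches(sample)
```

A check like this would flag blank revisions and most replaced ones, but it would miss a replacement whose unrelated text happens to have the same byte length.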


Version: unspecified
Severity: critical
URL: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Bug:_revisions.2Fpagesizes.2Fpagerendering.2Fwikisource_not_matching_up.2C_resulting_in_blanking_or_page_replacements

Details

Reference
bz14933

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 21 2014, 10:15 PM).
bzimport set Reference to bz14933.
bzimport added a subscriber: Unknown Object (MLST).

herd wrote:

from http://toolserver.org/~amidaniel/chanlogs/%23mediawiki/20080726.txt ->

[09:35:22] <Sadik_Khalid> Hi, when I tried to edit this page (http://ml.wikipedia.org/wiki/%E0%B4%B2%E0%B5%82%E0%B4%AF%E0%B4%BF_%E0%B4%AA%E0%B4%BE%E0%B4%B8%E0%B5%8D%E0%B4%9A%E0%B4%B0%E0%B5%8D%E2%80%8D) I am getting Egypt page (http://ml.wikipedia.org/wiki/Egypt)
[09:37:45] <Sadik_Khalid> History page don't mach with the content of the article

Changing title since this occurs outside enwiki.

Possibly related to bug 14930

Also may be related to the recent ext. storage problems on one cluster (https://wikitech.leuksman.com/view/Server_admin_log)

OK, I can't find any relevant software changes. I'm almost sure this is due to the above issue. As things stand now, no *new* edits should be recorded wrongly anymore.

jeluf wrote:

This happened due to a master switch on the external storage cluster.

Apparently, the new master didn't have an up-to-date replica of the old master's data; a few records were missing. Because of this, the same text IDs were used twice. The edits saved on the old master that were not replicated to the new master are lost, with no way to get them back.
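To illustrate how a lagging replica leads to reused IDs, here is a toy model. It is not the actual external storage code: `BlobStore`, `copy_from`, `simulate_master_switch` and the revision names are hypothetical, and the only assumption is that each store hands out IDs from its own auto-increment counter.

```python
class BlobStore:
    """Toy stand-in for one external storage server (hypothetical)."""

    def __init__(self):
        self.blobs = {}
        self.next_id = 1  # models an auto-increment primary key

    def insert(self, text):
        """Store text under the next free id and return that id."""
        blob_id = self.next_id
        self.blobs[blob_id] = text
        self.next_id += 1
        return blob_id

    def copy_from(self, other, up_to_id):
        """Replicate blobs 1..up_to_id from another store, then stop (lag)."""
        for blob_id in range(1, up_to_id + 1):
            self.blobs[blob_id] = other.blobs[blob_id]
        self.next_id = up_to_id + 1


def simulate_master_switch():
    old_master = BlobStore()
    new_master = BlobStore()
    core_rows = {}  # revision name -> blob id stored in the core DB

    # Three edits are saved while the old master is active.
    for name in ("rev A", "rev B", "rev C"):
        core_rows[name] = old_master.insert("text of " + name)

    # The replica only received the first two blobs before the switch.
    new_master.copy_from(old_master, up_to_id=2)

    # After the switch, the next save reuses blob id 3.
    core_rows["rev D"] = new_master.insert("text of rev D")
    return core_rows, new_master


core_rows, new_master = simulate_master_switch()
```

In this model `rev C` and `rev D` both point at blob id 3, so reading `rev C` from the new master returns the text of `rev D`, and the original text of `rev C` exists only on the old master.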

I have to close this bug as "FIXED" because there's no "CANTFIX".

It wasn't fixed. srv104 still had an old copy of the configuration (because it's not reachable by ssh), and so it was still writing blobs to srv101. I've taken srv104 out of LVS rotation now. Maybe we'll be able to recover the edits from srv101 at some point, but it looks like it might be hanging on I/O now.

jeroenvrp wrote:

I can confirm this on nl.wikipedia too.

See e.g. http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&diff=13286529&oldid=13139324

In the recent changes, this revision is shown as having added 15 bytes, but the page is empty:
http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&action=edit&oldid=13286529

See also http://nl.wikipedia.org/w/index.php?title=Yang_Yaozu&action=history (2.159 bytes vs. 2.144 bytes).

jeroenvrp wrote:

OK, I didn't see that it was fixed.

herd wrote:

Unsure if related, but these do not show the revision #798283:

And yet, these do (sort of):

Although, per VP/T, Tim said:

It looks like the anomalous blank revisions are just cache pollution, and will
fix themselves when the cache expires in a week. The revisions that show the
wrong article are due to database corruption, and will need to be fixed manually.

daniel wrote:

This edit is attributed to my bot
http://commons.wikimedia.org/w/index.php?title=Image%3AHyena_pup.jpg&diff=13062289&oldid=12189366

But it is pretty much impossible that the bot performed it (nothing remotely similar to CopyVio tagging is in the source code).

Might be due to the same server issue, although the nature of the glitch seems different from the ones reported.

(In reply to comment #13)

But it is pretty much impossible that the bot performed it (nothing remotely
similar to CopyVio tagging is in the source code).

Might be due to the same server issue, although the nature of the glitch seems
different from the ones reported.

Also note that the length reported in the history is larger than the edit.
I understand this happens because the write goes to the false master and
then the real one reuses the same revision id.

Probably we could find, among the deleted revisions from around the same time,
another with that same content.

Another magic blanking:
http://es.wikipedia.org/w/index.php?title=Wikipedia:Vandalismo_en_curso&diff=19107113&oldid=19107017

Should be fixed as of July 30, 03:00 UTC. Initially, ordinary edits processed by srv101/srv104 polluted the revision cache, which has an expiry of one week. This was identified and fixed (without me ever seeing this bug report) on July 27, by removing those servers from HTTP LVS. However, they continued to run the job queue, and refreshLinks jobs would have continued to pollute the revision cache. This was fixed on July 30, by firewalling srv101/104 from all core DB servers.

I'm running a script to fix the revision cache. This will make the old revision view and old revision edit work properly. Any broken diffs will have to be fixed manually by appending &action=purge to the diff URL.

Note that the script only affects page blankings (which are due to cache pollution), not replacement with unrelated text, which is due to corruption of the core DB with incorrect text rows referencing blob_ids on the old cluster17 master, srv101.
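Appending &action=purge to a broken diff URL, as described above, can be scripted when many diffs are affected. This is only a small sketch using the Python standard library; `with_purge` is a hypothetical helper name, and it appends the parameter as plain text so that the rest of the URL is left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def with_purge(url):
    """Return url with action=purge appended to its query string.

    The existing query is not re-encoded; the URL is returned
    unchanged if action=purge is already present.
    """
    parts = urlsplit(url)
    if "action=purge" in parts.query.split("&"):
        return url
    if parts.query:
        query = parts.query + "&action=purge"
    else:
        query = "action=purge"
    return urlunsplit(parts._replace(query=query))


# One of the blanked diffs reported in this thread.
diff_url = (
    "http://es.wikipedia.org/w/index.php"
    "?title=Wikipedia:Vandalismo_en_curso&diff=19107113&oldid=19107017"
)
purge_url = with_purge(diff_url)
```

The helper only builds the URL; whether fetching it clears the bad diff depends on the server-side fix described above already being in place.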