
Rollback does not restore linking tables (categorylinks, pagelinks)
Open, Medium, Public

Description

Author: jaga_x_1

Description:
If a vandal blanks a page and that edit is rolled back, the page is restored to Wikipedia but pagelinks is not restored. As far as pagelinks is concerned, that page has no outgoing links. The problem goes away with a null edit.

I've seen this happen with page blanking and when vandals replace content with "hi" etc. Also, I've seen it when ClueBot does the rollback, and when a Huggle user does the rollback, so it isn't a bot-only problem or something like that.

This could happen with every rollback. It has come to my attention because I've created a toolserver page that builds a list of deadend pages, and I kept getting articles that had no business whatsoever in the list - Yoko Ono, for instance.

How to reproduce:

  1. Pick one of these articles I've found to exhibit the bug (as of time of writing): "2006 Winter Olympics", "Infectious disease", "List of poems"
  1. Verify that the last edit in its history is a user rolling back blanking vandalism
  1. Pick any link on the page (on "Infectious disease", I chose "Clostridium difficile") and click it
  1. Click "What links here" for the random link
  1. Search the "What links here" list for your article; it isn't there

This is causing a problem for me because if it weren't for this bug, my deadend articles list would be suitable for a bot to use for auto-tagging. But because of this, everything must be reviewed manually.
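Until the underlying bug is fixed, a bot could run a rough sanity check before tagging anything. The sketch below is only an illustration, not the toolserver tool's actual code: it assumes the bot already has the current wikitext of each dead-end candidate, and the names `has_wikilinks` and `split_candidates` are hypothetical. A candidate whose wikitext visibly contains `[[...]]` links is probably a stale pagelinks entry rather than a real dead end. The check is a heuristic: it ignores links generated by templates and does not skip links inside comments or nowiki sections.

```python
import re

# Matches the target part of a wikilink such as [[Target]] or [[Target|label]].
WIKILINK = re.compile(r"\[\[\s*([^\[\]|#]+)")

# Prefixes that do not produce an ordinary outgoing page link (assumed list).
NON_LINK_PREFIXES = ("category:", "file:", "image:")


def has_wikilinks(wikitext):
    """Return True if the wikitext contains at least one ordinary wikilink."""
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and not target.lower().startswith(NON_LINK_PREFIXES):
            return True
    return False


def split_candidates(candidates):
    """Split {title: wikitext} into (confirmed dead ends, suspect entries).

    "Suspect" means the links table says the page has no outgoing links,
    but the page text clearly contains some, so the table is likely stale.
    """
    confirmed, suspect = [], []
    for title, text in sorted(candidates.items()):
        (suspect if has_wikilinks(text) else confirmed).append(title)
    return confirmed, suspect
```

Pages that land in the suspect list could be skipped by the bot, or given a null edit so their links records are rebuilt.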


Version: 1.14.x
Severity: normal

Details

Reference
bz17154

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:27 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz17154.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

  1. Pick any link on the page (on "Infectious disease", I chose "Clostridium difficile") and click it
  1. Click "What links here" for the random link
  1. Search the "What links here" list for your article; it isn't there

A faster way is to use http://en.wikipedia.org/w/api.php?action=query&prop=links|revisions&titles=2006_Winter_Olympics|Infectious_disease|List_of_poems which lists all outgoing links (according to the pagelinks table, that is) on the given pages, along with their latest revision. And you're right: those lists are empty and the last revision is a revert (at the time of writing).
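The same check can be scripted. The following is a minimal sketch, assuming the API is asked for `format=json` and returns its usual structure, in which a page with no recorded outgoing links simply has no `links` key. The function names are made up for illustration, and fetching the URL is left to the caller. Note that `prop=links` only returns a limited number of links per request, which is enough to tell "none" from "some".

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the comment above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_query_url(titles, endpoint=API_ENDPOINT):
    """Build the action=query URL listing links and the latest revision."""
    params = {
        "action": "query",
        "prop": "links|revisions",
        "titles": "|".join(titles),
        "format": "json",  # assumption: JSON output is requested
    }
    return endpoint + "?" + urlencode(params)


def pages_without_links(response):
    """Return titles of existing pages with no links in a decoded response."""
    pages = response.get("query", {}).get("pages", {})
    return sorted(
        page["title"]
        for page in pages.values()
        if "title" in page and "missing" not in page and not page.get("links")
    )
```

For example, `build_query_url(["2006 Winter Olympics", "Infectious disease", "List of poems"])` produces a URL that can be fetched with any HTTP client; passing the decoded JSON to `pages_without_links` then lists the pages the pagelinks table considers link-free.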

*** Bug 18267 has been marked as a duplicate of this bug. ***
*** Bug 17686 has been marked as a duplicate of this bug. ***

I can't seem to reproduce this. If this is still an issue, it must be some sort of transitory fluke.

craigbear wrote:

Here's an example of this bug in action:

University of Warsaw was [http://en.wikipedia.org/w/index.php?title=University_of_Warsaw&oldid=359035641 vandalized on April 29], and [http://en.wikipedia.org/w/index.php?title=University_of_Warsaw&diff=359035647&oldid=359035641 reverted by ClueBot] within *seconds*. However, the article remained listed on the [http://toolserver.org/~jason/uncategorized_articles.php uncategorized articles] list for a full month, even though categories *were* present on the article, until I caught it on May 30 and did a no-edit save to the article to force it off the list.

craigbear wrote:

Another one: Bulgars was blanked and reverted on May 19; the page is still listed as an "uncategorized" article, even though categories are present, as I write this on June 6.

This needs to be fixed, because we can't deal with uncategorized articles in a quick and timely manner (e.g. setting a bot loose to tag them) if it isn't.

*** Bug 23336 has been marked as a duplicate of this bug. ***

I can't seem to reproduce this. If this is still an issue, it must be some sort of transitory fluke.

A regular (or null) edit seems to immediately update the *links tables. I'm not sure if a rollback does the same. My guess is that instead of doing an immediate update, a rollback edit punts updates to the job queue. If rollback edits pass off their work to the job queue, depending on the assigned number of job runners and the size of the queue, it could be a lot faster or slower to see updated *links records, which might explain the intermittency of the problem. Maybe. This is just a guess based on a number of questionable assumptions.
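To make that guess concrete, here is a toy model of it. This is not MediaWiki code, and every name in it (`ToyWiki`, `save`, `run_jobs`) is hypothetical. It only illustrates how, if a rollback deferred its links update to a queue while ordinary edits updated immediately, the links table would stay stale until a job runner got to it.

```python
import re
from collections import deque

# Extracts link targets from [[Target]] or [[Target|label]] in toy wikitext.
LINK = re.compile(r"\[\[([^\[\]|#]+)")


class ToyWiki:
    """Toy model: page text plus a separately maintained links table."""

    def __init__(self):
        self.text = {}             # title -> current wikitext
        self.pagelinks = {}        # title -> set of outgoing link targets
        self.job_queue = deque()   # titles whose links still need refreshing

    def save(self, title, text, deferred=False):
        """Store new text; refresh links now, or queue the refresh."""
        self.text[title] = text
        if deferred:
            self.job_queue.append(title)  # assumed rollback behaviour
        else:
            self._refresh(title)          # assumed regular/null edit behaviour

    def _refresh(self, title):
        self.pagelinks[title] = set(LINK.findall(self.text[title]))

    def run_jobs(self):
        """Process every queued links refresh (a stand-in for job runners)."""
        while self.job_queue:
            self._refresh(self.job_queue.popleft())


wiki = ToyWiki()
wiki.save("Infectious disease", "See [[Clostridium difficile]].")
wiki.save("Infectious disease", "")  # vandal blanks the page
wiki.save("Infectious disease", "See [[Clostridium difficile]].", deferred=True)
stale = set(wiki.pagelinks["Infectious disease"])  # text restored, links not
wiki.run_jobs()
fresh = set(wiki.pagelinks["Infectious disease"])  # links restored after jobs
```

In this model the page looks link-free for as long as the queue is backed up, which would match reports that the problem appears only some of the time.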

Is this still an issue?

Seems to be; see the newly merged task.

This is possibly a slave lag issue, because it sometimes works. In the past, it has not been a problem on all rollbacks.