
Sudden reversion to old version of page ("lastrevid" != "revid")
Closed, Resolved, Public

Description

On Chrome on Windows 7, I noted twice today that ANI on English Wikipedia was not displaying current threads, but was displaying threads three days old. Refreshing did not make a difference, although occasionally it would display the current page.

The issue evidently struck others as well, as a user who tried to add a new section wound up adding it to the old material instead of the current page: https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595313330&oldid=595312917

This seems to have happened again, here:
https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595314104&oldid=595313971

On the English Wikipedia admins' IRC channel, an admin noted that every time he tried to load it, the page was showing a post from 21:53, 10 February 2014 (UTC) at the very bottom.

When I communicated with an editor about the issue here - https://en.wikipedia.org/w/index.php?title=User_talk:NE_Ent&oldid=595314919 - I noticed that another editor had been impacted in the section above.


Version: wmf-deployment
Severity: major

Details

Reference
bz61319

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:08 AM
bzimport set Reference to bz61319.

(In reply to Maggie Dennis from comment #0)

On Chrome on Windows 7, I noted twice today that ANI on English Wikipedia was
not displaying current threads, but was displaying threads three days old.
Refreshing did not make a difference, although occasionally it would display
the current page.

Did somebody try purging ([[WP:Purge]])?

Is this really a "reversion" in the sense of reverting changes, or maybe just an old version being delivered and displayed for some people (caching issues)?

Oh, yes. People have tried purging repeatedly. The old version is delivered and displayed erratically - sometimes I am seeing the current version and other times the old. If you look at the history, you can see that it is still happening - https://en.wikipedia.org/w/index.php?title=Wikipedia:Administrators%27_noticeboard/Incidents&action=history

There is also discussion at Village Pump/Technical, although I don't know if it will help:
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Discussions_disappearing_and_reappearing

Coren speculated earlier that it might be related to "new section", and when you look at the history of ANI there does seem to be something to that.

However, when it happened to me just a few minutes ago (https://en.wikipedia.org/w/index.php?title=Wikipedia%3AAdministrators%27_noticeboard%2FIncidents&diff=595354506&oldid=595351438) I had intended to edit the last section only. I can't be sure I did, because I wiped out the pre-built edit summary. But I was aware that the problem might be related to this and intended to avoid it, anyway.

An hour ago there was a problem with the esams cluster, similar to what happened in bug 54647 and was then tracked in bug 56545. The problem seems to be the same: the cluster fails and we get cached pages, which is better than having no pages at all.

This is not good.

API query: https://en.wikipedia.org/w/api.php?action=query&prop=info|revisions&rvlimit=1&format=jsonfm&pageids=2535910&servedby=1

{
    "query-continue": {
        "revisions": {
            "rvcontinue": 595381322
        }
    },
    "servedby": "mw1192",
    "query": {
        "pages": {
            "2535910": {
                "pageid": 2535910,
                "ns": 4,
                "title": "Wikipedia:Reference desk/Science",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2014-02-14T00:41:03Z",
                "lastrevid": 595381347,
                "counter": "",
                "length": 112194,
                "revisions": [
                    {
                        "revid": 595381347,
                        "parentid": 595381322,
                        "minor": "",
                        "user": "SineBot",
                        "timestamp": "2014-02-14T00:41:03Z",
                        "comment": "Signing comment by [[Special:Contributions/68.41.73.11|68.41.73.11]] - \"/* Freezing point? */ new section\""
                    }
                ]
            }
        }
    }
}

The "lastrevid" field and the "revid" field in revisions should be the same. I suspect that some of the slave DBs are somehow screwed up and haven't gotten the page_latest field updated.

Oops, pasted the wrong copy.

{
    "query-continue": {
        "revisions": {
            "rvcontinue": 595381322
        }
    },
    "servedby": "mw1205",
    "query": {
        "pages": {
            "2535910": {
                "pageid": 2535910,
                "ns": 4,
                "title": "Wikipedia:Reference desk/Science",
                "contentmodel": "wikitext",
                "pagelanguage": "en",
                "touched": "2014-02-14T00:41:03Z",
                "lastrevid": 594888322,
                "counter": "",
                "length": 80791,
                "revisions": [
                    {
                        "revid": 595381347,
                        "parentid": 595381322,
                        "minor": "",
                        "user": "SineBot",
                        "timestamp": "2014-02-14T00:41:03Z",
                        "comment": "Signing comment by [[Special:Contributions/68.41.73.11|68.41.73.11]] - \"/* Freezing point? */ new section\""
                    }
                ]
            }
        }
    }
}
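
For anyone wanting to repeat this check against other pages, here is a minimal Python sketch of the comparison described above: the page's "lastrevid" should equal the "revid" of the newest revision returned with rvlimit=1. The function name find_revision_mismatches is made up for illustration, and the sample is a trimmed copy of the mw1205 response, not a live query.

```python
def find_revision_mismatches(response):
    """Return (pageid, lastrevid, newest_revid) for pages whose fields disagree.

    `response` is an already-parsed action=query result that requested
    prop=info|revisions with rvlimit=1, as in the query above.
    """
    mismatches = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions") or []
        if not revisions or "lastrevid" not in page:
            continue  # nothing to compare for this page
        newest_revid = revisions[0]["revid"]
        if page["lastrevid"] != newest_revid:
            mismatches.append((page["pageid"], page["lastrevid"], newest_revid))
    return mismatches


# Trimmed copy of the mw1205 response quoted above (illustrative only).
sample = {
    "servedby": "mw1205",
    "query": {"pages": {"2535910": {
        "pageid": 2535910,
        "lastrevid": 594888322,
        "revisions": [{"revid": 595381347, "parentid": 595381322}],
    }}},
}

print(find_revision_mismatches(sample))
```

An empty list would mean the two fields agree for every page in the response, which is what the mw1192 copy shows.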

More data:

anomie@terbium:/usr/local/apache/common-local$ for db in 'db1055' 'db1043' 'db1037' 'db1049' 'db1051' 'db1056'; do echo $db; echo -e 'select page_latest from page where page_id=2535910;' | mwscript sql.php --wiki=enwiki --slave=$db; done
db1055
stdClass Object
(
    [page_latest] => 595381347
)
db1043
stdClass Object
(
    [page_latest] => 595381347
)
db1037
stdClass Object
(
    [page_latest] => 595381347
)
db1049
stdClass Object
(
    [page_latest] => 595381347
)
db1051
stdClass Object
(
    [page_latest] => 595381347
)
db1056
stdClass Object
(
    [page_latest] => 594888322
)

So db1056 seems out of sync somehow.
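
The comparison above was done by eye. A rough Python sketch of the same idea follows, assuming the per-host page_latest values have already been collected into a dict (here copied from the output above). The helper name find_outliers is hypothetical, and treating the majority value as correct is only an assumption; in practice the master would be the reference.

```python
from collections import Counter


def find_outliers(values_by_host):
    """Return the hosts whose value differs from the most common one.

    Assumption: the majority value is the correct one. This is a heuristic,
    not a substitute for comparing against the master.
    """
    majority_value, _count = Counter(values_by_host.values()).most_common(1)[0]
    return {host: value for host, value in values_by_host.items()
            if value != majority_value}


# page_latest for page_id=2535910 on each enwiki slave, from the output above.
page_latest = {
    "db1055": 595381347,
    "db1043": 595381347,
    "db1037": 595381347,
    "db1049": 595381347,
    "db1051": 595381347,
    "db1056": 594888322,
}

print(find_outliers(page_latest))
```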

Change 113322 had a related patch set uploaded by Springle:
depol db1056 for pt-table-sync checks bug 61319

https://gerrit.wikimedia.org/r/113322

Change 113322 merged by jenkins-bot:
depol db1056 for pt-table-sync checks bug 61319

https://gerrit.wikimedia.org/r/113322

db1056 has been depooled for a sync check, and the remaining slaves will get the same treatment in rotation, just in case.

db1056 was demoted from master a couple weeks ago, backed up and then eventually rebuilt from another unpooled s1 slave, db1050. It's possible the original problem lies on that box.

Sean: Anything left to do / investigate here or can this be closed as FIXED?
