Page MenuHomePhabricator

rollback token system is randomly broken
Closed, InvalidPublic

Description

Since we switched in huggle to new rollback token system where rollback token is retrieved in separate query, we are facing issues where token we get is randomly invalid. It work in about 90% of cases, but sometimes, the token is invalid (badtoken) and even if we fallback to previous method, the new token we get is also invalid. Bellow some examples:
Working case:

token

OUT https://en.wikipedia.org/w/api.php?action=query&meta=tokens&type=watch%7Cpatrol%7Crollback&rawcontinue=1&format=xml
IN <?xml version="1.0"?><api><query><tokens watchtoken="e11b5b5346ce0a744f0e31278a7cf7e0545f2aee+\" patroltoken="5789d00efc3dd2a7ca4c586cd10348ad545f2aee+\" rollbacktoken="d228eb077234bf71dbf98a015b989abf545f2aee+\" /></query></api>

reverts

OUT https://en.wikipedia.org/w/api.php?action=rollback&format=xml&assert=user title=Jes%C3%BAs%20Murillo%20Karam&token=d228eb077234bf71dbf98a015b989abf545f2aee%2B%5C&watchlist=nochange&user=198.208.159.18&summary=Reverted%20edits%20by%20%5B%5BSpecial%3AContributions%2F198.208.159.18%7C198.208.159.18%5D%5D%20%28%5B%5BUser%20talk%3A198.208.159.18%7Ctalk%5D%5D%29%20%28%5B%5BWP%3AHG%7CHG%5D%5D%29
IN <?xml version="1.0"?><api><rollback title="Jesús Murillo Karam" pageid="10579766" summary="Reverted edits by [[Special:Contributions/198.208.159.18|198.208.159.18]] ([[User talk:198.208.159.18|talk]]) ([[WP:HG|HG]])" revid="633067555" old_revid="633067544" last_revid="630729747" /></api>
BUG:

token

OUT https://en.wikipedia.org/w/api.php?action=query&meta=tokens&type=watch%7Cpatrol%7Crollback&rawcontinue=1&format=xml
IN <?xml version="1.0"?><api><query><tokens watchtoken="2e6847e19ae3ecf0da33c7aa2bb409ec545fb203+\" patroltoken="088745a10c40d65413b94e483c0a050e545fb203+\" rollbacktoken="3de1b08c3de9a4fe792182e8b2bf5d67545fb203+\" /></query></api>

reverts

OUT https://en.wikipedia.org/w/api.php?action=rollback&format=xml&assert=user title=List%20of%20college%20athletic%20programs%20in%20California&token=3de1b08c3de9a4fe792182e8b2bf5d67545fb203%2B%5C&watchlist=nochange&user=216.160.129.240&summary=Reverted%20edits%20by%20%5B%5BSpecial%3AContributions%2F216.160.129.240%7C216.160.129.240%5D%5D%20%28%5B%5BUser%20talk%3A216.160.129.240%7Ctalk%5D%5D%29%20%28%5B%5BWP%3AHG%7CHG%5D%5D%29
IN <?xml version="1.0"?><api servedby="mw1146"><error code="badtoken" info="Invalid token" xml:space="preserve">See https://en.wikipedia.org/w/api.php for API usage</error></api>
then huggle try to fix it on next revert falling back to:
OUT https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=ids%7Cuser%7Ctimestamp%7Ccomment&rvlimit=1&rvtoken=rollback&rvstartid=633119006&rvdiffto=prev&titles=Economy%20of%20Milan&rawcontinue=1&format=xml
response from mediawiki with new token
IN <?xml version="1.0"?><api><query-continue><revisions rvcontinue="633113435" /></query-continue><warnings><revisions xml:space="preserve">The rvtoken parameter has been deprecated.</revisions></warnings><query><pages><page pageid="24724090" ns="0" title="Economy of Milan"><revisions><rev revid="633119006" parentid="633113435" user="Sushi242" timestamp="2014-11-09T18:27:49Z" comment="Starting to add edits from talk page and word document into the article main space, includes some missing citations" rollbacktoken="df2624df67619c7b916774002ac79bab545fb226+\"><d.....
huggle does a next revert using this new token
OUT https://en.wikipedia.org/w/api.php?action=rollback&format=xml&assert=user title=Tipi&token=df2624df67619c7b916774002ac79bab545fb226%2B%5C&watchlist=nochange&user=73.18.143.85&summary=Reverted%20edits%20by%20%5B%5BSpecial%3AContributions%2F73.18.143.85%7C73.18.143.85%5D%5D%20%28%5B%5BUser%20talk%3A73.18.143.85%7Ctalk%5D%5D%29%20%28%5B%5BWP%3AHG%7CHG%5D%5D%29
IT FAILS
IN <?xml version="1.0"?><api servedby="mw1199"><error code="badtoken" info="Invalid token" xml:space="preserve">See https://en.wikipedia.org/w/api.php for API usage</error></api>
Notes:

This bug happens only when high traffic is on wiki, especially in peak hours when lot of vandals are being reverted (15+ reverts per minute). It can be reproduced only on wikimedia installation (for example english wikipedia) not custom mw installs, and it happens only rarely (1 from 20 sessions). I believe it is related to some api caching, memory caches or buffer overwrites.

Version: 1.25-git
Severity: normal

Details

Reference
bz73207

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 3:45 AM
bzimport set Reference to bz73207.
bzimport added a subscriber: Unknown Object (MLST).

Note that it is nowhere described how this new revert system should be used, whether the revert token retrieved by meta query is permanent, or if it changes for every revert.

Can you explain to me based on what you assessed this as low priority? It blocks the next release of huggle. Only solution for us is to revert back to deprecated method which may be removed entirely from MediaWiki, leaving no working option.

(In reply to Peter Bena from comment #1)

whether the revert token retrieved by meta query is permanent, or if it
changes for every revert.

A new token is not required for for every revert, although there's nothing stopping you from fetching a new token each time if you want.

The token returned will change each time you fetch one due to the new time-limited token functionality,[1] but at the moment any of the returned tokens should be valid until the session expires or something triggers a change to the token-secret in the session.

[1]: https://lists.wikimedia.org/pipermail/wikitech-l/2014-October/079092.html

(In reply to Peter Bena from comment #2)

Only solution for us is to revert back to
deprecated method which may be removed entirely from MediaWiki, leaving no
working option.

But you yourself state in comment 0 "and even if we fallback to previous method, the new token we get is also invalid". So how can you claim in comment 2 that you could use that method?

I've just submitted a few hundred rollbacks to enwiki that passed token validation (to avoid flooding sandbox history and the like, they were constructed to fail with code 'alreadyrolled' which is checked after token validation). 150 used the same session and the same token, while 300 used a new session with a newly-fetched token for each. You didn't specify when "peak hours" are, but I see "Reverted" around 11-14 times per minute in Special:RecentChanges at the moment and I'd think 1 out of 150 or 1 out of 300 would have failed if you're seeing 1 out of *20*.

It seems to me that it's more likely you've got something wrong in your code with respect to session cookie handling or the like.

(In reply to Peter Bena from comment #2)

Can you explain to me based on what you assessed this as low priority?

Sure:

(In reply to Peter Bena from comment #0)

This bug happens only when high traffic is on wiki, especially in peak
hours when lot of vandals are being reverted (15+ reverts per minute). It
can be reproduced only on wikimedia installation (for example english
wikipedia) not custom mw installs, and it happens only rarely (1 from 20
sessions).

(In reply to Brad Jorsch from comment #3)

(In reply to Peter Bena from comment #1)

whether the revert token retrieved by meta query is permanent, or if it
changes for every revert.

A new token is not required for for every revert, although there's nothing
stopping you from fetching a new token each time if you want.

The token returned will change each time you fetch one due to the new
time-limited token functionality,[1] but at the moment any of the returned
tokens should be valid until the session expires or something triggers a
change to the token-secret in the session.

[1]:
https://lists.wikimedia.org/pipermail/wikitech-l/2014-October/079092.html

(In reply to Peter Bena from comment #2)

Only solution for us is to revert back to
deprecated method which may be removed entirely from MediaWiki, leaving no
working option.

But you yourself state in comment 0 "and even if we fallback to previous
method, the new token we get is also invalid". So how can you claim in
comment 2 that you could use that method?

I've just submitted a few hundred rollbacks to enwiki that passed token
validation (to avoid flooding sandbox history and the like, they were
constructed to fail with code 'alreadyrolled' which is checked after token
validation). 150 used the same session and the same token, while 300 used a
new session with a newly-fetched token for each. You didn't specify when
"peak hours" are, but I see "Reverted" around 11-14 times per minute in
Special:RecentChanges at the moment and I'd think 1 out of 150 or 1 out of
300 would have failed if you're seeing 1 out of *20*.

It seems to me that it's more likely you've got something wrong in your code
with respect to session cookie handling or the like.

The code is open source, you are free to check it, but this used to work in past and now it doesn't. The queries were submitted just as I posted here. Maybe it would be more useful if mediawiki produced more descriptive error than "badtoken" which doesn't really say anything. According to documentation it should be valid. In same moment as when rollback token stop being valid, edit tokens are still valid. Isn't that weird?

How long is that rollback token timeout? And how often does it need to be refreshed? In order to reproduce this, you need to verify these basically whole day, checking this within a frame of 2 hours may not even trigger this bug. I always have to be "benchmarking" mediawiki whole day to get this error.

I tried to check most significant changes to api processing, and so far I got this https://github.com/huggle/huggle3-qt-lx/commit/93886156dd979767b6877b9e15ce5c62bc35be1f which I believe should not trigger this bug, but who knows...

I can no longer reproduce it, this bug will need to be resolved by x-files department of FBI.

Just as with UFO, there are people who have seen it. Have evidence. But nobody can explain it.

ok it started happening again, even for old huggle but only in PEAK hours where we had around 15 reverts per minute and after half an hour where traffic went down it started working. Note this can't be related to cookies etc because editing works in this bugged sessions. It's just not possible to rollback and requesting new rollback token doesn't help. Only fix is keep logging out and back until it works.

I believe this is caused by some caches like memcache or sql lags

SInce we switched these tokens even for editing by using csrf token, it's broken randomly even for editing

Anomie set Security to None.

Since this can't really be reproduced and probably doesn't actually have anything to do with the API, I'm just going to close this.