
[Regression] Redirects on file repository no longer work on local wiki
Closed, Resolved, Public

Description

[[commons:File:MaintenanceShell-Screenshot.png]] is a redirect to
[[commons:File:MaintenanceShell-v0.4.0-screenshot.png]].


Version: 1.22.0
Severity: major

Details

Reference
bz52200

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 1:58 AM
bzimport set Reference to bz52200.
bzimport added a subscriber: Unknown Object (MLST).

Hmm. The file on commons isn't suppressing the "no file by this name" text. It's almost as if it's being treated as a normal redirect instead of a file redirect.

Confirming request for backport_to_WMF on this as it is pretty high priority. Let's try to get this done before the end of the week so we don't run into Wikimania mania.

(In reply to comment #2)

Confirming request for backport_to_WMF on this as it is pretty high priority.
Let's try to get this done before the end of the week so we don't run into
Wikimania mania.

Note, this appears to be an isolated incident. Most file redirects still appear to work.

Additionally, it appears some sort of cache expired, and the example above is no longer broken.

Oh.... well then.... cleared the request. :)

(In reply to comment #3)

(In reply to comment #2)

Confirming request for backport_to_WMF on this as it is pretty high priority.
Let's try to get this done before the end of the week so we don't run into
Wikimania mania.

Note, this appears to be an isolated incident. Most file redirects still
appear to work.

Meh, appears to be common on newer redirects (example: https://commons.wikimedia.org/w/index.php?title=File:Baku9.jpg&redirect=no ). Some sort of cache isn't being purged properly.

Current suspicion: File objects get cached in memcached. Perhaps when a file is moved, it isn't cleared from memcached. When someone then views the redirect, various code loads the file from cache. However, the file object loaded from cache is the one that used to be associated with this title, but isn't anymore. Thus $file->getRedirected() returns false, because it's still the old version of the file from before it was a redirect. Eventually the file falls out of memcached, and everything works.

(I haven't tested this theory yet; it may be wrong.)

I took a few minutes to do some debugging on this. It appears that the problem is that a memc key is being set in LocalRepo::checkRedirect() with an expiry of 86400s. This key is supposed to be invalidated by a call to LocalRepo::invalidateImageRedirect() when necessary, but apparently that is getting missed somehow.
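
To illustrate why a missed invalidation would show up as a problem that fixes itself after about a day, here is a minimal Python sketch of a cached redirect lookup with an 86400s expiry. It is only a model, not MediaWiki code: the TTLCache class, the check_redirect function, the "image_redirect:" key prefix, and the use of an empty string for "not a redirect" are all assumptions made for the example.

```python
REDIRECT_TTL = 86400  # seconds; the expiry mentioned in the comment above


class TTLCache:
    """Tiny in-memory stand-in for memcached with a manually advanced clock."""

    def __init__(self):
        self.now = 0
        self._store = {}

    def set(self, key, value, ttl):
        self._store[key] = (value, self.now + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.now >= expires_at:
            # Entry has expired: drop it and report a miss.
            del self._store[key]
            return None
        return value


def check_redirect(cache, db, title):
    """Return the redirect target for title, or "" if it is not a redirect."""
    key = "image_redirect:" + title  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    target = db.get(title, "")
    cache.set(key, target, REDIRECT_TTL)
    return target


cache = TTLCache()
db = {}  # title -> redirect target

before = check_redirect(cache, db, "File:Old.png")  # caches "not a redirect"
db["File:Old.png"] = "File:New.png"  # file is moved, but the key is never invalidated
stale = check_redirect(cache, db, "File:Old.png")   # still served from the cache
cache.now += REDIRECT_TTL                           # a day later the key expires
fresh = check_redirect(cache, db, "File:Old.png")   # now re-read from the database
```

In this model, stale is still the "not a redirect" answer, and only fresh sees the new target, which matches the observation that the first example started working once some cache expired.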

Perhaps a more robust approach would be, instead of clearing that key, to explicitly populate it on page move. That way, it shouldn't fall back to reading the (lagged) slave.
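
A rough Python sketch of the difference between the two strategies, assuming the failure mode is a read from a lagged slave (called "replica" below) right after the move. The Wiki class and its methods are hypothetical names for this simulation and do not correspond to real MediaWiki APIs.

```python
class Wiki:
    """Toy model: a master DB, a lagging replica, and a redirect cache."""

    def __init__(self):
        self.master = {}   # title -> redirect target (authoritative)
        self.replica = {}  # lagged copy that read requests use
        self.cache = {}    # stand-in for the memcached redirect key

    def replicate(self):
        # The replica catches up with the master.
        self.replica = dict(self.master)

    def check_redirect(self, title):
        # On a cache miss, read the (possibly lagged) replica and cache the answer.
        if title not in self.cache:
            self.cache[title] = self.replica.get(title, "")
        return self.cache[title]

    def move_and_invalidate(self, old, new):
        # Strategy 1: write to the master, then just clear the cache key.
        self.master[old] = new
        self.cache.pop(old, None)

    def move_and_repopulate(self, old, new):
        # Strategy 2: write to the master, then store the known-good value.
        self.master[old] = new
        self.cache[old] = new


def simulate(move):
    wiki = Wiki()
    move(wiki, "File:Old.png", "File:New.png")
    # Another request views the old title before replication has happened.
    seen_before_replication = wiki.check_redirect("File:Old.png")
    wiki.replicate()
    seen_after_replication = wiki.check_redirect("File:Old.png")
    return seen_before_replication, seen_after_replication


invalidate_result = simulate(Wiki.move_and_invalidate)
repopulate_result = simulate(Wiki.move_and_repopulate)
```

With clearing only, the early request re-caches the stale "not a redirect" answer from the replica, and that answer sticks even after replication catches up. With repopulation, both reads return the new target because the replica is never consulted for that key.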

Interestingly enough, the problem is not reproducible on test.wikipedia.org ( https://test.wikipedia.org/w/index.php?title=File:Ex_1%25_ample.jpg&redirect=no ). I wonder if there's possibly some anti-vandalism bot that auto-loads pages on edit at commons, which is causing the cache to be repopulated in a different request before the change replicates to the slave db. (That theory might be stretching it though.)

I made a patch at https://gerrit.wikimedia.org/r/77562 which would instead of clearing the image redirect cache on page move, simply repopulate it (and hopefully avoid issues with slave db replag). I'm unsure if that would solve the issue here (Since the issue here seems only reproducible on commons, I'm a bit confused as to what's different between commons and enwikipedia - aka I have no idea what's causing this). Nonetheless I think that patch is a step in the right direction.

I also did https://gerrit.wikimedia.org/r/77563 to make this cache be cleared on ?action=purge, since it seems like something that should be.

(In reply to comment #4)

Oh.... well then.... cleared the request. :)

My comment 3 above was in error (It happens to all recent moves on commons < 24 hours old), so I'm resetting that flag.

(In reply to comment #11)

(In reply to comment #4)

Oh.... well then.... cleared the request. :)

My comment 3 above was in error (It happens to all recent moves on commons
< 24 hours old), so I'm resetting that flag.

OK, I'll wait until Gerrit tells me it is merged and we'll go from there.

(In reply to comment #10)

I made a patch at https://gerrit.wikimedia.org/r/77562 which would instead of
clearing the image redirect cache on page move, simply repopulate it (and
hopefully avoid issues with slave db replag). I'm unsure if that would solve
the issue here (Since the issue here seems only reproducible on commons, I'm
a bit confused as to what's different between commons and enwikipedia - aka I
have no idea what's causing this). Nonetheless I think that patch is a step
in the right direction.

Ok. So there is a way for the user to trigger LocalRepo::invalidateImageRedirect() - namely, making an edit to the redirect (a real edit, not a null edit). I tested that with https://commons.wikimedia.org/w/index.php?title=File:Shaik_Mydeen-3.jpg&redirect=no and my edit seemed to fix the issue, which seems to confirm that it is indeed a problem with that cache.

Change 77562 had a related patch set uploaded by Brian Wolff:
More rigorous clearing of image redirect cache

https://gerrit.wikimedia.org/r/77562

Change 77562 merged by jenkins-bot:
More rigorous clearing of image redirect cache

https://gerrit.wikimedia.org/r/77562

Confirmed fixed. (This is now deployed to commons. I can confirm new file redirects do not suffer from this issue.)

Gilles raised the priority of this task from High to Unbreak Now!. Dec 4 2014, 10:11 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High. Dec 4 2014, 11:22 AM