
Purge for some thumbnails fails after re-uploading a file
Closed, Declined, Public

Description

Author: mr.heat

Description:
I'm very, very sorry to say this, but this stinkin' bug is driving me crazy.

Outdated garbage:
https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/Schädel_und_Gebiss_einer_Großkatze.png/629px-Schädel_und_Gebiss_einer_Großkatze.png

Correct:
https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/Schädel_und_Gebiss_einer_Großkatze.png/628px-Schädel_und_Gebiss_einer_Großkatze.png

Reported as bug 48927 and many, many others. See my comments at bug 31680 and bug 41130. It is always exactly the same problem (there may be multiple causes, but what I see as a user is always the same). I can reproduce this problem with almost every file I (re-)upload. Most thumbnails are purged, but a random thumbnail size is not. A manual purge does not help. Sometimes the "hack" described in some of the other reports works; sometimes it does not.

I'm in Germany, so please don't tell me this works for you.


Version: wmf-deployment
Severity: major

Details

Reference
bz49362

Event Timeline

bzimport raised the priority of this task to High (Nov 22 2014, 2:03 AM).
bzimport set Reference to bz49362.
bzimport added a subscriber: Unknown Object (MLST).

The usual workaround to try first does not work either (go to https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/Sch%C3%A4del_und_Gebiss_einer_Gro%C3%9Fkatze.png/629px-Sch%C3%A4del_und_Gebiss_einer_Gro%C3%9Fkatze.png?tralala123 , then go to http://commons.wikimedia.org/wiki/File:Sch%C3%A4del_und_Gebiss_einer_Gro%C3%9Fkatze.png?action=purge ), so the thumbnail seems to be stuck.
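The two-step workaround above can be scripted. This is a minimal sketch, assuming the caches key objects on the full URL including the query string (so an unused parameter such as `?tralala123` forces a cache miss). The helper names `cache_busted` and `purge_url` are hypothetical, and the code only builds the two URLs; it does not send any requests.

```python
import secrets
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def cache_busted(url: str, token: Optional[str] = None) -> str:
    """Append a throwaway query token so the request misses the cached copy."""
    token = token or secrets.token_hex(4)  # random unless a fixed token is given
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


def purge_url(file_title: str, wiki: str = "https://commons.wikimedia.org") -> str:
    """Build the ?action=purge URL for a file description page."""
    page = "File:" + file_title.replace(" ", "_")
    # quote() percent-encodes non-ASCII as UTF-8 (ä -> %C3%A4); keep ':' literal
    return f"{wiki}/wiki/{quote(page, safe=':')}?action=purge"


thumb = ("https://upload.wikimedia.org/wikipedia/commons/thumb/9/92/"
         "Sch%C3%A4del_und_Gebiss_einer_Gro%C3%9Fkatze.png/"
         "629px-Sch%C3%A4del_und_Gebiss_einer_Gro%C3%9Fkatze.png")
print(cache_busted(thumb, "tralala123"))
print(purge_url("Schädel und Gebiss einer Großkatze.png"))
```

Opening the first URL and then the second mirrors the manual steps. As the comment notes, in this particular case even that did not unstick the thumbnail.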

It is exactly the same problem, but another instance on another server...

The next step to improve the situation is to fix bug 43449.

mr.heat wrote:

Now the reported image is fixed. Why? Isn't the caching set to 30 days like the caching of the article pages?

This is exactly the kind of bug that needs to be fixed by a paid developer (you know what I'm talking about, Andre). What I still don't understand: this has been known for about two years. Why isn't it possible to find and fix such an annoying problem in two years?

In this specific case I expect a number of underlying, intertwined bugs whose investigation is far more complex than the actual number of incidents and their severity would justify. In other words, it's simply not annoying enough that anybody has found a complete solution yet, but that does not mean that nobody has investigated the problems. I know that's probably a bit disappointing.

mr.heat wrote:

Should I really open new reports every time I come across this issue? Just to make a point and to increase the number of dependencies on some other bugs? Does that move it up anybody's to-do list? Probably not.

Bad (white border left and right):
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Otterndorf_blick_suedost.jpg/640px-Otterndorf_blick_suedost.jpg

Good:
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Otterndorf_blick_suedost.jpg/639px-Otterndorf_blick_suedost.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Otterndorf_blick_suedost.jpg/641px-Otterndorf_blick_suedost.jpg

As I said many times: It's every single re-upload. Every single one. You know there are a lot of reasons why people don't report this more often. One of the reasons is: they think Commons is bogus and bad by design anyway. It's not that the users don't care. They just gave up years ago.

(In reply to comment #4)

Should I really open new reports every time I come across this issue?

If there are no active reports on the issue, yes. We can't fix issues we don't know exist (actually, we should make monitoring not suck...). At this particular moment I was not aware there were any open issues with purging.

Anytime you open a bug on this topic, please include:
*How often it occurs.
*Whether you are in Europe.
*Whether doing ?action=purge on the image description page fixes the issue.
*Whether appending ?somerandomstring to the end of the thumbnail's URL results in the correct image being shown.


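As for this issue:

https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Otterndorf_blick_suedost.jpg/640px-Otterndorf_blick_suedost.jpg

displays correctly for me. Do you have other examples of old thumbnails showing?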

As for this issue:

https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Otterndorf_blick_suedost.jpg/640px-Otterndorf_blick_suedost.jpg

displays correctly for me. Do you have other examples of old thumbnails showing?

I was able to reproduce with https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/World_homosexuality_laws.svg/800px-World_homosexuality_laws.svg.png It did not matter if I was accessing via upload.wikimedia.org or upload-lb.esams.wikimedia.org.

[Long-winded bit about the performance of action=purge, which may or may not be related]

It had an age of 5035 seconds (about 1 hour 24 minutes) and appears to be showing the version from 2013-07-08T19:16:36 (3 versions ago). The previous 3 upload versions should all have triggered purges.

Upon ?action=purge (via https), the request took forever, eventually leading to a 504 Gateway Time-out from nginx (also, may I say, what an ugly error page. What happened to the nice Wikimedia-customized error page?). This did not cause the thumbnail to be dropped from cache.
*A second attempt at ?action=purge (via plain http) went through after a delay that was still very long, though shorter. However, the thumbnail was still not dropped from the Varnish caches.
*A third attempt (via http, so I got the prettier error message: via cp1012.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.131 (10.64.0.131)) hit the 504 timeout again. Note that the timeout was in the initial purge step, not the redirected GET, so it's not page rendering that is causing the slowdown. All this should have to do is get all thumbs, delete them, send HTCP packets, and update page_touched. Furthermore, all the thumbs should have been deleted already, so deleting thumbs and sending HTCP packets should have been a no-op. I don't know what's taking so much time, but it really shouldn't.
*A fourth attempt succeeded within the time limit (it only took 47 seconds). It still did not manage to purge the thumbnail from the Varnish cache, even though I also did the ?addparamtothumburl hack to make sure a new version of the thumb is generated on the back end.
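The expectation in the third bullet (list the thumbnails, delete them, tell the caches to drop those URLs, bump `page_touched`) can be sketched as below. Everything here is hypothetical: `ThumbStore` is an in-memory stand-in for the storage backend, `send_cache_purge` stands in for the HTCP purge packets, and the `/thumb/...` URL layout is simplified. The second call illustrates why a repeated purge should be close to a no-op once the thumbnails are already gone.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ThumbStore:
    """Hypothetical in-memory stand-in for the thumbnail storage backend."""
    thumbs: dict[str, set[str]] = field(default_factory=dict)  # file -> thumb names

    def list_thumbs(self, file_name: str) -> list[str]:
        return sorted(self.thumbs.get(file_name, set()))

    def delete(self, file_name: str, names: list[str]) -> None:
        self.thumbs.get(file_name, set()).difference_update(names)


def purge_file(file_name: str, store: ThumbStore,
               send_cache_purge: Callable[[list[str]], None],
               page_touched: dict[str, int], now: int) -> list[str]:
    """List thumbs, delete them, ask the caches to drop them, touch the page."""
    names = store.list_thumbs(file_name)               # 1. get all thumbs
    store.delete(file_name, names)                     # 2. delete them
    urls = [f"/thumb/{file_name}/{n}" for n in names]  # simplified URL layout
    if urls:
        send_cache_purge(urls)                         # 3. stand-in for HTCP packets
    page_touched[file_name] = now                      # 4. update page_touched
    return urls


store = ThumbStore({"A.svg": {"800px-A.svg.png", "120px-A.svg.png"}})
sent: list[list[str]] = []
touched: dict[str, int] = {}
first = purge_file("A.svg", store, sent.append, touched, now=1)
second = purge_file("A.svg", store, sent.append, touched, now=2)  # nothing left to purge
```

In this model the per-purge work is proportional to the number of thumbnails listed, which is why an already-purged file should return almost immediately.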

It appears that purge performance is correlated with the number of old versions (oldimage table entries) the image has. [[commons:File:Bearbeitungsstand_Denkmale_Österreichs_nach_Gemeinden_Bilder.svg]] has 590 (but very few actual uses, so presumably not very many thumbs in the system) and similarly times out on purge. For reference, World_homosexuality_laws.svg has 244 old versions.


When I asked on Commons, responses seemed to indicate it's common for re-uploads to have outdated thumbnails that go away upon ?action=purge, which suggests purges aren't happening properly when a new version is uploaded.

CC'ing Leslie as comment 6 by bawolff is particularly interesting.

On further investigation:
*On ?action=purge we purge all the thumbnails of old versions of the file. This is very expensive if there are a lot of old versions (over 200 seems to get one into the territory of the request timing out).
*We originally started doing this for bug 30192, which seems to have added the purging of old thumbs overzealously. It's probably really only needed when deleting a file. Additionally, it appears that the code paths that actually need this purge (except possibly for revdel) already do it via some other means as well.
*Commons users (I asked a small sample of users on IRC) complain that often, on overwriting a file, the cache is not purged initially, but ?action=purge works. This isn't really consistent with the large-number-of-old-versions issue, so I suspect that it is a separate issue.
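The cost pattern described above (purge work that grows with the number of old versions) and the suggested narrowing can be illustrated by enumerating the thumbnail paths a purge would touch. This is a sketch under assumptions: the path layout (`thumb/<hash>/<name>/<width>px-<name>` for the current version, `thumb/archive/<hash>/<timestamp>!<name>/...` for old versions) is modeled on the URLs in this report and MediaWiki's usual archive naming, the file name, hash directory and widths are made up, and the `include_archived` flag is a hypothetical switch for "only when deleting", not an existing MediaWiki option.

```python
def thumb_purge_paths(name: str, hash_dir: str, widths: list,
                      archived: list, include_archived: bool) -> list:
    """Thumbnail paths a purge would visit for one file (assumed layout)."""
    paths = [f"thumb/{hash_dir}/{name}/{w}px-{name}" for w in widths]
    if include_archived:      # e.g. only when the file is being deleted
        for ts in archived:   # one entry per old version (oldimage row)
            paths += [f"thumb/archive/{hash_dir}/{ts}!{name}/{w}px-{name}"
                      for w in widths]
    return paths


name = "Example.png"                                        # hypothetical file
old_versions = [str(20130000000000 + i) for i in range(244)]  # 244 dummy timestamps
widths = [120, 220, 800]                                    # illustrative sizes only
full = thumb_purge_paths(name, "a/ab", widths, old_versions, include_archived=True)
narrow = thumb_purge_paths(name, "a/ab", widths, old_versions, include_archived=False)
print(len(full), len(narrow))  # 735 3
```

With 244 old versions the full purge visits 245 times as many paths as the narrowed one, and each path may mean a storage listing plus a cache purge message.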

(In reply to comment #8)

*Commons users (I asked a small sample of users on irc) complain that often on overwriting a file, cache is not purged initially, but ?action=purge works. This isn't really consistent with the large-number-of-old-versions issue, so I suspect that it is a separate issue.

At first I was wondering if there might be some race-condition type issue (the image gets re-cached before the new one is fully committed to Swift). However, looking at a recently overwritten image, that does not appear to be the case.
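The race-condition hypothesis can be made concrete with a toy model. The sketch below is purely illustrative (a dict for the cache, a dict for storage, hypothetical step names): if the cache purge runs before the new file is committed to storage and a request arrives in between, the cache is refilled with the old image and keeps serving it even though a purge was sent. The comment's own observation is that this does not appear to be what happens here.

```python
def visible_after(steps: list) -> str:
    """Replay upload steps and return what a reader would then be served."""
    storage = {"thumb": "old"}
    cache = {"thumb": "old"}
    for step in steps:
        if step == "commit":      # new version written to storage (e.g. Swift)
            storage["thumb"] = "new"
        elif step == "purge":     # cache told to drop the object
            cache.pop("thumb", None)
        elif step == "request":   # a reader's request refills the cache on a miss
            cache.setdefault("thumb", storage["thumb"])
        else:
            raise ValueError(f"unknown step: {step}")
    return cache.get("thumb", storage["thumb"])


print(visible_after(["commit", "purge", "request"]))  # new
print(visible_after(["purge", "request", "commit"]))  # old: re-cached too early
```

In the second ordering the purge did arrive; it simply arrived too early to help, which would look the same to a user as a purge that never arrived.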

For example, consider https://commons.wikimedia.org/wiki/File:Mari_Possa,_Teagan_Presley_at_Digital_Playground_Party_1.jpg

When I visited it at roughly 15:08 (about a minute and a half after it was re-uploaded), the main version of the image (https://upload.wikimedia.org/wikipedia/commons/6/6f/Mari_Possa%2C_Teagan_Presley_at_Digital_Playground_Party_1.jpg ) had been purged 85 seconds earlier, which was consistent with the re-upload time. (Note that this is the full-size image, not a thumbnail, so it is possibly handled by a different code path.)

However, the thumb in the history section ( https://upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Mari_Possa%2C_Teagan_Presley_at_Digital_Playground_Party_1.jpg/120px-Mari_Possa%2C_Teagan_Presley_at_Digital_Playground_Party_1.jpg ) had an age of 13247 seconds (about 3 hours 40 minutes), which means no purge was received for it during the re-upload process.
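The inference above relies on the HTTP `Age` response header, which reports how many seconds a cached response has been held since it was fetched or revalidated from the origin. A minimal sketch of the check, assuming clocks are roughly aligned and that you know how many seconds ago the new version was uploaded (both function names and the `slack_s` allowance are hypothetical):

```python
def purge_reached_cache(age_s: int, seconds_since_upload: int, slack_s: int = 5) -> bool:
    """If the cached copy is older than the re-upload, it was fetched before
    the upload, so the purge for that URL never took effect."""
    return age_s <= seconds_since_upload + slack_s  # slack_s absorbs small clock skew


def describe_age(age_s: int) -> str:
    """Render an Age value in hours and minutes."""
    hours, rem = divmod(age_s, 3600)
    return f"{hours} h {rem // 60} min"


# Figures from the comment: full image vs. the 120px history thumb,
# checked about a minute and a half (~90 s) after the re-upload.
print(purge_reached_cache(85, 90), describe_age(85))        # True 0 h 1 min
print(purge_reached_cache(13247, 90), describe_age(13247))  # False 3 h 40 min
```

Because `Age` resets whenever an object is re-fetched, an `Age` larger than the time since the upload is direct evidence that this particular URL was never evicted.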

On the subject of slow purges of files with lots of old versions, I submitted https://gerrit.wikimedia.org/r/#/c/72769/ (I don't think this is the problem the commons users are complaining about)

I think this is the relevant bug.

File:Lac megantic affected area.png has several versions. The second oldest version from 21:42, 7 July 2013 is displayed as the thumbnail in the article even after ?action=purge on:
*The image description page on en.wp
*The image description page on commons
*The thumbnail file https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Lac_megantic_affected_area.png/220px-Lac_megantic_affected_area.png

After the purges I now see the old version on the article and when viewing the image file. I see the correct version on the image description pages.

I am located in the UK, as is the user who reported this on the article talk page.

Netherlands here, and purging https://en.wikipedia.org/wiki/File:Blue_cross_logo.png does not seem to get through, since the thumbnail seems stuck on the original image.

CC'ing Ariel, Faidon and Aaron as this is a thumbnail purging issue (see comment 8 - comment 10 by bawolff). Could any of you take a look at this?

(In reply to comment #11)

I think this is the relevant bug.

File:Lac megantic affected area.png has several versions. The second oldest version from 21:42, 7 July 2013 is displayed as the thumbnail in the article even after ?action=purge on:
*The image description page on en.wp
*The image description page on commons
*The thumbnail file https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Lac_megantic_affected_area.png/220px-Lac_megantic_affected_area.png

After the purges I now see the old version on the article and when viewing the image file. I see the correct version on the image description pages.

I am located in the UK, as is the user who reported this on the article talk page.

Purging works fine for this image (doing ?action=purge, both the European and North American Varnish caches seem to get cleared). The file does not have enough versions to suffer from what I was talking about in comment 9 (it needs about 75 before that happens).

That said, there do seem to be issues at Commons with the initial purge on re-upload (intermittently?) failing.

(In reply to comment #12)

Netherlands here, and purging https://en.wikipedia.org/wiki/File:Blue_cross_logo.png does not seem to get through, since the thumbnail seems stuck on the original image.

ditto here too.

I don't see any difference (anymore?).

I don't see any unresolved caching issues left here, hence closing.

Gilles raised the priority of this task from High to Unbreak Now! (Dec 4 2014, 10:21 AM).
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High (Dec 4 2014, 11:21 AM).