
Images not purged from caching servers after deletion
Closed, Duplicate, Public

Description

https://commons.wikimedia.org/wiki/File:Minka_Kelly_2,_2013.jpg was deleted at 00:40, 6 July 2014, yet the original evidently still exists at http://upload.wikimedia.org/wikipedia/commons/e/ee/Minka_Kelly_2,_2013.jpg (if it is gone by the time you read this, see the WebCite snapshot http://www.webcitation.org/6QquhDHKo, taken 6 July 2014 00:56:08).


Version: 1.24rc
Severity: normal

Details

Reference
bz67559

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:36 AM
bzimport set Reference to bz67559.
bzimport added a subscriber: Unknown Object (MLST).

Confirmed. Image is still available in varnish caches in eqiad, esams, and ulsfo. Image appears to be successfully deleted out of swift.

It is unclear at this point whether this is a one-off incident or a more general problem. ?action=purge currently seems to clear the original image assets of existing files, so this does not appear to be a general problem with Varnish purging.

I tried deleting [[testwiki:file:Barbershop's temp.jpg]] after ensuring it was in varnish cache. The delete operation removed the file from varnish cache.

I suspect this bug is a rare error due to packet loss or something of that nature. Not sure if there's much to do about that (short of redesigning how our purges work).

The file will definitely disappear after thirty days. If it's important that the file go away now (people are linking to it or whatever), someone with shell access could run something along the lines of

echo 'http://upload.wikimedia.org/wikipedia/commons/e/ee/Minka_Kelly_2,_2013.jpg' | php purgeList.php --verbose
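purgeList.php takes the URLs to purge on standard input, so for several files it can help to generate that list. A minimal sketch, assuming Commons' default two-level hashed upload layout (the first one and first two hex digits of the MD5 of the file name with spaces turned into underscores); original_url is a hypothetical helper, it ignores names that need percent-encoding, and thumbnails are not covered.

```python
import hashlib

UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/commons"


def original_url(file_name: str, base: str = UPLOAD_BASE) -> str:
    """URL of a file's original, assuming the two-level hashed upload layout.

    The directory levels are the first one and first two hex digits of the
    MD5 of the name as stored (spaces replaced by underscores).
    """
    db_key = file_name.replace(" ", "_")
    digest = hashlib.md5(db_key.encode("utf-8")).hexdigest()
    return f"{base}/{digest[0]}/{digest[:2]}/{db_key}"


if __name__ == "__main__":
    # One URL per line, suitable for piping into purgeList.php on stdin.
    for name in ("Minka Kelly 2, 2013.jpg", "Minka Kelly, Roselyn Sanchez.jpg"):
        print(original_url(name))
```

The output of this script can replace the echo in the command above.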

Same with

http://upload.wikimedia.org/wikipedia/commons/3/3f/Minka_Kelly,_Roselyn_Sanchez.jpg (visible right now)
https://commons.wikimedia.org/wiki/File:Minka_Kelly,_Roselyn_Sanchez.jpg (00:50, 6 July 2014 Denniss (talk | contribs | block) deleted page File:Minka Kelly, Roselyn Sanchez.jpg)

I have now tried approximately 12 images.

I suppose it would be beneficial to have a bot (or something similar) that checks, say, one hour after deletion, whether the image still exists...
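A minimal sketch of the scheduling half of such a bot, assuming it is fed (file name, deletion time) pairs from the deletion log and remembers which files it has already checked. Deletion and due_for_check are hypothetical names, fetching the log is left out, and the one-hour delay is just the figure suggested here. Each returned entry would then be looked up at every data center.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CHECK_DELAY = timedelta(hours=1)  # "let's say 1 hour after deletion"


@dataclass(frozen=True)
class Deletion:
    file_name: str
    deleted_at: datetime  # timezone-aware, e.g. UTC from the deletion log


def due_for_check(deletions, checked, now, delay=CHECK_DELAY):
    """Return deletions at least `delay` old whose file has not been checked yet."""
    return [
        d for d in deletions
        if d.file_name not in checked and now - d.deleted_at >= delay
    ]


if __name__ == "__main__":
    t0 = datetime(2014, 7, 6, 0, 40, tzinfo=timezone.utc)
    log = [
        Deletion("Minka Kelly 2, 2013.jpg", t0),
        Deletion("Minka Kelly, Roselyn Sanchez.jpg", t0 + timedelta(minutes=10)),
    ]
    # Only the 00:40 deletion is an hour old at 01:45.
    for d in due_for_check(log, set(), t0 + timedelta(hours=1, minutes=5)):
        print(d.file_name)
```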

Gergo/Mark: is this something the Multimedia Team can take a look into?

Brian, do you know where in the code the Varnish cache is supposed to get cleared on purge? I've only found mentions of Squid so far, and I wonder what mechanism links the two for purges.

Squid and varnish use the same purging mechanism. All the class names still say squid for historical reasons.

On Commons, MediaWiki sends an HTCP purge packet over UDP to a multicast address (see SquidUpdate). That multicast address reaches all Varnish boxes, with a relay server forwarding it across data centers. The vhtcpd daemon on each Varnish box turns the packet into a normal HTTP PURGE request against the Varnish instance on localhost.
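To make the packet step concrete, here is a minimal sketch of an HTCP CLR packet (RFC 2756) as a sender might build it and as a receiver like vhtcpd would need to decode it before issuing its PURGE. The byte layout follows my recollection of how SquidUpdate packs the packet, including putting the opcode in the low nibble to suit Squid, so treat it as an assumption rather than a copy of MediaWiki's code. The multicast group is left to the caller (no real Wikimedia address is assumed), and 4827 is simply HTCP's registered port.

```python
import random
import socket
import struct

HTCP_OP_CLR = 4   # CLR ("clear") opcode from RFC 2756
HTCP_PORT = 4827  # IANA-registered HTCP port


def _countstr(value: bytes) -> bytes:
    """RFC 2756 COUNTSTR: a 16-bit big-endian length, then the bytes."""
    return struct.pack(">H", len(value)) + value


def build_htcp_clr(url: str, trans_id: int) -> bytes:
    """Build a CLR packet asking caches to drop `url`.

    Assumption: opcode in the low nibble of its byte, as SquidUpdate
    appears to do to match Squid rather than the RFC's bit order.
    """
    specifier = (
        _countstr(b"HEAD")          # METHOD
        + _countstr(url.encode())   # URI
        + _countstr(b"HTTP/1.0")    # VERSION
        + _countstr(b"")            # REQ-HDRS (empty)
    )
    # DATA: LENGTH(2) opcode(1) flags(1) TRANS-ID(4) RESERVED(2) SPECIFIER
    data_len = 10 + len(specifier)
    # Packet: LENGTH(2) MAJOR(1) MINOR(1) DATA AUTH(2: length only, no auth)
    total_len = 4 + data_len + 2
    return (
        struct.pack(">HBB", total_len, 0, 0)
        + struct.pack(">HBBIH", data_len, HTCP_OP_CLR, 0, trans_id, 0)
        + specifier
        + struct.pack(">H", 2)
    )


def parse_htcp_clr(packet: bytes) -> str:
    """Return the URI from a CLR packet (sketch: no auth or bounds checks)."""
    if packet[6] & 0x0F != HTCP_OP_CLR:
        raise ValueError("not a CLR packet")
    offset = 14  # start of SPECIFIER
    (method_len,) = struct.unpack_from(">H", packet, offset)
    offset += 2 + method_len  # skip METHOD
    (uri_len,) = struct.unpack_from(">H", packet, offset)
    offset += 2
    return packet[offset:offset + uri_len].decode()


def send_purge(url: str, group: str, port: int = HTCP_PORT, ttl: int = 2) -> None:
    """Multicast one CLR packet. UDP is fire-and-forget: if the datagram
    is lost, nothing retries and the sender never finds out."""
    packet = build_htcp_clr(url, random.getrandbits(32))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(packet, (group, port))
```

Nothing in this path acknowledges delivery, which is consistent with the packet-loss explanation suggested earlier in the thread.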

This setup is a bit finicky and has caused many issues over the years.

So it does sound like this needs an async job run some time after the purge attempt to check that it worked. Would hitting the public-facing URL be enough to know that the file is gone from all data centers, though? Or is that location-dependent (depending on where you hit the URL from, it may or may not be there because you reached a different data center)?

I guess the poor man's solution, if it's failing because of intermittent network issues, is to always have a 2nd purge attempt (or more) scheduled to be run later.
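A rough sketch of that poor man's solution combined with the async check: purge, wait, check, and purge again while the file is still cached. The purge and is_cached callables are placeholders (for example, a wrapper around purgeList.php and a check that queries every data center, since one public request may only reach one of them), and the delays are arbitrary. sleep is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Sequence


def purge_until_gone(
    url: str,
    purge: Callable[[str], None],
    is_cached: Callable[[str], bool],
    delays: Sequence[float] = (60, 600, 3600),  # seconds; arbitrary choice
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Purge `url`, then after each delay check it and re-purge if needed.

    Returns True as soon as a check finds the URL gone; False if it was
    still cached at every check (the final re-purge is sent but unverified).
    """
    purge(url)
    for delay in delays:
        sleep(delay)
        if not is_cached(url):
            return True
        purge(url)  # the previous purge apparently got lost; try again
    return False
```

Keeping the retries in a job, rather than in the delete request itself, means the delete stays fast and a lost packet costs only a delay.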

It's location-based. You get different servers depending on whether you hit esams, ulsfo, or eqiad.

In the original example in comment 0, the cached file was present at all three locations (easy to check by running wget http://upload-lb.esams.wikimedia.org/pathToFile -S --header 'host: upload.wikimedia.org', and likewise for the other data centres).
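The same per-data-center check can be scripted. A minimal sketch, assuming the upload-lb.<dc>.wikimedia.org naming from the wget example also applies to eqiad and ulsfo (these are 2014-era hostnames and may no longer resolve): it sends a HEAD request to each load balancer with the public Host header and reports the HTTP status per data center, so a 200 for a deleted file marks a stale copy.

```python
import http.client
from urllib.parse import urlsplit

# Assumption: every data center follows the upload-lb.esams naming pattern.
DATA_CENTERS = ("eqiad", "esams", "ulsfo")


def lb_host(dc: str) -> str:
    return f"upload-lb.{dc}.wikimedia.org"


def request_target(url: str) -> tuple[str, str]:
    """Split a public URL into the Host header to send and the request path."""
    parts = urlsplit(url)
    path = parts.path + (f"?{parts.query}" if parts.query else "")
    return parts.netloc, path


def status_per_dc(url: str, dcs=DATA_CENTERS, timeout: float = 10.0) -> dict[str, int]:
    """HEAD the file at each data center's load balancer with the public
    Host header (like wget --header 'host: ...') and collect the statuses."""
    host, path = request_target(url)
    statuses = {}
    for dc in dcs:
        conn = http.client.HTTPConnection(lb_host(dc), timeout=timeout)
        try:
            # An explicit Host header stops http.client from adding its own.
            conn.request("HEAD", path, headers={"Host": host})
            statuses[dc] = conn.getresponse().status
        finally:
            conn.close()
    return statuses
```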

I wonder if bug 67694 is related

So is this still a problem, four weeks later?

So is this still a problem, fourteen weeks later?

rillke? bawolff?

Well, the original file from comment 0 is gone from the cache (I believe Varnish evicts objects after 30 days no matter what).

The issue is not occurring often enough to generate regular complaints. It probably still occurs rarely and intermittently due to packet loss.

Not sure if that is a yes or a no to your question.