After re-uploading a file, users still see the browser-cached thumbnail for the old version
Closed, Resolved (Public)

Description

After the upload of your changed version you will see that the preview image has NOT changed, and will not show your new version.
After clearing the browser cache with Ctrl-F5, you will see the correct preview image.

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:23 AM
bzimport set Reference to bz36380.
bzimport added a subscriber: Unknown Object (MLST).

If users need to purge their browser cache, your web server is misconfigured.

Closing wontfix

Bawolff, you closed that bug too early.

How to reproduce

You can reproduce it easily on mediawiki.org .

I used the test file https://www.mediawiki.org/wiki/File:Bug36380.pdf .

After the upload of your changed version you will see that the preview image has NOT changed, and will not show your new version.

This is one apparent effect of the bug.

You need to manually press Ctrl+F5 in your browser.

If users need to purge browser cache something is misconfigured or buggy. Might be MediaWiki :-)

Telling the user to jump through hoops should be a last resort; ideally, the software should take care of cache invalidation.

Tgr renamed this task from After file re-upload, add note to clear browser cache to After re-uploading a PDF file to mediawiki.org, users still see the thumbnail for the old version. Dec 30 2014, 12:37 AM
Tgr updated the task description. (Show Details)
Tgr set Security to None.

The cleanest solution is probably T66214.

eTags on the content should also be able to handle this situation.

Images use If-Modified-Since, which would also work, but sometimes images are served from the local browser cache and no request is sent to the server at all. This probably depends on the age of the image; I haven't tested in depth.
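
To make the ETag and If-Modified-Since discussion concrete, here is a minimal sketch of the server-side check behind a conditional request. The function name `revalidate` is made up for illustration, weak ETags are not handled, and this is not how the thumbnail servers are actually implemented. It only helps when the browser sends a request in the first place.

```python
from email.utils import parsedate_to_datetime


def revalidate(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    Hypothetical helper: `etag` and `last_modified` describe the current
    version of the file on the server. If-None-Match takes precedence over
    If-Modified-Since when both are present.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        client_tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in client_tags or "*" in client_tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(last_modified)
        return 304 if modified <= since else 200

    # Unconditional request: always send the full response.
    return 200
```

After a re-upload both the ETag and the Last-Modified date change, so either validator would produce a 200 with the new thumbnail, provided the browser revalidates instead of answering from its own cache.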

In my opinion, this is a situation where showing a loading icon or no image at all would be preferable to showing the wrong image.

We should already be re-parsing and re-rendering the file description page on re-upload... T66214 looks interesting, but I'm not sure how related it will be to resolving this task.

Going to the file page linked from the description (I haven't visited it before) and looking up the request for the thumbnail I get

Request headers
Accept: image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,hu;q=0.6,fr;q=0.4,he;q=0.2
Connection: keep-alive
Cookie: GeoIP=US:San_Francisco:37.7749:-122.4194:v4
Host: upload.wikimedia.org
Referer: https://www.mediawiki.org/wiki/File:Bug36380.pdf
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Response Headers
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache, X-Varnish
Age: 0
Connection: keep-alive
Content-Length: 7104
Content-Type: image/jpeg
Date: Wed, 31 Dec 2014 07:26:21 GMT
Etag: 3573f5f6d7f884beed0d22a6dda8e136
Last-Modified: Fri, 18 Oct 2013 13:39:18 GMT
Server: nginx/1.1.19
Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
X-Cache: cp1064 miss (0), cp4014 miss (0), cp4006 frontend miss (0)
X-Object-Meta-Sha1base36: 28grco8igoi3kw2tktmmiz4z94hdy2v
X-Timestamp: 1382103557.86390
X-Trans-Id: tx67e7f6b134a341fe8af5a-0054a3a51d
X-Varnish: 3326298349, 3765805880, 3664096603

After going through the reproduction steps, the browser console says 200 OK (from cache), so there is no request sent at all. (Chrome 39 on Ubuntu.)

So either we add "Cache-Control: no-cache" and get a lot more revalidation hits on the frontend cache and a few more hits on the backend, or we need to change the URI for each version...

So yeah, changing the URI for each version sounds more sane.

Is it fair to say that this affects all images on wikis? This report is PDF-specific and the other one is Media Viewer-specific, but it sounds like it's a problem caused by versioning and would affect all file types and all places where the images are used.

I have seen the same caching behavior for in-article thumbnails of bitmap images; it is definitely not PDF- or MMV-specific. When reloading a random article, most but not all images are reloaded via an If-Modified-Since request; some are taken directly from the browser cache. This probably depends on how long ago the image was cached, and maybe even on how long ago it was uploaded (since that goes into the Last-Modified header, and IIRC the HTTP RFC suggests a default caching heuristic of caching files for 10% of the difference between Last-Modified and Date).

Gilles renamed this task from After re-uploading a PDF file to mediawiki.org, users still see the thumbnail for the old version to After re-uploading a file to mediawiki.org, users still see the thumbnail for the old version. Jan 9 2015, 8:05 AM
Gilles added subscribers: Aklapper, Deskana.

The 10% heuristic is indeed detailed here:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.4
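
As a rough illustration of that heuristic, the sketch below computes 10% of the interval between Last-Modified and Date. The function name `heuristic_freshness` is hypothetical, and real browsers may use a different fraction or cap the result, so treat this as an estimate only.

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness(date_header, last_modified_header, fraction=0.1):
    """Estimate how long a cache may reuse a response without revalidating.

    Assumes the response carries no explicit Expires or Cache-Control
    lifetime, so the cache falls back to a fraction of the object's age.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return (date - last_modified) * fraction


# Headers taken from the thumbnail response quoted earlier in this task.
lifetime = heuristic_freshness(
    "Wed, 31 Dec 2014 07:26:21 GMT",
    "Fri, 18 Oct 2013 13:39:18 GMT",
)
print(lifetime)
```

With those headers the estimate comes out at roughly 43 days, which would explain why a browser answers from its cache without contacting the server at all.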

Perhaps as a short term improvement, we could try adding:
Cache-control: "max-age=3600, must-revalidate"

An hour should at least be a lot less than 10% of the time since the file was last changed. It might also be an idea to have a separate s-maxage.
I suspect that it will just give us a lot more 304 requests to handle on the backend and Varnish, with little perceptual difference for the end user, but as an experiment it might be worth exploring?
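
For reference, a small sketch of how a cache could derive the freshness lifetime from such a header, with a shared cache (such as Varnish) preferring s-maxage over max-age. The function names are hypothetical and the parser ignores edge cases such as commas inside quoted values.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. must-revalidate) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(value, shared_cache=False):
    """Return the explicit freshness lifetime in seconds, or None if absent."""
    directives = parse_cache_control(value)
    if shared_cache and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None
```

For example, `freshness_lifetime("max-age=3600, must-revalidate")` gives 3600 for a browser, while adding a separate `s-maxage` (the value would be a deployment decision) would let the frontend cache keep objects longer than browsers do.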

How would you evaluate that experiment?

matmarex renamed this task from After re-uploading a file to mediawiki.org, users still see the thumbnail for the old version to After re-uploading a file to mediawiki.org, users still see the browser-cached thumbnail for the old version. Sep 9 2016, 4:26 PM
matmarex renamed this task from After re-uploading a file to mediawiki.org, users still see the browser-cached thumbnail for the old version to After re-uploading a file, users still see the browser-cached thumbnail for the old version. Nov 7 2016, 8:08 PM
matmarex added subscribers: Gestrid, matmarex.
matmarex removed a subscriber: wikibugs-l-list.

A proper solution would be T149847: RFC: Use content hash based image / thumb URLs, or T139294: Persistent media links for file versions if it's implemented in a different manner than just waiting for T149847.

Adding a cache-busting parameter to the URL (like the upload timestamp) would also work.

T149847 would be the eventual best solution, but a cache-busting query parameter would be a good quick fix. The WMF upload-frontend cache configuration removes the query string, so we won't bust the server-side cache by adding a parameter.
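
A minimal sketch of the cache-busting idea, assuming a query parameter named `ts` and an example host; both are invented for illustration and are not necessarily what the MediaWiki patch uses. The second function mimics a frontend cache that drops the query string before computing its key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, upload_timestamp, param="ts"):
    """Append (or replace) a version parameter so each upload gets a new URL."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, upload_timestamp))
    return urlunsplit(parts._replace(query=urlencode(query)))


def frontend_cache_key(url):
    """Model a server-side cache that ignores the query string entirely."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))


thumb = "https://upload.example.org/thumb/Bug36380.pdf.jpg"  # hypothetical URL
old_url = add_cache_buster(thumb, "20131018133918")
new_url = add_cache_buster(old_url, "20141231072621")
```

Here `old_url` and `new_url` differ, so the browser treats the re-uploaded thumbnail as a new resource, while `frontend_cache_key` returns the same value for both and the server-side cache is not fragmented.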

Change 756151 had a related patch set uploaded (by AntiCompositeNumber; author: AntiCompositeNumber):

[mediawiki/core@master] Add timestamp to thumbnail URLs on file pages

https://gerrit.wikimedia.org/r/756151

Test wiki created on Patch demo by AntiCompositeNumber using patch(es) linked to this task:

https://patchdemo.wmflabs.org/wikis/7056dc499e/w/

Change 756151 merged by jenkins-bot:

[mediawiki/core@master] Add timestamp to thumbnail URLs on file pages

https://gerrit.wikimedia.org/r/756151

The change will be deployed to Wikimedia wikis next week, per the usual schedule.

Jdforrester-WMF subscribed.

We merged this without declaring a thumbnail API let alone versioning it to add this new feature, so let's not mark it as a blocker. :-)

Test wiki on Patch demo by AntiCompositeNumber using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/7056dc499e/w/

Change 884363 had a related patch set uploaded (by BBlack; author: BBlack):

[operations/puppet@production] Commentary re: image timestamps in URL query part

https://gerrit.wikimedia.org/r/884363

Change 884363 merged by BBlack:

[operations/puppet@production] Commentary re: image timestamps in URL query part

https://gerrit.wikimedia.org/r/884363