action=raw should allow public HTTP caching where possible
Closed, Resolved | Public

Description

Author: metatron

Description:
When loading a gadget or custom script from e.g. meta/mediawiki/enwiki with

-mw.loader.load('//meta.wikimedia.org/w/index.php?title=...&action=raw&ctype=...')

the responded cache-control directive is

  • private, s-maxage=0, max-age=0, must-revalidate

which prevents these scripts from being cached properly and causes a lag (40-500ms) on each call.

(IMHO JS pages should have a reasonable default > 0)
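To make the directives above concrete, here is a minimal JavaScript sketch of how a cache might read such a header. It is an illustration only: `parseCacheControl` and `freshnessLifetime` are hypothetical names, and the logic is a simplification that ignores Expires, heuristic freshness, and quoted directive values.

```javascript
// Parse a Cache-Control header value into a map of directive -> value.
// Valueless directives (e.g. "private") map to true; numeric ones to numbers.
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    if (!name) continue;
    directives[name.toLowerCase()] = value === undefined ? true : Number(value);
  }
  return directives;
}

// Simplified assumption: number of seconds a cache may re-use the response
// without asking the server again. cacheKind is "shared" (proxy/CDN) or
// "browser". A missing max-age is treated as 0 here, which real caches
// may handle differently.
function freshnessLifetime(directives, cacheKind) {
  if (directives["no-store"] || directives["no-cache"]) return 0;
  if (cacheKind === "shared") {
    if (directives["private"]) return 0;
    if (typeof directives["s-maxage"] === "number") return directives["s-maxage"];
  }
  return typeof directives["max-age"] === "number" ? directives["max-age"] : 0;
}

const reported = parseCacheControl("private, s-maxage=0, max-age=0, must-revalidate");
console.log(freshnessLifetime(reported, "shared"), freshnessLifetime(reported, "browser"));
```

With the header reported here, both lifetimes come out as 0, so every re-use requires a revalidation request first.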

Example with requested cache:

Remote Address:xx.xx.xx.xx:443
Request URL:https://meta.wikimedia.org/w/index.php?title=User:Hedonil/Test/XTools.js&action=raw&ctype=text/javascript&maxage=86400&smaxage=86400
Request Method:GET
Status Code:304 Not Modified
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en,no;q=0.8,nb;q=0.6,vi;q=0.4,nl;q=0.2,zh;q=0.2,ru;q=0.2,en-US;q=0.2,zh-CN;q=0.2,zh-TW;q=0.2
Connection:keep-alive
Cookie:metawikiUserID=..centralauth_User..
Host:meta.wikimedia.org
If-Modified-Since:Wed, 13 Aug 2014 04:18:39 GMT
Referer:https://en.wikipedia.org/wiki/Charles_Schild
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36
Query String Parameters
title:User:Hedonil/Test/XTools.js
action:raw
ctype:text/javascript
maxage:86400
smaxage:86400
Response Headers
Accept-Ranges:bytes
Age:0
Cache-Control:private, s-maxage=0, max-age=0, must-revalidate
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/javascript; charset=UTF-8
Date:Wed, 13 Aug 2014 07:55:42 GMT
Last-modified:Wed, 13 Aug 2014 04:18:39 GMT
Server:nginx/1.1.19
Vary:Accept-Encoding
Via:1.1 varnish, 1.1 varnish, 1.1 varnish
X-Cache:cp1065 miss (0), amssq50 miss (0), amssq31 frontend miss (0)
X-Content-Type-Options:nosniff
X-Varnish:948365503, 2378009697, 4022640906

Example without requested directive:

Remote Address:xx.xx.xx.xx:443
Request URL:https://en.wikipedia.org/w/index.php?title=MediaWiki%3AGadget-HotCat.js%2Flocal_defaults&action=raw&ctype=text/javascript
Request Method:GET
Status Code:304 Not Modified
Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en,no;q=0.8,nb;q=0.6,vi;q=0.4,nl;q=0.2,zh;q=0.2,ru;q=0.2,en-US;q=0.2,zh-CN;q=0.2,zh-TW;q=0.2
Connection:keep-alive
Cookie:enwikiSession=..centralauth_User=...
Host:en.wikipedia.org
If-Modified-Since:Mon, 23 Jun 2014 19:15:43 GMT
Referer:https://en.wikipedia.org/wiki/Charles_Schild
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36
Query String Parameters
title:MediaWiki:Gadget-HotCat.js/local_defaults
action:raw
ctype:text/javascript
Response Headers
Accept-Ranges:bytes
Age:0
Cache-Control:private, s-maxage=0, max-age=0, must-revalidate
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/javascript; charset=UTF-8
Date:Wed, 13 Aug 2014 07:55:42 GMT
Last-modified:Mon, 23 Jun 2014 19:15:43 GMT
Server:nginx/1.1.19
Vary:Accept-Encoding
Via:1.1 varnish, 1.1 varnish, 1.1 varnish
X-Cache:cp1053 miss (0), amssq55 miss (0), amssq54 frontend miss (0)
X-Content-Type-Options:nosniff
X-Varnish:2605122819, 917135542, 2533692991

Details

Reference
bz69460

Event Timeline

bzimport raised the priority of this task from to High. Nov 22 2014, 3:42 AM
bzimport set Reference to bz69460.
bzimport added a subscriber: Unknown Object (MLST).

I can reproduce this on any WMF wiki (including *.beta.wmflabs.org, both logged in and logged out), but not on my private test wiki or on other public wikis like http://www.explainxkcd.com, where Cache-Control is set as requested by the parameters (or as the set default). So I guess it's Varnish that replaces the Cache-Control with "Cache-Control:private, s-maxage=0, max-age=0, must-revalidate"

I can confirm that appending &bcache=1 to the URL will set the Cache-Control as expected, so this definitely is caused by http://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/ce1f2b638ab2a09097ffe207ec6dc35c3823bc1b

(In reply to Michael M. from comment #2)

I can confirm that appending &bcache=1 to the URL will set the Cache-Control as expected, so this definitely is caused by http://git.wikimedia.org/commitdiff/operations%2Fpuppet.git/ce1f2b638ab2a09097ffe207ec6dc35c3823bc1b

Thanks for investigating.

Mark: Any comments, as you worked on that change?

(In reply to Andre Klapper from comment #4)

Mark: Any comments, as you worked on that change?

That's an old change (2013) so he might have forgotten ;)

Krinkle/Roan/Trevor: Thoughts here on the right cache settings?

That change simply puts the same behaviour in place that was there with the Squid cluster before Varnish.

The problem is that we don't have separate Cache-Control headers for _our caches_ vs _clients_ in MediaWiki, and never have had. So MediaWiki's Cache-Control header is targeted at our caches, and Varnish replaces that header with one that for most URLs doesn't allow caching (just like our Squid config did). Unfortunately, Varnish can only guess from the URL.

metatron wrote:

If this is caused by these lines (as Michael M. stated):

sub vcl_deliver {

/* Override the Cache-Control header for wiki content pages */
if (req.url ~ "(?i)^/w(iki)?/.*"
    && req.url !~ "^/wiki/Special\:Banner(Controller|ListLoader)"
    && req.url !~ "(?i)^/w/(extensions/.*|api\.php)"
    && req.url !~ "(?i)bcache=1") {
    set resp.http.Cache-Control = "private, s-maxage=0, max-age=0, must-revalidate";
}

}

Why not just simply add a regex to meet the directives of the request?
The request URL and query string are available at that point; the "new" parameter bcache is respected, so why not maxage/smaxage?
IMHO at least "private" should be removed for requested .js/.css pages or for action "raw". This would allow proxies/local browsers to set caching intelligently. "private" + "must-revalidate" leaves no elbow room for systems to guess the right things.

https://meta.wikimedia.org/w/index.php?title=User:Hedonil/Test/XTools.js&action=raw&ctype=text/javascript&maxage=86400&smaxage=86400
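To see which request URLs the quoted vcl_deliver condition would catch, the same four patterns can be mirrored in JavaScript. This is only a sketch of the snippet as quoted in this thread, not of the live CDN configuration; `isCacheControlOverridden` is a hypothetical name, and the input is assumed to be the path plus query string, as in Varnish's req.url.

```javascript
// Mirror of the quoted VCL condition. Returns true when the quoted rule
// would replace the response's Cache-Control header with
// "private, s-maxage=0, max-age=0, must-revalidate".
function isCacheControlOverridden(reqUrl) {
  const isWikiPath = /^\/w(iki)?\/.*/i.test(reqUrl);
  const isBanner = /^\/wiki\/Special:Banner(Controller|ListLoader)/.test(reqUrl);
  const isExtensionOrApi = /^\/w\/(extensions\/.*|api\.php)/i.test(reqUrl);
  const hasBcache = /bcache=1/i.test(reqUrl);
  return isWikiPath && !isBanner && !isExtensionOrApi && !hasBcache;
}

// action=raw with maxage/smaxage is still overridden, because the rule
// never looks at those parameters.
console.log(isCacheControlOverridden(
  "/w/index.php?title=User:Hedonil/Test/XTools.js&action=raw&ctype=text/javascript&maxage=86400&smaxage=86400"
));
// Under the quoted rule, adding bcache=1 skips the override.
console.log(isCacheControlOverridden(
  "/w/index.php?title=User:Hedonil/Test/XTools.js&action=raw&ctype=text/javascript&bcache=1"
));
```

The sketch shows why the request parameters have no effect here: the only query-string value the quoted rule inspects is bcache.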

Aklapper lowered the priority of this task from High to Medium. Sep 1 2015, 12:43 PM
Aklapper subscribed.

One year later, wondering if that's still an issue, or whether someone could answer the last comment:

Why not just simply add a regex to meet the directives of the request?

Krinkle renamed this task from "Resource loader ignores requested caching directives / not caching properly" to "action=raw should allow public HTTP caching where possible". Apr 28 2016, 11:34 PM
Krinkle updated the task description. (Show Details)
Krinkle set Security to None.

max-age and s-maxage aren't really relevant in this issue and have been mostly obsoleted by the fact that we now allow full Varnish/Squid caching for action=raw (these cache entries are automatically purged after edits, the same way we purge the regular article views of pages).

However, that only applies to logged-out users. For logged-in users, action=raw, like regular page views, always bypasses the cache.

We should fix this so that on regular wikis (i.e. not private read-restricted wikis) action=raw responses are publicly cached, so that they can be handled completely by Varnish instead of requiring a needless round trip through Apache/PHP/MediaWiki.

Is there any plan to let the maxage parameter work correctly? Currently it still returns a Cache-Control header with max-age=0 despite explicitly setting it to a higher value. Using bcache=1 isn't changing the behavior either.

A quick count shows I currently have ~106 calls to either importScript or mw.loader.load in my common.js (https://en.wikipedia.org/wiki/User:AfroThundr3007730/common.js). It would be really awesome if all those requests could be cached instead of being loaded from scratch every time I open a tab (of which I usually have dozens up at a time). Granted, my use case is not a typical one, but the software should support Cache-Control with a max-age > 0 for user-loaded resources, especially if we use something like &maxage=86400 in the URI (which I'm not currently, since that parameter doesn't even work).

@AfroThundr3007730 There are no plans to do that currently. These responses cannot be safely cached for security and privacy reasons; as such, the values are intentionally set to 0.

Note that max-age=0,private does not mean they are not cached. It simply means the CDN and your browser do not re-use them unconditionally, which is not the same as not caching them.

Your browser does cache these, and does re-use the downloaded script between pages. The only thing it does is transfer a very small message in the background when loading a page to check if the script has changed since your last page view (this means edits will immediately apply, which is nice).

The scripts are not re-downloaded every time you view a page, and there is no need to add any bcache, maxage or smaxage parameters to MediaWiki urls, these do not improve performance anymore. Performance is our default, as much as possible.
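The "very small message" described above is a conditional request: the browser sends If-Modified-Since and the server answers 304 Not Modified without a body, as seen in the header dumps in the task description. A minimal sketch of that decision follows, assuming a hypothetical `answerConditionalGet` helper; it is not MediaWiki's actual implementation and ignores ETag-based validation.

```javascript
// Decide how a server could answer a GET that carries If-Modified-Since.
// Both arguments are HTTP date strings; unparsable or missing dates fall
// back to a full 200 response.
function answerConditionalGet(ifModifiedSince, lastModified) {
  const since = Date.parse(ifModifiedSince);
  const modified = Date.parse(lastModified);
  if (!Number.isNaN(since) && !Number.isNaN(modified) && modified <= since) {
    // Unchanged since the browser's copy: headers only, no body.
    return { status: 304, sendBody: false };
  }
  // Changed (or no usable validator): send the full script again.
  return { status: 200, sendBody: true };
}

// Dates taken from the first example in the task description.
console.log(answerConditionalGet(
  "Wed, 13 Aug 2014 04:18:39 GMT",
  "Wed, 13 Aug 2014 04:18:39 GMT"
));
```

The browser keeps using its stored copy on a 304, so only the headers cross the network; an edit changes Last-Modified and triggers a full 200 on the next page view.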

Krinkle lowered the priority of this task from Medium to Low. May 8 2019, 7:52 PM
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.
Krinkle removed a subscriber: wikibugs-l-list.

The scripts are not re-downloaded every time you view a page, and there is no need to add any bcache, maxage or smaxage parameters to MediaWiki urls, these do not improve performance anymore. Performance is our default, as much as possible.

So, are maxage and smaxage obsolete? There are some docs that still mention them, notably in the raw action section here:
https://www.mediawiki.org/wiki/Manual:Parameters_to_index.php#Raw

The docs seem to suggest this should be better:

mw.loader.load("https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&smaxage=86400&maxage=259200&title=Wikipedysta:Msz2001/sourcecode-links.css", "text/css");

Than this:

mw.loader.load("https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&title=Wikipedysta:Msz2001/sourcecode-links.css", "text/css");

But looking at the headers and the fact that the raw call sometimes sets cookies, it seems docs are wrong, right? :-)

In T71460#8947620, @Nux wrote:

[…] There are some docs that still mention them, notably in the raw action section here: https://www.mediawiki.org/wiki/Manual:Parameters_to_index.php#Raw

The docs seem to suggest this should be better:

mw.loader.load("https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&smaxage=86400&maxage=259200&title=Wikipedysta:Msz2001/sourcecode-links.css", "text/css");

Than this:

mw.loader.load("https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&title=Wikipedysta:Msz2001/sourcecode-links.css", "text/css");

But looking at the headers and the fact that the raw call sometimes sets cookies, it seems docs are wrong, right? :-)

Indeed, when logged-out, both of these URLs enjoy the same CDN caching, and neither of them permits unrestricted browser caching. A notable difference is that when the page is edited, only the second one will automatically be purged and updated. The first one will stay out of date on the CDN because its URL is non-standard (not "canonical"), and so it is not known when we save an edit that this URL might exist in the cache, as there are infinite variations one could make with these parameters.

When logged-in, action=raw is not cached by the CDN, no matter which parameters are set.

The only thing these parameters do is make the URL non-standard, and thus its CDN cache entry (for logged-out browsers) will remain stale for up to 24 hours after editing.
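For user scripts that still carry these parameters, one way to follow the advice above is to drop them before loading. The sketch below uses a hypothetical `stripCacheParams` helper and works on the raw string so that parameter order and encoding stay untouched. Whether the result matches the exact URL form that gets purged is an assumption based on the comment above, not something this sketch can verify.

```javascript
// Parameters this thread describes as no longer useful.
const CACHE_PARAMS = new Set(["maxage", "smaxage", "bcache"]);

// Remove the cache-related parameters from a URL's query string without
// re-encoding or re-ordering the remaining parameters. URL fragments
// ("#...") are not handled in this sketch.
function stripCacheParams(url) {
  const queryStart = url.indexOf("?");
  if (queryStart === -1) return url;
  const base = url.slice(0, queryStart);
  const kept = url
    .slice(queryStart + 1)
    .split("&")
    .filter((pair) => !CACHE_PARAMS.has(pair.split("=")[0]));
  return kept.length > 0 ? `${base}?${kept.join("&")}` : base;
}

console.log(stripCacheParams(
  "https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&smaxage=86400&maxage=259200&title=Wikipedysta:Msz2001/sourcecode-links.css"
));
```

Applied to the first URL quoted above, this yields the second one, which is the form that is purged automatically after an edit according to the explanation in this comment.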

Krinkle closed this task as Resolved. Edited Jul 20 2023, 3:03 AM
Krinkle claimed this task.

The originally stated description was solved. The issue of CDN caching for logged-in users is filed at T279120: Make index.php?action=raw CDN cacheable for both logged-in and logged-out for import script use case.

the responded cache-control directive is
private, s-maxage=0, max-age=0, must-revalidate

The s-maxage is traditionally for cache proxies, such as the Wikimedia CDN. However, to avoid stale content, we artificially change this to s-maxage=0 at our CDN layer. The reason is to prevent cache proxies elsewhere in the world (outside our control) from storing obsolete copies.

This might make it look like MediaWiki is sending s-maxage=0 to the CDN, but it is not. It is sending a higher value, and then when we send it from the CDN cache to the browser, we change it to 0.

To prove this, check the other headers:

curl -I 'https://pl.wikipedia.org/w/index.php?action=raw&ctype=text/css&title=Wikipedysta:Msz2001/sourcecode-links.css'

last-modified: Sun, 04 Sep 2022 13:22:02 GMT
x-cache: …, cp3056 hit/2
x-cache-status: hit-front
age: 524
cache-control: private, s-maxage=0, max-age=0, must-revalidate

The x-cache shows that Varnish (Wikimedia CDN) served it directly from the cache ("hit"). The age header shows how long ago it was added to the cache (524 seconds, or nearly 9 minutes, ago).
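The same reading of the headers can be scripted. The sketch below assumes that any x-cache entry containing the word "hit" means a CDN cache layer served the response, which matches the "hit" and "miss" examples in this task but is not a documented format; `servedFromCdnCache` and `formatAge` are hypothetical names.

```javascript
// True when at least one cache layer listed in x-cache reports a hit.
// Assumption: entries look like "cp3056 hit/2" or "cp1065 miss (0)".
function servedFromCdnCache(xCache) {
  return xCache.split(",").some((entry) => /\bhit\b/i.test(entry));
}

// Turn the Age header (seconds) into a minutes-and-seconds string.
function formatAge(ageSeconds) {
  const minutes = Math.floor(ageSeconds / 60);
  const seconds = ageSeconds % 60;
  return `${minutes} min ${seconds} s`;
}

console.log(servedFromCdnCache("cp3056 hit/2"), formatAge(524));
console.log(servedFromCdnCache("cp1065 miss (0), amssq50 miss (0), amssq31 frontend miss (0)"));
```

For the curl output above this reports a hit with an age of 8 min 44 s, while the 2014 header dumps in the task description report misses on every layer.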