
Separate Cache-Control header for proxy and client
Open, Low, Public

Description

MediaWiki has traditionally used the Cache-Control header to control the CDN (i.e. the Squid reverse proxy), while the Cache-Control header for clients has been specified in the Squid configuration. Specifically, when a certain URL regex matches, the Cache-Control header is stripped out and replaced with the configured header.

This is not ideal, as noted by Gabriel in a comment in the original code. It would be better if MediaWiki specified both headers in its response, so that the URL regex and the client Cache-Control header do not need to be maintained in the CDN configuration. Originally, this would have required a Squid patch, but now that we are switching to Varnish, the feature can be implemented with VCL.

Specifically, MW should send a Client-Cache-Control header which Varnish will rewrite to Cache-Control as appropriate.

Outcome
  • Less MediaWiki-related configuration needed in Varnish VCL.
  • Move code that is logically part of MediaWiki into MediaWiki itself.
  • Reduce risk of accidental exposure of the internal cache headers to the outside world in case of bugs in Varnish.
Todo
  • Decide on the order of the below, and/or decide whether to have MW send both for a while.
  • Write VCL code to support Surrogate-Control.
  • Change MediaWiki so that the s-maxage values meant for internal use only (they require PURGE control) are no longer sent as part of the Cache-Control header, and are sent in a separate Surrogate-Control header instead.

Details

Reference
bz48835

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:32 AM
bzimport set Reference to bz48835.
bzimport added a subscriber: Unknown Object (MLST).

We could do this the other way around and partially implement the semi-standard (semi because it's from the W3C, not IETF) Surrogate-Control header and leave Cache-Control intact for end-users. Fastly, for example, seems to suggest that users use this, so it may be an alternative that is more compatible with the real world.

(In reply to comment #1)

We could do this the other way around and partially implement the
semi-standard (semi because it's from the W3C, not IETF)
Surrogate-Control header and leave Cache-Control intact for
end-users. Fastly, for example, seems to be suggesting users
to use this, so this may be a more compatible with the real world
alternative.

Varnish uses the Cache-Control header in RFC2616_Ttl(), so I suppose it would be necessary to move the Cache-Control header out to some temporary pseudo-header in vcl_fetch, and to move it back into place in vcl_deliver. While the object is in the cache, Surrogate-Control would be copied into Cache-Control.

Support for pass mode would theoretically be simpler with Surrogate-Control.

Either way, there would have to be some backwards compatible handling in Varnish, to account for the progressive rollout of the new MW code. If Client-Cache-Control/Surrogate-Control is missing, Varnish would have to interpret Cache-Control in the old way.

On the MW side, OutputPage could provide an interface allowing configuration of the mapping of headers:

a) Old $wgUseSquid = false:

  • Client-Cache-Control -> Cache-Control
  • Surrogate-Control -> deleted

b) Old $wgUseSquid = true:

  • Surrogate-Control -> Cache-Control
  • Client-Cache-Control -> deleted

c) Surrogate-Control scheme:

  • Surrogate-Control -> Surrogate-Control
  • Client-Cache-Control -> Cache-Control

d) Client-Cache-Control scheme:

  • Surrogate-Control -> Cache-Control
  • Client-Cache-Control -> Client-Cache-Control

We currently don't have a Client-Cache-Control header at all and I don't think we should introduce it now with that name. Introducing just Surrogate-Control and doing the VCL ping-pong you mentioned sounds more sensible to me. We'd need a temporary header to store the client cache control, so we may end up using Client-Cache-Control internally inside VCL as an interim header but I don't see a reason for MediaWiki to use it. i.e. as I see it, the VCL could just be:

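sub vcl_fetch {
    if (beresp.http.Surrogate-Control) {
        # Park the client-facing policy in an interim header while the object is cached.
        set beresp.http.Client-Cache-Control = beresp.http.Cache-Control;
        # Let the surrogate policy take the place of Cache-Control inside the cache.
        set beresp.http.Cache-Control = beresp.http.Surrogate-Control;
        unset beresp.http.Surrogate-Control;
    }
}

sub vcl_deliver {
    if (resp.http.Client-Cache-Control) {
        # Restore the client-facing policy before the response leaves the cache.
        set resp.http.Cache-Control = resp.http.Client-Cache-Control;
        unset resp.http.Client-Cache-Control;
    }
}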

I don't see any handling that we do now to preserve the backwards compatibility you mentioned. Even if we do and I missed it, we can easily implement it as "else" clauses above, no?
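As a rough, untested sketch of what such an "else" clause might look like on the delivery side, in the same Varnish 3 style as the snippet above: when the interim Client-Cache-Control header is missing, the response is assumed to come from MediaWiki code that predates the new header, and the old "URL regex matches, replace the header" behaviour applies. The URL pattern and the replacement header value are placeholders for illustration, not the values from the production configuration.

```vcl
sub vcl_deliver {
    if (resp.http.Client-Cache-Control) {
        # New-style response: restore the client policy that vcl_fetch parked.
        set resp.http.Cache-Control = resp.http.Client-Cache-Control;
        unset resp.http.Client-Cache-Control;
    } elsif (req.url ~ "^/wiki/") {
        # Old-style response: Cache-Control still carries the CDN policy,
        # so replace it with a configured client policy.
        # Both the regex and the value are hypothetical.
        set resp.http.Cache-Control = "private, s-maxage=0, max-age=0, must-revalidate";
    }
}
```

Once every backend sends the new header, the elsif branch and its URL regex could be dropped, which is the configuration this task wants to get out of VCL.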

It's a pity that Varnish doesn't natively support Surrogate-Control, indeed. Ironically, Squid 3 does in some form :) (so using it inside MediaWiki may be generally useful). I guess we could provide patches to Varnish for the long-term but VCL hacks seem viable in the short-term.

Note that the standard specifies a Surrogate-Capability request header to signal the capability to handle Surrogate-Control. We could set it in Varnish and MediaWiki could check for it, which would avoid the need for a configuration option.
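A minimal sketch of the Varnish side of that handshake, assuming Varnish 3 syntax. The Edge Architecture spec names the request header Surrogate-Capability and gives each surrogate a device token; the token "varnish" below is made up for illustration. VCL strings have no backslash escapes, so the long-string form {"..."} is used to embed the double quotes.

```vcl
sub vcl_recv {
    # Overwrite rather than append, so a value supplied by a client
    # is never passed through to the backend.
    # "varnish" is a hypothetical device token.
    set req.http.Surrogate-Capability = {"varnish="Surrogate/1.0""};
}
```

MediaWiki could then emit Surrogate-Control only when it sees this header on the request, and fall back to the old Cache-Control behaviour otherwise.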

Also note that the same Surrogate-Capability/Surrogate-Control mechanism could also be used to signal ESI back and forth (this is defined in the spec). Yuri has used X-Force-ESI (request) and X-Enable-ESI (response) for this purpose in the mobile caches for his ESI testing. We could deprecate those in favor of a unified Surrogate handling by core, especially while we move in the direction of doing ESI.

Immediate applications:

  • Normal page views (vcl_deliver in text-frontend.inc.vcl.erb)
  • Mobile page views (vcl_deliver in mobile-frontend.inc.vcl.erb)

Also, the use of Cache-Control in vcl_fetch in wikimedia.vcl.erb and in vcl_fetch in text-backend.inc.vcl.erb would have to be updated. Some care would have to be taken to ensure that MW does not accidentally send a public Surrogate-Control on responses with private data, where CC:private is currently sent and assumed to be sufficient. Maybe CC:private should override Surrogate-Control.

Aaron suggests that the feature could be used to allow private caching of resources delivered to logged-in users.

Note that, contrary to what I implied in comment #2, Surrogate-Control does not have the same format as Cache-Control. In particular http://www.w3.org/TR/edge-arch specifies the use of the no-store token and does not recognise no-cache or private.
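Taken together with the earlier suggestion that CC:private should override Surrogate-Control, the check in Varnish might look something like the sketch below. This is untested Varnish 3 style VCL; the regexes are deliberately loose and the 120s hit-for-pass lifetime is an arbitrary placeholder.

```vcl
sub vcl_fetch {
    # Treat the response as uncacheable if either the client-facing header
    # says "private" or the surrogate header says "no-store".
    if (beresp.http.Cache-Control ~ "(?i)(^|[,\s])private"
        || beresp.http.Surrogate-Control ~ "(?i)(^|[,\s])no-store") {
        # Drop any surrogate policy so it cannot make private data cacheable.
        unset beresp.http.Surrogate-Control;
        # Remember the decision briefly instead of storing the body.
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
}
```

Checking Cache-Control first means a stray public Surrogate-Control on a response with private data fails safe rather than ending up in the shared cache.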

bd808 set Security to None.
faidon edited projects, added acl*sre-team, Traffic; removed Varnish.

Note also that both CC and SC have grace-mode information as well.

In CC: our max-age is s-maxage with fallback to max-age, and our grace is stale-while-revalidate.
In SC: max-age's format is X[+Y], where X is our max-age, and Y is our optional grace window.
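For illustration, the X[+Y] form could be mapped onto Varnish's TTL and grace roughly as follows. This is an untested sketch in Varnish 3 style that assumes the std vmod is available and that string concatenation is accepted inside the std.duration() argument; the regexes assume an unquoted max-age value.

```vcl
import std;

sub vcl_fetch {
    if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+") {
        # X: freshness lifetime in seconds, e.g. "max-age=3600+600" gives 3600s.
        set beresp.ttl = std.duration(
            regsub(beresp.http.Surrogate-Control,
                   "^.*max-age=([0-9]+).*$", "\1") + "s", 0s);
    }
    if (beresp.http.Surrogate-Control ~ "max-age=[0-9]+\+[0-9]+") {
        # Y: optional grace window in seconds, e.g. 600s in the example above.
        set beresp.grace = std.duration(
            regsub(beresp.http.Surrogate-Control,
                   "^.*max-age=[0-9]+\+([0-9]+).*$", "\1") + "s", 0s);
    }
}
```

Setting beresp.ttl explicitly matters here because Varnish has already derived a TTL from Cache-Control/Expires by the time vcl_fetch runs.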

The plan that's starting to form in my head here, interrelated with T124954 (where we have issues with being able to consistently cap the overall life of objects as they traverse cache layers), is something like this:

  1. Start by blocking out the SC headers at our front edge: do not accept Surrogate-Capability from the outside, and do not forward any Surrogate-Control to the outside world.
  2. At least initially, block it from the applayer similarly on the backend fetch side, so we can deal with inter-cache issues first.
  3. On reception of a response from an application backend, process Cache-Control/Expires (happens in Varnish already today, but we can also re-process...) to determine the effective max-age. Forward this on to upper caches as Surrogate-Control: max-age, after having applied our TTL caps to it, leaving CC unmolested.
  4. On inter-cache response reception, ignore CC and re-set the TTL that Varnish set (from CC/Expires) with Surrogate-Control max-age.
  5. Eventually expand on the initial SC-creation in step (3), so that we have a compatibility mode where we translate from CC or accept applayer SC (but again, we apply our own policy restrictions for TTL capping, grace, etc). Then advertise to app devs (MW, RB, etc) how they should start using SC in place of CC for Varnish-control purposes, and how CC should be handled in terms of controlling the outside world (e.g. what does s-maxage really mean given HTTPS and given it's not controlling our own caches?).
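Step (1) and the TTL cap mentioned in step (3) are small enough to sketch directly. The following is an untested illustration in Varnish 3 style, meant for the client-facing cache layer; the 30d cap is a placeholder, not an agreed policy value.

```vcl
sub vcl_recv {
    # Step 1: never accept surrogate capability claims from the outside.
    unset req.http.Surrogate-Capability;
}

sub vcl_fetch {
    # Part of step 3: apply our own cap to whatever TTL was derived.
    # 30d is a hypothetical value.
    if (beresp.ttl > 30d) {
        set beresp.ttl = 30d;
    }
}

sub vcl_deliver {
    # Step 1: never leak internal surrogate policy to the outside world.
    unset resp.http.Surrogate-Control;
}
```

The inter-cache parts of the plan (steps 3 and 4) would sit behind this edge filtering, so that Surrogate-Control only ever travels between our own cache layers.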
Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.
Krinkle added a project: Platform Engineering.

As part of my focus on stability/sustainability, I'd like to try taking this on as part of the Perf Team. I would need support from Traffic, and possibly from CPT as well.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!

Krinkle lowered the priority of this task from Medium to Low. Oct 5 2022, 6:31 PM