Separate Cache-Control header for proxy and client
Open, Low, Public

Assigned To

None

Authored By

	tstarling
	May 26 2013, 11:58 AM

Description

MediaWiki has traditionally used the Cache-Control header to control the CDN (i.e. the Squid reverse proxies), while the Cache-Control header sent to clients has been specified in the Squid configuration. Specifically, when a request URL matches a configured regex, Squid strips out the Cache-Control header and replaces it with the configured one.

This is not ideal, as noted by Gabriel in a comment in the original code. It would be better if MediaWiki specified both headers in its response, so that the URL regex and the client Cache-Control header do not need to be maintained in the CDN configuration. Originally this would have required a Squid patch, but now that we are switching to Varnish, the feature can be implemented in VCL.

Specifically, MW should send a Client-Cache-Control header which Varnish will rewrite to Cache-Control as appropriate.

Outcome

Less MediaWiki-related configuration needed in Varnish VCL.
Keep code that logically belongs to MediaWiki inside MediaWiki.
Reduce the risk of accidentally exposing the internal cache headers to the outside world in case of bugs in Varnish.

Todo

Decide on the order of the steps below, and/or whether to have MW send both headers for a while.
Write VCL code to support Surrogate-Control.
Change the Cache-Control headers sent by MediaWiki so that the s-maxage values meant for internal use only (which require PURGE control) are no longer part of that header, and send them instead as a separate Surrogate-Control header.

Details

Reference: bz48835

Related Objects

Mentioned In: T279205: Harden and improve HTTP cache headers and purging in MediaWiki core (Sprint placeholder)
T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan
T124954: Decrease max object TTL in varnishes
T113591: Enable caching for the Mobile Content Service's RESTBase public endpoints
Mentioned Here: T124954: Decrease max object TTL in varnishes

Event Timeline

• bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:32 AM

• bzimport added projects: MediaWiki-General, platformeng.

• bzimport set Reference to bz48835.

• bzimport added a subscriber: Unknown Object (MLST).

tstarling created this task. May 26 2013, 11:58 AM

We could do this the other way around: partially implement the semi-standard (semi because it's from the W3C, not the IETF) Surrogate-Control header and leave Cache-Control intact for end-users. Fastly, for example, seems to suggest that users use it, so this may be an alternative that is more compatible with the real world.

(In reply to comment #1)

Varnish uses the Cache-Control header in RFC2616_Ttl(), so I suppose it would be necessary to move the Cache-Control header out to some temporary pseudo-header in vcl_fetch, and to move it back into place in vcl_deliver. While the object is in the cache, Surrogate-Control would be copied into Cache-Control.

Support for pass mode would theoretically be simpler with Surrogate-Control.

Either way, there would have to be some backwards-compatible handling in Varnish, to account for the progressive rollout of the new MW code. If Client-Cache-Control/Surrogate-Control is missing, Varnish would have to interpret Cache-Control in the old way.

On the MW side, OutputPage could provide an interface allowing configuration of the mapping of headers:

a) Old $wgUseSquid = false:

Client-Cache-Control -> Cache-Control
Surrogate-Control -> deleted

b) Old $wgUseSquid = true:

Surrogate-Control -> Cache-Control
Client-Cache-Control -> deleted

c) Surrogate-Control scheme:

Surrogate-Control -> Surrogate-Control
Client-Cache-Control -> Cache-Control

d) Client-Cache-Control scheme:

Surrogate-Control -> Cache-Control
Client-Cache-Control -> Client-Cache-Control
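A minimal Python sketch of the mapping above. It assumes that MediaWiki computes two opaque header values and that the configured scheme decides where each one goes. The scheme keys and `map_headers` are hypothetical names for illustration, not an existing OutputPage API. The sketch also ignores the format differences between Cache-Control and Surrogate-Control.

```python
from typing import Dict, Optional

# Hypothetical scheme table for schemes a) to d) above.
# Keys are the two logical values MediaWiki computes; targets are the
# outgoing header names. None means the value is not sent ("deleted").
SCHEMES: Dict[str, Dict[str, Optional[str]]] = {
    "a_no_cdn": {"Surrogate-Control": None,
                 "Client-Cache-Control": "Cache-Control"},
    "b_legacy_cdn": {"Surrogate-Control": "Cache-Control",
                     "Client-Cache-Control": None},
    "c_surrogate_control": {"Surrogate-Control": "Surrogate-Control",
                            "Client-Cache-Control": "Cache-Control"},
    "d_client_cache_control": {"Surrogate-Control": "Cache-Control",
                               "Client-Cache-Control": "Client-Cache-Control"},
}


def map_headers(scheme: str, surrogate: str, client: str) -> Dict[str, str]:
    """Return the response headers to emit under the given scheme."""
    values = {"Surrogate-Control": surrogate, "Client-Cache-Control": client}
    headers: Dict[str, str] = {}
    for logical, target in SCHEMES[scheme].items():
        if target is not None:
            headers[target] = values[logical]
    return headers
```

Keeping the schemes in a table means switching from one rollout phase to the next is a configuration change rather than a code change.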

We currently don't have a Client-Cache-Control header at all and I don't think we should introduce it now with that name. Introducing just Surrogate-Control and doing the VCL ping-pong you mentioned sounds more sensible to me. We'd need a temporary header to store the client cache control, so we may end up using Client-Cache-Control internally inside VCL as an interim header but I don't see a reason for MediaWiki to use it. i.e. as I see it, the VCL could just be:

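sub vcl_fetch {
  # New MW code: expose Surrogate-Control to the cache as Cache-Control,
  # stashing the client-facing value in an interim header.
  if (beresp.http.Surrogate-Control) {
    set beresp.http.Client-Cache-Control = beresp.http.Cache-Control;
    set beresp.http.Cache-Control = beresp.http.Surrogate-Control;
    unset beresp.http.Surrogate-Control;
  }
}

sub vcl_deliver {
  # Restore the client-facing Cache-Control before the response leaves Varnish.
  if (resp.http.Client-Cache-Control) {
    set resp.http.Cache-Control = resp.http.Client-Cache-Control;
    unset resp.http.Client-Cache-Control;
  }
}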

I don't see any handling that we do now to preserve the backwards compatibility you mentioned. Even if we do and I missed it, we can easily implement it as "else" clauses above, no?
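To check the ping-pong logic, including an "else" fallback for responses from old MW code, here is a small Python model of the two VCL subroutines. It works on plain dicts, not real Varnish objects. `LEGACY_URL_RE` and `LEGACY_CLIENT_CC` are hypothetical stand-ins for the URL regex and client header currently kept in the CDN configuration. The model makes one choice explicit: when MW sends Surrogate-Control without any Cache-Control, the stashed client value is empty. In that case the cache-facing Cache-Control is removed on delivery rather than leaked to clients.

```python
import re
from typing import Dict

# Hypothetical stand-ins for the legacy CDN-side rewrite rule.
LEGACY_URL_RE = re.compile(r"^/wiki/")
LEGACY_CLIENT_CC = "private, s-maxage=0, max-age=0, must-revalidate"


def vcl_fetch(beresp: Dict[str, str]) -> None:
    """Model of vcl_fetch: expose Surrogate-Control to the cache as Cache-Control."""
    if "Surrogate-Control" in beresp:
        # Stash the client-facing value; an empty string records "none sent".
        beresp["Client-Cache-Control"] = beresp.get("Cache-Control", "")
        beresp["Cache-Control"] = beresp.pop("Surrogate-Control")
    # else: old MW code, Cache-Control keeps its CDN-facing meaning.


def vcl_deliver(url: str, resp: Dict[str, str]) -> None:
    """Model of vcl_deliver: restore the client-facing Cache-Control."""
    if "Client-Cache-Control" in resp:
        client_cc = resp.pop("Client-Cache-Control")
        if client_cc:
            resp["Cache-Control"] = client_cc
        else:
            # MW sent no client header: never leak the cache-facing value.
            resp.pop("Cache-Control", None)
    elif LEGACY_URL_RE.search(url):
        # Backwards-compatible "else" clause: the old regex-based rewrite.
        resp["Cache-Control"] = LEGACY_CLIENT_CC
```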

It's a pity that Varnish doesn't support Surrogate-Control natively. Ironically, Squid 3 does in some form :) (so using it inside MediaWiki may be generally useful). I guess we could provide patches to Varnish for the long term, but VCL hacks seem viable in the short term.

Note that the standard specifies a Surrogate-Capabilities request header to signal the capability to handle Surrogate-Control. We could set it in Varnish and have MediaWiki check for it, which would let us avoid a configuration option.

Also note that the same Surrogate-Capabilities/Control mechanism could be also used to signal ESI back and forth (this is defined in the spec). Yuri has used X-Force-ESI (request) and X-Enable-ESI (response) for this purpose in the mobile caches for his ESI testing. We could deprecate those in favor of a unified Surrogate handling by core, especially while we move in the direction of doing ESI.
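A sketch of the MediaWiki-side check, in Python. It assumes the Edge Architecture syntax, in which each surrogate lists its capabilities as `device-token="cap1 cap2"`. It also assumes that `Surrogate/1.0` and `ESI/1.0` are the capability tokens of interest. The function names and the `varnish` device token used in testing are hypothetical. To keep the example short, header names are matched case-sensitively.

```python
import re
from typing import Dict, Set

# Assumed syntax: comma-separated entries of the form device-token="cap1 cap2".
_ENTRY_RE = re.compile(r'([A-Za-z0-9_.-]+)\s*=\s*"([^"]*)"')


def parse_surrogate_capabilities(value: str) -> Dict[str, Set[str]]:
    """Map each device token to the set of capability tokens it advertises."""
    return {token: set(caps.split()) for token, caps in _ENTRY_RE.findall(value)}


def surrogate_features(request_headers: Dict[str, str]) -> Dict[str, bool]:
    """Decide what MediaWiki may emit, based on what the surrogates advertise."""
    advertised: Set[str] = set()
    header = request_headers.get("Surrogate-Capabilities", "")
    for caps in parse_surrogate_capabilities(header).values():
        advertised |= caps
    return {
        # Emit Surrogate-Control only when a surrogate says it understands it.
        "surrogate_control": "Surrogate/1.0" in advertised,
        # A unified replacement for the ad-hoc X-Force-ESI request signal.
        "esi": "ESI/1.0" in advertised,
    }
```

With no Surrogate-Capabilities header present, both features come out false, so MediaWiki falls back to plain Cache-Control without any configuration switch.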

Immediate applications:

Normal page views (vcl_deliver in text-frontend.inc.vcl.erb)
Mobile page views (vcl_deliver in mobile-frontend.inc.vcl.erb)

Also, the use of Cache-Control in vcl_fetch in wikimedia.vcl.erb and in vcl_fetch in text-backend.inc.vcl.erb would have to be updated. Some care would have to be taken to ensure that MW does not accidentally send a public Surrogate-Control on responses with private data, where CC:private is currently sent and assumed to be sufficient. Maybe CC:private should override Surrogate-Control.

Aaron suggests that the feature could be used to allow private caching of resources delivered to logged-in users.

Note that, contrary to what I implied in comment #2, Surrogate-Control does not have the same format as Cache-Control. In particular http://www.w3.org/TR/edge-arch specifies the use of the no-store token and does not recognise no-cache or private.
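This combines two earlier points: CC:private overriding Surrogate-Control, and Surrogate-Control knowing only no-store. One possible rule is to downgrade Surrogate-Control to no-store whenever Cache-Control carries private, no-cache or no-store. A Python sketch with hypothetical function names follows. Treating no-cache like private is an assumption of this sketch, not something decided in this task.

```python
from typing import Dict, Optional, Set

# Cache-Control tokens assumed to forbid shared caching in this sketch.
PRIVATE_TOKENS = {"private", "no-cache", "no-store"}


def directive_names(value: str) -> Set[str]:
    """Lower-cased directive names from a comma-separated header value."""
    names: Set[str] = set()
    for part in value.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def effective_surrogate_control(headers: Dict[str, str]) -> Optional[str]:
    """Surrogate-Control to forward, letting CC:private (etc.) override it."""
    sc = headers.get("Surrogate-Control")
    if sc is None:
        return None
    if PRIVATE_TOKENS & directive_names(headers.get("Cache-Control", "")):
        # Surrogate-Control has no private/no-cache token, so use no-store.
        return "no-store"
    return sc
```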

greg edited projects, added MediaWiki-Core-Team; removed platformeng. Dec 8 2014, 10:15 PM

bd808 edited projects, added Varnish; removed MediaWiki-Core-Team. Apr 8 2015, 11:42 PM

bd808 set Security to None.

• mobrovac subscribed. Oct 1 2015, 4:19 PM

Restricted Application added a subscriber: Aklapper. Oct 1 2015, 4:19 PM

faidon updated the task description. Oct 1 2015, 4:19 PM

faidon edited projects, added acl*sre-team, Traffic; removed Varnish.

Restricted Application added a subscriber: Matanya. Oct 1 2015, 4:19 PM

• mobrovac mentioned this in T113591: Enable caching for the Mobile Content Service's RESTBase public endpoints. Oct 1 2015, 4:19 PM

BBlack mentioned this in T124954: Decrease max object TTL in varnishes. May 5 2016, 4:17 PM

Note that both CC and SC also carry grace-mode information.

In CC: our max-age is s-maxage, with fallback to max-age, and our grace is stale-while-revalidate.
In SC: max-age's format is X[+Y], where X is our max-age, and Y is our optional grace window.
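The two encodings can be normalised to the same (TTL, grace) pair. A Python sketch following the rules just listed is shown below. It assumes comma-separated directives, treats a missing stale-while-revalidate or `+Y` as zero grace, and ignores any Surrogate-Control targeting suffixes. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def _directives(value: str) -> Dict[str, Optional[str]]:
    """Parse 'a=1, b, c="2"' into {'a': '1', 'b': None, 'c': '2'}."""
    out: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            out[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return out


def ttl_grace_from_cc(value: str) -> Tuple[Optional[int], int]:
    """CC: TTL is s-maxage, falling back to max-age; grace is stale-while-revalidate."""
    d = _directives(value)
    raw_ttl = d.get("s-maxage", d.get("max-age"))
    ttl = int(raw_ttl) if raw_ttl is not None else None
    return ttl, int(d.get("stale-while-revalidate") or 0)


def ttl_grace_from_sc(value: str) -> Tuple[Optional[int], int]:
    """SC: max-age=X[+Y], where X is the TTL and Y the optional grace window."""
    raw = _directives(value).get("max-age")
    if raw is None:
        return None, 0
    ttl, _, grace = raw.partition("+")
    return int(ttl), int(grace or 0)
```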

The plan that's starting to form in my head here, interrelated with T124954 (where we have issues with being able to consistently cap the overall life of objects as they traverse cache layers), is something like this:

1) Start by blocking out the SC headers at our front edge: do not accept Surrogate-Capabilities from the outside, and do not forward any Surrogate-Control to the outside world.
2) At least initially, block them from the applayer in the same way on the backend fetch side, so we can deal with inter-cache issues first.
3) On reception of a response from an application backend, process Cache-Control/Expires (Varnish already does this today, but we can also re-process...) to determine the effective max-age. After applying our TTL caps to it, forward it to upper caches as Surrogate-Control: max-age, leaving CC unmolested.
4) On inter-cache response reception, ignore CC and override the TTL that Varnish set (from CC/Expires) with the Surrogate-Control max-age.
5) Eventually expand on the initial SC creation in step (3), so that we have a compatibility mode where we either translate from CC or accept applayer SC (but, again, apply our own policy restrictions for TTL capping, grace, etc.). Then tell app developers (MW, RB, etc.) how they should start using SC instead of CC for Varnish-control purposes, and how CC should be handled in terms of controlling the outside world (e.g. what does s-maxage really mean given HTTPS, and given that it no longer controls our own caches?).
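A rough Python model of steps 1 to 4 follows. It uses two hypothetical policy caps (`TTL_CAP`, `GRACE_CAP`) and plain dicts in place of Varnish request and response objects. It applies the CC and SC rules from the grace-mode comment above. Treating CC:private/no-store as "no TTL" is an added assumption. For brevity the model ignores the Age header, so a real implementation would still need to decide how remaining lifetime is carried between layers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical policy caps applied at every cache layer (values illustrative).
TTL_CAP = 86400
GRACE_CAP = 3600


def _directives(value: str) -> Dict[str, Optional[str]]:
    out: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            out[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return out


def _from_cc(cc: str) -> Tuple[Optional[int], int]:
    d = _directives(cc)
    raw = d.get("s-maxage", d.get("max-age"))
    # Assumption: private/no-store responses get no TTL at all.
    if raw is None or "private" in d or "no-store" in d:
        return None, 0
    return int(raw), int(d.get("stale-while-revalidate") or 0)


def _from_sc(sc: str) -> Tuple[Optional[int], int]:
    raw = _directives(sc).get("max-age")
    if raw is None:
        return None, 0
    ttl, _, grace = raw.partition("+")
    return int(ttl), int(grace or 0)


def edge_receive(req: Dict[str, str]) -> None:
    # Step 1: do not accept Surrogate-Capabilities from the outside.
    req.pop("Surrogate-Capabilities", None)


def edge_deliver(resp: Dict[str, str]) -> None:
    # Step 1: never forward Surrogate-Control to the outside world.
    resp.pop("Surrogate-Control", None)


def backend_fetch(resp: Dict[str, str], from_applayer: bool) -> Tuple[Optional[int], int]:
    """Return the (ttl, grace) this layer uses; add SC for the next layer up."""
    if from_applayer:
        resp.pop("Surrogate-Control", None)  # step 2: ignore applayer SC for now
        ttl, grace = _from_cc(resp.get("Cache-Control", ""))  # step 3
    else:
        ttl, grace = _from_sc(resp.get("Surrogate-Control", ""))  # step 4: ignore CC
    if ttl is None:
        # No usable TTL: forward no SC and leave it to the layer's default policy.
        resp.pop("Surrogate-Control", None)
        return None, 0
    ttl, grace = min(ttl, TTL_CAP), min(grace, GRACE_CAP)
    resp["Surrogate-Control"] = f"max-age={ttl}+{grace}" if grace else f"max-age={ttl}"
    return ttl, grace  # Cache-Control itself is never modified
```

Because every layer re-applies the same caps to the Surrogate-Control it receives, an applayer asking for a very long s-maxage cannot push objects past the policy limit at any tier.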

Krinkle subscribed. May 26 2016, 9:11 PM

• ema subscribed. May 27 2016, 2:52 PM

• Gilles subscribed. Jun 8 2016, 12:58 PM

• ema moved this task from Backlog to Caching on the Traffic board. Sep 30 2016, 2:33 PM

BBlack mentioned this in T124418: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan. May 30 2017, 11:08 PM

• Phabricator_maintenance moved this task from Backlog to Acknowledged on the SRE board. Jan 26 2019, 7:48 PM

Aklapper removed a subscriber: wikibugs-l-list. Apr 12 2020, 9:55 AM

Krinkle edited projects, added MediaWiki-libs-BagOStuff, Performance-Team (Radar); removed MediaWiki-General. Apr 12 2020, 8:55 PM

Krinkle moved this task from General to HTTP Cache on the MediaWiki-libs-BagOStuff board.

As part of my focus on stability/sustainability, I'd like to try taking this on as part of the Perf Team. I would need support from Traffic, and possibly from CPT as well.

Krinkle claimed this task. Apr 12 2020, 9:00 PM

Krinkle moved this task from Watching to Perf recommendation on the Performance-Team (Radar) board.

• Pchelolo moved this task from Inbox to Tracking/Watching on the Platform Engineering board. Apr 13 2020, 1:35 PM

Krinkle removed Krinkle as the assignee of this task. Apr 13 2020, 3:10 PM

Krinkle edited projects, added Performance-Team; removed Performance-Team (Radar).

Krinkle moved this task from Inbox, needs triage to To-do: Goals, prioritized next 4 Quarters on the Performance-Team board.

Krinkle mentioned this in T279205: Harden and improve HTTP cache headers and purging in MediaWiki core (Sprint placeholder). Apr 2 2021, 9:25 PM

BBlack moved this task from Caching to Icebox-Temp on the Traffic board. Oct 8 2021, 5:29 PM

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!