
Support SPDY
Closed, Resolved, Public

Description

This tracks the rollout of the SPDY protocol, and possibly HTTP/2.0, across Wikimedia's production clusters.

Details

Reference
bz33890

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 12:02 AM)
bzimport set Reference to bz33890.
bzimport added a subscriber: Unknown Object (MLST).

Firefox 11 should include SPDY, though I think it is disabled by default.
Some dumb questions:

  • does this require using alternate URLs to access resources, e.g. spdy://en.wikipedia.org/?
  • if so, where would we introduce such URLs, other than telling people for testing?
  • is there any facility for auto-upgrading a connection?
  • is the use of multiple domains (e.g. bits & upload) an issue for SPDY? Would this undo some of its ability to multiplex connections with common flow control, or would this be within normal expectations?

(In reply to comment #1)

  • does this require using alternate URLs to access resources, eg spdy://en.wikipedia.org/ ?

AFAIK the client automatically does the URL magic if SPDY is supported (e.g. if you visit Google in newer versions of Chrome).

I'm seeing surprisingly little user-facing documentation on this. :P

Last section of https://code.google.com/p/mod-spdy/wiki/GettingStarted seems to indicate that users should use https:// URLs at least for mod_spdy, which will serve out SPDY if clients support it and HTTPS otherwise.

What info I can find indicates that SPDY inherently runs over SSL, so I assume that http:// URLs would not 'auto-upgrade' the protocol while https:// would...?
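In practice there is no new URL scheme: the client and server agree on the protocol inside the TLS handshake of an ordinary https:// connection, so plain http:// URLs indeed never upgrade. SPDY used the NPN TLS extension for this; HTTP/2 uses its successor, ALPN, which is what Python's `ssl` module exposes today (its NPN calls are deprecated). The sketch below is only an illustration under those assumptions: it offers `h2` and `http/1.1` and reports what the server picked. `make_tls_context` and `negotiated_protocol` are hypothetical helper names.

```python
import socket
import ssl


def make_tls_context(protocols=("h2", "http/1.1")):
    """Build a client TLS context that offers application protocols via ALPN."""
    ctx = ssl.create_default_context()
    # Preference order matters: the server chooses from this list during the handshake.
    ctx.set_alpn_protocols(list(protocols))
    return ctx


def negotiated_protocol(host, port=443, timeout=5.0):
    """Open a normal https:// connection and report which protocol was agreed on."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # None means the server did not select a protocol via ALPN,
            # in which case the connection is plain HTTPS (HTTP/1.1 over TLS).
            return tls.selected_alpn_protocol() or "http/1.1"
```

Calling `negotiated_protocol("en.wikipedia.org")` needs network access, and the answer depends on what the server supports at the time.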

lcarr wrote:

They just released an Apache module for this: http://googledevelopers.blogspot.com/2012/04/add-spdy-support-to-your-apache-server.html

Might be worth testing out in labs, however it looks like this won't affect most of our infrastructure.

Well, I think it could (negatively) affect the squid caches...

Moving down to very lowest priority.

(In reply to comment #4)

Might be worth testing out in labs, however it looks like this won't affect most of our infrastructure.

any updates?

*** Bug 54986 has been marked as a duplicate of this bug. ***

I'm going to add support for this into the dynamicproxy on labs, and see how that goes.

From bug 33890:

SPDY is supported in Nginx and most modern browsers. Since we are using Nginx for HTTPS termination already, we should consider enabling SPDY support in it. This cuts down the overhead per request, which in turn makes it feasible to de-bundle in particular API requests for less cache fragmentation.

Upping priority as this is something we can do with Nginx without affecting Varnish or anything else in the backend.

That patch has been merged, and dynamicproxy has spdy enabled! \o/

http://spdycheck.org/#pinklake.wmflabs.org

This provides support for spdy/2, which is what nginx supports so far. It is a newer build of nginx packaged for this project (by andrewbogott), and I guess it can eventually be used in production too.

SPDY is an optimization that a lot of our clients can profit from now: http://caniuse.com/spdy

Especially clients on high-latency and low-bandwidth links (mobile, developing countries) benefit from header compression and avoidance of TCP slow-start on concurrent connections.

Are there specific issues that prevent us from applying the simple config change in [1] for a subset of the production traffic for testing?

[1]: https://gerrit.wikimedia.org/r/#/c/87682/2/modules/dynamicproxy/templates/proxy.conf

(In reply to comment #14)

Are there specific issues that prevent us from applying the simple config change in [1] for a subset of the production traffic for testing?

[1]: https://gerrit.wikimedia.org/r/#/c/87682/2/modules/dynamicproxy/templates/proxy.conf

Yeah -- the packaged build of Nginx that we are running in production was not compiled with the requisite flag for SPDY support. But your broader point is correct: it is pretty simple to do, and there is no compelling reason not to do it, AFAIK.

(In reply to comment #15)

Yeah -- the packaged build of Nginx that we are running in production was not compiled with the requisite flag for SPDY support. But your broader point is correct: it is pretty simple to do, and there is no compelling reason not to do it, AFAIK.

Except that our infrastructure isn't really set up in an ideal way to use SPDY. We use multiple endpoints, for instance. It also complicates our SSL plans. Can you use a shared SSL cache? Does it properly support forward secrecy (can we roll the keys? we can't properly with nginx). Etc. etc. etc.

It's not as simple as you're making it out to be.

(In reply to comment #15)

Yeah -- the packaged build of Nginx that we are running in production was not compiled with the requisite flag for SPDY support. But your broader point is correct: it is pretty simple to do, and there is no compelling reason not to do it, AFAIK.

IIRC SPDY's header compression is vulnerable to the CRIME/BREACH family of attacks. HTTP 2.0 is going to use a different header compression technique for that reason, but implementations of HTTP 2.0 aren't really done yet AFAIK. If we deploy SPDY, we should turn off header compression.

@ori, ah, I see. I guess a backport or the next Ubuntu LTS upgrade (three months from now?) could help here.

@faidon: Is the main complication becoming tied to nginx? With SPDY being an application-level protocol I would be surprised if it affected TLS layer issues.

@Roan: According to [1] "The nginx web-server was not vulnerable to CRIME since 1.0.9/1.1.6 (October/November 2011) using OpenSSL 1.0.0+, and since 1.2.2/1.3.2 (June / July 2012) using all versions of OpenSSL". Disabling header compression sounds like a prudent measure though. Even without header compression SPDY saves bandwidth by avoiding re-sending identical headers for each request.

According to [2] BREACH is not specific to SPDY; it applies to compressed HTTP responses over TLS in general.

[1]: https://en.wikipedia.org/wiki/CRIME_(security_exploit)#Mitigation
[2]: https://en.wikipedia.org/wiki/Transport_Layer_Security#CRIME_and_BREACH_attacks

@Ryan: My response directed to Faidon was really for you, in case that wasn't clear from the context.

(In reply to comment #18)

@Roan: According to [1] "The nginx web-server was not vulnerable to CRIME since 1.0.9/1.1.6 (October/November 2011) using OpenSSL 1.0.0+, and since 1.2.2/1.3.2 (June / July 2012) using all versions of OpenSSL". Disabling header compression sounds like a prudent measure though. Even without header compression SPDY saves bandwidth by avoiding re-sending identical headers for each request.

Which I believe is what HTTP 2.0 does too: only send modified headers, but don't compress their contents.
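The "only send modified headers" idea can be sketched roughly as follows. This is a deliberately simplified model, not SPDY's zlib scheme and not the HTTP/2 HPACK wire format (HPACK uses indexed static and dynamic tables plus optional Huffman coding); `HeaderDeltaEncoder` and `HeaderDeltaDecoder` are hypothetical names. Each end remembers the headers last sent on the connection, and a request carries only changed entries plus the names that were dropped.

```python
class HeaderDeltaEncoder:
    """Per-connection state: send only headers that differ from the previous request."""

    def __init__(self):
        self._last = {}

    def encode(self, headers):
        changed = {k: v for k, v in headers.items() if self._last.get(k) != v}
        removed = sorted(k for k in self._last if k not in headers)
        self._last = dict(headers)
        return changed, removed


class HeaderDeltaDecoder:
    """Mirror of the encoder: rebuild the full header set from the deltas."""

    def __init__(self):
        self._last = {}

    def decode(self, changed, removed):
        headers = dict(self._last)
        for name in removed:
            headers.pop(name, None)
        headers.update(changed)
        self._last = headers
        return dict(headers)


enc, dec = HeaderDeltaEncoder(), HeaderDeltaDecoder()
base = {"user-agent": "ExampleBrowser/1.0", "cookie": "session=abc", "accept": "*/*"}
for path in ("/w/api.php?a=1", "/w/api.php?a=2"):
    changed, removed = enc.encode({**base, ":path": path})
    print(len(changed), removed, dec.decode(changed, removed)[":path"])
# 4 [] /w/api.php?a=1
# 1 [] /w/api.php?a=2
```

The second request only carries `:path`, which is where the savings for many small API requests on one connection would come from.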

So I guess this is waiting for a more up to date version of nginx at this point.

(In reply to Ryan Lane from comment #16)

Yeah -- the packaged build of Nginx that we are running in production was not compiled with the requisite flag for SPDY support. But your broader point is correct: it is pretty simple to do, and there is no compelling reason not to do it, AFAIK.

Except that our infrastructure isn't really set up in an ideal way to use SPDY. We use multiple endpoints, for instance.

Do you mean us using several domains? Even just using it for the API and bits would already be a good step forward. We can optimize further down the road.

Can you use a shared SSL cache?

Can you describe what this is about?

Does it properly support forward secrecy (can we roll the keys? we can't properly with nginx). Etc. etc. etc.

As far as I can tell nginx supports forward secrecy. Can you describe the issue that you see with key rolling?

(In reply to Gabriel Wicke from comment #21)

So I guess this is waiting for a more up to date version of nginx at this point.

(In reply to Ryan Lane from comment #16)

Yeah -- the packaged build of Nginx that we are running in production was not compiled with the requisite flag for SPDY support. But your broader point is correct: it is pretty simple to do, and there is no compelling reason not to do it, AFAIK.

Except that our infrastructure isn't really set up in an ideal way to use SPDY. We use multiple endpoints, for instance.

Do you mean us using several domains? Even just using it for the API and bits would already be a good step forward. We can optimize further down the road.

Can you use a shared SSL cache?

Can you describe what this is about?

Sure. Currently we use source hash in LVS to ensure a client always hits the same frontend SSL server to ensure they always reuse the SSL session. This works ok, but often leads to bugs. For instance, the monitoring servers don't always detect errors, because they are always hitting the same nodes, subsets of users see a problem while others, including us, don't, etc. Also, it's not possible to weight the load of different servers while using source hash, so we'd really like to switch to weighted round robin.

Part of that is supporting an SSL session cache that spans the SSL nodes. Apache has support for this. Stud has support for this. Nginx does not, so we were considering switching away or adding it. Adding SPDY into this may change things. Does it use the same cache as HTTPS? If we switch to something other than Nginx, will it support SPDY?

Either way, this is actually a more important problem to solve than SPDY, currently.
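A toy simulation makes this trade-off concrete. It assumes three hypothetical frontends (`ssl1` to `ssl3`), session caches keyed by client IP (real caches are keyed by session ID), and a naive weighted round robin that just repeats each server `weight` times (LVS's own `wrr` scheduler orders things differently). It compares source hash, round robin with per-node caches, and round robin with a cache shared across nodes.

```python
import hashlib
import itertools

SERVERS = ["ssl1", "ssl2", "ssl3"]  # hypothetical frontend names


def source_hash(client_ip, servers=SERVERS):
    """Deterministic choice per client IP, in the spirit of LVS's 'sh' scheduler."""
    digest = hashlib.sha1(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


def weighted_round_robin(weights):
    """Naive WRR: cycle through each server `weight` times (ignores the client)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def resumption_rate(client_ips, pick, shared_cache=False):
    """Fraction of new TLS connections that find a cached session."""
    per_node = {}   # server -> clients with a cached session on that server
    shared = set()  # one cache visible to every server
    hits = 0
    for ip in client_ips:
        node = pick(ip)
        cache = shared if shared_cache else per_node.setdefault(node, set())
        if ip in cache:
            hits += 1      # abbreviated handshake
        else:
            cache.add(ip)  # full handshake; the session is now cached there
    return hits / len(client_ips)


# 20 clients, each opening 5 connections over time.
clients = ["10.0.0.%d" % i for i in range(20)] * 5

print(resumption_rate(clients, source_hash))  # 0.8
rr = weighted_round_robin({"ssl1": 1, "ssl2": 1, "ssl3": 1})
print(resumption_rate(clients, lambda _ip: next(rr)))  # 0.4
rr = weighted_round_robin({"ssl1": 1, "ssl2": 1, "ssl3": 1})
print(resumption_rate(clients, lambda _ip: next(rr), shared_cache=True))  # 0.8
```

With per-node caches, round robin loses resumptions that source hash keeps; a shared cache gets them back while still allowing weighted balancing.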

Does it properly support forward secrecy (can we roll the keys? we can't properly with nginx). Etc. etc. etc.

As far as I can tell nginx supports forward secrecy. Can you describe the issue that you see with key rolling?

It's actually handled by OpenSSL on initialization. So, until you restart nginx the key isn't actually rotated. This is also a concern with using weighted round robin load balancing, since we'd need to ensure the rotated key is rotated across all the nodes as well. Apache keeps the rotated key on the filesystem (and recommends using shared memory for this), so this is less of a problem with Apache, but we have no way of handling this with nginx, except for restarting the servers, which sucks.

Anyway, we can likely ignore forward secrecy for now since it's basically worthless considering we're very vulnerable to traffic analysis.

(In reply to Ryan Lane from comment #22)

(In reply to Gabriel Wicke from comment #21)

Can you use a shared SSL cache?

Can you describe what this is about?

Sure. Currently we use source hash in LVS to ensure a client always hits the same frontend SSL server to ensure they always reuse the SSL session. This works ok, but often leads to bugs. For instance, the monitoring servers don't always detect errors, because they are always hitting the same nodes, subsets of users see a problem while others, including us, don't, etc. Also, it's not possible to weight the load of different servers while using source hash, so we'd really like to switch to weighted round robin.

Part of that is supporting an SSL session cache that spans the SSL nodes. Apache has support for this. Stud has support for this. Nginx does not, so we were considering switching away or adding it. Adding SPDY into this may change things. Does it use the same cache as HTTPS? If we switch to something other than Nginx, will it support SPDY?

SPDY sets up a single TCP connection per host (really IP and cert) and then multiplexes all requests over that connection. My understanding of the way we use LVS is that all incoming traffic for a given TCP connection is forwarded to the same backend server even in round robin mode. The single SPDY connection would end up talking to the same backend all the time. So moving towards SPDY might actually reduce the need for a shared SSL cache for most connections.
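The reasoning above can be modelled with a toy layer-4 balancer that schedules each new TCP connection once and then sends every later packet of that connection to the same backend. The backend names and client address are hypothetical, and real LVS tracks connections by the full address/port tuple rather than just client IP and port.

```python
import itertools


class ConnectionScheduler:
    """Toy L4 balancer: choose a backend once per TCP connection, then stick to it."""

    def __init__(self, backends):
        self._next_backend = itertools.cycle(backends)  # plain round robin
        self._table = {}  # (client_ip, client_port) -> backend

    def route(self, client_ip, client_port):
        key = (client_ip, client_port)
        if key not in self._table:
            # First packet of a new connection: schedule it.
            self._table[key] = next(self._next_backend)
        # Later packets (and every stream multiplexed on them) follow the table.
        return self._table[key]


backends = ["ssl1", "ssl2", "ssl3"]  # hypothetical frontends
client = "192.0.2.7"                 # documentation address

# HTTPS/1.1: the browser spreads 30 requests over 6 parallel connections.
lb = ConnectionScheduler(backends)
http1 = {lb.route(client, 50000 + n % 6) for n in range(30)}

# SPDY: the same 30 requests are streams on one connection.
lb = ConnectionScheduler(backends)
spdy = {lb.route(client, 50000) for _ in range(30)}

print(len(http1), len(spdy))  # 3 1
```

In this model the HTTPS client needs six TLS handshakes spread over three backends, while the SPDY client needs one.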

(In reply to Gabriel Wicke from comment #23)

SPDY sets up a single TCP connection per host (really IP and cert) and then multiplexes all requests over that connection. My understanding of the way we use LVS is that all incoming traffic for a given TCP connection is forwarded to the same backend server even in round robin mode. The single SPDY connection would end up talking to the same backend all the time. So moving towards SPDY might actually reduce the need for a shared SSL cache for most connections.

If the connection is broken and a new connection is needed, it'll likely hit another server when using round robin, which means an SSL cache miss. This is pretty common with mobile clients, which is where things matter the most.

(In reply to Ryan Lane from comment #24)

If the connection is broken and a new connection is needed, it'll likely hit another server when using round robin, which means an SSL cache miss. This is pretty common with mobile clients, which is where things matter the most.

The difference is that it happens less often with SPDY, as a single connection is going to remain busy & kept alive for longer, and you only pay the setup cost for one connection rather than 6 or so otherwise.

How long is the SSL cache normally kept around / valid?

Let's take a step back: SSL's scaling & performance requirements and SPDY are not things that can be discussed effectively in a BZ bug, I think. There's a lot of work involved, some of which is documented under https://wikitech.wikimedia.org/wiki/HTTPS/Future_work and some of which is not (SPDY).

There is most likely going to be a quarterly SSL/SPDY goal with multiple people involved, as it spans multiple layers, involves some low-level C coding, has cross-team dependencies, etc. It's possible it may even span more than a quarter; there is a lot of work needed to have a properly functioning, scalable infrastructure.

I think it's unlikely it's going to be in this coming quarter's goals, but the priorities have not been set yet, so nothing's definite. Gabriel, Ori, Roan and others: you're very much welcome to provide input to this process as it relates to your team's goals (SOA, performance etc.), as it would certainly help us prioritize it more effectively.

Such a project will result in multiple bug reports/RT issues, and leaving this open as a placeholder and master ticket is fine IMHO. I just don't think we can effectively have such a large discussion here.

@faidon: I agree that a wider discussion is needed to come to a conclusion & make a plan / agree on priorities. Let's use this bug to collect more information for now to inform that discussion.

Nginx lets you specify keepalive timeouts separately for HTTPS vs. SPDY connections. See keepalive_timeout and spdy_keepalive_timeout. With only a single connection used for SPDY, the keepalive can be set significantly higher than the 65s default for HTTPS without resulting in an excessive number of connections. Combined with around 65% of requests already supporting SPDY [1], this might reduce the need for SSL session caching somewhat.
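A rough way to sanity-check the keepalive argument is Little's law: open connections are roughly the rate at which clients arrive, times the connections each one holds, times how long each stays open. The figures below (1,000 new clients per second, 6 parallel HTTPS connections per client, a 180 s SPDY keepalive) are illustrative assumptions, not Wikimedia measurements, and the model ignores connection reuse across page views.

```python
def open_connections(clients_per_s, conns_per_client, keepalive_s):
    """Little's law, L = lambda * W, applied to idle keep-alive connections."""
    return clients_per_s * conns_per_client * keepalive_s


https = open_connections(1000, 6, 65)  # keepalive_timeout at the 65 s value above
spdy = open_connections(1000, 1, 180)  # hypothetical longer spdy_keepalive_timeout
print(https, spdy)  # 390000 180000
```

Under these assumptions, a SPDY keepalive almost three times longer still holds fewer connections open than HTTPS does today.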

Also potentially relevant is http://tools.ietf.org/html/rfc5077, with an implementation discussed in http://vincent.bernat.im/en/blog/2011-ssl-session-reuse-rfc5077.html#sharing-tickets. Sadly Safari and old IE versions don't support it, with Safari being the main non-SPDY hold-out. According to https://www.ssllabs.com/ssltest/viewClient.html?name=IE&version=11&platform=Win%208.1 IE 11 does support session tickets.
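To make the RFC 5077 approach concrete: instead of every frontend keeping its own session cache, the server gives the client an opaque ticket protected with a key that all frontends share, so any node holding the key can resume the session. This sketch simplifies under stated assumptions: a real ticket is encrypted (e.g. with AES) as well as authenticated, whereas this one only uses a standard-library HMAC, so the session state stays readable. `TicketKeyRing`, `issue_ticket` and `resume` are hypothetical names. It also touches the key-rotation concern raised earlier: the previous key is kept for verification only, so tickets issued just before a rotation still work.

```python
import base64
import hashlib
import hmac
import json


class TicketKeyRing:
    """Hypothetical ticket-key store that every TLS frontend would share."""

    def __init__(self, current, previous=()):
        self.current = current
        self.previous = list(previous)

    def rotate(self, new_key):
        # Keep only the most recent old key, for verifying tickets already issued.
        self.previous.insert(0, self.current)
        self.previous = self.previous[:1]
        self.current = new_key


def issue_ticket(ring, session_state):
    payload = json.dumps(session_state, sort_keys=True).encode()
    tag = hmac.new(ring.current, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.b64encode(tag + payload)


def resume(ring, ticket):
    raw = base64.b64decode(ticket)
    tag, payload = raw[:32], raw[32:]
    for key in [ring.current] + ring.previous:
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return json.loads(payload)
    return None  # unknown or expired key: fall back to a full handshake


ring = TicketKeyRing(current=b"k1" * 16)
ticket = issue_ticket(ring, {"cipher": "ECDHE-RSA-AES128-GCM-SHA256", "id": 7})
print(resume(ring, ticket) is not None)  # True: any node with the ring resumes
ring.rotate(b"k2" * 16)
print(resume(ring, ticket) is not None)  # True: previous key still accepted
ring.rotate(b"k3" * 16)
print(resume(ring, ticket))              # None: key aged out, full handshake
```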

[1]: http://caniuse.com/spdy

Another bit of info re keep-alives from https://groups.google.com/forum/#!topic/spdy-dev/xgyPztsAKls:

FF keeps SPDY connections alive for 3 minutes using PING frames. Servers can also keep connections alive using PING frames. Not sure if any implementations do that currently. On mobile pings do have a battery cost.
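For reference, a SPDY/3 PING is a tiny control frame, so the keepalive cost is mostly waking the radio rather than bytes on the wire. Based on our reading of the SPDY/3 draft framing (a control bit, a 15-bit version, a 16-bit type where PING is 6, 8 bits of flags, a 24-bit length, then a 32-bit ID, with odd IDs for client-initiated pings), the frame can be built and parsed as below. HTTP/2 later changed PING to an 8-byte opaque payload inside a different frame header; `build_ping` and `parse_ping` are hypothetical names.

```python
import struct

SPDY_VERSION = 3
PING_TYPE = 6


def build_ping(ping_id):
    """Encode a SPDY/3 PING control frame (12 bytes)."""
    control_and_version = 0x8000 | SPDY_VERSION  # control bit set + version
    flags_and_length = (0 << 24) | 4             # no flags, 4-byte payload
    return struct.pack(">HHII", control_and_version, PING_TYPE, flags_and_length, ping_id)


def parse_ping(frame):
    """Decode a PING frame and return its ID, validating the header fields."""
    control_and_version, frame_type, flags_and_length, ping_id = struct.unpack(">HHII", frame)
    if not (control_and_version & 0x8000):
        raise ValueError("not a control frame")
    if (control_and_version & 0x7FFF) != SPDY_VERSION or frame_type != PING_TYPE:
        raise ValueError("not a SPDY/3 PING")
    if (flags_and_length & 0xFFFFFF) != 4:
        raise ValueError("PING payload must be 4 bytes")
    return ping_id


frame = build_ping(1)  # odd ID: client-initiated
print(frame.hex())     # 800300060000000400000001
```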

nginx 1.6 now has SPDY 3.1 support.

Good news: The last browser hold-out (Safari) will finally get SPDY support as well:

"Safari supports the latest web standards, including WebGL and SPDY"
http://www.apple.com/pr/library/2014/06/02Apple-Announces-OS-X-Yosemite.html

This should soon increase browser support beyond the 67% currently claimed on http://caniuse.com/spdy.

That's good news indeed.

Of course that 67% figure is bogus, as it assumes SPDY is one protocol, while there are four versions available and browsers & servers each support a different combination of these. So, for example, nginx 1.5.10+ advertises only SPDY 3.1, which is only supported by Firefox 27+, Chrome 28+ etc. Similarly, SPDY/2 support was removed from Firefox 28/Chrome 33, so it's not like a server can stick to a previous version.
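The version mismatch is easier to see with a small model of NPN, the TLS extension SPDY used: the server advertises the protocols it speaks and the client picks one (here, its own first preference that the server also lists). The client preference lists are illustrative, and the fallback is simplified: this sketch drops to `http/1.1` when nothing overlaps, while the real NPN fallback rules differ in detail.

```python
def npn_select(server_advertised, client_supported):
    """Client-side choice as in NPN: first client preference the server offers."""
    offered = set(server_advertised)
    for proto in client_supported:
        if proto in offered:
            return proto
    return "http/1.1"  # simplified fallback: plain HTTPS


nginx_1_5_10 = ["spdy/3.1", "http/1.1"]             # advertises only SPDY 3.1, as above
older_client = ["spdy/3", "spdy/2", "http/1.1"]     # hypothetical pre-3.1 browser
newer_client = ["spdy/3.1", "spdy/3", "http/1.1"]   # hypothetical 3.1-capable browser

print(npn_select(nginx_1_5_10, older_client))  # http/1.1
print(npn_select(nginx_1_5_10, newer_client))  # spdy/3.1
```

A browser that "supports SPDY" in the caniuse sense still ends up on plain HTTPS if its versions don't overlap with the server's.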

This is partially alleviated by automatic browser updates, but it's hardly the same as having that table be predominantly "green" and saying "67% of the browsers support it" :)

(In reply to Faidon Liambotis from comment #26)

There's a lot of work involved, some of which is documented under https://wikitech.wikimedia.org/wiki/HTTPS/Future_work and other that is not (SPDY).

There is going to be most likely a quarterly SSL/SPDY goal with multiple people involved as it spans multiple layers, involves some low-level C coding, has cross-team dependencies etc. It's possible it may even span more than a quarter; there is a lot of work needed to have a properly functioning, scalable infrastructure.

What work is needed? I re-read this ticket (and looked for dependency tickets) and re-skimmed [[wikitech:HTTPS/Future work]], but I'm still unclear on what work is needed to support SPDY on Wikimedia wikis.

I don't think there is anything directly blocking SPDY except upgrading nginx (or switching to something else) and testing it. Both are quite a bit of work in themselves. After that it will be useful to analyse whether enabling it actually helps.

However, I also think that it doesn't make sense to upgrade nginx only to then decide to switch to Apache because one of the other items from [[wikitech:HTTPS/Future work]] requires it. So one implementation needs to be selected and the necessary missing features (e.g. a distributed session cache for nginx) need to be coded. (Not necessarily in that order. There are a few more variants on the wiki page I omitted for brevity; basically, more informed decisions need to be made. All interesting stuff, wish I could spend more time on it.)

faidon renamed this task from support SPDY protocol to Support SPDY. (Jan 13 2015, 3:04 PM)
faidon updated the task description.
faidon added a project: ops-core.
faidon set Security to None.

This seems like a smart thing to prioritize for the HTTPS-by-default tag, since it has such drastic front-end speed improvements for multiplexing resources. I've never managed an infrastructure like Wikipedia's, but the SPDY module for nginx has shipped for a while and is very easy to turn on.

We've made a conscious decision to prioritize our HTTPS scalability work and turn on SPDY (or rather, HTTP/2.0) very shortly after. You could argue it's part of the same series of steps or a separate step, but in the end, it doesn't really matter; what does matter is that it's happening right after this work.

The new jessie nginx test install already supports SPDY, and I believe is serving a fraction of the prod traffic: https://spdycheck.org/#cp1008.wikimedia.org

So it looks like we'll gradually get wider SPDY support as the Jessie Nginx installs are being rolled out.

faidon claimed this task.
faidon added a subscriber: BBlack.

@BBlack tackled this while implementing T86648. All HTTP frontends are now running a newer stack and have SPDY enabled. There are a number of subsequent performance enhancements that we can implement because of this (e.g. somehow undo our domain sharding, or move to the same service IP + certificate), but we'll track these separately.
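Undoing domain sharding relies on connection reuse: a SPDY or HTTP/2 client may send requests for a second hostname over an existing connection when that hostname resolves to the connection's IP and the connection's certificate covers it. The check below is a simplified sketch of that rule; exact conditions vary by browser, and the addresses and certificate names are hypothetical. Wildcards follow the usual rule that `*` matches exactly one leftmost label.

```python
def cert_covers(hostname, cert_names):
    """True if any certificate name matches; '*' may replace only the leftmost label."""
    host_labels = hostname.lower().split(".")
    for name in cert_names:
        labels = name.lower().split(".")
        if len(labels) != len(host_labels):
            continue
        if labels == host_labels or (labels[0] == "*" and labels[1:] == host_labels[1:]):
            return True
    return False


def can_reuse(connection, hostname, resolved_ips):
    """Reuse an open connection for `hostname` if IP and certificate both line up."""
    return connection["ip"] in resolved_ips and cert_covers(hostname, connection["cert_names"])


conn = {"ip": "198.51.100.10", "cert_names": ["*.wikipedia.org", "*.wikimedia.org"]}
print(can_reuse(conn, "upload.wikimedia.org", {"198.51.100.10"}))  # True: same IP + cert
print(can_reuse(conn, "upload.wikimedia.org", {"198.51.100.20"}))  # False: separate service IP
print(can_reuse(conn, "en.m.wikipedia.org", {"198.51.100.10"}))    # False: wildcard covers one label
```

This is why moving sharded hostnames onto the same service IP and certificate would let one connection carry them all.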

gpaumier subscribed.

(Added a link to the English Wikipedia article for people who come here from Tech News and don't know what SPDY is.)