
Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon)
Closed, ResolvedPublic

Description

Details

Reference
bz47832

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:20 AM
bzimport added projects: HTTPS, acl*sre-team.
bzimport set Reference to bz47832.
bzimport added a subscriber: Unknown Object (MLST).

Users --> logged-in users? Or all traffic?

(In reply to comment #1)

Users --> logged-in users? Or all traffic?

All traffic, otherwise I'd have said so. :-)

(In reply to comment #2)

(In reply to comment #1)

Users --> logged-in users? Or all traffic?

All traffic, otherwise I'd have said so. :-)

Oh, I assumed you did mean logged in. Maybe we should start with that.

My understanding is we have spare capacity now, but are far from provisioned for all anonymous traffic. Starting with all logged in traffic (perhaps excluding explicit opt-outs) is a natural way to ramp up.

(In reply to comment #3)

Starting with all logged in traffic (perhaps excluding explicit opt-outs) is
a natural way to ramp up.

Yes, agreed. I'd call a bug about forcing Wikimedia logged-in users over HTTPS a blocker (or dependency...) of this bug, in fact.

In addition to spare SSL capacity, some users have said that HTTPS does not work for them (for whatever reason). We'll need to take small bites to eat this elephant.

In order to get this bug moving forward, it would be good to have an associated wiki page or list here on Bugzilla of what needs to be done first. This bug's current "see also"s seem like a good start toward this.

Note: If https://gerrit.wikimedia.org/r/47089 is merged, then both solutions (i.e., forcing logged in users and forcing logged in and anon users) will be possible by just setting $wgSecureGroups to array( 'user' ) and array( '*' ) respectively.
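To make the two settings concrete, here is a minimal sketch of the group check such a setting implies. It is not the code from the Gerrit change; the function name and constants are hypothetical, and it only assumes MediaWiki's implicit groups, where every visitor is in '*' and every logged-in account is also in 'user'.

```python
from typing import Iterable

# Implicit MediaWiki groups: everyone is in '*', logged-in accounts are also in 'user'.
ANON = ["*"]
LOGGED_IN = ["*", "user"]


def must_use_https(user_groups: Iterable[str], secure_groups: Iterable[str]) -> bool:
    """Return True if any of the user's groups is listed in the secure groups.

    Hypothetical helper: models the effect of a $wgSecureGroups-style setting,
    not the actual patch.
    """
    return not set(user_groups).isdisjoint(secure_groups)


# secure_groups = ["user"] forces only logged-in users over HTTPS;
# secure_groups = ["*"] forces everyone, anons included.
print(must_use_https(LOGGED_IN, ["user"]), must_use_https(ANON, ["user"]))
print(must_use_https(ANON, ["*"]))
```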

Sensible, overdue.

MZM: Where are users saying https doesn't work for them? I don't see any other unaddressed concerns about implementing this bug or the related bugs.

Tyler commented in #39380 "What about the extreme (or maybe not-so-extreme) case of other wikis who don't have the resources to put their users over HTTPS," - but that does not seem to apply to Wikimedia wikis.

Well, this bug seems a little politically charged...

Are we considering the users that will completely lose access when this action occurs? Some countries completely block HTTPS, but allow HTTP. I dislike eavesdropping by governments, but I also dislike not providing access to people too.

The rough steps ops was looking at (but hadn't fully decided upon) were:

  1. Redirect logged in users to HTTPS. Protecting credentials is important. I think forcing HTTPS here is sane.
  2. Expand the HTTPS infrastructure, moving the terminators directly onto the frontends. Put in engineering effort so that we can distribute the SSL cache and switch to a round robin load balancer rather than the source hash one we're currently using.
  3. Slowly soft-enable HTTPS for anons, by default, for smaller projects moving towards soft-enabling it on the larger projects as well. We can do this by using rel=canonical, with the https link being given.
  4. Consider force enabling HTTPS via redirects. This action is difficult. We'll lock out users with this action. I think this should be a community decision (maybe per-wiki) and isn't up for discussion on bugzilla.
  5. If HTTPS is force enabled, then enable HSTS to protect against downgrade attacks.
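
As a rough illustration of how steps 1, 4 and 5 fit together, the decision could be sketched as below. This is not actual Wikimedia configuration: the function, its parameters, and the one-year max-age are assumptions made for the example.

```python
from typing import Dict, Tuple

HSTS_MAX_AGE = 31536000  # one year in seconds; an assumed value


def plan_response(scheme: str, host: str, path: str, logged_in: bool,
                  force_anons: bool = False) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request under the staged rollout.

    Hypothetical sketch: logged-in users are always sent to HTTPS (step 1),
    anons only when force_anons is set (step 4), and HSTS is sent only once
    HTTPS is forced for everyone (step 5).
    """
    must_be_secure = logged_in or force_anons
    if scheme == "http" and must_be_secure:
        # A temporary redirect is easy to take back if load becomes a problem.
        return 302, {"Location": f"https://{host}{path}"}
    headers: Dict[str, str] = {}
    if scheme == "https" and force_anons:
        # HSTS tells the browser to refuse plain HTTP for this host from now on.
        headers["Strict-Transport-Security"] = f"max-age={HSTS_MAX_AGE}"
    return 200, headers


print(plan_response("http", "en.wikipedia.org", "/wiki/Foo", logged_in=True))
```

Keeping HSTS behind the same flag as the forced redirect reflects the ordering in the list: the header is hard to undo on the client side, so it comes last.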

Maybe we could offer a warning to folks on HTTP, saying that they should consider using HTTPS and that their communications can be monitored? Of course, we shouldn't do this until we can properly handle all anon traffic on HTTPS.

I know this is politically charged, but what's the security goal for forcing anons to use SSL. Stop government from eavesdropping on the communication that they could just as easily get from reading Special:Recentchanges?

Stop government from figuring out which articles somebody is reading? Well that's a goal I support, will SSL actually stop that? Assuming government can monitor the encrypted traffic going down the wire, they should be able to see how much traffic has been transferred. I imagine this somewhat uniquely identifies an article, especially when you consider images. OTOH, I suppose it makes figuring out which article they are reading more difficult, and you can't just snoop for keywords.
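
A toy sketch of the size-based guess described above: an observer who knows roughly how many bytes each page transfers can narrow down which page was fetched. The page names, sizes, and the 2% tolerance are invented for illustration; real traffic analysis would be noisier.

```python
from typing import Dict, List


def candidate_pages(observed_bytes: int, page_sizes: Dict[str, int],
                    tolerance: float = 0.02) -> List[str]:
    """Return the pages whose known size is within `tolerance` of the observed transfer."""
    return sorted(
        title for title, size in page_sizes.items()
        if abs(size - observed_bytes) <= tolerance * size
    )


# Hypothetical sizes in bytes, including images.
sizes = {"Article_A": 100000, "Article_B": 101000, "Article_C": 250000}
print(candidate_pages(100500, sizes))  # two similar-sized pages remain ambiguous
print(candidate_pages(250000, sizes))  # a distinctive size identifies one page
```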

OTOH SSL everywhere all the time certainly couldn't hurt. But that's a long process. Step 1 sounds like a reasonable starting point.

[Not a security expert. Let me know if I said anything stupid and wrong :)]

Anons are generally readers. Forcing anons to HTTPS would be to try to prevent eavesdropping. People who edit anon are giving up their privacy. If they wish to stay anonymous while editing they should make an account.

Yes, there are still ways to eavesdrop when using HTTPS (like the example you've mentioned), but it's significantly harder and isn't completely accurate.

I'd also like to point out that users can always choose to use HTTPS if they feel they are being eavesdropped on. The only reason we'd force all anon users over HTTPS is if doing so will force governments to allow HTTPS, which is something we cannot guarantee. Maybe instead of allowing HTTPS, the government in question will just decide serving Wikipedia to its people is too much of a bother to deal with.

(In reply to comment #7)

MZM: Where are users saying https doesn't work for them? I don't see any
other unaddressed concerns about implementing this bug or the related bugs.

As Ryan says in comment 8, some people seem unable to support HTTPS for whatever reason. This matches my experience switching daily-article-l (https://lists.wikimedia.org/mailman/listinfo/daily-article-l) to use HTTPS links; out of 30,000-ish subscribers, a handful wrote in to say that they couldn't access the site any longer. Wikimedia wikis are much too large to not hit the law of large numbers: some percentage of our users will be unable to access the site over HTTPS. I don't think we currently know what percentage that is.

(In reply to comment #5)

In order to get this bug moving forward, it would be good to have an
associated wiki page or list here on Bugzilla of what needs to be done first.
This bug's current "see also"s seem like a good start toward this.

QFT.

In addition to the steps Ryan helpfully lays out in comment 8, I'd add "assessment of the impact of forced HTTPS for anons" somewhere in there, as I think we still don't fully know just how many people such an action would affect. Additional information and data points would be helpful.

I'm thinking that an [[mw:RFC]] or page on the Wikitech wiki would make sense here (to be clear: not to assess whether to implement some of these changes or not, simply to assess technical options to move forward [comment 8 in wiki form, basically]).

(In reply to comment #8)

Thanks, Ryan. I think #1,2,3 in your list are worth pushing for. Call that "A" and #4 & 5 "B".

Implementing A seems more straightforward to plan for, and would cover all logged-in users and many anons. It could be combined with a message to http-only readers (possibly a low-frequency message only seen every Nth page view).

Implementing B seems to call for experiment and discussion, and learning from other global sites that have thought about this.

Some countries completely block HTTPS, but allow HTTP. I dislike
eavesdropping by governments, but I also dislike not providing access to
people too.

We might also crosscheck with projects such as verdict that maintain databases over time of what is blocked where; and monitor traffic changes to notice any drops in access.

(In reply to comment #12)
An on-wiki discussion would be helpful. Hopefully it would distinguish between implementing A and B. B involves bikeshed decisions that could prolong any discussion.

Other bugs which might be resolved by a global move to https:
#48572 #47276 #37790 #35760 #34670 #31325 #29898

Edit: s/verdict/herdict/ . Also greatfire and similar country-specific trackers.

(In reply to comment #13)

Other bugs which might be resolved by a global move to https:
#48572 #47276 #37790 #35760 #34670 #31325 #29898

bug 48572, bug 47276, bug 37790, bug 35760, bug 34670, bug 31325, bug 29898.

Since China has already blocked HTTPS access to Wikipedia, forcing all users to use HTTPS would be equivalent to cutting off the site entirely for everyone in mainland China. I don't support this -- I don't think education should be "all or nothing". Wikipedia can do plenty of good in China even with the "three T's" filtered out.

Maybe I could support a geolocated redirect, or a redirect implemented in JavaScript which detects whether HTTPS works before redirecting.

I like to think that if we did enable it, China would mitm the connection and either serve it plaintext inside or spoof the cert. I can't imagine they would want to cut off their users. But I'm not sure if I'm up for playing chicken with them at the moment.

Also, one argument I haven't seen on this bug I keep meaning to add is that HTTPS is not just about privacy (even over HTTPS, it's pretty easy to do traffic analysis and guess what article a user is viewing), it's also about integrity. When a user isn't using HTTPS, anyone between the user and us can make the site say or not say whatever they like. For that reason, I think we owe it to our anons to try and aim for everyone using HTTPS, even if it's in a distant future.

I don't have very strong worries about certificate spoofing except in very targeted uses. It's possible to detect certificate spoofing if we're pinning our certificates (which we should be doing). If a vendor is caught allowing a government to spoof using their authority, they'll get yanked pretty quickly from all browsers.

The targeted cases /are/ particularly worrying, of course, but it's not something we can really control, and that person is likely already screwed.

(In reply to comment #18)

It's possible to detect certificate spoofing if we're pinning
our certificates (which we should be doing).

I agree we *should*. Are we?

(In reply to comment #19)

(In reply to comment #18)

It's possible to detect certificate spoofing if we're pinning
our certificates (which we should be doing).

I agree we *should*. Are we?

Why would or should we be using certificate pinning? Are we really going to require all users to explicitly install the Wikimedia certificate on their browsers?

(In reply to comment #20)

Why would or should we be using certificate pinning? Are we really going to
require all users to explicitly install the Wikimedia certificate on their
browsers?

Of course not. It's done automatically, such as through an HTTP header. See https://tools.ietf.org/html/draft-ietf-websec-key-pinning-07 . This is already implemented at least in Chrome (https://code.google.com/p/chromium/issues/detail?id=78369). There's also a competing standard, http://tack.io/
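
For illustration, a minimal sketch of building such a header value. It follows the syntax as later standardized in RFC 7469 (pin-sha256 over the DER-encoded SubjectPublicKeyInfo); draft revisions may differ in details. The helper names are hypothetical and the key bytes are placeholders, not real Wikimedia keys.

```python
import base64
import hashlib
from typing import Iterable


def pin_sha256(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def public_key_pins_value(spki_ders: Iterable[bytes], max_age: int) -> str:
    """Build a Public-Key-Pins header value with one pin per key."""
    pins = "; ".join(f'pin-sha256="{pin_sha256(der)}"' for der in spki_ders)
    return f"{pins}; max-age={max_age}"


# Placeholder bytes stand in for the live key and a backup key.
value = public_key_pins_value([b"placeholder-live-key", b"placeholder-backup-key"], 5184000)
print("Public-Key-Pins:", value)
```

A backup pin is included because a site that pins a single key and then loses it would lock out every browser that cached the pin.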

Interesting! I didn't know this RFC existed. That's pretty cool. Maybe I'll add it to my Extension:SecureSessions (which contains all the session security stuff MW doesn't do :P).

(In reply to comment #22)

Interesting! I didn't know this RFC existed.

That's probably because it's not an RFC (yet) :P

neilk wrote:

An idea for dealing with China and other cases where HTTPS is blocked - could there be an 'http://insecure.wikipedia.org'? The inverse of the situation we had with 'https://secure.wikipedia.org'. At least the domain name advertises the problem to the user.

This is far less slick than a geolocated redirect, but in situations where HTTPS is blocked, I don't see any way to preserve usability and security at the same time.

As far as I know, if HTTPS is blocked, we can't redirect them to HTTP, because they won't even reach our server. We could only 'upgrade', by redirecting from HTTP to HTTPS. But by then it's too late; the article they've requested was captured by intermediaries.

Maybe with 'insecure.wikipedia.org', savvy users in places like China would learn the trick, or use browser extensions to fix links. Presumably it would be the 'HTTP Everywhere' extension.

(In reply to comment #24)

An idea for dealing with China and other cases where HTTPS is blocked - could
there be an 'http://insecure.wikipedia.org'? The inverse of the situation we
had with 'https://secure.wikipedia.org'. At least the domain name advertises
the problem to the user.

It was wikiMedia, and no, please let's not do that again.

(In reply to comment #24)

Maybe with 'insecure.wikipedia.org', savvy users in places like China would
learn the trick, or use browser extensions to fix links. Presumably it would
be the 'HTTP Everywhere' extension.

That wouldn't work.

(In reply to comment #24)

An idea for dealing with China and other cases where HTTPS is blocked - could
there be an 'http://insecure.wikipedia.org'? The inverse of the situation we
had with 'https://secure.wikipedia.org'. At least the domain name advertises
the problem to the user.

This is far less slick than a geolocated redirect, but in situations where
HTTPS is blocked, I don't see any way to preserve usability and security at
the same time.

As far as I know, if HTTPS is blocked, we can't redirect them to HTTP,
because they won't even reach our server. We could only 'upgrade', by
redirecting from HTTP to HTTPS. But by then it's too late; the article
they've requested was captured by intermediaries.

Maybe with 'insecure.wikipedia.org', savvy users in places like China would
learn the trick, or use browser extensions to fix links. Presumably it would
be the 'HTTP Everywhere' extension.

I somewhat like this concept for the advertising part too. We want to make sure that those who are unable to use HTTPS (ESPECIALLY if that's because of government interference, as in China) are able to use the site, but we also should (in my mind) make sure that they KNOW they are unable to use it like normal and that their use is less secure, because those users are often the catalyst for change.

If we had more time I'd say we should strive for a geolocated test in Hong Kong during Wikimania though I imagine even the suggestion of that makes ops shudder :).

(In reply to comment #26)

(In reply to comment #24)

An idea for dealing with China and other cases where HTTPS is blocked - could
there be an 'http://insecure.wikipedia.org'? The inverse of the situation we
had with 'https://secure.wikipedia.org'. At least the domain name advertises
the problem to the user.

This is far less slick than a geolocated redirect, but in situations where
HTTPS is blocked, I don't see any way to preserve usability and security at
the same time.

As far as I know, if HTTPS is blocked, we can't redirect them to HTTP,
because they won't even reach our server. We could only 'upgrade', by
redirecting from HTTP to HTTPS. But by then it's too late; the article
they've requested was captured by intermediaries.

Maybe with 'insecure.wikipedia.org', savvy users in places like China would
learn the trick, or use browser extensions to fix links. Presumably it would
be the 'HTTP Everywhere' extension.

I somewhat like this concept for the advertising part too. We want to make
sure that those who are unable to use https (ESPECIALLY if that's because of
government interference like china) are able to use the site but we also
should (in my mind) make sure that they KNOW they are unable to use it like
normal and that their use is less secure because those users are often the
catalyst for change.

If we had more time I'd say we should strive for a geolocated test in Hong
Kong during Wikimania though I imagine even the suggestion of that makes ops
shudder :).

Yes, please let's not consider this. We had to do some nasty Apache stuff to make that work and I'm glad we no longer do it.

Just to let everyone know, HTTPS traffic to desktop Wikimedia sites is no longer blocked in China. However, HTTPS to the mobile sites is still disrupted. I say this from personal experience, but here is a site to back it up: https://en.greatfire.org/https/en.wikipedia.org
https://en.greatfire.org/https/en.m.wikipedia.org

It seems mobile-lb.eqiad is blocked in China, but esams and ulsfo are not. How about mapping Chinese users to one of these two clusters?
https://en.greatfire.org/https/mobile-lb.eqiad.wikimedia.org
https://en.greatfire.org/https/mobile-lb.ulsfo.wikimedia.org
https://en.greatfire.org/https/mobile-lb.esams.wikimedia.org

We can try in case it's a lingering mistake, but if it's intentional I'm sure they'll just adapt on their side.

Well, CN belongs in ulsfo anyway, geographically speaking. It stayed in eqiad during the ulsfo rollout on purpose, in order to not poke the (firewall) beast :)

We can move CN to ulsfo now. The only real risk is that desktop HTTPS being unblocked is a mistake and they'll realize it as soon as we shuffle things around :)

The switch of CN to ulsfo was done in https://gerrit.wikimedia.org/r/#/c/199621/ about half an hour ago. Our TTLs are considerably shorter than that, so the change should already be in effect in cases where DNS caching isn't broken. We'll see how it goes :)

Great! https://en.m.wikipedia.org now works in China. According to the report on zhwiki and tests on greatfire.org, only http://zh.wikisource.org/, http://zh.wikinews.org/, and http://ug.wikipedia.org/ are still blocked, which I think are blocked based on URL rather than IP. All other Wikimedia sites (http and https) are not blocked now. Thank you!

I see that the HTTPS infrastructure is now able to serve HTTPS-by-default (https://www.mediawiki.org/w/index.php?title=Wikimedia_Engineering%2F2014-15_Goals&diff=1535407&oldid=1519260). That's great! Any updates on the actual implementation? Thanks.

With the completion of the scaling work, whether and when we flip the switch to force HTTPS for more (or all) wikis is mostly an analysis and decision-making issue now rather than a technical one. I believe we're currently still trying to gather some insightful/useful data on user impact (in terms of latency, mostly) to inform those decisions. Of course there's always a risk that the data could lead us back to something like "more software changes are needed before the tradeoffs are acceptable to turn this on globally."

I see. Just for your reference, the English Wikipedia community had a discussion (https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)/Archive_120#Should_Wikipedia_use_HTTPS_by_default_for_all_readers.3F) about this and while many points were made on both sides, the conclusion from that community was that this issue is to be decided by the WMF/developers. So good luck with your analysis, and I hope that any possible issues can be resolved soon so that we can make the switch in the near future. Thanks.

Can we start to force HTTPS for all users from the US soon? They should have low latency impact, since they are close to the datacenters. Do we have a timeline now? One thing is that once we redirect to HTTPS for US users, Google will update the indexed Wikipedia links to HTTPS as well.

Turning this on doesn't really go by-user-location, it goes by-wiki or groups of wikis (mostly because of HSTS, but regardless, geoip isn't reliable enough to make a hard distinction like this, and our datacenter locations aren't strictly tied to user location either).

Personally, I'm all for turning this on as quickly as we can and could write paragraphs on why we should in the net of all current considerations. However, there's still some ongoing research and debate happening within the broader WMF as a whole regarding the impact of these decisions on all of the other things we value aside from privacy.

Can we please be more specific here? Who is doing research and debating and when are such activities expected to finish? My concern is that vague comments will delay us a lot longer. In my opinion, at this point, given the technical all-clear, we need to figure out what the next specific action items are.

My understanding is that we could do a slow ramp-up by setting the canonical URL (via a <link rel="canonical"> tag in the <head> of the HTML page source) to specify HTTPS. This would signal our preference for HTTPS to Google and other crawlers. Would this be reasonable now across Wikimedia wikis?

As I've stated before, personally I'd prefer to do the hard redirects before the rel=canonical during the initial rollout process, simply because it's easier to take back in realtime if anything doesn't work out as planned in terms of load and capacity. We already have a process down for this stuff. It's not my place to speak to the rest, but I assure you people are aware and working on it.

Actually even if we specify rel=canonical, Google may still index HTTPS once we start to redirect from HTTP to HTTPS.

From https://support.google.com/webmasters/answer/139066?hl=en&rd=1#https:

Although our systems prefer HTTPS pages over HTTP pages by default, you can ensure this behavior by taking any of the following actions:
• Add 301 or 302 redirects from the HTTP page to the HTTPS page.
• Add a rel="canonical" link from the HTTP page to the HTTPS page.
• Implement HSTS.

So I think if you really don't want Google to index HTTPS at first, we need to explicitly exclude Google from the HTTP-to-HTTPS redirect.

My point about realtime here is that when we first turn on a 302 for a wiki or group of wikis, we can see the effect within minutes. Seconds, even. And we can undo that on similar timescales, none of which are concerned-about-Google-much timescales.

Information: Mozilla announced plan to deprecate Non-Secure HTTP: https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/

HTTPS is the way to move forward. I agree with MZMcBride: "Can we please be more specific here? Who is doing research and debating and when are such activities expected to finish? My concern is that vague comments will delay us a lot longer. In my opinion, at this point, given the technical all-clear, we need to figure out what the next specific action items are." What are the issues that need to be discussed before we flip the switch? If there are potential problems, how should we go about solving them?

My concern is that vague comments will delay us a lot longer.

The situation is not vague or indefinite internally - the process is simply not taking place in public view at this particular stage, and it's a meta process about the process, etc. It will return to the world of real actionables in phab eventually. I would be very surprised if we don't come to a decision (to set a timeline for enabling, or to define a set of clear, rational, solvable tasks that must be completed before enabling) within the next two weeks, but my predictive powers often fall short of infinite.

And why precisely is it not taking place in the public view? Sure, there are things that need to be secret: prices of hardware, etc. But deciding when and if to flip the switch hardly seems one of them. What is the threat model you're trying to defend against by being secretive?

BBlack claimed this task.

This is resolved now for all of the primary domains on the primary production clusters (text, mobile, upload). There are many smaller issues and other sites to address that don't live on the primary clusters, but I think this particularly long-winded task is ready to die now.