Beta Cluster no longer listens for HTTPS
Closed, Duplicate (Public)

Description

https://en.wikipedia.beta.wmflabs.org/ no longer responds at all.

While we have never had a valid cert for beta, we did in the past answer HTTPS URLs, forcing the user to proceed manually past a security warning. As of fairly recently, we no longer listen on HTTPS at all.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=48501
https://bugzilla.wikimedia.org/show_bug.cgi?id=63538

Details

Reference
bz68387

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:39 AM
bzimport set Reference to bz68387.
bzimport added a subscriber: Unknown Object (MLST).

HTTPS is handled using nginx on the varnish server by applying role::protoproxy::ssl::beta

Looking at the puppet run of deployment-cache-text02.eqiad.wmflabs (the text cache) I find:

Debug: Executing '/etc/init.d/nginx status'

So puppet knows about nginx but for some reason does not start it :-(

I attempted to start it manually:

service nginx start

Starting nginx: nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/ssl/private/star.wmflabs.org.key") failed (SSL: error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch)
nginx: configuration file /etc/nginx/nginx.conf test failed
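
The "key values mismatch" error above means the private key on disk does not belong to the installed certificate. A minimal sketch of how to confirm that by hand, assuming `openssl` is available on the cache host; the helper name `cert_matches_key` is hypothetical and not part of any puppet module:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the certificate and the
# private key carry the same public key, fails (exit 1) when they differ,
# and returns 2 when either file cannot be parsed (for example, it is empty).
# Usage: cert_matches_key CERT_PEM KEY_PEM
cert_matches_key() {
    # Public key embedded in the certificate, in PEM form.
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    # Public key derived from the private key, in the same PEM form.
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}
```

On the text cache this might be called as `cert_matches_key <certificate file> /etc/ssl/private/star.wmflabs.org.key`; the key path is the one from the nginx error, while the location of the installed certificate file is not stated in this task and has to be filled in. Once the pair matches, `nginx -t` should pass the configuration test that failed above.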

I can't remember how we got the SSL keys deployed for beta :-/ Some ops with better knowledge about SSL than me would probably know.

Apparently broken since April 11 :/

This has been broken as long as we have been in eqiad as far as I know. role::protoproxy::ssl::beta is used to set up the nginx SSL terminators in front of *.beta.wmflabs.org. That in turn applies role::protoproxy::ssl::beta::common, which includes install_certificate{'star.wmflabs.org': privatekey => false}. The "privatekey => false" bit there tells puppet not to try to manage the SSL private key install. This is done because labs/private.git does not contain the x509 private key for the real *.wmflabs.org cert (for good reason).

To fix it we need to either:
a) Have an Opsen populate /etc/ssl/private/star.wmflabs.org.key on all of the frontend boxes for beta [0]. This private key must match the public key in operations/puppet [1].
b) Create a self-signed cert for beta and change puppet

  • Put the private key in labs/private/ssl on deployment-salt
  • Put the public key in operations/puppet/files/ssl on deployment-salt (or operations/puppet)
  • Change role::protoproxy::ssl::beta::common to install the new self-signed cert

[0]: https://wikitech.wikimedia.org/w/index.php?title=Special:Ask&q=%5B%5BResource+Type%3A%3Ainstance%5D%5D%5B%5BPuppet+Class%3A%3Arole%3A%3Aprotoproxy%3A%3Assl%3A%3Abeta%5D%5D&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3D%E2%80%A6-20further-20results%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FInstance+Name%0A%3FPuppet+Class%0A%3FPuppet+Var%0A&sort=Modification+date&order=DESC&limit=50&eq=no
[1]: https://github.com/wikimedia/operations-puppet/blob/production/files/ssl/star.wmflabs.org.pem
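
Option (b) above could start from something like the following. This is a rough sketch, not the procedure ops would necessarily use: the helper name `make_selfsigned`, the RSA 2048 key size, the 365-day default lifetime and the use of a bare CN are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write an unencrypted private key and a matching
# self-signed certificate for the given common name.
# Usage: make_selfsigned COMMON_NAME KEY_OUT CERT_OUT [DAYS]
make_selfsigned() {
    cn=$1
    key_out=$2
    cert_out=$3
    days=${4:-365}   # assumed default lifetime
    # -x509 emits a self-signed certificate instead of a signing request;
    # -nodes leaves the key unencrypted so nginx can load it unattended.
    openssl req -x509 -newkey rsa:2048 -nodes \
        -subj "/CN=$cn" -days "$days" \
        -keyout "$key_out" -out "$cert_out" 2>/dev/null
}
```

For example, `make_selfsigned '*.beta.wmflabs.org' beta.key beta.pem` (file names are placeholders) would produce the two files that the bullets above place in labs/private/ssl and operations/puppet/files/ssl respectively. Browsers will still show a warning for such a cert, as they did before.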

(In reply to Bryan Davis from comment #4)

This has been broken as long as we have been in eqiad as far as I know.

FWIW I'm about 90% sure that HTTPS to beta labs worked in eqiad. My browser autocompletion URLs for Flow pages on beta were all https and I had a forceHTTPS cookie for beta labs, and as I recall it worked fine until 2-3 weeks ago. I had to manually remove the cookie in order to log in and now I'm OK.

Especially given that Fabrice reports it only broke for him yesterday, I'm pretty sure this had been working until pretty recently.

I'm pretty sure it has not worked at any point in the last month, because occasionally I still hit an old https-beta link from my history, and those never worked after the migration.

This bug would be a duplicate of bug 63538, if that one hadn't been marked as "resolved fixed" because "there is no need to have two bugs to track the issue"...

*** Bug 73680 has been marked as a duplicate of this bug. ***
mmodell raised the priority of this task from Medium to High. Mar 3 2015, 1:20 PM
mmodell subscribed.

The certificate file is there but it's zero bytes. This is causing puppet failures in beta. We really should do something about it. If ops is unwilling to help then we should create a new self-signed cert.
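
A zero-byte cert or key is easy to miss because the file still exists. As a small illustration (the helper name `find_empty_files` is hypothetical), the following lists any of the given files that are missing or empty:

```shell
#!/bin/sh
# Hypothetical helper: print each path that is missing or zero bytes.
# Returns 1 if any such path was found, 0 otherwise.
# Usage: find_empty_files FILE...
find_empty_files() {
    status=0
    for f in "$@"; do
        # [ -s FILE ] is true only when FILE exists and is non-empty.
        if [ ! -s "$f" ]; then
            printf '%s\n' "$f"
            status=1
        fi
    done
    return "$status"
}
```

It could be pointed at /etc/ssl/private/star.wmflabs.org.key and at the installed certificate file on each cache instance; which of those paths is the empty one is not stated in this comment.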

greg renamed this task from beta labs no longer listens for HTTPS to Beta Cluster no longer listens for HTTPS. Mar 3 2015, 4:00 PM
greg set Security to None.

That sounds about right, as long as we can deal with the browser test fallout (making sure the browser tests have those certs in place so they don't get the stupid warning).

See also: https://phabricator.wikimedia.org/T50501#527951

Shouldn't the option of:

a) Have an Opsen populate /etc/ssl/private/star.wmflabs.org.key on all of the frontend boxes for beta [0]. This private key must match the public key in operations/puppet [1].

actually be: create a key for this and place it in puppet, and have that key be the one you want us to populate manually? We don't like manual processes on the cluster, right?

Edit addition: If this is a file that needs to exist on these systems, even post-reinstallation, then it needs to be puppetized accordingly.

I also think suggesting that ops isn't willing to help is both counter-productive, and incorrect. I was pinged on this today by someone not even on the task, and this doesn't have the Blocked-on-Operations tag.

@RobH: I apologize for the snark. Perhaps a bit of an over-reaction to the age and severity of the bug - it had been sitting for over 7 months, had the operations tag, and I don't think I was aware of the existence of blocked-on-* tags at the time I made the comment.

14:21 < kaldari> mutante, Coren: The network doesn't matter, but the browser does. In Firefox http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page tries to redirect to the https site and fails. In Safari http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page loads fine, but https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page fails. I don't have any special Firefox extensions

This happens when there is a cookie on the client asking for https only wiki access. I don't remember the name of the cookie but I do remember that it is obvious when you look at the active cookie list. I've never figured out exactly what causes the cookie to be set in the first place.

hashar lowered the priority of this task from High to Low. Oct 6 2015, 1:30 PM

I made a certificate for beta on deployment-puppetmaster and replaced the star.wmflabs.org cert with it there (also had a mess around with some other settings to get it to work), then went and changed all the cache instances so they could get the new cert, start nginx and get puppet working again. We probably want to separate that cert from star.wmflabs.org so we can get the patch into the operations/puppet repository.

Change 247587 had a related patch set uploaded (by Alex Monk):
Change star.wmflabs.org to beta certificate

https://gerrit.wikimedia.org/r/247587

Using a real trusted certificate is covered in T50501, T75919 and T97593.