Using either
- the pywikibot page generator implementation,
- the python implementation on https://wikitech.wikimedia.org/wiki/RCStream, and
- http://codepen.io/Krinkle/full/laucI/
Version: wmf-deployment
Severity: normal
http://codepen.io/Krinkle/full/laucI/ seems to work most of the time, but about 1 time in 20 I see the following in the network log:
ws://stream.wikimedia.org/socket.io/1/websocket/281487980761
Error during WebSocket handshake: Unexpected response code: 502
After that it falls back to xhr-polling with loads of paired POST/GET requests.
If we want to get this running against beta, I have a WIP for that: https://gerrit.wikimedia.org/r/#/c/138312/ Ideas and code are welcome on how to allow for beta in our site/family structure.
Merlijn van Deen offered to look into this with me, and we were able to identify the problem. The WebSocket handshake requires two round-trips to the server, and the load balancers were configured to distribute incoming requests across backends in a round-robin fashion. Because the requests that make up the initial handshake follow each other in quick succession, the most common case was for the first request to be routed to one server and the follow-up request to another server, which had not started negotiating a session with the client and was therefore not expecting the request.
This also explains why it sometimes worked: if another client request intervened between the two requests, you'd get routed to the same server and the handshake would succeed.
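The failure mode described above can be reproduced with a toy simulation. This is only a sketch: the `Backend` class, the `round_robin` router and the `connect` helper are hypothetical names made up for illustration, not part of RCStream or the load balancer, and the status codes are simply returned by the toy backend (101 for a successful upgrade, 502 standing in for the error from the report).

```python
from itertools import cycle


class Backend:
    """Toy backend that remembers which clients began a handshake with it."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def start_handshake(self, client):
        # First round-trip: the backend allocates a session for the client.
        self.sessions.add(client)

    def upgrade(self, client):
        # Second round-trip: succeeds only on the backend holding the session.
        # 101 is the normal WebSocket upgrade status; 502 stands in for the
        # error seen in the report.
        return 101 if client in self.sessions else 502


def round_robin(backends):
    """Return a router that ignores the client and hands out backends in turn."""
    pool = cycle(backends)
    return lambda client: next(pool)


def connect(route, client, intervening=0):
    """Run both handshake round-trips, with optional requests in between."""
    route(client).start_handshake(client)
    for i in range(intervening):
        route("other-%d" % i)  # an unrelated request consumes a turn
    return route(client).upgrade(client)


route = round_robin([Backend("a"), Backend("b")])
print(connect(route, "client-1"))                 # 502
print(connect(route, "client-2", intervening=1))  # 101
```

With two backends, back-to-back requests land on different servers and the upgrade fails, while a single intervening request shifts the rotation so that both round-trips hit the same server. With only one backend in the pool, every handshake succeeds.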
Giuseppe and I decided to temporarily "fix" this by simply shutting down one of the servers, causing all requests to get routed to the single remaining server. This made the errors go away, validating the diagnosis. The more permanent fix is to use a different scheduling algorithm to make sessions sticky. This is implemented in https://gerrit.wikimedia.org/r/#/c/152960/, which will most likely be deployed in the next few days.
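To illustrate what "sticky" scheduling means here, the sketch below picks a backend by hashing the client address, so every request from the same address reaches the same server. This is an assumption for illustration only: the comment above does not say which scheduling algorithm the Gerrit change uses, and `source_hash_router` and the backend names are hypothetical.

```python
import hashlib


def source_hash_router(backends):
    """Return a router that maps each client address to a fixed backend."""
    def route(client_addr):
        # Hash the address and reduce it to an index into the backend list.
        digest = hashlib.sha256(client_addr.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(backends)
        return backends[index]
    return route


backends = ["backend-a", "backend-b"]  # hypothetical names
route = source_hash_router(backends)

# Both handshake round-trips from one address reach the same backend,
# no matter how many other requests arrive in between.
first = route("198.51.100.7")
route("203.0.113.9")
second = route("198.51.100.7")
print(first == second)  # True
```

The trade-off of this kind of scheme is that load is spread per client address rather than per request, so a few very active clients can make the distribution uneven.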