
http proxy doesn't work on eqiad
Closed, ResolvedPublic

Description

When I set up wmflabs.org DNS for any eqiad instance and try to relay it through https://wikitech.wikimedia.org/wiki/Special:NovaAddress, I get a bad gateway error.

This is likely because new instances can't be resolved from the old cluster.


Version: unspecified
Severity: blocker

Details

Reference
bz62234

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 2:58 AM)
bzimport added a project: VPS-Projects.
bzimport set Reference to bz62234.
bzimport added a subscriber: Unknown Object (MLST).

Given that this blocks migration of all projects that require HTTP, I am pushing the priority to the top.

Nobody seems to care, so I'm pushing the priority even higher :P Time is running out and people can't migrate instances because of this.

Petr: Did you talk to Yuvi before assigning it to him? Is Yuvi aware?
Adding blocker bug 62042.

blocks migration of all projects that require http

How many projects are roughly affected? Priorities should reflect reality...

As other projects seem to have successfully migrated, what is the instance's name and what doesn't work?

Also, [[wikitech:Special:NovaAddress]] neither relays http nor proxies anything else, but just assigns public IPs. The web proxy is managed at [[wikitech:Special:NovaProxy]].

Reassigning to nobody until it's clear what's wrong.

This also seems to be happening for me sometimes. I am currently trying to move one of my instances and am using a temporary domain, wdjenkins2.wmflabs.org.

Specifically http://wdjenkins2.wmflabs.org/ci returns a 502.

On my instance the web server is set up, and watching the access.log, not a single request seems to get through.
Oddly, it occasionally starts working and I see requests coming through, and then it stops again.

Addshore: Just for the obvious questions:

  • This is a web proxy, i. e. [[wikitech:Special:NovaProxy]]?
  • Your security groups are open "enough"? (I. e. not 10.4.0.0/21, but 10.0.0.0/8?)
  • Can you access the webserver on your instance directly from other instances in Labs?
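
The security group question above comes down to whether the proxy's source address falls inside the CIDR range the rule allows. A minimal sketch of that check, using Python's standard `ipaddress` module; the address `10.68.16.4` is a made-up example for illustration, not the real proxy address:

```python
import ipaddress


def source_allowed(source_ip: str, allowed_cidr: str) -> bool:
    """Return True if source_ip lies inside the range a security group rule allows."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(allowed_cidr)


# Hypothetical source address, chosen only to illustrate the difference
# between the two ranges mentioned in the question.
example_proxy_ip = "10.68.16.4"

for cidr in ("10.4.0.0/21", "10.0.0.0/8"):
    verdict = "allowed" if source_allowed(example_proxy_ip, cidr) else "blocked"
    print(f"{example_proxy_ip} against {cidr}: {verdict}")
```

A rule limited to 10.4.0.0/21 only covers 10.4.0.0 through 10.4.7.255, so a source elsewhere in 10.0.0.0/8 would be rejected by it.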

Just to make it clear: YES, I AM TALKING ABOUT WebProxy, not NovaAddress.

And yes, it's still broken, so don't ask people whether it's on their side or whether it's really broken, because it really is broken. So fix it, plz.

wm-bot's public logs are down until this is fixed, because someone disabled creation of proxies for pmtpa, which is the only cluster where proxies currently work. So no new proxies can be created.

Tim Landscheidt: why should people on the NEW cluster open their firewall to the OLD cluster, which is about to be shut down? There is no point in doing that. It's the proxy servers that are borked and need to be migrated to the NEW cluster so that they can see the servers there, since that is the cluster that matters now. The OLD cluster is going to be deleted, so there is no point in adjusting the NEW cluster to work with it; it needs to be done the other way around.

btw, this doesn't work either:
80 80 tcp 0.0.0.0/0
443 443 tcp 0.0.0.0/0

Andre: regarding projects: all projects on Labs that use the proxy, which is almost every project with a web server.

We have had an ongoing DNS problem in eqiad (well, and in pmtpa a little bit too.) The dns cache for labs instances gets swamped and there are periodic, brief dns outages.

The behavior that I've seen for proxied instances is that things mostly work, much of the time, but periodically I see an nginx gateway error. These errors seem to correspond to the dns outages.

  1. Does that explain this bug, or are we talking about something else here?
  2. Are things any better today? I just cranked up the dnsmasq cache on labnet1001 in hopes of easing this problem.
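
One way to test whether the gateway errors line up with brief DNS outages is to resolve the hostname repeatedly and record which attempts fail. The sketch below is only an illustration: `probe_dns` and `failure_rate` are hypothetical helper names, and it uses the system resolver via `socket.gethostbyname`, which may not be the same resolver the proxy itself uses.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(hostname: str,
              attempts: int = 5,
              delay: float = 1.0,
              resolve: Callable[[str], str] = socket.gethostbyname,
              sleep: Callable[[float], None] = time.sleep
              ) -> List[Tuple[int, bool, str]]:
    """Resolve hostname several times; record (attempt, ok, address or error)."""
    results = []
    for i in range(attempts):
        try:
            results.append((i, True, resolve(hostname)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results.append((i, False, str(exc)))
        if i < attempts - 1:
            sleep(delay)  # pause between attempts, not after the last one
    return results


def failure_rate(results: List[Tuple[int, bool, str]]) -> float:
    """Fraction of attempts that failed to resolve (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for _, ok, _ in results if not ok) / len(results)
```

Running something like `probe_dns("wdjenkins2.wmflabs.org", attempts=60)` while also requesting the proxied URL would show whether failed lookups and 502 responses occur at the same times.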

I see only gateway errors. I don't know when these "windows when it works" are expected but I have never seen them, only errors.

Now the gateway error has been replaced with a 404 error.

But it's not. I still see the error.

Are you sure the 404s are not coming from your webserver?

http://bots.wmflabs.org/ works for me now, I get the initial 'It works!' apache page. bots.wmflabs.org resolves to 208.80.155.156

I'm about to step onto a plane, but will check in with this bug when I arrive (which will take more than a day :( )

-A

Well, http://wdjenkins.wmflabs.org/ci seems to be working a hell of a lot better for me than it was when I first commented on this bug, and I have done nothing, so I can only guess that cranking up the dnsmasq cache did something!

Will report back if anything else unexpected happens!

I still do see:

404 Not Found
nginx/1.5.0

This is not from my server

If I access http://bots.wmflabs.org/~wm-bot/logs as per http://permalink.gmane.org/gmane.science.linguistics.wikipedia.technical/75917, I get:

Not Found
The requested URL /~wm-bot/logs was not found on this server.
Apache/2.2.22 (Ubuntu) Server at bots.wmflabs.org Port 80

This doesn't look like nginx.

When I open the same link I get:

404 Not Found
nginx/1.5.0

That doesn't look like apache to me
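
The two 404 pages quoted above carry different server signatures, which hints at which layer produced each response. A minimal sketch of that reasoning, assuming (as this thread suggests, but does not confirm for every project) that the web proxy runs nginx and the instance runs Apache; `responding_layer` is a hypothetical helper name:

```python
def responding_layer(server_signature: str) -> str:
    """Guess which layer answered, based on a Server header or error-page signature.

    Assumption for illustration: the proxy is nginx and the instance's
    own web server is Apache.
    """
    product = server_signature.strip().split("/", 1)[0].lower()
    if product == "nginx":
        return "proxy"
    if product == "apache":
        return "instance web server"
    return "unknown"


for signature in ("nginx/1.5.0", "Apache/2.2.22 (Ubuntu)"):
    print(f"{signature}: {responding_layer(signature)}")
```

Under that assumption, a 404 signed by nginx never reached the instance at all, while a 404 signed by Apache means the proxy forwarded the request and the instance had no such path.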

Maybe it works only at midnight, but now, in the morning (Europe), it doesn't.

It remains hard for me to debug this since I can't see the failure here. Peter, can you please specify which exact URL is producing this failure? Is it still just http://bots.wmflabs.org/ ?

Now it works... NOW. But I wouldn't be surprised if it stopped working again tomorrow morning :P