
Set up a server for query relaying
Closed, Declined | Public

Description

For example, you may want to host a tool that listens on some port, but if you run it on the grid, the hostname is random.

For this reason you need a server with a static hostname that relays the TCP connections to whichever host the bot is running on.


Version: unspecified
Severity: normal

Details

Reference
bz51936

Related Objects

Status    Subtype  Assigned  Task
Declined           Petrb
Declined           coren

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:07 AM
bzimport added a project: Toolforge.
bzimport set Reference to bz51936.

For example, bot A is listening on port 26242.

You always need to be able to connect to somehost:26242.

This is not possible now when the bot is running on the grid.

You can write the hostname the server is running on to a publicly readable file so this isn't a problem. I fear that solving this with ad-hoc iptables would be very complex overkill :-).
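A minimal sketch of this hostname-file approach, assuming a POSIX shell. The path in `HOST_FILE` and the function name `send_to_bot` are hypothetical and do not come from this task; port 26242 is the example port mentioned above. It only shows the lookup mechanism and does not address whether the execution host is reachable through the firewall.

```shell
#!/bin/sh
# Hypothetical path; in practice this would be a location other users can read.
HOST_FILE="${HOST_FILE:-${TMPDIR:-/tmp}/wm-bot.host}"
PORT=26242   # example port from this task

# Run by the bot at start-up on whichever grid host it lands on:
# record the current node name and make the file world-readable.
uname -n > "$HOST_FILE"
chmod 644 "$HOST_FILE"

# Run by a client: look up the current host, then connect to it.
# Defined only, not called here, since it needs the bot to be listening.
send_to_bot() {
    nc "$(cat "$HOST_FILE")" "$PORT"
}
```

A client would then pipe its message through the function, for example `echo "WM-Bot hello world" | send_to_bot`.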

Where would such a file be hosted, and how can you ensure that people would be able to read it from foreign projects and send the query to the execution host through the firewall?

(In reply to comment #3)

where such a file would be hosted, how can you ensure that people would be able to read it from foreign projects and send the query to execution host through firewall?

You want users to run arbitrary services on ports available from the Internet? I think that's outside the scope of Tools and should be solved in dedicated Labs projects.

Not only is this outside the scope of tool labs, but it's going to be specifically prohibited; in order to allow the general Wikimedia privacy policy to apply, tools are not allowed to gather IP addresses from their users (which allowing connections from outside would allow).

Tools that need to host publicly-accessible network services must do so from their own project (and subject to the general Labs TOU, including the necessity of posting disclaimers and a lesser privacy policy).

I talked to Ryan_Lane about hipache, and when we implement that (+ similar IP scrubbing, etc), we can have this.

Ok I think that this effectively kills the migration of wm-bot then...

Why does this kill wm-bot migration?

(In reply to comment #8)

Why does this kill wm-bot migration?

Looking at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/Documentation/wm-bot, it seems to be fairly easy to translate it to Tools without external access. Just replace the "local sysadmin console" with calls to "jstart wm-bot3" and "jstop wm-bot3" with the added benefit that you don't need to manage a password, but can rely on Tools' users group. For convenience, you can add scripts that start or stop all bot instances.

I don't see a reason why you would need a bouncer because when the bot instance does not have network access, the bouncer will not have either; but if that is a must, you can start one in the start-up script ("wm-bot3" in the above example). This will then be on the localhost.
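The convenience scripts suggested above could look roughly like this sketch. The instance names other than wm-bot3 are hypothetical, and the `JSTART`/`JSTOP` overrides are an assumption added only so the loops can be dry-run; each name is assumed to be a start-up script that `jstart` can find, as in the "jstart wm-bot3" example.

```shell
#!/bin/sh
# Hypothetical instance list; only wm-bot3 is named in this task.
INSTANCES="${INSTANCES:-wm-bot1 wm-bot2 wm-bot3}"
# Overridable so the loops can be dry-run, e.g. JSTART=echo.
JSTART="${JSTART:-jstart}"
JSTOP="${JSTOP:-jstop}"

# Submit every bot instance to the grid.
start_all() {
    for name in $INSTANCES; do
        "$JSTART" "$name"
    done
}

# Stop every bot instance.
stop_all() {
    for name in $INSTANCES; do
        "$JSTOP" "$name"
    done
}
```

After sourcing the file, calling `start_all` or `stop_all` acts on every listed instance.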

(In reply to Yuvi Panda from comment #8)

Why does this kill wm-bot migration?

Because it requires this feature in order to work. No query relaying, no wm-bot. It's that simple.

If this isn't possible in Tools and a separate project needs to exist for it, then wm-bot can't be hosted in Tools and needs to be hosted in a separate project.

(In reply to Tim Landscheidt from comment #9)

(In reply to comment #8)

Why does this kill wm-bot migration?

Looking at https://wikitech.wikimedia.org/wiki/Nova_Resource:Bots/Documentation/wm-bot, it seems to be fairly easy to translate it to Tools without external access. Just replace the "local sysadmin console" with calls to "jstart wm-bot3" and "jstop wm-bot3" with the added benefit that you don't need to manage a password, but can rely on Tools' users group. For convenience, you can add scripts that start or stop all bot instances.

I don't see a reason why you would need a bouncer because when the bot instance does not have network access, the bouncer will not have either; but if that is a must, you can start one in the start-up script ("wm-bot3" in the above example). This will then be on the localhost.

The bot uses bouncers not for better network stability, but because its core is frequently patched and restarted.

Because I don't want wm-bot to reconnect to freenode and rejoin these 180 channels every time (putting heavy load on freenode as well), I am using bouncers. It has nothing to do with network connectivity; it just prevents annoying quits and joins as well as holes in channel logs.

By the way, this query relaying isn't required for bouncers (it has nothing to do with connection bouncing). It is required for the NetCat plugin to work, which is used in a huge number of channels and is one of the best features this bot provides: it actively relays any TCP text message sent from anything, including a simple shell script like

#!/bin/sh

echo "WM-Bot hello world" | nc bots-labs 64834 -w0

Losing this feature just to migrate to Tool Labs would make the migration not worth doing.