Page MenuHomePhabricator

$wgSquidServersNoPurge should support CIDR ranges
Closed, ResolvedPublic

Description

Yes, I'm aware that there is a TrustedXFF extension, though it seems like
there ought to be a way to specify a relatively small number of proxy IP
ranges using CIDR notation in core, directly in $wgSquidServersNoPurge.
Conceivably, this could be used to shorten lists such as squid.php.


Version: 1.23.0
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57021

Details

Reference
bz52829

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 2:13 AM
bzimport set Reference to bz52829.

Mark has asked for this as well as it would make dealing with IPv6 in the varnish clusters easier for Ops.

Change 94186 had a related patch set uploaded by BryanDavis:
Support CIDR ranges in $wgSquidServersNoPurge

https://gerrit.wikimedia.org/r/94186

Change 94186 merged by jenkins-bot:
Support CIDR ranges in $wgSquidServersNoPurge

https://gerrit.wikimedia.org/r/94186

Change 94392 had a related patch set uploaded by BryanDavis:
Support CIDR ranges in $wgSquidServersNoPurge

https://gerrit.wikimedia.org/r/94392

Change 94392 merged by jenkins-bot:
Support CIDR ranges in $wgSquidServersNoPurge

https://gerrit.wikimedia.org/r/94392

Change 94396 merged by jenkins-bot:
Support CIDR ranges in $wgSquidServersNoPurge

https://gerrit.wikimedia.org/r/94396

So, because this checked against the list we already have, $wgSquidServersNoPurge, which is currently 225 entries of single IPs, this ran a CIDR match up to 225 times, which in turn increased the appserver load by about 20-30% -- the CPU cost could be differ a lot depending on hour of the day (-> service region -> datacenter -> list order :)).

This was deployed yesterday and we had to revert today, which brought appservers down from 80% to 50% usage and API appservers from 60% to 30%.

Reedy optimized this check with https://gerrit.wikimedia.org/r/#/c/95163/ which Antoine reviewed and merged, but hasn't deployed yet. We could also reduce our load in another way, by aggregating our list to CIDR and cutting it down to 10-15 entries at most (which was the original intention anyway).

Since the isInRange seems to be expensive, though, I'd feel more comfortable if someone took a closer look and optimized the call (e.g. by making it be *just* CIDR with a simple bitwise operation, not ranges in general) and/or made a separate array for CIDR ranges as it sounds pretty silly to do such expensive checks on what is known to have been a list of single IP addresses until now.

As a summary, we have to apply on the wmf branches:

Support CIDR ranges in $wgSquidServersNoPurge
https://gerrit.wikimedia.org/r/#/c/94186/
f111b2687c894bd744f53edff4ae049ddb48c59a

AND

Short circuit $wgSquidServersNoPurge iteration if ip is listed
https://gerrit.wikimedia.org/r/#/c/95163/
b89355c27035be293f6b28d32239fe8879b7e46c

Then later on we can introduce CIDR ranges in operations/mediawiki-config.git which should probably be another bug.

(In reply to comment #8)

So, because this checked against the list we already have,
$wgSquidServersNoPurge, which is currently 225 entries of single IPs, this
ran
a CIDR match up to 225 times, which in turn increased the appserver load by
about 20-30% -- the CPU cost could be differ a lot depending on hour of the
day
(-> service region -> datacenter -> list order :)).

This was deployed yesterday and we had to revert today, which brought
appservers down from 80% to 50% usage and API appservers from 60% to 30%.

Reedy optimized this check with https://gerrit.wikimedia.org/r/#/c/95163/
which
Antoine reviewed and merged, but hasn't deployed yet. We could also reduce
our
load in another way, by aggregating our list to CIDR and cutting it down to
10-15 entries at most (which was the original intention anyway).

Since the isInRange seems to be expensive, though, I'd feel more comfortable
if
someone took a closer look and optimized the call (e.g. by making it be
*just*
CIDR with a simple bitwise operation, not ranges in general) and/or made a
separate array for CIDR ranges as it sounds pretty silly to do such expensive
checks on what is known to have been a list of single IP addresses until now.

Faidon, thanks very much for investigating; I should have reviewed this more carefully.

The improved versions of the horribly unperformant initial patch were included in 1.23wmf4 and are now deployed to all wikis.

Do we want to introduce CIDR ranges in operations/mediawiki-config.git as well or is using a list of IP good enough for us?

(In reply to comment #12)

Do we want to introduce CIDR ranges in operations/mediawiki-config.git as
well or is using a list of IP good enough for us?

I think that Mark would like to start using CIDR ranges at least for the IPv6 varnish addresses. Actually doing this work was escalated due to some configuration related issues that marked edits as coming from internal IPs following the ulsfo rollout.

(In reply to comment #13)

(In reply to comment #12)

Do we want to introduce CIDR ranges in operations/mediawiki-config.git as
well or is using a list of IP good enough for us?

I think that Mark would like to start using CIDR ranges at least for the IPv6
varnish addresses. Actually doing this work was escalated due to some
configuration related issues that marked edits as coming from internal IPs
following the ulsfo rollout.

I have pinged the ops internal mailing list to figure out whether they are interested in getting CIDR ranges for $wgSquidServersNoPurge.