
mediawiki & mediawiki/job cgroups creation should be moved to limit.sh
Open, Low, Public

Description

We currently confine image scaling jobs to cgroups. We have an upstart script in puppet (/modules/mediawiki/files/cgroup/mw-cgroup.conf) that basically does:

pre-start script

mkdir -p /sys/fs/cgroup/memory/mediawiki
mkdir -m 0777 /sys/fs/cgroup/memory/mediawiki/job
echo "/usr/local/bin/cgroup-mediawiki-clean" > /sys/fs/cgroup/memory/release_agent

end script

When cgroup-bin gets reconfigured, e.g. during an upgrade, the cgroups go away (that looks like a bug of its own?) and the upstart job "mw-cgroup" is never re-run, since it is already in the "started" upstart state.

In the meantime, thumbnailing jobs fail since they can't create their own job cgroup as the parent hierarchy (mediawiki/job) doesn't exist.

Although we could do all kinds of upstart tricks (stop on cgconfig stop, for example), I can't see a reason why limit.sh can't check for the existence of mediawiki & mediawiki/job and, if they don't exist, create them itself.

This would solve the problem nicely and be far more resilient.
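
A minimal sketch of what such a check might look like, assuming the caller is allowed to create directories under the memory controller (the upstart job does this as root). The function name ensure_mw_cgroups and the CGROUP_ROOT override are hypothetical and are not taken from the actual limit.sh:

```shell
#!/bin/sh
# Hypothetical helper for limit.sh: make sure the parent cgroups exist.
# CGROUP_ROOT is an assumed override so the function can be tried against
# a scratch directory; by default it points at the memory controller mount
# used in the upstart job.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup/memory}"

ensure_mw_cgroups() {
    # mkdir -p is a no-op when the directory already exists, so this is
    # safe to call before every job.
    mkdir -p "$CGROUP_ROOT/mediawiki" || return 1
    mkdir -p "$CGROUP_ROOT/mediawiki/job" || return 1
    # Same 0777 mode as in the upstart job, so that job processes can
    # create their own child cgroups underneath.
    chmod 0777 "$CGROUP_ROOT/mediawiki/job" || return 1
}
```

Calling ensure_mw_cgroups at the top of limit.sh would recreate the hierarchy if it had gone away. It does not cover the release_agent setting, which would still have to be restored separately.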

Note that the above issue produced a complete thumbnail outage for the past hour or so and it is bound to happen again on the next cgroup-bin upgrade.


Version: 1.22.0
Severity: normal

Details

Reference
bz53800

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:53 AM
bzimport set Reference to bz53800.
bzimport added a subscriber: Unknown Object (MLST).

jgerber wrote:

one reason this is an upstart script is that it's run as root.
can you also restart it on the videoscalers? they are also out.

Change 83067 had a related patch set uploaded by J:
restart mw-cgroup on cgconfig restart

https://gerrit.wikimedia.org/r/83067

Change 83067 merged by Faidon Liambotis:
restart mw-cgroup on cgconfig restart

https://gerrit.wikimedia.org/r/83067

Sure, I guess this works too.

Maybe it would be better to use cgconfig.conf for this? It's better to use the standard configuration system than to start a wheel war with it, right?

jgerber wrote:

cgconfig.conf is not flexible enough to accommodate the current setup.
if it's possible to rework the cgroup usage to fit within the options cgconfig.conf allows,
moving it to that would be an option.

cgconfig.conf can be used to limit the overall resources for a group,
but not to have per process limits within a group.
afaik it also does not provide an option to install a release agent.
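
To illustrate the per-process limits described above, here is a rough sketch of a wrapper that gives each job its own child cgroup with its own memory limit. It is not the actual limit.sh: the function name run_limited and the CGROUP_ROOT override are hypothetical, and only the standard cgroup v1 control files (memory.limit_in_bytes, tasks, notify_on_release) are assumed:

```shell
#!/bin/sh
# Sketch of a per-process cgroup under mediawiki/job (cgroup v1, memory
# controller). CGROUP_ROOT is an assumed override for trying this out
# against a scratch directory.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup/memory}"

# Usage: run_limited LIMIT_BYTES COMMAND [ARGS...]
run_limited() {
    limit="$1"
    shift
    # One child cgroup per wrapper process, named after its PID.
    # This fails if the parent hierarchy (mediawiki/job) is missing.
    cg="$CGROUP_ROOT/mediawiki/job/$$"
    mkdir "$cg" || return 1
    # Ask the kernel to call the hierarchy's release_agent once the
    # cgroup has no tasks left, so it can be cleaned up.
    echo 1 > "$cg/notify_on_release" || return 1
    # Set the limit before moving this process into the cgroup.
    echo "$limit" > "$cg/memory.limit_in_bytes" || return 1
    echo "$$" > "$cg/tasks" || return 1
    # The command runs as a child of this shell and inherits the cgroup.
    "$@"
}
```

For example, run_limited 419430400 convert in.jpg out.png would cap that one conversion at 400 MiB (the command and limit here are only illustrative). The mkdir step is the one that fails when the parent cgroups have disappeared.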

given those limitations, having an upstart job that sets up the cgroups, as we do now, seems to be the best option. with the merged change, restarts are also no longer a problem.
not sure i would call that a wheel war - it's more that there was a bug in mw-cgroup.conf that got fixed.