We currently confine image scaling jobs in cgroups. We have an upstart job in puppet (/modules/mediawiki/files/cgroup/mw-cgroup.conf) that essentially does:
pre-start script
    mkdir -p /sys/fs/cgroup/memory/mediawiki
    mkdir -m 0777 /sys/fs/cgroup/memory/mediawiki/job
    echo "/usr/local/bin/cgroup-mediawiki-clean" > /sys/fs/cgroup/memory/release_agent
end script
When cgroup-bin is reconfigured, e.g. during an upgrade, the cgroups disappear (which looks like a bug in its own right) and the upstart job "mw-cgroup" is never re-run, since it is already in the "started" state.
In the meantime, thumbnailing jobs fail because they cannot create their own job cgroup: the parent hierarchy (mediawiki/job) no longer exists.
Although we could resort to various upstart tricks (e.g. stopping mw-cgroup whenever cgconfig stops), I can't see a reason why limit.sh can't check for the existence of mediawiki and mediawiki/job and, if they don't exist, create them itself.
This would solve the problem neatly and be far more resilient.
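A minimal sketch of the check proposed for limit.sh, assuming the script runs with permission to create directories under the memory hierarchy (this may not hold if it runs as an unprivileged user). The function name ensure_mw_cgroup and its optional root argument are hypothetical; the argument only exists so the logic can be exercised outside /sys/fs/cgroup. The paths and the 0777 mode are taken from mw-cgroup.conf. The sketch does not restore release_agent.

```shell
#!/bin/bash
# Hypothetical helper for limit.sh: recreate the MediaWiki memory cgroup
# parents if something (e.g. a cgroup-bin reconfigure) removed them.
# The optional first argument overrides the hierarchy root; it is an
# assumption added for testing, not part of the existing script.
ensure_mw_cgroup() {
    local root="${1:-/sys/fs/cgroup/memory}"
    local parent="$root/mediawiki"
    local job="$parent/job"

    # If the memory controller itself is not mounted, we cannot fix it here.
    if [ ! -d "$root" ]; then
        echo "ensure_mw_cgroup: $root is missing" >&2
        return 1
    fi

    # mkdir -p succeeds if the directory already exists, so this is idempotent.
    mkdir -p "$parent" || return 1

    # Same mode as mw-cgroup.conf. Another job may create the directory
    # concurrently, so a failed mkdir is fine as long as it now exists.
    if [ ! -d "$job" ]; then
        mkdir -m 0777 "$job" 2>/dev/null || [ -d "$job" ] || return 1
    fi
}
```

limit.sh would call this before creating its per-job cgroup under mediawiki/job. What it should do on a non-zero return (abort the job or run it unconstrained) is a policy decision this sketch leaves open.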
Note that this issue caused a complete thumbnailing outage for roughly the past hour, and it is bound to happen again on the next cgroup-bin upgrade.
Version: 1.22.0
Severity: normal