
Scaler's stderr should be captured and logged
Closed, Resolved, Public

Description

The outage caused by the cgroup-bin upgrade (see #53800) took a while to figure out and fix.

Running the "convert" job by hand, with the arguments logged in thumbnail.log, didn't produce any errors. Running what MediaWiki actually runs, limit.sh, was immediately revealing, but figuring out the right arguments to it took some effort.

I propose that MediaWiki should capture the stdout/stderr of the convert jobs and log it (possibly only the last 2-3 lines). I imagine this would also be useful for correlating other failures, e.g. ImageMagick errors.
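To make the proposal concrete, here is a rough sketch in shell rather than in MediaWiki's PHP. The helper name `run_and_log_tail` and the message format are made up for illustration; nothing like this exists in limit.sh or MediaWiki as far as this task shows.

```shell
#!/bin/sh
# Hypothetical helper (not part of MediaWiki or limit.sh): run a command,
# capture its stdout and stderr together, and on failure print only the
# last N lines of that output to stderr.
run_and_log_tail() {
    n="$1"; shift
    status=0
    out=$("$@" 2>&1) || status=$?
    if [ "$status" -ne 0 ]; then
        printf 'command failed with status %d; last %d lines of output:\n' "$status" "$n" >&2
        printf '%s\n' "$out" | tail -n "$n" >&2
    fi
    return "$status"
}
```

Called as `run_and_log_tail 3 /usr/bin/convert ...`, this would keep the log short while still showing whatever the command printed just before it died.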


Version: unspecified
Severity: normal

Details

Reference
bz53824

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 1:54 AM
bzimport set Reference to bz53824.

Aaron, could you make a recommendation here? You don't necessarily need to be the one to fix this, but I'd like to get your take before having someone run off and do it.

2013-09-05 21:18:07 mw1160 commonswiki: thumbnail failed on mw1160: error 137 "" from "'/usr/bin/convert' -quality 80 -background white -define jpeg:size=8192x5145 '/tmp/localcopy_72e5b84ae821-1.jpg' -thumbnail '8192x5145!' -set comment 'File source: http://commons.wikimedia.org/wiki/File:Sandro_Botticelli_-_La_nascita_di_Venere_-_Google_Art_Project_-_edited.jpg' -depth 8 -sharpen '0x0.8' -rotate -0 '/tmp/transform_32eba68e9459-1.jpg' 2>&1"

^ That's already in the thumbnail logs. Are you wanting more than that?

limit.sh currently has the line:

if ! mkdir -m 0700 "$MW_CGROUP"/$$; then
        echo "limit.sh: failed to create the cgroup." 1>&2
        exit 1
fi

Given that we use PHP's passthru and throw away stderr, I think it might make sense just to output that message to stdout (or possibly to both stdout and stderr).
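A minimal sketch of the "both stdout and stderr" variant, assuming a small helper is acceptable in the script. `report_error` is a hypothetical name, not something limit.sh defines.

```shell
#!/bin/sh
# Hypothetical helper, not part of limit.sh: write the same message to
# stdout and to stderr, so a caller that discards stderr still sees it.
report_error() {
    echo "$1"
    echo "$1" 1>&2
}
```

The cgroup check would then call `report_error "limit.sh: failed to create the cgroup."` before `exit 1`. The trade-off is that a caller which merges both streams sees the message twice.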

Change 83974 had a related patch set uploaded by Brian Wolff:
Add way of including all stderr output when executing command

https://gerrit.wikimedia.org/r/83974

Just to clarify the situation here:

We currently output to the log:

wfDebugLog( 'thumbnail',
        sprintf( 'thumbnail failed on %s: error %d "%s" from "%s"',
                wfHostname(), $retval, trim( $err ), $cmd ) );

Here $err is the stdout and stderr from convert. However, it does not include the stderr from limit.sh. https://gerrit.wikimedia.org/r/83974 would change this to include the stdout and stderr from both limit.sh and the actual command executed.
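The difference can be reproduced with a toy wrapper. This is only an illustration: `wrapper` stands in for a script like limit.sh, and the messages are invented. It assumes the `2>&1` is part of the command string handed to the wrapper, as in the logged convert command above.

```shell
#!/bin/sh
# Hypothetical stand-in for a wrapper such as limit.sh: it writes its own
# diagnostic to stderr, then runs the command string it was given.
wrapper() {
    echo "wrapper: something went wrong" 1>&2
    sh -c "$1"
}

# The "2>&1" here belongs to the inner command string, so it only merges
# the inner command's stderr into stdout.
inner_cmd='{ echo "inner: error" 1>&2; } 2>&1'

# Caller discards stderr: the wrapper's own message is lost.
inner_only=$(wrapper "$inner_cmd" 2>/dev/null)

# Caller redirects stderr of the whole invocation: both messages survive.
everything=$(wrapper "$inner_cmd" 2>&1)

printf 'inner only:\n%s\n\neverything:\n%s\n' "$inner_only" "$everything"
```

In the first capture only the inner command's message shows up, which matches what the thumbnail log had before the change; the second capture also contains the wrapper's message.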

Change 83974 merged by jenkins-bot:
Add way of including all stderr output when executing command

https://gerrit.wikimedia.org/r/83974

Gilles raised the priority of this task from Medium to Unbreak Now!. Dec 4 2014, 10:24 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium. Dec 4 2014, 11:22 AM