Page MenuHomePhabricator

Logging: emit structured logs via node-bunyan
Closed, ResolvedPublic

Description

From Asher's email:
"Looking over parsoid logs and the code that creates it, I think that
refactoring could help deliver some of what I'd like without adding any
complexity. Parsoid already logs what's being parsed, request urls sent to
varnish, and parse times in ms, written to a mix of stdout and stderr. The
formatting could be standardized and/or the data combined into an access
log which we could parse for stats. It also wouldn't take much to use
node's dgram module to either udplog this to the mediawiki log server, send
stats directly to a statsd daemon, or both. This could indeed be done by
an engineer outside of parsoid team."

Gabriel's response:
"If you spec a format that is easily consumable by tools you would like
to use, then I would be happy to change the few lines that do our
current minimalistic logging to fit it. Apart from regular request
logging and cache requests (currently logged for debugging purposes)
we'd like to keep the ability to log exceptions. It probably makes sense
to split things up into request and error logs at that point."


Version: unspecified
Severity: normal
Whiteboard: incident-report

Details

Reference
bz49762

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:03 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz49762.

@Asher: Could you spec the log format you prefer?

I think your options are as follows:

For request / error / exception logging, you can send datagrams to fluorine.eqiad.wmnet:8420 in the format of '<sequence id>\t<log name> <log message>', where sequence ID is a uint64_t number that increments with every transmission.

(At least in theory. The sequence ID can be used as a basis for log sampling and for detecting packet loss, but neither is going to be useful for you, so I don't think there'd be any practical consequences to sending some constant number instead.)

The udp2log instance on fluorine (which is the MediaWiki log aggregator for the production cluster) will write each log message to /a/mw-log/<log file>.log, which gets rotated and archived daily. (It was formerly the case that someone with root would have to touch the file once for log2udp to start writing the log stream, but I am told this is no longer required. YMMV.)

Because log messages go into flat files, this doesn't buy you much by way of fancy queryability. The primary benefit would likely be an increased chance that someone unfamiliar with Parsoid would be able to find the log files in case of an outage.

There are also some tentative plans for writing this log data into a log analysis tool like logstash, and by logging to fluorine you make it more likely that your logs would be ported over for you at some point without you needing to do anything about it.

If you want the ability to slice and dice the log data using some high-level query language, you can create an event schema on metawiki, then log JSON blobs confirming to that schema via UDP to vanadium.eqiad.wmnet:8421. Your schema will be used to create both a MySQL table on db1047 and a MongoDB collection on vanadium and each entry will be written to both. See https://meta.wikimedia.org/wiki/Schema:MobileWebCentralAuthError for an example.

For logging stats (like counters and time measurements), use StatsD! There's an instance on the cluster that writes to either/both Ganglia and Graphite. The StatsD protocol is also UDP-based, and it is extremely simple: see https://github.com/b/statsd_spec for the format. You'll get automatically-generated graphs representing various summary figures computed at a regular interval.

  • Bug 58352 has been marked as a duplicate of this bug. ***

Change 109841 had a related patch set uploaded by Cscott:
Support structured logging.

https://gerrit.wikimedia.org/r/109841

Change 109846 had a related patch set uploaded by Cscott:
Support structured logging.

https://gerrit.wikimedia.org/r/109846

Change 109841 abandoned by Cscott:
Support structured logging.

Reason:
Abandoned in favor of https://gerrit.wikimedia.org/r/109846

https://gerrit.wikimedia.org/r/109841

Change 139196 had a related patch set uploaded by Marcoil:
Bug 49762: Logging: emit structured logs via node-bunyan

https://gerrit.wikimedia.org/r/139196

Change 139196 abandoned by Marcoil:
Bug 49762: Logging: emit structured logs via node-bunyan

Reason:
Abandoned in favor of a different design with Parsoid's Logger configuration driving the output instead.

https://gerrit.wikimedia.org/r/139196

Change 146130 had a related patch set uploaded by Marcoil:
Bug 49762: Logging - emit structured logs via node-bunyan

https://gerrit.wikimedia.org/r/146130

Change 146130 merged by jenkins-bot:
Bug 49762: Logging - emit structured logs via node-bunyan

https://gerrit.wikimedia.org/r/146130

The support code for logging using node-bunyan has been merged into master.