
Doubled zero tags in varnish logs
Closed, Resolved (Public)

Description

Beginning with the 2014-03-21 files, zero tags may come doubled, like

zero=250-99;zero=250-99

instead of

zero=250-99

(I could not find any doubled tags whose two MCC-MNC values differ.) At least

/a/squid/archive/zero/zero.tsv.log-20140321.gz
/a/squid/archive/mobile/mobile-sampled-100.tsv.log-20140321.gz
/a/squid/archive/sampled/sampled-1000.tsv.log-20140321.gz
/a/log/webrequest/zero/zero.tsv.log-20140321.gz
/a/log/webrequest/mobile/mobile-sampled-100.tsv.log-20140321.gz
Raw data in Hadoop
Hive's webrequest table

are affected.
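
For anyone who wants to check a file for the problem, here is a minimal Python sketch. It assumes a zero tag always has the form zero=<MCC>-<MNC> with numeric parts, as in the example above; the function names are made up for illustration, and the path passed to count_doubled would be one of the files listed above.

```python
import gzip
import re

# Assumption: a zero tag looks like "zero=<MCC>-<MNC>" with numeric parts,
# e.g. "zero=250-99". Other tag shapes are not handled by this sketch.
ZERO_TAG = re.compile(r"zero=[0-9]+-[0-9]+")


def zero_tags(line):
    """Return all zero tags found in one log line, in order of appearance."""
    return ZERO_TAG.findall(line)


def is_doubled(line):
    """True if the line carries more than one zero tag."""
    return len(zero_tags(line)) > 1


def count_doubled(path):
    """Count the lines with more than one zero tag in a gzipped log file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return sum(1 for line in fh if is_doubled(line))
```

Comparing the items returned by zero_tags for a line would also show whether the repeated tags ever differ in their MCC-MNC value.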

Since the first occurrence was on 2014-03-21T00:15:41, it might be that

https://gerrit.wikimedia.org/r/#/c/119795/

is relevant (which mangles zero tags and got merged around that time).


Version: unspecified
Severity: normal

Details

Reference
bz62922

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 2:51 AM)
bzimport set Reference to bz62922.
bzimport added a subscriber: Unknown Object (MLST).

The patch was merged; please close the bug if the duplicates disappear. Is there an easy way to clean up the logs and the Hadoop data?
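
On the cleanup question, one possible approach for the gzipped TSV files is to collapse an exactly repeated tag into a single one. This is only a sketch under assumptions: tags look like zero=<MCC>-<MNC>, repeats are joined by ";", and the function names and file paths are hypothetical. It does not cover the raw data in Hadoop or the Hive table.

```python
import gzip
import re

# Assumption: tags look like "zero=<MCC>-<MNC>" and repeats are joined by ";".
# The trailing lookahead keeps "zero=250-99;zero=250-990" from being treated
# as a repeat of "zero=250-99".
DOUBLED = re.compile(r"(zero=[0-9]+-[0-9]+)(?:;\1)+(?![0-9])")


def collapse_zero_tags(line):
    """Replace a run of identical zero tags with a single tag."""
    return DOUBLED.sub(r"\1", line)


def clean_file(src, dst):
    """Stream a gzipped log from src to dst, collapsing repeated zero tags."""
    with gzip.open(src, "rt", encoding="utf-8", errors="replace") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        for line in fin:
            fout.write(collapse_zero_tags(line))
```

Tags that differ from each other are left untouched, so a line with two different MCC-MNC values would stay as it is for manual inspection.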

I checked the live udp2log stream and there are no more doubled zero tags
now that the above fix has been merged.