
Timestamp in filename of hourly webstatscollector output refers to end of the covered period instead of its start
Closed, Declined
Public

Description

The timestamp in the filename of the hourly webstatscollector output
files refers to the end of the covered period instead of its start.

So for example:

http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-02/pagecounts-20140206-110000.gz

covers from 2014-02-06 10:00 until 2014-02-06 11:00.

This is confusing to me, as I'd instead expect the above file to
cover

2014-02-06 11:00 until 2014-02-06 12:00.

From time to time, I see other people tripping over this as well. For
example, when trying to relate the above file to the sampled-1000 TSVs,
they grep for 2014-02-06T11 in the timestamp field, although they
would have to grep for 2014-02-06T10 instead.
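
To make the off-by-one-hour mapping concrete, here is a minimal sketch of how one might derive the covered period and the matching grep prefix from a filename. It assumes the filename pattern pagecounts-YYYYMMDD-HHMMSS.gz and that each file covers exactly the hour ending at the filename timestamp, as described above. The helper names covered_period and grep_prefix are hypothetical and not part of webstatscollector.

```python
import re
from datetime import datetime, timedelta

# Assumption: filenames follow pagecounts-YYYYMMDD-HHMMSS.gz
_FILENAME_RE = re.compile(r"pagecounts-(\d{8})-(\d{6})\.gz$")


def covered_period(filename):
    """Return (start, end) of the hour covered by an hourly file.

    Assumes the filename timestamp marks the END of the covered
    period, so the start is one hour earlier.
    """
    match = _FILENAME_RE.search(filename)
    if match is None:
        raise ValueError("unexpected filename: %r" % filename)
    end = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    start = end - timedelta(hours=1)
    return start, end


def grep_prefix(filename):
    """Return the timestamp prefix (e.g. 2014-02-06T10) to grep for."""
    start, _ = covered_period(filename)
    return start.strftime("%Y-%m-%dT%H")


if __name__ == "__main__":
    print(grep_prefix("pagecounts-20140206-110000.gz"))
```

Subtracting a full hour from the parsed datetime, rather than decrementing the hour field, also keeps the first file of a day correct: its covered period starts on the previous day.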

Could we either document this clearly on

http://dumps.wikimedia.org/other/pagecounts-raw/

or switch the filename to hold the start of the covered period?


Version: unspecified
Severity: normal

Details

Reference
bz60957

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:01 AM
bzimport set Reference to bz60957.
bzimport added a subscriber: Unknown Object (MLST).

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1433