The timestamp in the filename of the hourly webstatscollector output
files refers to the end of the covered period instead of its start.
So for example:
http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-02/pagecounts-20140206-110000.gz
covers from 2014-02-06 10:00 until 2014-02-06 11:00.
This is confusing to me, as I'd instead expect the above file to
cover from 2014-02-06 11:00 until 2014-02-06 12:00.
From time to time, I see other people tripping over this as well. For
example, when trying to relate the above file to the sampled-1000 TSVs,
they grep for 2014-02-06T11 in the timestamp field, when they would
have to grep for 2014-02-06T10 instead.
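
To make the mapping concrete, here is a minimal sketch in Python. It assumes
the timestamp in the filename marks the end of the covered hour, as described
above. The function names are hypothetical helpers for illustration and are
not part of webstatscollector.

```python
import re
from datetime import datetime, timedelta


def covered_period(filename):
    """Return (start, end) of the hour covered by a pagecounts file.

    Assumption: the timestamp in the filename is the END of the period.
    """
    match = re.search(r"pagecounts-(\d{8})-(\d{6})", filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    end = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    return end - timedelta(hours=1), end


def sampled_log_prefix(filename):
    """Return the timestamp prefix to grep for in the sampled-1000 TSVs."""
    start, _ = covered_period(filename)
    return start.strftime("%Y-%m-%dT%H")


if __name__ == "__main__":
    name = "pagecounts-20140206-110000.gz"
    start, end = covered_period(name)
    print(start, "until", end)
    print("grep for:", sampled_log_prefix(name))
```

Note that a file stamped at midnight would map to the last hour of the
previous day under this assumption, which is another easy place to slip.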
Could we either document this clearly on
http://dumps.wikimedia.org/other/pagecounts-raw/
or switch the filename to hold the start of the covered period?
Version: unspecified
Severity: normal