
No sampled-1000 tsv file for 2014-02-06 on stat1002
Closed, Resolved (Public)

Description

Although new tsv files typically appear some time after 08:00 on stat1002,
today's (2014-02-06) sampled-1000 tsv file is not yet there [1] some 4 hours
afterwards.

Due to the file not being there, daily jobs that rely on the daily files
being in place break. For example, today's wikipedia-zero run just failed.

[1]


qchris@stat1002 0 12:08:05
cwd: ~
ll /a/squid/archive/sampled/sampled-1000.tsv.log-201402*
-rw-r--r-- 1 stats stats 646898016 Feb 1 06:25 /a/squid/archive/sampled/sampled-1000.tsv.log-20140201.gz
-rw-r--r-- 1 stats stats 646896549 Feb 2 06:25 /a/squid/archive/sampled/sampled-1000.tsv.log-20140202.gz
-rw-r--r-- 1 stats stats 739443262 Feb 3 06:25 /a/squid/archive/sampled/sampled-1000.tsv.log-20140203.gz
-rw-r--r-- 1 stats stats 728781897 Feb 4 06:25 /a/squid/archive/sampled/sampled-1000.tsv.log-20140204.gz
-rw-r--r-- 1 stats stats 723531018 Feb 5 06:25 /a/squid/archive/sampled/sampled-1000.tsv.log-20140205.gz
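
Since dependent jobs break when a day's file is absent, a pre-flight check could make the failure explicit before a job starts. The following is only a minimal sketch, not the check any of the existing jobs use: the function names `expected_name` and `missing_days` are hypothetical, and the file name pattern is inferred from the listing above.

```python
from datetime import date, timedelta
from pathlib import Path

# Directory taken from the listing above.
ARCHIVE_DIR = Path("/a/squid/archive/sampled")


def expected_name(day: date) -> str:
    """File name for one day, following the pattern seen in the listing."""
    return f"sampled-1000.tsv.log-{day:%Y%m%d}.gz"


def missing_days(archive_dir, first: date, last: date) -> list:
    """Return every day in [first, last] whose sampled-1000 file is absent."""
    missing = []
    day = first
    while day <= last:
        if not (Path(archive_dir) / expected_name(day)).is_file():
            missing.append(day)
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    # Hypothetical usage: check the first days of February 2014.
    for day in missing_days(ARCHIVE_DIR, date(2014, 2, 1), date(2014, 2, 6)):
        print(f"missing: {expected_name(day)}")
```

A job could call `missing_days` for the dates it needs and exit with a clear message instead of failing halfway through.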


Version: unspecified
Severity: normal

Details

Reference
bz60955

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:01 AM)
bzimport set Reference to bz60955.

Checking on emery shows that the sampled-1000 file is ready on emery.
Let's see if it syncs over tomorrow.

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Mingle card https://wikimedia.mingle.thoughtworks.com/projects/analytics/cards/cards/1432

Going through recent changes in the puppet repo, it seems the merge of
e405e7e622cb1a275b3c2acb65aad0ee4c1d7729 is related:
https://gerrit.wikimedia.org/r/#/c/110382

Nuria has made some changes in the Mingle ticket to fix this; will confirm tomorrow that the issue is resolved.

Change 112233 had a related patch set uploaded by QChris:
Mark 2014-02-05, and 2014-02-06 as bad dates

https://gerrit.wikimedia.org/r/112233

Verified that there are sampled logs for the last couple of days:

719726187 Feb 9 06:25 sampled-1000.tsv.log-20140209.gz
811851138 Feb 10 06:25 sampled-1000.tsv.log-20140210.gz

Bug can be closed.

Good to see the new files on stat1002 again! Thanks.

But since this was not an outage but a filter change, we should have
good tsvs lying around on emery for stat1002's missing/bad files.
Could you bring the good files over so we can rerun the jobs that
failed due to the missing/bad tsvs?

Also, could you please update the corresponding row in the
documentation table on

https://wikitech.wikimedia.org/wiki/Analytics/Requests_stream

?

I will put in an R2 request to get access to emery

Missing file has been restored.

(In reply to nuria from comment #10)

Missing file has been restored.

No more missing tsvs. Thanks!

But comment 7 mentions two other things, which did not yet
happen. Hence, reopening.

(In reply to christian from comment #7)

[ bad files ]

Just doing a plain “ll” on stat1002, the file for 2014-02-07 looks wrong.


qchris@stat1002 0 10:43:43
cwd: ~
cd /a/squid/archive/sampled ; ll sampled-1000.tsv.log-2014020*
-rw-r--r-- 1 stats stats 646898016 Feb 1 06:25 sampled-1000.tsv.log-20140201.gz
-rw-r--r-- 1 stats stats 646896549 Feb 2 06:25 sampled-1000.tsv.log-20140202.gz
-rw-r--r-- 1 stats stats 739443262 Feb 3 06:25 sampled-1000.tsv.log-20140203.gz
-rw-r--r-- 1 stats stats 728781897 Feb 4 06:25 sampled-1000.tsv.log-20140204.gz
-rw-r--r-- 1 stats stats 723531018 Feb 5 06:25 sampled-1000.tsv.log-20140205.gz
-rw-r--r-- 1 stats stats 728198594 Feb 6 06:25 sampled-1000.tsv.log-20140206.gz
-rw-r--r-- 1 stats stats 1022924002 Feb 7 06:25 sampled-1000.tsv.log-20140207.gz
-rw-r--r-- 1 stats stats 753216484 Feb 8 06:25 sampled-1000.tsv.log-20140208.gz
-rw-r--r-- 1 stats stats 719726187 Feb 9 06:25 sampled-1000.tsv.log-20140209.gz
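
The "looks wrong" judgement above comes from eyeballing file sizes. As a rough sketch of the same check, the snippet below parses the listing and flags any file whose size deviates from the median by more than a tolerance. The names `parse_listing` and `flag_outliers` and the 25% tolerance are assumptions made for illustration, not an established threshold.

```python
import re
from statistics import median

# Listing copied from the "ll" output above.
LISTING = """\
-rw-r--r-- 1 stats stats 646898016 Feb 1 06:25 sampled-1000.tsv.log-20140201.gz
-rw-r--r-- 1 stats stats 646896549 Feb 2 06:25 sampled-1000.tsv.log-20140202.gz
-rw-r--r-- 1 stats stats 739443262 Feb 3 06:25 sampled-1000.tsv.log-20140203.gz
-rw-r--r-- 1 stats stats 728781897 Feb 4 06:25 sampled-1000.tsv.log-20140204.gz
-rw-r--r-- 1 stats stats 723531018 Feb 5 06:25 sampled-1000.tsv.log-20140205.gz
-rw-r--r-- 1 stats stats 728198594 Feb 6 06:25 sampled-1000.tsv.log-20140206.gz
-rw-r--r-- 1 stats stats 1022924002 Feb 7 06:25 sampled-1000.tsv.log-20140207.gz
-rw-r--r-- 1 stats stats 753216484 Feb 8 06:25 sampled-1000.tsv.log-20140208.gz
-rw-r--r-- 1 stats stats 719726187 Feb 9 06:25 sampled-1000.tsv.log-20140209.gz
"""

NAME_RE = re.compile(r"sampled-1000\.tsv\.log-(\d{8})\.gz$")


def parse_listing(listing: str) -> dict:
    """Map each date stamp (YYYYMMDD) to the file size in bytes."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a long-format listing line
        match = NAME_RE.search(fields[-1])
        if match:
            sizes[match.group(1)] = int(fields[4])
    return sizes


def flag_outliers(sizes: dict, tolerance: float = 0.25) -> list:
    """Return date stamps whose size is more than `tolerance` off the median."""
    typical = median(sizes.values())
    return sorted(
        day for day, size in sizes.items()
        if abs(size - typical) > tolerance * typical
    )


if __name__ == "__main__":
    print(flag_outliers(parse_listing(LISTING)))  # prints ['20140207']
```

With the nine files listed, the median is 728198594 bytes, so only the ~1GB file for 20140207 falls outside the assumed 25% band.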

(In reply to christian from comment #7)

Also, could you please update the corresponding row in the
documentation table on

https://wikitech.wikimedia.org/wiki/Analytics/Requests_stream

?

This has not happened either.

Nuria pinged me that the 20140207 file is available on stat1002.
Thanks for the file!

The new file seems to be missing a few lines, but since it is a sampled
file anyway, and the drop is way below the usual network drop, it's fine
by me.

The good file for 20140207 is gone again, and we're back with the ~1GB
file for 20140207 :-(

@Nuria, are you sure you copied the fixed file to the rsync source, as
we discussed yesterday on IRC:

Feb 20 12:25:28 <qchris>        Afterwards get it to the rsync source.

?

Sorry, I misunderstood "rsync source"; I thought you were talking about the directory on stat1002. It is fixed now, so the file should not change going forward.

(In reply to nuria from comment #14)

Sorry, I misunderstood "rsync source"; [...]

No worries.
The file is there again \o/
Thanks!

The file owner changed, but meh. It is world readable anyways :-)

But could you please clean up the left-behind cruft file, as that
would sooner or later get in the way when doing ad hoc checks?

@Nuria: The files now look good. Thanks \o/

But again ... could you please also update the corresponding
row in the documentation table on

https://wikitech.wikimedia.org/wiki/Analytics/Requests_stream

?

Documentation has been updated by Nuria.

Change 112233 abandoned by QChris:
Mark 2014-02-05, and 2014-02-06 as bad dates

Reason:
Aborting since we want to turn those scripts off, and it seems we
won't invest in cleaning them up. The latest deployed version of this
code is in the qchris/zero branch.

https://gerrit.wikimedia.org/r/112233