Raw webrequest bits and upload partition for 2014-10-28T19/1H not marked successful
Closed, Resolved (Public)

Description

The bits and upload webrequest partitions [1] for 2014-10-28T19/1H
have not been marked successful.

What happened?

[1]


qchris@stat1002 jobs: 0 time: 15:32:08 // exit code: 0
cwd: ~
~/cluster-scripts/dump_webrequest_status.sh

+------------------+--------+--------+--------+--------+
| Date             |  bits  | mobile |  text  | upload |
+------------------+--------+--------+--------+--------+

[...]

| 2014-10-28T17/1H |    .   |    .   |    .   |    .   |
| 2014-10-28T18/1H |    .   |    .   |    .   |    .   |
| 2014-10-28T19/1H |    X   |    .   |    .   |    X   |
| 2014-10-28T20/1H |    .   |    .   |    .   |    .   |
| 2014-10-28T21/1H |    .   |    .   |    .   |    .   |

[...]

+------------------+--------+--------+--------+--------+

Statuses:

. --> Partition is ok
M --> Partition manually marked ok
X --> Partition is not ok (duplicates, missing, or nulls)

Version: unspecified
Severity: normal

Details

Reference
bz72679

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 3:44 AM)
bzimport set Reference to bz72679.
bzimport added a subscriber: Unknown Object (MLST).

For bits, only cp1056 was affected.
The affected period is the single second 2014-10-28T19:52:57.
For that second, we saw 178 duplicates and no missing log lines.
So much less than one second's worth of data is affected.

For upload, it affected cp1049, cp1051, cp3003, cp3004, cp3006,
cp3010, and cp3015.
The affected period is the three seconds from
2014-10-28T19:52:54 to 2014-10-28T19:52:57.
No duplicates, but ~2K missing lines.
Again, much less than one second's worth of data is affected.

Those affected time periods match the partition leader re-election for
bug 72550.

So the duplicates for bits and the missing log lines for upload are
expected.
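
For illustration, the per-host duplicate and missing counts quoted above could be tallied roughly as follows. This is only a minimal sketch, not the check that dump_webrequest_status.sh actually runs. It assumes each cache host stamps its log lines with a sequence number that increases by one per line; the function names sequence_stats and status are hypothetical, and the sample records are made up (only the host names come from this task).

```python
from collections import Counter, defaultdict


def sequence_stats(records):
    """Return {host: (duplicates, missing)} for an iterable of (host, sequence) pairs.

    Assumption: each host numbers its log lines consecutively, so a repeated
    number is a duplicate line and a gap in the numbers is a missing line.
    """
    by_host = defaultdict(list)
    for host, seq in records:
        by_host[host].append(seq)

    stats = {}
    for host, seqs in by_host.items():
        counts = Counter(seqs)
        # Every extra occurrence of a sequence number counts as one duplicate.
        duplicates = sum(n - 1 for n in counts.values())
        # Number of distinct sequence numbers expected between lowest and highest.
        expected = max(counts) - min(counts) + 1
        missing = expected - len(counts)
        stats[host] = (duplicates, missing)
    return stats


def status(stats):
    """Return '.' if no host shows duplicates or gaps, 'X' otherwise."""
    return "." if all(d == 0 and m == 0 for d, m in stats.values()) else "X"


# Made-up sample data: (host, sequence number) pairs.
records = [
    ("cp1056", 1), ("cp1056", 2), ("cp1056", 2), ("cp1056", 3),  # one duplicate
    ("cp1049", 10), ("cp1049", 11), ("cp1049", 14),              # 12 and 13 missing
]
stats = sequence_stats(records)
print(stats)          # {'cp1056': (1, 0), 'cp1049': (0, 2)}
print(status(stats))  # X
```

Under these assumptions, a leader re-election that causes some lines to be delivered twice shows up as duplicates, while lines dropped during the switch show up as gaps in the sequence.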