Raw webrequest partitions for 2014-10-20T13/1H not marked successful
Closed, Declined
Public

Assigned To

None

Authored By

	QChris
	Oct 21 2014, 9:48 AM

Description

None of the webrequest partitions [1] for 2014-10-20T13/1H have
been marked successful.

What happened?

[1]

qchris@stat1002 jobs: 0 time: 09:43:10 // exit code: 0
cwd: ~/refinery/hive/webrequest
~/cluster-scripts/dump_webrequest_status.sh

+------------------+--------+--------+--------+--------+
| Date             |  bits  | mobile |  text  | upload |
+------------------+--------+--------+--------+--------+

[...]

| 2014-10-20T11/1H |    .   |    .   |    .   |    .   |    
| 2014-10-20T12/1H |    .   |    .   |    .   |    .   |    
| 2014-10-20T13/1H |    X   |    X   |    X   |    X   |    
| 2014-10-20T14/1H |    .   |    .   |    .   |    .   |    
| 2014-10-20T15/1H |    .   |    .   |    .   |    .   |

[...]

+------------------+--------+--------+--------+--------+

Statuses:

. --> Partition is ok
M --> Partition manually marked ok
X --> Partition is not ok (duplicates, missing, or nulls)
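
For reference, the flagged cells in a table like the one above could be picked out with a small script. This is only a sketch: the row layout is assumed from the excerpt shown here, and `parse_status_row` and `failed_partitions` are hypothetical helper names, not part of `dump_webrequest_status.sh`.

```python
# Sketch: find partitions flagged "X" in status-table output like the excerpt above.
# Assumption: rows look like "| <partition> | bits | mobile | text | upload |".

SOURCES = ["bits", "mobile", "text", "upload"]  # column order from the table header


def parse_status_row(line):
    """Return (partition, {source: status}) for a data row, or None for any other line."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Data rows have one partition cell plus one cell per source,
    # and the partition name starts with a four-digit year.
    if len(cells) != 1 + len(SOURCES) or not cells[0][:4].isdigit():
        return None
    return cells[0], dict(zip(SOURCES, cells[1:]))


def failed_partitions(lines):
    """List (partition, source) pairs whose status is 'X' (partition is not ok)."""
    failed = []
    for line in lines:
        row = parse_status_row(line)
        if row is None:
            continue  # border, header, "[...]" or blank line
        partition, statuses = row
        failed.extend((partition, src) for src, st in statuses.items() if st == "X")
    return failed


sample = [
    "+------------------+--------+--------+--------+--------+",
    "| Date             |  bits  | mobile |  text  | upload |",
    "| 2014-10-20T12/1H |    .   |    .   |    .   |    .   |",
    "| 2014-10-20T13/1H |    X   |    X   |    X   |    X   |",
    "| 2014-10-20T14/1H |    .   |    .   |    .   |    .   |",
]
print(failed_partitions(sample))
```

On the sample rows this reports all four sources of 2014-10-20T13/1H and nothing else; cells marked `.` or `M` are treated as ok.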

Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72306

Details

Reference: bz72296

Related Objects

Status	Assigned	Task
Resolved	Ottomata	T72085 Raw webrequest partitions that were not marked successful
Resolved	Ottomata	T74298 Raw webrequest partitions that were not marked successful due to network issues
Declined	None	T74296 Raw webrequest partitions for 2014-10-20T13/1H not marked successful

Event Timeline

• bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:52 AM

• bzimport added a project: Analytics-Clusters.

• bzimport set Reference to bz72296.

• bzimport added a subscriber: Unknown Object (MLST).

QChris created this task. Oct 21 2014, 9:48 AM

The affected period is 2014-10-20T13:07:11--2014-10-20T13:25:38.
Only ulsfo caches were affected, but all of them were.

The affected period shows around 2M duplicates, which correspond to

79 seconds of ulsfo data, or
15 seconds of total data.

The affected period shows around 27M missing lines, which correspond to

16 minutes of ulsfo data, or
3 minutes of total data.
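
As a rough cross-check of these figures, the implied line rates can be worked out as below. This is a sketch only: it assumes "N lines are worth S seconds" means N divided by the typical lines-per-second rate, that "2M" and "27M" are rounded, and that the affected period starts on 2014-10-20. `implied_rate` is a hypothetical helper.

```python
from datetime import datetime

# Rounded figures quoted above; results are order-of-magnitude estimates only.
duplicates = 2_000_000
missing = 27_000_000


def implied_rate(lines, seconds):
    """Lines per second implied by 'N lines are worth S seconds of data'."""
    return lines / seconds


ulsfo_rate_from_dupes = implied_rate(duplicates, 79)       # about 25k lines/s
total_rate_from_dupes = implied_rate(duplicates, 15)       # about 133k lines/s
ulsfo_rate_from_missing = implied_rate(missing, 16 * 60)   # about 28k lines/s
total_rate_from_missing = implied_rate(missing, 3 * 60)    # 150k lines/s

# Assumption: the period starts on the same day it ends (2014-10-20).
start = datetime(2014, 10, 20, 13, 7, 11)
end = datetime(2014, 10, 20, 13, 25, 38)
affected_seconds = (end - start).total_seconds()           # 18 min 27 s

# Share of total traffic attributed to ulsfo, by either pair of figures.
ulsfo_share_dupes = 15 / 79
ulsfo_share_missing = (3 * 60) / (16 * 60)

# Fraction of the affected window's ulsfo data that went missing.
missing_fraction = (16 * 60) / affected_seconds

print(round(ulsfo_rate_from_dupes), round(ulsfo_rate_from_missing))
print(round(ulsfo_share_dupes, 2), round(ulsfo_share_missing, 2))
print(affected_seconds, round(missing_fraction, 2))
```

Both pairs of figures put ulsfo at roughly 19% of total traffic, so the duplicate and missing-line estimates are consistent with each other. Under these assumptions, the 16 minutes of missing ulsfo data cover most of the roughly 18.5 minute affected window.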

Ops reported [1] that network issues between ulsfo and eqiad started
at 13:07. This aligns with and explains the issues that we're seeing.

[1] https://lists.wikimedia.org/mailman/private/ops/2014-October/042274.html
