
High priority jobs like enotifs are executed very slowly
Closed, Resolved (Public)

Description

Email notifications used to be instant; now they take about 20 minutes on en.wiki (with only 60 thousand jobs in the queue).
Not only does this point to a problem with the job queue system and amount to a significant regression, it is also very confusing, because I'm sent notifications when they're already obsolete (for instance, because I have already replied).


Dear Nemo bis,

The Wikipedia page User talk:Nemo bis has been changed on 13 January 2013
by anonymous user 76.126.142.118, see
http://en.wikipedia.org/wiki/User_talk:Nemo_bis for the current
revision.

See
http://en.wikipedia.org/w/index.php?title=User_talk:Nemo_bis&diff=next&oldid=532890436
to view this change.


Received: from imp-3.mail.tiscali.it (10.39.115.235) by mx-3-it.mail.tiscali.it (8.5.148)
id 50BF36D0094B0EEF for <redacted>@tiscali.it; Sun, 13 Jan 2013 21:01:44 +0100

Received: from wiki-mail.wikimedia.org ([208.80.152.133])
by imp-3.mail.tiscali.it with
id nY1j1k02z2swdko01Y1kqf; Sun, 13 Jan 2013 21:01:44 +0100
x-cnfs-analysis: v=2.0 cv=RYES+iRv c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17
a=eIhxMilvRf8A:10 a=z82XInz0jxkA:10 a=RyZ8rIAjjLkA:10 a=eztASiHJGFwA:10
a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=8pif782wAAAA:8 a=d2uY_mg3cpUA:10
a=nk0ike9KCJb9eP9e8BIA:9 a=QEXdDO2ut3YA:10 a=c7XZu54lUV4A:10
a=9vCFg7g2Nj6V2bzh:21 a=HUl_rzNbRn9v3Gf1:21 a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw8.pmtpa.wmnet ([10.0.11.8]:57845)
by mchenry.wikimedia.org with esmtp (Exim 4.69)
(envelope-from <wiki@wikimedia.org>)
id 1TuTkG-0003E4-Fs
for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
Received: from apache by mw8.pmtpa.wmnet with local (Exim 4.76)
id 1TuTkG-0008Ux-Bg
for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
To: Nemo bis
Subject: Wikipedia page User talk:Nemo bis has been changed by anonymous user 76.126.142.118
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Sun, 13 Jan 2013 20:01:28 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 8bit
Message-ID: <enwiki.50f3129856d1c5.83285442@en.wikipedia.org>
X-Mailer: MediaWiki mailer


Version: wmf-deployment
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=55822

Details

Reference
bz43936

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:12 AM
bzimport set Reference to bz43936.
bzimport added a subscriber: Unknown Object (MLST).

Once https://ganglia.wikimedia.org is accessible again, I could even look at the JobQueue graph...

Nemo, is the lag of ~20min still a problem?

/me looking at https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+-owner:L10n-bot+message:jobqueue,n,z

The job queue is now under 2000 or so on en.wiki, so this looks like the wrong time to try to reproduce this bug. https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=statistics
Anyway next time you can ask on my user talk and I'll compare timestamps of edit and enotif. :-)
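
For anyone who wants to check the queue length the same way, here is a minimal sketch of reading it from the siteinfo statistics query linked above. It assumes the response is requested with `format=json` and that the count is in the `jobs` field of `query.statistics`; the `sample` response is hypothetical and trimmed to that one field, and the figure should be treated as an estimate.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # endpoint from the link above


def statistics_url(endpoint: str = API_ENDPOINT) -> str:
    """Build the siteinfo statistics query URL, asking for JSON output."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{endpoint}?{urlencode(params)}"


def job_queue_length(response: dict) -> int:
    """Extract the job queue figure from a decoded API response."""
    return int(response["query"]["statistics"]["jobs"])


# Hypothetical response, trimmed to the one field used here.
sample = {"query": {"statistics": {"jobs": 1987}}}
print(statistics_url())
print(job_queue_length(sample))
```

Fetching the URL and decoding the body is left out on purpose, so the sketch runs without network access.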

We should probably raise the severity, because it now takes hours to receive an enotif from mediawiki.org (job queue at 0 now, ~20 at 14:00 CET): 15:22–17:05 in the example.

Received: from wiki-mail.wikimedia.org ([208.80.152.133])
by imp-2.mail.tiscali.it with
id B55A1l00w2swdko0155BbL; Wed, 13 Mar 2013 18:05:11 +0100
x-cnfs-analysis: v=2.0 cv=KYdQQHkD c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17
a=gbdniXhMvlMA:10 a=RyZ8rIAjjLkA:10 a=cNjpVsleRgUA:10 a=eztASiHJGFwA:10
a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=4P5xif6CAAAA:8 a=KcaC6ams3nQA:10
a=mdTHgZqYbhYL0A32_hcA:9 a=QEXdDO2ut3YA:10 a=4wRdB16iIHwA:10
a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw1003.eqiad.wmnet ([10.64.0.33]:38380)
by mchenry.wikimedia.org with esmtp (Exim 4.69)
(envelope-from <wiki@wikimedia.org>)
id 1UFnVc-00068m-87
for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
Received: from apache by mw1003.eqiad.wmnet with local (Exim 4.76)
id 1UFnVc-00075V-19
for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
To: Nemo bis <redacted>
Subject: MediaWiki page Help:Extension:Translate/Configuration has been changed by Nikerabbit
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Wed, 13 Mar 2013 15:22:28 +0000
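
The 15:22–17:05 gap can be recomputed from the Received headers rather than read off by eye, which matters because the hops report different UTC offsets. This is only a sketch: the helper names `hop_time` and `hop_delay` are made up for illustration, and it assumes each Received header ends with `; <date>`, as the ones above do.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def hop_time(received_header: str):
    """Parse the timestamp that follows the last ';' in a Received header."""
    return parsedate_to_datetime(received_header.rsplit(";", 1)[1].strip())


def hop_delay(earlier: str, later: str) -> timedelta:
    """Elapsed time between two hops; UTC offsets are taken into account."""
    return hop_time(later) - hop_time(earlier)


# Tails of two Received headers from the example above.
accepted_by_mchenry = "id 1UFnVc-00068m-87 for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000"
accepted_by_tiscali = "id B55A1l00w2swdko0155BbL; Wed, 13 Mar 2013 18:05:11 +0100"

delay = hop_delay(accepted_by_mchenry, accepted_by_tiscali)
print(delay)  # 1:42:43
```

Comparing adjacent hops this way shows where in the chain the time was spent, provided the clocks on the hosts involved are roughly in sync.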

If bug 46603 is right, Site requests is the correct component.
If it's just a jobqueue problem and mail relay doesn't factor into it, perhaps we just have too much stuff in "high priority"?

Currently it's basically instant: no time (1 s? unless Date is wrong) is spent on the apaches, and about 20 s pass between mchenry.wikimedia.org and wiki-mail.wikimedia.org.
The global job queue is very low, around 100k; I will check again when it gets higher.
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large

Reopening: we have reports that password reminders on en.wiki take 60 minutes to arrive.
I can't think of any reason other than this bug; the global job queue is reportedly around 2 million. https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=month&z=default&jr=&js=&st=1365625056&z=large

From graphite, none of the job queue push/pop graphs look remarkable over the last 2 months. There are lots of Parsoid jobs though (about 2 million on enwiki).

(In reply to comment #7)

> Reopening: we have reports that password reminders on en.wiki take 60 minutes
> to arrive.

Link(s)?

> I can't think of any reason other than this bug; global job queue is
> reportedly around 2 millions.

There are apparently different queues.

(In reply to comment #9)

> (In reply to comment #7)
>
> > Reopening: we have reports that password reminders on en.wiki take 60 minutes
> > to arrive.
>
> Link(s)?

Nope. Reported on #wikimedia-tech, relayed from #wikipedia-en-help I think.

> > I can't think of any reason other than this bug; global job queue is
> > reportedly around 2 millions.
>
> There are apparently different queues.

Yes (and it would be good to raise the concurrency for high-priority jobs: they're still at 6 and used to be 8 until April, IIRC), but this doesn't mean the queues don't affect each other; that has happened in the past, e.g. with bug 42614.
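
To get a rough feel for what going from 8 runners to 6 means, here is a back-of-the-envelope sketch. It is a crude model, not a description of the actual job runner setup: it assumes a single FIFO backlog served only by these runners, and the per-runner rate `RATE` is a made-up number. The backlog figure is the 60 thousand jobs from the description.

```python
def drain_seconds(backlog: int, runners: int, jobs_per_runner_per_second: float) -> float:
    """Time to work through `backlog` jobs with `runners` workers running in parallel."""
    return backlog / (runners * jobs_per_runner_per_second)


BACKLOG = 60_000  # jobs in the queue, as in the description
RATE = 2.0        # hypothetical jobs per second per runner

for runners in (6, 8):
    seconds = drain_seconds(BACKLOG, runners, RATE)
    print(f"{runners} runners: {seconds / 60:.1f} minutes")
```

Whatever the real rate is, under this model the wait with 6 runners is 8/6 (about 1.33 times) the wait with 8.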

Nemo / MZ: Are you aware of any recent issues (as I'm not)?
This might end up as WORKSFORME now...

Is anybody aware of any recent issues (as I'm not) or is this WORKSFORME now?

Last call: Is anybody aware of any recent issues (as I'm not) or is this WORKSFORME now?

This bug can only be tested when the job queue is very high.

Aklapper lowered the priority of this task from Medium to Low. Apr 17 2015, 1:37 PM

> This bug can only be tested when the job queue is very high.

I'm decreasing priority here.

@Nemo_bis Do you still have the feeling this is the case nowadays?

Dereckson claimed this task.