
Timeout when sending translation notification (again)
Closed, Resolved | Public

Description

When sending out a translation notification from Meta, I received a "Wikimedia Foundation error" with the following error message:

Request: POST http://meta.wikimedia.org/wiki/Special:NotifyTranslators, from 208.80.154.77 via cp1012.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.142 (10.64.0.142)
Error: ERR_READ_TIMEOUT, errno [No Error] at Sun, 06 Oct 2013 09:51:22 GMT

The notification itself seems to have made it through alright - I have seen talk page and email messages that went out, and the log entry reads as follows:

09:51, 6 October 2013 Tbayer (WMF) (talk | contribs | block) sent a notification about translating page Wikimedia Highlights, August 2013; languages: all languages; deadline: none; priority: low; sent to 1639 recipients, failed for 0 recipients, skipped for 0 recipients

This is a followup to https://bugzilla.wikimedia.org/show_bug.cgi?id=41131 ("Timeout when sending translation notification"), which was closed in February 2013. Filing it as a new bug per Nikerabbit's advice at https://bugzilla.wikimedia.org/show_bug.cgi?id=53769#c12 .


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=53769
https://bugzilla.wikimedia.org/show_bug.cgi?id=41131

Details

Reference
bz55397

Event Timeline

bzimport raised the priority of this task to Needs Triage (Nov 22 2014, 2:11 AM).
bzimport set Reference to bz55397.
bzimport added a subscriber: Unknown Object (MLST).

This just happened again in the same way (error 503, but log entry, talk page and email messages look OK):

Request: POST http://meta.wikimedia.org/wiki/Special:NotifyTranslators, from 208.80.154.134 via cp1052 frontend ([10.2.2.25]:80), Varnish XID 850123795
Forwarded for: 192.195.83.38, 208.80.154.134
Error: 503, Service Unavailable at Fri, 15 Nov 2013 06:09:31 GMT

06:09, 15 November 2013 Tbayer (WMF) (talk | contribs | block) sent a notification about translating page Wikimedia Highlights, October 2013; languages: all languages; deadline: none; priority: medium; sent to 1632 recipients, failed for 0 recipients, skipped for 24 recipients

Translation Notifications currently submits a job for each recipient during the web request itself. 1632 is a lot of jobs, especially if they're going to multiple wikis.

I think if it wrapped them in a submit job like MassMessage does (see https://github.com/wikimedia/mediawiki-extensions-MassMessage/blob/master/MassMessageSubmitJob.php), it should be faster and hopefully get rid of the timeouts.
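To illustrate the suggestion above, here is a minimal sketch in Python (not the extension's PHP) of the difference between pushing one job per recipient during the web request and pushing a single wrapper job that fans out later. All names here (JobQueue, NotifyJob, SubmitJob, and the two handler functions) are hypothetical stand-ins, not the MediaWiki or MassMessage API; the push counter is only a rough proxy for queue round trips made while the user is waiting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobQueue:
    """Toy in-memory queue (hypothetical; not MediaWiki's job queue API)."""
    jobs: deque = field(default_factory=deque)
    pushes: int = 0  # number of push calls, a stand-in for queue round trips

    def push(self, job):
        self.pushes += 1
        self.jobs.append(job)

    def run_all(self):
        # Run jobs until the queue is empty; a job may enqueue further jobs.
        while self.jobs:
            self.jobs.popleft().run(self)


@dataclass
class NotifyJob:
    """Delivers a notification to a single recipient."""
    recipient: str
    delivered: list

    def run(self, queue):
        self.delivered.append(self.recipient)


@dataclass
class SubmitJob:
    """Wrapper job: creates the per-recipient jobs outside the web request."""
    recipients: list
    delivered: list

    def run(self, queue):
        for recipient in self.recipients:
            queue.push(NotifyJob(recipient, self.delivered))


def handle_request_per_recipient(queue, recipients, delivered):
    # Behaviour described in this report: one push per recipient in-request.
    for recipient in recipients:
        queue.push(NotifyJob(recipient, delivered))
    return queue.pushes


def handle_request_with_submit_job(queue, recipients, delivered):
    # Proposed behaviour: a single push in-request, fan-out happens later.
    queue.push(SubmitJob(list(recipients), delivered))
    return queue.pushes
```

With 1632 recipients, the first handler makes 1632 pushes before the request can return, while the second makes one; in both cases every recipient is delivered once the queue has been run. Whether the pushes are actually the bottleneck is still an assumption here.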

This (or something very similar with the same cause) just happened to me too, btw. I only submitted once and I didn't even get a Wikimedia error, just a 504 Gateway Time-out page, but in my case the notification was sent three times...

https://meta.wikimedia.org/w/index.php?title=Special:Log&dir=prev&offset=20131115060948&limit=3&type=notifytranslators&user=

10:06, 24 November 2013 Nemo bis (talk | contribs) sent a notification about translating page User:MediaWiki message delivery; languages: all languages; deadline: 2013-12-31; priority: medium; sent to 1099 recipients, failed for 0 recipients, skipped for 563 recipients
10:05, 24 November 2013 Nemo bis (talk | contribs) sent a notification about translating page User:MediaWiki message delivery; languages: all languages; deadline: 2013-12-31; priority: medium; sent to 1442 recipients, failed for 0 recipients, skipped for 220 recipients
10:05, 24 November 2013 Nemo bis (talk | contribs) sent a notification about translating page User:MediaWiki message delivery; languages: all languages; deadline: 2013-12-31; priority: medium; sent to 1446 recipients, failed for 0 recipients, skipped for 216 recipients

Change 97370 had a related patch set uploaded by Legoktm:
Use batch submission of jobs

https://gerrit.wikimedia.org/r/97370

It would be helpful if we had some profiling data on the special page. My guess is that it's the pushing of jobs into the queue, but it could be something else.

There's a db write for every user who gets sent a notification, which could also be expensive.
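One rough way to tell the two suspects above apart (job pushes versus per-recipient db writes) would be to time each phase separately and compare the totals. The sketch below is in Python for illustration only; PhaseTimer, send_notifications, and the push_job / write_db_row callables are hypothetical and do not correspond to anything in the extension or to MediaWiki's own profiler.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time and call counts per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed block raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def slowest(self):
        # Name of the phase with the largest accumulated time.
        return max(self.totals, key=self.totals.get)


def send_notifications(recipients, push_job, write_db_row, timer):
    # push_job and write_db_row are placeholders for the real operations.
    for recipient in recipients:
        with timer.phase("job_push"):
            push_job(recipient)
        with timer.phase("db_write"):
            write_db_row(recipient)
```

Running this against the real operations for a large recipient list and then inspecting `timer.totals` would show which phase dominates the request time.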

Change 97370 merged by jenkins-bot:
Use batch submission of jobs

https://gerrit.wikimedia.org/r/97370

Gerrit change I9f06dcef91a35dd8b7fe75271b26682d94db3d20 will probably also help.

Should be resolved now that gerrit 97370 has been merged.

Bug 57896 has been marked as a duplicate of this bug.