Timeout when sending translation notification (yet again)
Closed, Declined · Public · PRODUCTION ERROR

Assigned To

None

Authored By

	• Tbayer
	Feb 10 2014, 5:30 AM

Description

While sending out a translation notification on Meta, after hitting the "send notification to translators" button and waiting for a bit, I received this error page:

504 Gateway Time-out
nginx/1.1.19

The notification itself seems to have gone out fine, here's the log entry:

05:16, 10 February 2014 Tbayer (WMF) (talk | contribs) sent a notification about translating page Data retention guidelines; languages: all languages; deadline: none; priority: high; sent to 1469 recipients, failed for 0 recipients, skipped for 6 recipients

This bug continues the hallowed tradition of T43131: Timeout when sending translation notification and T57397: Timeout when sending translation notification (again). Both of those have been fixed, and both produced different kinds of error messages (not 504), which is why I'm filing a new bug rather than reopening one of them.

Details

Reference: bz61122

Related Objects

Mentioned In: T190260: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file
T186445: TranslationNotifications: code stewardship review
T160276: Use of Special:NotifyTranslators is not being logged
T141988: After operation aborted, on-wiki log is missing, but it was reported to irc.wikimedia
Mentioned Here: T43131: Timeout when sending translation notification
T57397: Timeout when sending translation notification (again)

Event Timeline

• bzimport raised the priority of this task from to Medium. Nov 22 2014, 2:55 AM

• bzimport added a project: TranslationNotifications.

• bzimport set Reference to bz61122.

• bzimport added a subscriber: Unknown Object (MLST).

• Tbayer created this task. Feb 10 2014, 5:30 AM

No timeout for me though. THO, did you get one?
https://meta.wikimedia.org/w/index.php?title=Special:Log&offset=20140822065925&limit=1&type=notifytranslators

Glaisher merged a task: T135540: Special:NotifyTranslators error - 503 Service Temporarily Unavailable; no email sent, no log entry.. May 22 2016, 10:00 AM

Glaisher added subscribers: Aklapper, Zppix, • asherman.

Glaisher merged a task: T127751: Special:NotifyTranslators error (transaction aborted due to write duration), email sent, with no log entry. May 22 2016, 10:05 AM

Glaisher added subscribers: Krenair, JeanFred, I_JethroBT, StudiesWorld.

Merged all tasks that are related to the same root issue: it is not scalable to do all the updates in a single user web request.
These are timing out because the extension attempts to insert each email and talk page edit job during the submission web request. It also tries to update the table for each user separately. On wikis with many translators, this could mean hundreds (possibly thousands) of rows. MediaWiki also now automatically aborts transactions that take too long (and the allowed duration will probably be decreased in the future).

The best thing to do here would be to defer to the job queue and do all the updates in one job which submits all other jobs. We also probably shouldn't be (mis)using the user_properties table and instead should have the extension's own table so that we can easily do updates in batches.
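A minimal sketch of the "one job which submits all other jobs" idea, written in Python purely for illustration. Nothing here is the MediaWiki job queue API or the extension's code: `JobQueue`, `handle_web_request`, `run_job`, the job names and the batch size of 100 are all hypothetical. The point is only that the web request enqueues a single parent job, and the per-recipient work happens later in small batches.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Hypothetical job shape: (job type, arguments).
Job = Tuple[str, tuple]


class JobQueue:
    """In-memory stand-in for a real job queue (assumption, not a real API)."""

    def __init__(self) -> None:
        self.pending: Deque[Job] = deque()

    def push(self, job: Job) -> None:
        self.pending.append(job)


def chunked(items: List[int], size: int) -> Iterable[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def handle_web_request(queue: JobQueue, page: str, recipients: List[int]) -> None:
    """The web request does one cheap thing: enqueue a single parent job."""
    queue.push(("notify_all", (page, tuple(recipients))))


def run_job(queue: JobQueue, job: Job, batch_size: int = 100) -> int:
    """Run one job and return how many recipients it handled directly."""
    kind, args = job
    if kind == "notify_all":
        # Parent job: fan out into one child job per batch of recipients.
        page, recipients = args
        for batch in chunked(list(recipients), batch_size):
            queue.push(("notify_batch", (page, tuple(batch))))
        return 0
    if kind == "notify_batch":
        # Child job: a real system would send the emails or talk page edits
        # and record state for this batch in one short transaction.
        _page, batch = args
        return len(batch)
    raise ValueError(f"unknown job type: {kind}")


def drain(queue: JobQueue, batch_size: int = 100) -> int:
    """Run jobs until the queue is empty; return total recipients handled."""
    handled = 0
    while queue.pending:
        handled += run_job(queue, queue.pending.popleft(), batch_size)
    return handled
```

With the 1469 recipients from the log entry in the description, the web request would leave exactly one job in the queue, and the fan-out would later produce 15 batch jobs of at most 100 recipients each.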

Glaisher raised the priority of this task from Medium to High. May 22 2016, 10:13 AM

Glaisher mentioned this in T141988: After operation aborted, on-wiki log is missing, but it was reported to irc.wikimedia. Aug 3 2016, 4:44 PM

MarcoAurelio updated the task description. Jul 20 2017, 9:14 AM

MarcoAurelio removed subscribers: • asherman, • wikibugs-l-list.

[WY2yRApAEKcAABSE9XgAAAAC] 2017-08-11 13:34:37: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError"

That said, it seems the job is sent to the queue and performed. It'll send double/triple notifications if you reload the page. If you just ignore the exception the translation notifications will get delivered. Question is, all of them?

And obviously the action was not logged...

MarcoAurelio mentioned this in T160276: Use of Special:NotifyTranslators is not being logged. Aug 11 2017, 1:50 PM

In T63122#3518366, @MarcoAurelio wrote:

[WY2yRApAEKcAABSE9XgAAAAC] 2017-08-11 13:34:37: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError"

[WY2yRApAEKcAABSE9XgAAAAC] /wiki/Special:NotifyTranslators   Wikimedia\Rdbms\DBTransactionSizeError from line 1177 of /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Transaction spent 5.2927534580231 second(s) in writes, exceeding the 3 limit.
referrer:	 https://meta.wikimedia.org/wiki/Special:NotifyTranslators

#0 [internal function]: Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges(Wikimedia\Rdbms\DatabaseMysqli)
#1 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1502): call_user_func_array(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;300, array)
#2 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/loadbalancer/LoadBalancer.php(1187): Wikimedia\Rdbms\LoadBalancer->forEachOpenMasterConnection(Closure$Wikimedia\Rdbms\LoadBalancer::approveMasterChanges;300)
#3 [internal function]: Wikimedia\Rdbms\LoadBalancer->approveMasterChanges(array)
#4 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(183): call_user_func_array(array, array)
#5 [internal function]: Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod(Wikimedia\Rdbms\LoadBalancer, string, array)
#6 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactoryMulti.php(417): call_user_func_array(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;230, array)
#7 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(186): Wikimedia\Rdbms\LBFactoryMulti->forEachLB(Closure$Wikimedia\Rdbms\LBFactory::forEachLBCallMethod;230, array)
#8 /srv/mediawiki/php-1.30.0-wmf.13/includes/libs/rdbms/lbfactory/LBFactory.php(223): Wikimedia\Rdbms\LBFactory->forEachLBCallMethod(string, array)
#9 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(598): Wikimedia\Rdbms\LBFactory->commitMasterChanges(string, array)
#10 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(571): MediaWiki::preOutputCommit(RequestContext, Closure$MediaWiki::main;324)
#11 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(884): MediaWiki->doPreOutputCommit(Closure$MediaWiki::main;324)
#12 /srv/mediawiki/php-1.30.0-wmf.13/includes/MediaWiki.php(523): MediaWiki->main()
#13 /srv/mediawiki/php-1.30.0-wmf.13/index.php(43): MediaWiki->run()
#14 /srv/mediawiki/w/index.php(3): include(string)
#15 {main}
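To make the message in the trace above concrete, here is a rough sketch of a write-time budget check, in Python for illustration only. It is not the MediaWiki `LoadBalancer` implementation; `WriteBudget`, `fits_budget` and the per-write duration used in the usage note are hypothetical. The only figures taken from the trace are the 3-second limit and the roughly 5.29 seconds spent in writes.

```python
from typing import Iterable


class TransactionSizeError(Exception):
    """Raised when a transaction spent too long in writes (hypothetical)."""


class WriteBudget:
    """Accumulates time spent in writes and checks it before commit."""

    def __init__(self, max_write_seconds: float) -> None:
        self.max_write_seconds = max_write_seconds
        self.spent = 0.0

    def record_write(self, seconds: float) -> None:
        # Called once per write query with its measured duration.
        self.spent += seconds

    def approve_commit(self) -> None:
        # Refuse the commit if the accumulated write time is over budget.
        if self.spent > self.max_write_seconds:
            raise TransactionSizeError(
                f"Transaction spent {self.spent} second(s) in writes, "
                f"exceeding the {self.max_write_seconds} limit."
            )


def fits_budget(write_durations: Iterable[float], limit: float) -> bool:
    """Return True if a transaction with these writes would be approved."""
    budget = WriteBudget(limit)
    for seconds in write_durations:
        budget.record_write(seconds)
    try:
        budget.approve_commit()
    except TransactionSizeError:
        return False
    return True
```

Assuming, purely for the sake of arithmetic, about 3.6 ms per row write, 1469 per-user writes in one transaction add up to roughly 5.3 seconds and fail a 3-second budget, whereas a batch of 100 such writes (about 0.36 seconds) passes.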

I wonder how MassMessage behaves when sending the job. There are lists with hundreds of subscribers, yet it never times out. Maybe the same approach should be used here to make the feature usable again? Ping @Legoktm for analysis.

MarcoAurelio added a project: Global-Collaboration. Oct 16 2017, 1:37 PM

MarcoAurelio mentioned this in T186445: TranslationNotifications: code stewardship review. Feb 4 2018, 10:12 AM

Yann mentioned this in T190260: Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError" trying to undelete a file. Mar 21 2018, 10:31 AM

Arrbee edited projects, added Language-Team; removed Global-Collaboration. Mar 26 2018, 1:07 PM

Arrbee removed a project: Language-Team. Apr 25 2018, 3:18 PM

Liuxinyu970226 subscribed. Jul 12 2018, 8:32 AM

Krinkle moved this task from Dec2019/1.35.wmf.10+ to Older on the Wikimedia-production-error board. Sep 18 2018, 8:22 PM

• mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:12 PM

Closing, as this is too old for a production error to be usefully investigated, and it does not seem generic enough to point to an obvious general root cause that needs to be structurally changed somehow.

Restricted Application removed a subscriber: Liuxinyu970226. May 1 2020, 5:14 PM
