
"Data repair and upgrade" not completing using non-interactive job queue runner
Closed, DeclinedPublic

Description

At translatewiki.net we use SemanticMediaWiki. For a while, we've tried to rebuild all the semantic data using the feature "Data repair and upgrade" on Special:SMWAdmin. It always stopped updating at 50.01%.

Here's our cron job:

  * * * * * betawiki nice php /www/translatewiki.net/w/maintenance/runJobs.php --exclusive --maxtime=50 --procs=1 --memory-limit=250M >> /www/translatewiki.net/w/logs/jobqueue 2> /dev/null

I think it stops working around 100% completion of SMW\RefreshJob.
2013-12-26 13:49:59 SMW\RefreshJob Special:SMWAdmin spos=3639900 prog=0.99999917580221 rc=2 run=1 t=5 good

I've now disabled the queue running via cron, and am running it in interactive mode. It now gets beyond the point where it used to stop.


Version: master
Severity: normal

Details

Reference
bz58969

Related Objects

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:26 AM
bzimport set Reference to bz58969.
bzimport added a subscriber: Unknown Object (MLST).

Any idea what the relevant difference between interactive and cron could be? Also, do you know if this is a new issue, or has it been present for a longer time?

(In reply to comment #1)

Any idea what the relevant difference between interactive and cron could be? Also, do you know if this is a new issue, or has it been present for a longer time?

I couldn't say... We don't rebuild the semantic data that often. I don't recall any issues with it previously, but I couldn't say whether the last successful run was 6 or 12 months ago.

It could be an unnoticed memory limit issue. I hit this one while running the job queue interactively; it was logged to our error log file (and also relayed to MediaWiki-Internationalization on Freenode). I'd assume that the cron job queue process running out of memory would have produced similar logging.

2013-12-26 15:54:09 SMW\RefreshJob Special:SMWAdmin spos=1243721 prog=0.34167539819333 rc=2 run=2 t=34 good
[a0b5f4f8] [no req] Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (149444096/157286400).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}

We have this setting:

LocalSettings.php:ini_set( 'memory_limit', '175M' );

If I recall correctly, the command line has its own (hardcoded) memory limit somewhere.
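
Side note on the exception above: the logged failure at 149444096/157286400 bytes is almost exactly 95% of the 150M limit, so the guard presumably trips a little before PHP itself would. A rough sketch of such a check, as an approximation rather than the exact MediaWiki code:

function assertMemoryOK() {
    // Parse PHP's memory_limit ini value into bytes ("-1" means unlimited
    // and won't match the pattern, so the check is skipped in that case).
    $maxBytes = 0;
    if ( preg_match( '/^(\d+)([kmg]?)$/i', trim( ini_get( 'memory_limit' ) ), $m ) ) {
        $conv = array( '' => 1, 'k' => 1024, 'm' => 1048576, 'g' => 1073741824 );
        $maxBytes = (int)$m[1] * $conv[ strtolower( $m[2] ) ];
    }
    $usedBytes = memory_get_usage();
    // 149444096 >= 0.95 * 157286400, which matches the observed failure.
    if ( $maxBytes > 0 && $usedBytes >= 0.95 * $maxBytes ) {
        throw new Exception( "Detected excessive memory usage ($usedBytes/$maxBytes)." );
    }
}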

A memory limit issue seems plausible. I guess the actionable items would be cleaning up our job and maintenance script code, adding tests, and making the whole thing more robust and error tolerant.

(In reply to comment #3)

A memory limit issue seems plausible. I guess the actionable items would be cleaning up our job and maintenance script code, adding tests, and making the whole thing more robust and error tolerant.

That seems to be the easy way out. Is that what you expect every SMW wiki admin to do? Do you think that's a reasonable expectation? If you think so, please close this issue as INVALID.

I've been working on getting "Data repair and upgrade" to complete for almost 24 hours now.

I increased the maximum memory from the 150M hardcoded in runJobs.php to 250M. It failed fairly quickly.

2013-12-27 08:41:12 SMW\UpdateJob MediaWiki:Betafeatures-enable-all-desc/mr t=46 good
[341087c9] [no req] Exception from line 256 of /www/translatewiki.net/w/maintenance/runJobs.php: Detected excessive memory usage (249095784/262144000).
Backtrace:
#0 /www/translatewiki.net/w/maintenance/runJobs.php(156): RunJobs->assertMemoryOK()
#1 /www/translatewiki.net/w/maintenance/doMaintenance.php(119): RunJobs->execute()
#2 /www/translatewiki.net/w/maintenance/runJobs.php(271): require_once(string)
#3 {main}

It's running with 350M now.

It looks like some of the SMW jobs may be using a huge amount of memory at times.

translatewiki.net has 3,538,755 pages. My current theory is that creating one job for each of these pages, as is done somewhere in this process, requires over 250M of memory and causes the process to fail.
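
If that theory holds, the fix on the SMW side would be to enqueue the refresh work in bounded batches rather than materialising millions of Job objects at once. A hypothetical sketch ($pageIds, the batch size, and the UpdateJob constructor usage are assumptions, not SMW's actual code):

// Queue per-page update jobs in bounded batches so memory use stays flat.
$batch = array();
foreach ( $pageIds as $id ) { // $pageIds: assumed iterator over all page IDs
    $title = Title::newFromID( $id );
    if ( !$title ) {
        continue;
    }
    $batch[] = new SMW\UpdateJob( $title ); // assumed per-page job constructor
    if ( count( $batch ) >= 1000 ) { // arbitrary batch size
        JobQueueGroup::singleton()->push( $batch );
        $batch = array(); // drop references so PHP can reclaim the memory
    }
}
if ( $batch ) {
    JobQueueGroup::singleton()->push( $batch );
}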

(In reply to comment #4)

That seems to be the easy way out. Is that what you expect every SMW wiki admin to do? Do you think that's a reasonable expectation? If you think so, please close this issue as INVALID.

You misunderstand: these are action items for the devs. And this is not easy; it is quite some work.

I've had jobs exceed 550MB of memory now. I have no idea which job it is, or how to reproduce it. Once the job queue is empty (1421969 to go), I'll restart the process with a single queue runner, so I'll have more detail.

If you want any additional debug information, please submit a patch and let me know; I'll run the update with that code. I expect to be able to restart the run tomorrow morning (CET) if it doesn't fail during the night.
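
(For reference, such a patch could be as small as wrapping the job execution in runJobs.php with per-job memory accounting. A hypothetical sketch; the 'jobqueue-memory' log group name is made up:)

// Log the memory delta of each job so the hungry job type can be identified.
$before = memory_get_usage( true );
$status = $job->run(); // the existing call that executes the job
$delta = memory_get_usage( true ) - $before;
wfDebugLog( 'jobqueue-memory', get_class( $job ) .
    ": delta=$delta bytes, peak=" . memory_get_peak_usage( true ) . " bytes" );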

(In reply to comment #8)

I've had jobs exceed 550MB of memory now.

That's probably due to bug 60844 (part of the series of catastrophic 1.22 changes to the job queue: https://www.mediawiki.org/wiki/Manual:Job_queue#Changes_introduced_in_MediaWiki_1.22).

Siebrand managed to bring the script to completion with brute force and hacks that made it skip uninteresting namespaces, but nobody has been able to work on the SMW issues in the last month. We currently think they're unrelated to this bug. I wrote to the mailing list: http://sourceforge.net/mailarchive/forum.php?forum_name=semediawiki-user&max_rows=25&style=ultimate&viewmonth=201402 (posted ten minutes ago and not in the archives yet; see http://p.defau.lt/?0iJEtsTkjCpwDIWF1UivQQ).

(In reply to comment #10)

https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/184

Thanks for following up. That issue is now solved.

As for this bug, yesterday I started the refresh from SMWAdmin and Nikerabbit ran the jobs manually (through HHVM); it's now reaching completion (99.5%) even though the memory-raise hack on runJobs.php has been removed. (A null-editing bot on the affected pages is much faster, though; see the sketch below.) This bug is solved for us, and we can't help with debugging any longer; close it if you don't see actionable items.
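
For anyone hitting the same wall, the null-edit approach can be approximated with a small loop over the affected pages (a sketch assuming 1.22-era core APIs; $titles and the edit summary are made up):

foreach ( $titles as $title ) { // $titles: assumed array of Title objects
    $page = WikiPage::factory( $title );
    $content = $page->getContent();
    if ( $content !== null ) {
        // Saving unchanged content is a null edit: no new revision is stored,
        // but secondary data updates (links tables, SMW properties) are re-run.
        $page->doEditContent( $content, 'Null edit to refresh semantic data', EDIT_UPDATE );
    }
}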

Aklapper subscribed.

The Semantic MediaWiki developers requested in https://phabricator.wikimedia.org/T64114 to move their task tracking to https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues and to close remaining tasks in Wikimedia Phabricator. If you still face the problem reported in this task in a supported version of SMW, please feel free to transfer your report to https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues . We are sorry for the inconvenience.