
importDump.php crashes with out of memory error
Closed, Declined · Public

Description

While importing a full-revision dump of Uncyclopedia into a clean 1.20 install, importDump.php ran out of memory and crashed a short way in (after 17303 revisions):

php maintenance/importDump.php --memory-limit=500M pages_full.xml.gz

PHP Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 131072 bytes) in /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php on line 517

Fatal error: Allowed memory size of 524288000 bytes exhausted (tried to allocate 131072 bytes) in /var/www/mediawiki/core/includes/objectcache/SqlBagOStuff.php on line 517

It was also running at something like 2 revisions/second, though I dunno if that had anything to do with anything.


Version: 1.20.x
Severity: normal

Details

Reference
bz42095

Event Timeline

bzimport raised the priority of this task to High. · Nov 22 2014, 1:04 AM
bzimport set Reference to bz42095.
bzimport added a subscriber: Unknown Object (MLST).

full-revision dump of Uncyclopedia

How big is that?

~6GB compressed.

Also crashed for ?pedia, which is only ~100MB, though.

Same bug seen in MediaWiki 1.23-HEAD, importing from a recursive dump of the mediawiki.org/Template: namespace. The resulting XML file is only 6.8MB, but the memory used by the import seems to grow superlinearly, at over 90KB/revision. There are memory leaks like a floating cardboard box.
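
One way to sanity-check that growth rate outside MediaWiki is to stream the same dump with PHP's XMLReader and log memory_get_usage() every few thousand revisions; if plain parsing stays flat, the leak is on the import side. A rough sketch (the file path and reporting interval are placeholders):

<?php
// Hypothetical measurement script, not part of MediaWiki: stream a dump and
// print PHP memory use every 1000 <revision> elements.
$reader = new XMLReader();
$reader->open( 'pages_full.xml' ); // placeholder path to an uncompressed dump
$revisions = 0;
while ( $reader->read() ) {
    if ( $reader->nodeType === XMLReader::ELEMENT && $reader->name === 'revision' ) {
        $revisions++;
        if ( $revisions % 1000 === 0 ) {
            printf( "%d revisions: %.1f MB\n", $revisions, memory_get_usage( true ) / 1048576 );
        }
    }
}
$reader->close();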

physik wrote:

I just tried it with the most recent version and it works for me.
The Maintenance.php script just passes whatever you specify as $limit to PHP via
ini_set( 'memory_limit', $limit );
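
(For reference, the behaviour described there amounts to roughly the following; a simplified sketch of the idea, not the actual Maintenance.php source:)

<?php
// Simplified sketch: the --memory-limit string is applied via ini_set(),
// with 'max' meaning unlimited and 'default' meaning "leave php.ini alone".
function applyMemoryLimit( string $limit ) {
    if ( $limit === 'max' ) {
        $limit = -1; // PHP's convention for "no limit"
    }
    if ( $limit !== 'default' ) {
        ini_set( 'memory_limit', $limit );
    }
}
applyMemoryLimit( '500M' ); // what --memory-limit=500M boils down to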

physikerwelt: Can you let us know roughly what size your target wiki and output file were? Also, your PHP version would be helpful... And, if you are passing a new memory_limit, what do you specify?

The bug isn't that it's impossible to run the dump script; it's about a memory leak which causes rapid memory exhaustion on even small data sets.

physik wrote:

I used the most recent Vagrant version. I assigned 8G main memory and 8 cores to the VM. The dataset was a 500MB sample from the most recent version of enwiki (all pages that contain math). I set the memory limit to 8G, which would have been basically the same as max. And, this might be important, I used the --no-updates flag. Can you post your dataset?
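
(The run described above would presumably have been invoked something like this; the file name is a placeholder:)

# hypothetical reconstruction of the run described above
php maintenance/importDump.php --no-updates --memory-limit=8G enwiki-math-sample.xml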

I think I recall it working with the --no-updates flag since then as well. So if this is still broken, the bug may just be in how it handles updates.

If this is the case, maybe just having it always run without updates would be in order - then have the option to run the appropriate scripts when it's done or something.
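
(A sketch of that workflow, assuming the follow-up scripts importDump.php usually suggests; exact steps may vary per wiki:)

# import with the secondary updates skipped, then rebuild derived data afterwards
php maintenance/importDump.php --no-updates --memory-limit=max pages_full.xml.gz
php maintenance/rebuildrecentchanges.php
php maintenance/initSiteStats.php --update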

physik wrote:

(In reply to Isarra from comment #7)

I think I recall it working with the --no-updates flag since then as well.
So if this is still broken, the bug may just be in how it handles updates.

If this is the case, maybe just having it always run without updates would
be in order - then have the option to run the appropriate scripts when it's
done or something.

Can you give me a pointer to the dataset? I'd like to test how much memory you need. Maybe 500M is just not enough for a complex nested structure. I'd tend to write a note about that on the man page rather than change the code. But that's just a first guess.

Well, there's this: http://dump.zaori.org/20121114_uncy_en_pages_full.xml.gz

That's the file I was trying to import when I originally filed this bug, I believe, though it's probably not the best thing to test on due to its being enormous.