
[grabbers] PHP Fatal error: Out of memory in mediawikibot.class.php
Closed, Resolved, Public

Description

With

php grabText.php --url=http://wikihow.com/api.php

I got

PHP Fatal error:  Out of memory (allocated 1851260928) (tried to allocate 32 bytes)

in mediawikibot.class.php.

This wiki is rather huge, with about 3.5M pages (I was hoping to have better success than with dumpgenerator.py). Is there any way to limit the amount of data stored in RAM?


Version: unspecified
Severity: major

Details

Reference
bz58531

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:25 AM
bzimport set Reference to bz58531.
bzimport added a subscriber: Unknown Object (MLST).

How far did the script make it? What was the last output it gave before running out of memory?

(In reply to comment #1)

How far did the script make it? What was the last output it gave before running out of memory?

260k pages in ns0, then a count for the next ns and presumably it was getting titles for the third.

So it looks like it's running out of memory to store the initial list before it starts grabbing the pages (workflow being that it gets the list of namespaces, then gets the list of pages, and then gets the revisions themselves and inserts everything into the database)? That would certainly be understandable at these sizes, so it should probably... what, be writing the page list to a temp file or something?

What's the best practice here? What should it be doing?
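For illustration, the temp-file idea could look roughly like the sketch below. It is written in Python rather than the PHP of grabText.php, and every name in it (`spool_titles`, `read_titles`, `fake_title_source`) is hypothetical; the fake source stands in for paging through the wiki's API. It assumes titles contain no newline characters, so one title per line is a safe on-disk format.

```python
import tempfile


def spool_titles(titles, spool):
    """Write titles to the spool file one per line and return the count.

    Only one title is held in memory at a time, however long the list is.
    """
    count = 0
    for title in titles:
        spool.write(title + "\n")
        count += 1
    spool.seek(0)  # rewind so the titles can be read back from the start
    return count


def read_titles(spool):
    """Lazily yield titles back from the spool file, one at a time."""
    for line in spool:
        yield line.rstrip("\n")


def fake_title_source(n):
    # Hypothetical stand-in for fetching the page list from the wiki API.
    for i in range(n):
        yield f"Page {i}"


with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as spool:
    total = spool_titles(fake_title_source(1000), spool)
    first = next(read_titles(spool))
```

With this shape, memory use depends on the longest single title rather than on the number of pages, at the cost of one extra pass over the disk.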

I think we should insert into the database in batches so we don't have to store everything in memory, just a few parts.
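A minimal sketch of the batching idea, again in Python and not taken from the actual patch: pages are pulled lazily from a generator and written to the database a fixed number at a time, so at most one batch is in memory. The `page` table layout, the batch size of 500, and the helpers `batched`, `fake_pages`, and `grab_in_batches` are all assumptions made for the example; SQLite stands in for the real database.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fake_pages(n):
    # Hypothetical stand-in for pages fetched from the wiki API.
    for i in range(n):
        yield (i, f"Page {i}", f"text of page {i}")


def grab_in_batches(conn, pages, batch_size=500):
    """Insert pages batch by batch and return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page "
        "(id INTEGER PRIMARY KEY, title TEXT, text TEXT)"
    )
    inserted = 0
    for batch in batched(pages, batch_size):
        conn.executemany(
            "INSERT INTO page (id, title, text) VALUES (?, ?, ?)", batch
        )
        conn.commit()  # each batch is durable before the next one is fetched
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")
total = grab_in_batches(conn, fake_pages(1234), batch_size=500)
stored = conn.execute("SELECT COUNT(*) FROM page").fetchone()[0]
```

Committing after each batch also means that a crash partway through loses at most one batch of work, which matters on a wiki with millions of pages.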

Change 105153 had a related patch set uploaded by Legoktm:
grabText: Don't store entire list of pages in memory

https://gerrit.wikimedia.org/r/105153

Change 105153 merged by Jack Phoenix:
grabText: Don't store entire list of pages in memory

https://gerrit.wikimedia.org/r/105153

Can somebody confirm this is FIXED by the commit in comment 6?
Nemo?

I don't know, my local checkout no longer works. Tentatively closing.