
Allowed memory size exhausted when using API
Closed, Declined (Public)

Description

I'm trying to run the following query from my bot account:

http://en.wikipedia.org/w/api.php?format=xml&action=query&generator=categorymembers&prop=imageinfo&iiprop=&iilimit=2&gcmtitle=Category%3AAll%20non-free%20media&gcmlimit=max&maxlag=5

and I'm getting the following error:

PHP fatal error in /usr/local/apache/common-local/php-1.17/includes/Hooks.php line 47:
Allowed memory size of 125829120 bytes exhausted (tried to allocate 87 bytes)

The same query seems to work when I set gcmlimit=1000.

I'm not sure whether anything should be done about this, but if so, the fix is probably either lowering the limit for categorymembers or raising the memory limit.
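As a client-side workaround, here is a minimal sketch (assumed bot code, nothing from MediaWiki itself, and it assumes the current JSON continuation format) that fetches the same listing in smaller pages and follows the continuation block instead of asking for gcmlimit=max in a single request; iiprop=sha1 is used only as an example property:

<?php
// Hypothetical client-side paging: request modest batches and follow the
// API's continuation parameters rather than gcmlimit=max in one go.
$base = 'https://en.wikipedia.org/w/api.php?format=json&action=query'
    . '&generator=categorymembers'
    . '&gcmtitle=' . rawurlencode( 'Category:All non-free media' )
    . '&gcmlimit=500&prop=imageinfo&iiprop=sha1';

$continue = '';
do {
    $data = json_decode( file_get_contents( $base . $continue ), true );
    foreach ( $data['query']['pages'] ?? [] as $page ) {
        // Process each page's imageinfo here.
    }
    // When more results remain, the response carries a "continue" block whose
    // key/value pairs must be appended to the next request.
    $continue = '';
    foreach ( $data['continue'] ?? [] as $key => $value ) {
        $continue .= '&' . $key . '=' . rawurlencode( $value );
    }
} while ( $continue !== '' );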


Version: unspecified
Severity: minor

Details

Reference
bz30751

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 11:57 PM
bzimport set Reference to bz30751.
bzimport added a subscriber: Unknown Object (MLST).

I think this is more likely to be a problem in imageinfo or in the file/media backend.

The above URL now works, but I get the same problem with the query below; it occurs even with a gcmlimit as low as 30, so lowering the limit for categorymembers seems pointless.

http://commons.wikimedia.org/w/api.php?gcmtitle=Category%3ADjVu%20files%20in%20French&generator=categorymembers&gcmlimit=30&prop=imageinfo&action=query&iiprop=sha1&gcmcontinue=file|4d494348415544202d2042494f4752415048494520554e4956455253454c4c4520414e4349454e4e45204554204d4f4445524e45202d2031383433202d20544f4d452031342e444a5655|7531758

PHP fatal error in /usr/local/apache/common-local/php-1.17/includes/objectcache/MemcachedClient.php line 979:
Allowed memory size of 125829120 bytes exhausted (tried to allocate 4589559 bytes)

979: $c_val = gzcompress( $val, 9 );

This category contains DjVu files with a text layer; at the point of failure both the files and the text layers are large, 40-50 MB per file and 4-5 MB of text layer per file. I guess comment 1 is right: quite possibly a problem with file metadata caching.
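A rough back-of-the-envelope tally, using only the sizes quoted in this comment (so assumptions, not measurements), already shows why a batch of 30 such files cannot fit under the limit:

<?php
// Back-of-the-envelope arithmetic with the figures from this comment.
$memoryLimit  = 125829120;          // bytes, from the fatal error, i.e. 120 MB
$filesInBatch = 30;                 // the gcmlimit that already fails
$textLayer    = 4.5 * 1024 * 1024;  // ~4.5 MB of extracted text per file (midpoint of 4-5 MB)

printf( "%.0f MB of text-layer metadata vs. a %.0f MB limit\n",
    $filesInBatch * $textLayer / 1048576, $memoryLimit / 1048576 );
// Prints: 135 MB of text-layer metadata vs. a 120 MB limit, and that is before
// counting the gzcompress() output buffer or the rest of the request.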

I can reproduce the same problem with action=query&titles=<Thirty two titles>&prop=imageinfo&iilimit=1&iiprop=sha1; full URL here: http://fr.wikisource.org/w/index.php?title=Utilisateur:Phe/Test4&oldid=2779398

Besides that, I see no code path in ApiQueryImageInfo.php which asks for metadata unless it is explicitly requested through the URL.

Yeah, the metadata blob goes into the cached File object, so it gets copied around and processed whether you ask for it or not.

Normally that's not a problem, as we don't store megabytes of random text in the metadata field. :) Unfortunately for DjVu images, we do... so if you load up a few of those files at once, *bam* they'll add up fast.

The fix is probably to separate out the extracted text storage into a structured data table, so it's not clogging up the tubes the other 99% of the time we don't need it.
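A minimal sketch of that separation (assumed class and method names, nothing from the actual MediaWiki code base): keep only a small handle in the cached File record and fetch the bulky extracted text on demand.

<?php
// Illustrative only: the text layer lives in its own store and is loaded lazily,
// so cached File objects stay small for the requests that never need it.
class DjvuTextLayer {
    private $fileName;
    private $text = null;

    public function __construct( $fileName ) {
        $this->fileName = $fileName;
    }

    public function getText() {
        if ( $this->text === null ) {
            // Hypothetical lookup against a dedicated structured table,
            // instead of a multi-megabyte blob inside img_metadata.
            $this->text = $this->loadFromTextTable( $this->fileName );
        }
        return $this->text;
    }

    private function loadFromTextTable( $fileName ) {
        return ''; // placeholder for the real query
    }
}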

Philippe, the URL I gave above still doesn't work for me. Maybe you were trying it from a non-bot account that has lower limits?

(In reply to comment #4)

> Philippe, the URL I gave above still doesn't work for me. Maybe you were trying it from a non-bot account that has lower limits?

Right, with a non-bot account gcmlimit is capped at 500 and the URL works.

(In reply to comment #3)
Can the priority of this bug be increased? imageinfo is very useful; sha1 can be used to keep a local cache of files up to date with little burden on the server side. We can get much other useful information which in theory doesn't involve "metadata in the file" but only "metadata stored in the database in the image table"; the current behavior seems against the design of other parts of api.php.
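For context, a sketch of the use case described here (assumed client code, not part of api.php): query iiprop=sha1 for a batch of titles and re-download only the files whose hash differs from the local copy.

<?php
// Hypothetical bot-side cache refresh: cheap sha1 check first, download only on change.
$titles = [ 'File:Example.djvu', 'File:Other.djvu' ];   // placeholder titles
$url = 'https://commons.wikimedia.org/w/api.php?format=json&action=query'
    . '&prop=imageinfo&iiprop=sha1'
    . '&titles=' . rawurlencode( implode( '|', $titles ) );

$localCache = [];   // title => sha1 of the copy already held locally
$data = json_decode( file_get_contents( $url ), true );
foreach ( $data['query']['pages'] ?? [] as $page ) {
    $sha1 = $page['imageinfo'][0]['sha1'] ?? null;
    if ( $sha1 !== null && ( $localCache[$page['title']] ?? null ) !== $sha1 ) {
        // Only now fetch the file itself and record the new hash.
        $localCache[$page['title']] = $sha1;
    }
}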

(In reply to comment #6)

> (In reply to comment #3)
> Can the priority of this bug be increased? imageinfo is very useful; sha1 can be used to keep a local cache of files up to date with little burden on the server side. We can get much other useful information which in theory doesn't involve "metadata in the file" but only "metadata stored in the database in the image table"; the current behavior seems against the design of other parts of api.php.

You can increase its priority, but that doesn't mean it'll get dealt with any quicker. Just FYI.

And bug 30906 needs fixing first.

Lowering priority on high-priority bugs that have a low severity.

Is this still an issue? I think we made it so DjVu metadata is only loaded when absolutely needed (or the image table entry is in cache).

I can't reproduce the original issue, so I guess I'll close this bug.