
API imageinfo returns too few results when using lots of titles
Closed, Invalid | Public

Description

The following "QUERY I" returns only 50 imageinfos although the maximum should be 500 (and it adds a query-continue parameter to the result because the imageinfo of some files is not included), while a similar query ("QUERY II") returns 67 imageinfos.

QUERY I:

var totalII=0;

$.get('//commons.wikimedia.org/w/api.php?action=query&cllimit=500&clprop=hidden&format=json&iilimit=500&iiprop=url%7Csize%7Cmetadata%7Ctimestamp%7Cuser%7Csha1%7Ccomment%7Cmime&iiurlheight=120&iiurlwidth=120&prop=imageinfo%7Cinfo%7Ccategories&titles=File%3ABot-Test.jpg%7CFile%3ALead%20Photo%20For%20DonateImage0-9199360352940857.jpg%7CFile%3ALead%20Photo%20For%20DonateImage0-8110956577584147.png%7CFile%3A2013-02-10-MyPOTY-link-of-EnhancedPOTY.png%7CFile%3A2012-POTY-Galleries-stats-has-to-be-uploaded.png%7CFile%3ATestfile%20upload%20by%20url.jpg%7CFile%3ABouchon%20de%20carafe%20-%202-2013-06-01.jpg%7CFile%3AMei%20Foo%20Sun%20Chuen%20Map.svg%7CFile%3AOm%20beach%20Gokarna.JPG%7CFile%3AGlobalUsageUI--2012-12-12--gallery-box--Abbaye%20de%20Cluny%202012%2026.png%7CFile%3AWikivoyage-logo.svg%7CFile%3ACommons%20Nominate%20for%20Deletion%20gadget%20de.png%7CFile%3A4-Acetylamino-3-Methyl-1-Phenylpyrazolone.svg%7CFile%3AWegweiser%20Wikivoyage%20Seitenaufbau%202012-12-03.png%7CFile%3ALidocaine%20substance%20photo%202.jpg%7CFile%3ALidocaine%20substance%20photo.jpg%7CFile%3ASulfacetamide%20substance%20photo.jpg%7CFile%3AChemDrugs%20Rillke.jpg%7CFile%3ATriamterene%20substance%20photo.jpg%7CFile%3AThymol%20substance%20photo%202.jpg%7CFile%3AThymol%20substance%20photo.jpg%7CFile%3AMethimazole%20substance%20photo.jpg%7CFile%3ASulfathiazole%20substance%20photo.jpg%7CFile%3ATetracycline-HCl%20substance%20photo.jpg%7CFile%3AStrychninnitrat%20substance%20photo%202.jpg%7CFile%3ASaccharin-Na%20substance%20photo.jpg%7CFile%3ASalicylamide%20substance%20photo.jpg%7CFile%3AResorcin%20substance%20photo.jpg%7CFile%3AProcaine-HCl%20substance%20photo.jpg%7CFile%3APiroxicam%20substance%20photo.jpg%7CFile%3APilocarpine-HCl%20substance%20photo.jpg%7CFile%3APhysostigmine%20salicylate%20substance%20photo.jpg%7CFile%3APhenol%20substance%20photo.jpg', function(r) {

  // Sum the number of imageinfo entries over all returned pages.
  $.each(r.query.pages, function(id, pg) {
    if (pg.imageinfo) {
      console.log(pg.imageinfo.length);
      totalII += pg.imageinfo.length;
    }
  });
  console.log('----\n', totalII);

});

Params (for easier readability):
action query
cllimit 500
clprop hidden
format json
iilimit 500
iiprop url|size|metadata|timestamp|user|sha1|comment|mime
iiurlheight 120
iiurlwidth 120
intoken
prop imageinfo|info|categories
titles File:Bot-Test.jpg|File:Lead Photo For DonateImage0-9199360352940857.jpg|File:Lead Photo For DonateImage0-8110956577584147.png|File:2013-02-10-MyPOTY-link-of-EnhancedPOTY.png|File:2012-POTY-Galleries-stats-has-to-be-uploaded.png|File:Testfile upload by url.jpg|File:Bouchon de carafe - 2-2013-06-01.jpg|File:Mei Foo Sun Chuen Map.svg|File:Om beach Gokarna.JPG|File:GlobalUsageUI--2012-12-12--gallery-box--Abbaye de Cluny 2012 26.png|File:Wikivoyage-logo.svg|File:Commons Nominate for Deletion gadget de.png|File:4-Acetylamino-3-Methyl-1-Phenylpyrazolone.svg|File:Wegweiser Wikivoyage Seitenaufbau 2012-12-03.png|File:Lidocaine substance photo 2.jpg|File:Lidocaine substance photo.jpg|File:Sulfacetamide substance photo.jpg|File:ChemDrugs Rillke.jpg|File:Triamterene substance photo.jpg|File:Thymol substance photo 2.jpg|File:Thymol substance photo.jpg|File:Methimazole substance photo.jpg|File:Sulfathiazole substance photo.jpg|File:Tetracycline-HCl substance photo.jpg|File:Strychninnitrat substance photo 2.jpg|File:Saccharin-Na substance photo.jpg|File:Salicylamide substance photo.jpg|File:Resorcin substance photo.jpg|File:Procaine-HCl substance photo.jpg|File:Piroxicam substance photo.jpg|File:Pilocarpine-HCl substance photo.jpg|File:Physostigmine salicylate substance photo.jpg|File:Phenol substance photo.jpg

RESULT: 50 imageinfos



QUERY II:

var totalII=0;

$.get('//commons.wikimedia.org/w/api.php?action=query&cllimit=500&clprop=hidden&format=json&iilimit=500&iiprop=url|size|metadata|timestamp|user|sha1|comment|mime&iiurlheight=120&iiurlwidth=120&intoken=&prop=imageinfo|info|categories&titles=File:Test.svg|File:Standard%20time%20zones%20of%20the%20world.png|File:Map%20of%20US%20gas%20chamber%20usage.svg|File:Map%20of%20US%20firing%20squad%20usage.svg', function(r) {

  // Sum the number of imageinfo entries over all returned pages.
  $.each(r.query.pages, function(id, pg) {
    if (pg.imageinfo) {
      console.log(pg.imageinfo.length);
      totalII += pg.imageinfo.length;
    }
  });
  console.log('----\n', totalII);

});

Params (for easier readability):
action query
cllimit 500
clprop hidden
format json
iilimit 500
iiprop url|size|metadata|timestamp|user|sha1|comment|mime
iiurlheight 120
iiurlwidth 120
intoken
prop imageinfo|info|categories
titles File:Test.svg|File:Standard time zones of the world.png|File:Map of US gas chamber usage.svg|File:Map of US firing squad usage.svg

RESULT: 67 imageinfos


Version: 1.21.x
Severity: normal

Details

Reference
bz46551

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 1:18 AM
bzimport set Reference to bz46551.
bzimport added a subscriber: Unknown Object (MLST).

The API is specifically allowed to return fewer than the requested number of results, e.g. if the size of the entire result set would be too large or (as in this case) if calculating the result set would take excessive processing.

(In reply to comment #1)
Is this new? I and the users of my tools didn't notice this issue in previous MW-Versions.

Even as a sysop I only get 50!

And it breaks all my tools listing user uploads (ok, they will use query-continue [properly] in future) but this will not lower "excessive processing"; it will even add more traffic. The only difference will be a slightly bigger delay in my tools. Note that there is no API query for user uploads! I have to use complicated logic on the client side to make such a feature available.
See Bug 26872
and https://commons.wikimedia.org/w/index.php?title=Commons:MyGallery&withJS=MediaWiki:JSONListUploads.js
and https://commons.wikimedia.org/wiki/MediaWiki:JSONListUploads.js
VisualFileChange will possibly also suffer from this bug.

Or do you just need more API-requests to alter statistics of API usage in a way to "prove" something?

(In reply to comment #2)

> Is this new? I and the users of my tools didn't notice this issue in previous MW-Versions.

The fact that prop=imageinfo can return fewer values than requested? No, that has been around since r46845 in early 2009.

That prop=imageinfo chooses to do so to limit the number of thumbnails generated in one query? Yes, that was added in Gerrit change 47189 merged last month.

> (ok, they will use query-continue [properly] in future)

There's the proper fix for your problem.
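
A minimal sketch of what following query-continue could look like on the client. It assumes the legacy response shape in which `query-continue` maps each module name to the parameters to add to the next request. The names `nextParams`, `countImageInfo` and `countAll` are hypothetical, and `Object.assign` stands in for `$.extend` so the helpers also run without jQuery.

```javascript
// Build the parameters for the follow-up request from a legacy
// "query-continue" response, or return null when nothing is left to fetch.
function nextParams(baseParams, response) {
  var cont = response['query-continue'];
  if (!cont) {
    return null;
  }
  var next = Object.assign({}, baseParams);
  // Each module (e.g. imageinfo, categories) contributes its own
  // continuation parameters; merge all of them into the next request.
  Object.keys(cont).forEach(function (moduleName) {
    Object.assign(next, cont[moduleName]);
  });
  return next;
}

// Count the imageinfo entries contained in a single response.
function countImageInfo(response) {
  var total = 0;
  var pages = (response.query && response.query.pages) || {};
  Object.keys(pages).forEach(function (id) {
    if (pages[id].imageinfo) {
      total += pages[id].imageinfo.length;
    }
  });
  return total;
}

// Hypothetical driver: repeats the request until no continuation is returned.
// Simplification: if several prop modules continue at different rates, some
// entries may be delivered more than once and would need de-duplication.
function countAll(apiUrl, baseParams, done) {
  var total = 0;
  (function step(params) {
    $.getJSON(apiUrl, params, function (r) {
      total += countImageInfo(r);
      var next = nextParams(baseParams, r);
      if (next) {
        step(next);
      } else {
        done(total);
      }
    });
  })(baseParams);
}
```

With the parameters from QUERY I passed as an object, `countAll('//commons.wikimedia.org/w/api.php', params, function (n) { console.log(n); })` would keep requesting until the API stops returning `query-continue`.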

> but this will not lower "excessive processing"; it will even add more traffic.

There are concerns other than total CPU time used across the several queries. Generating too many thumbnails in a single query caused other problems.

> Or do you just need more API-requests to alter statistics of API usage in a way to "prove" something?

Assume good faith, please.

(In reply to comment #3)

> Generating too many thumbnails in a single query caused other problems.

That is a general statement, like "Apparently calls to File::transform can be slow" (Gerrit change #47189) and "File::transform()'s worst-case performance" (http://lists.wikimedia.org/pipermail/wikitech-l/2013-February/066102.html).

I'd like to see the tests.

And if it is that slow because it has to compute MD5 each time, wouldn't it be desirable to also store these 2 letters for the ugly hashed directory structure in the "Image" table?
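
For reference, a rough sketch of how the hashed directory prefix can be derived, assuming the default hashed upload layout (first hex digit, then first two hex digits of the MD5 of the file name with spaces replaced by underscores). `hashPath` is a hypothetical helper, it uses Node's `crypto` module rather than anything from MediaWiki, and it says nothing about whether this step is actually the slow part.

```javascript
// Node's built-in crypto module (not part of MediaWiki or jQuery).
const nodeCrypto = require('crypto');

// Return the relative hashed path for a file name, e.g. "x/xy/Name.jpg",
// where "x" and "xy" are the first one and two hex digits of the MD5.
function hashPath(fileName) {
  // Assumption: the stored name uses underscores instead of spaces.
  const key = fileName.replace(/ /g, '_');
  const md5 = nodeCrypto.createHash('md5').update(key, 'utf8').digest('hex');
  return md5.charAt(0) + '/' + md5.substring(0, 2) + '/' + key;
}

console.log(hashPath('Phenol substance photo.jpg'));
```

The two directory levels depend only on the file name, which is why the comment above suggests they could be stored once instead of being recomputed.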