
need an easy way to see if a filename is in use via the API (including InstantCommons)
Closed, Declined (Public)

Description

Right now, it seems that the only way to check if a file name is already in use on Wikipedia is to do api.php?action=query&titles=<imageTitle>&prop=imageinfo

Then you have to do the following awkward test on the results:
if ( !data.query.pages[-1] || data.query.pages[-1].imageinfo ) {
	// image exists
}

The reason for this is that the API returns a page id for a local image, or -1 for no image. However, it also returns -1 if it finds an image through InstantCommons.
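A minimal sketch of the test above, wrapped in a function and run against hand-written sample responses. The function name `imageExists` and the sample objects are illustrative assumptions. The samples are simplified rather than captured API output; they only mimic the legacy JSON layout, where a title without a local page is keyed by a negative ID.

```javascript
// Sketch of the description's existence test (function name is hypothetical).
// Assumes `data` is the parsed JSON from
// api.php?action=query&titles=<imageTitle>&prop=imageinfo for ONE title.
function imageExists(data) {
  const pages = data.query.pages;
  // No "-1" key: the title has a real page ID, which this test treats as
  // "file exists". A "-1" key: either no file at all, or a file found via
  // InstantCommons; only the latter carries imageinfo.
  return !pages[-1] || Boolean(pages[-1].imageinfo);
}

// Simplified, hypothetical responses for the three cases.
const localFile = { query: { pages: { 12345: { title: "File:Example.jpg", imageinfo: [{}] } } } };
const foreignFile = { query: { pages: { "-1": { title: "File:Example.jpg", imageinfo: [{}] } } } };
const noFile = { query: { pages: { "-1": { title: "File:Nonexistent.jpg", missing: "" } } } };

console.log(imageExists(localFile), imageExists(foreignFile), imageExists(noFile));
// prints: true true false
```

The hardcoded -1 key is what makes this fragile: it only works for a single title, and it relies on the placeholder ID always being -1.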

It would be nice if there was a simpler way to do this. Maybe prop=filenameinuse that just returns true or false. There are two uses for this: 1) Making sure that an image someone wants to use for something on-wiki actually exists before they use it. 2) Making sure that uploading scripts don't upload over other files.


Version: unspecified
Severity: enhancement

Details

Reference
bz29640

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 11:37 PM
bzimport set Reference to bz29640.

I don't fully understand the feature request here. According to your own comment, checking for image existence is somewhat counter-intuitive, but it really only requires two tests. In fact, you can do it with one:

  • append &indexpageids=1 to the URL
  • Use something like if ( 'imageinfo' in data.query.pages[data.query.pageids[0]] )

This is because the image can have either its real page ID, if the description page exists, or -1, if the description page doesn't exist. It's perfectly possible (but atypical) to have an orphaned local image with no description page, in which case you'll also get the -1 behavior.

The &indexpageids=1 trick adds data.query.pageids as an array of page IDs used (specifically for the benefit of JS users), which allows you to be agnostic as to whether you get a real or a fake page ID. It will always be present in the result (even if you passed an invalid title) and it will always contain exactly one element if you ask for exactly one page.
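A sketch of the same check using the indexpageids trick, assuming the request added &indexpageids=1 so that data.query.pageids is present. The function name `imageExistsByIndex` and the sample responses are hypothetical and simplified.

```javascript
// Hypothetical helper: works whether the page ID is real or a negative placeholder.
// Assumes the request was ...&prop=imageinfo&indexpageids=1 for exactly one title.
function imageExistsByIndex(data) {
  const id = data.query.pageids[0]; // the only key in data.query.pages
  // An entry for a title with no file carries no imageinfo key.
  return "imageinfo" in data.query.pages[id];
}

// Simplified sample responses (not captured API output).
const indexedFound = { query: { pageids: ["-1"], pages: { "-1": { title: "File:Example.jpg", imageinfo: [{}] } } } };
const indexedMissing = { query: { pageids: ["-1"], pages: { "-1": { title: "File:Nonexistent.jpg", missing: "" } } } };

console.log(imageExistsByIndex(indexedFound), imageExistsByIndex(indexedMissing)); // true false
```

The code never mentions -1 itself; it simply follows whatever key pageids reports.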

IMO this is a WONTFIX.

I think the issue here is that imageinfo results shouldn't really be returned associated with a *page id* to begin with, as there may or may not be a local wiki page associated with the file.

When querying info on multiple images at once, you seem to end up with a series of negative indexes (-1, -2, etc.):

http://en.wikipedia.org/w/api.php?action=query&titles=File:North_Caucasus_topographic_map-fr.svg|File:Caucasus_Region_26-08-08.PNG&prop=imageinfo&format=jsonfm

Now there may just not be a good alternate way to fit this into the output format of the highly page-centric query actions, but it feels kinda awkward.

The advantage of it is that you *can* extend it easily from querying one file to querying multiple files since you can iterate over the same structure; but you need to know that the returned page ids will often be useless (for instance you can't save that -2 page id and use it in another lookup to get other info about the image, you need to use the title).

What I usually do rather than hardcoding a -1 check (unsafe!) is to simply iterate over the collection, knowing it may contain either 0 or 1 items, and that the 1 item may or may not actually have image info to return:

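// Assumes `data` is the "query" object of the parsed response; $.each needs jQuery.
var imageinfo = false;
if (data && data.pages) {
  // The collection holds 0 or 1 entries, keyed by a real or negative page ID.
  $.each(data.pages, function (i, page) {
    // An entry without imageinfo means the title has no file.
    if ('imageinfo' in page) {
      imageinfo = page.imageinfo;
    }
  });
}
if (imageinfo) {
  // do something with it
} else {
  // didn't find a matching image
}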

This is then fairly easy to extend to handle multiple lookups; as you go through each row check its title to know which one you're dealing with.
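The multiple-lookup extension described above can be sketched without jQuery by keying results on each entry's title rather than on its page ID. This is an illustrative sketch: `imageInfoByTitle` is a hypothetical name. It assumes `data` is the full parsed response, so the pages live under `data.query.pages`, and the sample response is simplified.

```javascript
// Hypothetical helper: map each returned title to its imageinfo, or null when
// the title has no file. Page-ID keys (real or -1, -2, ...) are ignored.
function imageInfoByTitle(data) {
  const result = new Map();
  const pages = (data && data.query && data.query.pages) || {};
  for (const page of Object.values(pages)) {
    result.set(page.title, "imageinfo" in page ? page.imageinfo : null);
  }
  return result;
}

// Simplified sample: one local file, one foreign file, one missing file.
const multiResponse = {
  query: {
    pages: {
      "12345": { title: "File:Local.jpg", imageinfo: [{ width: 800 }] },
      "-1": { title: "File:Foreign.jpg", imageinfo: [{ width: 640 }] },
      "-2": { title: "File:Missing.jpg", missing: "" },
    },
  },
};

const infoByTitle = imageInfoByTitle(multiResponse);
console.log(infoByTitle.get("File:Missing.jpg")); // null
```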

Yeah, the problem is that action=query is really designed to work with local pages, not transcluded files. I agree that there are ways to accomplish the goal, but none of them take less than half an hour to figure out (unless you happen to be Roan or Brion). If a volunteer is just coding something for fun, they may not have the patience to work it out. Most likely, they will write a test that doesn't cover all of the cases and they'll end up with a buggy tool.

While researching this, I learned that a lot of people still use an old pre-API AJAX method to accomplish this task:
sajax_do_call( 'SpecialUpload::ajaxGetExistsWarning', $filename, function() );

It would be nice if these old ajax.js methods were rewritten into API calls that were just as easy to use. Is it even safe to keep using the ajax.js methods or will they be deprecated at some point?

Bryan.TongMinh wrote:

No, you should not rely on the continuing existence of action=ajax. I think ajaxGetExistsWarning is the last thing that prevents us from killing it altogether.

(In reply to comment #4)

Nothing in core uses action=ajax anymore; we're only hanging on to it because of the plethora of extensions that still use it. However, you're right: under NO CIRCUMSTANCES WHATSOEVER should you continue to use action=ajax.

The beauty of the ajax.js method is that it gives you a simple yes-or-no answer. There are many other cases in which the API user simply wants a true or false result rather than a tree of complex data to parse. This is one reason why so many bot writers use third-party API frameworks to interact with MediaWiki rather than using our API directly. Perhaps one day we could implement an action=check API method that is just for doing simple checks that return boolean results.

action=check&prop=filenameexists...
action=check&prop=filehashexists...
action=check&prop=useremailable...

...especially since most of the action=query methods have a weird way of expressing boolean values (empty string for true, undefined for false).
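A small illustration of the boolean encoding described above, and of the pitfall it creates: a flag that is present with an empty-string value is falsy in JavaScript, so a naive truthiness check reads "missing" as false. The helper name `hasFlag` and the sample object are hypothetical.

```javascript
// Hypothetical helper for flags encoded as "" (true) or absent (false).
function hasFlag(obj, name) {
  // Presence is what matters; the value itself is always "".
  return Object.prototype.hasOwnProperty.call(obj, name);
}

// Simplified entry for a title with no file.
const missingPage = { title: "File:Nonexistent.jpg", missing: "" };

console.log(Boolean(missingPage.missing)); // false: the naive check is wrong
console.log(hasFlag(missingPage, "missing")); // true
console.log(hasFlag(missingPage, "known")); // false: flag absent
```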

Setting as "lowest", although judging from the discussion it might even be a WONTFIX, unless something has changed in the past three years.

Is this already fixed? With prop=imageinfo you get back an "imagerepository" property, which contains the name of the repo or an empty string. Set iiprop to an empty string to avoid getting too much extra data.

On WMF wikis the values are usually "local" or "shared", because the WMF configuration calls Commons "shared" [1]. With InstantCommons, the value for foreign images is "wikimediacommons". With JSON output you still have the issue of negative page IDs, but with indexpageids= you get back an array of the page IDs in numeric order, which you can use to look up the values (see the comment above for the implementation).

[1] https://en.wikipedia.org/w/api.php?action=query&meta=filerepoinfo
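A sketch of the imagerepository check, assuming the request was action=query&titles=<imageTitle>&prop=imageinfo&iiprop=&indexpageids=1. Per the comment above, the property holds a repository name such as "local", "shared" or "wikimediacommons", or an empty string when no file was found. The helper name `fileRepository` and the samples are hypothetical and simplified.

```javascript
// Hypothetical helper: return the repository holding the file, or null if
// the name is not in use in any repository the wiki knows about.
function fileRepository(data) {
  const page = data.query.pages[data.query.pageids[0]];
  const repo = page.imagerepository;
  return repo ? repo : null; // "" (or an absent property) means no file
}

const viaInstantCommons = {
  query: { pageids: ["-1"], pages: { "-1": { title: "File:Example.jpg", imagerepository: "wikimediacommons" } } },
};
const noSuchFile = {
  query: { pageids: ["-1"], pages: { "-1": { title: "File:Nonexistent.jpg", missing: "", imagerepository: "" } } },
};

console.log(fileRepository(viaInstantCommons)); // wikimediacommons
console.log(fileRepository(noSuchFile)); // null
```

Returning the repository name rather than a bare boolean keeps the "local vs. foreign" distinction available to callers that need it.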

Umherirrender: I don't think you read the bug description. Yes, there is a way to get the information, but it is extremely convoluted.

What I want is:
api.php?action=filenameinuse&filename=<filename>

which returns:
{ "filenameinuse": true }
...or...
{ "filenameinuse": false }

It doesn't even need to handle multiple filenames.

This is an extremely common use case for bots (and pretty much any software that uploads images to Commons), so we should make this trivially easy.
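Until something like action=filenameinuse exists, the requested interface can be approximated client-side. In this sketch, `buildQueryUrl` and `filenameInUse` are hypothetical helpers: the first builds an ordinary action=query&prop=imageinfo request URL, and the second reduces the parsed response to the proposed { "filenameinuse": true/false } shape. It assumes a bare filename without the "File:" prefix and relies on the global URLSearchParams available in browsers and Node.js.

```javascript
// Hypothetical: build a standard query for one bare filename (no "File:" prefix).
function buildQueryUrl(apiBase, filename) {
  const params = new URLSearchParams({
    action: "query",
    titles: "File:" + filename,
    prop: "imageinfo",
    indexpageids: "1",
    format: "json",
  });
  return apiBase + "?" + params.toString();
}

// Hypothetical: reduce the parsed response to the proposed boolean shape.
function filenameInUse(data) {
  const page = data.query.pages[data.query.pageids[0]];
  return { filenameinuse: "imageinfo" in page };
}

const checkUrl = buildQueryUrl("https://commons.wikimedia.org/w/api.php", "Example.jpg");
console.log(checkUrl);
// https://commons.wikimedia.org/w/api.php?action=query&titles=File%3AExample.jpg&prop=imageinfo&indexpageids=1&format=json

const checkResponse = { query: { pageids: ["-1"], pages: { "-1": { title: "File:Example.jpg", missing: "" } } } };
console.log(JSON.stringify(filenameInUse(checkResponse))); // {"filenameinuse":false}
```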

There's currently a thread on the cultural-partners list discussing the easiest way to mass upload images to Commons. The apparent winner is:

  • Mass upload the images and metadata to Flickr
  • Transfer the images from Flickr to Commons with GWToolset

Have you guys looked at Flickr's API? It's so logical and simple that a 5-year-old could use it. Our API, in comparison, makes people run away screaming. I hope one day I have some time to work on this; otherwise, it would be awesome if someone else did.

Anomie subscribed.

While it's not as simple as requested, checking whether the title's entry has an imageinfo is simple enough. Not supporting multiple titles would be a mistake, since the first feature request would be "support multiple titles". And once you've added support for multiple titles, the imageinfo method is hardly any more difficult to use.
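A sketch of the multi-title version of the imageinfo method, mapping each requested title to true or false. It assumes the response may contain a query.normalized list of { from, to } pairs for titles the API rewrote (for example, a lowercase first letter). The helper name `filenamesInUse` and the sample response are hypothetical and simplified.

```javascript
// Hypothetical helper: requested title -> whether a file exists under it.
function filenamesInUse(requestedTitles, data) {
  // Map titles as sent to the titles the API actually used.
  const renamed = new Map();
  for (const n of data.query.normalized || []) {
    renamed.set(n.from, n.to);
  }
  // Page-ID keys are ignored; each entry is matched by title.
  const existsByTitle = new Map();
  for (const page of Object.values(data.query.pages)) {
    existsByTitle.set(page.title, "imageinfo" in page);
  }
  const result = {};
  for (const title of requestedTitles) {
    const canonical = renamed.get(title) || title;
    result[title] = existsByTitle.get(canonical) === true;
  }
  return result;
}

const batchResponse = {
  query: {
    normalized: [{ from: "File:example.jpg", to: "File:Example.jpg" }],
    pages: {
      "-1": { title: "File:Example.jpg", imageinfo: [{}] },
      "-2": { title: "File:Nonexistent.png", missing: "" },
    },
  },
};

console.log(JSON.stringify(filenamesInUse(["File:example.jpg", "File:Nonexistent.png"], batchResponse)));
// {"File:example.jpg":true,"File:Nonexistent.png":false}
```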

> Have you guys looked at Flickr's API? It's so logical and simple that a 5-year-old could use it. Our API, in comparison, makes people run away screaming.

I think you're exaggerating on both fronts there. I doubt most 5-year-olds will have much luck trying to figure out the intricacies of Flickr's request signing, for example.