Better API for Picture of the Day
Closed, Declined (Public)

Description

Author: bugzilla

Description:
I'm trying to create a Grilo[1] source for the Wikimedia Commons Picture of the Day.

Ideally, the API on the server side would look a bit like the one used by the Guardian:
http://explorer.content.guardianapis.com/#/search?tag=world%2Fseries%2Feyewitness%2Ctype%2Fpicture&show-fields=all

This shows me a separate field for each interesting bit of metadata (title, author, publication date, thumbnail URL, etc.).

In contrast, the Wikimedia API requires me to do something like:
https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=expandtemplates&format=json&text=%3Citem%3E%3Cfilename%3E%7B%7BTemplate%3APotd%2F2014-02-03%7D%7D%3C%2Ffilename%3E%20%3Ctitle%3E%7B%7BTemplate%3APotd%2F2014-02-03_(en)%7D%7D%3C%2Ftitle%3E%3C%2Fitem%3E%20%3Citem%3E%3Cfilename%3E%7B%7BTemplate%3APotd%2F2014-02-02%7D%7D%3C%2Ffilename%3E%20%3Ctitle%3E%7B%7BTemplate%3APotd%2F2014-02-02_(en)%7D%7D%3C%2Ftitle%3E%3C%2Fitem%3E

And parse the unstructured items by hand to get the titles, and further:
https://commons.wikimedia.org/wiki/Special:ApiSandbox#action=query&prop=imageinfo&format=json&iiprop=timestamp%7Cuser%7Cuserid%7Ccomment%7Cparsedcomment%7Ccanonicaltitle%7Curl%7Csize%7Cdimensions%7Csha1%7Cmime%7Cthumbmime%7Cmediatype%7Cmetadata%7Ccommonmetadata%7Cextmetadata%7Carchivename%7Cbitdepth%7Cuploadwarning&iilimit=1&titles=File%3ANickel_electrolytic_and_1cm3_cube.jpg

To get the thumbnail and file URLs (I haven't figured out how to get those details for multiple files yet).
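
On the multiple-files point: the action API accepts several page titles in one `titles` parameter, separated by `|`, so a single imageinfo request can cover more than one file. Below is a minimal sketch in Python that only builds such a URL and does not send it. The helper name `build_imageinfo_url` and the second file title are made up for illustration, and `iiurlwidth` is used on the assumption that a scaled thumbnail URL is wanted.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def build_imageinfo_url(file_titles, thumb_width=640):
    """Build one imageinfo query URL covering several File: pages.

    build_imageinfo_url is a hypothetical helper, not part of any library.
    """
    params = {
        "action": "query",
        "prop": "imageinfo",
        "format": "json",
        # A reduced iiprop list; add more properties as needed.
        "iiprop": "url|size|mime|user|timestamp",
        # Ask for a thumbnail URL scaled to this width (assumed use case).
        "iiurlwidth": thumb_width,
        # Multiple titles are joined with "|".
        "titles": "|".join(file_titles),
    }
    return API + "?" + urlencode(params)

url = build_imageinfo_url([
    "File:Nickel_electrolytic_and_1cm3_cube.jpg",
    "File:Example.jpg",  # placeholder second title for illustration
])
print(url)
```

The response should then contain one entry per file under `query.pages`, each with its own `imageinfo` list.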


Version: unspecified
Severity: normal

Details

Reference
bz61956

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 22 2014, 2:57 AM)
bzimport set Reference to bz61956.
bzimport added a subscriber: Unknown Object (MLST).

The core MediaWiki API is concerned with accessing data provided by MediaWiki, not with taking wiki pages and trying to extract structured data from their content.

You could use the tool someone has created at http://tools.wmflabs.org/potd-feed/potd.php to fetch relevant metadata. The developer of that tool would probably be willing to work with you.

Or you could try to convince the communities on Commons and other wikis with PotDs to somehow make this data available in a more structured manner. What that manner might be I don't know. Maybe it would involve Wikidata.

Or you could write a MediaWiki extension that manages structured data related to the PotD and exposes it via an API module, then get it deployed on WMF wikis, and then convince the communities on Commons and other wikis with PotDs to change their setup to use your extension.

Or, more generically, you might pursue some method for embedding/associating structured data in/with wiki pages. There may even be people working on this sort of thing already. Again, Wikidata might well be this.

In any case, as filed against MediaWiki API this bug isn't going to be fixed, and it's not clear to me where it might be reassigned to, so I'm going to close it as WONTFIX. If someone wants to reassign it to some other component where it would be fixable, I encourage them to do so and reopen it.

Just throwing some pointers:

* FeaturedFeeds [1] is enabled on Commons. Would that help you here?

* We do have some machine-readable data on Commons, see [2]

[1] https://www.mediawiki.org/wiki/Extension:FeaturedFeeds
[2] https://commons.wikimedia.org/wiki/Commons:Machine-readable_data

Steinsplitter moved this task from Uploading to Backlog on the Commons board.

CC'ing @Spage as he's interested in seeing a potential implementation of this. :)

I also contacted the original reporter of this ticket (from Bugzilla times) via email, asking whether the last reply by JeanFred was helpful.

Yup, that talks about the featuredfeed action API provided by Extension:FeaturedFeeds. The problem is the RSS feed is not the greatest format to discover the picture of the day and get its data. I'm not sure myself how to do it from JavaScript.
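
For illustration, here is a rough sketch of reading that feed, in Python rather than JavaScript. The parameter names (`feed`, `feedformat`, `language`) and the `potd` feed name are my understanding of the FeaturedFeeds API module and should be checked against the extension's documentation. Both helper functions are hypothetical. The sketch parses plain RSS 2.0 items and does nothing network-related on its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def build_feed_url(feed="potd", language="en"):
    """Build a featuredfeed URL (parameter names are assumptions, see above)."""
    params = {
        "action": "featuredfeed",
        "feed": feed,
        "feedformat": "rss",
        "language": language,
    }
    return API + "?" + urlencode(params)

def parse_feed_items(rss_text):
    """Return title, link and description for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # The description is typically an HTML fragment, not structured data.
            "description": item.findtext("description", default=""),
        })
    return items
```

This shows the limitation mentioned above: the file name and caption still have to be dug out of the HTML in each item's description.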

Another approach is to ask for the templates that the Commons Picture of the Day uses to ease its administration, which is basically what the submitter is doing.

Well, given that from a MW perspective the picture of the day isn't really specially identified, there's not much for the API to do.

Perhaps if information about the picture of the day were included in Wikidata, it would make more sense to get the info from the API.

From https://www.mediawiki.org/wiki/API:Showing_interesting_content

api.php?action=query&prop=images&format=json&formatversion=2&titles=Template:Potd

This will often be out of date. (It reflects the results as they were on the date stored in the page_links_updated field in the database. That field isn't even shown in the info prop of the API, though.)

Well, there isn't exactly a great way to get this data. I think the best of the available choices would be to have the user determine the day they want the featured picture for, in YYYY-MM-DD format, and then use:
https://commons.wikimedia.org/w/api.php?action=query&generator=images&titles=Template:Potd/2015-08-02&prop=imageinfo&iiprop=size|user|mime|extmetadata
replacing 2015-08-02 with the date they want. (The biggest downside is that it's fragile: it assumes Commons doesn't change its templates.)
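
A minimal Python sketch of that approach, assuming the `Template:Potd/YYYY-MM-DD` naming stays as it is. It adds `format=json`, which the URL above does not specify. The helper names are hypothetical, and the response handling assumes the usual `query.pages` layout (a dict keyed by page ID, or a list under `formatversion=2`).

```python
import datetime
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def build_potd_url(day):
    """Build the generator=images query for the PotD template of a given date."""
    params = {
        "action": "query",
        "format": "json",  # added here; not in the URL quoted above
        "generator": "images",
        "titles": "Template:Potd/" + day.isoformat(),  # YYYY-MM-DD
        "prop": "imageinfo",
        "iiprop": "size|user|mime|extmetadata",
    }
    return API + "?" + urlencode(params)

def extract_images(response):
    """Pull a few imageinfo fields out of a decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # default format keys pages by page ID
        pages = list(pages.values())
    results = []
    for page in pages:
        # Use the first imageinfo entry, or an empty dict if none is present.
        info = (page.get("imageinfo") or [{}])[0]
        results.append({
            "title": page.get("title"),
            "user": info.get("user"),
            "mime": info.get("mime"),
            "width": info.get("width"),
            "height": info.get("height"),
        })
    return results

url = build_potd_url(datetime.date(2015, 8, 2))
```

If the template for a date does not exist or embeds no image, `extract_images` simply returns an empty list, so callers should treat that as "no picture found" rather than as an error.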