
importImages.php with option --source-wiki-url uses latest comment as metadata, violating CC licenses
Open, Medium, Public

Description

importImages.php contains code to obtain the metadata from another wiki (e.g. Commons) when importing images from that wiki. However, the "metadata" is erroneously taken to be the upload comment of the most recent file revision. An example shows why this is misleading. When importing:

http://commons.wikimedia.org/wiki/File:Esox_lucius1.jpg

the actual metadata (which include the author "Timothy Knepp") are ignored, and instead the upload comment of the second (latest) revision, i.e. the text "bigger version", is treated as the metadata.


To reproduce: Download http://upload.wikimedia.org/wikipedia/commons/c/c5/Esox_lucius1.jpg and place it in SOME_PATH (your choice).

Then go into the root of your wiki and run (replacing YOURNAME and SOME_PATH):

php ./maintenance/importImages.php --conf ./LocalSettings.php --user YOURNAME --source-wiki-url 'http://commons.wikimedia.org/w/' SOME_PATH


Presently, the function getFileCommentFromSourceWiki( $wiki_host, $file ) is used, which in this case requests:

http://commons.wikimedia.org/w/api.php?action=query&format=xml&titles=File:Esox_lucius1.jpg&prop=imageinfo&iiprop=comment

Instead, a new function getMetadataFromSourceWiki is needed, which calls something like:

http://commons.wikimedia.org/w/api.php?action=query&format=xml&prop=revisions&rvprop=content&titles=File:Esox_lucius1.jpg
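
To make the difference between the two requests concrete, here is a minimal sketch that builds both URLs from the same base. The helper name buildSourceWikiQuery is hypothetical and not part of importImages.php, and trimming the trailing slash with rtrim() is my own assumption about how the --source-wiki-url value should be handled.

```php
<?php
/**
 * Build an api.php query URL for a file on the source wiki.
 * Hypothetical helper for illustration; not part of importImages.php.
 */
function buildSourceWikiQuery( string $wikiHost, string $file, array $params ): string {
    $query = array_merge(
        [ 'action' => 'query', 'format' => 'xml', 'titles' => 'File:' . $file ],
        $params
    );
    // rtrim() avoids a double slash when the base URL ends in "/" (assumption).
    return rtrim( $wikiHost, '/' ) . '/api.php?' . http_build_query( $query );
}

$host = 'http://commons.wikimedia.org/w/';

// Current behaviour: upload comment of the latest file revision.
$commentUrl = buildSourceWikiQuery( $host, 'Esox_lucius1.jpg',
    [ 'prop' => 'imageinfo', 'iiprop' => 'comment' ] );

// Proposed behaviour: wikitext of the file description page.
$contentUrl = buildSourceWikiQuery( $host, 'Esox_lucius1.jpg',
    [ 'prop' => 'revisions', 'rvprop' => 'content' ] );

echo $commentUrl, "\n", $contentUrl, "\n";
```

Only the prop-related parameters differ: imageinfo/iiprop=comment returns the upload comment, while revisions/rvprop=content returns the description page text that carries the author and license information.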

We currently run it with the following code in importImages.inc (using index.php rather than api.php, however):

function getMetadataFromSourceWiki( $wiki_host, $file ) {
	# Fetch the raw wikitext of the file description page.
	# example: http://commons.wikimedia.org/w/index.php?action=raw&title=File:Esox_lucius1.jpg
	$url = $wiki_host . '/index.php?action=raw&title=File:' . rawurlencode( $file );

	$body = Http::get( $url );
	return html_entity_decode( $body );
}

In importImages.php, this function is then called as:

$real_comment = getMetadataFromSourceWiki( $options['source-wiki-url'], $base );

replacing the present getFileCommentFromSourceWiki call.
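
If the api.php route (prop=revisions&rvprop=content) were used instead of index.php?action=raw, the page text would have to be extracted from the response. The following is only a sketch under stated assumptions: it assumes format=json with the legacy layout, in which the wikitext sits under the "*" key of the first revision, and both the helper name extractPageText and the sample response are made up for illustration.

```php
<?php
/**
 * Extract the wikitext of the first page found in an api.php response
 * (action=query&prop=revisions&rvprop=content&format=json).
 * Hypothetical helper; returns null if no content is found.
 */
function extractPageText( string $json ): ?string {
    $data = json_decode( $json, true );
    if ( !is_array( $data ) || !isset( $data['query']['pages'] ) ) {
        return null;
    }
    foreach ( $data['query']['pages'] as $page ) {
        // Assumed legacy JSON layout: wikitext under the "*" key.
        if ( isset( $page['revisions'][0]['*'] ) ) {
            return $page['revisions'][0]['*'];
        }
    }
    return null;
}

// Made-up sample response, shaped like the assumed legacy JSON output.
$sample = json_encode( [ 'query' => [ 'pages' => [ '123' => [
    'title' => 'File:Example.jpg',
    'revisions' => [ [ '*' => "{{Information\n|author=Jane Doe\n}}" ] ],
] ] ] ] );

var_dump( extractPageText( $sample ) );
// A missing page has no revisions, so the helper returns null.
var_dump( extractPageText( '{"query":{"pages":{"-1":{"missing":""}}}}' ) );
```

Returning null for a missing page lets the caller decide what to do, for example falling back to the upload comment or skipping the description entirely.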


Please update trunk, so we no longer have to patch.


Version: 1.18.x
Severity: normal

Details

Reference
bz30582

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:55 PM
bzimport set Reference to bz30582.
bzimport added a subscriber: Unknown Object (MLST).

Could you supply the patch you're using as a file in diff format attached to this bug?