importImages.php contains code to obtain the metadata from another wiki (e.g. Commons) when importing images from that wiki. However, the "metadata" are erroneously taken to be the upload comment of the most recent revision. An example shows why this is misleading. When importing:
http://commons.wikimedia.org/wiki/File:Esox_lucius1.jpg
the actual metadata (which include the author "Timothy Knepp") are ignored, and instead the upload comment of the second (latest) revision, i.e. the text "bigger version", is treated as the metadata.
To reproduce: Download http://upload.wikimedia.org/wikipedia/commons/c/c5/Esox_lucius1.jpg and place it in SOME_PATH (a directory of your choice).
Then go to the root of your wiki and run (replacing YOURNAME and SOME_PATH):
php ./maintenance/importImages.php --conf ./LocalSettings.php --user YOURNAME --source-wiki-url 'http://commons.wikimedia.org/w/' SOME_PATH
Presently, the function getFileCommentFromSourceWiki( $wiki_host, $file ) is used, which runs in this case:
Instead, a new function getMetadataFromSourceWiki is needed, calling something like:
We run it with the following code in importImages.inc (using index.php however):
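function getMetadataFromSourceWiki( $wiki_host, $file ) {
	// Fetch the raw wikitext of the file description page from the source wiki.
	// rtrim() avoids a double slash when --source-wiki-url ends in '/'.
	$url = rtrim( $wiki_host, '/' ) . '/index.php?action=raw&title=File:' . rawurlencode( $file );
	$body = Http::get( $url );
	if ( $body === false ) {
		return false; // request failed; do not decode a non-string
	}
	return html_entity_decode( $body );
}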
which in importImages.php is then called as:
$real_comment = getMetadataFromSourceWiki( $options['source-wiki-url'], $base );
This replaces the present getFileCommentFromSourceWiki call.
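For anyone who wants to check the URL handling without a MediaWiki install, here is a rough standalone sketch of the same idea. The names buildRawDescriptionUrl and fetchDescriptionText are made up for this illustration and are not part of MediaWiki; the $fetch argument is assumed to be any callable that takes a URL and returns the body as a string, or false on failure (inside MediaWiki, Http::get would play that role).

```php
<?php
// Hypothetical helper (not in MediaWiki): builds the URL of the raw
// wikitext of a file description page on the source wiki.
function buildRawDescriptionUrl( $wikiHost, $file ) {
	// Drop trailing slashes so 'http://host/w/' and 'http://host/w'
	// produce the same URL.
	$base = rtrim( $wikiHost, '/' );
	return $base . '/index.php?action=raw&title=File:' . rawurlencode( $file );
}

// Hypothetical wrapper: fetches the description page text through $fetch.
// $fetch is assumed to return a string on success and false on failure.
function fetchDescriptionText( $wikiHost, $file, $fetch ) {
	$body = call_user_func( $fetch, buildRawDescriptionUrl( $wikiHost, $file ) );
	if ( $body === false ) {
		// Pass the failure on so the caller can fall back to another comment.
		return false;
	}
	return html_entity_decode( $body );
}

// Prints: http://commons.wikimedia.org/w/index.php?action=raw&title=File:Esox_lucius1.jpg
echo buildRawDescriptionUrl( 'http://commons.wikimedia.org/w/', 'Esox_lucius1.jpg' ), "\n";
```

Passing the fetcher in as an argument is only a convenience for trying this outside the wiki; in importImages.inc the direct Http::get call shown above should do.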
Please update trunk, so we no longer have to patch.
Version: 1.18.x
Severity: normal