External wiki generates incorrect thumbnails of PDF files from Wikimedia Commons
Closed, Resolved (Public)

Description

Author: danny.leinad

Description:
To reproduce:

  1. Create a new MediaWiki installation, enable the PdfHandler extension, and set $wgUseInstantCommons = true;
  2. On this wiki, open a PDF file from Wikimedia Commons and go to the 2nd, 3rd, or 4th page.
  3. The wiki generates the same thumbnail as for the first page on every page (see the example in URL).

PS: My apologies, but I don't know which Bugzilla component is the right one. The issue is probably connected with the PdfHandler extension or with the $wgUseInstantCommons variable.


Version: 1.18.x
Severity: normal
URL: http://tools.wikimedia.pl/testwiki/w/index.php?title=Plik:Felipe_Ortega,_Flagged_revisions_study_results.pdf&page=3

Details

Reference
bz26548

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:24 PM
bzimport set Reference to bz26548.
bzimport added a subscriber: Unknown Object (MLST).

This sounds like an issue with the way ForeignAPIRepo and PdfHandler work together, but I can't tell offhand which one is to blame.

Giving this to Roan so it doesn't fall through the cracks and since he seems to have an idea about it.

Within Wikimedia wikis this works fine:

http://en.wikisource.org/w/index.php?title=File:Apuntes_Electronica_Transistores.pdf&page=2

So it 'can' work and the solution has apparently been found. But then again, WMF doesn't have the typical InstantCommons / API ForeignRepo config.

danny.leinad wrote:

(In reply to comment #3)

Wikimedia sister projects use thumbnails from Commons. Wikis with InstantCommons generate their own thumbnails.

(In reply to comment #4)

Wikimedia sister projects use thumbnails from Commons. Wikis with
InstantCommons generate own thumbnails.

Caching can be disabled when using wikimediacommons as a foreign file repo. Then the thumbnails are used directly from Commons.

(In reply to comment #4)

Wikimedia sister projects use thumbnails from Commons. Wikis with
InstantCommons generate own thumbnails.

No. Wikis with InstantCommons get their thumbnails from Commons too (either directly, or they are downloaded from Commons and then cached). But they do it in a different way.

The problem is that, at the moment, the API only supports passing the width and height parameters when getting the thumb URL. ForeignApiRepo throws away any other rendering parameters (which includes the page number).

According to bawolff on IRC, this works when using ForeignDBRepo (as WMF does) rather than ForeignApiRepo:

Yeah, ForeignDBRepo is fine
the getThumbUrl function of ForeignApiRepo doesn't even take an argument for pages as far as i can tell
it appears that technically we support passing handler-specific rendering parameters that could be anything

I guess these parameters could be added to ForeignApiRepo like in ForeignDBRepo fairly easily.

Created attachment 8065
possible patch to make instantcommons work with paged media

Here's a first version of a patch to make this work.

The basic issue is:
*Transforming a file into a thumbnail takes several parameters. The most basic are width and height. Paged media (like PDFs) also use a page parameter. Media handlers can define whatever parameters are useful to them (for example, OggHandler uses a seek parameter to determine which frame in a video stream to use as a thumbnail).
*The API only supports making thumbnails with the width and height parameters. No other parameters are supported.
*ForeignApiFile throws away any rendering parameters other than width and height.
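
To make the symptom concrete, here is a minimal Python sketch of the behaviour described in the list above. It is an illustration only, not the MediaWiki code: the function name and the dictionary of rendering parameters are made up for the example, and the query layout simply follows the imageinfo URL quoted elsewhere in this report.

```python
from urllib.parse import urlencode


def thumb_query_width_height_only(title, params):
    """Hypothetical stand-in for the reported behaviour.

    Only width and height are forwarded to the remote API; every other
    rendering parameter (such as page) is silently dropped.
    """
    query = {
        "action": "query",
        "prop": "imageinfo",
        "titles": title,
        "iiprop": "url",
        "format": "json",
    }
    if "width" in params:
        query["iiurlwidth"] = params["width"]
    if "height" in params:
        query["iiurlheight"] = params["height"]
    return urlencode(query)


# Requests for page 1 and page 3 become indistinguishable,
# so both get the thumbnail of the first page.
page1 = thumb_query_width_height_only("File:Example.pdf", {"width": 200, "page": 1})
page3 = thumb_query_width_height_only("File:Example.pdf", {"width": 200, "page": 3})
print(page1 == page3)  # True
```

Because the page number never reaches the remote wiki, no amount of local cache clearing would change the result.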

This patch fixes that by adding a new parameter to the API: iiurlparam. However, there are a couple of things I'm not sure about (which is why this is attached to the bug instead of committed).

*The syntax for specifying rendering parameters to the API is api.php?action=query&prop=imageinfo&titles=file:your_image.pdf&iiprop=url&iiurlwidth=200&iiurlparam=page=3|some_other_parameter=bar
Basically, iiurlparam takes a |-separated list of name=value pairs.
Having name=value inside an API parameter is a little different from how the rest of the API works. However, we can't just have iiurl<param name here> because the parameters could be anything. We could just have iiurlpage, since that is really the only one causing issues at the moment, but that seems kind of half-hearted.
*Does allowing arbitrary rendering parameters introduce any security issues? I'm pretty sure it does not, since I believe they are validated before use, but I'm not 100% sure how the image rendering code actually works. If only to give useful error messages, the parameters should probably be validated inside ApiQueryImageInfo better than what I've done. As it stands, if something is really invalid, it should cause a MediaTransformError, which seems to just cause the API not to output a thumb parameter, with no warnings or errors. Maybe pass the parameters through the image handler's normaliseParams before trying to use them, and error out if that returns false.
*A minor issue: I'm not sure what the deal is with the error codes in this module. The error code returned by http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&titles=image:Example.jpg&iiprop=url&iiurlheight=10&format=xml has a couple too many i's. So do the new error codes I introduced (to be consistent, since I'm not really sure what the conventions for API error codes actually are).
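
For anyone who wants to try the proposed syntax, the following Python sketch builds a query string with a |-separated iiurlparam value and parses it back. The helper names are hypothetical and this is not the patch itself. It assumes parameter names and values never contain | or =, which the proposal above does not address.

```python
from urllib.parse import urlencode, parse_qs


def encode_render_params(params):
    """Join name=value pairs with | as in the proposed iiurlparam syntax."""
    return "|".join(f"{name}={value}" for name, value in params.items())


def decode_render_params(raw):
    """Split a |-separated list of name=value pairs into a dict of strings."""
    result = {}
    for pair in raw.split("|"):
        if not pair:
            continue  # tolerate empty segments such as a trailing |
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed rendering parameter: {pair!r}")
        result[name] = value
    return result


# Query layout follows the imageinfo URL quoted in this report.
query = urlencode({
    "action": "query",
    "prop": "imageinfo",
    "titles": "File:Your_image.pdf",
    "iiprop": "url",
    "iiurlwidth": 200,
    "iiurlparam": encode_render_params({"page": 3, "some_other_parameter": "bar"}),
})

# Simulate the receiving side: decode the query string, then the pairs.
raw = parse_qs(query)["iiurlparam"][0]
print(decode_render_params(raw))
```

Note that urlencode percent-encodes the = and | inside the value, so the pairs survive the trip through the query string intact.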

So anyways, thoughts?

I discovered that media handlers have a validateParam method for validating parameters, which made me feel a whole lot better about this patch.
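
As a rough analogy for why per-handler validation helps, here is a Python sketch that filters caller-supplied rendering parameters through a validator before they are used. Everything in it is an assumption made for illustration: the validator below is a made-up one for paged media, and it does not reproduce the signature or rules of the actual validateParam method.

```python
def is_positive_int(value):
    """True if value is a plain ASCII decimal integer greater than zero."""
    text = str(value)
    return text.isascii() and text.isdigit() and int(text) > 0


def validate_paged_param(name, value):
    """Hypothetical validator for a paged-media handler."""
    if name in ("width", "height", "page"):
        return is_positive_int(value)
    return False  # unknown parameter names are rejected


def split_render_params(params, validate):
    """Separate parameters the handler accepts from those it rejects."""
    accepted, rejected = {}, []
    for name, value in params.items():
        if validate(name, value):
            accepted[name] = value
        else:
            rejected.append(name)
    return accepted, rejected


accepted, rejected = split_render_params(
    {"page": "3", "width": "0", "seek": "12"}, validate_paged_param
)
print(accepted)  # {'page': '3'}
print(rejected)  # ['width', 'seek']
```

Reporting the rejected names back to the caller, rather than dropping them silently, would also address the concern about missing warnings raised in the patch notes.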

Committed in r81558.