
Images embedded in PDF have an excessive resolution
Closed, ResolvedPublic

Description

I see that the ImageMagick -density option (http://www.imagemagick.org/script/command-line-options.php#density) is currently set to 600 dpi: https://gerrit.wikimedia.org/r/#/c/138149/7/lib/index.js,cm

This seems excessive: it makes it very easy to reach tens or hundreds of MB of PDF from a few large pages. For a book, 300 dpi is generally plenty; scans, for instance, rarely go above 400 dpi.


Version: unspecified
Severity: normal
See Also:
http://web.archive.org/web/20110722071118/http://code.pediapress.com/wiki/ticket/463

Details

Reference
bz68576

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:34 AM
bzimport added a project: Collection.
bzimport set Reference to bz68576.

That -density argument doesn't do what you think it does. It just edits the EXIF metadata; it doesn't affect the pixel data at all.

It's necessary because some Commons JPEGs have silly DPI settings (like '1'), which cause xelatex to try to compute absurdly large image sizes.
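To illustrate why a bogus DPI value matters: a typesetter that honours an image's resolution metadata (as this comment says xelatex does) places the image at a natural size of pixels ÷ dpi inches. The sketch below assumes that simple rule; `natural_size_inches` is a hypothetical helper, not part of xelatex or mw-ocg-bundler.

```python
def natural_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Physical size (inches) an image would be laid out at if its
    embedded DPI metadata were taken at face value (assumed rule)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


# A 1200px-wide JPEG tagged with a silly DPI of 1 would come out 1200 inches wide.
print(natural_size_inches(1200, 900, 1))    # -> (1200.0, 900.0)
# The same pixels tagged at 600 dpi give a sane 2 x 1.5 inches.
print(natural_size_inches(1200, 900, 600))  # -> (2.0, 1.5)
```

Under this assumption, rewriting only the DPI metadata changes the layout size but not the number of pixels embedded in the PDF.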

mw-ocg-bundler has a separate size argument, applied when the file contents are fetched. It currently defaults to 1200px.

Created attachment 16190
Chess oversized PDF

(In reply to C. Scott Ananian from comment #1)

which cause xelatex to try to compute absurdly large image sizes.

And what does xelatex do with this dpi info?

I tried https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=First-move+advantage+in+chess&oldid=619135240&writer=rdf2latex from bug 68929, and it produced an unnecessarily big PDF: 500 KB for 20 pages and 3 photos. pdfimages -j extracts a 750x1000px image from the last photo; if it's really that big, it should be reduced.


Created attachment 16191
Comet

Also [[Comet]] is too big, at 2 MB. Is the first image really 5.6 megapixels?


Reopening (otherwise it's hard to find) with a more generic summary (please change as appropriate).
Example: 16 MiB for [[de:Corps]], https://de.wikipedia.org/w/index.php?title=Spezial:Buch&bookcmd=render_article&arttitle=Corps&oldid=60280924&writer=rdf2latex

Is a 1200px x 1200px maximum resolution excessive? What should this be reduced to?

(As a separate issue -- in bug 68836 we are apparently not correctly passing options from the front end to the renderer backend; perhaps we're also ignoring the option set by the OCG frontend here?)

(In reply to C. Scott Ananian from comment #5)

Is a 1200px x 1200px maximum resolution excessive? What should this be reduced to?

If you can't control the actual dpi, perhaps even something like 250x250 would be OK.

We should aim for print-appropriate resolution, since that's one benefit of obtaining a PDF. Is there no way to obtain the resolution closest to 300 PPI for the output format we're targeting?

1200px x 1200px is 300dpi for a 4" wide image (single column). That is "print-appropriate", but many of our articles happen to have a large number of extremely high-quality images. I think what we actually need to do is render at a lower resolution by default and let users "opt in" to full-resolution images by exposing a dpi setting in the book creator.
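The arithmetic behind this comment, as a small Python sketch: width in pixels = inches × dpi, so the pixel count of an image grows with the square of the dpi. The 4-inch single-column width comes from the comment; `pixels_for_width` is a hypothetical helper.

```python
def pixels_for_width(width_in: float, dpi: int) -> int:
    """Pixels needed to print an image `width_in` inches wide at `dpi`."""
    return round(width_in * dpi)


COLUMN_WIDTH_IN = 4.0  # single-column width assumed in the comment

for dpi in (300, 150):
    px = pixels_for_width(COLUMN_WIDTH_IN, dpi)
    # Halving the dpi halves each side, so a square image has a quarter of the pixels.
    print(f"{dpi} dpi -> {px}px wide, {px * px:,} pixels for a square image")
```

Going from 300 to 150 dpi takes the cap from 1200px to 600px and cuts the pixel count of a square image from 1,440,000 to 360,000; the actual JPEG byte savings depend on compression.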

Change 162296 had a related patch set uploaded by Cscott:
Reduce default image resolution to 150dpi.

https://gerrit.wikimedia.org/r/162296

Change 162296 merged by jenkins-bot:
Reduce default image resolution to 150dpi.

https://gerrit.wikimedia.org/r/162296

With the patch in comment 10, [[:de:Corps]] is 11.4MB (down from 16MB in comment 4), but [[First-move advantage in chess]] is still 560k (same as comment 2), with the same 750px x 1000px image. That should have been reduced to 600px by the patch in comment 10, so something is still not quite right here.

Correction: [[First-move advantage in chess]] is down to 384k, and the image is now 600x800. I was looking at an older cached version.
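The change from 750x1000 to 600x800 is consistent with capping the width at 600px (150 dpi × 4 in) while preserving the aspect ratio. A minimal sketch of that rule, assuming the cap applies to width only and images are never upscaled; `fit_to_max_width` is hypothetical and not taken from the OCG code.

```python
def fit_to_max_width(width_px: int, height_px: int, max_width_px: int) -> tuple[int, int]:
    """Scale (width, height) down so width <= max_width_px, keeping the aspect ratio.
    Images that are already narrow enough are returned unchanged (no upscaling)."""
    if width_px <= max_width_px:
        return (width_px, height_px)
    # Multiply before dividing to keep the arithmetic exact for whole-number ratios.
    return (max_width_px, round(height_px * max_width_px / width_px))


print(fit_to_max_width(750, 1000, 600))   # -> (600, 800)
print(fit_to_max_width(750, 1000, 1200))  # -> (750, 1000)
```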

The total size of the JPG images in [[First-move advantage in chess]] is 104k+72k+36k = 212k, so that's 172k of "other stuff", presumably mostly fonts. That doesn't seem unreasonable. As Nemo notes above, since we scale oversized images, we're already doing better than mwlib used to.
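The same accounting as a quick check, using the image sizes quoted in this comment and the 384k PDF size reported for this article after the 150dpi patch: a bit under half of the file is non-image data.

```python
image_sizes_kb = [104, 72, 36]  # embedded JPEG sizes quoted in the comment
pdf_size_kb = 384               # PDF size reported after the 150dpi patch

images_total = sum(image_sizes_kb)
other_kb = pdf_size_kb - images_total  # fonts and other non-image content
print(images_total, other_kb, f"{other_kb / pdf_size_kb:.0%}")  # -> 212 172 45%
```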

There's a GC bug, fixed by a patch I'm deploying today, so the Ganglia info might not be strictly accurate.

Nevertheless, our PDFs are chunky. I haven't seen any evidence that they are *more* chunky than the old mwlib PDFs, however -- if anything, I believe them to be rather *smaller*. It's just a consequence of embedding print-quality images.

If I remember correctly, mwlib fetches images at a standard size (max 1200px wide). In our experience, larger images did not lead to a noticeable increase in output quality. Download PDFs might have used even smaller images.

I believe this was fixed as part of bug 72377.

*** This bug has been marked as a duplicate of bug 72377 ***

(In reply to C. Scott Ananian from comment #16)

I believe this was fixed as part of bug 72377.

Probably!

(In reply to Nemo from comment #4)

Example: 16 MiB for [[de:Corps]], https://de.wikipedia.org/w/index.php?title=Spezial:Buch&bookcmd=render_article&arttitle=Corps&oldid=60280924&writer=rdf2latex

It's now 1.8 MiB. :)