Page MenuHomePhabricator

Use VipsScaler to scale TIFFs
Closed, ResolvedPublic

Description

14:58 < bawolff> greg-g: The most likely change I imagine people will want is it to start working for tiffs
14:58 < bawolff> greg-g: Which I believe needs a mild amount of dev work to happen
14:58 < bawolff> greg-g: Which I believe needs a mild amount of dev work to happen


Version: unspecified
Severity: enhancement
See Also:
T51118: Limit number of image scaling attempts
T47212: TIF less focused than JPG equivalent (due to conditional sharpening applied to JPEGs and not TIFFS)

Details

Reference
bz52045

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:47 AM
bzimport set Reference to bz52045.
bzimport added a subscriber: Unknown Object (MLST).

Bryan.TongMinh wrote:

If I recall correctly, TIFFs are handled by the PagedTiffHandler extension, which should have some basic support for Vips.

Change 86413 had a related patch set uploaded by Brian Wolff:
Make PagedTiffHandler work with VipsScaler.

https://gerrit.wikimedia.org/r/86413

Change 86416 had a related patch set uploaded by Brian Wolff:
[WIP] Add support for page numbers to VipsScaler.

https://gerrit.wikimedia.org/r/86416

  • Bug 25647 has been marked as a duplicate of this bug. ***

IMO, this is a feature request which looks more and more like a bug report!

Here are the first historical picture uploaded by the Swiss National Library (WiR program):

Not really what I would call "an encouraging start"...

Would be really great is someone could take a few minutes to review this almost 5 months old patch.

https://gerrit.wikimedia.org/r/#/c/86413/ is still awaiting patch review by somebody from the Multimedia team. See comment 6 why this is important for GLAM.

(In reply to Andre Klapper from comment #8)

https://gerrit.wikimedia.org/r/#/c/86413/ is still awaiting patch review by
somebody from the Multimedia team. See comment 6 why this is important for
GLAM.

Its actually waiting on me to finish it (note the [wip] tag). Which I hope to do real soon now(tm)

With regard to the related story (https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/482), as a work-around I am going to try doing a limited run of the images from NYPL but using just one thread. My "problematic" run was using at least 10 threads in parallel. This seems a pragmatic approach if the operational load issue was primarily the quantity of files failing to render per minute.

Should I be able to use the GWToolset this way, it may be possible to reduce the likelyhood of any future incidents by making the use of low numbers of threads when uploading "very large" tiffs a good practice for all users with the GWToolset right.

(In reply to Fæ from comment #10)

With regard to the related story
(https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/482),
as a work-around I am going to try doing a limited run of the images from
NYPL but using just one thread. My "problematic" run was using at least 10
threads in parallel. This seems a pragmatic approach if the operational load
issue was primarily the quantity of files failing to render per minute.

Should I be able to use the GWToolset this way, it may be possible to reduce
the likelyhood of any future incidents by making the use of low numbers of
threads when uploading "very large" tiffs a good practice for all users with
the GWToolset right.

This bug isnt really about the gwtoolset issue.

(In reply to Bawolff (Brian Wolff) from comment #9)

(In reply to Andre Klapper from comment #8)

https://gerrit.wikimedia.org/r/#/c/86413/ is still awaiting patch review by
somebody from the Multimedia team. See comment 6 why this is important for
GLAM.

Its actually waiting on me to finish it (note the [wip] tag). Which I hope
to do real soon now(tm)

Ok, now it needs reviewers.

Change 135289 had a related patch set uploaded by Brian Wolff:
Make tiff thumbnail in 2 steps.

https://gerrit.wikimedia.org/r/135289

Change 86416 merged by jenkins-bot:
Add support for page numbers to VipsScaler.

https://gerrit.wikimedia.org/r/86416

Change 86413 merged by jenkins-bot:
Make PagedTiffHandler extend TransformationalImageHandler

https://gerrit.wikimedia.org/r/86413

Change 135289 merged by jenkins-bot:
Make tiff thumbnail in 2 steps.

https://gerrit.wikimedia.org/r/135289

Change 164476 had a related patch set uploaded by Brian Wolff:
Experimentally enable vips for larger (>50MP) tiff files

https://gerrit.wikimedia.org/r/164476

Change 164476 merged by jenkins-bot:
Experimentally enable vips for larger (>50MP) tiff files

https://gerrit.wikimedia.org/r/164476

So this is now live on commons experimentally.

There are some really large files (>350 megapixel) that still appear to time out.

It gives broken results for [[File:Amelia_earhart_received_by_president_coolidge.tif]] and [[file:Carr wilbur j honorable.tif]]. This appears to have something to do with how colour spaces are interpreted in grey scale tiffs (Still kind of unclear). It also seems to only appear after sharpening the images. Still not sure what the deal is here.

Experimentation locally suggests that running things through

vips im_clip2fmt tempInput.v finalOutput.png 2

may fix the issue mentioned in comment 2

(For reference, the 2 means convert to IM_BANDFMT_USHORT. Previously the convolution had converted it to IM_BANDFMT_FLOAT, which seems to sometimes cause problems when outputting as png. I'm unclear as to the why.)

(In reply to Bawolff (Brian Wolff) from comment #21)

Experimentation locally suggests that running things through

vips im_clip2fmt tempInput.v finalOutput.png 2

may fix the issue mentioned in comment 2

(For reference, the 2 means convert to IM_BANDFMT_USHORT. Previously the
convolution had converted it to IM_BANDFMT_FLOAT, which seems to sometimes
cause problems when outputting as png. I'm unclear as to the why.)

Except that breaks other images.

What appears to be happening (I think, behaviour locally and on cluster is kind of different, but related), is that image gets read into vips, using a form of either 8bit integer samples, or 16bit integer sample. Then im_convf converts it to floats to do sharpening, and then when output, it gets converted back to integer samples, but incorrectly.

What I think we need to do:

run vips im_header_int format temp.v before the convolution, save the result, then use im_clip2fmt to convert it back after the convolution.

I can confirm that all huge TIFF pictures which are in:
https://commons.wikimedia.org/wiki/Category:Media_contributed_by_Zentralbibliothek_Z%C3%BCrich_%28original_picture%29

... are now *all* thumbnailed correctly.

This is a really important fix because this bug was really weaking us in our GLAM activities in Switzerland. I hope you can keep this patch in production and thank you very much to all people involved in this fix.

@Bawolff, hope to get the chance to meet you at next european hackathon. I definitly owe you a beer.

No Gerrit patches left to review here; resetting status.
What's left to do is described in comment 22.

New problems detected with a few big TIFF (but not the biggest), but not sure where/what is the bug.

For example here:
https://commons.wikimedia.org/wiki/File:CH-NB_-_1802,_Zweite_Beschiessung_von_Z%C3%BCrich,_12._und_13._September_-_Collection_Gugelmann_-_GS-GUGE-KELLER-H-E-7.tif

But we have ~50 pictures of this category which are not thumbnailed at all:
https://commons.wikimedia.org/wiki/Category:CH-NB-Collection_Gugelmann

Error creating thumbnail: convert: /tmp/localcopy_7570ffa9a5d6-1.tif: wrong data type 7 for "RichTIFFIPTC"; tag ignored. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/706.
convert: /tmp/localcopy_7570ffa9a5d6-1.tif: Bogus "StripByteCounts" field, ignoring and calculating from imagelength. `TIFFReadDirectory' @ warning/tiff.c/TIFFWarnings/706.
convert: /tmp/localcopy_7570ffa9a5d6-1.tif: Read error at scanline 2357; got 11676 bytes, expected 21900. `TIFFFillStrip' @ error/tiff.c/TIFFErrors/496.

@Kelson this seems unrelated (and likely a problem with the source file). But please open a separate ticket for it if you think it is warranted.

Aklapper set Security to None.

(In reply to Bawolff (Brian Wolff) from comment #21)

Experimentation locally suggests that running things through

vips im_clip2fmt tempInput.v finalOutput.png 2

may fix the issue mentioned in comment 2

(For reference, the 2 means convert to IM_BANDFMT_USHORT. Previously the
convolution had converted it to IM_BANDFMT_FLOAT, which seems to sometimes
cause problems when outputting as png. I'm unclear as to the why.)

Except that breaks other images.

What appears to be happening (I think, behaviour locally and on cluster is kind of different, but related), is that image gets read into vips, using a form of either 8bit integer samples, or 16bit integer sample. Then im_convf converts it to floats to do sharpening, and then when output, it gets converted back to integer samples, but incorrectly.

What I think we need to do:

run vips im_header_int format temp.v before the convolution, save the result, then use im_clip2fmt to convert it back after the convolution.

This is split to T116947 (After rediscovering what the cause was. Should of re-read this bug first. Would have saved some time). Also patch at https://gerrit.wikimedia.org/r/249676

Example shared by Beat: https://commons.wikimedia.org/wiki/File:UBBasel_Map_Kanton_Bern_1672_Kartenslg_Schw_Cb_4.tif doesn't produce thumbnails for pages other than the first. A convert command on this files consumes about 8 GB RAM for me (just to have -compress zip).

Bawolff claimed this task.

This seems like its been fixed, way long ago... All the example images render now.