OOM while thumbnailing huge progressive / interlaced JPEGs
Closed, DeclinedPublic

Description

Several images have suddenly stopped displaying. If you download them, they work fine, though, oddly, not if you simply click the "Full resolution" link to view them (at least in Firefox).

Examples:

http://commons.wikimedia.org/wiki/File:Suikoden.jpg
http://commons.wikimedia.org/wiki/File:Somagahana_Fuchiemon_restored.jpg
http://commons.wikimedia.org/wiki/File:Somagahana_Fuchiemon.jpg

It's been pointed out that there are interesting error messages:

http://upload.wikimedia.org/wikipedia/commons/thumb/d/d5/Suikoden.jpg/411px-Suikoden.jpg

gives:

'''Error generating thumbnail'''

Error creating thumbnail: convert: Insufficient memory (case 4) `/mnt/upload5/wikipedia/commons/d/d5/Suikoden.jpg'.

convert: missing an image filename `/mnt/upload5/wikipedia/commons/thumb/d/d5/Suikoden.jpg/411px-Suikoden.jpg'.

Think you can fix it? ~~~~


Version: unspecified
Severity: normal

Details

Reference
bz17645

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 10:30 PM
bzimport set Reference to bz17645.
bzimport added a subscriber: Unknown Object (MLST).

nadezhda.durova wrote:

This is a problem that's inhibiting access to featured content. ~~~~

Do not use interlaced (a.k.a. progressive) JPEG compression. This option greatly increases the amount of memory required for decompression, and thus reduces performance both for the server and for clients such as browsers. All three cited test cases use this compression mode.

I have uploaded one of the three files with interlacing removed:

http://commons.wikimedia.org/wiki/File:Suikoden_(no_interlace).jpg

As you can see, it works just fine. You can do this with ImageMagick using:

convert Source.jpg -interlace none Destination.jpg

Omitting the -interlace, i.e. a null convert, also appears to work.
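
For anyone scripting the conversion above, a minimal Python sketch that wraps the same ImageMagick command might look like the following. The function names are hypothetical, and it assumes the `convert` binary is available on the PATH:

```python
import subprocess

def deinterlace_command(source, destination):
    """Build the argument list for the ImageMagick command quoted above."""
    return ["convert", source, "-interlace", "none", destination]

def deinterlace(source, destination):
    """Run the conversion; raises CalledProcessError if convert exits non-zero."""
    subprocess.run(deinterlace_command(source, destination), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces or parentheses.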

More examples from #wikimedia-tech: [[File:Panorama_-_Ch%C3%A2teau_des_ducs_de_Bourbon_%C3%A0_Montlu%C3%A7on_depuis_l%27esplanade.JPG]], [[File:1966_map_of_the_Appalachian_Development_Highway_System.jpg]].
Isn't there a list of interlaced images? They could be replaced with non-interlaced versions by some bot.

  • Bug 36733 has been marked as a duplicate of this bug.

Would it be possible to change the interlacing automatically during upload? I have run into this problem quite a few times, since it looks like some versions of GIMP save everything in interlaced mode by default.

Can bug #24228 be closed as a duplicate of this one?

  • Bug 24228 has been marked as a duplicate of this bug.
  • Bug 37367 has been marked as a duplicate of this bug.

Tim, which of the JPEG SOF tags identify a non-interlaced image (good for us)? http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/JPEG.html#SOF

0x0 = Baseline DCT, Huffman coding
0x1 = Extended sequential DCT, Huffman coding
0x2 = Progressive DCT, Huffman coding
0x3 = Lossless, Huffman coding
0x5 = Sequential DCT, differential Huffman coding
0x6 = Progressive DCT, differential Huffman coding
0x7 = Lossless, Differential Huffman coding
0x9 = Extended sequential DCT, arithmetic coding
0xa = Progressive DCT, arithmetic coding
0xb = Lossless, arithmetic coding
0xd = Sequential DCT, differential arithmetic coding
0xe = Progressive DCT, differential arithmetic coding
0xf = Lossless, differential arithmetic coding

('exiftool -fast2' is a couple orders of magnitude faster than 'identify -verbose'.)
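
The SOF type can also be read directly from the file header without an external tool. Below is a rough Python sketch, assuming a well-formed JPEG whose SOF segment precedes the first scan; `read_sof` and `is_progressive` are hypothetical names, and the set of progressive values is simply the "Progressive DCT" entries from the list above:

```python
import struct

# Low nibble of the SOF marker for the "Progressive DCT" entries listed above.
PROGRESSIVE_SOF = {0x2, 0x6, 0xA, 0xE}

def read_sof(data):
    """Return (sof_type, width, height, components) from the first SOF
    segment, or None if no SOF segment is found before the scan data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:                      # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                            # standalone marker, no length
            continue
        if marker in (0xD9, 0xDA):              # EOI or SOS: stop looking
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # 0xC0-0xCF are SOF markers, except DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            if pos + 10 > len(data):
                return None                     # truncated SOF segment
            _precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return marker & 0x0F, width, height, ncomp
        pos += 2 + length                       # skip this segment
    return None

def is_progressive(data):
    """True if the first SOF segment uses one of the progressive modes."""
    sof = read_sof(data)
    return sof is not None and sof[0] in PROGRESSIVE_SOF
```

Only the first few kilobytes of a file normally need to be read for this, although a large embedded EXIF block or thumbnail can push the SOF segment further in.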

esby wrote:

@Nemo_bis:
Quoting Tim : Do not use interlaced (a.k.a. *progressive*)

(In reply to comment #10)

@Nemo_bis:
Quoting Tim : Do not use interlaced (a.k.a. *progressive*)

Sorry, I don't see how this answers my question. Do you mean that all sequential, lossless etc. encodings there are ok (and why)?

I don't think lossless is OK. I recently stumbled upon some lossless JPEGs that could not be read with any program. (Sorry, but I can't remember the SOF tag.)

IMHO, either a server software supports rendering huge progressive JPEGs or it refuses them while uploading or it converts them directly after uploading.

With Upload Wizard and some modern browsers you can even try to detect those files on the client side before uploading. VirusTotal, for example, computes a hash on the client before uploading the file in order to save server capacity. So it should be possible to read JPEG file headers.

Progressive JPEGs aren't created by digital cameras. Thus, they originate in imaging software. Avoiding them is often just a matter of unchecking a check box. But the user has to know this. Current behaviour is NOT OK.

I would be inclined to reopen this bug.

Created attachment 11219
List of Commons non-baseline images above 5 MB

Here's the first list I made with exiftool (27884 images above 5 MB).

attachment url.txt ignored as obsolete

Created attachment 11220
List of Commons non-baseline images above 5 MB

Better as explicit attachment for archiving.

Created attachment 11500
List of 559678 Commons non-baseline images below 5 MB

Someone at Commons is now converting everything:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Convert_all_interlaced_JPGs

This can't be desired behaviour, come on, wake up.

esby wrote:

It does indeed seem a bit much to convert files that are technically perfectly fine (as the thumbnail is properly generated).

Since when does a single programmer get to set policy for the entirety of Wikimedia?

There is a bug here. Even if progressive images take up more memory, the fact that the system is not waiting and allocating the correct amount of memory is a bug.

Progressive JPEGs are going to be uploaded whether you want them to be or not. Most images on Wikipedia are uploaded from the web, and most web JPEGs are progressive, as progressive JPEGs make smaller files.

In fact, I personally have no intention of giving up progressive JPEGs, since I've been using them since 2007 without incident. Lots of things editors do put a large memory load on the server. We aren't required to try to make it easier on the system.

I've been using progressive JPEGs on Wikimedia for years, and I've not run into a problem. If I do, then maybe I'll convert, but not until then. I'm not going to condone a programmer changing policy in order to avoid fixing a bug.

And don't say you haven't changed policy. You put a demand on all Wikimedia users that they do a certain thing a certain way, even though the other way works. That's a policy change. It's even listed at the Commons Help:JPEG.

Left out something: If there is definitely a bug, you have two choices. You can try to fix it, or you leave it open so that someone else can fix it. You do not close a legitimate bug by telling people that they are required to work around it.

And this is a legitimate bug, as there's no way the servers were coincidentally that close to capacity every time the bug reporter tried to generate the thumbnail. Either enough memory is not being allocated or there's a bug requiring a lot more memory for this file than for other progressive JPEGs which work just fine.

trlkly: No idea which "policy" thing you talk about, but maintainers of a codebase are free to decide that they are actively against fixing a valid bug in the software if this would create side effects ("reduces performance both for the server and for clients such as browsers") that they considered worse.

Nope. Not in open source.

(In reply to comment #21)

trlkly: No idea which "policy" thing you talk about, but maintainers of a
codebase are free to decide that they are actively against fixing a valid bug
in the software if this would create side effects ("reduces performance both
for the server and for clients such as browsers") that they considered worse.

They are allowed to refuse patches if they think the patches have downsides, yes, but not to arbitrarily declare that all such patches must have that downside. And note the word "they" rather than "he." This was a single person making the decision, without even entertaining the idea that someone might have a way to handle it.

And, in fact, there are multiple ways of getting around the issues he stated. There's no inherent reason that progressive JPEGs take longer to render than baseline JPEGs. It isn't the case on any modern software. It isn't the case that they must take up a lot more memory as, unlike thumbnailing, converting between the two can be done without full decompression. Thus the memory requirements are as low as you can stand having to go back to the disk to read more of the file.

Furthermore, thumbnailing a progressive JPEG often requires less of the JPEG to be rendered, since you only have to render up to the resolution just above the thumbnail. Progressive JPEGs essentially have their own thumbnails baked in.

There are multiple solutions that could deal with this problem without causing significant drain on the system. Most of them came in after the guy arbitrarily closed the bug without waiting for ideas on how to mitigate the problems.

A bug should be left open if it is legitimate. Closing the bug prevents anyone else from coming up with a solution that mitigates all problems.

(In reply to comment #22)

They are allowed to refuse patches if they think the patches have downsides,
yes, but not to arbitrarily declare that all such patches must have that
downside. [...]

And, in fact, there are multiple ways of getting around the issues he stated.
There's no inherent reason that progressive JPEGs take longer to render than
baseline JPEGs. It isn't the case on any modern software.

Have you brought this up with ImageMagick, then? You could also submit a patch to them, as you mention that.
(Note, there's also VIPS but I don't think we ever use it for JPEG. https://blog.wikimedia.org/2013/09/12/vipsscaler-implementation-wikimedia-sites/ )

(In reply to comment #19)

Since when does a single programmer get to set policy for the entirety of
Wikimedia?

Since before it was called Wikimedia. That's not to say it's a good decision-making system. I'm happy to hear other opinions or for others to submit patches in this area.

Progressive JPEGs are going to be uploaded whether you want them to be or
not.

It's not ideal to have bots convert them. I would prefer it if they were rejected on upload.

And don't say you haven't changed policy. You put a demand on all Wikimedia
users that they do a certain thing a certain way, even though the other way
works. That's a policy change. It's even listed at the Commons Help:JPEG.

Sure, changing policy is a hack, in the absence of a feature which would reject these files on upload.

If they were rejected on upload, then we could set a threshold based on available server memory, instead of having bot authors guess at what that threshold should be.
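
As a rough illustration of such a threshold, the check could compare an estimate of decoder memory against a configured limit. The sketch below is Python with hypothetical names. The estimate is an assumption rather than a measured figure: it supposes the decoder holds every DCT coefficient of a progressive file in memory at 2 bytes each, and it ignores chroma subsampling and decoder overhead:

```python
def estimated_decode_bytes(width, height, components, bytes_per_coefficient=2):
    """Upper-bound guess at the coefficient buffer for a progressive JPEG.
    ASSUMPTION: one coefficient per sample, 2 bytes each, no subsampling."""
    return width * height * components * bytes_per_coefficient

def accept_upload(is_progressive, width, height, components, memory_limit_bytes):
    """Reject progressive files whose estimated buffer exceeds the limit.
    memory_limit_bytes is a hypothetical site setting, not a MediaWiki option."""
    if not is_progressive:
        return True
    return estimated_decode_bytes(width, height, components) <= memory_limit_bytes
```

For example, a 20000 x 20000 pixel, three-component progressive file is estimated at 2.4 GB under these assumptions, so it would be rejected with a limit of a few hundred megabytes, while the same file saved as baseline would pass.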

(In reply to comment #22)

Furthermore, thumbnailing a progressive JPEG often requires less of the JPEG
to be rendered, since you only have to render up to the resolution just above
the thumbnail. Progressive JPEGs essentially have their own thumbnails
baked in.

Maybe if the browsers or the image scaling software we use took advantage of this, then you would have a point. But as it stands, it's not really a good subject for a bug against MediaWiki. It would be a good subject for a bug against ImageMagick.

There are multiple solutions that could deal with this problem without
causing significant drain on the system. Most of them came in after
the guy arbitrarily closed the bug without waiting for ideas on how
to mitigate the problems.

Everyone should feel free to submit ideas about bugs that are closed "WONTFIX".

A bug should be left open if it is legitimate.

I think WONTFIX was an appropriate way to describe the situation.

Closing the bug prevents
anyone else from coming up with a solution that mitigates all problems.

By what mechanism? It's not like we're preventing comments on the bug, or telling upstream projects like libvips or ImageMagick to reject your patches.

(In reply to Tim Starling from comment #24)

It's not ideal to have bots convert them. I would prefer it if they were
rejected on upload.

From a usability point of view, that's horrible. I am happy when users understand what JPEG and PNG are at all. Coming from Facebook, they call everything a "Pic", and if you reject progressive JPEGs with a message like "Progressive JPEGs must not be uploaded here, instead use baseline because it's better for our servers", I am sure we will succeed in confusing 90% of the new uploaders who receive it.

BTW, do we still use ImageMagick for JPEGs, or VIPS?

Gilles raised the priority of this task from Lowest to Unbreak Now!. Dec 4 2014, 10:12 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Lowest. Dec 4 2014, 11:21 AM