
SVGZ (gzipped SVG) support
Closed, Declined, Public

Description

Author: ejsanders

Description:
Is it possible / worthwhile? With some very large SVGs being uploaded and the
possibility of client-side SVG rendering, it might be a good idea. Due to XML's
repetitive nature, compression is usually fairly high.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=61442

Details

Reference
bz4947

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:05 PM
bzimport set Reference to bz4947.
bzimport added a subscriber: Unknown Object (MLST).

denelson83 wrote:

I second this idea. It would make more efficient use of space on Wikimedia's
servers.

steveb05+wikibugs wrote:

I created a patch, attached. Please post feedback here or en:User_talk:Brownsteve

INSTRUCTIONS TO USE THE SVGZ PATCH

  1. Install/upgrade MediaWiki - latest version is best
  2. Install ImageMagick 6.2 or later. This is required for SVGZ support. You may also need to install librsvg / rsvg.
  3. Apply this patch (if the patch has been accepted, SKIP THIS STEP): cd /path/to/mediawiki; patch -p0 < svgz.patch
  4. In LocalSettings.php, add these lines: $wgStrictFileExtensions = false; $wgEnableUploads = true;
  5. You MUST enable your web server to properly serve SVGZ files with Content-Encoding: gzip and MIME type image/svg+xml. This is NOT enabled by default. Edit httpd.conf or .htaccess (search Google if stuck)

EXAMPLE: In Apache's httpd.conf, add these lines:

AddType image/svg+xml .svg .svgz
AddEncoding gzip .svgz
<Files *.svgz.*>
RemoveEncoding .svgz
</Files>

  6. Restart your web server (on Linux, sudo /etc/init.d/httpd restart)
  7. That's it! Have fun. Report bugs/problems/etc. to en:User_talk:Brownsteve
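
For readers who want to see what the Apache lines above amount to, here is a minimal Python sketch of the same mapping from file name to response headers. It is not part of the patch; the function name response_headers is hypothetical, and it only models the extension rules, not a real server.

```python
from typing import Dict


def response_headers(filename: str) -> Dict[str, str]:
    """Return the headers a static file server would attach (sketch only)."""
    lower = filename.lower()
    headers: Dict[str, str] = {}
    if lower.endswith((".svg", ".svgz")):
        # Both extensions share one media type; SVGZ has none of its own.
        headers["Content-Type"] = "image/svg+xml"
    if lower.endswith(".svgz"):
        # Only a trailing .svgz marks the body as gzip-encoded. A thumbnail
        # such as "180px-France.svgz.png" must not get this header, which is
        # what the <Files *.svgz.*> / RemoveEncoding block is for.
        headers["Content-Encoding"] = "gzip"
    return headers
```

With this mapping, "France.svgz" gets both headers, "France.svg" gets only the Content-Type, and a rasterized thumbnail whose name merely contains ".svgz" gets neither.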

BUGS IN THIS PATCH!!!!
SVG is not a registered MIME type with IANA.
Apache only includes IANA-approved MIME types by default.
You must configure your httpd.conf by hand until this is fixed.

We assume all browsers that can render SVG graphics
can also Accept-Encoding:gzip.

steveb05+wikibugs wrote:

Patch to enable SVGZ support

attachment svgz.patch ignored as obsolete

steveb05+wikibugs wrote:

Instructions to apply the patch

attachment svgz.INSTALL ignored as obsolete

steveb05+wikibugs wrote:

Comment on attachment 2817
Patch to enable SVGZ support

Index: maintenance/FiveUpgrade.inc
--- maintenance/FiveUpgrade.inc (revision 18367)
+++ maintenance/FiveUpgrade.inc (working copy)
@@ -714,7 +714,7 @@
 		# Height and width
 		$gis = false;
-		if( $mime == 'image/svg' ) {
+		if( $mime == 'image/svg' || $mime == 'image/svg+xml' ) {
 			$gis = wfGetSVGsize( $filename );
 		} elseif( $magic->isPHPImageType( $mime ) ) {
 			$gis = getimagesize( $filename );

Index: includes/MimeMagic.php
--- includes/MimeMagic.php (revision 18367)
+++ includes/MimeMagic.php (working copy)
@@ -23,7 +23,7 @@
 image/gif gif
 image/jpeg jpeg jpg jpe
 image/png png
-image/svg+xml svg
+image/svg+xml svg svgz
 image/tiff tiff tif
 image/vnd.djvu djvu
 text/plain txt
@@ -51,7 +51,8 @@
 image/gif [BITMAP]
 image/jpeg [BITMAP]
 image/png [BITMAP]
-image/svg image/svg+xml [DRAWING]
+image/svg+xml [DRAWING]
+image/svg [DRAWING]
 image/tiff [BITMAP]
 image/vnd.djvu [BITMAP]
 text/plain [TEXT]
@@ -368,6 +369,26 @@
 			$mime = "application/x-msmetafile";
 		}

+		if (substr($head,0,2) == "\x1f\x8b" && preg_match('/\.svgz$/si', $file) && $f = gzopen($file, "rt") ) {
+			// GZip Magic Signature; probably a GZipped SVG file (.svgz)
+			//
+			// The host web server software is responsible to output
+			// the proper "Content-Encoding: gzip" HTTP header for .svgz
+			// True SVGZ images without the ".svgz" extension WILL fail because
+			// only this extension triggers the server's correct encoding header.
+			// Example: In Apache httpd.conf add:
+			//   AddType image/svg+xml .svg .svgz
+			//   AddEncoding gzip .svgz
+			//   <Files *.svgz.*>
+			//     RemoveEncoding .svgz
+			//   </Files>
+			$chunk = gzread( $f, 4096 );
+			gzclose( $f );
+
+			// look for svg tag
+			if( preg_match( '/<svg\s*([^>]*)\s*>/s', $chunk ) ) $mime = "image/svg+xml";
+		}
+
 		if (strpos($mime,"text/")===0 || $mime==="application/xml") {
 			$xml_type= NULL;
@@ -399,8 +420,8 @@
 					#print "<br>ANALYSING $file ($mime): doctype= $doctype; tag= $tag<br>";

-					if (strpos($doctype,"-//W3C//DTD SVG")===0) $mime= "image/svg";
-					elseif ($tag==="svg") $mime= "image/svg";
+					if (strpos($doctype,"-//W3C//DTD SVG")===0) $mime= "image/svg+xml";
+					elseif ($tag==="svg") $mime= "image/svg+xml";
 					elseif (strpos($doctype,"-//W3C//DTD XHTML")===0) $mime= "text/html";
 					elseif ($tag==="html") $mime= "text/html";
 				}

Index: includes/Image.php
--- includes/Image.php (revision 18367)
+++ includes/Image.php (working copy)
@@ -272,7 +272,7 @@
 			# Height and width
 			wfSuppressWarnings();
-			if( $this->mime == 'image/svg' ) {
+			if( $this->mime == 'image/svg' || $this->mime == 'image/svg+xml' ) {
 				$gis = wfGetSVGsize( $this->imagePath );
 			} elseif( $this->mime == 'image/vnd.djvu' ) {
 				$deja = new DjVuImage( $this->imagePath );
@@ -619,7 +619,7 @@
 		if (!$mime || $mime==='unknown' || $mime==='unknown/unknown') return false;

 		#if it's SVG, check if there's a converter enabled
-		if ($mime === 'image/svg') {
+		if ($mime === 'image/svg' || $mime === 'image/svg+xml') {
 			global $wgSVGConverters, $wgSVGConverter;

 			if ($wgSVGConverter && isset( $wgSVGConverters[$wgSVGConverter])) {
@@ -1150,8 +1150,7 @@
 		$err = false;
 		$cmd = "";
 		$retval = 0;
-		if( $this->mime === "image/svg" ) {
+		if( $this->mime === "image/svg" || $this->mime === "image/svg+xml" ) {
 			#Right now we have only SVG

 			global $wgSVGConverters, $wgSVGConverter;

Index: includes/ImageFunctions.php
--- includes/ImageFunctions.php (revision 18367)
+++ includes/ImageFunctions.php (working copy)
@@ -139,7 +139,6 @@

 /**
  * Compatible with PHP getimagesize()
- * @todo support gzipped SVGZ
  * @todo check XML more carefully
  * @todo sensible defaults
  *
@@ -156,6 +155,15 @@
 	$chunk = fread( $f, 4096 );
 	fclose( $f );

+	if (substr($chunk,0,2) == "\x1f\x8b" && preg_match('/\.svgz$/si', $filename)) {
+		// it's compressed; decompress it
+		$f = gzopen( $filename, "rt" );
+		if ( !$f ) return false;
+
+		$chunk = gzread( $f, 4096 );
+		gzclose( $f );
+	}
+
 	// Uber-crappy hack! Run through a real XML parser.
 	$matches = array();
 	if( !preg_match( '/<svg\s*([^>]*)\s*>/s', $chunk, $matches ) ) {
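
The detection logic in the patch above (gzip magic bytes, a .svgz extension, then a search for an <svg> tag in the first 4096 decompressed bytes) can be sketched in Python as follows. This is an illustration of the technique only, not MediaWiki code; looks_like_svgz is a hypothetical name, and like the patch it relies on a regex rather than a real XML parser.

```python
import gzip
import io
import re
import zlib

GZIP_MAGIC = b"\x1f\x8b"
# Same pattern the patch uses to look for the opening <svg ...> tag.
SVG_TAG = re.compile(rb"<svg\s*([^>]*)\s*>", re.DOTALL)


def looks_like_svgz(filename: str, data: bytes) -> bool:
    """Return True if data is gzip-compressed, named .svgz, and contains <svg>."""
    if not filename.lower().endswith(".svgz"):
        return False
    if data[:2] != GZIP_MAGIC:
        return False
    try:
        # Decompress only the first 4096 bytes, as the patch does with gzread().
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as f:
            chunk = f.read(4096)
    except (OSError, EOFError, zlib.error):
        # Corrupt or truncated gzip stream.
        return False
    return SVG_TAG.search(chunk) is not None
```

A gzipped file without the .svgz extension is rejected here on purpose, mirroring the patch's comment that only this extension triggers the server's encoding header.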

steveb05+wikibugs wrote:

A better patch using zlib

attachment svgz.patch ignored as obsolete

steveb05+wikibugs wrote:

Instructions to apply the patch (newer)


(In reply to comment #6)

Created an attachment (id=2881)
A better patch using zlib

  • Things like changing image/svg to image/svg+xml have nothing to do with svgz and should be applied to their own bug.

  • preg_match('/\.svgz$/si', $filename): using preg_match here is a bit unnecessary.

steveb05+wikibugs wrote:

The mimetype conversion is required so as not to confuse Apache. It just works
around another bug 7554, which has been sitting untouched for a while.

steveb05+wikibugs wrote:

Patch to enable SVGZ support - against r44559

Updated patch against r44559 (2008-12-13). I believe this should satisfy Carl's concerns, now that bug 7554 is resolved. Please note Apache2 will still need configuration with these lines (see Comment #3 above):

AddType image/svg+xml .svg .svgz
AddEncoding gzip .svgz
<Files *.svgz.*>
RemoveEncoding .svgz
</Files>

attachment svgz-r44559.patch ignored as obsolete

steveb05+wikibugs wrote:

Check if PHP was compiled with ZLib support

Also check function_exists( 'gzopen' ) per ^demon on MediaWiki-General.


Just tried it out. I see two issues so far. The bigger one is that thumbnailing doesn't seem to work. The other problem is file extensions: it seems I can upload gzipped SVG as either .svg or .svgz, but not as .svg.gz (which is what I originally tried). I'm not sure what the most reasonable thing to do here would be: I'd think either gzipped SVG should only be allowed as .svgz (or .svg.gz), or we should just treat normal and gzipped SVG as identical, and probably automatically rename all three suffixes to just .svg (and maybe even go ahead and automatically gzip any SVG files not already uploaded that way). I'm mostly inclined towards the latter option, if only because it seems silly to hardcode such a trivial difference into the file page title.

ayg wrote:

(In reply to comment #12)

Just tried it out. I see two issues so far. The bigger one is that
thumbnailing doesn't seem to work. The other problem is file extensions: it
seems I can upload gzipped SVG as either .svg or .svgz, but not as .svg.gz
(which is what I originally tried). I'm not sure what the most reasonable
thing to do here would be

What we do for .jpeg, .jpg, .JPEG, .JPG, etc. is just store the extensions differently despite there being no difference in the file type. :) Which is the bug for "don't make file extension part of file name"?

I don't see why users should decide whether to gzip SVG at upload time, though. Surely it should just be transparently compressed as it's served to the user, like with styles/scripts? .svgz and .svg.gz could then be accepted as aliases for .svg on upload, and the files could be decompressed for storage. Or compressed, or whatever, but consistently.

steveb05+wikibugs wrote:

(In reply to Comment #12) thumbnailing "works for me." We must track this down, but I don't know what could be going on. It looks like you need ImageMagick >= 5.5.7 to thumbnail SVGZ images. Are you using a different $wgSVGConverter?

When uploading images to my testbed, my $wgDebugLogFile shows this:

SvgHandler::rasterize: convert -background white -geometry 180 '/var/lib/mediawiki/images/c/c1/France.svgz' PNG:'/var/lib/mediawiki/images/thumb/c/c1/France.svgz/180px-France.svgz.png' 2>&1
wfShellExec: convert -background white -geometry 180 '/var/lib/mediawiki/images/c/c1/France.svgz' PNG:'/var/lib/mediawiki/images/thumb/c/c1/France.svgz/180px-France.svgz.png' 2>&1

I don't think we should support the .svg.gz extension. Apache treats it as an archive MIME type, so it serves the file with the wrong headers and Firefox gets confused instead of decompressing it inside the browser. It might also open the door to uploading any arbitrary archive file.

(In reply to Comment #13)

If SVGZ support is added, I imagine a "WikiProject Convert All Images to .svgz" might spring up, or someone might write a bot. We should just be consistent here and now, and avoid any wasted labor later on. So should the internal storage be SVG or SVGZ? I guess this is mostly a matter of opinion, so here are my thoughts:

- HTML, CSS, etc. are documents, rapidly changing and volatile (this is a wiki, after all.)
- It makes good sense to compress HTML at serve-time through Content-Encoding: gzip
- Images, including vector graphics, are much more static
- For efficiency's sake, we should compress images only once if possible. It doesn't make sense to recompress an SVG 6000 times on every serve (or even if it's cached.) We should only recompress when the file changes.
- User data, e.g. Wikipedia, has wider scope than live web sites: dbdumps, cdroms, etc. We need to consider these use-cases as well. Permanent compression could be a major advantage here.
- PNG, GIF, et al. all have compression features; this maximizes their usefulness and spread
- Wikipedia is a major driving force behind SVG; it would help further popularize the format if we support SVGZ

SVGZ is pretty ugly to handle because it doesn't have its own Content-Type... the server is supposed to serve them out with Content-Type: image/svg+xml *and* Content-Encoding: gzip... which has the added confusion that the user-agent would transparently decompress it... and if you save it to disk you'll get the decompressed version.

So, even if everything's working right on the server end, when you download you may not get back the same file you uploaded. Potentially now you've got an ".svgz" file on disk which is actually not compressed... Eww!

As a trivial test, I gzipped an SVG file and uploaded it to my web server, running Apache 2 on Ubuntu 8.10:

http://leuksman.com/misc/test2.svgz

With a stock Apache configuration, it's served out as image/svg+xml *without* the encoding setting. Firefox 3.0.3 interprets it as raw SVG, which of course is invalid XML (being a big binary blob) and doesn't render it.

After adding the Apache config bits above to add the Content-Encoding header, I find that Firefox renders it now. (Yay!) But, if I save the file to disk, it saves the *compressed* version with a ".svgz.svg" extension, which now fails to load since it's marked as uncompressed but is in fact compressed.

Safari 3.2.1 and Opera 9.5 render the image fine inline, but when saving to disk give me an *uncompressed* version with ".svgz" extension.

So... I think things are not quite mature enough here. :(

IMHO the cleanest way to go would be to transparently decompress .svgz files on upload, normalize everything to .svg, and have the web server transparently gzip .svg files when serving out, if we like, to save bandwidth. (Most of the time we don't even serve the .svg out -- we serve a .png rasterization -- so this wouldn't be a heavy burden.)
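
A rough Python sketch of the "decompress on upload, normalize to .svg" idea described above. Nothing here is MediaWiki code: normalize_svg_upload is a hypothetical helper, and treating .svg.gz as an accepted alias follows the suggestion made earlier in this thread rather than anything that was implemented.

```python
import gzip
from typing import Tuple

# Upload-time aliases that would all be stored as plain ".svg".
COMPRESSED_SUFFIXES = (".svgz", ".svg.gz")


def normalize_svg_upload(filename: str, data: bytes) -> Tuple[str, bytes]:
    """Return (stored_name, stored_bytes) with gzip removed and .svg enforced."""
    lower = filename.lower()
    for suffix in COMPRESSED_SUFFIXES:
        if lower.endswith(suffix):
            filename = filename[: -len(suffix)] + ".svg"
            break
    if data[:2] == b"\x1f\x8b":
        # Decide by content, not by name, so a mislabeled file is fixed too.
        data = gzip.decompress(data)
    return filename, data
```

Storage is then always uncompressed, and any gzip on the wire is left to the web server. A real implementation would also want to cap the decompressed size before accepting the upload.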

ayg wrote:

(In reply to comment #14)

-For efficiency's sake, we should compress images only once if possible. It
doesn't make sense to recompress a SVG 6000 times on every serve (or even if
it's cached.) We should only recompress when the file changes.

This will be the case in practice either way, pretty much. The HTTP response (with the compressed version of the file) should be cached indefinitely by Squid. Anyway, as Brion points out, we don't serve actual SVGs on page views and aren't going to start anytime soon:

  • Limited benefit without IE support, which last I checked looks to arrive in approximately 2027 if Microsoft doesn't invent a proprietary alternative in the intervening time.
  • Browser support needs to be as fast as rendering a bitmap, so there's no performance regression (this is far from the case right now with arbitrary SVGs).
  • It would be a pain to do this until we can assume all clients support SVG, because we would have to serve bitmaps to some users and SVG to others. Then we'd have problems with cache fragmentation and inconsistent appearance (depending on the features supported by various browsers vs. our SVG renderer).
  • Security! We would need an SVG sanitizer that we know is reliable, to avoid script injection and fun stuff like that.

So the number of times SVG will actually be served to users is likely to be very low.

-User data, e.g. Wikipedia, has wider scope than live web sites: dbdumps,
cdroms, etc. We need to consider these use-cases as well. Permanent
compression could be a major advantage here.

No image dumps exist now at all, do they? If they did, SVGs could be gzipped in the dump (or heavier compression could be used if convenient).

x00000000 wrote:

(In reply to comment #15)

SVGZ is pretty ugly to handle because it doesn't have its own Content-Type...
the server is supposed to serve them out with Content-Type: image/svg+xml *and*
Content-Encoding: gzip... which has the added confusion that the user-agent
would transparently decompress it... and if you save it to disk you'll get the
decompressed version.

Transparently decompressing documents with Content-Encoding is a bug.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 :

The content-coding is a characteristic of the entity identified by
the Request-URI. Typically, the entity-body is stored with this
encoding and is only decoded before rendering or analogous usage.

However, support for HTTP/1.1-compliant transparent encoding via TE and Transfer-Encoding headers is still minimal, so Content-Encoding continues to be abused for that with no consistent user-agent behavior (depending on media type).

Mozilla behavior for SVGZ depends on user choice between "Web Page, complete" (saves a mangled version of decompressed SVG) and "Web Page, SVG only" (saves original SVGZ, but suggests a filename with .svg appended).

IMHO the cleanest way to go would be to transparently decompress .svgz files on
upload, normalize everything to .svg, and have the web server transparently
gzip .svg files when serving out, if we like, to save bandwidth.

That is the same as sending SVGZ with correct headers from the user-agent's point of view (until TE/Transfer-Encoding are supported), just more expensive on the server side.

(In reply to comment #17)

IMHO the cleanest way to go would be to transparently decompress .svgz files on
upload, normalize everything to .svg, and have the web server transparently
gzip .svg files when serving out, if we like, to save bandwidth.

That is the same as sending SVGZ with correct headers from the user-agent's
point of view (until TE/Transfer-Encoding are supported), just more expensive
on the server side.

We wouldn't bother with the compression until/unless browser behavior is consistent, which it isn't right now. Keeping everything uncompressed server-side makes the potential transition much simpler if that day ever comes.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

VitaliyFilippov's patch, based on XMLTypeCheck

Hi all!
I've made my own patch for this. It's simpler and based on XMLTypeCheck, it fully supports SVGZ, and also correctly detects gzipped Dia diagrams.
Review it please.

attachment mediawiki-xmlgzip-support.diff ignored as obsolete

Hmm, it kinda looks like that won't be able to distinguish between a gzipped file with .svg extension (wrong) and a gzipped file with .svgz extension (right), or an uncompressed file with .svgz extension (wrong) and an uncompressed file with .svg extension (right).

It also doesn't look like the SvgMetadataExtractor class will automatically pick up compression -- it uses XMLReader directly -- so we won't be able to extract width/height and any generic metadata that may be in the file. If we don't have a size, we can't render it.

Yeah, but there's also no separate mime type for SVGZ...

Width/height extraction works at least in 1.16... It looks like it also uses XmlTypeCheck... Is it changed in trunk?

Yep, that changed back in 1.17. Always make and test patches against trunk to make sure you're working with current code.

sumanah wrote:

Vitaliy, thanks for your patch. Do you have time to revise it so it works against trunk?

Updated patch for r103314 (VitaliyFilippov)

Updated the patch.
Is it OK to use stream wrappers for SVGMetadataExtractor?
(I mean compress.zlib://)


Hmm, looks like it ought to work (haven't tested just yet). Looks like SvgMetadataExtractor ought to work, though I'm uncertain about that file subset cutoff thing.

From my previous comments in comment 22 it looks like the extension issue still stands: there doesn't appear to be logic to ensure that '.svg' files are uncompressed and '.svgz' files are compressed.

Additionally it may be more likely for .svgz files to be misconfigured on the server, or possibly served out incorrectly via streaming (e.g. when using img_auth.php on private sites, or fetching images from the image stash API or Special:Undelete) -- IIRC the correct way to serve an .svgz is as:

Content-Type: image/svg+xml
Content-Encoding: gzip

If we only record that the file is of type image/svg+xml but don't track that gzippiness, we'll be serving the gzip data and clients won't handle it right.

I think my preferred handling for .svgz would be to transparently decompress them on upload and rename them into .svg files... :P

(.dia doesn't have this problem as there's not a separate extension or special HTTP header configuration for compressed files!)

On the streams -- the main thing to check is what behavior you get if zlib support is not enabled in PHP. If it still works with uncompressed files, then fine -- if not, it should only use the compress.zlib:// wrapper or gzopen when it knows they will work.
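
As an illustration of that fallback rule (plain files must keep working without zlib, and the gzip path should only be taken when it is known to be available), here is a small Python analogue. It is a sketch under assumptions: read_svg_head is a hypothetical name, and a Python build without zlib stands in for a PHP build without the zlib extension.

```python
from typing import Optional

try:
    import gzip
except ImportError:  # interpreter built without zlib support
    gzip = None


def read_svg_head(path: str, limit: int = 4096) -> Optional[bytes]:
    """Return the first `limit` bytes of SVG text, or None if it cannot be decoded."""
    with open(path, "rb") as f:
        head = f.read(limit)
    if head[:2] != b"\x1f\x8b":
        return head  # plain SVG: no zlib needed at all
    if gzip is None:
        return None  # compressed, but this build cannot decompress it
    with gzip.open(path, "rb") as f:
        return f.read(limit)
```

The uncompressed path never touches the gzip module, so a missing zlib only affects files that are actually compressed.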

Testing the patch on current trunk... does not work for me.

My LocalSettings.php contains:

$wgFileExtensions[] = 'svg';
$wgFileExtensions[] = 'svgz';

Selecting a gzipped SVG image I've saved as .svgz lets me go through initial upload, but kicks me back with this error:

"File extension ".svgz" does not match the detected MIME type of the file (image/svg+xml)."

Renaming the same gzipped file to ".svg" extension (incorrect) allows me to upload it. It rasterizes via ImageMagick, but if I load the image directly into Firefox I get an XML error because the file is actually binary gzip data:

XML Parsing Error: not well-formed
Location: http://stormcloud.local/trunk/images/e/ef/Wrong.svg
Line Number 1, Column 1:\uffff

Renaming the uncompressed original to ".svgz" extension (also incorrect) fails like the real .svgz file did with:

File extension ".svgz" does not match the detected MIME type of the file (image/svg+xml).
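
The extension check discussed in the comments above (compressed .svgz and uncompressed .svg are right, the other two combinations are wrong) comes down to a small truth table. A minimal Python sketch, with the hypothetical helper name extension_matches_compression and gzip detection by magic bytes only:

```python
def extension_matches_compression(filename: str, data: bytes) -> bool:
    """Return True only when the extension agrees with the actual encoding."""
    is_gzipped = data[:2] == b"\x1f\x8b"
    lower = filename.lower()
    if lower.endswith(".svgz"):
        return is_gzipped       # .svgz must be compressed
    if lower.endswith(".svg"):
        return not is_gzipped   # .svg must be plain XML
    return False                # not an SVG name at all
```

Under this rule the "Wrong.svg" upload from the test above (gzip data with a .svg name) would be rejected instead of being stored and served as broken XML.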

(In reply to comment #26)

Created attachment 9470 [details]
Updated patch for r103314 (VitaliyFilippov)

Updated the patch.
Is it OK to use stream wrappers for SVGMetadataExtractor?
(I mean compress.zlib://)

I had written an answer, which somehow isn't here :S
Trying to summarise:

  • Nothing against stream wrapper usage.
  • We should be able to work with normal svg even without zlib extension.
  • You are using fread() on a gzopen() handle. Which works, but is an undocumented feature.
  • Another option would be to use gzopen everywhere, and not check gzippiness.
  • The ( $size > $wgSVGMetadataCutoff ) check can be fooled by the compression. This could be extracted from the header in some cases, but the comment about a fake File instance being passed doesn't give me confidence.
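
On the cutoff point: the gzip format (RFC 1952) stores the uncompressed length modulo 2^32 in the last four bytes of the stream (the ISIZE field of the trailer), so a size check could compare against that rather than the on-disk size. A Python sketch of the idea follows; the function names are hypothetical, the value is only meaningful for a single-member stream, and because it can be forged it is a hint, not a guarantee.

```python
import gzip
import struct


def gzip_declared_size(data: bytes) -> int:
    """Return ISIZE: the uncompressed length mod 2**32 from the gzip trailer."""
    # 10-byte header + 8-byte trailer is the bare minimum for a gzip stream.
    if len(data) < 18 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    (isize,) = struct.unpack("<I", data[-4:])  # little-endian 32-bit
    return isize


def exceeds_cutoff(data: bytes, cutoff: int) -> bool:
    """Compare the uncompressed size (declared or actual) against a cutoff."""
    if data[:2] == b"\x1f\x8b":
        size = gzip_declared_size(data)
    else:
        size = len(data)
    return size > cutoff
```

A highly compressible SVG can be tiny on disk yet far above the cutoff once expanded, which is the case the size check would otherwise miss.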


(In reply to comment #27)

(.dia doesn't have this problem as there's not a separate extension or special
HTTP header configuration for compressed files!)

Inkscape also doesn't make any difference between compressed and uncompressed SVG images. It opens uncompressed *.svgz and compressed *.svg without any problem :)

That is a quirk of Inkscape that you cannot rely on.

Can we at least enable gzip Transfer-Encoding for plain SVG files?

(In reply to comment #33)

Can we at least enable gzip Transfer-Encoding for plain SVG files?

That would be very logical yes! I've filed bug 54291 in the Wikimedia servers component for configuring such a thing... in theory if configured on the server it'll be more transparent to users than explicit .svgz saving.

In T6947#94022, @brion wrote:

(In reply to comment #33)

Can we at least enable gzip Transfer-Encoding for plain SVG files?

That would be very logical yes! I've filed bug 54291

Which is T56291 and solved. I no longer see a reason to offer svgz upload.

Aklapper lowered the priority of this task from Low to Lowest. Aug 2 2015, 6:20 PM
In T6947#94022, @brion wrote:

(In reply to comment #33)

Can we at least enable gzip Transfer-Encoding for plain SVG files?

That would be very logical yes! I've filed bug 54291

Which is T56291 and solved. I no longer see a reason to offer svgz upload.

Hence I am boldly declining this request.