SVGMetadataExtractor is taking too much memory/time on large svgs, rendering certain pages inaccessible
Closed, ResolvedPublic

Assigned To
None
Authored By
Lupo
Feb 17 2011, 8:26 PM
Referenced Files
F7457: svgmetadatalimit.diff
Nov 21 2014, 11:25 PM
F7455: SVGreader.patch
Nov 21 2014, 11:25 PM
F7456: end_metadata.patch
Nov 21 2014, 11:25 PM

Description

See the URL given above. Error was reported on the French village pump at the Commons,

http://commons.wikimedia.org/w/index.php?title=Commons:Bistro&oldid=49919318#Et_sous_Firefox_.3F

The page User:Sting is simply not served. After a long time (about 4-5 minutes), one gets a Wikimedia error page saying

Request: GET http://commons.wikimedia.org/wiki/User:Sting, from <MY IP OMITTED> via amssq43.esams.wikimedia.org (squid/2.7.STABLE7) to 91.198.174.35 (91.198.174.35)
Error: ERR_READ_TIMEOUT, errno [No Error] at Thu, 17 Feb 2011 20:09:23 GMT

The user's page contains quite a few images. Don't know if that might be a problem.

Behavior confirmed in Firefox 3.6.13 (Mac OS X), Safari (Mac OS X), Firefox 3.6.4 (Win XP), IE6, Opera 10.60 (Win XP); both logged in and logged out.

The page is also not served through the secure server

https://secure.wikimedia.org/wikipedia/commons/wiki/User:Sting

it returns relatively quickly a completely unstyled page saying

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /wikipedia/commons/wiki/User:Sting.

Reason: Error reading from remote server

Apache/2.2.8 (Ubuntu) mod_fastcgi/2.4.6 PHP/5.2.4-2ubuntu5.12wm1 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g Server at secure.wikimedia.org Port 443

Marked as "major" because although so far this concerns only one page, I think it's worth investigating before we find other pages. It's not clear to me whether this is some networking problem, or a caching (squid) problem, or a wikitext parsing problem, or some other problem in the MediaWiki code.


Version: unspecified
Severity: major
URL: http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

Details

Reference
bz27508

Event Timeline

bzimport raised the priority of this task from to High. Nov 21 2014, 11:25 PM
bzimport set Reference to bz27508.
bzimport added a subscriber: Unknown Object (MLST).

It appears that this is caused by the reference

[[:File:Puerto_Rico_ecosystems_map-fr.svg]]

in User:Sting

Indeed, http://commons.wikimedia.org/wiki/File:Puerto_Rico_ecosystems_map-fr.svg also does not load.

However, even eliminating this link still leaves problems. Try clicking on the thumbnail at
http://commons.wikimedia.org/wiki/User:Lupo/q

or on the two links "SVG version" or "in French - raster": none of them is served! That's because they
all reference http://commons.wikimedia.org/wiki/Template:Other_versions/Puerto_Rico_ecosystems_map which in turn references File:Puerto_Rico_ecosystems_map-fr.svg in a gallery.

Note that this also makes

http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

inaccessible.

It's also not possible to save an edit page if the wikitext contains an active (not commented out) wikilink to that file.

The user page where the problem was originally noticed

http://commons.wikimedia.org/wiki/User:Sting

has been edited in the meantime to circumvent the problem.

However, links to this SVG file still cause problems, such as

http://commons.wikimedia.org/wiki/Category:Maps_of_Puerto_Rico

being inaccessible.

When I try to upload the file (a 13 MB SVG) to my local wiki, I get an error from the SVG metadata extractor exceeding the max execution time, so I think it's an issue with the new SVG metadata extractor.

Maybe we should not try to extract metadata if the file is beyond a certain size.
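
To make the size-guard idea concrete, here is a minimal sketch in Python (not MediaWiki's PHP). The function name and the 5 MB threshold are hypothetical, since the thread does not name a specific limit.

```python
import os

# Hypothetical threshold; the thread does not name a specific file-size limit.
MAX_METADATA_FILE_BYTES = 5 * 1024 * 1024


def should_extract_metadata(path, limit=MAX_METADATA_FILE_BYTES):
    """Return True only when the file is small enough to be worth parsing."""
    return os.path.getsize(path) <= limit
```

A caller would check this before handing the file to the metadata extractor and simply store no metadata for oversized files.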

Created attachment 8241
patch to fix this

Attached:

Stop after getting metadata

(ensure you're at least at r83254)

We could also avoid this if we stopped parsing once we got the metadata tag.
There may be files with several <metadata> tags, though, for which we would only fetch the first one.
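
A rough illustration of the "stop once we have the metadata tag" approach, sketched with Python's standard-library streaming parser rather than the actual SVGReader code. The function name is hypothetical, and as noted above only the first <metadata> element is returned.

```python
import xml.etree.ElementTree as ET


def first_metadata_text(source):
    """Return the text content of the first <metadata> element, or None."""
    # iterparse reads the source incrementally, so returning early means
    # the remainder of a very large file is never read or parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip any "{namespace}" prefix that ElementTree puts on tag names.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "metadata":
            return "".join(elem.itertext())
    return None
```

The trade-off is the one already mentioned: a file with several <metadata> elements only contributes its first one.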

Attached:

Created attachment 8245
Patch to only look at so much of the svg file for metadata.

How about we only look in the first 256 KB for metadata information?

*Most SVG files (ignoring the crazy maps) aren't anywhere near 256 KB in size
*The SVG metadata <title> and <desc> tags are almost always at the very beginning
*256 KB (which I chose arbitrarily) of SVG can be parsed pretty much instantaneously by our SVGReader class (in my tests anyway, using eval.php)

Patch attached that does that. After applying the patch I can successfully upload Puerto_Rico_ecosystems_map-fr.svg to my wiki, where before I ran into an "execution time exceeded in SVGMetadataExtractor" type of error. (It still took a long time, but I think that's mostly from convert, which eventually gets killed by ulimit.sh.) Parsing that SVG with SVGReader is pretty much instantaneous when done from eval.php (as I mentioned earlier in this comment), where before it took something like 7 minutes.
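
For readers who want to see the shape of the cutoff idea, here is a minimal sketch in Python (not the committed PHP patch). It reads only the first 256 KB of the file and collects the first <title> and <desc> found in that prefix. The function and constant names are hypothetical, and it does not restrict itself to top-level elements.

```python
import xml.etree.ElementTree as ET

# 256 KB, the cutoff discussed in this comment.
METADATA_CUTOFF_BYTES = 256 * 1024


def read_title_and_desc(path, cutoff=METADATA_CUTOFF_BYTES):
    """Parse at most `cutoff` bytes and return any <title>/<desc> text found."""
    with open(path, "rb") as handle:
        head = handle.read(cutoff)

    parser = ET.XMLPullParser(events=("end",))
    found = {}
    try:
        # close() is never called, so a truncated document is not an error
        # by itself; the parser just reports the elements it saw completed.
        parser.feed(head)
        for _event, elem in parser.read_events():
            local_name = elem.tag.rsplit("}", 1)[-1]
            if local_name in ("title", "desc") and local_name not in found:
                found[local_name] = "".join(elem.itertext()).strip()
    except ET.ParseError:
        # Malformed markup inside the prefix: keep whatever was found so far.
        pass
    return found
```

Because the work is bounded by the cutoff rather than by the file size, a 13 MB map costs no more to scan than a 256 KB one, at the price of missing any <title> or <desc> that appears later in the file.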

Attached:

I committed that in r83374. Marking this fixed as that should fix the issue (at least on my test wiki it does, using [[:File:Puerto_Rico_ecosystems_map-fr.svg]])

Gilles raised the priority of this task from High to Unbreak Now!. Dec 4 2014, 10:21 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to High. Dec 4 2014, 11:22 AM