Page MenuHomePhabricator

Chunked upload of SVGs triggers INVALIDXML exception, but file is valid
Closed, ResolvedPublic

Description

Author: cmue81

Description:
Uploading a 70mb SVG file created with Inkscape fails, due to "uploadinvalidxml" exception thrown. The SVG contains a large embedded png, so variing the chunk size between 500kb to 5mb does not matter: lots of upload chunks will simply contain a base64 encoded string only.

I suspect this to be the reason for verifyPartialFile to reject chunks as invalidxml, but ultimately am not 100% sure: Shouldn't a test on well-formed-ness of XML be always done on the stashed (complete) file rather than on the chunks? This test for wellformedness within the scope of verifyPartialFile has been introduced by bug 58553, maybe regression!?

The backtrace to this is about verifyChunk() -> verifyPartialFile() -> detectScriptInSVG() -> XmlTypeCheck->wellformed == false.

Relevant code entries are to be found here:

function verifyChunk():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00364
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00369
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00373

function verifyPartialFile():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l00481
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l00507

function detectScriptInSVG():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l01255
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l01265

Thanks for dedicating time to fix this.


Version: 1.24rc
Severity: normal

Details

Reference
bz65724

Related Objects

StatusSubtypeAssignedTask
ResolvedNone
OpenNone
ResolvedNone

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 3:20 AM
bzimport set Reference to bz65724.
bzimport added a subscriber: Unknown Object (MLST).

cmue81 wrote:

Linking (instead of embeding) the image like done currently for

https://commons.wikimedia.org/wiki/File:Bahnstrecke_Zeitz-Camburg_-_Verlauf_um_1930_-_Umnutzung_als_Radweg_ab_Juli_2012_-_hinterlegt_mit_MTB_Detailkarten.svg

also fails, since librsvg does not handle HTTP(S) links within xlink:href

However opening the svg from commons servers straight into Firefox (29) works (surprisingly). Firefox fetches HTTP(S) resources within SVGs and renders the result just fine. This leads to the weird situation that a locally opened image appears entirely different from the png rendered preview by commons.

But this is probably within the scope of librsvg bugtracker, not mediawikis. Just so to mention that what seemed to be a clean workaround (according to SVG spec at least) does not resolve this issue either. As of now there is no way to bring this image to Commons, but there was: before bug 58553 got "fixed".

(In reply to Christian from comment #1)

However opening the svg from commons servers straight into Firefox (29)
works (surprisingly). Firefox fetches HTTP(S) resources within SVGs and
renders the result just fine. This leads to the weird situation that a
locally opened image appears entirely different from the png rendered
preview by commons.

But this is probably within the scope of librsvg bugtracker, not mediawikis.
Just so to mention that what seemed to be a clean workaround (according to
SVG spec at least) does not resolve this issue either. As of now there is
no way to bring this image to Commons, but there was: before bug 58553 got
"fixed".

No, we specificly disable external images. Its a modification we made to librsvg (If your browser fetchs external images, that's its bussiness, but if WMF servers are fetching arbitrary things, that opens up some issues that are slightly scary security wise. Since 99% of the time svgs don't really need that sort of feature anyways, we disabled it).

As of now there is
no way to bring this image to Commons, but there was: before bug 58553 got
"fixed".

Until this issue is resolved, try uploading the file with [[commons:Special:Upload]]. The old upload form will still upload files without chunked uploading for files < 100 mb.

cmue81 wrote:

I've tried uploading the old upload form, but get reproducable Gateway timeout (my upstream is at 1mbit). This is why I've tried chunked upload.

As for specifically disabling external images for SVGs: Why? You're breaking the standard doing this. This is a security issue of the respective JPG or PNG libraries you're talking about. They need to be as recent and as secure as possible for this.

If they are not, then an "attacker" (i.e. commons user) could simply upload his/her malicious png/jpg using the upload form and the image would be processes by those same libraries anyway (!)

A just reason for refraining HTTP(S) references in librsvg would be the abscence of a guarantee on availability of the external resource over time.
This could be solved using two methods, the second one being the stricter one:

  1. cache external refs on thumbnail generation, check for updates on external server on thumbnail re-generation
  2. allow external refs to images residing on wikimedia servers only

The second method should be achievable even without a regexp match by simply doing a "starts with" check on the "xlink:href" value for "http://commons.wikimedia.org/" or "http://commons.wikimedia.org/", virtually this would not cost any performance. If a regexp check is tolerable performance-wise, then support for subprojects of the wikimedia eco-system might be included as well.


Ultimate security is a black box. Wikipedia is about sharing.

(In reply to Christian from comment #3)

I've tried uploading the old upload form, but get reproducable Gateway
timeout (my upstream is at 1mbit). This is why I've tried chunked upload.

As for specifically disabling external images for SVGs: Why? You're
breaking the standard doing this. This is a security issue of the
respective JPG or PNG libraries you're talking about. They need to be as
recent and as secure as possible for this.

If they are not, then an "attacker" (i.e. commons user) could simply upload
his/her malicious png/jpg using the upload form and the image would be
processes by those same libraries anyway (!)

A just reason for refraining HTTP(S) references in librsvg would be the
abscence of a guarantee on availability of the external resource over time.
This could be solved using two methods, the second one being the stricter
one:

  1. cache external refs on thumbnail generation, check for updates on

external server on thumbnail re-generation

  1. allow external refs to images residing on wikimedia servers only

The second method should be achievable even without a regexp match by simply
doing a "starts with" check on the "xlink:href" value for
"http://commons.wikimedia.org/" or "http://commons.wikimedia.org/",
virtually this would not cost any performance. If a regexp check is
tolerable performance-wise, then support for subprojects of the wikimedia
eco-system might be included as well.


Ultimate security is a black box. Wikipedia is about sharing.

Its hardly an unreasonable burden, given that commons considers embedding raster files in svgs innappropiate in most cases that they are used.

This isnt the right place for arguing about this (since its off topic for this specific bug report). Bring it up on wikitech-l if you feel strongly about it.

It isn't anything unique to large SVGs, any SVG will trigger this problem. I've been testing locally with SVGs like the following (and very small chunks):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="189" height="31"><defs/><text x="9" y="21" style="font-size:15px;fill:#000000;stroke:none;font-family:DejaVu Sans Mono">2014-10-03 14:51:00</text></svg>

Fixing it from a technical perspective is easy enough (as suggested, the specific check here should be done from verifyFile() rather than verifyPartialFile()), but I'm not sure it'll pass security review. Let's try.

Change 164569 had a related patch set uploaded by Anomie:
Don't try to verify XML well-formedness for partial SVG uploads

https://gerrit.wikimedia.org/r/164569

Change 164569 merged by jenkins-bot:
Don't try to verify XML well-formedness for partial SVG uploads

https://gerrit.wikimedia.org/r/164569

The fix should be deployed to WMF wikis with 1.25wmf8, see https://www.mediawiki.org/wiki/MediaWiki_1.25/Roadmap for the schedule.

Gilles raised the priority of this task from Medium to Unbreak Now!.Dec 4 2014, 10:11 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium.Dec 4 2014, 11:21 AM