Import/Export should support zipped XML
Open, Low, Public, Feature

Description

The Import and Export functions should support zipped XML as an option. This
would solve several issues when importing/exporting a large number of
pages/versions:

For Export:

  • Browsers may try to display the XML as a fancy tree. For a file of several MB,
    this may bog down the computer or crash the browser. It's pointless anyway.
  • Browsers often mangle the XML when saving it. In Firefox, saving the page from
    the source view leads to broken results, and saving from the normal XML view
    only works if you manually select the (non-obvious) "HTML only" option.
  • Downloading a zipped file will be faster, even if zlib compression is enabled
    on the server, because the browser will not uncompress it.
  • A zipped file will trigger a download dialog, which makes more sense for an
    export than showing XML in the browser.

For Import:

  • If export supports zip, import should too.
  • The zipped file will be a lot smaller. People may have upload limits in PHP, in
    Apache, or for their web account.
  • Zipped files can be detected and handled automatically.
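The automatic detection mentioned in the last point could work by looking at the first bytes of the upload. The following is only a minimal sketch in Python, not MediaWiki code: the names detect_compression and open_upload are hypothetical, and it assumes the upload is available as a seekable binary stream. The magic numbers themselves (1f 8b for gzip, "BZh" for bzip2) are the standard ones for those formats.

```python
import bz2
import gzip
import io

# Leading "magic" bytes of each format; anything else is treated as plain XML.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(head: bytes) -> str:
    """Return 'gzip', 'bzip2' or 'none' based on the first bytes of an upload."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "none"


def open_upload(raw: io.BufferedIOBase):
    """Wrap a seekable binary stream in the matching decompressing reader."""
    head = raw.read(3)
    raw.seek(0)  # rewind so the reader sees the stream from the start
    kind = detect_compression(head)
    if kind == "gzip":
        return gzip.GzipFile(fileobj=raw)
    if kind == "bzip2":
        return bz2.BZ2File(raw)
    return raw
```

With something like this, the import form would not need a separate "this file is compressed" checkbox; the user uploads either kind of file and the importer reads XML from whatever open_upload returns.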

An additional option for importing a file that is on the server's file system
may be handy too, especially for people who don't have shell access to the
server, but can upload stuff via FTP (as is quite often the case). But there may
be security issues with this.


Version: 1.6.x
Severity: enhancement

Details

Reference
bz3893

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:54 PM
bzimport set Reference to bz3893.
bzimport added a subscriber: Unknown Object (MLST).

victor.stinner wrote:

Patch for Special:Import to support gzip and bzip2 compression

I'm new to the MediaWiki code. This patch uses some lines of phpMyAdmin source
code (adapted for MediaWiki) for file format detection. I wrote a PHP stream
(file StringStream.php) to reuse MediaWiki code instead of writing my own
ImportStreamSource class.

Be careful: the stream_wrapper_register function needs "PHP 4 >= 4.3.2, PHP 5".

TO DO:

  • Maybe show different error messages if gzip/bzip2 decompression isn't
    supported
  • Write new ImportStringStreamSource (based on ImportStreamSource) to be
    compatible with PHP < 4.3.2 (?)
  • Test it :-)

I wrote this class to upload a large XML file (over 8 MB), but it didn't solve
my problem. I have to split the XML into several parts...

Haypo

attachment mediawiki_import_gzip_bzip2.patch ignored as obsolete

It looks like it's decompressing the entire file to memory first; this seems
really inefficient, and you can hit your memory_limit or run the server into
swap on a large file.

I'd recommend instead using PHP's stream wrappers (compress.zlib:// and
compress.bzip2:// for .gz and .bz2 respectively), as importDump.php already
does on the command line.
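To illustrate the difference between decompressing everything up front and reading through a decompressing stream, here is a small sketch. It is written in Python rather than PHP and is not the importDump.php code; open_dump and iter_chunks are hypothetical names, and choosing the reader by file extension mirrors the .gz / .bz2 mapping described above.

```python
import bz2
import gzip
from typing import BinaryIO, Iterator


def open_dump(path: str) -> BinaryIO:
    """Pick a decompressing reader from the file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def iter_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield decompressed data piece by piece instead of loading it all."""
    with open_dump(path) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

Each chunk can be fed to an incremental XML parser as it arrives, so the decompressed dump never has to fit into memory at once.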

slavaz wrote:

gzipped import/export

attachment gzip.patch.gz ignored as obsolete

slavaz wrote:

Hi all. See the attachment: this is my approach to gzipped import/export. :)

slavaz wrote:

Patch for gzipped import/export

I'm so sorry... :( I attached the previous patch in gzip format.
Now the patch is attached as plain text. :)

attachment gzip.patch ignored as obsolete

slavaz wrote:

Patch for gzipped import/export

I'm so sorry... :( I attached the previous patch in gzip format.
Now the patch is attached as plain text. :)

attachment gzip.patch ignored as obsolete

Patch doesn't look right; the import infrastructure already has
support for reading gzipped data using fopen wrappers. Better to
reuse that rather than changing all the functions around.

Export data is also gzipped normally if the user-agent requests it,
as is all other output. Why double-zip it?

Zipping the output explicitly avoids several problems: if it is only
transfer-encoded as zip, the browser will probably a) unpack it unnecessarily,
b) try to show it as an XML tree, and c) maybe even save it as something
HTML-like. Sending it out as an archive will cause the browser to simply pop up
a download dialog, which is what the user expects, and is also much faster and
safer (integrity-wise).

Double-zipping should be avoided, though. I don't know at which level MediaWiki
handles transfer-encoding, but zipping should be avoided for archives, images,
etc... basically, everything that's not text/*. This should be doable - maybe
PHP is even smart enough to handle this automatically?

slavaz wrote:

the import infrastructure already has support for reading gzipped data

gzopen() transparently opens both gzipped and non-gzipped files, so that part is
OK. :) But in the import I don't check whether the functions 'gzopen', 'gzread',
etc. are present.
As for double-gzipping... hm... I have seen this problem, which is why I set the
compression level to 0 in the export function.
I'm fixing it now.

Mass component change: <some> -> Export/Import

(In reply to comment #8)

Zipping the output explicitly avoids several problems: if it is only
transfer-encoded as zip, the browser will probably a) unpack it unnecessarily,
b) try to show it as an XML tree, and c) maybe even save it as something
HTML-like. Sending it out as an archive will cause the browser to simply pop up
a download dialog, which is what the user expects, and is also much faster and
safer (integrity-wise).

You can instead use the Content-Disposition header, as we currently do.

$wgRequest->response()->header( "Content-disposition: attachment;filename={$filename}" );

Comment on attachment 1409
Patch for gzipped import/export

Marking dupe patch obsolete

I guess it's fine to allow the user to select "compress output file" or something similar on the Special:Export form. I would make it the user's choice, though, rather than replacing the existing plain-text format as this patch does.
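One way to picture the "user's choice" variant together with the Content-Disposition approach is sketched below. This is Python, not the MediaWiki implementation; build_export_response, its parameters, and the chosen content types are assumptions made for illustration, and basename is assumed to contain no quotes or other characters that need escaping in a header.

```python
import gzip
from typing import List, Tuple

Headers = List[Tuple[str, str]]


def build_export_response(xml: bytes, basename: str, compress: bool) -> Tuple[Headers, bytes]:
    """Build headers and body for an export download, gzipped only on request."""
    if compress:
        body = gzip.compress(xml)
        filename = basename + ".xml.gz"
        content_type = "application/gzip"
    else:
        body = xml
        filename = basename + ".xml"
        content_type = "application/xml; charset=utf-8"
    headers = [
        ("Content-Type", content_type),
        # "attachment" asks the browser to save the file rather than render it.
        ("Content-Disposition", f'attachment; filename="{filename}"'),
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```

The sketch deliberately sets no Content-Encoding header for the .gz case: the gzip data is the file itself, so the browser should store it as-is rather than decode it. Whether the server's output compression would additionally wrap such a response is exactly the double-zipping question raised earlier and is not addressed here.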

sumanah wrote:

Comment on attachment 1082
Patch for Special:Import to support gzip and bzip2 compression

marking patch obsolete per Ariel's review

sumanah wrote:

Comment on attachment 1410
Patch for gzipped import/export

Marking patch obsolete per Ariel's review

I believe this is a great idea. It would be nice to be able to import zipped files. Big cheers to the programmer who takes this on!

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM
Aklapper removed a subscriber: wikibugs-l-list.