
Special:Export of a single page with huge history occasionally forgets </page> and </mediawiki> closing tags
Open, Medium, Public

Description

Titles which gave problems

I exported some pages (one at a time) with huge histories (700-1200 MiB) on http://wiki.guildwars.com/ and I got this: http://p.defau.lt/?T1uU_47VU1PSgvqoeIHbog

If you want to dig (not that it's useful, we know Special:Export and Special:Import are just crazy)...
Number of revisions etc.: http://p.defau.lt/?xmcOCZY6XANVbTILv_7Vtg
I fixed the XML in a stupid way (see the sketch below): http://p.defau.lt/?eT1TFHvidKzMM4uZsoIpYg
The titles are attached (the first ones) and you can try to export them, although I don't suggest it. The server is quite fast; I downloaded at about 400 KiB/s if I remember correctly (usually such exports almost kill the server).
I'm compressing the XML files and I'll publish them soon.
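
(For illustration: a minimal sketch of that kind of repair, reconstructed from the symptom rather than taken from the paste above, and assuming the output stopped cleanly at a revision boundary.)

```python
# Hypothetical repair sketch (not the actual script behind the paste):
# if the dump stopped cleanly after a revision, append the two closing
# tags that Special:Export dropped. A dump cut mid-revision needs real
# surgery instead.
import sys

def close_truncated_dump(path: str) -> None:
    with open(path, "rb+") as f:
        f.seek(0, 2)                      # jump to end of file
        f.seek(max(0, f.tell() - 4096))   # inspect the last few KiB
        tail = f.read()
        if b"</mediawiki>" in tail:
            return                        # already properly terminated
        if not tail.rstrip().endswith(b"</revision>"):
            raise SystemExit("cut mid-revision; appending tags won't fix it")
        f.write(b"  </page>\n</mediawiki>\n")

if __name__ == "__main__":
    close_truncated_dump(sys.argv[1])
```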


Version: unspecified
Severity: minor

Details

Reference
bz29961

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 11:34 PM)
bzimport set Reference to bz29961.
bzimport added a subscriber: Unknown Object (MLST).

This probably indicates a fatal PHP error during output...

If you have direct server access or can reach the site admin, check the web server error logs for messages.

A fatal error whose only effect is those two missing tags, or do you mean that perhaps something more horrible happened (for instance, some missing revisions)?
Files are here: http://www.archive.org/details/wiki.guildwars.com
Now I'll check the revisions and try to contact the sysadmins.

EN.WP.ST47 wrote:

Probably the latter: some more horrible thing happened. We're not unfamiliar with import and export crashing spontaneously on certain revisions, and it would be reasonable to assume that if the export choked on a revision, it would simply stop output, resulting in no closing page tag but with the last successfully exported revision fully output. Did the pages get fully exported?

*nod* export aggressively disables output buffering so it can push out arbitrarily long data. A failure between revisions, or between buffer flushes, would be expected to produce output that just stops between revisions or such.
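
One way to check that behaviour (a sketch, not anything shipped with MediaWiki) is to stream-parse the dump: the parser errors out exactly at the cut, so counting complete <revision> elements tells you how many revisions made it out intact.

```python
# Sketch: count complete <revision> elements in a possibly-truncated
# export. iterparse yields an element only once its closing tag has
# been read, so on a cut-off file the count at the ParseError is the
# number of revisions that were fully written before output stopped.
import sys
import xml.etree.ElementTree as ET

def count_revisions(path: str) -> int:
    count = 0
    try:
        for _, elem in ET.iterparse(path, events=("end",)):
            # export XML is namespaced, so match on the local tag name
            if elem.tag.rsplit("}", 1)[-1] == "revision":
                count += 1
                elem.clear()              # free text payload on huge dumps
    except ET.ParseError as err:
        print(f"dump truncated: {err}", file=sys.stderr)
    return count

if __name__ == "__main__":
    print(count_revisions(sys.argv[1]))
```

Comparing that count against the wiki's own revision totals (as in the list below) shows how much history was lost.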

Yes, there were several thousand missing revisions. The actual numbers would be:
ArenaNet:Guild Wars 2 suggestions/Scratchpad (7832)
ArenaNet:Guild Wars suggestions/bigArchive (6107)
User talk:Gaile Gray (18192)
User talk:Isaiah Cartwright/Temp (8257)
User talk:Linsey Murdock/Temp (10474)
User talk:Regina Buenaobra/Temp (11452)

I redownloaded one of them, but it seems to have failed again. There's probably not much to do except downloading the history in chunks (see the sketch after this comment), though it would be nice to get a proper XML somehow.
Also, [[m:Help:Export#2._Perform_the_export]] says: «Open the XML file in a text editor. Scroll to the bottom to check for error messages», so apparently there are, or used to be, some error messages.
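
A minimal sketch of the chunked approach, assuming the offset/limit parameters documented at Help:Export; the endpoint URL, page title, and chunk size are illustrative placeholders:

```python
# Sketch: pull a huge page history in chunks via Special:Export's
# offset/limit parameters instead of one giant request. offset is
# non-inclusive, so resuming from the last seen timestamp avoids
# duplicates (revisions sharing a timestamp are an edge case).
import re
import requests

EXPORT = "http://wiki.guildwars.com/index.php"  # illustrative endpoint
TITLE = "User talk:Gaile Gray"                  # one of the problem pages
CHUNK = 1000                                    # revisions per request

def export_chunks(title: str):
    offset = "1"                                # before any real timestamp
    while True:
        r = requests.post(EXPORT,
                          params={"title": "Special:Export"},
                          data={"pages": title,
                                "offset": offset,
                                "limit": CHUNK,
                                "action": "submit"},
                          timeout=300)
        r.raise_for_status()
        stamps = re.findall(r"<timestamp>([^<]+)</timestamp>", r.text)
        if not stamps:
            break                               # no revisions left
        yield r.text
        offset = stamps[-1]                     # resume after last revision

for i, chunk in enumerate(export_chunks(TITLE)):
    with open(f"chunk-{i:04d}.xml", "w", encoding="utf-8") as f:
        f.write(chunk)
```

Each response is a complete <mediawiki> document of its own, so stitching the chunks into a single dump still means stripping the duplicated headers and footers.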

(In reply to comment #6)

Also, [[m:Help:Export#2._Perform_the_export]] says: «Open the XML file in a text editor. Scroll to the bottom to check for error messages», so apparently there are, or used to be, some error messages.

Some still are; see bug 39639.