Export pages should allow exporting all pages
Closed, ResolvedPublic

Description

Author: dev

Description:
For small wikis, "export pages" should allow exporting all pages.


Version: unspecified
Severity: enhancement

Details

Reference
bz10574

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:50 PM
bzimport set Reference to bz10574.
bzimport added a subscriber: Unknown Object (MLST).

robert wrote:

How do you define "small wikis"? Would this be a configuration setting or some hard-coded value? I would guess some wikis would rather not allow someone to download all of their content, especially while still small and relatively unknown.

dev wrote:

: exports all pages

This patch allows exporting all pages if you enter ":" as the page name. However, the interface does not yet mention that this is possible.

attachment mediawiki-exportallpages.diff ignored as obsolete

ayg wrote:

I think you left some debugging code at the bottom there. Also, it would be vastly more efficient to just directly select all pages using $exporter->allPages(), rather than generating a list of all pages (possibly a megabyte or more in memory on reasonable-sized wikis) and then going through those one-by-one.

If this is made available it needs to be a configuration option, obviously. The interface should be improved, made a separate button instead of a magic title.
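To illustrate the efficiency point above, here is a minimal sketch in Python rather than MediaWiki's PHP. It is not the patch and not the real exporter: `PAGES`, `export_via_title_list`, and `export_streaming` are hypothetical names, and a dictionary stands in for the page table. The first function builds the complete title list before exporting anything; the second walks the rows directly and yields each page as it is read, which is roughly what selecting all pages in one go is meant to achieve.

```python
from typing import Dict, Iterator, List
from xml.sax.saxutils import escape

# Hypothetical stand-in for the page table: title -> current text.
PAGES: Dict[str, str] = {
    "Main Page": "Welcome",
    "Help": "How to edit",
    "Sandbox": "1 < 2",
}


def render(title: str, text: str) -> str:
    """Render one page as a simplified XML fragment (not the real export schema)."""
    return f"<page><title>{escape(title)}</title><text>{escape(text)}</text></page>"


def export_via_title_list(pages: Dict[str, str]) -> List[str]:
    """Materialise every title first, then fetch the pages one by one."""
    titles = list(pages.keys())  # the whole title list is held in memory
    return [render(title, pages[title]) for title in titles]


def export_streaming(pages: Dict[str, str]) -> Iterator[str]:
    """Walk the rows directly and yield each page as soon as it is read."""
    for title, text in pages.items():
        yield render(title, text)
```

Both functions produce the same fragments in the same order; the difference is only that the streaming version never needs the full title list up front.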

dev wrote:

adds a button "add all pages" like "add pages from category"

You are absolutely right that we should offer a GUI option instead of a magic ":" page name.

attachment export_addallpages.diff ignored as obsolete

dev wrote:

I have just successfully migrated my MediaWiki 1.6.10 wiki to MediaWiki 1.11.0 (phase3). For that, I exported all pages from my old wiki and imported them into my new wiki. So, exporting can help you during migrations. Please take my patch into mainline MediaWiki.

dev wrote:

option to export all pages

attachment export-all.diff ignored as obsolete

ayg wrote:

First of all, as I said, you should use $exporter->allPages(), not generate a list of all of them and go through those one by one.

Second of all, as for the configuration, I think it would be best for that to take the form of a maximum number of pages that can be exported via Special:Export. "Export all" could then be shown if the number of pages on the wiki is less than the provided number. I say this because I saw nothing in the code that restricts this, so I experimentally attempted to export the current version of over 200,000 files (the entire contents of Category:Living people) from the English Wikipedia. It failed after about ten minutes with an XML parse error of some kind. That probably isn't desirable. If you get a dump of all page names, which is simple, you can easily try to dump the current version of every page on the English Wikipedia through Special:Export.

dev wrote:

use $exporter to export all pages on request

Here's my new version of the patch; it now uses $exporter->allPages(). I tested it for every combination I could think of. If you want to set some limits, please give me a hint on how to find the number of pages on the wiki. Otherwise, please commit!

attachment wiki-export.diff ignored as obsolete

robchur wrote:

You seem to be duplicating a lot of the export code path; just add conditional checks to the actual export statements. Use the Xml class methods, not the deprecated wfSubmitButton() et al.

You can use SiteStats::pages() to get the number of pages without doing an expensive COUNT(*). I would suggest introducing $wgExportMaxPages, which defaults to false (unlimited), and checking against it.
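The suggested check can be sketched as follows, in Python rather than MediaWiki's PHP. This is a hypothetical illustration, not the patch: `export_all_allowed` is an invented name, `page_count` stands for the value SiteStats::pages() would return, and `max_pages` stands for the proposed $wgExportMaxPages. Treating the limit as inclusive is an assumption; the thread does not settle whether the boundary is "less than" or "at most".

```python
from typing import Union


def export_all_allowed(page_count: int, max_pages: Union[int, bool]) -> bool:
    """Decide whether an "export all" option may be offered.

    page_count: number of pages on the wiki (stand-in for SiteStats::pages()).
    max_pages:  stand-in for the proposed $wgExportMaxPages;
                False means unlimited, as suggested in the comment above.
    """
    if max_pages is False:
        return True  # no limit configured
    # Assumption: the limit is inclusive.
    return page_count <= max_pages
```

For example, `export_all_allowed(120, False)` and `export_all_allowed(120, 500)` both allow the export, while `export_all_allowed(200000, 500)` does not.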

dev wrote:

allow exporting all pages if less than $wgExportMaxPages

Implemented all of RobChurch's ideas. Thanks, Rob and Simmetrical.

attachment export.diff ignored as obsolete

dev wrote:

after chatting with robchurch

This also prevents the user from manually adding more than $wgExportMaxPages pages.

Attached:

robchur wrote:

Working on incorporating this into some forthcoming improvements to Special:Export.

dev wrote:

I really think this is a great addition. Can you give me SVN access so I can commit it?
I already have an SVN account at KDE; e.g. at http://websvn.kde.org/trunk/KDE/kdepim/ktimetracker/ you can see that I (tstaerk) only commit thoroughly tested code :)

There's a (very) dodgy, unlocalised patch that I wrote one bored night for exporting a whole category up at http://www.devanywhere.com/ViewPub.php?id=22 if anybody's vaguely interested in cleaning it up and using it.

bugs wrote:

(In reply to comment #14)

I really think this is a great addition. Can you give me SVN access so I can
commit it?
I already have an SVN account at KDE; e.g. at
http://websvn.kde.org/trunk/KDE/kdepim/ktimetracker/ you can see that I
(tstaerk) only commit thoroughly tested code :)

You can add an attachment; SVN access is not necessary for a few bugs. However, if you feel you need it, contact brion at wikimedia.org.

dev wrote:

I do not feel I need SVN access; I would just like to see my patch (attachment 6) committed.

dev wrote:

Rob, thanks for working on MediaWiki, you brought a lot of innovation to it! Sad that you left.

Thorsten

Mass component change: <some> -> Export/Import

See also Bug 22881 - Greatly improved Export and Import for 1.14.1 (with support for advanced page selection, exporting and importing file uploads, and detection of "conflicts" during import). There's a patch written by me which is related to or fixes your issue.

*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*

sumanah wrote:

Thorsten, I'm sorry for the wait on this. I'm adding the "need-review" keyword to ask developers to review your code and approach.

I came up with my own 'export all' solution before seeing this bug, but this seems the right place to post it. It is similar in spirit to the earlier patches, but has a simple checkbox to export everything. It uses the load balancer per the large-history option, and has a global (defaults to false) that prevents the checkbox from appearing unless set to true. This feature is really needed for smaller wikis, else everyone ends up doing ugly workarounds such as putting all pages into a dummy category, or dumping each namespace separately. Diff is against r107904.

Created attachment 9791
Allow export of all pages

attachment url.txt ignored as obsolete

Created attachment 9792
Allow export of all pages

Disregard previous URL, better to have the patch

Attached:

(In reply to comment #24)

This feature is really needed for smaller wikis, else everyone ends up doing
ugly workarounds such as putting all pages into a dummy category, or dumping
each namespace separately.

Would you mind applying this (referencing this bug, natch) so that we can look at it in CR?

Closing this since the fixes have been made.