
Export specified revisions of a page
Open, Low | Public | Feature

Description

Author: lupin.wp

Description:
This patch makes Special:Export act on oldid instructions in the url.


Version: unspecified
Severity: enhancement

Details

Reference
bz4837

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 9:04 PM)
bzimport set Reference to bz4837.
bzimport added a subscriber: Unknown Object (MLST).

lupin.wp wrote:

patch

attachment e ignored as obsolete

lupin.wp wrote:

fix sql injection, hopefully

attachment d ignored as obsolete

The interface looks broken; a single revision ID to go along with
zero or more page names?

lupin.wp wrote:

I was envisioning using this with urls like

http://en.wikipedia.org/wiki/Special:Export/Main_Page?oldid=37120185

so I don't see how you can put more or less than one page name in there.
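
For illustration, the URL shape described above can be assembled as in the following minimal Python sketch. The helper name `export_url` is hypothetical; only the `Special:Export/<title>?oldid=<id>` form comes from the comment above.

```python
from urllib.parse import quote, urlencode

BASE = "http://en.wikipedia.org/wiki/"

def export_url(title: str, oldid: int) -> str:
    """Build a Special:Export URL for one page title and one revision ID."""
    # Hypothetical helper: the path carries a single page title and the
    # query string carries a single oldid, as in the example URL.
    path = "Special:Export/" + quote(title.replace(" ", "_"))
    return BASE + path + "?" + urlencode({"oldid": int(oldid)})

print(export_url("Main Page", 37120185))
```

With this shape there is exactly one title in the path, which is the constraint the comment points out.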

a) the WikiExport API accepts lists
b) you can POST as much as you like to the form handler

lupin.wp wrote:

patch supporting multiple old revisions

OK, this patch should fix those problems.

attachment e ignored as obsolete

This patch will break backup dumps, as it changes the export API and constants
without updating the scripts using it.

Requiring a revision ID on each page seems like an unnecessary and undesirable
breaking change. If this interface were used, it should be an optional parameter.

$oldid isn't checked for validity in several places in the API and gets dropped
unescaped into SQL.

Notice errors will be thrown and either displayed or logged if oldids aren't
passed or are too short for the list.

The table input interface doesn't look very nice.

lupin.wp wrote:

attempting to address brion's concerns

A revision is not required on each page - if the nth line in that field is
blank or not present, then the nth article is either retrieved in full or the
current revision is retrieved depending on $curonly, as currently happens.
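
The pairing rule described in this paragraph can be sketched roughly as follows. This is an illustrative Python sketch, not the patch itself: `pair_pages_with_oldids` is a hypothetical name, and `None` stands for "fall back to the existing $curonly behaviour".

```python
from typing import List, Optional, Tuple

def pair_pages_with_oldids(pages_field: str, oldids_field: str) -> List[Tuple[str, Optional[int]]]:
    """Match the nth page line with the nth oldid line, if there is one."""
    page_lines = pages_field.splitlines()
    oldid_lines = oldids_field.splitlines()
    pairs: List[Tuple[str, Optional[int]]] = []
    for i, page in enumerate(page_lines):
        page = page.strip()
        if not page:
            continue  # skip empty page lines but keep line positions aligned
        # A blank or missing nth oldid line means "no specific revision".
        raw = oldid_lines[i].strip() if i < len(oldid_lines) else ""
        oldid = int(raw) if raw.isascii() and raw.isdigit() else None
        pairs.append((page, oldid))
    return pairs

print(pair_pages_with_oldids("Main Page\nFoo\nBar", "37120185\n\n"))
```

Indexing by line position, rather than filtering blanks first, is what keeps a blank oldid line from shifting later IDs onto the wrong page.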

$oldid is converted into an integer with intval(). Is that not sufficient? I'm
an sql neophyte, so I'm not sure what needs doing there, but I've added a few
more checks.
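
On the `intval()` question: coercing the value to an integer keeps arbitrary text out of the SQL string, and binding it as a query parameter avoids string interpolation altogether. The Python sketch below illustrates both ideas against an in-memory SQLite database. The `revision` table, its columns, and the helper names are assumptions made for the example, not MediaWiki's actual schema or API.

```python
import sqlite3
from typing import Optional

def parse_oldid(raw: str) -> Optional[int]:
    """Return a positive revision ID, or None if the input is not one."""
    raw = raw.strip()
    if not (raw.isascii() and raw.isdigit()):
        return None
    value = int(raw)
    return value if value > 0 else None

def fetch_revision(conn: sqlite3.Connection, page_id: int, oldid: int):
    # The ? placeholders let the driver bind the values, so nothing from
    # the request is spliced into the SQL text.
    return conn.execute(
        "SELECT rev_id, rev_text FROM revision WHERE rev_page = ? AND rev_id = ?",
        (page_id, oldid),
    ).fetchone()

# Hypothetical schema and data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_text TEXT)")
conn.execute("INSERT INTO revision VALUES (37120185, 1, 'old text')")

print(parse_oldid("37120185"))
print(parse_oldid("1 OR 1=1"))
print(fetch_revision(conn, 1, 37120185))
print(fetch_revision(conn, 2, 37120185))
```

Rejecting malformed input outright, instead of silently truncating it to a number, also makes it easier to report a bad request to the caller.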

If $oldid is not a valid revision ID for the article, then nothing is added to the
output, just as nothing is added to the output if a non-existent article is
requested. This seems OK to me.

I didn't know about error messages being spewed everywhere. How can I see them?
I've checked for keys being present, so hopefully that'll be fixed now.

Yes, the table is ugly, but I couldn't see a nicer way to do it. Do you have
any ideas? I've just omitted the interface on the special page altogether in
this patch, although it's useful for testing.

attachment pat2 ignored as obsolete

This still breaks the API; several functions have added a required parameter which
will be rarely used.

There's no change to the phpdoc about the new and changed parameters.

Looks like it will only work if the requested revision happens to be current or the
full-history option is used (which is disabled site-wide on Wikimedia sites for
now).

lupin.wp wrote:

don't require $oldid in pagesByName

attachment pat2 ignored as obsolete

lupin.wp wrote:

another attempt

phpdoc strings added, of sorts. I think all the new parameters are documented
and optional now.

I've successfully tested this with $wgExportAllowHistory set to true and to
false.

attachment pat2 ignored as obsolete

Still adding required $oldid parameters onto existing API functions, breaking
compatibility. The doc comments say 'optional' but the function definitions don't.
It'll spew warnings when they're called from existing code.

More generally, I think what I'd like to see instead of tacking revision IDs onto
the page list is to have an alternate mode entirely where you *just* pass a list of
revision IDs.
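
The two points in this review can be illustrated with a short Python sketch. All names here (`Exporter`, `page_by_name`, `revs_by_id`) are hypothetical stand-ins rather than the real export API: the first method shows a new argument given a default so that existing callers keep working, and the second shows a separate entry point that takes only revision IDs.

```python
from typing import Iterable, List, Optional, Tuple

class Exporter:
    """Toy exporter that only records what it was asked to dump."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, Optional[str], Optional[int]]] = []

    def page_by_name(self, name: str, oldid: Optional[int] = None) -> None:
        # oldid has a default, so calls written before the parameter
        # existed still work unchanged.
        self.requests.append(("page", name, oldid))

    def revs_by_id(self, rev_ids: Iterable[int]) -> None:
        # Alternate mode: the caller passes revision IDs and nothing else.
        for rev_id in rev_ids:
            self.requests.append(("revision", None, int(rev_id)))

exporter = Exporter()
exporter.page_by_name("Main Page")   # existing call style, no oldid
exporter.revs_by_id([37120185])      # new revision-only mode
print(exporter.requests)
```

Keeping the revision-only mode in its own method leaves the signatures of the page-based methods untouched, which is the compatibility concern raised above.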

lupin.wp wrote:

much nicer patch

Thanks Brion. A new mode is a much better way to do things.

attachment pat2 ignored as obsolete

lupin.wp wrote:

more flexible patch

This patch allows queries through Special:Export which don't return the article
text and which don't return the site information header. It adds two url
parameters: notext and nosite.

The idea is that this will make such queries more efficient when such
information is not desired.

attachment pat2 ignored as obsolete

Export exists solely to provide that text, so I don't think those would be useful.

lupin.wp wrote:

"much nicer patch" reinstated

I've made the older "much nicer patch" supersede the "more flexible patch",
removing the notext and nosite flags. This patch just lets you send revision
IDs to Special:Export.

Attached:

robchur wrote:

*** Bug 5599 has been marked as a duplicate of this bug. ***

Mass component change: <some> -> Export/Import

What is the rationale for this?

*** Bug 15687 has been marked as a duplicate of this bug. ***

See also Bug 22881 - Greatly improved Export and Import for 1.14.1 (with support for advanced page selection, exporting and importing file uploads, and detection of "conflicts" during import). There's a patch written by me which is related to or fixes your issue.

sumanah wrote:

Lupin, I'm so sorry we're so late in responding to your patch. I tried to apply it and it would no longer apply cleanly to trunk -- I think part of that is the formatting (see https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Posting_a_patch ). If you'd like to work on this afresh, please visit MediaWiki-General in FreeNode IRC and chat with developers about the best approach. Thanks, and again, sorry!

Does this depend on bug 38669 (API export action does not support exporting specific revisions) or is that unrelated?

(In reply to Andre Klapper from comment #23)

Does this depend on bug 38669 (API export action does not support exporting
specific revisions) or is that unrelated?

It seems to be.

In T6837#92224, @brion wrote:

a) the WikiExport API accepts lists

But the resulting XML does not comply with the XML schema...

Aklapper changed the subtype of this task from "Task" to "Feature Request". (Feb 4 2022, 11:01 AM)