
Export regression: history included on GET requests with action=submit
Closed, Resolved (Public)

Description

Author: scotthatton

Description:
The XML export from Special:Export is bundling all revisions instead of just the latest,
regardless of what the user asks for in the export. This started on Sunday 6 August.

Here is an example in a Wiki which uses Wikipedia pages "On the fly":
http://www.wikinfo.org/wiki.php?title=Doune. The page "Doune" didn't exist in Wikinfo
(at the time of writing, anyway). Wikinfo tries to import it from the XML, and this is what happens.

Here is an example of my own website in a test page:

http://www.globalguide.org/test.html?id=100213

(see bottom of my page)


Version: unspecified
Severity: normal

Details

Reference
bz6946

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 9:18 PM
bzimport set Reference to bz6946.
bzimport added a subscriber: Unknown Object (MLST).

scotthatton wrote:

*** Bug 6938 has been marked as a duplicate of this bug. ***

jimmy.collins wrote:

See r15959 (Added experimental history paging API, subject to change).

scotthatton wrote:

This change has caused problems for everybody using GetWiki 1.0, which is a lot of people.

Can this be put back ASAP? My website, for instance, which extracts Wikipedia data in
this way, is no longer functional.

I see other sites, such as Wikinfo, are similarly affected.

scotthatton wrote:

The Special:Export page itself has a checkbox: "Include only the current revision,
not the full history". Current revision only (no history) should be the default,
with a manual override. That is, software like GetWiki should not receive all
revisions unless it explicitly asks for them.

scotthatton wrote:

Who is going to fix GetWiki? What an unhelpful comment.

It is this change to MediaWiki which has caused the problem in the first place.

robchur wrote:

Wait a moment...because a feature was added to MediaWiki, users are outright
complaining AGAIN? We're well within our rights to add stuff to the software.
External sites using our interfaces to take content will need to keep up with
the developments.

However, this sounds like a duplicate of a more clarified bug report posted
after this one. Finding that and marking this as a duplicate is an exercise left
up to the reader.

scotthatton wrote:

I am not "users", I am a person who relies (relied) on GetWiki to import my data. I
thought we were all in this together - I didn't realise it was an "us" and "them".

This avenue is now closed to me and I have to spend the next few days inventing a new
solution.

P.S. What was the problem that this upgrade fixed?

achuggard wrote:

As of r16018 (the version I have for my wiki, anyway) I am still able to get only
the most current revision of an article by pointing my browser at <Wiki
root>/Special:Export?pages=<Article Name>&curonly=1&action=submit

This is related to (but kind of the opposite of) Bug 9671

This is not a bug on MediaWiki; it's a bug on GetWiki. Its parsing code is
hopelessly broken, and can only work by chance. See my comment on
[[Wikipedia:Village pump (technical)#XML export format change]] for details.
Marking INVALID.

Workaround (untested): change $wgExportwiki on GetWiki to
http://en.wikipedia.org/w/index.php?title=Special:Export&curonly=1&action=submit&pages=
instead of http://en.wikipedia.org/wiki/Special:Export/; this should make
MediaWiki return exactly what it was returning before. HOWEVER, this is only a
temporary workaround; if GetWiki is not fixed, it'll probably break again the
next time anything is changed on MediaWiki's Special:Export. It's meant only as
a stopgap fix. You really should dedicate some resources to fixing that code,
before it breaks again.
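
As a rough illustration of the workaround above, the following Python sketch builds
the same kind of request URL by appending a percent-encoded page title to the base
string. The names `EXPORT_BASE` and `build_export_url` are hypothetical and exist
only for this example; the query parameters come straight from the URL quoted above.
It is a sketch under those assumptions, not GetWiki's actual code.

```python
from urllib.parse import quote

# Base URL copied from the workaround above; the page title is appended to it.
EXPORT_BASE = (
    "http://en.wikipedia.org/w/index.php"
    "?title=Special:Export&curonly=1&action=submit&pages="
)


def build_export_url(page_title: str, base: str = EXPORT_BASE) -> str:
    """Return an export URL for a single page (hypothetical helper)."""
    # Replace spaces with underscores, then percent-encode everything else
    # so characters such as '&' cannot be mistaken for query separators.
    return base + quote(page_title.replace(" ", "_"), safe="")


if __name__ == "__main__":
    print(build_export_url("Doune"))
```

Encoding the title matters because a raw `&` or `?` in a page name would otherwise
change the meaning of the query string.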

scotthatton wrote:

Cesar,

Thanks very much for your help. Your workaround indeed solves the problem
(temporarily). Hopefully GetWiki will issue a version 2.0. Alternatively I will try
to become a PHP programmer in the interim!

wimroffel wrote:

In the past Wikipedia recommended that webmasters not download the complete HTML page
but use Special:Export instead, so as not to burden Wikipedia's servers too much. Now all
the information about those directives is gone and we are left with a broken Special:Export.

This is not a GetWiki bug. This is MediaWiki/Wikipedia creating a mess.

robchur wrote:

No, this is MediaWiki evolving to support more complicated access to the export
interface. If there is a problem keeping up with the interfaces we provide, then
consider other options; paid OAI updates or downloading our XML dumps are two such.

The format has not changed, it is exactly the same as it was. If
something is not working the same today as it did two weeks ago,
please be very specific.

Also please check the *CURRENT STATUS RIGHT NOW* as bugs
introduced earlier in the month were fixed.

Yes, this is a GetWiki bug. Go read what I wrote on the Village pump. The
GetWiki code to read the exported XML is completely wrong, so that even the
smallest change can make it break. MediaWiki didn't change anything on the
format; it's still using the same 0.3 schema, and it's not MediaWiki's fault if
GetWiki cannot follow the schema.
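
To illustrate what reading the export by structure rather than by position might
look like, here is a minimal Python sketch. It is not GetWiki's code and not a fix
for it. The function names and the embedded sample document are hypothetical and
heavily simplified, including the placeholder namespace. The sketch assumes each
`<page>` holds a `<title>` and one or more `<revision>` elements with `<timestamp>`
and `<text>` children, and it ignores the XML namespace so that it does not depend
on one schema version.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def latest_revision_text(xml_data: str) -> dict:
    """Map each page title to the text of its newest revision."""
    result = {}
    root = ET.fromstring(xml_data)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title = None
        revisions = []  # list of (timestamp, text) pairs
        for child in page:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                fields = {local_name(e.tag): (e.text or "") for e in child}
                revisions.append(
                    (fields.get("timestamp", ""), fields.get("text", ""))
                )
        if title is not None and revisions:
            # Timestamps in one fixed ISO 8601 UTC format sort
            # chronologically when compared as strings.
            result[title] = max(revisions, key=lambda r: r[0])[1]
    return result


# Hypothetical, simplified sample; the namespace is a placeholder.
SAMPLE = """<mediawiki xmlns="http://example.org/hypothetical-export-ns/">
  <page>
    <title>Doune</title>
    <revision>
      <timestamp>2006-08-01T10:00:00Z</timestamp>
      <text>old text</text>
    </revision>
    <revision>
      <timestamp>2006-08-06T12:00:00Z</timestamp>
      <text>current text</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(latest_revision_text(SAMPLE))
```

A reader written this way gives the same answer whether the export contains one
revision or the full history, so it should not depend on how many revisions the
server chooses to return.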