
Special:Export should return 404 on non-existent pages
Open, Low, Public

Description

Author: Astronouth7303

Description:
Special:Export should return a 404 if the article doesn't exist. Currently it returns an XML response with the site metadata only: https://en.wikipedia.org/wiki/Special:Export/Redlink

Details

Reference
bz3161

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:46 PM
bzimport set Reference to bz3161.
bzimport added a subscriber: Unknown Object (MLST).

Astronouth7303 wrote:

Patch to implement (for 1.5)

This is a patch to implement the wanted functionality. It is based off of the
current 1.5 branch (CVS v1.37 of SpecialExport.php).

attachment SpecialExport.php.patch ignored as obsolete

Astronouth7303 wrote:

Patch to implement (for 1.5)

Removed an unrelated change (whitespace formatting). My bad.

attachment SpecialExport.php.patch ignored as obsolete

rowan.collins wrote:

See also bug 2585, to which I just posted a simple patch for returning 404 for
normal page and Special: page requests.

Also, why "HTTP/1.x" rather than "HTTP/1.1"?

Astronouth7303 wrote:

(In reply to comment #3)

See also bug 2585, to which I just posted a simple patch for returning 404 for
normal page and Special: page requests.

Cool.

Also, why "HTTP/1.x" rather than "HTTP/1.1"?

No particular reason.

rowan.collins wrote:

I see no sense in which there is a "dependency" relationship between this and
bug 2585; personally, I would have treated them as the same issue, but a
cross-reference seems plenty. [Of course, if something like my patch is
implemented, Special:Export should probably use $wgOut->send404header() for
consistency; that would imply that bug 2585 blocks this rather than vice versa.]

Probably better to let the export continue so the informative bits get output as expected, instead of
a mysterious blankness.

Also a 404 might not make sense if this is a POST submission; it should probably only be done for GET
requests where the target to load comes from the subpage suffix (Special:Export/Foobar).
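The behaviour proposed above can be sketched as a tiny WSGI application. It keeps emitting the informative XML and switches the status to 404 only for GET requests that name a missing page in the subpage suffix. This is an illustration in Python, not MediaWiki code. `export_app`, `call`, `EXISTING_PAGES` and `SITEINFO_XML` are hypothetical stand-ins, and real page content and site metadata are omitted.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-ins for the wiki's page table and the metadata-only dump.
EXISTING_PAGES = {"Main_Page"}
SITEINFO_XML = b"<mediawiki><siteinfo><sitename>Example</sitename></siteinfo></mediawiki>\n"
EXPORT_PREFIX = "/Special:Export/"


def export_app(environ, start_response):
    """Sketch of Special:Export: always send the XML, choose the status separately."""
    path = environ.get("PATH_INFO", "")
    # Only a title taken from the subpage suffix (Special:Export/Foobar) is part of the URI.
    subpage = path[len(EXPORT_PREFIX):] if path.startswith(EXPORT_PREFIX) else ""
    is_get = environ.get("REQUEST_METHOD", "GET").upper() == "GET"

    if is_get and subpage and subpage not in EXISTING_PAGES:
        status = "404 Not Found"
    else:
        status = "200 OK"  # includes every POST: the form target always exists

    # The informative body goes out either way; page content is omitted in this sketch.
    start_response(status, [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(SITEINFO_XML))),
    ])
    return [SITEINFO_XML]


def call(path, method="GET"):
    """Invoke the app in-process and return (status, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        return lambda data: None  # WSGI write() callable, unused here

    body = b"".join(export_app(environ, start_response))
    return captured["status"], body
```

Calling `call("/Special:Export/Redlink")` yields `"404 Not Found"` together with the siteinfo XML. The same path sent as a POST, or an existing page such as `Main_Page`, yields `"200 OK"`.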

Astronouth7303 wrote:

(In reply to comment #6)

Probably better to let the export continue so the informative bits get output as expected, instead of a mysterious blankness.

You could use the "-" article (since that's for generated CSS or JS) for just
informative output. (No article.) And you can still send out content with a 404
(just remove the return statement).

Also a 404 might not make sense if this is a POST submission; it should probably only be done for GET requests where the target to load comes from the subpage suffix (Special:Export/Foobar).

Yes and no. It may not make sense on a user post submission. If a bot was using
POST (instead of GET), it would still make perfect sense. As I said before, you
can still send content on a 404.

(In reply to comment #7)

You could use the "-" article (since that's for generated CSS or JS) for just
informative output. (No article.) And you can still send out content with a 404
(just remove the return statement).

Not returning any output would be a weird inconsistency, IMHO. I can't imagine any
reason to make the choice to be inconsistent in that way; since we can return output
with a 404 (and should, as informative output is usually expected with a 404) choosing
not to return anything would be very unusual.

Yes and no. It may not make sense on a user post submission. If a bot was using
POST (instead of GET), it would still make perfect sense. As I said before, you
can still send content on a 404.

According to my reading of the spec, a 404 response is supposed to be about the URI;
all POSTs will be going to the same place, and it always exists, so I don't think a
404 would ever be correct there.

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No
indication is given of whether the condition is temporary or 
permanent. ...

Astronouth7303 wrote:

Patch to implement (for 1.5)

Finally fixed the two issues above.

  • Return other info on 404
  • Won't 404 on POST

attachment SpecialExport.php.patch ignored as obsolete

avarab wrote:

(In reply to comment #9)

Created an attachment (id=881)
Patch to implement (for 1.5)

Finally fixed the two issues above.

  • Return other info on 404
  • Won't 404 on POST

Please use proper $wgRequest wrappers for checking if the request was posted.

Astronouth7303 wrote:

(In reply to comment #10)

Please use proper $wgRequest wrappers for checking if the request was posted.

I don't remember seeing such a wrapper...

You're referring to WebRequest::wasPosted().

I'm wondering whether I should instead just check whether the value exists in $_GET, since in that case I should return a 404 even if the request is a POST.
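The alternative floated here keys the 404 on whether the title is part of the Request-URI rather than on the HTTP method. It might look like the following Python sketch. The URL prefix and the `pages` query parameter are assumptions made for illustration and are not taken from the patch. `titles_from_request_uri` and `should_404` are hypothetical helpers.

```python
from typing import List, Set
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical URL layout and query parameter name, chosen only for illustration.
EXPORT_PREFIX = "/wiki/Special:Export/"
QUERY_PARAM = "pages"


def titles_from_request_uri(request_uri: str) -> List[str]:
    """Return the titles named in the Request-URI itself (path suffix or query string).

    Any POST body is deliberately ignored: only what is part of the URI can make
    that URI "not found".
    """
    parts = urlsplit(request_uri)
    titles: List[str] = []
    if parts.path.startswith(EXPORT_PREFIX):
        suffix = unquote(parts.path[len(EXPORT_PREFIX):])
        if suffix:
            titles.append(suffix)
    # parse_qs percent-decodes values; assume one title per line, as in a textarea.
    for value in parse_qs(parts.query).get(QUERY_PARAM, []):
        titles.extend(line.strip() for line in value.splitlines() if line.strip())
    return titles


def should_404(request_uri: str, existing: Set[str]) -> bool:
    """404 if any title in the URI is missing, whether the method was GET or POST."""
    return any(title not in existing for title in titles_from_request_uri(request_uri))
```

With this rule a POST to `/wiki/Special:Export` whose titles arrive only in the form body never produces a 404. A POST or GET to `/wiki/Special:Export/Redlink` does.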

Anyway, new patch on that coming.

Astronouth7303 wrote:

Patch to implement (for 1.5)

Defers POST detection to WebRequest. It does not yet return a 404 on a POST when the article comes from the URI.

attachment SpecialExport.php.patch ignored as obsolete

Mass component change: <some> -> Export/Import

All I know, here in 1.15alpha, is that if I enter a hundred page names
and even one of them is invalid, I want to see an error page
mentioning what my problem is, instead of getting an XML download
that is missing that one page, which I will only discover weeks
later, when I am on some mountaintop retreat with no chance of getting
the right file.
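One way to get the behaviour asked for here is to validate the whole list before any XML is written and to fail loudly if anything is missing. The Python sketch below does that. `MissingPagesError`, `missing_titles` and `check_export_request` are hypothetical names. In a web context the error would presumably be rendered as an error page rather than surfacing as an exception.

```python
from typing import Iterable, List, Set


class MissingPagesError(Exception):
    """Raised instead of producing a partial dump (hypothetical behaviour)."""

    def __init__(self, missing: List[str]) -> None:
        super().__init__("Cannot export; missing pages: " + ", ".join(missing))
        self.missing = missing


def missing_titles(requested: Iterable[str], existing: Set[str]) -> List[str]:
    """Requested titles that do not exist, in request order, each listed once."""
    seen: Set[str] = set()
    missing: List[str] = []
    for title in requested:
        if title not in existing and title not in seen:
            seen.add(title)
            missing.append(title)
    return missing


def check_export_request(requested: List[str], existing: Set[str]) -> List[str]:
    """Return the de-duplicated titles to export, or raise if any is missing."""
    missing = missing_titles(requested, existing)
    if missing:
        raise MissingPagesError(missing)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(requested))
```

Checking everything up front means the user learns about the one bad title among a hundred immediately, instead of receiving a dump that silently lacks it.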

TTO renamed this task from Special:Export return 404 on non-existant pages to Special:Export should return 404 on non-existent pages. Dec 25 2016, 1:12 PM
TTO updated the task description.
TTO edited subscribers, added: TTO; removed: wikibugs-l-list.

If so, we need some way to allow export of just the site metadata. Sure, you can get it from the API these days, but not in the same format as the XML dumps.