
Deleted or never-existed categories to return HTTP 404 Not Found
Closed, Resolved, Public

Description

Author: W

Description:
When a category has been deleted or has never existed, and it has no members (i.e. there are no articles in the category), an HTTP 404 code should be returned.

Consider http://webcache.googleusercontent.com/search?q=cache:tA61J-kDRkMJ:en.wikipedia.org/wiki/Category:Elysiidae which is Google's cache of en:Category:Elysiidae. Google is continuing to report it even though the category was deleted more than two years ago. It is reported because a status code of 200 OK is returned. Presumably the idea of that was to allow for categories with members but without a category page.

Suggested solutions:

Solution 1 - cheap and, in my view, completely satisfactory. Forget the "no members" criterion. Return a 404 Not Found code for every category where no page exists. Most browsers display the response body whether the status is 200 or 404, so a human viewing a category that has members but no page will still see the member list and never know that a 404 was returned. Search engines seeing such a page will honour the 404 and not index it. Is that any great loss?

Solution 2 - rigorous. For a category where no page exists, check first if it has members and send 200 or 404 as appropriate.
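The two proposals differ only in whether current membership is consulted. A minimal Python sketch of each rule follows, assuming a hypothetical `CategoryState` record that summarises whether the category page exists and how many members the category currently has. These names are illustrative only and are not MediaWiki's actual (PHP) code.

```python
from dataclasses import dataclass


@dataclass
class CategoryState:
    # Hypothetical summary of a Category: title (illustrative, not MediaWiki's API).
    page_exists: bool  # a category description page exists
    member_count: int  # number of pages currently in the category


def status_solution_1(cat: CategoryState) -> int:
    # Solution 1: ignore membership; return 404 whenever there is no category page.
    return 200 if cat.page_exists else 404


def status_solution_2(cat: CategoryState) -> int:
    # Solution 2: return 404 only when there is neither a page nor any members.
    if cat.page_exists or cat.member_count > 0:
        return 200
    return 404
```

For a category with five members but no description page, `status_solution_1` gives 404 while `status_solution_2` gives 200. For a deleted, empty category both give 404.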


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=2585

Details

Reference
bz26729

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 11:22 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz26729.
bzimport added a subscriber: Unknown Object (MLST).

W wrote:

Bug 2585 is a bit long in the tooth, and its request has already been implemented, certainly for the main (article) namespace and probably for every namespace except Category. Also, the question of "no page but has members" is unique to categories. So I felt that a fresh discussion might help. Incidentally, see http://en.wikipedia.org/wiki/User_talk:RHaworth/Archive_to_2011_Jan_14#Category:Wikipedia_sockpuppets_of_.E2.80.A6 which is where I got involved - one person fussing over being accused of sock-puppetry.

I remember MSIE has an option that decides, when it receives an HTTP 404, whether to show a standard error page (included in the MSIE distribution) or the page sent together with the 404.

Note, as it stands (in trunk), categories will return a 404 if the following 3 conditions are met:
*The category page doesn't exist
*There are currently no articles in the category
*There have never been any articles in the category (or, to be technical, there is no entry in the category table)

It would be trivial to make the check be the category page doesn't exist and there are currently no articles in the category, if we want to do that.
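The difference between the current three-condition check and the proposed two-condition check can be sketched in Python. This is a minimal illustration, not the code from trunk: `CategoryState` and its fields are hypothetical, and it assumes a category-table entry can persist after the last member is removed.

```python
from dataclasses import dataclass


@dataclass
class CategoryState:
    # Hypothetical summary of a Category: title; field names are illustrative.
    page_exists: bool        # the category description page exists
    current_members: int     # pages currently in the category
    has_category_row: bool   # an entry exists in the category table


def status_as_in_trunk(cat: CategoryState) -> int:
    # 404 only when all three listed conditions hold.
    if not cat.page_exists and cat.current_members == 0 and not cat.has_category_row:
        return 404
    return 200


def status_relaxed(cat: CategoryState) -> int:
    # Proposed simplification: ignore the category table and look only
    # at the page and the current membership.
    if not cat.page_exists and cat.current_members == 0:
        return 404
    return 200
```

The two diverge for a category whose page was deleted and which is now empty but still has a category-table entry: `status_as_in_trunk(CategoryState(False, 0, True))` returns 200, while `status_relaxed` returns 404.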

(In reply to comment #3)

I remember MSIE has an option which decides whether a standard (included in MSIE distribution) page or the page sent together with HTTP 404 should be used when it meets HTTP 404.

I believe that only happens if the body of the response is less than 512 bytes. Presumably most of our pages are above that threshold. In any case, we already do this for articles, so I can't imagine it's an issue.

Fixed in r80406.

As a note: if you're editing a category (as with any other page), you will get a 200 status code.