
Non existent user subpages should return 404 (again) instead of 200 response
Closed, Resolved, Public

Description

Since Change-Id: I527aa9d9c19c5cef7bebde78ef22f426bcbb3cd6, all non-existent (or deleted) user pages return a 200 response.

This is not needed for user subpages (like the one in the URL field) and is irritating spiders. I suggest returning a 404 for non-existent or deleted user subpages (again).


Version: 1.21.x
Severity: normal
URL: https://de.wikipedia.org/wiki/Benutzer:Raymond/Jabach-Medaille
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=45241
https://bugzilla.wikimedia.org/show_bug.cgi?id=59082

Details

Reference
bz46491

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:14 AM
bzimport set Reference to bz46491.
bzimport added a subscriber: Unknown Object (MLST).

I agree that non-existent User/*SUB*pages, no matter whether the user exists or not, should be answered with a 404.

Non-existent user pages of existing users must, however, be answered with a 200 so that they can be used as OpenID identifiers.

In case of doubt, please discuss the details with Ryan. Thanks.

Irritating spiders in which way? Users that exist really should return 200 and not 404, even if their pages don't exist. We should likely provide some default information on those pages, like contribution lists, etc.

Speaking from my experience as a Wikipedian, OTRS member and Commons oversighter:

  • Users add private information to their user pages. Later they see "Oh, it's on Google (!!!)" and ask in a panic for deletion. In the past it was easy to delete the page and request removal from the Google search index with the Google Webmaster Tools, because a deleted page returned a 404. Now this is impossible because of the 200 return code. At the very least, it takes longer until the spider recognizes the change.
  • I am not sure that your idea of "We should likely provide some default information on those pages, like contribution lists, etc" would please all Wikipedians. I know a lot of people on dewiki who like their red user page link.... A user page can be a sanctuary ...

  • This bug is only about user *sub*pages.

Agreed. Sub pages shouldn't return 200s unless they have content. That's a bug.
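
The rule the comments above converge on can be sketched as a small decision function. This is only an illustration in Python, not MediaWiki's actual code: the function name `status_for_user_page` and its parameters are hypothetical, and returning 404 for the root page of a user who does not exist is an assumption inferred from the comments, not something stated outright.

```python
def status_for_user_page(title: str, page_exists: bool, user_exists: bool) -> int:
    """Return the HTTP status code for a page in the user namespace.

    Hypothetical helper: `title` is a full page title such as
    "Benutzer:Raymond/Jabach-Medaille".
    """
    # A page that has content is always served normally.
    if page_exists:
        return 200

    # Strip the namespace prefix ("User:", "Benutzer:", ...) if present.
    _, sep, rest = title.partition(":")
    name = rest if sep else title

    # A slash in the remaining name marks a subpage; a missing or
    # deleted subpage should be a 404 regardless of the user.
    if "/" in name:
        return 404

    # Root user page without content: 200 only if the user exists
    # (needed for OpenID); 404 otherwise is an assumption.
    return 200 if user_exists else 404
```

For the URL given in this report, `status_for_user_page("Benutzer:Raymond/Jabach-Medaille", False, True)` takes the subpage branch.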

From bug 45255 comment 6:

"""
One consequence of this is that non-existent user pages are being indexed by
Google. A friend of mine pointed out to me that when you search for his name,
you get results from en.m.wikiquote.org, en.wiktionary.org, en.wikivoyage.org,
and beta.wikiversity.org, even though he doesn't have user pages on any of
those.
"""

Copied from Gerrit changeset 50305:

"""
Why is it ideal to use the user's user page as an OpenID identity URL? /wiki/User:Foo is already reserved for a page title. I'd think a different URL prefix (e.g., /wiki/OpenID:Foo or /wiki/Special:OpenID/Foo or whatever) would be cleanest/sanest here.

The now inconsistent behavior with responses here (based on whether a particular page is in a particular MediaWiki namespace) doesn't seem great. :-/
"""

The idea is that everyone knows what their user page is. It's also kind of silly that we have empty pages for existing users anyway. Does any other web site actually do this?

We should probably set empty 200 user pages to noindex, along with fixing the subpages.

(In reply to comment #7)

The idea is that everyone knows what their user page is. It's also kind of
silly that we have empty pages for existing users anyway. Does any other web
site actually do this?

Who cares about other web sites? I still think this should be under the control of the users.

More importantly: whatever solution you find, keep links to user pages red as long as they have no real content.

We should probably set empty 200 user pages to noindex, along with fixing the subpages.

Please! I got another complaint from a dewiki user today because he found already deleted user subpages in Google. Thanks :-)

(In reply to comment #7)

The idea is that everyone knows what their user page is.

Yeah. OpenID is supposed to simplify site logins, so creating a special URL that people would have to memorize and use for OpenID purposes would miss the point.

It's also kind of silly that we have empty pages for existing users anyway.
Does any other web site actually do this?

Our user pages are a strange hybrid between editable content and user information. Other websites produce static (to other users) pages that can be content-free of everything but the user's name, depending on how they're set up, but our pages can go so far as being nonexistent. It's an awkward mixture, and until such time as we generate static user profiles (almost certainly never) I'd say that your suggestion:

We should probably set empty 200 user pages to noindex, along with fixing
the subpages.

is a good compromise.
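
As a rough illustration of that compromise, the sketch below returns both a status code and response headers. It is written in Python and is not MediaWiki's implementation: the function name `respond_to_user_page` and its parameters are hypothetical, and using the `X-Robots-Tag` response header is an assumption made for brevity (a robots meta tag in the page HTML would be another way to express noindex).

```python
from typing import Dict, Tuple


def respond_to_user_page(
    title: str, page_exists: bool, user_exists: bool
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a page in the user namespace.

    Hypothetical helper illustrating "noindex + 200" for empty root
    user pages and 404 for missing subpages.
    """
    headers: Dict[str, str] = {}

    # Pages with content are served and indexed as usual.
    if page_exists:
        return 200, headers

    # Strip the namespace prefix if there is one.
    _, sep, rest = title.partition(":")
    name = rest if sep else title

    # Missing subpages, and root pages of unknown users (assumption),
    # get a plain 404 so spiders drop them.
    if "/" in name or not user_exists:
        return 404, headers

    # Empty root page of an existing user: keep the 200 for OpenID,
    # but ask search engines not to index it.
    headers["X-Robots-Tag"] = "noindex"
    return 200, headers
```

The only case that gets the noindex header is the empty root page of an existing user, which is the one that has to keep returning 200.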

(In reply to comment #8)

(In reply to comment #7)

The idea is that everyone knows what their user page is. It's also kind of
silly that we have empty pages for existing users anyway. Does any other web
site actually do this?

Who cares about other web sites? I still think this should be under the control of the users.

More importantly: whatever solution you find, keep links to user pages red as long as they have no real content.

Yep. I don't believe this functionality has changed, and I don't think we'll change that.

We should probably set empty 200 user pages to noindex, along with fixing the subpages.

Please! I got another complaint from a dewiki user today because he found already deleted user subpages in Google. Thanks :-)

I'll try to get a patch in for this soon.

Related URL: https://gerrit.wikimedia.org/r/64529 (Gerrit Change I900f1542b077b569ed64306ecf9f965ddabe59f8)

"One consequence of this is that non-existent user pages are being indexed by
Google. A friend of mine pointed out to me that when you search for his name,
you get results from en.m.wikiquote.org, en.wiktionary.org, en.wikivoyage.org,
and beta.wikiversity.org, even though he doesn't have user pages on any of
those."

How about returning noindex + 200 for non-existing user pages of existing users?

(In reply to comment #12)

How about returning noindex + 200 for non-existing user pages of existing users?

I think the intention was to put some user info there... but IMO that should be a different bug; this one is about subpages.

Change merged by Anomie... whether root user pages which don't exist should get some default content (statistics or so) and whether they belong in search engines is a different issue.

(In reply to comment #14)

Change merged by Anomie... whether root user pages which don't exist should get some default content (statistics or so) and whether they belong in search engines is a different issue.

I split this out to bug 48667 and copied the people I thought might be interested.