
Magic word to add noindex to a page's header
Closed, Resolved · Public

Description

Author: marten_berglund

Description:
On user pages (and maybe some other namespaces as well) it should be possible to
use a magic word, something like NOGOOGLE, in order to make the Google robot
not index that page. For instance, on my user page I have a set of subpages,
sandboxes where I experiment, test, and draft what could later become real
Wikipedia articles. I don't want Google to index these pages, yet they currently
appear early in Google's search results.

On an HTML page, the solution to this is to add the line
<pre>

<meta name="robots" content="noindex,nofollow">

</pre>

Could someone implement something like NOGOOGLE for users who
don't want their user pages indexed?


Version: unspecified
Severity: enhancement

Details

Reference
bz8068

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 9:31 PM
bzimport set Reference to bz8068.
bzimport added a subscriber: Unknown Object (MLST).

robchur wrote:

No. Namespaces which robots are asked not to index can be configured; however, in
this case, if it's public, then it's indexable. A NOINDEX-type magic word
has been discussed before and rejected, simply because it's subject to abuse and
misunderstanding.
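
For illustration only, the per-namespace configuration mentioned above can be expressed in LocalSettings.php; this sketch assumes the $wgNamespaceRobotPolicies setting (which may not have existed in the MediaWiki version this comment refers to):

<pre>

# Hedged sketch, not site policy: ask robots to skip the User namespaces.
# Assumes the $wgNamespaceRobotPolicies setting is available; the policy
# strings match the values used in the meta tag above.
$wgNamespaceRobotPolicies = array(
    NS_USER      => 'noindex,nofollow',
    NS_USER_TALK => 'noindex,nofollow',
);

</pre>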

Google are quite quick at re-crawling bits of Wikipedia content, so if a draft
page has moved to the article space, they'll reflect it within a few days, usually.

marten_berglund wrote:

But let's say that the magic word NOINDEX has an effect only on subpages
belonging to the User namespace, and nowhere else. For instance, only on pages
like: http://xx.wikipedia.org/wiki/User:N_N/a_subpage.

Is that a possible compromise?

robchur wrote:

No, it's up to the people who manage the web site to determine what is and is
not indexed by search engines, and Wikimedia wikis generally have everything
indexed bar pages such as VfD/AfD/whatever the trendy TLA for deletion debates
is, which external viewers don't typically understand.

There is _no reason_ to disable indexing of your user page or any other page in
that namespace. What you are posting to a public web site is public. If you
don't want anyone else to be able to read it or edit it or whatever, _don't post
it_.

Reopening this, as we're considering this or something similar as an improvement over lots of manual editing of the global robots.txt.

cohesion wrote:

We frequently get complaints via OTRS from people who want various logs removed because they malign their companies, etc. Those logs usually serve a purpose, but they aren't really content. Just one example: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Spam/LinkReports

Having a NOINDEX magic word is probably the best strategy if we want to differentiate what content ought to appear in search engines in more than a very crude way. Routinely editing robots.txt is no solution, and I consider it undesirable to simply block out very broad categories of material (such as everything that is not an article).

Bryan.TongMinh wrote:

I looked into the code, but it appears that $wgOut->setRobotPolicy is called at the very beginning of Article::view. That is a lot of lines before the page content is parsed and magic words are evaluated. Does anybody have an idea how to do this?

It should be possible to call it again to override it with specific data. You'd have to do this when pulling wiki output out of the ParserOutput object (otherwise the parser cache will always eat everything).
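
As a rough sketch of that approach (not a tested patch; hasNoIndexFlag() below is a hypothetical accessor used only for illustration):

<pre>

# Rough sketch: re-apply the robot policy once the ParserOutput is available
# (whether freshly parsed or fetched from the parser cache), overriding the
# earlier setRobotPolicy call at the top of Article::view.
# hasNoIndexFlag() is hypothetical; the real flag/accessor may differ.
$parserOutput = $wgParser->parse( $text, $this->mTitle, $options );
if ( $parserOutput->hasNoIndexFlag() ) {
    $wgOut->setRobotPolicy( 'noindex,nofollow' );
}

</pre>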

  • Bug 14209 has been marked as a duplicate of this bug.

ayg wrote:

Fixed in r37973. I patterned the code after NEWSECTIONLINK, and it seems to work fine.
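
For anyone following the same pattern, the general shape (a paraphrase, not the exact r37973 diff) is roughly:

<pre>

# Outline of the double-underscore magic word pattern (as with NEWSECTIONLINK);
# method names may differ slightly from the committed revision.

# 1. Declare the magic word and its wikitext form in MessagesEn.php, e.g.:
#      'noindex' => array( 1, '__NOINDEX__' ),

# 2. In the parser, when the double underscore is stripped from the page,
#    record the request on the ParserOutput:
if ( isset( $this->mDoubleUnderscores['noindex'] ) ) {
    $this->mOutput->setIndexPolicy( 'noindex' );
}

# 3. When the page is output, the recorded policy is applied, producing
#    <meta name="robots" content="noindex,nofollow"> in the page header.

</pre>

Editors can then place __NOINDEX__ anywhere in the wikitext of a page they don't want indexed.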