
robots.txt -- exclude ro:VfD
Closed, ResolvedPublic

Description

Author: gutza

Description:
Please add the following to robots.txt, in order to exclude the votes for deletion on the Romanian Wikipedia:


Disallow: /wiki/Wikipedia:Pagini_de_%C5%9Fters
Disallow: /wiki/Wikipedia%3APagini_de_%C5%9Fters
Disallow: /wiki/Discu%C5%A3ie_Wikipedia:Pagini_de_%C5%9Fters
Disallow: /wiki/Discu%C5%A3ie_Wikipedia%3APagini_de_%C5%9Fters
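As a side note on why both colon-escaped and unescaped variants are listed: robots.txt paths are matched byte-for-byte, and the ş (U+015F) in the title percent-encodes to %C5%9F in UTF-8. A small illustrative Python sketch reproducing both spellings:

```python
from urllib.parse import quote

title = "Wikipedia:Pagini_de_şters"  # ş = U+015F, UTF-8 bytes C5 9F

# Keep the colon literal -> the first robots.txt variant
print(quote(title, safe=":"))  # Wikipedia:Pagini_de_%C5%9Fters

# Escape the colon as well -> the second variant (%3A)
print(quote(title, safe=""))   # Wikipedia%3APagini_de_%C5%9Fters
```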


Version: unspecified
Severity: normal

Details

Reference
bz12546

Event Timeline

bzimport raised the priority of this task to High. Nov 21 2014, 10:02 PM
bzimport set Reference to bz12546.
bzimport added a subscriber: Unknown Object (MLST).

Is it only those four pages? If so, it's more easily done with $wgArticleRobotPolicies.

gutza wrote:

Actually it's only two pages; they're duplicated because I've seen other languages' entries with the colon both escaped and unescaped. Is there any way we can modify that variable locally?

jeluf wrote:

Done.

'wgArticleRobotPolicies' => array(
	'rowiki' => array(
		'Wikipedia:Pagini de şters' => 'noindex,follow',
		'Discuţie Wikipedia:Pagini de şters' => 'noindex,follow',
	),
),

gutza wrote:

For some reason, the solution doesn't work. This bug was closed on January 16th, but I'm able to find a VfD page *created* on January 25th using Google: http://www.google.com/search?q=OuTopos+%22regasesc+pe+wikipedia%22

gutza wrote:

Reminder: I changed the priority because we're getting pressure over this. I expect you know why this is being requested on various Wikipedia installations, so I won't reiterate the rationale.

rsocol wrote:

I consider that this is not an "enhancement" but a "normal" bug that should be fixed, so that ro.wiki behaves the same way as most other wikis. We are really getting pressure over this (as robots.txt itself says: "Folks get annoyed when VfD discussions end up the number 1 google hit for their name. See bugzilla bug #4776"), so please add those lines to robots.txt as soon as possible.

gutza wrote:

It's been three and a half months since this bug was opened, and we can still find our VfD pages on Google: http://www.google.ro/search?q=%22No+hard+feelings%2C+Bogdane%22

gutza wrote:

Actually, that was an old VfD -- I just checked several newer ones and they no longer show up. Closing bug.

gutza wrote:

Strangely enough, although on 2008-04-19 I was unable to find the VfDs I searched for, Google's behavior proves erratic: an old VfD, which we had deleted on account of non-flattering comments about a living person, has resurfaced in Google searches. Please, pretty please with sugar on top, will someone add those lines to robots.txt?

I can confirm the <meta> robots tags on the *two and only two* requested pages *are* marked as noindex.

Did you really mean to request *those two and only two pages*, or those pages *and all subpages*?

rsocol wrote:

I am sure that Bogdan means to request to add those pages *and all subpages*.
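Worth noting for the subpage question: under the robots exclusion standard, a Disallow value is a path *prefix*, so a single rule for the parent page also covers every subpage, whereas $wgArticleRobotPolicies matches exact titles only. A minimal sketch (the subpage name below is hypothetical):

```python
rule = "/wiki/Wikipedia:Pagini_de_%C5%9Fters"  # one Disallow line

paths = [
    "/wiki/Wikipedia:Pagini_de_%C5%9Fters",          # the VfD index page
    "/wiki/Wikipedia:Pagini_de_%C5%9Fters/OuTopos",  # a subpage (hypothetical name)
    "/wiki/Main_Page",                               # unrelated page
]

for p in paths:
    # robots.txt Disallow rules match any URL path that starts with the rule
    print(p, "->", "blocked" if p.startswith(rule) else "crawlable")
```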

jeluf wrote:

Robots.txt does not affect pages that Google has already spidered. When a page that has already been deleted on the wiki shows up in Google, changes to our robots.txt will not change the search result.

http://www.google.com/support/webmasters/bin/answer.py?answer=508&src=top5
certainly implies that content will be removed from the index once it's listed in robots.txt (after the site is crawled again, naturally).