
Exclude bot-generated spam reports on meta from indexing via robots.txt
Closed, Resolved · Public

Description

Author: mike.lifeguard+bugs

Description:
Please exclude the following from being indexed by modifying robots.txt:
* [[m:Talk:Spam blacklist]] and subpages
* [[m:User:COIBot/LinkReports]] and subpages
* [[m:User:COIBot/COIReports]] and subpages
* [[m:User:COIBot/UserReports]] and subpages
* [[m:User:SpamReportBot/cw]] and subpages
* And the talk pages for all of the above
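For illustration, the requested exclusions could be expressed as a robots.txt fragment like the one below. This is a sketch, not the rules actually deployed: it assumes Meta's default /wiki/ article path, and relies on robots.txt prefix matching to cover subpages (spaces in titles become underscores in URLs).

```
# Hypothetical robots.txt fragment for meta.wikimedia.org
# Prefix matching means each rule also covers all subpages.
User-agent: *
Disallow: /wiki/Talk:Spam_blacklist
Disallow: /wiki/User:COIBot/LinkReports
Disallow: /wiki/User_talk:COIBot/LinkReports
Disallow: /wiki/User:COIBot/COIReports
Disallow: /wiki/User_talk:COIBot/COIReports
Disallow: /wiki/User:COIBot/UserReports
Disallow: /wiki/User_talk:COIBot/UserReports
Disallow: /wiki/User:SpamReportBot/cw
Disallow: /wiki/User_talk:SpamReportBot/cw
```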


Version: unspecified
Severity: enhancement

Details

Reference
bz14076

Related Objects

Status    | Subtype | Assigned | Task
Open      | Feature | None     |
Resolved  |         | None     |
Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:13 PM
bzimport set Reference to bz14076.
bzimport added a subscriber: Unknown Object (MLST).

admin wrote:

I agree with excluding crawlers from the bot report pages. I disagree with removing crawlers from [[m:Talk:Spam blacklist]] and its subpages. See my reasoning at http://meta.wikimedia.org/w/index.php?title=Talk:Spam_blacklist&oldid=992898#Excluding_our_work_from_search_engines

mike.lifeguard+bugs wrote:

OK, per discussion, please do not exclude [[m:Talk:Spam blacklist]] or subpages. The rest listed above is fine to add.

mike.lifeguard+bugs wrote:

Currently we are planning on having a bot edit some 47,000 pages to add NOINDEX; it would be much easier to have the bot reports excluded from indexing via this addition to robots.txt.

We are still deciding whether to have the talk page and/or archives indexed or not, but we can manage that with the magic word; no action is required from the sysadmins.
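For reference, the magic word mentioned here is MediaWiki's __NOINDEX__ behavior switch. A minimal sketch of its use in wikitext (the template name is illustrative, not the one actually used on Meta):

```
<!-- Placed anywhere in a page's wikitext, __NOINDEX__ adds a
     robots meta tag to the rendered HTML asking search engines
     not to index the page. -->
__NOINDEX__

<!-- Or wrapped in a template, so a bot can tag many report pages
     by transcluding e.g. {{NOINDEX}}: -->
{{NOINDEX}}
```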

mike.lifeguard+bugs wrote:

Actually, it is ~25,000 pages (enwiki was included in the 47,000 figure), but the point remains.

mike.lifeguard+bugs wrote:

Fixed by r37973: NOINDEX is applied to the pages in question through a template (and new ones use the magic word directly).