
tell spiders not to index Wikipedia footer and navigation
Closed, Declined
Public

Description

Author: info

Description:

  1. In Google, search en.wikipedia.org for 'privacy':

http://www.google.com/search?hl=en&q=site%3Aen.wikipedia.org%20privacy

Results:
153,000,000 results!

This is because "Privacy" is in the footer, so Google matches every page.

Expected:
Turn off indexing of common areas. I added some notes on how to do this to
http://www.mediawiki.org/wiki/How_best_to_search_or_spider_mediawiki_systems and
http://en.wikipedia.org/wiki/Robots_Exclusion_Standard#Directives_within_a_page .
For Google the key is <!--googleoff: index--> ... <!--googleon: index-->, and
old spiders use <NOINDEX>.
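
A minimal sketch of the idea, assuming the page is assembled from a content fragment and a shared footer fragment. The helper names (exclude_from_index, render_page) and the div ids are hypothetical and are not MediaWiki code; only the two comment markers come from the notes above, and whether a given crawler honours them is not something this sketch can guarantee.

```python
# Hypothetical helpers: wrap shared page chrome (footer, navigation) in the
# comment markers named in this report. Not part of MediaWiki.
GOOGLE_OFF = "<!--googleoff: index-->"
GOOGLE_ON = "<!--googleon: index-->"


def exclude_from_index(fragment: str) -> str:
    """Return the HTML fragment wrapped in googleoff/googleon markers."""
    return f"{GOOGLE_OFF}{fragment}{GOOGLE_ON}"


def render_page(content: str, footer: str) -> str:
    """Assemble a page whose footer is marked as not to be indexed."""
    return f'<div id="content">{content}</div>\n{exclude_from_index(footer)}'


if __name__ == "__main__":
    print(render_page("<p>Article text</p>",
                      '<div id="footer">Privacy policy</div>'))
```

The article text stays outside the markers, so only the repeated footer text would be skipped by a crawler that respects them.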

You could counter-argue that if a word appears on a page and the user pastes it
into a search engine, then the engine MUST find that page. But I think the
value of eliminating all those search results outweighs this.


Version: unspecified
Severity: normal
URL: http://www.mediawiki.org/wiki/How_best_to_search_or_spider_mediawiki_systems

Details

Reference
bz5707

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:11 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz5707.
bzimport added a subscriber: Unknown Object (MLST).
  • NOINDEX is not valid in XHTML.
  • The Google-only comment is ... only for Google. That would not really fix the
    issue.

Hmmm.... I think there are several million people who use Google for their
searches (call me naive...)

robchur wrote:

(In reply to comment #2)

> Hmmm.... I think there are several million people who use Google for their
> searches (call me naive...)

The point as raised was that Google isn't the be-all and end-all; people can and
do use alternative search engines, which that particular special case wouldn't
affect. So yes, it would fix the problem for a lot of cases, but not all.

Marking as WONTFIX; it looks like Google is smart enough to give
us back pages that are actually related to "privacy".