
Allow crawling of bugzilla.wikimedia.org select content
Closed, Declined · Public

Description

The robots.txt rules are unnecessarily restrictive. As Bugzilla is being deprecated and only a portion of its content migrated to Phabricator, it's essential that we allow third parties to do their job. All crawlers, or at least ia_archiver (the Wayback Machine), should be allowed to crawl:

  1. any content which
  2. doesn't specifically cause load issues and
  3. is not being semantically migrated to Phabricator.

Ideally we'd drop requirement (3) but let's start somewhere.

Example URLs which shouldn't be blacklisted:

  • /page.cgi?id=voting/bug.html*
  • /duplicates.cgi*
  • /report.cgi* (unless it causes load issues)
  • /weekly-bug-summary.cgi*
  • /describecomponents.cgi*
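
For illustration, one way this could be expressed in robots.txt is sketched below: a minimal fragment that opens these paths to ia_archiver while leaving everything else blocked. It assumes the crawler honours the non-standard Allow directive and is only a sketch; the actual contents of the current robots.txt are not reproduced here.

  # Hypothetical robots.txt fragment (sketch only)
  User-agent: ia_archiver
  # Allow lines come before the blanket Disallow so first-match parsers pick them up
  Allow: /page.cgi
  Allow: /duplicates.cgi
  Allow: /report.cgi
  Allow: /weekly-bug-summary.cgi
  Allow: /describecomponents.cgi
  Disallow: /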

In fact, is there any reason not to allow everything except the following?

  • /show_bug.cgi
  • /showdependencytree.cgi
  • /query.cgi

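If so, the whole file could shrink to something like the sketch below. Whether the remaining .cgi endpoints are cheap enough to crawl is an assumption here and would need checking against actual server load.

  # Hypothetical minimal robots.txt (sketch only)
  User-agent: *
  Disallow: /show_bug.cgi
  Disallow: /showdependencytree.cgi
  Disallow: /query.cgi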


Version: wmf-deployment
Severity: enhancement
URL: http://web.archive.org/save/https://bugzilla.wikimedia.org/duplicates.cgi

Details

Reference
bz72507

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:49 AM
bzimport set Reference to bz72507.
bzimport added a subscriber: Unknown Object (MLST).

Wikimedia has migrated from Bugzilla to Phabricator. Learn more about it here: https://www.mediawiki.org/wiki/Phabricator/versus_Bugzilla - This task no longer makes sense in the context of Phabricator, hence closing as declined.