
Mark redlinks with rel="nofollow"
Closed, DeclinedPublic

Description

Author: elliotgoodrich

Description:
I would like to see rel="nofollow" tags on links that point to editing and
history pages, as otherwise (unless using short URLs and blocking /w/ with
robots.txt) you can't stop search engines indexing those unwanted pages.
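The robots.txt workaround mentioned above can be sketched with Python's standard `urllib.robotparser`. The layout is an assumption for illustration only: articles under /wiki/, the script path under /w/, and the host `example.org` and bot name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki using short URLs: articles are served
# from /wiki/, while edit and history views go through the /w/ script path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Article URL (assumed layout): not covered by the Disallow rule.
article_allowed = parser.can_fetch(
    "ExampleBot", "http://example.org/wiki/Main_Page"
)

# Edit URL (assumed layout): falls under /w/ and is therefore disallowed.
edit_allowed = parser.can_fetch(
    "ExampleBot", "http://example.org/w/index.php?title=Main_Page&action=edit"
)
```

Under these assumptions a compliant crawler would fetch the article but skip the edit view. Without short URLs both would share one path prefix, which is why the workaround depends on them.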


Version: unspecified
Severity: enhancement
URL: http://www.psconclave.com/wiki/Main_Page

Details

Reference
bz6545

Event Timeline

bzimport raised the priority of this task from to Lowest. (Nov 21 2014, 9:17 PM)
bzimport set Reference to bz6545.
bzimport added a subscriber: Unknown Object (MLST).

Bug 2585 is likely more appropriate.

elliotgoodrich wrote:

I know this is similar; however, the search engines will still try to crawl those
pages, creating extra work for the server, increased bandwidth use, etc.

On every page there are at least 2 links that a search engine (SE) should not
follow, the edit and history links. Add to that list all the nonexistent page
links and all the section edit links, and the SE could end up following more bad
than good links.

Quote from the other page...
"Search engines DO care. They won't index pages sent with a 404 (or at least they
shouldn't)"

This would just be an extra measure to stop unwanted indexing. On Google
sitemaps using robots.txt to disallow the w/ directory also leads to thousands
of results on the "URLs restricted by robots.txt" page, making it hard to
diagnose incorrect URL blocking.

(sorry if I sound stroppy, I'm smiling really :P)

robert wrote:

Implements rel="nofollow" on 'new' links

This patch adds rel="nofollow" to all red links. It has undergone basic testing, and I see no way this could cause problems. After searching all files I have identified one other that also uses class="new", Skin.php, although it uses it in a preg_match in a way that would still match with rel="nofollow" appended.
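A rough illustration of the idea, not the attached patch: the Python sketch below renders a red link with rel="nofollow" appended after class="new" and checks that a class-based pattern still matches. The function names, the URL layout, and the regular expression are all hypothetical; the actual preg_match in Skin.php is not shown in this report.

```python
import re
from html import escape
from urllib.parse import quote


def red_link(title: str, nofollow: bool = True) -> str:
    """Render a link to a nonexistent page (hypothetical URL layout)."""
    href = "/w/index.php?title=" + quote(title) + "&action=edit"
    attrs = f'href="{escape(href)}" class="new"'
    if nofollow:
        # The requested change: append rel="nofollow" after class="new".
        attrs += ' rel="nofollow"'
    return f"<a {attrs}>{escape(title)}</a>"


# Hypothetical stand-in for a pattern that detects red links by class.
NEW_LINK_PATTERN = re.compile(r'<a [^>]*class="new"')


def is_red_link(html: str) -> bool:
    """Return True if the markup contains an anchor carrying class="new"."""
    return NEW_LINK_PATTERN.search(html) is not None


with_rel = red_link("Missing page")
without_rel = red_link("Missing page", nofollow=False)
```

Because the pattern only looks for class="new" inside the anchor tag, both variants match. A pattern anchored on the closing `>` immediately after class="new" would not, so that is the case to look for when auditing callers.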


robchur wrote:

Note that rel="nofollow" does NOT mean, "don't follow this link", it means, "don't assign the target weight in any kind of ranking algorithm", e.g. Google PageRank.

Edit pages and page histories ought to be emitting appropriate "noindex" meta-tags, which will prevent them being indexed by robots.txt-compliant robots.
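To check what a given page actually emits, a small sketch using Python's standard `html.parser` can collect the directives from a robots meta tag. The sample markup is an assumption about what an edit page's head might contain, not output captured from MediaWiki.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex,nofollow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in the markup."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Assumed sample of an edit page's head, for illustration only.
EDIT_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex,nofollow" />'
    "</head><body></body></html>"
)

found = robots_directives(EDIT_PAGE)
```

Note that a crawler only sees these directives after requesting the page, so the meta tag controls indexing rather than whether the request is made.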

robert wrote:

Just to clarify, is that a WONTFIX, or should I wait for someone with more authority?

robchur wrote:

No, I'm just commenting that rel="nofollow" doesn't *mean* what you think it means, and that the requested change ought to be redundant with our existing code.

robert wrote:

Ok, I guess there are two elements to the decision of accepting this patch or not. It is mainly redundant, so in some cases it might add a few bytes to a page, increasing bandwidth and page load times. On the other hand, some search engines might do what the tag says on the face of it rather than what is intended. Also, some search engines may not understand the meta tags, and having both doesn't appear to cause any harm.

Is this still of interest now that bug 2585 has been fixed?

*** Bug 19065 has been marked as a duplicate of this bug. ***

mac.med02 wrote:

*** This bug has been marked as a duplicate of bug 2585 ***

robert wrote:

This is not a duplicate.

Resolution should be WONTFIX if it is not going to be implemented, and made explicitly by someone with the relevant jurisdiction.

mac.med02 wrote:

Is it not a duplicate? As far as I can tell, the bug opener wanted to block search engines from crawling through redlinked URLs. As Rob Church says above, rel=nofollow does not do this. Bug 2585 makes redlinks a 404, effectively stopping search spiders. To me, that sounds like it fixes the problem. And btw, I am just as entitled as anyone to share my opinion here. There are different levels of authority on-wiki, yes (admin, crat, etc.), but this is an open forum for discussion, and I am permitted to form my own opinion. If I was incorrect in marking it as a duplicate, I'm sorry and I accept that blame, but I dislike the implication that I am not allowed to mark it as such.

robert wrote:

Sorry, that perception was not intentional; generally, marking bugs as WONTFIX is done only by senior developers, and I wanted to make that clear.

In any case this bug was about the specific implementation of the desired feature, rather than the result. For what it's worth I would support closing this bug as WONTFIX at this point.

(In reply to comment #9)

> Is this still of interest now that bug 2585 has been fixed?

Bug 2585 got reverted, but that's neither here nor there. These two bugs can be implemented independently of one another. That said, solving bug 2585 is another way of getting at the desired result: not having the bot index the edit page.

(In reply to comment #8)

> Ok, I guess there are two elements to the decision of accepting this patch or
> not. It is mainly redundant so in some cases might add a few bytes to a page
> increasing bandwidth and page load times, on the other hand some search engines
> might do what the tag says on the front of it rather than what is intended.
> Also some search engines may not understand the meta tags, having both doesn't
> appear to cause any harm.

But it doesn't really have any added benefit. It just adds more bytes to the page. We already serve noindex,nofollow on the edit page itself, so serving it along with the link to the edit page would be redundant at best. As Rob pointed out in comment #5, the tag doesn't tell the bot not to follow the link. In all likelihood, the bot is loading the link anyway unless you're blocking it somewhere else, which is the workaround the OP already noted.

WONTFIX.