
Old revisions should use 'noindex,nofollow' instead of 'noindex,follow' meta robots tag
Closed, Resolved (Public)

Description

Author: ekim

Description:
I turned off nofollow, but I thought it would still put nofollow on the old
history pages and diffs. There should be no reason for a history page to not
have the nofollow tag on links, since it defeats the purpose of policing the
wiki if the spam link is still valid.


Version: 1.5.x
Severity: normal
URL: http://www.pooshlmer.com/touhouwiki/index.php?title=Talk:Night_Sparrow&oldid=7599

Details

Reference
bz4280

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 9:01 PM
bzimport set Reference to bz4280.
bzimport added a subscriber: Unknown Object (MLST).

ekim wrote:

Forgot to mention that just the page could be marked as nofollow, which isn't
done either.

robchur wrote:

So what exactly is it you want addressed? What did you do, and what did you
expect to happen? What happened, and how did it differ from what you expected?
Ergo, what would you like changed?

Neither of the comments above is particularly useful to us.

ekim wrote:

All history and diff pages should be marked nofollow as well as noindex, no
matter what the setting of wgNoFollowLinks.

avarab wrote:

(In reply to comment #3)
> All history and diff pages should be marked nofollow as well as noindex, no
> matter what the setting of wgNoFollowLinks.

Why should the $wgNoFollowLinks setting not matter for whether or not they got
nofollow?

ekim wrote:

I thought the history pages were not meant for external display, but were just
for internal reference and vandalism fixes. If you don't set nofollow, search
engines will go to the history page and follow the link, which lets the spammers
have an effect even on an actively maintained wiki.

More stuff here:

http://wiki.chongqed.org/NoIndexHistory
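
For illustration, the two policies discussed here differ only in the robots meta tag that ends up in the page head. The following is a minimal sketch, assuming a hypothetical helper named robotsMetaTag(); it is not MediaWiki code and only shows what a crawler would see for each policy string.

```php
<?php
// Hypothetical helper, invented for this sketch (not part of MediaWiki):
// renders a robots policy string as the meta tag a crawler reads.
function robotsMetaTag(string $policy): string
{
    // Escape the value so it is safe inside an HTML attribute.
    $content = htmlspecialchars($policy, ENT_QUOTES, 'UTF-8');
    return '<meta name="robots" content="' . $content . '" />';
}

// Policy reported in this bug for old revisions and diffs:
// the page is not indexed, but its links are still followed.
echo robotsMetaTag('noindex,follow'), "\n";

// Policy requested in this bug:
// the page is not indexed and its links are not followed.
echo robotsMetaTag('noindex,nofollow'), "\n";
```

With 'noindex,follow' a spam link that survives only in an old revision can still be followed by a crawler; with 'noindex,nofollow' it should not be.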

Changed bug summary to reflect the part that will make a
difference. :)

ekim wrote:

Sorry, don't have CVS here. Hopefully it's not too much of a pain.

Article.php:
758c758
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );
1384c1384
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );
1429c1429
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );
1470c1470
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );
1505c1505
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );

DifferenceEngine.php:
131c131
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );
253c253
< $wgOut->setRobotpolicy( 'noindex,follow' );
---
> $wgOut->setRobotpolicy( 'noindex,nofollow' );

SpecialPage.php:
249c249
< $wgOut->setRobotpolicy( "noindex,follow" );
---
> $wgOut->setRobotpolicy( "noindex,nofollow" );
364c364
< $wgOut->setRobotPolicy( "noindex,follow" );
---
> $wgOut->setRobotPolicy( "noindex,nofollow" );

  1. use 'diff -u' to create a unified diff; this includes context information vital to applying it to a different version
  2. attach the file, don't paste it into a comment

gangleri wrote:

*note*
Many spammed pages are reverted and the spam links are still in the old
revisions. One more reason to have "noindex,nofollow" for old revisions.

ekim wrote:

Diff file changing to nofollow

Attached:

avarab wrote:

Hardcoding this seems like the wrong way to do it. Wikimedia sites don't have
their histories indexed because we use a special prefix for them and simply
disallow that in robots.txt. Some wikis might want search engines to be able to
index their old revisions, so either:

  • Move your histories/other action!=view stuff off to a special prefix
  • Provide a patch which doesn't hardcode nofollow on something some wikis might want search engines to follow
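
The second option above could look roughly like the following. This is only a sketch under stated assumptions: the setting name $wgOldRevisionRobotPolicy and the helper chooseRobotPolicy() are hypothetical, invented here for illustration, and are not existing MediaWiki names or the change that was eventually committed.

```php
<?php
// Hypothetical setting, invented for this sketch; not an existing
// MediaWiki configuration variable. A wiki that wants its old revisions
// followed could set this to 'noindex,follow' instead.
$wgOldRevisionRobotPolicy = 'noindex,nofollow';

/**
 * Pick the robots policy for a request (hypothetical helper).
 *
 * $action      the requested action, e.g. 'view' or 'history'
 * $isOldOrDiff true when an old revision or a diff is being shown
 * $configured  the site's policy for everything that is not the
 *              current revision in the default view
 */
function chooseRobotPolicy(string $action, bool $isOldOrDiff, string $configured): string
{
    // The current revision shown with the default action stays indexable.
    if ($action === 'view' && !$isOldOrDiff) {
        return 'index,follow';
    }
    // History pages, old revisions and diffs use the configured policy
    // rather than a hardcoded string.
    return $configured;
}

// An old revision (action=view with an oldid) gets the configured policy.
echo chooseRobotPolicy('view', true, $wgOldRevisionRobotPolicy), "\n";
```

The point of the design is that the call sites pass a configured value instead of a literal, so a site can opt out of nofollow without patching the code.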

ekim wrote:

An option to allow it, then? I thought part of the point of doing nofollow was
to prevent spam from working on wikis set up by non-technical people, who don't
want to mess with that stuff.

Hardcoding is correct; there's no real reason to have those marked with 'follow', and it
could cause infinite loops etc.

This seems to have been done for old page views a while ago.
Added the rest (diffs, some special pages) in r14628.