
Add option to apply nofollow only to external links added in revisions marked unpatrolled
Open, Low, Public, Feature

Description

In accordance with Google's suggestion to use or not use nofollow depending on trustworthiness of the user or edit ( http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569 ), this is a proposal to add an option (turned on or off by a new configuration setting) to apply nofollow only to external links added in revisions marked unpatrolled.

Is there any reason why this shouldn't be implemented as a core feature? I'm thinking, a boolean el_patrolled field should be added to externallinks. The value will be 1 if (a) the URL was added by an autoconfirmed user, or (b) the URL was added by a newbie/anon, but that URL was already in the table as patrolled on another page. Otherwise, the value will be 0, and nofollow will apply to the link.

The value will be switched to 1 as soon as any page is patrolled that contains that URL, and then nofollow will NOT apply to that link anymore. I suppose caches would need to be cleared accordingly.
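The rule described above can be sketched roughly as follows. This is only a Python model of the proposed logic, not MediaWiki code: the `ExternalLinks` class and all of its method names are hypothetical stand-ins for the externallinks table, and only the `el_patrolled` flag comes from the proposal. It assumes that patrolling a page flips the flag for every row carrying one of that page's URLs.

```python
class ExternalLinks:
    """Hypothetical in-memory stand-in for the externallinks table."""

    def __init__(self):
        # Maps (page, url) to the proposed el_patrolled flag.
        self.rows = {}

    def url_is_patrolled_anywhere(self, url):
        return any(flag for (_, u), flag in self.rows.items() if u == url)

    def add_link(self, page, url, autoconfirmed):
        # (a) added by an autoconfirmed user, or
        # (b) the URL is already patrolled on another page.
        patrolled = autoconfirmed or self.url_is_patrolled_anywhere(url)
        self.rows[(page, url)] = patrolled
        return patrolled

    def mark_page_patrolled(self, page):
        # Assumption: patrolling a page marks each of its URLs as
        # patrolled on every page where that URL appears.
        urls = {u for (p, u) in self.rows if p == page}
        for key in self.rows:
            if key[1] in urls:
                self.rows[key] = True

    def needs_nofollow(self, page, url):
        # nofollow applies only while el_patrolled is 0.
        return not self.rows[(page, url)]
```

For example, a link added by an anon starts out with nofollow, loses it once the page is patrolled, and is then treated as patrolled when another newbie adds the same URL elsewhere.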

I don't think the spam whitelist should be used in determining whether nofollow will be applied to a link, because even whitelisted domains are susceptible to being used for spamming. E.g., a person can link to a spammy page he added to a mostly legitimate website. Right now, nofollow is generally used for all external links (since $wgNoFollowLinks defaults to true) so this would still be a significant improvement in the precision of this anti-spam method.


Version: 1.21.x
Severity: enhancement

Details

Reference
bz42599

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 1:09 AM
bzimport set Reference to bz42599.
bzimport added a subscriber: Unknown Object (MLST).

Does the premise of this proposal, that most spamlinks are in unpatrolled revisions, seem sound? I think this might work well especially if a beefed-up autopromote scheme (e.g. one that doesn't count edits outside of content namespaces toward one's edit count) were to be implemented; https://www.mediawiki.org/wiki/Extension:EnhancedAutopromote could be revised to make that possible. Also, perhaps this could be used in conjunction with $wgNoFollowNsExceptions to apply nofollow to non-content namespaces, since spammy links in userspace and so on might have a greater tendency to go unreverted.

I think there should be a configuration setting to disable this feature if wiki system administrators want to just apply nofollow or dofollow to ALL external links (as per the status quo) depending on the value of $wgNoFollowLinks.

Also, I guess if they've set $wgUseRCPatrol to false, then this feature should automatically disable itself.
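A minimal sketch of how those settings might combine, written in Python purely for illustration. The parameters `no_follow_links` and `use_rc_patrol` model $wgNoFollowLinks and $wgUseRCPatrol; `patrol_based_nofollow` is a hypothetical name for the new setting. How the new setting should interact with $wgNoFollowLinks set to false has not been decided here, so this sketch simply assumes the new setting takes precedence whenever patrolling is enabled.

```python
def nofollow_mode(no_follow_links, use_rc_patrol, patrol_based_nofollow):
    """Return which nofollow policy would apply.

    no_follow_links: models $wgNoFollowLinks
    use_rc_patrol: models $wgUseRCPatrol
    patrol_based_nofollow: hypothetical new setting from this proposal
    """
    # The feature disables itself when RC patrol is off.
    if patrol_based_nofollow and use_rc_patrol:
        return "unpatrolled-only"
    # Status quo: all or nothing, depending on $wgNoFollowLinks.
    return "all" if no_follow_links else "none"


print(nofollow_mode(True, True, True))    # unpatrolled-only
print(nofollow_mode(True, False, True))   # all
print(nofollow_mode(False, True, False))  # none
```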

(In reply to comment #3)

Also, I guess if they've set $wgUseRCPatrol to false, then this feature should automatically disable itself.

Yes; and sysadmins would be interested in this only if their wiki has a strict definition of what's patrollable which matches the assumptions here. Patrolling, however, would cause reparsing; I have no idea what consequences this has.

I doubt the impact of reparsing on performance would be all that major (depending on what you consider major), since typically most users' edits autopatrol anyway. The anons and newbies' edits have usually been a relatively small proportion of the edits on the wikis I've seen. Comparing https://en.wikipedia.org/w/api.php?action=query&list=logevents&letype=patrol&lelimit=500 to https://en.wikipedia.org/w/api.php?action=query&list=recentchanges&rctype=edit&rclimit=500 , if you let the following variables be thus:
*A: Timestamp of most recent patrol event
*B: Timestamp of 500th most recent patrol event
*C: Timestamp of most recent edit
*D: Timestamp of 500th most recent edit

A - B = E
C - D = F

E / F ~ 16. That is, the 500 most recent patrol events span about 16 times as long a period as the 500 most recent edits, so patrol activity is about 1/16th as heavy as editing activity. Hmm, would that be a dealbreaker for WMF to do that much reparsing?
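The arithmetic can be reproduced with a few lines of Python. The function name `span_ratio` is hypothetical, and the timestamps below are made up solely so that E comes out to 16 hours and F to 1 hour; they are not the values returned by the API queries above.

```python
from datetime import datetime


def span_ratio(patrol_newest, patrol_oldest, edit_newest, edit_oldest):
    """Return E / F, where E = A - B and F = C - D."""
    e = patrol_newest - patrol_oldest  # E: time covered by 500 patrol events
    f = edit_newest - edit_oldest      # F: time covered by 500 edits
    return e / f


# Made-up timestamps for illustration only.
a = datetime(2012, 12, 1, 16, 0)  # A: most recent patrol event
b = datetime(2012, 12, 1, 0, 0)   # B: 500th most recent patrol event
c = datetime(2012, 12, 1, 16, 0)  # C: most recent edit
d = datetime(2012, 12, 1, 15, 0)  # D: 500th most recent edit

ratio = span_ratio(a, b, c, d)
print(ratio)      # 16.0
print(1 / ratio)  # 0.0625, i.e. patrols happen at 1/16 the rate of edits
```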

On the other hand, wouldn't patrol actions only require reparsing if the anon/newbie added a new external link? I can do an analysis to find out how often that occurs, if the information would be helpful.
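As a rough sketch of that check (Python, with a hypothetical function name and plain sets standing in for whatever the parser would actually provide): a patrol action would only need to trigger a reparse if the revision introduced at least one external link that was not on the page before and is not already marked patrolled elsewhere.

```python
def patrol_needs_reparse(links_before, links_after, already_patrolled=()):
    """Return True if the patrolled revision added a new, unpatrolled URL.

    links_before: external links on the page before the revision
    links_after: external links on the page after the revision
    already_patrolled: URLs already marked patrolled on other pages
    """
    new_links = set(links_after) - set(links_before) - set(already_patrolled)
    return bool(new_links)


before = {"http://a.example/"}
print(patrol_needs_reparse(before, before))  # False: no link was added
print(patrol_needs_reparse(before, before | {"http://b.example/"}))  # True
```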

(In reply to comment #4)

Yes; and sysadmins would be interested in this only if their wiki has a strict definition of what's patrollable which matches the assumptions here.

We could add a hook that lets extensions implement other methods for designating external links as patrolled.
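One way such a hook could behave, sketched in Python rather than as a real MediaWiki hook. Every name here (`register_patrol_check`, `is_link_patrolled`, the handler signature) is hypothetical. Each registered handler may return True or False to override the default, or None to leave the decision to the next handler.

```python
_handlers = []


def register_patrol_check(handler):
    """Register a callable taking a URL and returning True, False, or None."""
    _handlers.append(handler)


def is_link_patrolled(url, default):
    """Ask each handler in turn; fall back to the core el_patrolled value."""
    for handler in _handlers:
        verdict = handler(url)
        if verdict is not None:
            return verdict
    return default


# Example extension: treat links to one trusted host as always patrolled.
def trust_example_org(url):
    return True if url.startswith("https://example.org/") else None


register_patrol_check(trust_example_org)
print(is_link_patrolled("https://example.org/page", default=False))  # True
print(is_link_patrolled("http://spam.example/", default=False))      # False
```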

A downside to using this option is that good external links wouldn't have the nofollow removed until patrolled. I suspect that big sites like Wikipedia will reject this option because they'll worry about stealth spammers marking spammy external links as patrolled (whether by autopatrol or as part of a tag team that includes a patroller). I suspect that small sites with little or no spam will prefer to just keep $wgNoFollowLinks set to false, since that allows them to reap any advantages of dofollow as quickly as possible (i.e. without waiting for pages to be patrolled); see https://www.mediawiki.org/wiki/Manual:Costs_and_benefits_of_using_nofollow . I dunno if any mid-sized sites would want this option; if so, please post a comment, so that can be taken into consideration in deciding whether it's worth the effort of implementing. Thanks.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:14 AM
Aklapper removed subscribers: Kosikfl, leucosticte.