
Flagged bots to have exemption from spamlist
Closed, Declined, Public

Description

The spam blocklist was created to combat spambots and people who add spam. If a page contains a spam link, users editing that page will be prevented from saving it. Admins are given an exception to this, which is fine. Bot-flagged users (aka machines) should also be given the same exception.

When a bot is performing a routine task, this protection gets in the way. It is not like bot-flagged accounts will add spam links, and if they do, they will not only be promptly blocked but also lose their flag. I think we can easily trust bot-flagged accounts.


Version: unspecified
Severity: enhancement

Details

Reference
bz13706

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:03 PM
bzimport set Reference to bz13706.
bzimport added a subscriber: Unknown Object (MLST).

Are admins exempt? I can't save pages that contain blacklisted URLs on wikis where I am an admin.

As far as I've experienced, admins are not exempt from this. Thus, there's no reason for bots to be exempt.

On the other hand, I think both should be exempt (and also given a warning about the spam link's presence in the page) for obvious reasons.

There is no reason why a flagged interwiki bot should care about spam URLs on pages. By the very nature of spam URLs, a human should review them and decide whether or not to remove them.

I can understand a reason for exempting bots from spam filters.

  1. Vandal inserts a spam URL into a pile of pages.
  2. Admin adds the URL to the blacklist (doesn't get rid of all the spam URLs right away).
  3. Bot tries to edit the page to do some maintenance, completely independent of the spam.
  4. Bot receives an error page it may not be able to handle and either fails to do important maintenance, or crashes.

A bot editing a page which already has a spam URL shouldn't crash just because a vandal placed a URL in the page at some point. This could potentially break archiving bots, or other bots which run actively to do cleanup or even revert unrelated vandalism.

A bot that crashes because an edit was blocked is a broken bot that needs to be fixed. It's perfectly normal and expected for some edits to fail:

  • the page may be protected
  • the account may be blocked
  • there may be transitory errors on the server
  • etc

Normal workflow would be for the bot to log that the edit failed and/or try again later.
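
For illustration only, a minimal sketch of that workflow in Python against the MediaWiki action API could look roughly like the following. The endpoint URL, the log file name, and the exact error codes checked are assumptions made for the example, not a description of any particular bot framework:

    import requests

    API_URL = "https://example.wikipedia.org/w/api.php"   # hypothetical wiki endpoint

    # Assumed error codes: transient problems are worth retrying later;
    # anything else (e.g. spamblacklist, protectedpage, blocked) is logged.
    RETRY_CODES = {"maxlag", "readonly"}

    def try_edit(session, title, text, token):
        """Attempt one edit; return 'ok', 'retry', or 'skip'."""
        resp = session.post(API_URL, data={
            "action": "edit",
            "title": title,
            "text": text,
            "token": token,
            "format": "json",
        }).json()
        if "error" not in resp:
            return "ok"
        code = resp["error"].get("code", "")
        if code in RETRY_CODES:
            return "retry"                    # transitory server problem: try again later
        # Spam blacklist hits, protection, blocks, etc. are recorded for a human.
        with open("save_failures.log", "a") as log:        # hypothetical log file
            log.write(f"{title}\t{code}\n")
        return "skip"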

No, the bot does not crash. It is continuously denied the edit for something that has nothing to do with the bot.

If the bot's normal workflow is to
*...rename or remove images
*...add/remove/replace interwiki links
*...do any other non-controversial maintenance task

there is no reason to prevent it from making such edits just because

*...a vandal added a spam link a few years ago
*...an admin accidentally added a wrong entry to the spam blocklist

Additionally, the issue of bots editing protected pages is addressed separately in Bug #13137.

If the intention is to find all pages containing a link that matches the spamlist regex strings, that is a separate issue and can be a separate Bugzilla entry.

If there's no reason for it to affect a bot, why should it affect anyone? The hit allows the issue to be found and cleaned up.

This proposed special case makes no sense and will not ever be implemented.

RESOLVED WONTFIX.

Yes, it is true that if a bot shouldn't be affected by something, then a user shouldn't either.

However there is a difference.

When encountering a spam block, a bot does not know what to do with it. Bots are not programmed to automatically remove spam links when their edits are blocked, and they never should be programmed that way. Removing spam links is a human job, as it is a human's job to discern whether the link should be removed, or whether the link actually belongs and the spam entry should be revised (such as a spam block which is too broad in range and blocks good sites).

So, while both are affected, the human is able to fix the situation, but the bot is not. Because it cannot do that, it either halts, or skips something it should be doing. And not all bots have someone actively watching them for when they halt.

Splarka noted that bots could comment out links rather than remove them.
That resolution could be alright.

Though my initial thought about the bug (I misread the 'Flagged' part) was that, rather than an explicit exemption, this would actually be a 'spamexempt' flag similar to the 'editprotected' flag, which a wiki could enable if it feels there is a reason for it to be enabled.

I think it is extremely unreasonable to expect each automated bot script to contain a "comment out" mechanism for spam urls.

Bots are not people. Bots are only allowed to deal with a specific task. I cannot make my bot perform a task just because I feel like it; bot policies do not allow such a thing. Each bot script is expected and required to do something very specific. The code is expected and required NOT to do anything else. This issue is getting in the way of Commons deletions and image renames, interwiki linking, and other tasks that are UNRELATED to spamlists.

There are also legitimate reasons to add spam URLs to Wikipedia pages. For example, in a talk page discussion about a spambot attack, users may choose to list and discuss the spam URLs the spambot(s) are adding. These URLs will eventually make their way to the spamlist.

The same talk page or talk archive may contain an image that needs renaming or removing. You are saying the bot should be banned from replacing an image or adding an interwiki link simply because the page contains a spam link.

If your bot can't handle that some of its edits may be rejected, then it's completely unsuitable for use on a wiki. Pages may be protected, databases may be locked, IPs may be blocked -- that's the normal state of things.

These are *ALL* beyond the control of the bot. A bot that's not completely unsuitable for use will simply log the pages it can't make edits to, and the human can deal with them later.
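
Building on the try_edit sketch above, the log-and-move-on behaviour being described could look roughly like this; worklist, new_text_for, and skipped_pages.txt are hypothetical names used only for the illustration:

    def run_maintenance(session, token, worklist, new_text_for):
        """Run a routine task over a worklist, skipping pages that cannot be saved."""
        skipped, retry_later = [], []
        for title in worklist:
            outcome = try_edit(session, title, new_text_for(title), token)
            if outcome == "retry":
                retry_later.append(title)     # transient failure: queue for a later run
            elif outcome == "skip":
                skipped.append(title)         # blacklist hit, protection, block: a human decides
        # Hand the unresolved pages to a human operator instead of crashing.
        with open("skipped_pages.txt", "w") as report:     # hypothetical report file
            report.write("\n".join(skipped) + "\n")
        return skipped, retry_later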

Discussion is ended.

*** Bug 14691 has been marked as a duplicate of this bug. ***