
Rename Spam blacklist to "Disallowed websites"
Closed, Declined (Public)

Description

Author: mike.lifeguard+bugs

Description:
Discussion in many places, overwhelmingly in favour. For example, on otrs-en-l starting here: https://lists.wikimedia.org/mailman/private/otrs-en-l/2008-June/004293.html

Or earlier: https://lists.wikimedia.org/mailman/htdig/otrs-en-l/2007-April/000893.html


Version: unspecified
Severity: normal
See also: T16281: Rename "Bad image list" to something better

Details

Reference: bz14719

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 10:15 PM)
bzimport added a project: SpamBlacklist.
bzimport set Reference to bz14719.
bzimport added a subscriber: Unknown Object (MLST).

herbythyme wrote:

This really is something that is overdue. It causes considerable annoyance to
those who are listed, which in turn means that the volunteers frequently have to
put up with abuse.

Ultimately these lists are not about "spam". They are about links which the
community deem excessive/unnecessary. This (& the MediaWiki equivalent pages)
should be changed as soon as possible.

Thanks

Beetstra.wiki wrote:

I would also urge the renaming to take place ASAP. Thanks.

mike.lifeguard+bugs wrote:

Yes, this was intended to apply to the local and global blacklists.

With regard to the global blacklist on meta, I think this should be done in a way that lets wikis using our blacklist continue to do so. The configuration

    $wgSpamBlacklistFiles = array(
        "http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1", // Wikimedia's list
    );

should still work even though the real page is located somewhere else.

This could also be done so that multiple blacklists are collected together; then, if bug 14322 is implemented, wikis using the meta blacklist can still get it with the above configuration.

This would allow backward compatibility (important primarily for third-party users), even though the list may actually be located elsewhere, or in several different places.
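One way to get that backward compatibility is for the server to treat the legacy title as an alias for one or more new list pages and return their raw text concatenated, so the existing `$wgSpamBlacklistFiles` URL keeps working unchanged. The PHP below is a minimal sketch under that assumption only: `resolveLegacyList()`, the alias map and the page titles are hypothetical (not MediaWiki or SpamBlacklist APIs), and fetching is stubbed with an in-memory array.

```php
<?php
// Hypothetical sketch: keep the legacy "Spam_blacklist" raw URL working by
// serving the concatenated text of whatever pages it is now an alias for.
// None of these names are real MediaWiki or SpamBlacklist APIs.

/**
 * @param string   $title   Title requested by the client (the ?title= value).
 * @param array    $aliases Map of legacy title => list of current page titles.
 * @param callable $fetch   Returns a page's raw text, or null if it is missing.
 */
function resolveLegacyList(string $title, array $aliases, callable $fetch): string
{
    // A title that is not an alias is simply served as itself.
    $targets = $aliases[$title] ?? [$title];
    $parts = [];
    foreach ($targets as $target) {
        $text = $fetch($target);
        if ($text !== null) {
            // "#" starts a comment in the list format, so this label is inert.
            $parts[] = "# --- from {$target} ---\n" . rtrim($text, "\n");
        }
    }
    return $parts ? implode("\n", $parts) . "\n" : '';
}

// Usage with in-memory pages standing in for the wiki (assumed titles).
$pages = [
    'Disallowed_websites/Spam'      => "\\bspam-example\\.org\\b\n",
    'Disallowed_websites/Copyright' => "\\bwarez-example\\.net\\b\n",
];
$aliases = ['Spam_blacklist' => array_keys($pages)];
$url = 'http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1';
parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
echo resolveLegacyList($params['title'], $aliases, fn($t) => $pages[$t] ?? null);
```

Labelling each chunk with a `#` comment keeps the origin of every entry visible to third-party wikis, while they still consume what looks like a single list.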

herbythyme wrote:

For a view on the impact this has on those whose sites are listed, and therefore on the volunteers who have to deal with it, please see this: http://meta.wikimedia.org/w/index.php?title=Talk%3ASpam_blacklist&diff=1077313&oldid=1076452

It would seem wise to deal with this as soon as possible. Thanks

mike.lifeguard+bugs wrote:

"Disallowed websites" was suggested as a new name and seems acceptable to all; however, either "URL exclusion list" or "URL Blacklist" would still be acceptable.

guy.chapman wrote:

+1 for this. I am an OTRS and enWiki admin, plus an OTRS volunteer. There are numerous cases where sites are justly and entirely uncontroversially added to the blacklist to control abuse, but where doing so must necessarily label the site as spam. Some scenarios:

  • A site owner is not responsible for the spamming of his site, but the site has been massively and inappropriately linked. The site owner asserts that he is not a spammer, but the site meets our local definition of spam.
  • URL shorteners: legitimate and useful sites, which are nonetheless tagged as spam because we cannot stop them being used to circumvent the blacklist without tagging them as spam.
  • Sites which systematically violate copyright and are locally or meta blacklisted to control legal liability. Many different users may be adding these sites; they are not being spammed per the normal external definition, but blacklisting them is essentially uncontroversial.

The list needs to be renamed. I understand that there is resistance on the basis that this might imply a change in policy, but the reality is that policy and consensus already allow for blacklisting of sites which may not meet the usual real-world definition of spam, even if they meet our own internal definitions. Nobody is proposing any change to the criteria, only the removal of a word which carries stigma, in order to head off a steady stream of complaints from site owners.

bastique.bz wrote:

According to Brion (sitting beside me at Heathrow Airport), renaming it is not the most important change, although it is important. He will be addressing this soon, but probably not until after Wikimania :)

jwales wrote:

+1 This needs to be fixed. Decisions about what links to include in Wikipedia are rightfully in the hands of the community, and we need to be able to use this list in a way that is consistent with our policies. Calling something "spam" is not something that many community members are comfortable doing, even when something is spam, because we try not to engage in that kind of personal attack. Additionally, as outlined by Guy Chapman above, we use this to cover cases that meet our local definition of spam, but which might not be viewed as spam by the website itself. Insulting them while also taking away their web traffic from Wikipedia is not right.

wkroyfokker wrote:

I think the name itself will clear up many of the conflicts we have. It also shows the real meaning of this page in a way that helps even sysops understand much better what can or cannot be added.

As an enwiki admin and an OTRS volunteer, I also concur with the rename. I am partial to the "Disallowed websites" option for both irrelevant personal reasons (see: http://meta.wikimedia.org/w/index.php?title=Talk%3ASpam_blacklist&diff=1079547&oldid=1079546 ) and the more important reasons listed above by Guy and Jimbo.

cometstyles wrote:

I hate the name, though, since in the long run it will sound a bit silly; "URL Blacklist" or "Excluded URL list" might be better. Either way, the word "spam" really needs to be removed.

Per comment 7, assigned to Brion.

As for some other comments: your point is clear. Please use Bugzilla's 'vote' option where it can replace a "+1".

mike.lifeguard+bugs wrote:

This would be superseded by bug 4459/bug 13811 - using a special page for the blacklist rather than a wiki page.

lastword wrote:

Bump. This issue hasn't gone away in the last 18 months; any chance of progress?

IMO this request has always been a 'can't see the forest for the trees' thing. The primary purpose of a link blacklist always was, and always will be, to reduce link spam activity by preventing linking to sites known to have been used in link spam.

It probably makes a lot more sense to step back and think about what this thing is for and how it works.

What's actually the problem? I think it's simply poor communication: a certain fraction of sites that are being blacklisted are edge cases where folks are trying to prevent some very particular kind of abuse, but there's no good way to explain to an editor how the blacklist entry got there, or whether the way they're using the link is actually related to that abuse pattern.

Changing the name doesn't solve that in any way. It'll be just as frustrating when the link you thought was just fine is on a "disallowed website list" with a poor audit trail that's very hard to get out of as when it was on a "spam blacklist" with a poor audit trail that's very hard to get out of.

I'd recommend ripping out the current "giant list of regexes" and using some actual data structures to record the blacklist entries, as we do for the more heavyweight but flexible AbuseFilter.

This brings several clear benefits:

  • Information about the origin and history of each blacklist entry will be available:
    • when was it blocked and by whom? who can I talk to about getting it undone?
    • what was their reasoning? do other people agree with it?
    • does the particular issue that triggered the ban still apply? if we can see what it was, we might be able to find out and get it resolved!
  • the ability to treat different cases differently:
    • Legitimate URL redirectors don't need to be disallowed entirely... Redirectors are a common part of today's web ecosystem, and continuing to ban them is just laziness that hurts our users.

The 'engineer's concern' that if we only paid attention to the final redirect target, an evil site could evade a blacklist by changing its redirect targets is tractable by 1) checking both original and target URLs and 2) *marking known good and known abusive redirector sites*. Why should we blacklist every bit.ly or whatever URL when we know they're consistent and a lookup of the redirect won't magically change to a spam/virus link?
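The two-part fix for redirectors proposed here (check both the original and the target URL, and keep lists of known-good and known-abusive redirectors) can be sketched as follows. This is a hypothetical PHP illustration: `linkAllowed()`, the host lists, the stubbed resolver and the sample blacklist check are all assumptions, and no real redirect lookup is performed.

```php
<?php
// Hypothetical sketch of the redirector proposal: check the original URL, and
// for redirectors trusted to keep stable targets, check the target as well.
// Host lists, the resolver and the blacklist check are all assumptions.

function hostOf(string $url): string
{
    return strtolower((string) parse_url($url, PHP_URL_HOST));
}

function linkAllowed(
    string $url,
    callable $resolve,        // returns the redirect target for a URL, or null
    callable $isBlocked,      // the ordinary blacklist check for one URL
    array $goodRedirectors,   // hosts known not to change their targets
    array $badRedirectors     // hosts known to be used for abuse
): bool {
    $host = hostOf($url);
    // 1) Known-abusive redirectors and blacklisted URLs are rejected outright.
    if (in_array($host, $badRedirectors, true) || $isBlocked($url)) {
        return false;
    }
    // Any other host (including unlisted redirectors) is judged on the URL alone.
    if (!in_array($host, $goodRedirectors, true)) {
        return true;
    }
    // 2) For trusted redirectors the final target must pass too; an
    //    unresolvable short link is rejected rather than trusted blindly.
    $target = $resolve($url);
    return $target !== null && !$isBlocked($target);
}

$isBlocked = fn(string $u): bool => preg_match('/\bspam-example\.org\b/i', $u) === 1;
$resolve = fn(string $u): ?string => [
    'https://bit.ly/abc' => 'https://spam-example.org/buy',
    'https://bit.ly/ok'  => 'https://www.mediawiki.org/',
][$u] ?? null;

var_dump(linkAllowed('https://bit.ly/ok', $resolve, $isBlocked, ['bit.ly'], ['evil-redirect.example']));  // bool(true)
var_dump(linkAllowed('https://bit.ly/abc', $resolve, $isBlocked, ['bit.ly'], ['evil-redirect.example'])); // bool(false)
```

Rejecting a trusted short link whose target cannot be resolved is a design choice of this sketch; a softer policy could flag it for review instead.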

  • Sites that are blacklisted for abusive/annoying/legal issues during a particular event or in a particular area can actually be marked with details about the event or area. A short-term issue probably doesn't need a permanent block.
  • A hard block isn't always really needed; marking pages for review when a slightly-sketchy or sometimes-rude-and-attackish link gets added is probably nicer on everyone than just preventing linking and requiring an administrator escalation to resolve a legitimate case.
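A minimal sketch of what a structured entry could record: the audit trail asked for above (who, when, why), an optional expiry for short-term issues, and a softer "review" action alongside a hard block. The class, its field names and the matching rule (exact host or any subdomain, instead of a regex) are assumptions for illustration only, not an existing MediaWiki schema.

```php
<?php
// Hypothetical sketch of a structured blacklist entry with an audit trail.
// Field names, the two actions and the host-suffix match are assumptions.

final class BlacklistEntry
{
    public function __construct(
        public string $domain,           // e.g. "spam-example.org"; subdomains match too
        public string $addedBy,          // who added it, i.e. who to talk to
        public int $addedAt,             // Unix timestamp of the addition
        public string $reason,           // why, ideally linking to the discussion
        public string $action = 'block', // 'block' (hard) or 'review' (flag only)
        public ?int $expiresAt = null    // null = no expiry
    ) {}

    public function matches(string $host, int $now): bool
    {
        if ($this->expiresAt !== null && $now >= $this->expiresAt) {
            return false; // short-term entries lapse on their own
        }
        $host = strtolower($host);
        $domain = strtolower($this->domain);
        return $host === $domain || str_ends_with($host, '.' . $domain);
    }
}

/** Returns [verdict, entry]: 'block' or 'review' with the entry responsible, or ['allow', null]. */
function checkLink(string $url, array $entries, int $now): array
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    $verdict = ['allow', null];
    foreach ($entries as $entry) {
        if (!$entry->matches($host, $now)) {
            continue;
        }
        if ($entry->action === 'block') {
            return ['block', $entry]; // a hard block wins over any review flag
        }
        $verdict = ['review', $entry];
    }
    return $verdict;
}

$now = 1700000000;
$entries = [
    new BlacklistEntry('spam-example.org', 'AdminA', $now - 86400, 'Mass link spam'),
    new BlacklistEntry('event-example.net', 'AdminB', $now - 3600, 'Attack links during an event', 'review', $now + 7 * 86400),
];
[$verdict, $entry] = checkLink('https://www.spam-example.org/x', $entries, $now);
echo "{$verdict}: {$entry->reason} (added by {$entry->addedBy})\n"; // block: Mass link spam (added by AdminA)
```

Because the verdict carries the matching entry, an editor who hits a block can be shown who added it and why, which is the communication gap described earlier in this comment.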

and of course:

  • an actual user interface for creating and testing entries will reduce administrator errors that accidentally blacklist the wrong sites.

Editing a giant page of regexes is just asking for trouble, let's be honest. It's fragile and easy to break -- while we are able to detect that a regex doesn't compile and skip it, a regex that compiles but matches things you didn't think it would can be even more disruptive.
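The failure mode described here, a regex that compiles but matches more than intended, is the kind of thing a "test before saving" step in an entry-editing interface could catch. The sketch below runs a candidate entry against URLs that must and must not be blocked. `testEntry()` is hypothetical, and the wrapper pattern placed around the fragment is our own assumption for illustration, not the extension's exact regex.

```php
<?php
// Hypothetical sketch of testing a regex-style entry before saving it.
// The wrapper around the fragment is an assumption made for illustration.

/** Returns a list of problems; an empty list means the entry passed. */
function testEntry(string $fragment, array $mustMatch, array $mustNotMatch): array
{
    $regex = '/https?:\/\/+[a-z0-9_\-.]*(' . $fragment . ')/i';
    // preg_match() returns false when the pattern fails to compile.
    if (@preg_match($regex, '') === false) {
        return ["does not compile: {$fragment}"];
    }
    $problems = [];
    foreach ($mustMatch as $url) {
        if (preg_match($regex, $url) !== 1) {
            $problems[] = "misses {$url}";
        }
    }
    foreach ($mustNotMatch as $url) {
        if (preg_match($regex, $url) === 1) {
            $problems[] = "also blocks {$url}";
        }
    }
    return $problems;
}

$target = ['https://www.example.com/page'];
$innocent = ['https://example-com.net/', 'https://notexample.com/'];
print_r(testEntry('\bexample.com\b', $target, $innocent));   // unescaped "." also blocks example-com.net
print_r(testEntry('\bexample\.com\b', $target, $innocent));  // empty: passes
print_r(testEntry('\bexample(\.com\b', $target, $innocent)); // does not compile
```

The first entry compiles without complaint, yet its unescaped `.` silently blocks an unrelated site; only checking it against known-innocent URLs reveals that.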

mike.lifeguard+bugs wrote:

(In reply to comment #15)

> I'd recommend ripping out the current "giant list of regexes" and use some
> actual data structures to record the blacklist entries, as we do for the more
> heavyweight but flexible AbuseFilter.
> ...
> Editing a giant page of regexes is just asking for trouble, let's be honest.
> It's fragile and easy to break -- while we are able to detect that a regex
> doesn't compile and skip it, a regex that compiles but matches things you
> didn't think it would can be even more disruptive.

Some thoughts about requirements are available at http://www.mediawiki.org/wiki/Regex-based_blacklist as well. Thank god someone is taking this seriously.

Should this be closed as a WONTFIX and point to bug 16717 & bug 4459 for resolving the larger problems here?

guy.chapman wrote:

(In reply to comment #15)

> IMO this request has always been a 'can't see the forest for the trees' thing.
> The primary purpose of a link blacklist always was, and always will be, to
> reduce link spam activity by preventing linking to sites known to have been
> used in link spam.

This is true up to a point. However:

> Changing the name doesn't solve that in any way. It'll be just as frustrating
> when the link you thought was just fine is on a "disallowed website list" with
> a poor audit trail that's very hard to get out of as when it was on a "spam
> blacklist" with a poor audit trail that's very hard to get out of.

It won't change the behaviour but it will remove one source of complaints. Spamming has a particular and unwholesome meaning. What we call link spamming, which is unambiguously abusive, is not the same as spamming (sending unsolicited email) and may in fact be the result of actions by someone other than the owner of a given domain. So the *name* of the blacklist is inherently an issue.

Some of the complaints are of course vexatious, but not all. And yes, not seeing the list in clear text would be a partial fix but the discussions of the issue will still be under "spam-foo" (which, incidentally, we really ought to fix since it requires no technical change).

On the subject of redirectors, some sites allow the redirection to be changed. There's every reason not to use redirectors within our projects, not least because when someone hovers over a link they should see the domain they are going to.

I completely agree about the technical issues of the blacklist interface, though.

Nemo_bis claimed this task.
Nemo_bis subscribed.

Closing per Brion. This request is specific to a very narrow use case on one wiki.

Nemo_bis set Security to None.

Closing this as "declined" seems to be plainly wrong, as this is a problem at several projects. Change the name and remove the cause of the disputes.

See also T173080: Replace words "Blacklist" by "Denylist" and "Whitelist" by "Allowlist" and T190521: Change name of spam-blacklist/spam-whitelist to link-blacklist/link-whitelist