API for SpamBlacklist
Closed, Resolved (Public)

Description

Similar to action=titleblacklist (https://en.wikipedia.org/w/api.php?action=titleblacklist&tbtitle=Foo), provide a simple API to check whether a given URL is blacklisted.
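
For comparison, the existing title blacklist check and a hypothetical equivalent for URLs might look like this (the action and parameter names for the new module are placeholders, not final):

    Existing:  https://en.wikipedia.org/w/api.php?action=titleblacklist&tbtitle=Foo
    Proposed:  https://en.wikipedia.org/w/api.php?action=spamblacklist&url=http://example.com/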


Version: unspecified
Severity: enhancement

Details

Reference
bz54441

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:09 AM
bzimport set Reference to bz54441.

Change 85512 had a related patch set uploaded by Jackmcbarn:
Add an API action to test blacklisted URLs

https://gerrit.wikimedia.org/r/85512

Does this cross-reference the whitelist?

(In reply to comment #2)

Does this cross-reference the whitelist?

The current patch does.

Betacommand TimStarling: what are you referring to in your last comment on 54441?
Betacommand * bug 54441
Elsie Betacommand: There's no stated use-case.
Betacommand I can easily see several
Betacommand reviewing links on a wiki
Betacommand seeing which are and which are not blacklisted
Betacommand and, if a domain is blacklisted, whether a particular link is whitelisted
Betacommand and finding which blacklist rules are hitting a given link (i.e. finding what caused a link to get caught by the blacklist)
Betacommand I have seen several cases where either an error or a minor oversight in a blacklisting caused collateral damage, and finding the correct rule can be an issue

(In reply to comment #5)

Betacommand reviewing links on a wiki
Betacommand seeing which are and which are not blacklisted

That application is not efficiently supported by the proposed patch. You need a batch lookup; you don't want to be doing hundreds of API queries on a page with hundreds of links.
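
If the URL parameter accepted the API's standard pipe-separated multi-value form, all of a page's links could be checked in a single request, roughly like this (parameter names are illustrative):

    https://en.wikipedia.org/w/api.php?action=spamblacklist&url=http://example.com/|http://example.org/page|http://example.net/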

Could adding a programmatic way to check for blacklisted URLs lead to smarter spam? I think there's a concern that adding this functionality, in batch form or otherwise, would make a wiki more susceptible to abuse. Thoughts?

(In reply to comment #7)

Could adding a programmatic way to check for blacklisted URLs lead to smarter spam? I think there's a concern that adding this functionality, in batch form or otherwise, would make a wiki more susceptible to abuse. Thoughts?

Anyone can currently download https://meta.wikimedia.org/wiki/Spam_blacklist and parse it locally, which would be much faster than trying to use an API to check whether a link is blacklisted.

The only concern I would have is that if a file blacklist is being used, there's a good chance it isn't public. It might be worth automatically disabling the API module if any file blacklist is being used.
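
For illustration, a minimal sketch of the local approach described above: fetch the on-wiki blacklist as raw wikitext, strip comments, compile the remaining lines as regexes, and test a URL against them. Treating each fragment as a pattern matched anywhere in the URL is a simplification of how the extension builds its combined regex, so this is an approximation, not the extension's algorithm.

    # Sketch: check a URL against the meta spam blacklist entirely client-side.
    import re
    import urllib.request

    BLACKLIST_RAW = "https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw"

    def load_blacklist():
        req = urllib.request.Request(BLACKLIST_RAW,
                                     headers={"User-Agent": "blacklist-check-sketch/0.1"})
        text = urllib.request.urlopen(req).read().decode("utf-8")
        rules = []
        for line in text.splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line:
                continue
            try:
                rules.append(re.compile(line, re.IGNORECASE))
            except re.error:
                pass  # skip fragments that PCRE accepts but Python's re does not
        return rules

    def is_blacklisted(url, rules):
        return any(rule.search(url) for rule in rules)

    rules = load_blacklist()
    print(is_blacklisted("http://example.com/some/page", rules))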

Beetstra.wiki wrote:

@MZMcBride - Spammers are inherently smart (they make money with it; that is their drive). We've seen many tricks to try to get around the blacklist. That generally has two effects: first, we blacklist the evading material without even considering a warning and indef any accounts involved without discussion; and secondly, delisting of any of the domains will be denied forever - if you really want to take your abuse to that level and show that much persistence, it is just plain game over. And anyway, this is already possible: as others suggest, you can simply download the lists manually and do the same check yourself. There are even tricks one could consider programming into the software (make the software follow links to their endpoint; if a redirect site like tinyurl.com points to a blacklisted domain, block the edit, etc.).
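
The redirect idea mentioned above could look roughly like this: resolve a link to its final destination first, then run the blacklist check on the result. This is a hypothetical sketch (the rule list is illustrative and would really come from the parsed blacklist), not a description of what the extension does.

    # Sketch: follow a short-URL redirect chain before checking the blacklist,
    # so tinyurl-style indirection to a blacklisted domain is still caught.
    import re
    import urllib.request

    ILLUSTRATIVE_RULES = [re.compile(r"\bexample-spam\.com\b", re.IGNORECASE)]

    def resolve_final_url(url, timeout=5):
        # urlopen follows HTTP redirects by default; .url is the final endpoint.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.url

    def blocked_after_redirects(url, rules=ILLUSTRATIVE_RULES):
        final = resolve_final_url(url)
        return any(rule.search(final) for rule in rules)

    # e.g. blocked_after_redirects("https://tinyurl.com/some-shortened-link")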

@Tim - why does the API not do the same as the saving mechanism, which checks against the various blacklists/whitelists? It should have the same speed. Though a batch lookup would be a good option as well (push a whole page through the parser and see what is blacklisted in the XML output).

@Legoktm: (pff .. WP:BEANS). You cannot avoid that; one could use a locally installed version of the software to do the work. Given a current recurring case of spam, it may even be that some already do that kind of thing - spammers seem to know how to figure out what is not blacklisted.

What I think the API should provide is something like: 'if I sent this link/text-with-links through the parser and tried to save it on XX.wikipedia.org, which blacklist (global and local) and whitelist (local) rules would match it?'
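
A rough sketch of that kind of answer: pull the external links out of a chunk of text and report, per link, which rules match it. The rule set and the URL matcher below are illustrative assumptions, not the extension's actual patterns.

    # Sketch: for every external link in a piece of text, list the blacklist
    # rules that match it, so an over-broad or mistaken rule can be identified.
    import re

    ILLUSTRATIVE_RULES = {
        "spam-domain rule": re.compile(r"\bspam-domain\.example\b", re.IGNORECASE),
        "tracker rule":     re.compile(r"\btracker\.example\b", re.IGNORECASE),
    }
    URL_RE = re.compile(r'https?://[^\s\]<>"]+')  # crude external-link matcher

    def matched_rules(text, rules=ILLUSTRATIVE_RULES):
        report = {}
        for url in URL_RE.findall(text):
            report[url] = [name for name, rule in rules.items() if rule.search(url)]
        return report

    sample = "See http://spam-domain.example/offer and http://example.org/fine"
    for url, hits in matched_rules(sample).items():
        print(url, "->", hits or "not blacklisted")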

Change 85512 merged by jenkins-bot:
Add an API action to test blacklisted URLs

https://gerrit.wikimedia.org/r/85512