
Log spam blacklist hits
Closed, Resolved, Public

Description

Author: silsor

Description:
It would be extremely useful to have a log of every time the spam blacklist blocks
an edit, with the URL that was blocked.

This would pin down problems on all the different wikis with filters that are too
broad.

This would also tell us whether our filters are actively blocking spam, and how
often different spams are attempted, and where.

This would also help us know when to expire filters from the spam blacklist (say,
after six months or a year of inactivity).
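A minimal sketch of the expiry check described above, assuming the log has already been reduced to a mapping from each blacklist pattern to the time of its most recent hit. The function name `stale_filters`, the mapping shape, and the one-year default are hypothetical and not part of the extension.

```python
from datetime import datetime, timedelta


def stale_filters(last_hit, now, max_idle=timedelta(days=365)):
    """Return the patterns whose most recent hit is older than max_idle.

    last_hit is a hypothetical {pattern: datetime of last hit} mapping
    built from the log; patterns that never matched are not covered here.
    """
    return sorted(p for p, seen in last_hit.items() if now - seen > max_idle)
```

Patterns that never produced a hit would not appear in such a mapping, so they would need to be compared against the blacklist itself.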


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63086

Details

Reference
bz1542

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 21 2014, 8:14 PM
bzimport added a project: SpamBlacklist.
bzimport set Reference to bz1542.

silsor wrote:

A recent oops with the spam blacklist revealed that spambots are hammering [[PHP]] on en at
least (the blacklist was temporarily not available and the article was spammed within six
minutes). IP checks also indicate that these attacks are being done using zombie machines
(various ISP addresses, including an AOL IP, varying often). If this log were added it might
consume a fair bit of disk.

robchur wrote:

Matches *are* logged, in the standard debug log. I suppose we could introduce some sort of aggregation table to facilitate reports on most-spammed domains and other things like that.
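A rough sketch of the kind of aggregation suggested here, assuming the blocked URLs have already been extracted from the debug log into a plain list (the log line format is not shown in this thread, so parsing is left out). The helper names `domain_of` and `most_spammed` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def most_spammed(blocked_urls, top=10):
    """Count blocked URLs per domain, most frequent first."""
    counts = Counter(domain_of(u) for u in blocked_urls)
    return counts.most_common(top)
```

An actual aggregation table would presumably live in the database; this only illustrates the grouping step.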

mike.lifeguard+bugs wrote:

Can it be made available on the toolserver? We would probably want to run queries on that much data anyway.

And what about the [[MediaWiki:Titleblacklist]]? Could we have a log for it?

Is there any bug about it?

Helder

mike.lifeguard+bugs wrote:

(In reply to comment #4)

> And what about the [[MediaWiki:Titleblacklist]]? Could we have a log for it?
>
> Is there any bug about it?

See bug 21206

Huhu, if the data are available somewhere, it would be kind if someone links to it from here and closes this bug.

It would really be a great help to have such a log (similar to the log of the abuse filter/edit filter).
A log would be an especially useful tool for handling unblocking requests.

I think the approach should be to log matches using the AbuseFilter extension if that one is loaded.

(In reply to comment #9)

The AbuseFilter is not a good replacement for the spam blacklist. Both tools are important and they complement one another. I guess that if we moved all SBL entries to AF, this would slow down the whole thing. So there should be an individual log for the SBL covering _all_ entries. This would be a great help in reducing the length of the very long lists at meta and w:en. And it would help in coping with blacklist removal requests.

Related URL: https://gerrit.wikimedia.org/r/69303 (Gerrit Change I7b1ee2b3bb02b693d695bf66d157e2c33526c919)

I've uploaded a patchset that implements logging to the standard Special:Log. I think using AF logging is a bit overkill, since all you really care about here is the link that is being added.

That sounds great! :-)
When will this be active in w:de?

(In reply to comment #13)

> That sounds great! :-)
> When will this be active in w:de?

We don't know which version it will be in until the change gets merged. Once that's done you can check https://www.mediawiki.org/wiki/MediaWiki_1.22/Roadmap

Change 69303 merged by jenkins-bot:
Log blacklist hits to Special:Log

https://gerrit.wikimedia.org/r/69303

Marking this as fixed since the patch has been merged.

https://gerrit.wikimedia.org/r/#/c/83353/ is for enabling this on WMF sites.

Searching all prevented additions for a given domain is what this request was about. But as far as I can see (e.g. at [https://en.wikipedia.org/wiki/Special:Log/spamblacklist]), it's still not possible to search for a given url or at least a domain, right?

Oops, sorry, I didn't see that it's not yet available...

Searching for a specific domain will not be possible due to how the logging data is stored; however, it would be trivial (I have already started working on one) to write a toolserver/labs tool that allows such searches.
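As an illustration of what such a tool might do, here is a minimal sketch that filters already-fetched log entries by domain. It assumes each entry is a dict with a `url` key; that shape and the function names are hypothetical and do not reflect how the extension stores its data.

```python
from urllib.parse import urlsplit


def matches_domain(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or one of its subdomains."""
    # urlsplit only recognises the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def hits_for_domain(entries, domain):
    """Keep only the log entries whose blocked URL falls under `domain`."""
    return [e for e in entries if matches_domain(e.get("url", ""), domain)]
```

Matching on the parsed host rather than on a substring avoids treating `notspam.example` as a hit for `spam.example`.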

(In reply to comment #17)

> Searching all prevented additions for a given domain is what this request was
> about. But as far as I can see (e.g. at
> [https://en.wikipedia.org/wiki/Special:Log/spamblacklist]), it's still not
> possible to search for a given url or at least a domain, right?

I believe the script [[commons:MediaWiki:Gadget-rightsfilter.js]] makes it possible to filter the SPAM logs to find those which refer to a specific URL.

@Legoktm: The "spamblacklistlog" user right was given only to sysops instead of everyone. What is the motivation for this? Is there anything bad that could happen if I compiled a list of URLs which were blocked and publicized it?

> @Legoktm: The "spamblacklistlog" user right was given only to sysops instead of everyone. What is the motivation for this? Is there anything bad that could happen if I compiled a list of URLs which were blocked and publicized it?

At the time I didn't think it would be a great idea to have a list of spammed URLs publicly available... but I don't believe that now; it should probably have the same visibility as the AbuseLog. Publishing lists seems fine; iirc it's also replicated to labs.