
Abuse filter: Increase 5% limit to allow filtering for very short posts
Closed, Resolved · Public

Description

We would like to prevent users from posting feedback that is fewer than 5 characters long.

Filter editor Sole Soul moved some code from filter 460 to filter 458 to make that happen:
http://en.wikipedia.org/wiki/Special:AbuseFilter/463

However, that filter quickly exceeded the limit of matching more than 5% of actions and was automatically disabled (perhaps unfairly; see related Bug 37615).

To address this issue, Oliver and I tried filtering posts with different character limits, but the filter was still automatically disabled each time.

We would like to find a solution to this issue, either by changing the regular expression or by resolving the related Bug 37615:
https://bugzilla.wikimedia.org/show_bug.cgi?id=37615

You can read more about the abuse filter for article feedback on our feature requirements page:
http://www.mediawiki.org/wiki/Article_feedback/Version_5/Feature_Requirements#Abuse.2FSpam_Filters


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=37615

Details

Reference
bz40672

Event Timeline

bzimport raised the priority of this task from to Unbreak Now! (Nov 22 2014, 12:53 AM)
bzimport set Reference to bz40672.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

We would like to find a solution to this issue, either by changing the Regex
script, or by solving the related issue for Bug 37615

Fabrice: Who exactly would need to agree on this?

Hi Andre,

I believe we will need to modify the AbuseFilter extension to raise the cutoff for automatically disabling filters above 5%, possibly up to 10%.

Right now, filter 458 only disallows posts with 2 characters or fewer, because it is automatically disabled if we try 3, 4 or 5 characters. We really want to disallow posts with 5 characters or fewer ASAP, and ultimately even 10 characters or fewer. One way to accomplish that is to raise the threshold for disabling filters.

To quote extension creator Andrew Garrett: "The AbuseFilter has a special mechanism for new filters in which filters that match more than X% of the actions that they are compared against are disabled. It is presumed that any filter that matches more than X% of actions is out of control. The current value of X is 5. In order to determine whether a filter matching more than X% of actions is actually out of control or just unlucky, we need a decent sample size. So the minimum sample size is Y, the variable that we changed from 2 to 25."
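The check Andrew describes can be sketched roughly as follows. This is a minimal Python illustration, not the extension's actual PHP code: the function and parameter names are hypothetical, it assumes the filter counts as "new" (so the check applies at all), and it reads the minimum sample size Y as a count of matches, which is an assumption since the quote only calls it a sample size.

```python
def should_emergency_disable(hits: int, total: int,
                             threshold: float = 0.05,
                             min_sample: int = 25) -> bool:
    """Decide whether a new filter should be automatically disabled.

    Hypothetical sketch, not AbuseFilter's real implementation.
    hits       -- actions the filter matched
    total      -- actions the filter was compared against
    threshold  -- X from the quote, as a fraction (0.05 means 5%)
    min_sample -- Y from the quote, counted here as matches (an assumption)
    """
    if total == 0 or hits < min_sample:
        # Too little evidence to tell "out of control" from "unlucky".
        return False
    # Disable only when the match rate is strictly above X.
    return hits / total > threshold
```

With the defaults, 30 matches out of 300 actions (10%) triggers the shutdown, while 20 matches out of 100 actions (20%) does not, because the sample is still below Y = 25. This is why a short-post filter is disabled as soon as more than 5% of feedback posts are short.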

The goal would be to have a higher cutoff for feedback than for edits, so we don't disrupt the current cutoff used for edits and only increase the cutoff for feedback posts …

We are now waiting for Andrew Garrett and Matthias Mullie to offer a recommendation on that point and to assess the complexity of this proposed revision.

If we're only talking about a couple of hours of development, I think we should do it, so we don't have to keep resetting the filters manually. I suspect that we will need a higher limit anyway before we can deploy AFT5 to 100%.

I suggest making this configurable per "Filter group".

It makes sense to treat different kinds of "text" (e.g. articles vs feedback) differently.

I've pushed a couple of patches:

The emergency shutdown values for regular article submission would remain unchanged; for feedback, the values would become:

  • threshold: 10% rather than 5%
  • minimum sample size: 50 rather than 25
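A per-group configuration along these lines could look like the Python sketch below. The group names `default` and `feedback`, the dictionary layout and the function names are illustrative assumptions, not the extension's actual settings; the values are the ones proposed above, groups without an entry fall back to the default, and the minimum sample size is again assumed to count matches.

```python
# Hypothetical per-group emergency-shutdown settings (names are illustrative).
EMERGENCY_DISABLE = {
    "default": {"threshold": 0.05, "min_sample": 25},   # regular edits: unchanged
    "feedback": {"threshold": 0.10, "min_sample": 50},  # proposed for feedback
}


def settings_for(group: str) -> dict:
    """Look up a filter group's settings, falling back to 'default'."""
    return EMERGENCY_DISABLE.get(group, EMERGENCY_DISABLE["default"])


def should_emergency_disable(group: str, hits: int, total: int) -> bool:
    """Apply the group's threshold once its minimum sample size is reached."""
    cfg = settings_for(group)
    if total == 0 or hits < cfg["min_sample"]:
        return False  # not enough matches yet to judge the filter
    return hits / total > cfg["threshold"]
```

For example, a filter with 40 matches out of 500 actions (8%) would be disabled in the default group but not in the feedback group, where it is below both the 10% threshold and the 50-match minimum.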

How does that sound?

Thanks, Matthias, this sounds great to me!

Andrew, do these revisions work for you as well?

If so, could you please review them and/or propose edits?

Nicely done!