
Article Feedback 5 - Abuse filter issues for common vandalism
Closed, Resolved (Public)

Description

Please find out why the two vandalism filters now on en-wiki (460 and 461) do not seem to be auto-flagging obscenities like the f*** word. If there is something wrong with the filter code, please make the necessary fixes so that we can start trapping these foul words prior to deployment, using the Auto-flag feature for now.

460 Feedback: Common Vandalism
http://en.wikipedia.org/wiki/Special:AbuseFilter/460

461 Feedback: Vandalism in all caps
http://en.wikipedia.org/wiki/Special:AbuseFilter/461

We just met with Erik and Howie about this today, and they view this as the highest priority item before we can deploy more widely. Since community editors are not viewing this as a high priority (yet), we agreed that we need to do that work ourselves for now, until they realize the importance of this feature.

Once you have figured out a solution, we recommend that you create a few more filters for trapping obscenities, such as the ones listed in this Google doc.
https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0AiGAdIp7VYlbdDdKUm9naXhxOXVweWZ5YkU3Wk5lSlE#gid=0

Here are a few candidates to consider, which are getting a lot of hits on en-wiki:
384 Addition of bad words
135 Repeating characters
12 Replacing a page with obscenities
189 BLP vandalism or libel

The goal of this assignment is to do what it takes to auto-flag (or disallow) as many known obscenities as possible from the feedback stream for the next deployment, scheduled for June 19.

Read more about our plan for this abuse filter feature on our feature requirements page:
http://www.mediawiki.org/wiki/Article_feedback/Version_5/Feature_Requirements#Abuse.2FSpam_Filters

Thank you!


Version: unspecified
Severity: major
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=37579

Details

Reference
bz37615

Event Timeline

bzimport raised the priority of this task to Unbreak Now! (Nov 22 2014, 12:25 AM)
bzimport set Reference to bz37615.

reha wrote:

"Common Vandalism" only checks for three repeats of the word, which is why it's getting so few hits (for both edits and AFT).

I've created the following *logging only* filters. If they look to be performing well, we can turn on actions for them:

http://en.wikipedia.org/wiki/Special:AbuseFilter/472 -- Addition of bad words
http://en.wikipedia.org/wiki/Special:AbuseFilter/473 -- Repeating characters
http://en.wikipedia.org/wiki/Special:AbuseFilter/474 -- Comment with only obscenities (version of "Replacing a page with obscenities")
http://en.wikipedia.org/wiki/Special:AbuseFilter/475 -- Vandalism or libel (version of "BLP vandalism or libel"; note that this may be unusable due to false positives, as we don't have any category information)

I just set filters 460, 461 and 472 to disallow, rather than auto-flag, because none of these obscenities seem appropriate to have in an article feedback comment. Now that we are at 10%, let's try to re-engage filter editors to help improve on this short list of abuse filters.

My apologies for causing a problem with the abuse filters, as reported on Village Pump [http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#WP:Article_Feedback_Tool.2FVersion_5] and on my talk page [http://en.wikipedia.org/wiki/User_talk:Fabrice_Florin#Edit_filter_changes].

My intention had been to modify four abuse filters we use for the Article Feedback tool, by having them 'disallow' the use of obscenities, rather than just 'auto-flag' them as abuse. While I was at it, I attempted to remove any word that was not clearly an obscenity, which is what caused the error on [[Special:AbuseFilter/474|filter 474]]. Many thanks to [[User:Someguy1221|Someguy1221]] for reverting that change. I really appreciate it! (And thanks to [[User:This, that and the other|This, that]] for explaining this was due to a '/b' character in the script.)

I just checked the other three filters (460, 461 and 472), and get a "No syntax errors detected" message, so they appear to be working as intended. Still, it would be wonderful if an experienced filter editor could take a look at these three filters, to see if we could improve them in any way (see our [https://bugzilla.wikimedia.org/show_bug.cgi?id=37615 Bugzilla ticket] for this issue).

Our community liaison, [[User:Okeyes (WMF)|Okeyes (WMF)]], will reach out to filter editors in the coming days to invite them to help create more abuse filters for [[WP:AFT5|Article Feedback Version 5]] (AFT5). This new feature is now being deployed widely on the English Encyclopedia, to invite readers to contribute productively on Wikipedia, and we could really use some help to filter inappropriate words.

Again, I am very sorry for making this mistake, and deeply grateful to you all for reporting it and taking immediate action. Going forward, I will let experienced editors make changes to the edit filter scripts.

I am elevating this bug to the highest priority, because the abuse filter in its latest implementation is still not filtering out swear words properly, and this is having a negative impact on the editor workload.

Reaper Eternal, an experienced filter editor on Wikipedia, was kind enough to help address this issue by consolidating all vandalism filters into a single filter, #460 - Feedback: Common Vandalism:

http://en.wikipedia.org/wiki/Special:AbuseFilter/460

However, this consolidation created another problem: the number of posts containing swear words was so high that this filter was automatically disabled as a safety measure (matching more than 5% of actions).

So one possible solution would be to split the swear words across several filters, so that none of them exceeds that limit. This is what filter editor Sole Soul did, by moving the 'new_size < 5' condition to a separate filter, as shown in the filter notes below.

"Merging from 461, 472, 473, 474, and 475. --Reaper 2012-08-17

Move the "new_size < 5" condition -one of the most matched conditions- to a separate filter (458). The aim is to make this filter match < 5% of actions. -Sole Soul"

Sorry, forgot to include the second part of my message.

I recommend that we move some of the swear words into different filters, and see if we can reduce the load on filter #460, so it can be re-enabled. These other filters should be called 'Feedback: Common Vandalism 2', 3, 4, and so on. Make sure to have them all set to disallow, so that users cannot post these words, as we do now for filter #460.

Please let me know if you need me to categorize the words for you, or prioritize them as needed.

After you have had a chance to study the issue closely, please report back to us if you think we need to make any changes to the Abuse filter extension itself to improve overall performance (without jeopardizing in any way the other uses of this extension).

Thank you!

As previously mentioned, the common vandalism filter malfunctioned because it had automatically shut off: too many posts (> 5%) were being flagged by the filter. This is a safety precaution built into AbuseFilter to weed out defective filters.

The solution is relatively straightforward. We can't reduce the amount of crap being submitted, and we shouldn't touch the built-in AbuseFilter safety precautions (or we risk some day blocking edits if a filter malfunctions) - so we should split up the filter into several smaller parts: instead of having 1 filter flag 10% (made the number up, not sure about the actual number) of the feedback, we can have 5 filters flag 2% each.
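To make the arithmetic concrete, here is a minimal Python sketch. It is not AbuseFilter code: the 5% limit is the one mentioned in this thread, the 10% total is the made-up number from the paragraph above, and the function names are hypothetical. It also assumes the matches divide evenly between the smaller filters, which real word lists will not do exactly.

```python
# Illustrative only: models the 5% auto-disable limit described in this thread.
# THRESHOLD comes from the thread; the 10% total is the made-up figure above.
THRESHOLD = 0.05


def is_auto_disabled(match_rate: float, threshold: float = THRESHOLD) -> bool:
    """Return True when a filter matches more than `threshold` of actions."""
    return match_rate > threshold


def split_evenly(total_rate: float, parts: int) -> list[float]:
    """Spread one filter's match rate evenly over `parts` smaller filters."""
    return [total_rate / parts] * parts


combined = 0.10
# One big filter versus five smaller ones, each checked against the limit.
print(is_auto_disabled(combined))
print([is_auto_disabled(rate) for rate in split_evenly(combined, 5)])
```

Under these assumptions the single filter trips the limit while none of the five smaller ones does. In practice each filter's share would have to be read from its own hit statistics.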

The original Common Vandalism filter (http://en.wikipedia.org/wiki/Special:AbuseFilter/460) has been split up into 5 smaller filters:

I have not made any changes to what exactly is being filtered, only split it up.

If necessary, we can split this up even further in the future.

The added benefit of splitting this up is that, in the event that one of the filters still meets the threshold, the 4 other filters will still work just fine.

Please test the filters thoroughly (as a logged-out user; autoconfirmed users are not checked), though they seem to work just fine to me.

Thanks, Matthias!

I really appreciate your rapid turn-around on this issue.

I just tested all 5 filters in a variety of user modes, and they seem to be doing their job well. All the swear words I entered were disallowed as intended, without apparent performance issues. Nicely done!

I just tested these filters in these different user modes, with these results:

  • anonymous reader (disallows all foul words)
  • registered 'non-autoconfirmed' editor (disallows all foul words)
  • autoconfirmed editor (allows all foul words)

I can confirm that the filter only works when I am logged in as a non-autoconfirmed editor, not when I am an autoconfirmed editor. This seems reasonable, since 98% of the feedback comes from anonymous users. But we may need to revisit this rule if we find that we get foul language from autoconfirmed editors.

At this time, none of the filters appear to be hitting the 5% limit for automatic disabling -- and they seem to add only minimal load on the abuse filter extension (e.g. for #496: 'Of the last 9,056 actions, this filter has matched 12 (0.13%). On average, its run time is 1.15 ms, and it consumes 215 conditions of the condition limit.').

Also, what does the '(https?|ftp)://)\S{30,}' expression mean in filter 497? Is it trying to remove any link that includes 'https://'? (It didn't seem to work for me.)
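One way to read that expression is sketched below in Python. The fragment as quoted has an unmatched parenthesis, so this assumes its balanced core is `(https?|ftp)://\S{30,}`; the real rule in filter 497 may wrap it in other conditions that are not shown in this thread, and the helper name is hypothetical. Read this way, it does not remove anything: it matches 'http://', 'https://' or 'ftp://' only when followed by at least 30 non-whitespace characters.

```python
import re

# Assumption: the balanced core of the quoted fragment. The actual filter 497
# rule is not shown in this thread and may contain more than this.
LONG_URL = re.compile(r"(https?|ftp)://\S{30,}")


def has_long_url(text: str) -> bool:
    """True if text contains http://, https:// or ftp:// followed by
    at least 30 non-whitespace characters."""
    return LONG_URL.search(text) is not None


# A short link and a long link, checked against the assumed pattern.
print(has_long_url("see https://example.org"))
print(has_long_url("see https://example.org/" + "a" * 40))
```

If this reading is right, a short link would not match, which might be why a quick test with 'https://' appeared not to work.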

Thanks again for taking care of this important issue, which we have been wanting to fix for a long time now. If it continues to perform well, I expect that this could make a big change in reducing the amount of inappropriate feedback we get from this tool -- and most importantly, reducing the workload for editors who monitor this feed.

I also want to take this opportunity to thank filter editors like Reaper Eternal and Sole Soul for all their great work and patient advice, which played a key role in solving this issue. Much appreciated, you guys!

So, interesting find.
While following up on the Abuse Filter changes, I noticed _all 5_ had automatically shut off.
It seemed rather odd that all 5 of them would flag >5% of all feedback, so I looked into it a bit more.

Turns out there's another hidden emergency switch: one that disables filters that flag a certain number of posts in a certain amount of time. In particular, a filter was shut off if it flagged more than 2 posts in a 24h period. We probably already hit that threshold when testing the filters...

This isn't really sensible, so I've pushed a config-change to Gerrit (https://gerrit.wikimedia.org/r/#/c/25855/) to change this threshold as well.
Once that change is reviewed, merged & deployed, a filter will only auto-disable when:

  • it flags >5% of all posts it examines
  • it flags >30 posts in <30 minutes

Once deployed, we should re-save the abuse filters to make them active again.
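For reference, the two conditions listed above can be sketched in Python as follows. This is not the extension's code: the class and method names are hypothetical, and only the three numbers come from this comment. The list does not say whether the conditions combine with "and" or "or", so the sketch treats either one as sufficient by default and exposes a `require_both` flag for the other reading.

```python
from collections import deque

# Values taken from the two conditions listed above; everything else
# (class name, method names, bookkeeping) is hypothetical.
MAX_MATCH_RATE = 0.05      # more than 5% of examined posts
MAX_MATCHES = 30           # more than 30 matches ...
WINDOW_SECONDS = 30 * 60   # ... in less than 30 minutes


class FilterStats:
    """Hypothetical bookkeeping for one filter; not the extension's real code."""

    def __init__(self):
        self.examined = 0
        self.matched = 0
        self.recent_matches = deque()  # timestamps of matches inside the window

    def record(self, timestamp, matched):
        """Record one examined post at `timestamp` (in seconds)."""
        self.examined += 1
        if matched:
            self.matched += 1
            self.recent_matches.append(timestamp)
        # Drop matches that are 30 minutes old or older.
        while self.recent_matches and timestamp - self.recent_matches[0] >= WINDOW_SECONDS:
            self.recent_matches.popleft()

    def rate_exceeded(self):
        return self.examined > 0 and self.matched / self.examined > MAX_MATCH_RATE

    def burst_exceeded(self):
        return len(self.recent_matches) > MAX_MATCHES

    def should_disable(self, require_both=False):
        # Assumption: "or" by default, since the list above does not specify.
        if require_both:
            return self.rate_exceeded() and self.burst_exceeded()
        return self.rate_exceeded() or self.burst_exceeded()


stats = FilterStats()
for second in range(1000):
    stats.record(float(second), matched=(second < 31))
print(stats.rate_exceeded(), stats.burst_exceeded(), stats.should_disable())
```

Which reading matches the deployed behaviour should be confirmed against the AbuseFilter configuration once the Gerrit change is merged.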