
Proposal to pinpoint almost all vandalism
Closed, Resolved (Public)

Description

Author: k.xue

Description:
This proposal has already been introduced to the Village Pump where it has been
unanimously supported:
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals)#Edit_Alerts_Based_on_Content

I ask if the recent changes system could be overhauled to incorporate pattern
recognition based on the very predictable attributes of vandalism. A pattern match
would result in the recent change appearing highlighted in red in the recent changes
list.

Some patterns the software should watch for:

*Edits within content namespace containing high frequency vandal words: Gay, fuck, shit,
penis, cock, fag, SUCKS, etc. The entire corpus of the vandal word bank will be short and
enumerated without much research...
*Exclamation marks are repeated in sequence more than three times
*Blanking
*Entire sub-headed area blanked
*Large article loses very substantial percentage of content

Anyone familiar with controlling vandals will know this short list can be reduced and
still strike at almost all malicious vandalism.

I know you programmers are overtasked and flooded with requests, but I would like to
impress upon you that this is a top-priority update. Regular contributors are plagued by
this puerile trash, and it's very demoralizing to find ourselves volunteering to clean up
after foul kids. This is a problem that debilitates Wikipedia's editor corps. The
pressure, and the sometimes hopeless feeling of being overwhelmed, is made real by the
current lack of almost any vandal countermeasure. The current system requires omniscience
to succeed, and given the number of edits, we understand that the eyeballs are
outnumbered. Software-aided cleanup will lift this pressure.

Worst of all, I don't think our community honestly discusses how successful vandalism
is. The generally positive media reports about us sometimes cite the ancient IBM study
or state themselves that vandalism is often rolled back within minutes. This is still
occasionally true, but anyone familiar with reverting vandalism has observed that
response times of 90 minutes, 3 hours, 6 hours, and beyond are more common. Our
efficiency in squashing adolescent behavior is not as potent as we are billed to be, or
perhaps as we believe ourselves to be. The large gaps in response time to vandalism are
a serious problem for information and brand integrity.

Please consider the massive benefit in performance and uplifting effect of implementing
an idea like this.

Regards,
Lotsofissues


Version: unspecified
Severity: enhancement

Details

Reference
bz2112

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 8:30 PM
bzimport set Reference to bz2112.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

.. system could be overhauled to incorporate pattern recognition based on
attributes of vandalism.

... e.g. Bayes-filter rules [5]

Some patterns the software should watch for:
*Edits within content namespace containing high frequency vandal words: xxx.
The entire corpus of the vandal word bank will be short and
enumerated without much research...
*Exclamation marks are repeated in sequence more than three times
*Blanking
*Entire sub-headed area blanked
*Large article loses very substantial percentage of content

Your proposal can be _combined_ with [1]:

"E-mail notification for page changes or new pages,
where title or body or category matches a regular expression"

on which I am working.
The current Enotif [2, 3] already allows for notifications on all new pages (for
Sysops etc.) and can be extended with your "suspicious page vandal action watch"
list. I'll put this onto the to-do list [4].
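
To illustrate the regular-expression matching on title, body, or category described above, here is a rough sketch. It is not Enotif's actual code: `PageChange`, `WatchRule`, and `matching_rules` are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageChange:
    """A changed or newly created page (hypothetical structure)."""
    title: str
    body: str
    categories: list = field(default_factory=list)


@dataclass
class WatchRule:
    """A named regular expression and the page fields it applies to."""
    name: str
    pattern: str
    fields: tuple = ("title", "body", "category")


def matching_rules(change, rules):
    """Return the names of all rules whose pattern matches the change."""
    hits = []
    for rule in rules:
        regex = re.compile(rule.pattern, re.IGNORECASE)
        targets = []
        if "title" in rule.fields:
            targets.append(change.title)
        if "body" in rule.fields:
            targets.append(change.body)
        if "category" in rule.fields:
            targets.extend(change.categories)
        if any(regex.search(text) for text in targets):
            hits.append(rule.name)
    return hits
```

Each returned rule name could then trigger one notification, for example `matching_rules(PageChange("Quark", "Quarks rule!!!!", ["Physics"]), [WatchRule("exclaim", r"!{4,}", ("body",))])`.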

[1] http://bugzilla.wikipedia.org/show_bug.cgi?id=1116
[2] http://meta.wikipedia.org/wiki/Enotif
[3] http://bugzilla.wikipedia.org/show_bug.cgi?id=454
[4] http://meta.wikimedia.org/wiki/Email_notification_to-do_list
[5] http://en.wikipedia.org/wiki/Bayesian_filtering

artslave wrote:

Is this a duplicate of bug 958?

avarab wrote:

It is indeed, marking it as a duplicate.

*** This bug has been marked as a duplicate of 958 ***