DBQ-178 Statistics about abuse filters on all Wikimedia projects
Closed, Resolved (Public)

Description

This issue was converted from https://jira.toolserver.org/browse/DBQ-178.
Summary: Statistics about abuse filters on all Wikimedia projects
Issue type: Task - A task that needs to be done.
Priority: Major
Status: Done
Assignee: Hoo man <hoo@online.de>


From: Federico Leva <federicoleva@tiscali.it>

Date: Sat, 10 Mar 2012 10:10:56

Goal: create an automatically updated concise table of the existing abusefilters on all Wikimedia projects and of their effects, to find excessive or malfunctioning filters and help local communities fix them.

Means: a DB query to get the list of active filters on each wiki and the percentage of edits triggering a filter, obtained by comparing the recentchanges table with the abusefilter log. Krinkle has offered to run such a query and use it to feed a web tool, but we need someone to write it.

Rationale: the AbuseFilter API doesn't offer statistics on filters, neither global nor per filter, and the number of hits for each filter is hidden if the filter is private, so the DB needs to be accessed directly.

Details of the request: these can vary depending on how difficult it is; I suggest starting with the basics. The output should be in a sensible machine-friendly format for later use by a tool.

  1. Number of filters (don't output anything for wikis which have none).
  2. Percentage of all edit attempts, i.e. edits performed (RC table) plus edits disallowed (abuselog), which triggered a filter (abuselog) in the last n days/table entries. At least a few thousand edits and a couple of weeks should be considered for the stats to be meaningful; I leave the exact choice to your performance considerations.
  3. List of users who created filters (better), or: whether the filter has been created by a local sysop or by a steward/global sysop.

If possible:

  4. In addition to 1-3, a row for each filter with its (2) and (3).

Probably only in the future:

  5. More details on each filter in (4), starting with title, action taken (only tag/warn/etc. or also disallow/block?) and view permission (private or not).
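
As a rough illustration of the first and third items requested above (number of filters, and filters per creator), here is a minimal sketch in Python against an in-memory SQLite database. The table and column names (`abuse_filter`, `af_id`, `af_user_text`) are assumptions modelled on the AbuseFilter schema, the helper `filter_summary` and the sample rows are purely hypothetical, and this is not the query that was actually run for this request.

```python
import sqlite3


def filter_summary(conn):
    """Return (number_of_filters, [(creator, filters_created), ...]).

    Assumes a simplified, hypothetical table
    abuse_filter(af_id, af_user_text).
    """
    total = conn.execute("SELECT COUNT(*) FROM abuse_filter").fetchone()[0]
    creators = conn.execute(
        "SELECT af_user_text, COUNT(*) AS n FROM abuse_filter "
        "GROUP BY af_user_text ORDER BY n DESC, af_user_text"
    ).fetchall()
    return total, creators


# Toy data standing in for one wiki's filter table (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abuse_filter (af_id INTEGER PRIMARY KEY, af_user_text TEXT)"
)
conn.executemany(
    "INSERT INTO abuse_filter VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Alice")],
)

total, creators = filter_summary(conn)
# Per the request, a wiki with no filters would simply be skipped by the caller.
if total:
    print(total, creators)
```

A real run would loop over each wiki's database and skip those where the count is zero.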

Version: unspecified
Severity: major

Details

Reference
bz59453

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 2:32 AM
bzimport set Reference to bz59453.

From: Hoo man <hoo@online.de>

Date: Mon, 02 Apr 2012 13:43:36

I've resolved queries 1-4. Number 5 would be quite difficult, because private filters apparently aren't replicated to the toolserver (but they are in the _log and _history tables, so the results below include them as well).

http://toolserver.org/~hoo/dbq/dbq-178.txt

Notes:
1: All filters are counted, regardless of whether they are deleted or disabled.
2: Here the number of abusefilter log entries during the period is divided by the number of all edits (all those in the recentchanges table plus all that failed due to a filter). The numbers have to be multiplied by 100 to get percent values. They differ from the values seen on wiki, probably because many more edits are included in the calculation here.
3: This one gives the user name and the number of filters the user created.
4: This one gets the filter id, its creator, and how often the filter has been triggered divided by the number of all edits (including edits prohibited by other filters, which could slightly bias the number but shouldn't make a huge difference). Here too, the hit rate needs to be multiplied by 100 to get percent values.
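
The arithmetic described in notes 2 and 4 can be sketched as follows. This is only an illustration in Python, not the query behind the linked results: the function name `hit_rates`, the input shapes, and the way a "failed" edit is flagged (a boolean per log entry) are all assumptions.

```python
from collections import Counter


def hit_rates(rc_edits, log_entries):
    """Compute overall and per-filter hit rates as fractions (not percent).

    rc_edits:    number of edits found in recentchanges for the period.
    log_entries: iterable of (filter_id, disallowed) pairs, one per
                 abusefilter log entry in the same period; `disallowed`
                 is True when the edit failed due to a filter
                 (hypothetical representation).
    """
    entries = list(log_entries)
    disallowed = sum(1 for _, blocked in entries if blocked)
    # Denominator: saved edits plus edits that never reached recentchanges.
    total_edits = rc_edits + disallowed
    if total_edits == 0:
        return 0.0, {}
    overall = len(entries) / total_edits
    per_filter = {
        filter_id: hits / total_edits
        for filter_id, hits in Counter(fid for fid, _ in entries).items()
    }
    return overall, per_filter


# Invented numbers: 990 saved edits, filter 1 disallowed 10 edits,
# filter 2 only tagged 40 edits that were still saved.
sample_log = [(1, True)] * 10 + [(2, False)] * 40
overall, per_filter = hit_rates(990, sample_log)
print(f"overall: {overall * 100:.1f}%")
for filter_id, rate in sorted(per_filter.items()):
    print(f"filter {filter_id}: {rate * 100:.1f}%")
```

As in the notes, the raw values are fractions and are multiplied by 100 only for display. The per-filter denominator includes edits disallowed by other filters, which is the slight bias note 4 mentions.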

I hope that's fine and that the results are correct; I can't fully verify them, since the on-wiki figures are calculated over much shorter periods.

This bug was imported as RESOLVED. The original assignee has therefore not been
set, and the original reporters/responders have not been added as CC, to
prevent bugspam.

If you re-open this bug, please consider adding these people to the CC list:
Original assignee: hoo@online.de
CC list: federicoleva@tiscali.it, hoo@online.de