Page MenuHomePhabricator

Overhaul internal profiling system
Closed, ResolvedPublic

Description

For a extreme example, consider this filter:
https://en.wikipedia.org/wiki/Special:AbuseFilter/473

How can it consumes 23296 conditions if the condition limit is set to 1000?

Why does that value changes every second?
(reloading the page a few times I got: 51172, then 3986, then 7859, 14690, and so on...)

This is very confusing for users trying to debug and optimize the existing filters...

See Also:
T47045: Condition limit outrageously high after filter is disabled
T43691: Abuse filter inconsistent matching between testing and real checking
T57459: Include "condition limit consumed" column in AbuseFilter list


Original comment by Dragons flight on gerrit/201104

The AbuseFilter internal stats collection has been plagued by several bugs
for its entire history. The most prominent bug is caused by a race
condition where competing Apache instances can corrupt the data in
memcache. This is due to distributing stat data across multiple memcache
entries. Other stat corruption is associated with editing filters or
filter runs that abort due to reaching the condition limit.

The modifications here overhaul the stats system to solve these problems.
In particular, all per filter profile data is now hosted in an array
that is stored in memcache as a block so that the different stats
remain consistent. The restructuring also lays the groundwork for future
enhancements to the internal AbuseFilter profiling.

Related Objects

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 1:52 AM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz51294.
bzimport added a subscriber: Unknown Object (MLST).

I don't know if this is a problem, but I noticed the function getFilterProfile uses three keys:
wfMemcKey( 'abusefilter', 'profile', $filter, 'count' );
wfMemcKey( 'abusefilter', 'profile', $filter, 'total' );
wfMemcKey( 'abusefilter', 'profile-conds', 'total' );

But the resetFilterProfile only resets the first two. Shouldn't it also reset the key profile-conds?

[1] https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FAbuseFilter.git/HEAD/AbuseFilter.class.php#L535

(In reply to comment #0)

For a extreme example, consider this filter:
https://en.wikipedia.org/wiki/Special:AbuseFilter/473

How can it consumes 23296 conditions if the condition limit is set to 1000?

Where exactly can I see that (position in the UI)?

There is a line like this:

Statistics: Of the last 147 actions, this filter has matched 4 (2.72%). On average, its run time is 1.77 ms, and it consumes 13,105 conditions of the condition limit.

(the text comes from [[MediaWiki:abusefilter-edit-status]])

It really doesn't known what a "limit" is: I just saw it saying filter 473 consumes 112,686 conditions (!)

This is really three separate bugs that all lead to per filter stats corruption:

  1. Failure to reset properly when a filter is edited.
  2. Failure to count properly when evaluation is aborted due to the condition limit being reached.
  3. A race condition that causes near simultaneous instances of the AbuseFilter to corrupt each others stats.

The third problem is the dominant one for the highly active wikis and can lead to really silly results. Both per filter condition counts and run times are affected, though the nonsensical condition counts are more noticeable.

I am nearly done with a patch that addresses this bug.

Change 201104 had a related patch set uploaded (by Dragons flight):
Overhaul AbuseFilter internal profiling system

https://gerrit.wikimedia.org/r/201104

  1. A race condition that causes near simultaneous instances of the AbuseFilter to corrupt each others stats.

The third problem is the dominant one for the highly active wikis and can lead to really silly results.

I don't think this can reasonably account for the reported values being wrong by a factor of ten or more.

I'm pretty sure that the number of conditions is counted globally, for all filters together, then treated it as if it was the per-filter number, which obviously results in off-the-wall values. I wonder what @werdna was smoking when committing rSVN52738. ;)

Change 282399 had a related patch set uploaded (by Bartosz Dziewoński):
Mostly unbreak profiling of number of conditions used by filters

https://gerrit.wikimedia.org/r/282399

  1. A race condition that causes near simultaneous instances of the AbuseFilter to corrupt each others stats.

The third problem is the dominant one for the highly active wikis and can lead to really silly results.

I don't think this can reasonably account for the reported values being wrong by a factor of ten or more.

I'm pretty sure that the number of conditions is counted globally, for all filters together, then treated it as if it was the per-filter number, which obviously results in off-the-wall values. I wonder what @werdna was smoking when committing rSVN52738. ;)

The race condition can give rise to very large errors. The condition counts and times are all stored as ratios, e.g. (total time elapsed) / (number of runs recorded). Because the numerator and denominator are in separate variables they are retrieved and saved to memcache separately. Under race conditions it is possible for one instance to update and save the numerator while a different instance is overwriting the denominator, leading to possible states that are out of sync. This is especially severe is one of the instances in the process of reseting the variables. If the denominator gets reset to zero but the numerator gets stuck somewhere else, the ratio blows up leading to large errors.

That's not saying there aren't other profiling errors. It is a mess actually, but getting to reports like 100,000 conditions exhausted is mostly caused by the fact the memcache numerators and denominators can get out of sync with each other.

Most of the memcache calls need to be eliminated for both practical and performance reasons and replaced with a small number of calls using structured data.

Change 282399 merged by jenkins-bot:
Mostly unbreak profiling of number of conditions used by filters

https://gerrit.wikimedia.org/r/282399

If the denominator gets reset to zero but the numerator gets stuck somewhere else, the ratio blows up leading to large errors.

Hmm, I didn't think of this. That is more plausible.

What's the status of this? As far as I'm aware resource/condition tracking was simply removed altogether. Would it make sense to re-title this task to "re-implement resource tracking"?

Ah, interesting. It seems then, that it takes too much computational power to leave this running in its current implementation?

Daimona subscribed.

There are enough reasons to close this:

  • Many wikis don't use such profiling anymore (and it has been changed since the creation of this task)
  • I checked many filters on en.wiki, and every of them reports sane values for used conditions
  • https://gerrit.wikimedia.org/r/282399 actually solved the issue

Reopening. The problem itself is resolved, but we're still waiting for Dragons_flight's patch to give a final fix to profiling system.

Daimona renamed this task from Condition limit is broken: consumes 100,000+ conditions limit is set to 1000 to Overhaul internal profiling system .Apr 4 2018, 3:22 PM
Daimona lowered the priority of this task from High to Medium.

Claiming as I took over the patch.

Of note, I just discovered that ContentTranslation calls checkAllFilters while users are editing, in order -I think- to provide real-time feedback. But checkAllFilters has a call to recordStats, and this means that every time CX checks filters, it increases profiling data. This will be fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/487163/.

Change 489443 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Store per-filter profiling in a single key

https://gerrit.wikimedia.org/r/489443

Of note, I just discovered that ContentTranslation calls checkAllFilters while users are editing, in order -I think- to provide real-time feedback. But checkAllFilters has a call to recordStats, and this means that every time CX checks filters, it increases profiling data. This will be fixed in https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/487163/.

See also

Change 489445 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Move per-filter matches profiling to per-filter data

https://gerrit.wikimedia.org/r/489445

Change 489443 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Store per-filter profiling in a single key

https://gerrit.wikimedia.org/r/489443

Change 489445 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Move per-filter matches profiling to per-filter data

https://gerrit.wikimedia.org/r/489445

Change 201104 had a related patch set uploaded (by Daimona Eaytoy; owner: Dragons flight):
[mediawiki/extensions/AbuseFilter@master] Merge global profiling keys

https://gerrit.wikimedia.org/r/201104

Change 489443 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Store per-filter profiling in a single key

https://gerrit.wikimedia.org/r/489443

Change 489445 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Move per-filter matches profiling to per-filter data

https://gerrit.wikimedia.org/r/489445

Change 201104 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Merge global profiling keys

https://gerrit.wikimedia.org/r/201104

Daimona removed a project: Patch-For-Review.