Page MenuHomePhabricator

Edits via API (e.g. through VisualEditor) cause AbuseFilter to work on pre-automerged edit
Closed, ResolvedPublic

Description

Reported at https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Irrelevant_tags , quoting below:

Why would this edit get tagged with copyright violation template removed and speedy deletion template removed if the edit did not remove the templates? Is this a bug in the tags system? The user used HHVM and VisualEditor, is this relevant? 109.79.177.119 09:43, 11 October 2014 (UTC)

(The page in question has unfortunately been deleted since, but the data is still there, just hidden.)

I did some quick investigating:

  • If you review the page history a bit (which I managed to do before it was deleted), the two templates were indeed removed at a point (and that was correctly tagged) and then restored, with the next edit being tagged incorrectly.
  • You can examine the logic that led to that edit being tagged at [[Special:AbuseFilter/examine/log/10945503]] – if you look at the row described as "Unified diff of changes made by edit", you can see that AbuseFilter really thinks these tags were removed in that edit (although they were not).

URL: https://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_131#Irrelevant_tags
See Also:

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:47 AM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz71947.
bzimport added a subscriber: Unknown Object (MLST).

(Adding 'hhvm' keyword just in case it's related to that.)

Here is another example on Portuguese Wikipedia:
https://pt.wikipedia.org/wiki/Special:AbuseFilter/examine/log/1898713?uselang=en
https://pt.wikipedia.org/w/index.php?diff=40613438&uselang=en
The string "{{ESR2|" is in "removed_lines" but not in "added_lines". However, it should not be in the "removed_lines", because it is still in the new revision:
https://pt.wikipedia.org/w/index.php?title=Zats_Boy&action=edit&oldid=40613438

This was reported on
https://pt.wikipedia.org/wiki/WP:CP?oldid=40613984#Falso_positivo_na_etiqueta

Se4598 added a project: VisualEditor.
Se4598 added a subscriber: Orlodrim.
Se4598 subscribed.

added VE Project as it's mentioned here and for all edits in merged T76063

I found how to reproduce this bug, at least in some cases. If you edit a page with VisualEditor and an automatic merge happens when you submit (i.e. someone submits a change between the moment you click "edit" and the moment you press "submit"), then AbuseFilter sees the revision before the merge.

readding VisualEditor: was removed without explanation and the correlation seems obvious.

@Oriodrim: Yep, I guessed the same in the linked dupe, AbuseFilter seems to see the pre-automerged version of edits from VisualEditor.

I have found the cause: The problem is that edits through API (that is the difference, @Catrope re your comment) pass through the AbuseFilter twice: The first time, via the onAPIEditBeforeSave hook, the second time via the onEditFilterMergedContent hook. Only the second hooks gets the post-merged content.

And, because AbuseFilter makes sure it processes every edit only once, it sets editAlreadyFiltered in filterEdit and on every subsequent hook call for the same edit it just returns without further processing. It means AbuseFilter sees and processes the pre-merged content in onAPIEditBeforeSave, and ignores the post-merged content in onEditFilterMergedContent.

You could check this by adding unset($context->getTitle()->editAlreadyFiltered); at the end of onAPIEditBeforeSave – the filter will run correctly the second time and see the postmerged content, however, the edit will be noted twice in the abuse log, so this is (obviously) not a correct solution.

I’d think the correct solution would be to make a variant of APIEditBeforeSave hook which would include the post-merged text and AbuseFilter would then switch to that new hook. Maybe. (Or, some voodoo with making sure APIEditBeforeSave is read-only, immutable test-this-edit, saving nothing into the abuse log, and not setting the editAlreadyFiltered flag. But that could be a bit brittle, I guess.)

Mormegil renamed this task from Incorrect tagging of edits on some Wikipedias – AbuseFilter's diff and the real diff are not the same to Edits via API (e.g. through Abuse Filter) cause AbuseFilter to work on pre-automerged edit.May 15 2015, 2:16 PM
Mormegil updated the task description. (Show Details)
Mormegil edited projects, added MediaWiki-Action-API; removed VisualEditor.
Mormegil renamed this task from Edits via API (e.g. through Abuse Filter) cause AbuseFilter to work on pre-automerged edit to Edits via API (e.g. through VisualEditor) cause AbuseFilter to work on pre-automerged edit.Aug 22 2015, 8:56 PM
He7d3r added a subscriber: FcoLeonSaudanha.

From the duplicated bug:

On
https://pt.wikipedia.org/wiki/Special:AbuseLog?wpSearchUser=189.5.174.202&wpSearchFilter=1&wpSearchTitle=Lygia+Moura+Rassi&uselang=en
we have two logs:

  • 13:39, 27 May 2015: 189.5.174.202 (Talk | block) triggered filter 1, performing the action "edit" on Lygia Moura Rassi. >Actions taken: Tag; Filter description: Remoção de proposta de eliminação (details | examine | diff)
  • 13:39, 27 May 2015: 189.5.174.202 (Talk | block) triggered filter 1, performing the action "edit" on Lygia Moura Rassi. Actions taken: Warn; Filter description: Remoção de proposta de eliminação (details | examine)

For some reason AbuseFilter is using the same wikitext for both attempts to save the page. This is wrong! In fact, after the user received the warning, he fixed the issue and submitted the edit again, but the filter was triggered a second time, as if the user used the same wikitext twice.

Originally reported on
https://pt.wikipedia.org/wiki/WP:Filtro_de_edi%C3%A7%C3%B5es/1#Falsos_positivos

@He7d3r This bug is the cause of the problem related in T100530, right?

As far as I can see, they are the same thing;

@He7d3r Hunm, well. At least the cause of this bug was identified, but the solutin does not seem to be simple...

This came up on the English Wikipedia again this week. Any chance of a fix? It seems like a pretty big issue that we are sometimes tagging and disallowing entirely the wrong edits.

Any chance of a fix?

Any contributed patches are welcome to speed up progress here.
See https://www.mediawiki.org/wiki/Developer_access and https://www.mediawiki.org/wiki/Gerrit/Tutorial for more information.

I suppose no one else is going to work on this.

Change 282100 had a related patch set uploaded (by Bartosz Dziewoński):
Avoid crippled APIEditBeforeSave hook, use new features of EditFilterMergedContent instead

https://gerrit.wikimedia.org/r/282100

This proved easier to fix than I expected thanks to @tstarling's wonderful work in 09a5febb7b024c0b6585141bb05cba13a642f3eb.

Change 282100 merged by jenkins-bot:
Avoid crippled APIEditBeforeSave hook, use new features of EditFilterMergedContent instead

https://gerrit.wikimedia.org/r/282100

matmarex removed a project: Patch-For-Review.

This bug has been fixed and the fix will be deployed to Wikimedia wikis this week, with MediaWiki version 1.27.0-wmf.21, per https://www.mediawiki.org/wiki/MediaWiki_1.27/Roadmap.