Page MenuHomePhabricator

"added_lines" is not in "new_wikitext" anymore and causes false positives
Open, LowPublic

Description

As of today, our
https://pt.wikipedia.org/wiki/Special:AbuseFilter/52?uselang=en
is broken. This appears to be due to some change in MediaWiki or AbuseFilter extension, which causes "added_lines" not to be inside of "new_wikitext" anymore when a page is created.

Previously:
https://pt.wikipedia.org/wiki/Special:AbuseLog/1865256?uselang=en

Currently:
https://pt.wikipedia.org/wiki/Special:AbuseLog/1866921?uselang=en


See Also:

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:54 AM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz72329.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to Helder from comment #0)

As of today, our

https://pt.wikipedia.org/wiki/Especial:Filtro_de_abusos/history/52/item/1554
(permanent link to that version)

I wonder whether this is related to bug 72345. Does this only happen for HHVM edits?

The change also affected https://pt.wikipedia.org/wiki/Special:AbuseFilter/history/66/item/1755?uselang=en

Since these filters are configured to disallow the edits, I don't have the diffs to check if they were made using HHMV.

I made a test and HHVM doesn't seems related:

(but notice I wasn't able to trigger the bug on these two tests)

(In reply to Helder from comment #3)

(but notice I wasn't able to trigger the bug on these two tests)

however, the bug is still happening for some edits, as "added_lines" is empty here:
https://pt.wikipedia.org/wiki/Especial:Registro_de_abusos/1867850

Could this be the same thing as bug 71947?

Aklapper set Security to None.

It seems that added_lines prepends a space to each newly added line and then concatenates them.

I'm sure it was not like this before. To be more clear, the following used to be true if the user had added a few consecutive lines:

added_lines in new_wikitext

But now it's not. Instead this will pass:

added_lines in str_replace(new_wikitext, '\n', '\n ')

It seems that added_lines prepends a space to each newly added line and then concatenates them.

I'm sure it was not like this before. To be more clear, the following used to be true if the user had added a few consecutive lines:

added_lines in new_wikitext

But now it's not. Instead this will pass:

added_lines in str_replace(new_wikitext, '\n', '\n ')

This seems to be a different problem. Someone just created T100069: added_lines include whitespace at the beginning

Possibly related to the $wgDiff config change.

CC'ing ori.

I'm testing this. By examining JS var dump, this is the conclusion. Suppose you create a new page with Foo as text. These are the generated variables:

  • added_lines = ["Foo"] (NOTE: doing added_lines == ["Foo"] won't match!)
  • new_wikitext = "Foo"
  • string(added_lines) = "Foo\n" (and I didn't add the linebreak)

What I see is a big mess in handling arrays, which directly reflects to how added_lines are handled. As stated in the parent task, the real problem here is T181024, or more precisely a wider view of it, i.e. arrays creation and behaviour.

Change 421695 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove the last linebreak from array to string conversion

https://gerrit.wikimedia.org/r/421695

The patch aboves solves part of the problem: it avoids the redundant \n at the end, and actually make added_lines in new_wikitext work in some cases. I'm saying "some", because there's still a case where it doesn't, i.e. when the added text contains (at least) one empty line. This is the situation: create a new page with

first line
second line

third line

With the above patch applied, these are the resulting variables (from JS):

  • added_lines = ["first line", "second line", "third line"]
  • new_wikitext = "first line↵second line↵↵third line"
  • string(added_lines) = "first line↵second line↵third line"

The problem here is that the linebreak is counted as separator and stripped. It shouldn't be too hard to find a solution, but may be complicated to find a nice one.

ADD: Actually, it's not that clear. An example like the one above may be matched by the filter. Also, I noticed another interesting thing: a filter with "added_lines in new_wikitext" may match such edit, but when examining the matched edit it says it wouldn't match.

I got to a point. This specific trouble of added_lines not being in new_wikitext is due to how diffs (and consequently added_lines) are handled in case of page creation (and only there). I'll explain better: if you make an edit like the one I proposed in the last message, the unidiff generated here counts the empty line as unchanged (it doesn't have the "+") and thus it isn't collected in added_lines in the following foreach. Instead, new_wikitext keeps (of course) the empty line and thus the "in" fails.
To solve this, I'll need some time to figure out a simple solution. A way out could be to separately treat the case of page creation and, instead of old_wikitext, provide some value that will make empty lines count as actually added. Or, better, we may rely on some system function to do that, although I'm not aware of any.

Well, actually, I almost wasted my time, since this is probably intended: the diff class is meant to behave like that, and it also does so for onwiki diffs: try to empty a page and add the usual text above. The central line won't be marked as added. So, actually added_lines are not in new_wikitext because of the lack of the middle line. Since we must keep a vision of added_lines in accord to how diffs are displayed, I think we have two possible solutions:

  1. We leave this as it is
  2. We change the diff class and make it mark empty lines as added

I'd prefer the second one, but we would need to determine if that makes sense (i.e. why isn't already like that) and whether we would need some changes in other places. In any case, the AF-related issue was addressed in the patch above, and everything else should be discussed in other places.

ADD: I also discovered that the patch above didn't help to solve the issue. While for page creations we still need to pick a solution from above, in other cases it's normal that added_lines aren't in new_wikitext. So, the filter is working fine, we only need to decide whether we want to change the core.

Daimona lowered the priority of this task from High to Medium.Mar 24 2018, 10:24 PM
Daimona edited projects, added MediaWiki-Page-diffs; removed Patch-For-Review.
Daimona lowered the priority of this task from Medium to Low.Mar 27 2018, 6:01 PM
Daimona removed a project: Regression.

Given the fact that this is the usual MediaWiki behaviour, we don't have any regression, nor this is a priority.

Well, actually the patch above seems to help. I don't understand anymore what's happening. I was going to move this on an upstream workboard an tried an edit to show the problem. On my local wiki I got the empty line without the plus after a blanking, on WMF wikis I didn't, maybe it's been updated recently (I don't have master in local). However, on WMF wikis added_lines still isn't in new_wikitext.

ADD: After writing these lines above, I realized that the problem with the diff is still here. Look at this edit and its AF counterpart. The AF variable edit_diff shows a non-added empty line (without the plus) right after "First line", however MW diff shows it as added. I'll try to update my MW installation and see why we have this discordance.

Change 424859 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Use empty arrays instead of empty strings for diffs

https://gerrit.wikimedia.org/r/424859

Change 421695 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove the last linebreak from array to string conversion

https://gerrit.wikimedia.org/r/421695

My conclusion is, both patches are necessary. The one for diffs restores the newlines when needed and cleans diffs including an empty page. The second one fixes the cast used by the 'in' keyword when doing 'added_lines in new_wikitext'. Both are tiny patches, but both should be reviewed carefully since they could potentially introduce regressions.

Change 424859 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Use empty arrays instead of empty strings for diffs

https://gerrit.wikimedia.org/r/424859

@Huji no :-) There's still the other patch above (https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/421695/), for which there'll be much things to do. Prior to merging it we should:

  1. Merge https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/AbuseFilter/+/424298/
  2. Take results from T193894 (I also saved and formatted them here), which will probably need to be updated, and fix every affected case, either using the new functions from (1.) or other things to be determined specifically for each filter.

Then, after merging, we should inform users of the change, on Tech News and on MWwiki docs, since the use of this "feature" is relatively wide.

Thanks for the quick analysis.

Change 421695 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove the last linebreak from array to string conversion

https://gerrit.wikimedia.org/r/421695

While I'd love to implement this change, this is almost impossible due to current usage.