Page MenuHomePhabricator

Variable added_lines is now an array instead of a string
Closed, ResolvedPublic

Description

  1. Create a page with "{{this is}}"
  2. Edit the page and replace its content by "*THIS IS A BUG ON ABUSE FILTER!":

https://pt.wikibooks.org/wiki/Project:Caixa_de_areia?diff=258499

  1. Open [[Special:AbuseFilter/test]], which in my case was

https://pt.wikibooks.org/wiki/Special:AbuseFilter/test?uselang=en

  1. Type ------------------------------------------------------------------------

length(added_lines) <= 2 & contains_any( added_lines, "THIS IS A BUG" )

  1. Click "Test" and bang! The edit is detected! If it is of any help, here is the link provided to "examine" the edit:

https://pt.wikibooks.org/w/index.php?title=Especial:Filtro_de_abusos/examine/262161&testfilter=length%28added_lines%29+%3D%3D+1+%26+contains_any%28+added_lines%2C+"THIS+IS+A+BUG"+%29%0A&uselang=en

The filter shouldn't detect it, because "added_lines" obviously has a lot more than 2 characters. Nonetheless, if I replace "<= 2" by "== 1" it is still detected.

How can a string of 31 characters have length 1? This is plain wrong...


Version: unspecified
Severity: major
URL: https://pt.wikibooks.org/w/index.php?title=Especial:Filtro_de_abusos/examine/262161&testfilter=length%28added_lines%29+%3C%3D+15+%26+summary+%3D%3D+%22%2ATHIS+IS+A+BUG+ON+ABUSE+FILTER%21%22
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=40478

Details

Reference
bz50107

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:58 AM
bzimport added a project: AbuseFilter.
bzimport set Reference to bz50107.
bzimport added a subscriber: Unknown Object (MLST).

Ok, so I made one more test at
https://pt.wikibooks.org/w/index.php?title=Wikilivros%3ACaixa_de_areia&diff=258547&oldid=258546

and used Special:AbuseFilter/test to check the following filter:

length(added_lines) == 3 & contains_any( added_lines, "Lorem" )

and the edit was detected.

It seems abuse filter's "added_lines" is an array and then "length(added_lines)" is the number of items on this array.

Such a behavior is not documented at
https://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format
and it is not obvious from
https://pt.wikibooks.org/w/index.php?title=Especial:Filtro_de_abusos/examine/262215&testfilter=length%28added_lines%29+%3D%3D+3+%26+contains_any%28+added_lines%2C+%22Lorem%22+%29%0A

that the value of "added_lines" is an array (not an string).

A similar problem happens when I use "user_groups":
For a user ,

The test

"confirmed" in user_groups

matches a user which is in the group "autoconfirmed" but not in the group "confirmed". But the documentation[1] says "in" applies only to "strings", so why when I execute

length(user_groups)=4

I also get a match? (the user in question is in exactly 4 groups)

If "user_groups" is a string, its length should be at least the length of "confirmed" (which is 9), not 4.

[1] http://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format#Keywords

Here is another test which returns 1 (which means "true"):

(list:= ["a", "b", "c"];) & contains_any(list,"a\n")

I think the real bug is we have no documentation about these variables being arrays and how they are converted to strings (notice the "\n" in the above example).

See also:

  • [[mw:Extension_talk:AbuseFilter/Rules_format#Please_add_description_of_variable_values]]

According to
https://ru.wikipedia.org/wiki/Special:AbuseFilter/history/99/diff/prev/2152
the
the variable "added_lines" was changed to an array somewhere from 13 February 2013 and as such "length (added_lines)" now gets the number of rows, and this changed the conditions under which existing filters were triggered.

Is this working like this since 2009??? I found the following on r49221:

  • Use lists instead of implode()d strings in built-in variables wherever it's possible

ATTENTION! This may break filters that rely on "added_lines contains 'bla-bla'" syntax. They'll need to be replaced with "string(added_lines) contains 'bla-bla'"

No need for "[regression]" in subject if you also set the keyword.
However, not sure if I would call it a regression if it's been like this for four years.

Another example of the confusion caused by this (from 17 June 2009):

  • [[Special:AbuseFilter/history/172/diff/prev/2121]]
  • [[Wikipedia talk:Edit filter/Archive 3#Filter stopped working - strange testing results ..]]

(In reply to comment #9)

See also:
https://www.mediawiki.org/wiki/Extension:AbuseFilter/Rules_format#Conditions

(added and documented by Helder)

Whoops, ignore that, wrong bug!

Daimona claimed this task.
Daimona subscribed.

Summing up:

  • length(added_lines) returns the number of added lines. This is intended: when you do length(array) you want it to return the # of elements.
  • added_lines is an array. This is also intended, since it makes sense
  • the right way to check for their length is length(string(added_lines)), which also counts linebreaks added in casting. IMHO this is also intended and makes sense. The only non-sense is a linebreak added at the end, which is already being addressed on gerrit.
  • the example of (list:= ["a", "b", "c"];) & contains_any(list,"a\n") might seem weird due to the linebreak, but it's better to document such behaviour instead of fix it.
  • finally, about the fact that functions and keywords like in, contains, contains_any ecc. do their work on the casted string, this is being addressed in T181024.

Anyway, I added a proper documentation about lists and their behaviour, thus closing.