
wgSpamRegex entry for large image tables
Closed, Resolved, Public

Description

Author: lunasantin

Description:
Vandals have recently been using software that produces a "bitmap" out of very large tables with colored cell backgrounds. When asked about a wgSpamRegex entry to stop these, Platonides and I each proposed a regex:

/<TR>(<TD BGCOLOR=["']?#......["']?>\.+<\/TD>){20,}<\/TR>/i
/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i

Discussion seemed to favor the top one, but I'm listing both here for completeness.

The first option blocks any image table with 20 or more cells in one row (effectively limiting the horizontal resolution), and matches only tables whose cells are filled with periods ("..." etc). The second option blocks any such table with 400 or more cells in total (20x20 resolution), regardless of the cell text. The image tables we are currently trying to prevent are typically 85x9 or larger, but obviously any match that is too narrow is easily avoided.
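To make the difference between the two proposals concrete, here is a rough sketch that runs both against synthetic single-line tables. It uses Python's `re` module as a stand-in for the PCRE engine behind wgSpamRegex, which is an assumption; `make_row` and `make_table` are hypothetical helpers, and the cell markup is invented for the test rather than taken from a real vandal edit.

```python
import re

# The two proposed patterns as Python raw strings. The "\/" escapes are
# dropped because Python has no "/" delimiters around the pattern.
ROW_PATTERN = re.compile(
    r"""<TR>(<TD BGCOLOR=["']?#......["']?>\.+</TD>){20,}</TR>""",
    re.IGNORECASE,
)
TABLE_PATTERN = re.compile(
    r"<table>.+?(<td bgcolor.+?){400,}.+?</table>",
    re.IGNORECASE,
)


def make_row(cells, text="..."):
    """Hypothetical helper: one table row with `cells` colored cells."""
    cell = '<td bgcolor="#ff0000">' + text + "</td>"
    return "<tr>" + cell * cells + "</tr>"


def make_table(rows, cols, text="..."):
    """Hypothetical helper: a single-line table of rows x cols cells."""
    return "<table>" + make_row(cols, text) * rows + "</table>"


# Row pattern: needs at least 20 period-filled cells in one row.
print(bool(ROW_PATTERN.search(make_row(20))))            # True
print(bool(ROW_PATTERN.search(make_row(19))))            # False
print(bool(ROW_PATTERN.search(make_row(20, text="x"))))  # False

# Table pattern: needs at least 400 cells in total, whatever the text.
print(bool(TABLE_PATTERN.search(make_table(20, 20, text="x"))))  # True
print(bool(TABLE_PATTERN.search(make_table(3, 3))))              # False
```

The non-matching table is kept tiny on purpose: with nested lazy quantifiers, a table that falls just short of 400 cells can make a backtracking engine very slow before it gives up.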

Brion suggested we should file a bug, so here we are.


Version: unspecified
Severity: enhancement

Details

Reference
bz15063

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:20 PM
bzimport set Reference to bz15063.
*** Bug 14811 has been marked as a duplicate of this bug. ***

mike.lifeguard+bugs wrote:

(In reply to comment #4)

Please have a look at
http://en.wikipedia.org/w/index.php?title=Wikipedia:Sockpuppet_investigations&curid=18053857&diff=270803120&oldid=270803099

Re-opening the bug so someone can tweak it if possible.

Three steps made it avoid the second regex:
-Parameters to the table
-It used newlines
-The table is not closed

This new regex can replace the second one, including now those contents.
/<table[^>]*>.+?(<td bgcolor.+?){400,}/is
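For anyone checking the replacement, the following sketch builds a table that uses all three evasions at once and tries both versions of the pattern on it. Python's `re` stands in for PCRE here (an assumption), `re.DOTALL` plays the role of the `s` modifier, and `make_evasive_table` is a hypothetical helper with invented markup.

```python
import re

# Old second pattern: literal <table>, no "s" modifier, requires </table>.
OLD_TABLE = re.compile(
    r"<table>.+?(<td bgcolor.+?){400,}.+?</table>",
    re.IGNORECASE,
)
# Proposed replacement: allows attributes on <table>, lets "." match
# newlines (the "s" modifier), and no longer requires a closing tag.
NEW_TABLE = re.compile(
    r"<table[^>]*>.+?(<td bgcolor.+?){400,}",
    re.IGNORECASE | re.DOTALL,
)


def make_evasive_table(rows, cols):
    """Hypothetical helper: table with attributes, newlines, no </table>."""
    cell = '<td bgcolor="#000000">.</td>'
    row = "<tr>" + cell * cols + "</tr>\n"
    return '<table border="0">\n' + row * rows  # deliberately left unclosed


evasive = make_evasive_table(20, 20)  # 400 cells

print(bool(OLD_TABLE.search(evasive)))  # False
print(bool(NEW_TABLE.search(evasive)))  # True

# A small table stays below the 400-cell threshold and is not blocked.
print(bool(NEW_TABLE.search(make_evasive_table(3, 3))))  # False
```

Each evasion defeats a different part of the old pattern: the attributes break the literal `<table>`, the newlines stop `.` from matching without the `s` modifier, and the missing `</table>` leaves the tail of the pattern with nothing to match.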

Please provide a proper patch. This makes committing this stuff a lot easier.

(In reply to comment #7)

> Please provide a proper patch. This makes committing this stuff a lot easier.

Oh, not 'commit' in this context, but rather 'applying it to whichever file from http://noc.wikimedia.org/conf/ needs patching'.

I don't really think it's needed, but here's what you need to do:

  // bug 15063, these won't last:
  '/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',
- '/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i',
+ '/<table[^>]*>.+?(<td bgcolor.+?){400,}/is',
  // Weird thingy ....

It's line 5425 of InitialiseSettings.php

Will be doable with the Abuse Filter when it's live on the appropriate site(s).

mike.lifeguard+bugs wrote:

(In reply to comment #10)

> Will be doable with the Abuse Filter when it's live on the appropriate site(s).

That will really require global abuse filter(s).

Removing dependency. This is a configuration change request. The other is an enhancement to AbuseFilter.

Please change the wgSpamRegex line

'/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',

to

'/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.+|We|are|Anonymous)<\/TD>){20,}<\/TR>/i',

so that it also blocks the newer vandalism, such as http://en.wikisource.org/w/index.php?title=Template%3ATl&action=historysubmit&diff=1771366&oldid=1771352

I came here to request a change to
'/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.|We|are|Anonymous| )+<\/TD>){20,}<\/TR>/i'

only to find that I had already requested much the same thing three months ago. It would have prevented http://es.wikipedia.org/w/index.php?title=Plantilla:Portada_Bueno/970&diff=37518654&oldid=37518360
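The two requested variants are not quite equivalent, which the sketch below tries to show. It again uses Python's `re` as a stand-in for PCRE (an assumption); `make_row` is a hypothetical helper, and the cell contents are made up to resemble the description above rather than copied from the linked diffs.

```python
import re

PREFIX = r"""<TR>(<TD BGCOLOR=["']?#......["']?>"""
SUFFIX = r"""</TD>){20,}</TR>"""

# Currently deployed: cells may contain only periods.
DOTS_ONLY = re.compile(PREFIX + r"\.+" + SUFFIX, re.IGNORECASE)
# First request: a run of periods, or exactly one of the three words.
WORDS_ALT = re.compile(PREFIX + r"(\.+|We|are|Anonymous)" + SUFFIX, re.IGNORECASE)
# Second request: any mix of periods, the three words, and spaces.
WORDS_REPEAT = re.compile(PREFIX + r"(\.|We|are|Anonymous| )+" + SUFFIX, re.IGNORECASE)


def make_row(texts):
    """Hypothetical helper: one row with a colored cell per text."""
    cells = "".join('<td bgcolor="#000000">%s</td>' % t for t in texts)
    return "<tr>" + cells + "</tr>"


# 20 cells, each holding a single word or a run of periods.
single_words = make_row(["We", "are", "Anonymous", "..."] * 5)
# 20 cells, each holding several words separated by spaces.
phrases = make_row(["We are Anonymous"] * 20)

for name, pattern in [
    ("dots only", DOTS_ONLY),
    ("first request", WORDS_ALT),
    ("second request", WORDS_REPEAT),
]:
    print(name, bool(pattern.search(single_words)), bool(pattern.search(phrases)))
```

Under these assumptions only the second request catches cells that mix words and spaces, since its `+` applies to the whole alternation rather than to the period alone.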

jeluf wrote:

Done.

Index: InitialiseSettings.php
--- InitialiseSettings.php (revision 808)
+++ InitialiseSettings.php (working copy)
@@ -6596,7 +6596,7 @@
 '/avril\.on\.nimp\.org/i', // http://en.wikipedia.org/wiki/Special:Contributions/Hochitup
 '/\.on\.nimp\.org/i', // per MrZ-man 2008-11-02 -- brion
 // bug 15063, these won't last:
-'/<TR>(<TD BGCOLOR=["\']?#......["\']?>\.+<\/TD>){20,}<\/TR>/i',
+'/<TR>(<TD BGCOLOR=["\']?#......["\']?>(\.|We|are|Anonymous|)+<\/TD>){20,}<\/TR>/i',
 '/<table>.+?(<td bgcolor.+?){400,}.+?<\/table>/i',
 // Weird thingy http://en.wikipedia.org/w/index.php?title=Hellboy:_Sword_of_Storms&oldid=245477898&diff=prev
 '/<span onmouseover="_tipon/',