regex expressions starting with caret (^) not functioning as per instructions say
Closed, Declined (Public)

Assigned To

Authored By

	Billinghurst
	Apr 28 2014, 10:49 AM

Description

I am not sure whether this is a case of the instructions being wrong, or something is broken, or that it doesn't work on meta's implementation of a global blacklist.

Instructions at [[mw:Extension:SpamBlacklist#Blacklist syntax]] state that a caret can be used to match the start of a blacklist regex

<quote>
"The '^' and '$' anchors match the beginning and end of the domain name, not the beginning and end of the URL."
</quote>

Recently this approach was used at meta[1] in an attempt to lessen the collateral damage of a block on t.co URLs (which are routinely used to spam); see [[m:User:COIBot/XWiki/t.co]]. However, the regex "^t\.co\b" has not been effective, and t.co links can still be added.

It would be useful if we could work out which of the three scenarios that we have so the proper fix can be requested. Thanks.

[1]https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&diff=7630728&oldid=7593377&diffonly=yes

Version: unspecified
Severity: normal

Details

Reference: bz64541

Event Timeline

• bzimport raised the priority of this task to Medium. Nov 22 2014, 3:11 AM

• bzimport added a project: SpamBlacklist.

• bzimport set Reference to bz64541.

• bzimport added a subscriber: Unknown Object (MLST).

Billinghurst created this task. Apr 28 2014, 10:49 AM

I guess it's difficult to fix now because the regexes are merged (piped together) first; people would modify the instructions to "fix" it now...

Maybe you can use (?<!-)\bt\.co\b?

The suggested regex, while valid as a regex, does not prevent the link from being added. [Tested by two people on two different wikis, with the link being added in both cases.] So back to square one.

So we are back to the situation where the spam blacklist cannot use a regex to block just t.co

^t\.co\b FAIL
(?<!-)\bt\.co\b FAIL

(In reply to billinghurst from comment #2)

> The suggested regex, while valid as a regex, does not prevent the link from
> being added. [Tested by two people on two different wikis, with the link
> being added in both cases.] So back to square one.
>
> So we are back to the situation where the spam blacklist cannot use a regex
> to block just t.co
>
> ^t\.co\b FAIL
> (?<!-)\bt\.co\b FAIL

In my test, [[MediaWiki:Spam-blacklist]]:

#<pre>
google
(?<!-)\bt\.co\b
baidu
#</pre>

Link:

http://t.co/abc

and it says:

The following text is what triggered our spam filter: t.co

... so it works for me?
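
For anyone wanting to repeat this test offline, here is a rough Python sketch of it. It assumes the simplified joining scheme discussed in this thread (a shared `https?://[a-z0-9.-]*` prefix followed by all entries piped together) and that everything after a `#` on a line is a comment. The helper names `parse_entries` and `first_hit` are made up for illustration and are not part of the extension, and the real extension may build or report its matches differently.

```python
import re

# Contents of the test blacklist page from the comment above.
BLACKLIST_PAGE = r"""
#<pre>
google
(?<!-)\bt\.co\b
baidu
#</pre>
"""


def parse_entries(page_text):
    """Return the non-empty entries, dropping '#' comments (hypothetical helper)."""
    entries = []
    for line in page_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            entries.append(line)
    return entries


def first_hit(entries, url):
    """Return the text matched by the piped entries, or None (hypothetical helper)."""
    # Assumed simplified prefix; the extension's actual prefix may differ.
    pattern = r"https?://[a-z0-9.-]*(" + "|".join(entries) + ")"
    match = re.search(pattern, url, re.IGNORECASE)
    return match.group(1) if match else None


entries = parse_entries(BLACKLIST_PAGE)
print(first_hit(entries, "http://t.co/abc"))
print(first_hit(entries, "http://example-t.co/"))
```

Under these assumptions the first URL is caught by the `t.co` entry, while the hyphenated domain is let through by the `(?<!-)` lookbehind.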

Yep, it works. It probably didn't last time because the list had not been updated when it was tested. The documentation at mw is wrong.

Actually the instructions at mw:Extension:SpamBlacklist#Blacklist syntax <s>are</s> were wrong, because all blacklisted domains are joined in PHP into a single pattern like https?://[a-z0-9.-]*(sbl_0|sbl_1|...|sbl_n).
So if one wants to block an exact domain, the code
(?<=//|\.)t\.co\b
can be used, i.e., "a 't\.co\b' which is preceded either by two slashes or a dot".
We have been doing that at w:de, w:en and at meta for years now. I agree, it's better to fix the instructions rather than changing the code. I've done that now. -> won't fix?
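
A small Python sketch of why the caret fails once entries sit behind the shared prefix, and why the lookbehind works. The prefix is the simplified one quoted above, so treat it as an assumption about the extension's real pattern; `build_blacklist_regex` and `is_blocked` are hypothetical names. Note that PCRE accepts `(?<=//|\.)` as written, whereas Python's `re` rejects lookbehind alternatives of different widths, so the sketch spells the same condition as two separate lookbehinds.

```python
import re

# Simplified shared prefix as quoted in this thread (an assumption; the
# extension's real prefix may differ in detail).
PREFIX = r"https?://[a-z0-9.-]*"


def build_blacklist_regex(entries):
    """Join entries into one alternation behind the shared prefix."""
    return re.compile(PREFIX + "(" + "|".join(entries) + ")", re.IGNORECASE)


def is_blocked(entries, url):
    """True if the joined blacklist regex matches anywhere in the URL."""
    return build_blacklist_regex(entries).search(url) is not None


# '^' can only match at the start of the string, but by the time the entry
# is tried the prefix has already consumed 'http://', so it never matches.
CARET = [r"^t\.co\b"]
# Python-compatible spelling of (?<=//|\.)t\.co\b
EXACT = [r"(?:(?<=//)|(?<=\.))t\.co\b"]

for url in ("http://t.co/abc", "http://www.t.co/abc", "http://example-t.co/"):
    print(url, is_blocked(CARET, url), is_blocked(EXACT, url))
```

With this simplified prefix, the caret entry blocks none of the three URLs, while the lookbehind entry blocks `t.co` and `www.t.co` but leaves `example-t.co` alone.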

per seth's conversation, let it be and work on other things

MarcoAurelio added a project: Stewards-and-global-tools. Jan 11 2017, 11:29 PM

MarcoAurelio removed a subscriber: • wikibugs-l-list.

MarcoAurelio removed a parent task: T43492: [DO NOT USE] Steward, global sysop and SWMT tasks bugs (tracking) [superseded by #Stewards-and-global-tools]. Jan 11 2017, 11:39 PM

MarcoAurelio moved this task from Untriaged to Closed on the Stewards-and-global-tools board. Jan 22 2017, 12:56 PM
