
Spam blacklist disallows addition of blacklisted links which are already there
Open, Low, Public

Description

Author: Beetstra.wiki (@Beetstra)

Description:
Pages which contain blacklisted links can be edited normally. One can also add an exact duplicate of the blacklisted link elsewhere in the edit. For example, see https://en.wikipedia.org/w/index.php?title=Eric_Guthrie&diff=prev&oldid=621871444, where CyberBot II is tagging a page containing the blacklisted link http://cfl-scrapbook.no-ip.org/CFL-CanadianQB.php by re-adding the exact same link in a template at the top. Both the old and the newly added links are clickable (see https://en.wikipedia.org/w/index.php?title=Eric_Guthrie&oldid=621871444, click 'show' in the top template).

This is the expected behaviour of the spam-blacklist extension.
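
As a rough illustration of why an exact duplicate passes, here is a minimal sketch that assumes the check only looks at links present in the new revision but absent from the old one. The function name `blocked_links` and the single-entry `BLACKLIST` are hypothetical; the pattern is the one quoted further down in this task, and the real extension builds its regexes differently.

```python
import re

# Hypothetical, simplified stand-in for the blacklist: a single pattern,
# taken from the "Triggered by" text quoted later in this task.
BLACKLIST = [re.compile(r"\bmy\.mail\.ru\b")]


def blocked_links(old_links, new_links):
    """Return the newly added links that match a blacklist pattern.

    Links are compared as sets, so a link that already exists in the old
    revision is never "added", no matter how often it is repeated.
    """
    added = set(new_links) - set(old_links)
    return sorted(
        link
        for link in added
        if any(pattern.search(link) for pattern in BLACKLIST)
    )
```

Under this assumption, repeating a link that is already on the page yields an empty result, while any variation of that link, even one differing by a single character, counts as a new addition and is checked.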

However, when making the same kind of edit, CyberBot II is repeatedly blocked on 4 pages where this behaviour does not occur; see https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=spamblacklist&user=Cyberbot_II&page=&year=&month=-1&tagfilter=&hide_patrol_log=1&hide_review_log=1&hide_thanks_log=1

To reproduce the behaviour: taking http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?, adding it manually, and then trying to save the page also results in a block by the spam blacklist; see http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?, last two items by Beetstra

Details

Reference
bz69775

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 3:27 AM)
bzimport added a project: SpamBlacklist.
bzimport set Reference to bz69775.
bzimport added a subscriber: Unknown Object (MLST).

Beetstra.wiki wrote:

Copy of log CyberBot II (https://en.wikipedia.org/w/index.php?title=Eric_Guthrie&diff=prev&oldid=621871444)

11:30, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
11:07, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
11:05, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
11:04, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
08:45, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
08:23, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
08:20, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
08:19, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
05:54, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
05:31, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
05:29, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
05:27, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
02:55, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
02:32, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
02:29, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
02:28, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
23:07, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
22:22, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
22:18, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
22:16, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
18:45, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
18:12, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
18:08, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
18:06, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
14:20, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
13:37, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.
13:33, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Chris Bell (politician) by attempting to add http://www.political.com.
13:31, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Batik by attempting to add http://www.samuibatik.com.
10:47, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
10:23, 19 August 2014 Cyberbot II (talk | contribs | block) caused a spam blacklist hit on Belfast Harbour by attempting to add http://belfast.ports-guides.com.

Beetstra.wiki wrote:

Copy of relevant items from Beetstra's log (https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=spamblacklist&user=Beetstra&page=&year=&month=-1&tagfilter=&hide_patrol_log=1&hide_review_log=1&hide_thanks_log=1 - link copied wrongly above!!):

09:16, 19 August 2014 Beetstra (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
09:15, 19 August 2014 Beetstra (talk | contribs | block) caused a spam blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.

Beetstra.wiki wrote:

Hmm, making many wrong links here - for below message, the log is at https://en.wikipedia.org/w/index.php?title=Special%3ALog&type=spamblacklist&user=Cyberbot_II&page=&year=&month=-1&tagfilter=&hide_patrol_log=1&hide_review_log=1&hide_thanks_log=1

(In reply to Dirk Beetstra from comment #1)

Copy of log CyberBot II
(https://en.wikipedia.org/w/index.php?title=Eric_Guthrie&diff=prev&oldid=621871444)

11:30, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam
blacklist hit on Elena Sheynina by attempting to add http://my.mail.ru.
11:07, 20 August 2014 Cyberbot II (talk | contribs | block) caused a spam
blacklist hit on Belfast Harbour by attempting to add

.......

I'm almost positive that the problem lies with Cyberbot II and not the blacklist. My best guess is that the URLs contain some sort of special character that makes parsing of the bare version end sooner, so the bare version doesn't match the existing version.

Beetstra.wiki wrote:

(In reply to Jackmcbarn from comment #4)

I'm almost positive that the problem lies with Cyberbot II and not the
blacklist. My best guess is that the URLs contain some sort of special
character that makes parsing of the bare version end sooner, so the bare
version doesn't match the existing version.

@Jackmcbarn: have you read the items in Comment 3? Please try copying the blacklisted link (the one starting with http://my.mail.ru in [[Elena Sheynina]]), pasting it into a new empty section, and saving the page. The exact link that is already there is then blocked. Reverting this to NEW, since what I did was confirm that the bot's blocked edits reflect a problem that other editors (that is, me) can reproduce as well, and that likely also blocks legitimate edits elsewhere.

I think the problem you're seeing is because the link ends with a question mark, and if used as a bare link, the question mark isn't considered part of it, which does indeed make it a different link and correctly disallowed.

Beetstra.wiki wrote:

So what blocked MY edits there? I copy-pasted the link as well (I tried another route, and was then able to save). Something in the parsing seems strange.

The original link in the page ended with a question mark. When you added the link, it was bare, so the question mark wasn't picked up as part of it. See https://en.wikipedia.org/w/index.php?oldid=622612947 for an example. The second link there was basically what was already in the article, and the first one was the one you tried to add. Note that the question mark isn't part of it. To accomplish what you're trying to do, you'd need to add the link the way the third one there does. Unless there's anything I'm still missing, this is RESOLVED WORKSFORME.
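
The difference between the bare and the bracketed form can be sketched as follows. This is a simplified model, not the parser's actual code: the names `extract_links`, `BRACKETED`, `BARE` and `TRAILING` are hypothetical, and the set of stripped trailing characters is an assumption modelled on how MediaWiki treats free links.

```python
import re

# Assumption: punctuation dropped from the end of a bare (free) link.
TRAILING = ",;.:!?"

# [url optional label] : the URL is taken exactly as written.
BRACKETED = re.compile(r"\[(https?://[^\s\]]+)[^\]]*\]")
# Bare URL: runs until whitespace or a bracket/quote character.
BARE = re.compile(r"https?://[^\s\[\]<>\"]+")


def extract_links(wikitext):
    """Return the set of external link targets found in the wikitext."""
    links = {match.group(1) for match in BRACKETED.finditer(wikitext)}
    # Blank out bracketed links so they are not matched again as bare links.
    remainder = BRACKETED.sub(" ", wikitext)
    for match in BARE.finditer(remainder):
        url = match.group(0)
        # A closing parenthesis is only stripped if the URL has no opening one.
        strip = TRAILING if "(" in url else TRAILING + ")"
        links.add(url.rstrip(strip))
    return links
```

In this model, `extract_links("*http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?")` yields the URL without its final question mark, which is not the link already stored for the page and is therefore treated as newly added. The bracketed form `[http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info? link]` keeps the question mark and matches the existing link.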

Beetstra.wiki wrote:

No, it is not resolved. It gets weirder and weirder. I tried to add:

'{{Blacklisted-links|1=
*http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?
*:''Triggered by <code>\bmy\.mail\.ru\b</code> somewhere''|bot=Cyberbot II|invisible=false}}'

(the template that the bot is supposed to leave) - that does NOT work

Also adding

'*http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?'

does NOT work

However adding:

'*[http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info? link]

does work ([https://en.wikipedia.org/w/index.php?title=Elena_Sheynina&diff=622848973&oldid=622558855 diff]).

There is a difference in how the links are parsed, and which are blocked and not - see. It may still be a problem on the link itself, but this difference should not exist.

Beetstra.wiki wrote:

I see that is what you also show in your sandbox edit.
(In reply to Dirk Beetstra from comment #10)

No, it is not resolved. It gets weirder and weirder. I tried to add:

'{{Blacklisted-links|1=
*http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?
*:''Triggered by <code>\bmy\.mail\.ru\b</code> somewhere''|bot=Cyberbot II|invisible=false}}'

(the template that the bot is supposed to leave) - that does NOT work

Also adding

'*http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info?'

does NOT work

However adding:

'*[http://my.mail.ru/mail/sekhmet_oko/#page=/mail/sekhmet_oko/info? link]

does work
([https://en.wikipedia.org/w/index.php?title=Elena_Sheynina&diff=622848973&oldid=622558855 diff]).

There is a difference in how the links are parsed, and which are blocked and
not - see. It may still be a problem on the link itself, but this
difference should not exist.

That is the exact same thing. Bare links can't end with a question mark. If you want a link to end in a question mark, you have to wrap it in square brackets. That's not a bug, though, so I'm not sure what you're saying the problem is.