
URLs should trigger the blacklist regardless of parser functions
Open, Low, Public

Description

Author: mike.lifeguard+bugs

Description:
By using {{#time: (as in this example), or possibly other parser functions, one may circumvent the blacklist. I'm not sure how to get this sort of thing blocked, but it should be worked out.

Also in the URL field: http://nl.wikipedia.org/w/index.php?title=Gebruiker:Emil76&diff=prev&oldid=13856552
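
To make the gap concrete, here is a minimal sketch (in Python, purely illustrative; SpamBlacklist itself is PHP and its real patterns and code are not reproduced here) of why a check against the raw wikitext at save time misses a URL that only exists after parser functions are expanded. The blacklist pattern, the hostname, and the {{#time:...}} placeholder are all hypothetical.

```python
import re

# Hypothetical blacklist entry; not SpamBlacklist's real patterns.
blacklist = [re.compile(r"spam-domain\.example", re.IGNORECASE)]

# Saved wikitext: the blacklisted hostname never appears literally, because
# part of it is only supplied by a parser function when the page is parsed.
# ("{{#time:...}}" is a stand-in; the report's exact wikitext isn't shown.)
raw_wikitext = "http://spam-domain.ex{{#time:...}}/"

# The same text as it might look after parser-function expansion.
expanded_text = "http://spam-domain.example/"

def blacklist_hits(text):
    """Return the blacklist patterns that match the given text."""
    return [rx.pattern for rx in blacklist if rx.search(text)]

print(blacklist_hits(raw_wikitext))   # []  -- an edit-time check on raw wikitext sees nothing
print(blacklist_hits(expanded_text))  # ['spam-domain\\.example']  -- a check on expanded text would catch it
```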


Version: unspecified
Severity: normal
URL: http://nl.wikipedia.org/w/index.php?title=Gebruiker:Emil76&diff=prev&oldid=13856552

Details

Reference
bz15582

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:19 PM
bzimport added a project: SpamBlacklist.
bzimport set Reference to bz15582.
bzimport added a subscriber: Unknown Object (MLST).

mike.lifeguard+bugs wrote:

I guess we're not using tracking bugs for this any longer.

The basic problem here is that we have to determine not whether a URL exists in the page now, but whether under any circumstances of possible input data it _could_. That quickly becomes difficult or impossible if you move from this case to slightly less trivial ones, including already-known-possible cases involving transclusion of multiple pieces edited at different times.

The only real solution to that I can think of is to apply the blacklist at rendering time for *views* as well as at *edit* -- matching links could be de-linked or removed and the page marked on a queue for review.
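
A rough sketch of that render-time idea, with entirely hypothetical names (the real change would hook MediaWiki's PHP parser/output pipeline rather than look like this): blacklisted links are dropped from the rendered output and the page is remembered for later review.

```python
import re

BLACKLIST = [re.compile(r"spam-domain\.example", re.IGNORECASE)]
review_queue = []  # stand-in for "marked on a queue for review"

def filter_rendered_links(page_title, external_links):
    """Drop blacklisted URLs from a page's rendered external links and
    remember the page so a human can clean it up later."""
    kept, removed = [], []
    for url in external_links:
        if any(rx.search(url) for rx in BLACKLIST):
            removed.append(url)
        else:
            kept.append(url)
    if removed:
        review_queue.append((page_title, removed))
    return kept

# Only the clean link survives; the page lands on the review queue.
print(filter_rendered_links("Some page",
                            ["http://spam-domain.example/x",
                             "http://nl.wikipedia.org/"]))
print(review_queue)
```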

This probably wouldn't perform terribly well, but could perhaps be optimized.

Don't know if it's worth the effort.

mike.lifeguard+bugs wrote:

(In reply to comment #2)

> The basic problem here is that we have to determine not whether a URL exists in the page now, but whether under any circumstances of possible input data it _could_. That quickly becomes difficult or impossible if you move from this case to slightly less trivial ones, including already-known-possible cases involving transclusion of multiple pieces edited at different times.

Sure, but can't this slightly-less-exotic case be covered will less trouble?

> The only real solution to that I can think of is to apply the blacklist at rendering time for *views* as well as at *edit* -- matching links could be de-linked or removed and the page marked on a queue for review.

I think filtering on view might be worth doing, perhaps with a notice such as "This page has spam that we've automatically hidden from your sensitive eyes; please help clean it up. You're looking for the domain spam.org -> [edit]". That would be especially useful now that saving isn't blocked when the domain already existed in the page (bug 1505). (However, a queue seems like overkill.)

mike.lifeguard+bugs wrote:

(In reply to comment #3)

> Sure, but can't this slightly-less-exotic case be covered will less trouble?

*WITH less trouble

mike.lifeguard+bugs wrote:

*** Bug 16354 has been marked as a duplicate of this bug. ***

firejackey wrote:

(In reply to comment #5)

> *** Bug 16354 has been marked as a duplicate of this bug. ***

thanks

A minimal fix, to stop this from being attractive to vandals, would be to simply silently ignore any blacklisted URLs unexpectedly encountered during a parse. I wouldn't (naively, perhaps) expect this to cause too much load; surely our page caching should be good enough that pages don't get needlessly reparsed very often?
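
A sketch of that minimal fix, again with hypothetical names and not the extension's real code: a blacklisted external link encountered during the parse is simply rendered as plain text instead of a link, with nothing blocked and no error shown.

```python
import re
from html import escape

BLACKLIST = [re.compile(r"spam-domain\.example", re.IGNORECASE)]

def render_external_link(url, label):
    """Emit an anchor for a normal URL, but silently fall back to plain
    text when the URL matches the blacklist."""
    if any(rx.search(url) for rx in BLACKLIST):
        return escape(label)
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(label))

print(render_external_link("http://spam-domain.example/x", "spam link"))  # plain text only
print(render_external_link("http://nl.wikipedia.org/", "Wikipedia NL"))   # normal anchor
```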

*** Bug 16610 has been marked as a duplicate of this bug. ***
Anomie added subscribers: MBH, Aklapper, Base.