Spam filtering should be able to follow redirects
Closed, Declined (Public)

Description

Author: smjg

Description:
I often use http://www.tinyurl.com/ to shorten the URLs of external sites linked
to from Wikipedia. This is useful for stopping the diffs from being too wide to
be readable.

However, tinyurl.com and other link-shortening services have just been added to
Wikipedia's spam blacklist. Of course, it can't be helped that some people will
try to use them to bypass Wikipedia's spam filter in order to link to bad sites
(pornography, malware, etc.). However, it's quite straightforward to stop this
abuse of these services, short of banning links to URL redirectors altogether.

When a link to a URL redirection domain is entered into a page, MediaWiki should
follow the link to find out what it redirects to. If the destination of the
redirect is on the blacklist, then veto the link. If it isn't, then allow it.

Following every link to see if it redirects somewhere might put more load on the
server than is desirable and slow down the process of saving a wiki page.
Therefore we probably should have, as well as a blacklist, a 'redirect list' of
domains in which to follow redirects.
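
The check proposed above can be sketched roughly as follows. This is only an illustration of the idea, not MediaWiki's or the spam blacklist's actual code: the names REDIRECT_DOMAINS, BLACKLIST_PATTERNS, MAX_HOPS and link_allowed are hypothetical, the sample patterns are made up, and the resolve callback (which would issue the HTTP request and return the redirect target) is left to the caller. Vetoing links that cannot be resolved, or that redirect too many times, is an assumption the proposal does not spell out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical 'redirect list': only links on these domains are followed.
REDIRECT_DOMAINS = {"tinyurl.com"}
# Hypothetical blacklist entries (regular expressions matched against the URL).
BLACKLIST_PATTERNS = [re.compile(r"badsite\.example", re.IGNORECASE)]
# Assumed cap on redirect hops, so a save cannot stall on a long chain.
MAX_HOPS = 5


def host_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_blacklisted(url):
    return any(p.search(url) for p in BLACKLIST_PATTERNS)


def link_allowed(url, resolve):
    """Decide whether a link may be saved.

    resolve(url) must return the URL that a redirector link points to,
    or None if it cannot be determined.
    """
    seen = set()
    for _ in range(MAX_HOPS + 1):
        if is_blacklisted(url):
            return False          # destination is on the blacklist: veto
        if host_of(url) not in REDIRECT_DOMAINS:
            return True           # ordinary link, nothing more to follow
        if url in seen:
            return False          # redirect loop: veto (assumption)
        seen.add(url)
        target = resolve(url)
        if target is None:
            return False          # cannot verify the target: veto (assumption)
        url = target
    return False                  # too many hops: veto (assumption)
```

With a lookup table standing in for the network, link_allowed("http://tinyurl.com/abc", table.get) is vetoed when the table maps that short URL to a blacklisted site, while links on domains outside the redirect list are never resolved at all, which keeps the extra load confined to known redirectors.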


Version: unspecified
Severity: enhancement
URL: http://meta.wikimedia.org/wiki/Talk:Spam_blacklist

Details

Reference
bz4891

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:05 PM
bzimport set Reference to bz4891.

ssd.wiki wrote:

As stated in the mailing list, link redirectors present the
following unavoidable hazards:

  • (As you stated) we don't know where the link goes.
  • It places an external dependence on Wikipedia.
  • The link could be changed by a third party without our knowledge.
  • The user has no way of knowing the link target until it is too late.

Only the first of these problems is fixable by following the link.
The best solution to this would be for MediaWiki to follow the link, and REPLACE
it with the target. What advantage is there in MediaWiki doing this replacement
vs. the user just not using the link redirector in the first place?
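
For illustration, the "follow and REPLACE" alternative could look something like the sketch below. It is not existing MediaWiki behaviour: expand_redirects, URL_RE and the resolve callback are hypothetical names, the URL pattern is deliberately simplistic, and leaving a link untouched when it cannot be resolved is an assumption.

```python
import re
from urllib.parse import urlsplit

# Simplistic URL matcher: stops at whitespace, brackets, angle brackets, quotes.
URL_RE = re.compile(r"https?://[^\s\[\]<>\"]+")


def expand_redirects(wikitext, resolve, redirect_domains):
    """Replace links on known redirector domains with their targets.

    resolve(url) returns the target URL, or None if it cannot be found,
    in which case the original link is kept (assumption).
    """
    def replace(match):
        url = match.group(0)
        host = urlsplit(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host not in redirect_domains:
            return url                     # not a redirector: keep as written
        return resolve(url) or url         # swap in the target when known

    return URL_RE.sub(replace, wikitext)
```

Applied at save time, this would store the long target URL in the page text, so the short link would no longer appear in the wikitext or in later diffs.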

smjg wrote:

(In reply to comment #1)

As stated in the mailing list, link redirectors present the
following unavoidable hazards:

  • (As you stated) we don't know where the link goes.
  • It places an external dependence on Wikipedia.
  • The link could be changed by a third party without our knowledge.

That's true of URL redirectors such as cjb and shorturl, which are designed for
creating a memorable, stable URL for a website. The point of tinyurl, OTOH, is
to produce URLs that permanently redirect to the same place.

Besides, the content of any webpage can change, whether it's been linked to
directly or via redirection.

  • The user has no way of knowing the link target until it is too late.

Not quite true. For example, http://validator.w3.org/ can be used to expand
such links before visiting the site.

Only the first of these problems is fixable by following the link.
The best solution to this would be for MediaWiki to follow the link, and REPLACE
it with the target. What advantage is there in MediaWiki doing this replacement
vs. the user just not using the link redirector in the first place?

Do you have a better idea for solving the problem of diffs becoming too wide?

ssd.wiki wrote:

I'd also like to point out that Wikipedia has a built-in link redirector. If
you say [http://url/] it replaces it with a small integer, or [http://url/
label] replaces it with a label. But you knew that.

As to the diff problem -- I suggest you file a separate bug report for that. A
better solution would be to fold the URL. (Does HTML have the inverse of NBSP?
This could use a NSPBR.)

smjg wrote:

You mean like a place where the line may be broken, but which doesn't otherwise
produce a space? I think the nearest you'll get is &shy; (soft hyphenation
hint) but it seems that hardly any browsers support it (apparently some ignore
it and others render a visible hyphen even where it doesn't break the line). We
could also work some CSS magic to reduce the spaces to zero width, but that's a
dirty trick that'll break if CSS is disabled....

ssd.wiki wrote:

Closest bug I could find in a hurry is Bug 1229.
Discussions of changing diff behavior should probably go there?

I'm inclined to WONTFIX this; third-party redirectors are inherently unstable and
untrusted.

Diff behavior is not relevant, that's a separate issue.