
Write and implement new regex-based blacklist extension
Closed, Resolved, Public

Details

Reference
bz16717


Event Timeline

bzimport raised the priority of this task from to Low. (Nov 21 2014, 10:25 PM)
bzimport set Reference to bz16717.
bzimport added a subscriber: Unknown Object (MLST).

mike.lifeguard+bugs wrote:

Probably most of the spam blacklist bugs can be marked as depending on this, but someone more technically minded might want to make those assessments.

mike.lifeguard+bugs wrote:

(In reply to comment #0)

Notes available here: http://www.mediawiki.org/wiki/Regex-based_blacklist

Why is this not best implemented in AbuseFilter?

(In reply to comment #2)

> Why is this not best implemented in AbuseFilter?

<werdna> abuse filter is a crappy hack in general :)

(In reply to comment #4)

https://www.mediawiki.org/wiki/Requests_for_comment/Regex-based_blacklist.

Thanks for compiling the list MZ. Almost all of those features are available in AbuseFilter, so when the rfc is finished, it would probably be worth it to compare what it would take to add features to AbuseFilter vs write a new extension from scratch.

(In reply to comment #5)

> Thanks for compiling the list MZ. Almost all of those features are available in
> AbuseFilter, so when the rfc is finished, it would probably be worth it to
> compare what it would take to add features to AbuseFilter vs write a new
> extension from scratch.

AbuseFilter seems to serve a fundamentally different purpose than what's proposed here.

It seems to me that you are asking for [[mw:Extension:Phalanx]] to be created here. :)

Most of the requested features are indeed present in Phalanx, such as exact or regex matching and the ability to block edit and page-move summaries.
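To make the two matching modes and summary blocking concrete, here is a minimal Python sketch. It is not Phalanx's code: the `BlacklistEntry` class, its fields, and the `blocked_by` helper are hypothetical, and "exact" is assumed here to mean that the whole field must equal the phrase (case-insensitively).

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BlacklistEntry:
    """A hypothetical blacklist filter; field names are illustrative, not Phalanx's schema."""
    pattern: str
    is_regex: bool = False
    # Which fields the filter applies to: any of "text" and "summary".
    applies_to: frozenset = frozenset({"text"})

    def matches(self, value: str) -> bool:
        if self.is_regex:
            # Regex mode: the pattern may match anywhere in the field.
            return re.search(self.pattern, value, re.IGNORECASE) is not None
        # Exact mode (assumed semantics): the whole field must equal the phrase.
        return value.casefold() == self.pattern.casefold()


def blocked_by(entries: Iterable[BlacklistEntry], text: str, summary: str) -> Optional[BlacklistEntry]:
    """Return the first entry that matches a field it applies to, or None if the edit is clean."""
    fields = {"text": text, "summary": summary}
    for entry in entries:
        if any(entry.matches(fields[name]) for name in entry.applies_to):
            return entry
    return None
```

For example, an entry with `applies_to=frozenset({"summary"})` would catch an edit or page-move summary without scanning the page text at all.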

That said, the following would be nice to have in Phalanx:
*Needs validation to ensure only well-formed regexes are entered
-Marooned added a "QuickFix™ for bad regexes" over two years ago (see http://trac.wikia-code.com/changeset/23718), along with a to-do comment: "TODO: validate regexes on save/edit". This to-do is still valid and it's something that needs to be built.
*Needs a safety check to ensure we don't ban all normal spaces, the letter K, or whatever from page titles
-Phalanx assumes that the operator (the human blocking stuff via it) knows what they're doing. If they don't, too bad. Given that we can't guarantee that all WMF stewards are and will be 1337 regex ninjas, this would be very nice to have, since otherwise Phalanx could very easily render all WMF wikis unusable.
*Needs Unicode normalization
-Not sure if this is already present
*Needs a friendly output mode for non-Wikimedia Foundation wikis using the list
-Part of Phalanx's magic is that not everyone can view it (which is also true for private AbuseFilters on [name your favorite WMF wiki]). However, I once built a proof-of-concept API module for Phalanx that would allow external wikis to use a certain Phalanx "list" (such as the WMF's, in this case) as the master list. That is, user X makes an edit on external wiki Y, the text is run through WMF's Phalanx filters, and if it matches a blocked phrase or whatnot on WMF's Phalanx, the edit is blocked. My proof-of-concept API used token-based authorization (so that you can't just set up a MediaWiki instance and effectively reverse-engineer the Phalanx blacklist), but that may or may not be feasible. I'm just noting it down; I know it sounds horribly proprietary and evil, but alas, not all anti-spam measures can be 100% open.
*Needs to warn sysops, but be overrideable upon confirmation
-Phalanx currently has the 'phalanxexempt' user right; Phalanx and its hook points are simply not initialized for users who have this right. It's not the same thing as seeing a warning message stating something like "www.example.com is currently Phalanxed, but since you are an admin, you can submit this edit by clicking here" or somesuch.
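Several items on this wish list (regex validation on save, a safety check against over-broad patterns, Unicode normalization, and a sysop warning that can be overridden on confirmation) can be sketched together. The following Python is a minimal illustration under stated assumptions, not Phalanx or AbuseFilter code: the function names, the `SAFETY_SAMPLES` titles, and the choice of NFKC normalization are all hypothetical.

```python
import re
import unicodedata
from typing import List, Sequence

# Hypothetical sample of ordinary page titles that no new filter should match.
# A real deployment would draw these from actual titles on the wiki.
SAFETY_SAMPLES = ("Main Page", "Kansas City", "Talk:Example", "Help:Editing pages")


def normalize(text: str) -> str:
    """NFKC-normalize so compatibility forms (e.g. fullwidth letters) match their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


def validate_filter(pattern: str, samples: Sequence[str] = SAFETY_SAMPLES) -> List[str]:
    """Return a list of problems with a proposed filter; an empty list means it may be saved."""
    try:
        compiled = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        # Reject malformed regexes at save time instead of failing at match time.
        return [f"invalid regex: {exc}"]
    problems = []
    if compiled.search(""):
        problems.append("pattern matches the empty string")
    for sample in samples:
        if compiled.search(normalize(sample)):
            problems.append(f"pattern matches ordinary title {sample!r}")
    return problems


def decide(text: str, patterns: Sequence[str], is_sysop: bool, confirmed: bool) -> str:
    """Return "allow", "warn", or "block" for an edit, given already-validated patterns."""
    normalized = normalize(text)
    if not any(re.search(p, normalized, re.IGNORECASE) for p in patterns):
        return "allow"
    if is_sysop:
        # Sysops see a warning first and may resubmit with confirmation.
        return "allow" if confirmed else "warn"
    return "block"
```

The sample-title check is only a heuristic: a pattern can pass it and still be too broad, so it complements human review rather than replacing it.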

MZMcBride: So should this report be closed in favor of Phalanx, and missing features turned into feature requests?

(In reply to comment #8)

> MZMcBride: So should this report be closed in favor of Phalanx, and missing
> features turned into feature requests?

Probably.

SamanthaNguyen added a subscriber: SamanthaNguyen.

Phalanx is archived, but its functionality has been split across other extensions: RegexBlock blacklists usernames or IP addresses by regex, and SpamRegex blacklists regex-matched phrases in edits, edit summaries, or both. I'll consider this resolved (see also T176665 for fully archiving the extension).
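As a rough illustration of the RegexBlock side of that split, the sketch below checks a username and an IP address against a list of regexes. The pattern list, the `is_blocked` helper, and the choice to test both the username and the IP are assumptions for illustration, not RegexBlock's actual implementation.

```python
import re
from typing import Optional, Sequence

# Hypothetical RegexBlock-style list of patterns.
BLOCK_PATTERNS = (
    r"^Spambot\d+$",          # usernames such as Spambot1 or Spambot42
    r"^192\.0\.2\.\d{1,3}$",  # one /24 range (a documentation range); dots escaped
)


def is_blocked(username: Optional[str], ip: str, patterns: Sequence[str] = BLOCK_PATTERNS) -> bool:
    """Return True if the username (when logged in) or the IP address matches any pattern."""
    # Assumption: both the account name and the IP are checked.
    subjects = [ip] if username is None else [username, ip]
    return any(re.search(p, s, re.IGNORECASE) for p in patterns for s in subjects)
```

Escaping the dots in the IP pattern matters: an unescaped `.` matches any character, so `^192.0.2.` would also match a string such as `192a0b2c`.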