
Abuse filters can be fooled by using U+200B ZERO WIDTH SPACE (ccnorm doesn't remove/normalize them)
Closed, Resolved, Public

Description

As you can check on
https://test.wikipedia.org/wiki/Special:AbuseFilter/tools
ccnorm("BAD")!==ccnorm("B​A​D")
where the first string has just 3 characters and the second one has invisible U+200B ZERO WIDTH SPACE characters between the letters.

Therefore, anyone can fool abuse filters that try to block offenses, bad words, etc. just by copying invisible characters into the text.
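The problem can be reproduced outside AbuseFilter with a minimal Python sketch. The `naive_filter` function below is hypothetical and only stands in for a filter rule that matches a bad word; it is not AbuseFilter's or ccnorm's actual code.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

plain = "BAD"
# Looks identical to "BAD" when rendered, but carries two invisible characters.
disguised = "B" + ZWSP + "A" + ZWSP + "D"


def naive_filter(text: str) -> bool:
    """Hypothetical filter: True if the text contains the bad word."""
    return "bad" in text.lower()


print(len(plain), len(disguised))  # 3 5
print(plain == disguised)          # False
print(naive_filter(plain))         # True
print(naive_filter(disguised))     # False: the filter is bypassed
```

Any normalization step that leaves U+200B in place will treat the two strings as different, which is the behavior reported above.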


Version: unspecified
Severity: normal

Details

Reference
bz62049

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 3:03 AM
bzimport added a project: AntiSpoof.
bzimport set Reference to bz62049.
bzimport added a subscriber: Unknown Object (MLST).

To fix this, we would either need to add these characters to AntiSpoof's maintenance/equivset.in (and make them normalize to an empty string) or, if that's not possible or desired, we could extend our own ccnorm function.
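As a rough illustration of the first option, here is a Python sketch of an equivalence table in which U+200B maps to an empty string. The `EQUIV` table and `normalize` function are assumptions made for this example; they do not reflect the real format or contents of equivset.in, nor the actual AntiSpoof or ccnorm implementation.

```python
# Hypothetical equivalence table, for illustration only.
EQUIV = {
    "\u200b": "",  # U+200B ZERO WIDTH SPACE -> empty string (the proposed mapping)
    "4": "A",      # illustrative look-alike entry, not taken from equivset.in
}


def normalize(text: str) -> str:
    """Uppercase the text, then replace each character via the table."""
    return "".join(EQUIV.get(ch, ch) for ch in text.upper())


disguised = "B\u200bA\u200bD"
print(normalize("BAD") == normalize(disguised))  # True
```

Because the zero-width space normalizes to nothing, both inputs collapse to the same string and a filter comparing normalized text can no longer be bypassed this way.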

Seems like AntiSpoof would be the right place for this.

Change 117640 had a related patch set uploaded by Hoo man:
Map U+200B (zero width space) to an empty string

https://gerrit.wikimedia.org/r/117640

Change 117640 merged by jenkins-bot:
Map U+200B (zero width space) to an empty string

https://gerrit.wikimedia.org/r/117640