
Cyrillic letter Д д (Д д) could match A
Closed, Duplicate (Public)

Description

Author: kanegasi

Description:
The Cyrillic letter Д д (Д д) is missing. The AbuseFilter extension uses your equivset for its normalization. I'm trying to filter some Russian bad words, particularly "Пиздец", and the equivset doesn't contain this character.

Acceptance criteria

  • The Cyrillic letter Д д (Д д) is added to the AntiSpoof equivset for A

Details

Reference
bz58093

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 2:25 AM
bzimport added a project: AntiSpoof.
bzimport set Reference to bz58093.
bzimport added a subscriber: Unknown Object (MLST).

Confirming that in

maintenance/equivset.in 
maintenance/equivset.txt 
equivset.php

there is indeed no entry for 414 Д (U+0414) or 434 д (U+0434); some other Cyrillic letters, such as Ш and Ц, are missing as well.
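The numbers 414 and 434 are the hexadecimal Unicode code points of Д and д. As a quick illustration (a standalone Python sketch, not part of the extension; the helper name `code_point_hex` is made up here), the code points of the letters mentioned in this comment can be listed like this:

```python
import unicodedata


def code_point_hex(ch: str) -> str:
    """Return the Unicode code point of one character as upper-case hex, no prefix."""
    return format(ord(ch), "X")


# The four Cyrillic letters mentioned in this comment.
for ch in "\u0414\u0434\u0428\u0426":  # Д д Ш Ц
    print(code_point_hex(ch), ch, unicodedata.name(ch))
```

These are the values to search for when checking whether a letter has an entry in the files listed above.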

However, the extension description says "It blocks the creation of accounts with mixed-script, confusing and similar usernames" and I am not aware of Д being similar to another letter in another script.

Could you clarify which other letter is similar to Д?

kanegasi wrote:

Seeing as the word "Говно" could be normalized into "R0BH0", I'd imagine "Д" would be listed under the "A" list.

This also raises the question of how the character behaves in the "norm" and "ccnorm" functions of AbuseFilter. If a character isn't in the list, as in this case, would the function match it as the literal character, or throw an error because it's not in the list? If I were to use a normalized rule for the word I first mentioned, "Пиздец", I would end up with "ΠИ3дEU". Would this rule still match the word?
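To make the question concrete, here is a minimal Python sketch of a table-driven normalizer. Everything in it is an assumption for illustration: the `EQUIV` table is hand-picked from the example pairs in this thread and is not the real AntiSpoof equivset, and both the upper-casing step and the pass-through of unmapped characters are choices made by the sketch, not confirmed behaviour of "norm" or "ccnorm".

```python
# Hypothetical mapping built only from the examples in this thread.
# It is NOT the real AntiSpoof equivset.
EQUIV = {
    "\u0413": "R",       # Г
    "\u041e": "0",       # О
    "\u0412": "B",       # В
    "\u041d": "H",       # Н
    "\u041f": "\u03a0",  # П -> Greek Π
    "\u0417": "3",       # З
    "\u0415": "E",       # Е
    "\u0426": "U",       # Ц
}


def normalize(text: str, table: dict = EQUIV) -> str:
    """Upper-case the text, then replace each character that has a table entry.

    Characters without an entry are passed through unchanged (an assumption
    of this sketch), so nothing is raised for an unmapped letter such as Д.
    """
    return "".join(table.get(ch, ch) for ch in text.upper())


word = "\u041f\u0438\u0437\u0434\u0435\u0446"  # the word from the description
rule = normalize(word)

# A rule and an input that go through the same function still compare equal,
# even though Д and И have no entry in the table.
print(rule == normalize(word.lower()))

# Adding the requested Д -> A pair changes the normalized form.
extended = {**EQUIV, "\u0414": "A"}
print(normalize(word, extended))
```

Under these assumptions, an unmapped letter only matches itself, so a filter written against the normalized form would not catch spellings that substitute a look-alike for Д.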

One more question arising from the above, though more of a side curiosity: why doesn't И or и match N?

The question is whether faux Cyrillic is considered similar; the same goes for R and Я, N and И, and so on. I think the original intention was to match almost identical letters, but...

So it looks like this request would broaden the scope if Д is considered similar to A and N and И.

Please see also T173699#3537122 where a lot more such mappings are listed. I think we should just go ahead and make a patch fo

@MaxSem: Your thoughts? Does this make sense to add to the mapping?

Dunno, I've seen it used as A in various leet renderings, but how high is the actual chance someone might not notice the difference?

My main concern (and my reason for shepherding this through development) is that people shouldn't be able to circumvent AbuseFilter.