Usernames should use unicode whitelist
Closed, ResolvedPublic

Assigned To

None

Authored By

	• bzimport
	Feb 13 2005, 8:50 PM

Description

Author: river

Description:
Usernames should be restricted to a whitelist of characters which includes only
valid alphanumeric characters in each language, and punctuation. Otherwise,
creating usernames (and page titles) with invalid characters will make it hard
to block vandals.
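As a rough sketch of the proposed whitelist, the check below accepts only characters whose Unicode general category is a letter, mark, number or punctuation, plus the plain ASCII space. This is a simplification under stated assumptions: it does not distinguish per-language alphabets as the description asks, and admitting combining marks (needed by scripts such as Devanagari) is a choice made for this sketch. `is_allowed_username` is a hypothetical name, not a MediaWiki function.

```python
import unicodedata

# Assumed policy for this sketch: letters (L*), combining marks (M*),
# numbers (N*) and punctuation (P*), plus the plain ASCII space.
ALLOWED_MAJOR_CATEGORIES = {"L", "M", "N", "P"}


def is_allowed_username(name: str) -> bool:
    """Return True if `name` is non-empty, has no leading/trailing
    whitespace, and every character is in the whitelist."""
    if not name or name != name.strip():
        return False
    for ch in name:
        if ch == " ":
            continue
        # unicodedata.category returns a two-letter code such as "Lu" or "Cf";
        # its first letter is the major category.
        if unicodedata.category(ch)[0] not in ALLOWED_MAJOR_CATEGORIES:
            return False
    return True


print(is_allowed_username("River"))          # True
print(is_allowed_username("bad\u200fname"))  # False: U+200F is category Cf
```

Because the test is by category rather than by script, Cyrillic or Arabic names pass just as Latin ones do, which sidesteps the per-language lists but also lets mixed-script names through.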

Version: 1.5.x
Severity: major

Details

Reference: bz1524


Related Objects

Status	Assigned	Task
Invalid	None	T5985 character conversion (tracking)
Resolved	None	T4593 Non-printing characters allowed in registration
Declined	None	T5969 Unicode (UTF-8, utf8) compatibility (tracking)
Declined	None	T14499 consistency between site setup and pages of the site
Resolved	None	T3524 Usernames should use unicode whitelist

Event Timeline

• bzimport raised the priority of this task to Medium. Nov 21 2014, 8:13 PM

• bzimport added a project: MediaWiki-User-login-and-signup.

• bzimport set Reference to bz1524.

• bzimport added a subscriber: Unknown Object (MLST).

• bzimport created this task. Feb 13 2005, 8:50 PM

*Invalid* characters (those that are illegal in XML or don't reliably cut and paste) need to be outright
blocked in titles.
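The "illegal in XML" half of this rule is mechanically checkable. A minimal sketch, assuming the XML 1.0 `Char` production is taken as the definition of a legal character; the function names are hypothetical. It does not cover characters that merely fail to cut and paste reliably: U+200F RIGHT-TO-LEFT MARK, for example, is legal XML and passes this check.

```python
def is_xml_char(ch: str) -> bool:
    """True if `ch` matches the XML 1.0 `Char` production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def has_xml_illegal_chars(title: str) -> bool:
    """True if any character in `title` could not appear in an XML 1.0 document."""
    return any(not is_xml_char(ch) for ch in title)


print(has_xml_illegal_chars("Main Page"))     # False
print(has_xml_illegal_chars("Bad\x00Title"))  # True: NUL is never legal XML
```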

Characters that some people simply cannot type should not be a real problem: either there
should be a direct 'block' link, or cut-and-paste will always be available.

I'm not really inclined to proclaim what characters are appropriate for each language, as this will make
interoperability, writing on foreign topics, shared data, shared user accounts, global user accounts, etc.
very hard, and will require a lot of manual mucking about as people whine for whitelists to be updated.

p_simoons wrote:

Agreed. There are to my knowledge no legit users on the English wiki that use
non-ASCII characters in their name, but it's a favorite trick of vandals and
impersonators.

avarab wrote:

*** Bug 2290 has been marked as a duplicate of this bug. ***

gangleri wrote:

(In reply to comment #0)

usernames should be restricted to a whitelist of characters which includes only
valid alphanumeric characters in each language, and punctuation.

This requirement and single user login will conflict with the wish to use
*native* (non-Latin) alphabets in user names.

river wrote:

(In reply to comment #4)

(In reply to comment #0)

usernames should be restricted to a whitelist of characters which includes only
valid alphanumeric characters in each language, and punctuation.

This requirement and single user login will conflict with the wish to use
*native* (non-Latin) alphabets in user names.

why?

lowzl wrote:

Usernames shouldn't be stored in a normalised form; however, users should not be
permitted to register names which would conflict with existing usernames when
normalised.

Perhaps this could be achieved by adding a new field to the user table -
'username_normal' - and storing the normalised username there. Add a unique
constraint to the field, and then attempts to register a username which will
result in a collision when normalised will... well, result in a database error.

Now the question is: where do we get a reasonable map of confusable characters?
http://www.unicode.org/draft/reports/tr36/Attic/confusables.txt isn't
particularly extensive, but should work for most malicious cases. Perhaps we
should try to get a copy of the IDN normalisation map. The Unicode Consortium
has a long document about visual spoofing:
http://www.unicode.org/draft/reports/tr36/tr36.html
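The confusables data from the TR36 drafts linked above is now maintained alongside Unicode Technical Standard #39, which defines a "skeleton" transform: decompose to NFD, replace each character using the confusables table, then decompose to NFD again; two strings are confusable when their skeletons are equal. The sketch below follows that outline, but the mapping is a hand-picked illustrative subset rather than the real data file, and `skeleton` and `is_confusable` are hypothetical names.

```python
import unicodedata

# Illustrative subset of confusable mappings (source -> prototype).
# A real implementation would load the full Unicode confusables data.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(s: str) -> str:
    """NFD, map each character through CONFUSABLES, then NFD again."""
    s = unicodedata.normalize("NFD", s)
    s = "".join(CONFUSABLES.get(ch, ch) for ch in s)
    return unicodedata.normalize("NFD", s)


def is_confusable(a: str, b: str) -> bool:
    return skeleton(a) == skeleton(b)


print(is_confusable("River", "Riv\u0435r"))  # True: Cyrillic "е" in place of "e"
```

The skeleton could serve as the `username_normal` value proposed above, so that look-alike registrations collide in the database.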

gangleri wrote:

(In reply to comment #5)

why?

There are many opinions about the restriction of usernames:
"Since this is the English Wikipedia, usernames ought to be constructed using
English characters, with allowances for scripts from other languages ..." from
[[en:Wikipedia_talk:Username#On_Unicode_and_other_odd_characters_in_usernames]]

Nevertheless, the community's decision about this should be more tolerant. With
regard to single user login, it should be allowed to use Arabic, Cyrillic, Hebrew,
Devanagari, Georgian, or any other alphabets.

I would not object to usernames such as [[user:۞]], [[user:░]], [[User:–]] etc.
Usernames are part of personality and creativity. Whatever our opinion on this,
and however we deal with it, it is a *reality* that there are also usernames like
[[en:user:god]] (see [[en:user talk:god]]), [[en:user:satan]],
[[en:user:antichrist]] etc.

gangleri wrote:

Some examples related to
bug 337: inconsistent treatment of character entities and illegal characters in
titles/links

http://en.wikipedia.org/wiki/User:%E2%80%8F
http://en.wikipedia.org/wiki/Special:Contributions/%E2%80%8F
http://en.wikipedia.org/wiki/User:Gangleri/tests/bugzilla:00337#User:.26rlm.3B

gangleri wrote:

http://en.wikipedia.org/wiki/User:%C2%A0

is a "construct" based on
bug 2173: Fatal error when removing an article with a whitespace title from the
watchlist
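Decoding the percent-encoded titles in these examples shows what they actually contain. A small sketch using only the Python standard library; `describe_title` is a hypothetical helper name.

```python
import unicodedata
from urllib.parse import unquote


def describe_title(encoded: str) -> list[str]:
    """Percent-decode a title (as UTF-8) and describe each character."""
    title = unquote(encoded)
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} "
        f"({unicodedata.category(ch)})"
        for ch in title
    ]


for encoded in ("%E2%80%8F", "%C2%A0"):
    print(encoded, describe_title(encoded))
```

The first title is a single U+200F RIGHT-TO-LEFT MARK (category Cf) and the second a single U+00A0 NO-BREAK SPACE (category Zs). Both render as nothing visible, and neither is a letter, number or punctuation character, so a category-based whitelist would reject them.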

gangleri wrote:

compare with

bug 3696: Unicode Control Characters should be restricted in title text

gangleri wrote:
