
Invalid usernames on Wikimedia web sites
Closed, Resolved, Public

Description

Author: avarab

Description:
Some of the usernames that currently exist would not be accepted by the current
system if registered today. This includes:

  • Usernames with the character "/" in them
  • Usernames that start with a lower case letter

I've made a list of these usernames using maintenance/checkUsernames.php and put
the log in /home/wikipedia/logs/checkUsernames.log (linked in the URL field of this bug).

These should ideally be changed by local bureaucrats of these wikis using the
Special:Renameuser extension, which as of 2005-09-19 needs to be modified to
accept invalid input to be useful for this.
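
As a rough illustration of the two cases listed above, a check could look like the sketch below. It is written in Python rather than PHP, the function name invalid_reasons is hypothetical, and it covers only the two rules named in this report. It is not a description of what maintenance/checkUsernames.php actually does.

```python
def invalid_reasons(username: str) -> list:
    """Return the reasons a username fails the two rules named in this report."""
    reasons = []
    if "/" in username:
        reasons.append("contains '/'")
    # Assumption: "starts with a lower case letter" means the first character
    # is lowercase according to Unicode, not only ASCII a-z.
    if username[:1].islower():
        reasons.append("starts with a lowercase letter")
    return reasons
```

An empty list means the name passes both checks; other rules enforced by the real software are deliberately left out.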


Version: unspecified
Severity: normal
OS: Linux
Platform: Other
See Also:
T28396: Permit lowercase (uncapitalized) usernames and user pages

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:49 PM
bzimport set Reference to bz3507.
bzimport added a subscriber: Unknown Object (MLST).

bugs wrote:

Unless avar still has the listing, there's nothing further we can do about this bug (if it hasn't been done already in the last 2 years).

robchur wrote:

Well, there is - we can generate another listing using his script, or another script, and advise those users of the need to rename their accounts.

(Special:Renameuser will now accept "invalid" username input for the original username, provided that the user row exists...)

Please don't close bugs just because they're old.

*** Bug 3501 has been marked as a duplicate of this bug. ***

For another example, see the usernames with double spaces referred to in dupe bug 3501.

mike.lifeguard+bugs wrote:

*** Bug 5230 has been marked as a duplicate of this bug. ***

This is related to bug 323; assuming a properly-coded Special:Renameuser, renaming users with invalid names should cause all their edits to display in Special:Contributions.

AFAIK this would only work if the rev_user field in the revision table or the ar_user field in the archive table is not 0. These values are always 0 for edits from before the introduction of the Phase II software, as shown at:
http://en.wikipedia.org/wiki/User:Nemo_bis/Bug_323_revisions

It would also not work where duplicate usernames are involved such as "Larry_Sanger" and "Larry Sanger". In the latter case, those usernames should ideally be merged.

  • Rename users whose edits have a _user value higher than 0 in the other tables and whose names do not conflict with an existing username
    • If proof is found that they are the same person, merge the users
    • If the user is still active, rename to something else with their permission
  • Otherwise: leave it be for now

That should clean up most cases, right?
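
One possible reading of the proposal above, written out as a decision function. This is only a sketch: the Action names, the decide function and its boolean parameters are hypothetical, and it assumes the two sub-bullets apply to accounts whose corrected name collides with an existing username.

```python
from enum import Enum


class Action(Enum):
    RENAME = "rename"
    MERGE = "merge with the existing account"
    ASK_USER = "rename to something else with the user's permission"
    LEAVE = "leave it be for now"


def decide(has_nonzero_user_ids: bool,
           conflicts_with_existing: bool,
           same_person_proven: bool,
           still_active: bool) -> Action:
    """Pick an action for one invalid username (one reading of the list above)."""
    # Edits recorded with a user id of 0 cannot be reattached by a rename.
    if not has_nonzero_user_ids:
        return Action.LEAVE
    if not conflicts_with_existing:
        return Action.RENAME
    # Assumption: the sub-bullets describe what to do on a name conflict.
    if same_person_proven:
        return Action.MERGE
    if still_active:
        return Action.ASK_USER
    return Action.LEAVE
```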

Related URL: https://gerrit.wikimedia.org/r/64243 (Gerrit Change Idb4b84b853f4f86e90a470e937af9017cee15e44)

Related URL: https://gerrit.wikimedia.org/r/64246 (Gerrit Change Idb4b84b853f4f86e90a470e937af9017cee15e44)

Related URL: https://gerrit.wikimedia.org/r/64247 (Gerrit Change Idb4b84b853f4f86e90a470e937af9017cee15e44)

Created attachment 12333
Output of checkUsername

reedy@terbium:~$ grep \: checkUsernames.log -c
1193

1193 bad usernames on all WMF wikis

List grouped by dbname attached


MariaDB [enwiki_p]> select * from user where user_name = 'ɑʀʇʉʀɵ'\G

*************************** 1. row ***************************
                 user_id: 3103938
               user_name: ɑʀʇʉʀɵ
          user_real_name: 
           user_password: 
        user_newpassword: 
              user_email: 
            user_options: 
            user_touched: 
              user_token: 
user_email_authenticated: 
        user_email_token: 
user_email_token_expires: 
       user_registration: 20061226002342
       user_newpass_time: 
          user_editcount: 39

1 row in set (0.03 sec)


"ɑʀʇʉʀɵ" gets transformed to "Ɑʀʇʉʀɵ".

(In reply to comment #14)

> "ɑʀʇʉʀɵ" gets transformed to "Ɑʀʇʉʀɵ".

A more recent example of that was [[user:ɱ]] (now Ɱ), with the lowercase "latin small letter m with hook / left-tail m" that didn't get consistently normalised to its capitalised/uppercase version. I can't find the bug for that, if any was filed, but it may require a core fix first, hence a separate report.
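
The transformation discussed in these comments is first-letter uppercasing. The sketch below shows the idea in Python; the helper names ucfirst and is_stable are hypothetical, and Python's Unicode case tables are not necessarily the ones MediaWiki or the database used when these accounts were created, which is the kind of mismatch that leaves such names un-normalised.

```python
def ucfirst(name: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return name[:1].upper() + name[1:]


def is_stable(name: str) -> bool:
    """True if first-letter uppercasing leaves the name unchanged."""
    return ucfirst(name) == name
```

A stored username for which is_stable returns False is one that current normalisation would rewrite, so the stored row can no longer be reached by typing the original name.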

There are 218 local badusername accounts that have to be renamed for SUL finalization, so it's time to settle this and consider appropriate new names for these accounts. Keep in mind that these accounts have not been able to even log in for a decade.

The suggestion "Global renamed user <XXXX>" has been made. Any thoughts on this?

> There are 218 local badusername accounts

F2314 (linked above, from 2013) contains over 1,000 entries... is there a different process being used here?

> The suggestion "Global renamed user <XXXX>" has been made. Any thoughts on this?

What's <XXXX> meant to be, an arbitrary number?

In many cases the usernames are invalid simply because the initial character is lowercase (see a number of larger Wiktionaries, e.g. fr, de, it), so it could be helpful to rename these accounts to something like Renamed user "[original username]" or Lowercase user "[original username]". For other kinds of invalid usernames, I think an arbitrary name is appropriate - in most cases we'd be wasting our time to do anything else.
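
The naming idea in this comment could be sketched as follows. The function proposed_name and its serial parameter are hypothetical, the test for "other kinds of invalid" only looks at the cases mentioned in this task ("/" and double spaces), and this is not the script that was eventually run.

```python
def proposed_name(original: str, serial: int) -> str:
    """Suggest a replacement name following the convention floated above."""
    # Assumption: a name is "only lowercase-initial" if it starts with a
    # lowercase letter and shows none of the other problems named in this task.
    only_lowercase_initial = (
        original[:1].islower()
        and "/" not in original
        and "  " not in original
    )
    if only_lowercase_initial:
        # Keep the original name visible, as suggested in the comment.
        return f'Renamed user "{original}"'
    # Otherwise fall back to an arbitrary number.
    return f"Global renamed user {serial}"
```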

Do any of these invalid users have significant contributions? That could be worth checking out.

In T5507#1056396, @TTO wrote:

>> There are 218 local badusername accounts

> F2314 (linked above, from 2013) contains over 1,000 entries... is there a different process being used here?

There can be, as you mention below. I raise the issue in relation to F2314 because if we're going to have to resolve this for some usernames, we might as well work out how to do it for all.

>> The suggestion "Global renamed user <XXXX>" has been made. Any thoughts on this?

> What's <XXXX> meant to be, an arbitrary number?

Yes, my apologies for not being clear.

> In many cases the usernames are invalid simply because the initial character is lowercase (see a number of larger Wiktionaries, e.g. fr, de, it), so it could be helpful to rename these accounts to something like Renamed user "[original username]" or Lowercase user "[original username]". For other kinds of invalid usernames, I think an arbitrary name is appropriate - in most cases we'd be wasting our time to do anything else.

Hmm...I like this naming convention very much.

> Do any of these invalid users have significant contributions? That could be worth checking out.

I'd guess at least one or two do, so I think it's worth checking out as well. Anyone feel like a quick check? What's the arbitrary number for "significant"? Over 25 edits?

In T5507#1056396, @TTO wrote:

>> Do any of these invalid users have significant contributions? That could be worth checking out.

> I'd guess at least one or two do, so I think it's worth checking out as well. Anyone feel like a quick check? What's the arbitrary number for "significant"? Over 25 edits?

I took the time to check the F2314 list myself. Most users have 0 edits according to user_editcount. Most of the accounts with significant edit counts are on wiktionaries and jawiki. There are 7 accounts with more than 200 edits, of which 3 have more than 2000 edits (one each on dewiktionary, fawiki, and frwiktionary).

I don't think any of that really matters too much, though. As you said, these people haven't been able to log in for aeons. As a non-shell user I can't see any fields like user_touched, but the user IDs tell their own story in most cases.

Keegan moved this task from To-do to Doing on the SUL-Finalization board.

Script is written, @Legoktm just needs to run it.

Change 208988 had a related patch set uploaded (by Legoktm):
Script to rename users with invalid usernames and make them global accounts

https://gerrit.wikimedia.org/r/208988

Change 208988 merged by jenkins-bot:
Script to rename users with invalid usernames and make them global accounts

https://gerrit.wikimedia.org/r/208988

All invalid accounts have now been renamed. T98757 should take care of making sure that we don't get into this mess again.