
Fix use of DB schema so RenameUser is trivial
Closed, Resolved, Public

Description

Right now, RenameUser relies on lobbing jobs into the job queue. However, the job queue is not designed to handle tasks in a reliable, ordered manner.

RenameUser is complicated because of the way we typically pull the user name from the database. In several places we read the denormalized user_text fields that are copied into several tables (revision, archive, etc). If we actually went to the source (the user_name field in the user table), renaming would be a much cheaper and more robust operation.
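To illustrate why reading from the user table makes a rename cheap, here is a minimal sketch using Python's sqlite3 with a toy schema. The table and column names follow this task's discussion (rev_user_id is the name used in the comments below); the helper functions and the simplified schema are assumptions for illustration, not MediaWiki code.

```python
import sqlite3

# Toy schema: names follow the task text; everything else is simplified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_user_id INTEGER NOT NULL,   -- 0 for IP edits and imports
        rev_user_text TEXT NOT NULL     -- denormalized copy of the name
    );
    INSERT INTO user VALUES (1, 'OldName');
    INSERT INTO revision VALUES
        (10, 1, 'OldName'), (11, 1, 'OldName'), (12, 0, '192.0.2.7');
""")

def revision_authors(conn):
    """Read names from the user table, falling back to rev_user_text."""
    rows = conn.execute("""
        SELECT rev_id, COALESCE(user_name, rev_user_text)
        FROM revision LEFT JOIN user ON rev_user_id = user_id
        ORDER BY rev_id
    """)
    return rows.fetchall()

def rename_user(conn, user_id, new_name):
    """Rename by updating a single row; no per-revision rewrite is needed."""
    conn.execute(
        "UPDATE user SET user_name = ? WHERE user_id = ?", (new_name, user_id)
    )

rename_user(conn, 1, "NewName")
authors = revision_authors(conn)
```

In this sketch the stale rev_user_text values are never rewritten; readers simply stop depending on them for local users.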

Examples of this cleanup are rSVN100286 and rSVN100300. Aaron has done some of this work, but would like help.

Details

Reference
bz31863

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:48 PM
bzimport set Reference to bz31863.
bzimport added a subscriber: Unknown Object (MLST).

This may benefit from a tweak to internal APIs.

Revision::getUserText() / Revision::getRawUserText() currently pull from the rev_user_text field (unless it got overridden by a magic coalescy thingy in the row). This means that anything running its own queries may be missing the original names, as it'll be stuck with rev_user_text.

If joined columns from 'user' are available when initializing the Revision object from a row, then we should use that directly; but if not, we could do an on-demand lookup via the rev_user_id if it's non-zero (local user reference), or keep the rev_user_text if it's zero (usually IP, sometimes named non-local import markers).
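The fallback order described above could be sketched roughly as follows. This is a language-neutral illustration in Python, not the actual Revision class: resolve_user_text, the dict-shaped row, and the lookup_name callback are all hypothetical names introduced for the example.

```python
from typing import Callable, Optional

def resolve_user_text(
    row: dict, lookup_name: Callable[[int], Optional[str]]
) -> str:
    """Pick the best available user name for a revision row."""
    # 1. A joined user_name column, if the query provided one, wins.
    joined = row.get("user_name")
    if joined is not None:
        return joined
    # 2. Otherwise do an on-demand lookup for local users (non-zero id).
    user_id = row.get("rev_user_id", 0)
    if user_id:
        name = lookup_name(user_id)
        if name is not None:
            return name
    # 3. Zero id (usually an IP or import marker): keep the stored text.
    return row["rev_user_text"]

# Hypothetical stand-in for a per-row database lookup.
users = {7: "RenamedUser"}

with_join = resolve_user_text(
    {"rev_user_id": 7, "rev_user_text": "OldName", "user_name": "RenamedUser"},
    users.get,
)
without_join = resolve_user_text(
    {"rev_user_id": 7, "rev_user_text": "OldName"}, users.get
)
anonymous = resolve_user_text(
    {"rev_user_id": 0, "rev_user_text": "192.0.2.7"}, users.get
)
```

Step 2 is the row-by-row lookup that a query missing the join would fall back to; it costs extra queries but returns the current name.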

With that in place, the worst case scenario should be that some batch queries might be missing the join and end up doing some more row-by-row lookups (they'll probably already be doing lots of those for user/talk page existence checks, so don't worry!)... but they'll show the correct results.

Might also think about a Revision::getUserObj() or something that would hand back a fully-ready User object, rather than having to cart around (id, text) pairs all the time.

(In reply to comment #1)

If joined columns from 'user' are available when initializing the Revision
object from a row, then we should use that directly; but if not, we could do an
on-demand lookup via the rev_user_id if it's non-zero (local user reference),
or keep the rev_user_text if it's zero (usually IP, sometimes named non-local
import markers).

Note that the "magic coalescy thingy" was replaced with just checking user_name already ;)

(In reply to comment #1)

With that in place, the worst case scenario should be that some batch queries
might be missing the join and end up doing some more row-by-row lookups
(they'll probably already be doing lots of those for user/talk page existence
checks, so don't worry!)... but they'll show the correct results.

Basically done in r100475.

sumanah wrote:

Adding Yuvi to this bug since he said he'd take a look at this.

Still lots of places that need JOINs or, preferably, batch lookups.
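As a rough illustration of a batch lookup (as opposed to a JOIN or per-row queries), the sketch below collects the distinct non-zero user ids from a result set and resolves them in one query. It uses Python's sqlite3 with a toy user table; batch_user_names and the row dicts are hypothetical, and a real implementation would probably need to chunk very large id lists.

```python
import sqlite3

def batch_user_names(conn, user_ids):
    """Resolve many user ids to names with a single query."""
    ids = sorted({uid for uid in user_ids if uid})  # drop 0 and duplicates
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT user_id, user_name FROM user WHERE user_id IN ({placeholders})",
        ids,
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
    INSERT INTO user VALUES (1, 'Alice'), (2, 'Bob');
""")

# Rows as they might come back from a query that lacks the user join.
rows = [
    {"rev_id": 10, "rev_user_id": 1, "rev_user_text": "Alice-old"},
    {"rev_id": 11, "rev_user_id": 2, "rev_user_text": "Bob"},
    {"rev_id": 12, "rev_user_id": 0, "rev_user_text": "192.0.2.7"},
]

names = batch_user_names(conn, (r["rev_user_id"] for r in rows))
# Fall back to the stored text when there is no local user to look up.
display = [names.get(r["rev_user_id"], r["rev_user_text"]) for r in rows]
```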

Do we have a list of these anywhere? We need to do renames in the very near future, and this would make it much easier.

The places where MediaWiki core currently reads a user_text column that isn't from the user table are, as of writing:

  • revision . rev_user_text
  • archive . ar_user_text
  • logging . log_user_text
  • image . img_user_text
  • oldimage . oi_user_text
  • filearchive . fa_user_text
  • recentchanges . rc_user_text

Extensions (especially WMF used ones) need auditing for this too...

After all the changes with the actor migration and so forth (per T224348#5226167), global renames are now pretty fast, almost instant, almost 8 years after this task was created. I was wondering, should we consider this resolved now?

I am going to close this. If someone feels it should remain open, please re-open it.
Thanks!