Babel AutoCreate should check for duplicate categories
Closed, Declined (Public)

Description

On enwiki, [[User:Babel AutoCreate]] has been creating several duplicate categories. The categories differ only in capitalisation, for example:

https://en.wikipedia.org/wiki/Category:User_En

which is a duplicate of

https://en.wikipedia.org/wiki/Category:User_en

This seems to depend on what capitalisation users use in their #babel invocations. (More on this specific point at bug 61993.)

I have blocked the Babel AutoCreate account on enwiki because of this issue, but if there is a way round it I would be happy to unblock.

As well as the fix I suggested in bug 61993, I think Babel should check for possible duplicate categories at different capitalisations, and avoid automatically creating categories for which it finds a match.

Let's say a user uses a Babel invocation of {{#babel: Xyz}}. Before creating the category "User Xyz", Babel should check for the existing categories "User xyz" and "User XYZ". Checking things like "xYz" probably wouldn't be necessary. However, it would be worth checking regional variations like "xyz-ab" versus "xyz-AB". This has been an issue with enwiki's [[Category:User en-gb]] and [[Category:User en-GB]].
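A rough sketch of the check described above, in Python rather than the extension's own PHP, purely for illustration. The function names, the "User " prefix argument and the in-memory set of existing category titles are all assumptions made for this example; the real extension would have to query the wiki for existing pages. Only the lowercase, uppercase and first-letter-capitalised forms of the code are tried, plus lowercase and uppercase forms of any regional suffix, so mixed forms such as "xYz" are deliberately not covered.

```python
from typing import Optional, Set


def case_variants(code: str) -> Set[str]:
    """Return plausible capitalisations of a language code, e.g. 'Xyz' or 'en-gb'."""
    base, sep, region = code.partition("-")
    bases = {base.lower(), base.upper(), base.capitalize()}
    if not sep:
        return bases
    # Regional variants: check both 'xyz-ab' and 'xyz-AB' style suffixes.
    regions = {region.lower(), region.upper()}
    return {f"{b}-{r}" for b in bases for r in regions}


def find_existing_duplicate(
    code: str, existing_categories: Set[str], prefix: str = "User "
) -> Optional[str]:
    """Return an existing category that differs from prefix+code only in case, if any.

    `existing_categories` is a hypothetical stand-in for a lookup against the wiki.
    """
    wanted = prefix + code
    for variant in sorted(case_variants(code)):
        candidate = prefix + variant
        if candidate != wanted and candidate in existing_categories:
            return candidate
    return None


if __name__ == "__main__":
    existing = {"User en", "User en-GB"}
    for requested in ("En", "en", "en-gb", "Xyz"):
        print(requested, "->", find_existing_duplicate(requested, existing))
```

If a match is returned, automatic creation would be skipped and the pair could be reported for a human to review; if None is returned, creation would go ahead as it does today.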

If duplication checking is implemented, I would suggest getting Babel AutoCreate to log possible duplicates to its user page or a user subpage so that they can be checked by a human and created manually if necessary.

This bug might be fixed by fixing bug 61993, but the code would be more robust if there was an explicit check for duplicate categories as well. An explicit check would avoid the same problem happening in the future if category code capitalisation was made configurable, for example.


Version: unspecified
Severity: normal

Details

Reference
bz61994

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 22 2014, 3:00 AM)
bzimport set Reference to bz61994.
bzimport added a subscriber: Unknown Object (MLST).

I think it should still create the alternate-case duplicate category, but it should recognise that the category is an alternate-case duplicate and create it as a category redirect to the properly cased category. Doing this should allow the existing bot to re-categorize those mis-cased instances into the proper category.
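As a sketch of what that could look like, the snippet below builds the page text for such a redirect. It is written in Python for illustration only and assumes the wiki has a "Category redirect" template that takes the target category name as its first parameter, as enwiki does; the function names are made up for this example, and how the "proper" case is decided is left to the caller.

```python
from typing import Optional


def category_redirect_wikitext(canonical_category: str) -> str:
    """Build page text that soft-redirects a category to its canonical name.

    Assumes a 'Category redirect' template exists on the wiki (an enwiki convention).
    """
    return "{{Category redirect|" + canonical_category + "}}"


def page_text_for(requested: str, canonical: str) -> Optional[str]:
    """Return redirect text for a mis-cased category, or None if it is the canonical one."""
    if requested == canonical:
        return None  # Nothing special to do: this is the properly cased category.
    return category_redirect_wikitext(canonical)


if __name__ == "__main__":
    print(page_text_for("User En", "User en"))
    print(page_text_for("User en", "User en"))
```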

Part of this might be resolved by fixing T63993. I think that language codes should be lowercase by default.

T63993 will fix most of the cases. This task is asking too much IMHO.

TTO subscribed.

Per Ricordisamoa.