
Change Tags structure to use numeric IDs instead of text
Closed, Resolved (Public)

Description

Currently the Tags core feature is using text names to refer to individual tags. Referring to the tags using an ID number seems like a much better solution overall.

It avoids typos or poor naming choices from being stored in the database forever (and in the MediaWiki: messages). It also ensures cleaner database results, allows for easier joining with other tables in the future (for outside extensions), and avoids the general nastiness of allowing (apparently) any character in the tag name, including question marks, quotation marks, ampersands, etc.
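As a rough illustration of the proposal, the sketch below uses an in-memory SQLite database. The table and column names (`tag_def`, `rev_tag`, `tag_id`, `tag_name`) and the helper `tags_for_revision` are hypothetical and are not MediaWiki's actual schema. The display name is stored once, and every tagged revision refers to it by number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for illustration only, not the real schema.
    CREATE TABLE tag_def (
        tag_id   INTEGER PRIMARY KEY,
        tag_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE rev_tag (
        rev_id INTEGER NOT NULL,
        tag_id INTEGER NOT NULL REFERENCES tag_def(tag_id),
        PRIMARY KEY (rev_id, tag_id)
    );
""")

# The name may contain any characters; it is stored exactly once.
conn.execute(
    "INSERT INTO tag_def (tag_id, tag_name) VALUES (1, 'possible vandalism?')"
)
conn.executemany(
    "INSERT INTO rev_tag (rev_id, tag_id) VALUES (?, ?)",
    [(100, 1), (101, 1)],
)


def tags_for_revision(conn, rev_id):
    """Return the tag names attached to one revision, sorted by name."""
    rows = conn.execute(
        "SELECT d.tag_name FROM rev_tag r "
        "JOIN tag_def d ON d.tag_id = r.tag_id "
        "WHERE r.rev_id = ? ORDER BY d.tag_name",
        (rev_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Reading names back requires a join on `tag_id`, which is the lookup cost that a text-keyed layout avoids.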

Awaiting vociferous opposition and a rapid close.


Version: unspecified
Severity: enhancement

Details

Reference
bz18672

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:37 PM
bzimport set Reference to bz18672.
bzimport added a subscriber: Unknown Object (MLST).

Marking this bug as Lowest priority.

I've done this in a batch to (usually enhancement request) bugs where:

  • It is not clear that this bug should be fixed.
  • It is not clear how to fix this bug.
  • There are difficulties or complications in fixing this bug, which are not justified by the importance of the bug.
  • This is an extremely minor bug that could not be fixed in a few lines of code.

If you're interested in having one of these bugs fixed, your best bet is to write the patch yourself.

varchar is more flexible, and you do not have to look up the long name (performance).
The ContentHandler was implemented with integers, but then moved to varchar for the same reasons.

Helder: Comment 1 requires answers before raising priority.

Now that bug 25824 is fixed, if a filter is created to tag some edits and its author chooses a very inappropriate tag name, that name will show up in the diffs of any edits detected by the filter:
http://test.wikipedia.org/w/index.php?title=Bug18672&diff=176578

When this is problematic, sysops still have no way to fix or remove a tag on a revision:

  • False positives can't be marked/reverted (bug 28213), and as such the tags can't be hidden in these cases
  • Poorly named tags can't be renamed (this bug)

I think it is far too late to even consider migrating away from this structure, despite its many problems. The performance concerns mentioned by duplicatebug are also discouraging - the change tagging feature already has (minor) performance issues using varchar fields...

> It avoids typos or poor naming choices from being stored in the database forever (and in the MediaWiki: messages).

This would have to be fixed by creating a means to rename tags.
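A minimal sketch of what such a rename could look like while tags stay keyed by text. The table `rev_tag_text` and the function `rename_tag` are hypothetical names, not existing MediaWiki code, and the sketch ignores any interface messages that would also need renaming.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical text-keyed table: the tag name is repeated on every tagged row.
conn.execute(
    "CREATE TABLE rev_tag_text ("
    " rev_id INTEGER NOT NULL,"
    " tag_name TEXT NOT NULL,"
    " PRIMARY KEY (rev_id, tag_name))"
)
conn.executemany(
    "INSERT INTO rev_tag_text (rev_id, tag_name) VALUES (?, ?)",
    [(100, "vandlism"), (101, "vandlism"), (102, "blanking")],
)


def rename_tag(conn, old_name, new_name):
    """Rewrite every row carrying old_name; return how many rows changed."""
    with conn:  # commits on success, rolls back if the UPDATE fails
        cur = conn.execute(
            "UPDATE rev_tag_text SET tag_name = ? WHERE tag_name = ?",
            (new_name, old_name),
        )
    return cur.rowcount
```

In this layout the cost of a rename grows with the number of tagged rows, and the update fails with an integrity error if a revision already carries the new name.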

> It also ensures cleaner database results, allows for easier joining with other tables in the future (for outside extensions),

Yeah, this is the worst part of the current setup by a long shot. It makes me sad, but thinking about fixing it makes me much sadder.

> and avoids the general nastiness of allowing (apparently) any character in the tag name, including question marks, quotation marks, ampersands, etc.

Don't really see how that is "nasty". Commas cause major havoc if used (enforcement will be improved through the work in T20670), and forward slashes are highly dubious (T65850), but anything else is fine. After all, a unique key is just a unique sequence of bytes.
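One way a comma could cause trouble, sketched under the assumption (not spelled out in this thread) that a change's tags are somewhere stored or passed around as a single comma-joined string. The helpers `join_tags` and `split_tags` are hypothetical.

```python
def join_tags(tags):
    """Pack a list of tag names into one comma-separated string."""
    return ",".join(tags)


def split_tags(summary):
    """Unpack a comma-separated string back into tag names."""
    return summary.split(",") if summary else []


# Names without commas survive the round trip unchanged.
clean = split_tags(join_tags(["mobile edit", "possible vandalism?"]))

# A name containing a comma comes back as two different tags.
broken = split_tags(join_tags(["spam, maybe", "blanking"]))
```

Question marks, quotation marks and ampersands pass through this round trip untouched; only the separator character is ambiguous.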

Ladsgroup claimed this task.
Ladsgroup subscribed.

It's being done as part of T185355