
Creating a page in category namespace does not insert row in category table.
Closed, Resolved, Public

Description

Example: http://commons.wikimedia.org/wiki/Category:Cultural_heritage_monuments_in_Dari%C3%A9n_Province

This category was created by BotMultichill to extend the category tree. However, querying for it through the API does not return it:

http://commons.wikimedia.org/w/api.php?action=query&list=allcategories&acprop=hidden&acfrom=Cultural%20heritage%20monuments%20in%20Dari
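The same request can be built programmatically to re-check the result. This is a minimal sketch that only constructs the URL and does not send it. The parameters come from the URL above, plus `aclimit` and `format=json`, which are standard API parameters added here as assumptions. The function name is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def allcategories_url(start_title: str, limit: int = 10) -> str:
    """Build the list=allcategories request shown above, starting at start_title."""
    params = {
        "action": "query",
        "list": "allcategories",
        "acprop": "hidden",     # also report whether each category is hidden
        "acfrom": start_title,  # first title to enumerate (no "Category:" prefix)
        "aclimit": limit,       # assumption: cap the number of results
        "format": "json",       # assumption: JSON output
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(allcategories_url("Cultural heritage monuments in Dari"))
```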

This happens because, although there /is/ a row in the page table:

mysql> select * from page where page_namespace=14 and  page_title>="Cultural_heritage_monuments_in_Dari" limit 1;
+----------+----------------+-------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+-----------------------+
| page_id  | page_namespace | page_title                                      | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random    | page_touched   | page_latest | page_len | page_no_title_convert |
+----------+----------------+-------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+-----------------------+
| 21211737 |             14 | Cultural_heritage_monuments_in_Darién_Province  |                   |            0 |                0 |           1 | 0.721362456992 | 20130526075638 |    77704695 |      125 |                     0 |
+----------+----------------+-------------------------------------------------+-------------------+--------------+------------------+-------------+----------------+----------------+-------------+----------+-----------------------+

there is no matching row in the category table (the first row at or after that title is the next one alphabetically, Darmstadt):

mysql> select * from category where cat_title>="Cultural_heritage_monuments_in_Dari" limit 1;
+----------+------------------------------------------+-----------+-------------+-----------+------------+
| cat_id   | cat_title                                | cat_pages | cat_subcats | cat_files | cat_hidden |
+----------+------------------------------------------+-----------+-------------+-----------+------------+
| 64836847 | Cultural_heritage_monuments_in_Darmstadt |        17 |           2 |        15 |          0 |
+----------+------------------------------------------+-----------+-------------+-----------+------------+
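The same gap can be found for the whole namespace, not just one title, with an anti-join from page to category. This is a minimal sketch using Python's built-in sqlite3 with cut-down, hypothetical versions of the two tables. It keeps only the columns needed here, since the real MediaWiki schema has more. The tables are loaded with the two rows shown above.

```python
import sqlite3


def missing_category_rows(conn: sqlite3.Connection) -> list[str]:
    """Titles of namespace-14 pages that have no row in the category table."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM page AS p
        LEFT JOIN category AS c ON c.cat_title = p.page_title
        WHERE p.page_namespace = 14   -- Category namespace
          AND c.cat_id IS NULL        -- no matching category row
        ORDER BY p.page_title
        """
    ).fetchall()
    return [title for (title,) in rows]


def demo_db() -> sqlite3.Connection:
    # Cut-down schemas (hypothetical): only the columns this check needs.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT UNIQUE,
                               cat_pages INTEGER, cat_subcats INTEGER,
                               cat_files INTEGER);
        """
    )
    conn.execute("INSERT INTO page VALUES (21211737, 14, ?)",
                 ("Cultural_heritage_monuments_in_Darién_Province",))
    conn.execute("INSERT INTO category VALUES (64836847, ?, 17, 2, 15)",
                 ("Cultural_heritage_monuments_in_Darmstadt",))
    return conn


print(missing_category_rows(demo_db()))
```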

In general, empty categories *do* show up in the result, as long as they had a page in them at some point in the past.

So, as far as I can see, two things have to be addressed in a patch:

  1. When a new page is created in the category namespace, a row should be inserted into the category table.
  2. The category namespace should be scanned for pages that have no row in the category table, and the missing rows should be added.
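Both points can be sketched against a toy database. This is a minimal illustration, not MediaWiki code: the hook `on_page_created` and the function `backfill_missing_categories` are hypothetical names, and the schema is a cut-down stand-in for the real one. Point 1 uses `INSERT OR IGNORE` so that an existing row and its counts are left alone. Point 2 is a single `INSERT ... SELECT` guarded by `NOT EXISTS`.

```python
import sqlite3

NS_CATEGORY = 14  # MediaWiki's Category namespace

SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                   page_namespace INTEGER, page_title TEXT);
CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT UNIQUE,
                       cat_pages INTEGER DEFAULT 0, cat_subcats INTEGER DEFAULT 0,
                       cat_files INTEGER DEFAULT 0);
"""


def on_page_created(conn: sqlite3.Connection, namespace: int, title: str) -> None:
    """Point 1 (hypothetical hook): creating a category page also ensures a category row."""
    conn.execute("INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
                 (namespace, title))
    if namespace == NS_CATEGORY:
        # OR IGNORE leaves an already-existing row (and its counts) untouched.
        conn.execute("INSERT OR IGNORE INTO category (cat_title) VALUES (?)", (title,))


def backfill_missing_categories(conn: sqlite3.Connection) -> None:
    """Point 2: add a zero-count category row for every category page lacking one."""
    conn.execute(
        """
        INSERT INTO category (cat_title)
        SELECT p.page_title FROM page AS p
        WHERE p.page_namespace = ?
          AND NOT EXISTS (SELECT 1 FROM category AS c
                          WHERE c.cat_title = p.page_title)
        """,
        (NS_CATEGORY,),
    )


def demo() -> list[str]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Data created before the fix: a category page with no category row.
    conn.execute("INSERT INTO page (page_namespace, page_title) "
                 "VALUES (14, 'Old_category_page')")
    backfill_missing_categories(conn)
    on_page_created(conn, NS_CATEGORY, "New_category_page")
    on_page_created(conn, 0, "Ordinary_article")  # not a category: no row added
    return [t for (t,) in conn.execute("SELECT cat_title FROM category ORDER BY cat_title")]


print(demo())
```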

Version: 1.22.0
Severity: normal

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:31 AM
bzimport set Reference to bz48824.
bzimport added a subscriber: Unknown Object (MLST).

Brad: Any chance you could take a look at it (or any idea who to ask instead)? Thanks in advance.

Merlijn's analysis is basically correct. I only have two things to add:

  1. This is not limited to the API; it also applies to Special:Categories. See https://commons.wikimedia.org/wiki/Special:Categories/Cultural_heritage_monuments_in_Dari
  2. This is actually the documented behavior: an existing category is defined as one that at some point had a page in it, regardless of whether a corresponding page in namespace 14 exists or ever existed. I don't know whether that definition is documented anywhere other than in the comments in maintenance/tables.sql.
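To make the documented rule concrete, here is a toy in-memory model of it. It is an illustration of the behavior described in this comment, not MediaWiki's actual code, and the class and method names are hypothetical. A row appears the first time a page is put in the category and is never deleted. Creating the description page alone adds nothing. This explains both the missing Darién category and the empty categories that still show up.

```python
class OldCategoryTable:
    """Toy model of the documented pre-patch rule: a category row is created the
    first time a page is put in the category and is never deleted afterwards."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}         # cat_title -> current member count
        self.category_pages: set[str] = set()  # titles with a page in namespace 14

    def create_category_page(self, title: str) -> None:
        # Creating the description page does not touch the category table.
        self.category_pages.add(title)

    def add_member(self, title: str) -> None:
        self.rows[title] = self.rows.get(title, 0) + 1

    def remove_member(self, title: str) -> None:
        # The count goes down, but the row survives even at zero.
        self.rows[title] -= 1

    def listed_categories(self) -> list[str]:
        # In this model, the category listings enumerate every row.
        return sorted(self.rows)


def demo() -> list[str]:
    t = OldCategoryTable()
    t.create_category_page("Page_but_never_used")  # like the Darién category
    t.add_member("Used_once")
    t.remove_member("Used_once")                   # now empty, but still listed
    return t.listed_categories()


print(demo())
```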

The question is whether this is just bug 1 (improve the documentation) or if the definition of "category" should be changed as suggested. I see no particular reason not to change the definition besides "it's more work", but others might know more about the situation than I do.

(In reply to comment #2)

  2. This is actually the documented behavior

Oh I see. Thanks for elaborating! Lowering priority.

To query all existing categories, you can use list=allpages with apnamespace=14.

However, a category that once contained at least one page, which has since been removed, is still listed by allcategories. Such empty categories can be filtered out with acmin=1 (bug 26411).
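The two workarounds above can be sketched as request URLs. This sketch only builds the URLs and does not send them. The function names are hypothetical. `apfrom`, `format=json` and the start title are illustrative additions, while `apnamespace` and `acmin` come from the comments above.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"
NS_CATEGORY = 14


def category_pages_url(start_title: str) -> str:
    # list=allpages restricted to namespace 14 enumerates category *pages*,
    # so a category page with no members (like the Darién one) is included.
    return API_ENDPOINT + "?" + urlencode({
        "action": "query", "list": "allpages",
        "apnamespace": NS_CATEGORY, "apfrom": start_title, "format": "json",
    })


def nonempty_categories_url(start_title: str) -> str:
    # list=allcategories with acmin=1 skips categories with fewer than one member,
    # i.e. the ones that were emptied after once having members.
    return API_ENDPOINT + "?" + urlencode({
        "action": "query", "list": "allcategories",
        "acmin": 1, "acfrom": start_title, "format": "json",
    })


print(category_pages_url("Cultural heritage monuments in Dari"))
print(nonempty_categories_url("Cultural heritage monuments in Dari"))
```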

Changing the definition sounds fine to me.

Change 298791 had a related patch set uploaded (by Anomie):
Only store currently-existing categories in the categories table

https://gerrit.wikimedia.org/r/298791

matmarex set Security to None.

Change 298791 merged by jenkins-bot:
Only store currently-existing categories in the categories table

https://gerrit.wikimedia.org/r/298791

Marking this resolved now. Use T140811 ("Run maintenance/cleanupEmptyCategories.php") to track running that maintenance script to clean up the existing data on WMF wikis.
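This is a sketch of the removal side of such a cleanup. It assumes, based on the patch title, that a category "currently exists" when it has a description page in namespace 14 or at least one member. It is not the code of cleanupEmptyCategories.php. The function name and cut-down schema are hypothetical, and in this toy schema cat_pages is treated as the total member count.

```python
import sqlite3

NS_CATEGORY = 14


def remove_empty_pageless_categories(conn: sqlite3.Connection) -> list[str]:
    """Delete category rows that have no members and no description page."""
    doomed = [t for (t,) in conn.execute(
        """
        SELECT c.cat_title FROM category AS c
        WHERE c.cat_pages = 0
          AND NOT EXISTS (SELECT 1 FROM page AS p
                          WHERE p.page_namespace = ?
                            AND p.page_title = c.cat_title)
        ORDER BY c.cat_title
        """, (NS_CATEGORY,))]
    conn.executemany("DELETE FROM category WHERE cat_title = ?",
                     [(t,) for t in doomed])
    return doomed


def demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE category (cat_id INTEGER PRIMARY KEY,
                               cat_title TEXT UNIQUE, cat_pages INTEGER);
    """)
    conn.execute("INSERT INTO page (page_namespace, page_title) "
                 "VALUES (14, 'Has_page_no_members')")
    conn.executemany("INSERT INTO category (cat_title, cat_pages) VALUES (?, ?)", [
        ("Has_members", 3),          # kept: has members
        ("Has_page_no_members", 0),  # kept: description page exists
        ("Empty_no_page", 0),        # removed: neither
    ])
    removed = remove_empty_pageless_categories(conn)
    kept = [t for (t,) in conn.execute("SELECT cat_title FROM category ORDER BY cat_title")]
    return removed, kept


print(demo())
```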