
Category count of file numbers is wrong on first page
Closed, ResolvedPublic

Description

The category count of file numbers is wrong on the first page:

https://commons.wikimedia.org/wiki/Category:GFDL :

"The following 201 files are in the current category. "

The subsequent pages show the correct number:

https://commons.wikimedia.org/w/index.php?title=Category:GFDL&uselang=en&filefrom=%22X%22.JPG#mw-category-media

"The following 201 files are in this category, out of 3,078,868 total."

I think this is a code regression.


Version: 1.18.x
Severity: normal

Details

Reference
bz31732

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:49 PM
bzimport set Reference to bz31732.
bzimport added a subscriber: Unknown Object (MLST).

Also, that should be following 200, not 201 articles.

"The following 201 files are in the current category. "

is the correct thing to say if we don't know how many files are in the category in total. However, we clearly must know the total, since it is shown on the second page.

Ok, so what's happening:

For some reason we are returning 201 results for the image gallery section of the category instead of 200 like we should. (We fetch 201 images so we know whether to make the continue link, but we normally shouldn't display image number 201.) This causes MediaWiki to detect an inconsistency in the category table (i.e. the total counts) and not display the total number of images in that category. Totals are displayed on the next page, as having an offset disables many of the consistency checks (since they don't make sense in that case).

The issue is not present in NOGALLERY cats (e.g. [[commons:Category:Polish_pronunciation]]). At first glance it does not seem to be CategoryTree related (the &notree=true URL parameter didn't affect it). I'm also having trouble reproducing locally (even on my 1.18wmf1 checkout, but I am using a much lower $wgCategoryPagingLimit on my local checkout).

Where this gets really weird is that the issue is not present on enwikipedia: [[Category:Diagram_images_that_should_be_in_SVG_format]] returns only 200 images per page, as expected. (But then again, [[commons:category:Unidentified_sunset_locations]] is not affected either, so beats me.)
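The "fetch one extra row" paging pattern described above can be sketched roughly as follows. This is a minimal Python illustration, not MediaWiki's actual PHP code; `fetch_page`, `LIMIT` and the in-memory `titles` list are hypothetical stand-ins.

```python
LIMIT = 200  # hypothetical stand-in for $wgCategoryPagingLimit


def fetch_page(sorted_titles, start=0, limit=LIMIT):
    """Return (titles_to_display, has_more) for one category page.

    We ask for limit + 1 rows. The extra row only signals that a
    "next page" link is needed; it must not be displayed.
    """
    rows = sorted_titles[start:start + limit + 1]
    has_more = len(rows) > limit
    return rows[:limit], has_more


# Illustrative data: 450 made-up file titles, already sorted.
titles = [f"File:{i:05d}.jpg" for i in range(450)]
shown, has_more = fetch_page(titles)
last_shown, last_has_more = fetch_page(titles, start=400)
```

In this sketch the first page displays exactly `LIMIT` titles even though `LIMIT + 1` were fetched, which is the behaviour the gallery section failed to show here.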

Based on the toolserver db, this looks to be caused by inconsistencies in the commons db.

File:Fresco with Trompe l'oeuil - Andrea Pozzo -Jesuit Church Vienna.jpg

has page_namespace of 6 (NS_FILE) and a page_id of 2602773, but several of its categorylinks have a cl_type of "page" instead of "file":

mysql> select cl_to, cl_type from categorylinks where cl_from=2602773;
+---------------------------------------+---------+
| cl_to                                 | cl_type |
+---------------------------------------+---------+
| Andrea_Pozzo                          | page    |
| CC-BY-2.5                             | page    |
| CC-BY-SA-3.0-migrated                 | page    |
| GFDL                                  | page    |
| Jesuit_Church,_Vienna                 | page    |
| License_migration_redundant           | page    |
| Media_with_locations                  | file    |
| Quality_images                        | page    |
| Quality_images_of_Austria             | page    |
| Quality_images_of_churches_in_Austria | file    |
| Self-published_work                   | page    |
| Trompe_l'oeil_in_Austria              | page    |
+---------------------------------------+---------+
12 rows in set (0.00 sec)

Thus, when MediaWiki does the query, it gets this image as part of the query for normal pages, but then sorts it into the image section, since it uses page_namespace to divide results between the sections. This results in the image section having more than 200 images. The code that checks whether the counts in the category table are consistent sees that the number of images returned does not equal $wgCategoryPagingLimit (the gist of the code seems to suggest that < rather than != is the condition really being looked for), but that there should be more images in total than the paging limit, and that no offset has been specified, so it thinks the category counts are wrong.
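To make the mechanism above concrete, here is a rough Python sketch of it. It is a paraphrase of the behaviour described in this comment, not the real MediaWiki code: `build_sections`, `total_is_shown`, the row layout and the sample data are all hypothetical, and the check is only an approximation of the consistency test.

```python
NS_FILE = 6   # page_namespace of files, as quoted above
LIMIT = 200   # hypothetical stand-in for $wgCategoryPagingLimit


def build_sections(rows, limit=LIMIT):
    """rows: list of (title, page_namespace, cl_type), already sorted."""
    sections = {"page": [], "file": []}
    for wanted_type in ("page", "file"):
        # One query per cl_type, each displaying at most `limit` rows.
        matching = [r for r in rows if r[2] == wanted_type][:limit]
        for title, namespace, _cl_type in matching:
            # The section is chosen from page_namespace, not cl_type.
            key = "file" if namespace == NS_FILE else "page"
            sections[key].append(title)
    return sections


def total_is_shown(result_count, db_count, limit=LIMIT, has_offset=False):
    """Approximate paraphrase of the consistency check described above."""
    if db_count == result_count:
        return True
    return (result_count == limit or has_offset) and db_count > result_count


# 300 correctly labelled files, 5 ordinary pages, 1 file labelled "page".
rows = [(f"File:{i:03d}.jpg", NS_FILE, "file") for i in range(300)]
rows += [(f"Page_{i}", 0, "page") for i in range(5)]
rows.append(("File:Mislabelled.jpg", NS_FILE, "page"))

sections = build_sections(rows)
first_page_ok = total_is_shown(len(sections["file"]), db_count=301)
```

With this data the file section ends up with 201 entries, so the check fails on the first page, while the same call with `has_offset=True` passes, which matches the totals reappearing on later pages.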

The code should possibly handle this situation better, but I'm not entirely sure what the right way to handle it is.

sumanah wrote:

Per IRC discussion today, removing the 1.18 milestone as this does not seem urgent enough for us to aim to fix this by the 1.18 release.

(In reply to comment #4)

File:Fresco with Trompe l'oeuil - Andrea Pozzo -Jesuit Church Vienna.jpg

has page_namespace of 6 (NS_FILE) and a page_id of 2602773, but several of its
categorylinks have a cl_type of "page" instead of "file":

Are those affected pages using Media: links by any chance?

(In reply to comment #6)

(In reply to comment #4)

File:Fresco with Trompe l'oeuil - Andrea Pozzo -Jesuit Church Vienna.jpg

has page_namespace of 6 (NS_FILE) and a page_id of 2602773, but several of its
categorylinks have a cl_type of "page" instead of "file":

Are those affected pages using Media: links by any chance?

I'm unclear how you can have a category link in the media namespace(?).

It looks almost like a failure of the schema update script ("page" is the default cl_type if not set otherwise, the categorylinks rows didn't have a cl_collation entry, cl_sortkey still has the namespace, etc.).

For reference, this is what a relevant entry in the categorylinks table looked like (taken from toolserver):

          cl_from: 2602773
            cl_to: Andrea_Pozzo
       cl_sortkey: File:Fresco with Trompe l'oeuil - Andrea Pozzo -Jesuit Church Vienna.jpg
     cl_timestamp: 2011-07-07 15:01:48
cl_sortkey_prefix:
     cl_collation:
          cl_type: page

On the one hand, 2011-07-07 15:01:48 is after the categorylinks schema update. On the other hand, the image wasn't edited anywhere remotely near that date, so I still think it's just an artifact of updateCollations.php doing something wrong for one individual file.


Note, I did some dummy edits to [[commons:File:Fresco with Trompe l'oeuil - Andrea Pozzo -Jesuit Church Vienna.jpg]] since there's no sense in forcing commons to live with the bug while we think of what to do about it (I still feel MediaWiki should handle the situation more gracefully). To reproduce you simply have to manually change the cl_type from "file" to "page" on some categorylink entry on your test wiki.
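For anyone wanting to look for other affected rows, a query along these lines should list categorylinks entries of file pages whose cl_type is not "file". The sketch below runs it from Python against an in-memory SQLite database with a heavily cut-down schema (only the columns mentioned in this bug), so the table definitions are an assumption for illustration and not the real MediaWiki schema. The two sample rows are taken from the output quoted earlier.

```python
import sqlite3

NS_FILE = 6  # page_namespace of files, as quoted above

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the page and categorylinks tables.
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,
        page_title TEXT
    );
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
""")
conn.execute(
    "INSERT INTO page VALUES (?, ?, ?)",
    (2602773, NS_FILE,
     "Fresco_with_Trompe_l'oeuil_-_Andrea_Pozzo_-Jesuit_Church_Vienna.jpg"),
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [
        (2602773, "Andrea_Pozzo", "page"),          # inconsistent row
        (2602773, "Media_with_locations", "file"),  # consistent row
    ],
)

# File pages whose category links are not typed as "file".
mismatched = conn.execute(
    """
    SELECT cl_from, cl_to, cl_type
    FROM categorylinks
    JOIN page ON page_id = cl_from
    WHERE page_namespace = ? AND cl_type <> 'file'
    ORDER BY cl_to
    """,
    (NS_FILE,),
).fetchall()
```

On a real wiki the same SELECT would have to be run against a replica of the actual tables, and it may be slow on a table the size of commons' categorylinks.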

(In reply to comment #7)

I'm unclear how you can have a category link in the media namespace(?).

Nvm, I was confusing this with two other bugs.

(mass change)

  • 1.18.0 and 1.19.0 have been released already.
  • Moving open bugs targeted for 1.18.0 or 1.19.0 to Mysterious future.
  • Please re-target them to 1.19.x or 1.20.0 if needed.

Are there any known instances of this bug? If not I suggest closing as a one time db referential integrity issue.

(In reply to comment #10)

Are there any known instances of this bug? If not I suggest closing as a one
time db referential integrity issue.

Looks fixed somehow now. Closing.