
Article count in Special:Statistics incorrect
Closed, Resolved, Public

Description

In kshwiki, we seem to have an issue with the article count:
10635 non-redirect pages in the page table
10596 pages according to the query in the maintenance script [1]
9972 shown in Special:Statistics

[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/updateArticleCount.inc?revision=50942&view=markup

Here are the queries I ran on the Toolserver database, with their results:

mysql> use kshwiki_p;
mysql> SELECT count( page_id ) FROM page
    WHERE ( `page_namespace` ) = 0
      AND ( `page_is_redirect` = 0 );
+------------------+
| count( page_id ) |
+------------------+
|            10635 |
+------------------+
1 row in set (10.90 sec)

mysql> SELECT COUNT(DISTINCT page_namespace, page_title)
    AS pagecount FROM `page`, `pagelinks`
    WHERE `pl_from` = `page_id` AND `page_namespace` = 0
      AND `page_is_redirect` = 0 AND `page_len` > 0;
+-----------+
| pagecount |
+-----------+
|     10596 |
+-----------+
1 row in set (2.93 sec)

mysql> SELECT ss_good_articles FROM site_stats;
+------------------+
| ss_good_articles |
+------------------+
|             9972 |
+------------------+
1 row in set (0.00 sec)
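The gap between the first two figures comes from the extra conditions in the maintenance script's query: besides being a non-redirect in the main namespace, a page must have page_len > 0 and at least one row in pagelinks. A small Python sketch of the two counting rules, using a hypothetical in-memory Page record instead of the real tables (the field names are illustrative, not MediaWiki's schema):

```python
from dataclasses import dataclass


# Hypothetical, simplified model of a `page` row plus the number of its
# `pagelinks` rows; field names are illustrative only.
@dataclass
class Page:
    namespace: int
    is_redirect: bool
    length: int      # corresponds to page_len
    link_count: int  # number of pagelinks rows with pl_from = page_id


def non_redirect_count(pages):
    """First query: main-namespace pages that are not redirects."""
    return sum(1 for p in pages if p.namespace == 0 and not p.is_redirect)


def good_article_count(pages):
    """Second query: additionally require page_len > 0 and at least one link."""
    return sum(
        1
        for p in pages
        if p.namespace == 0
        and not p.is_redirect
        and p.length > 0
        and p.link_count > 0
    )


pages = [
    Page(0, False, 1200, 5),  # ordinary article: counted by both rules
    Page(0, False, 300, 0),   # article without links: only in the first count
    Page(0, True, 40, 1),     # redirect: counted by neither
    Page(1, False, 500, 3),   # talk page: wrong namespace
]
print(non_redirect_count(pages), good_article_count(pages))  # 2 1
```

Neither rule explains the much lower 9972, which is the stored counter in site_stats rather than a fresh count.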

Imho, the difference is likely caused, at least in part, by a
software update. The old parser accepted some comments inside
redirect pages, which the new parser does not. Pages written that
way were therefore treated as redirects by the old parser and never
included in the good pages count. Now, when we detect such pages and
correct them, the new parser sees a non-redirect becoming a redirect,
and it decrements the count of good pages, even though those pages
were never counted. This may well have happened ~620 times; at least
that appears to be a very reasonable figure (10596 - 9972 = 624).
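A minimal simulation of this hypothesis, assuming the stored counter is only adjusted incrementally on each edit, and that both the old and the new text are classified with the new parser at that moment. The two redirect regexes are hypothetical stand-ins for the old and new parsers, chosen to mirror the piped form the bot command further down corrects; the real parser rules differ in detail:

```python
import re

# Hypothetical stand-ins for redirect detection. The old parser is assumed
# to accept "#REDIRECT [[Target|anything]]"; the stricter new one is assumed
# to accept only "#REDIRECT [[Target]]".
OLD_REDIRECT = re.compile(r"#redirect\s*\[\[[^\]]+\]\]", re.IGNORECASE)
NEW_REDIRECT = re.compile(r"#redirect\s*\[\[[^\]|]+\]\]", re.IGNORECASE)


def is_countable(text, redirect_re):
    # Simplification: every non-redirect text counts as a good article.
    return redirect_re.match(text) is None


def apply_edit(counter, old_text, new_text, redirect_re=NEW_REDIRECT):
    """Incremental update: classify old and new text with the current parser."""
    was = is_countable(old_text, redirect_re)
    now = is_countable(new_text, redirect_re)
    return counter + int(now) - int(was)


# Both pages were saved while the old parser was live, so only the first
# one was ever added to the counter.
article = "Some article text with a [[Link]]"
old_redirect = "#REDIRECT [[Target|note]]"
counter = sum(is_countable(t, OLD_REDIRECT) for t in (article, old_redirect))

# The bot rewrites the redirect into a form the new parser recognizes.
fixed = "#REDIRECT [[Target]]\n\n|note]]"
counter = apply_edit(counter, old_redirect, fixed)

actual = sum(is_countable(t, NEW_REDIRECT) for t in (article, fixed))
print(counter, actual)  # 0 1: the stored counter is now one too low
```

Each such fix lowers the stored counter by one without any real article disappearing, which is how a drift of several hundred could accumulate.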

Another issue, which I observed several months ago but failed
to report: to duplicate two pages, including their edit history,
for a page split, I either exported and re-imported them under new
page titles, or exported, renamed, and re-imported them under the
original page titles. This did not increase the page count.


Version: 1.16.x
Severity: major
URL: http://ksh.wikipedia.org/wiki/Spezial:Statistik?uselang=en

Details

Reference
bz19919

Related Objects

Status    Subtype   Assigned  Task
Open      Feature   None
Resolved            None

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 10:43 PM)
bzimport set Reference to bz19919.
bzimport added a subscriber: Unknown Object (MLST).

Now the "active users" count in the ksh Wikipedia has become -1,
while the "good articles" count was 106xx, at least for a short time.

Indeed, the above diagnosis of the parser difference and its
results is correct. With the problem diagnosed, and a newly made
"redirect" page generator for pywikipediabot available, we are
currently running a bot that fixes all those problem redirects.
It made the "good article" counter fall below zero at some point
around the transition from July 31st to August 1st (UTC), +/- an
hour, I believe.

This seems to have caused the "good articles" counter to be
re-evaluated, and the "active users" count to be set to -1 at the
same time.

(It is neither useful nor necessary to correct the statistics manually
while the bot is still running; I'll file a separate bug when it is done.)

See also bug 20017
See also bug 10834

The practical impact on kshwiki is now addressed in bug 20143,
now that the bot has fixed the problematic pages.

This may mean that this bug can be closed, although a possibly better
solution would have been to add the fix to the site update script when
switching to the new parser.

Here is the bot's command line:

python pywikipedia/replace.py -v -pt:6 -log -regex -nocase -always \
  -redirectonly:! -query:500 -summary:"Fix redirects for new parser" \
  '^(#redirect[^]|]+)\|' '\1]]\n\n|'
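To check what the pattern pair does before running the bot, it can be tried with Python's re module. This assumes replace.py applies the pair roughly like re.sub, with -nocase mapped to re.IGNORECASE; that is an assumption, not taken from the pywikipedia code. The sample page titles are made up:

```python
import re

# Pattern and replacement copied from the replace.py command line above.
PATTERN = re.compile(r"^(#redirect[^]|]+)\|", re.IGNORECASE)
REPLACEMENT = r"\1]]\n\n|"


def fix_redirect(text):
    # "^" anchors at the start of the page text, so at most one match.
    return PATTERN.sub(REPLACEMENT, text, count=1)


print(repr(fix_redirect("#REDIRECT [[Köln|Kölle]]")))
# '#REDIRECT [[Köln]]\n\n|Kölle]]'
print(repr(fix_redirect("#REDIRECT [[Köln]]")))
# '#REDIRECT [[Köln]]'  (a well-formed redirect is left unchanged)
```

The text after the pipe is kept below the redirect rather than deleted, so nothing on the page is lost by the fix.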

Note: This command line does not remove comments from the first argument
of the redirect. We did not have any, but they may cause trouble,
too. The same holds for comments between "redirect" and the opening
"[[" of the redirect target.

Note also: for wikis that have localized versions of the magic word
"redirect", this needs to be adapted to each localized version as well
as to the generic one.
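One way both notes could be handled, sketched in Python: build the pattern from a list of redirect aliases, and strip <!-- ... --> comments from the redirect head (up to the first "]]") before applying the bot's substitution. The alias list, including the German "#weiterleitung", and the helper names are hypothetical examples; the real aliases have to be taken from each wiki's configuration. A comment that itself contains "]]" would still defeat this sketch.

```python
import re

# Hypothetical list of redirect magic-word spellings; "#weiterleitung"
# (German) is only an example of a localized alias.
ALIASES = ["#redirect", "#weiterleitung"]

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def build_patterns(aliases):
    words = "|".join(re.escape(a) for a in aliases)
    # Redirect "head": any alias, up to the first "]]".
    head = re.compile(r"^(?:%s).*?\]\]" % words, re.IGNORECASE | re.DOTALL)
    # Same shape as the bot's pattern, but accepting any alias.
    piped = re.compile(r"^((?:%s)[^]|]+)\|" % words, re.IGNORECASE)
    return head, piped


HEAD, PIPED = build_patterns(ALIASES)


def fix_redirect(text):
    # Step 1: drop comments inside the redirect head only.
    text = HEAD.sub(lambda m: COMMENT.sub("", m.group(0)), text, count=1)
    # Step 2: close the link before the first "|", as the bot does.
    return PIPED.sub(r"\1]]\n\n|", text, count=1)


print(repr(fix_redirect("#WEITERLEITUNG <!-- a -->[[Kölle|Köln]]")))
# '#WEITERLEITUNG [[Kölle]]\n\n|Köln]]'
```

Comments elsewhere on the page are left alone, since only the head before the first "]]" is cleaned.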

r88250 changes Special:Undelete to "Only increment the page count if the page has been created; also simplified a bit the code".

Might help fix the issue :)

It now says 2,626 articles vs. 2,681 found by a Toolserver query. I'm calling this fixed by the several recent fixes to the counting method.
Please open new, more specific bugs if other issues arise or persist.