Special:Import increases NUMBEROFARTICLES for each Revision instead of each Article
Closed, ResolvedPublic

Description

Recently we've got Special:Import enabled for transwiki imports from some other MediaWiki wikis (https://bugzilla.wikimedia.org/show_bug.cgi?id=38943).

But apparently the article counter (NUMBEROFARTICLES) is not incremented once for each imported article, but once for each revision of the imported article!

So if I import an article with 100 revisions in the version history NUMBEROFARTICLES goes up by 100 instead of by 1.

I hope this can be fixed and that the article counter can be reset to its actual value (it seems this needs to be done by someone with shell access [according to this related but different bug report: https://bugzilla.wikimedia.org/show_bug.cgi?id=5703]). I assume this also affects other wikis, and that it went unnoticed there only because those wikis have higher article creation rates, so the unusual bumps in the article count were masked by regular article creations. Please check and update the article counters of the other wikis too if this applies.

Thank you very much for your time
User:Slomox
Marcus Buck


Version: 1.24rc
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=5703

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 1:05 AM
bzimport set Reference to bz40009.
bzimport added a subscriber: Unknown Object (MLST).

I just realized that I only indirectly mentioned (by linking bug:38943) that I am speaking about nds.wikipedia.org. So, yeah, it's nds.wp where we've observed the problem.

*** Bug 45269 has been marked as a duplicate of this bug. ***

stefan wrote:

The same happened to Wikivoyage. The number of articles jumped from 13,230 (2013-03-22) to 13,539 (2013-03-29) to 14,160 (2013-04-01) after importing two or three articles.

Bug 5703 ("Special:Import needs to update site statistics") might be related, but it states the contrary (but might not be valid anymore, not clear from comment 18 and comment 21 there).

jessen-amrum wrote:

Same problem at http://frr.wikipedia.org/ . The counter jumped from ~3,400 to ~5,600 after several imports on July 2. The counter works correctly when I FIRST import a page and THEN work on that imported page, rename it, etc. But the counter is obviously counting every single revision as an article when I merge an imported article into an existing article.

I'd be glad, if there's a chance to reset the NUMBEROFPAGES in the nr:0 of frrwiki to the correct number.

Thanks for your attention
Murma174

http://frr.wikipedia.org/wiki/Spezial:Statistik
http://frr.wikipedia.org/wiki/Benutzer_Diskussion:Murma174

jessen-amrum wrote:

In addition to Comment 6:

Although the counter is working correctly, when I'm first importing the page, the imported page does not show up in http://frr.wikipedia.org/wiki/Spezial:Neue_Seiten

Murma174

jessen-amrum wrote:

Addition to Comment 6:
I couldn't reproduce the bug mentioned above. Today it worked fine, when I imported a template from dewiki into an existing template from frrwiki. The counter did not count, and that is correct in this case.

Murma174

jessen-amrum wrote:

Addition to Comment 6:
Important information on the bug!

The bug occurs, when I'm importing a page and leave the checkbox "importing all revisions" checked.

When I uncheck the checkbox and first import only the last revision, and afterwards import the complete revision history, the bug does not occur!

A good workaround to avoid this bug could be, to leave this checkbox UNCHECKED as standard setting.

jessen-amrum wrote:

More information on the bug:

The counter adds [n-1] articles, when there are [n] revisions imported.

Example: Importing an article with 42 revisions increases the number of articles by 41.

The same thing occurs on http://it.wikivoyage.org (I think on all voys in general, if not on all wikis).

I've compared all the voy statistics with the real article counts (without redirects). No language has the right number. Some of them have a higher article count due to the bug described above, while others have a lower count, but I can't say why.

Can someone take charge of resolving this bug?

adehertogh wrote:

Same problem on http://fr.wikivoyage.org. 1 article imported and the counter grows by 60 articles!

(In reply to Andre Klapper from comment #15)

> If you want to speed this up: Patches are welcome:
> http://www.mediawiki.org/wiki/Developer_access

I know very little PHP, but I would recommend increasing its priority.

Considering that no one has solved this bug in 1.5 years, is it possible to reset/reprocess the various https://en.wikivoyage.org/wiki/Special:Statistics (I mean for each language)? The problem is that those numbers do not reflect the real number of articles, images, etc.

At least the current discrepancy will be mitigated.

(In reply to Andyrom75 from comment #17)

> Considering that no one in 1.5y has solved this bug, is it possible to
> reset/reprocessed the various
> https://en.wikivoyage.org/wiki/Special:Statistics (I mean for each
> language)? The problem is that those numbers do not reflect the real amount
> of articles, images, etc...
>
> At least the current discrepancy will be mitigated.

That's bug 57788

Confirming that this bug exists on the WMF cluster, but I cannot reproduce it on a local MediaWiki installation. I wonder if it is something to do with memcached (which I do not have locally)?

I observe that this bug does *not* occur when importing additional revisions to an existing page (see [[testwiki:Kennebunk Free Library]]).

I now suspect this is because, upon importing each revision, Import#importOldRevision creates a new WikiPage object to check whether the page being imported already exists on the target wiki. This existence check is performed using the slave database servers. If the page does not exist, the site-wide article count is incremented.

Because this process happens several times in quick succession (once for each revision of the new page), the slave databases have not caught up to the fact that the page was created upon the import of the first revision. So the site-wide article count is incremented once for each revision.

(This also explains why I couldn't reproduce the issue locally! I don't have a master/slave setup.)
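The theory above can be illustrated with a toy model. This is a minimal sketch in Python rather than MediaWiki's PHP, and every name in it (`ToyWiki`, `import_revision`, `import_page`, `use_master`) is hypothetical; it only assumes, as described above, that the existence check runs once per imported revision and that the replica does not see the new page until after the import has finished.

```python
class ToyWiki:
    """Hypothetical stand-in for a wiki with one master and one lagging replica."""

    def __init__(self, existing_titles=()):
        self.master_pages = set(existing_titles)
        self.replica_pages = set(existing_titles)  # stale copy of master_pages
        self.article_count = len(self.master_pages)

    def sync_replica(self):
        # Replication "catches up" only when this is called.
        self.replica_pages = set(self.master_pages)

    def import_revision(self, title, use_master):
        # Existence check, done once per revision (assumption from the comment above).
        source = self.master_pages if use_master else self.replica_pages
        if title not in source:
            self.article_count += 1  # page looks new, so count it as a new article
        self.master_pages.add(title)  # the write itself always goes to the master


def import_page(wiki, title, revisions, use_master):
    """Import `revisions` revisions of one page and return the article-count increase."""
    before = wiki.article_count
    for _ in range(revisions):
        wiki.import_revision(title, use_master)
    wiki.sync_replica()  # the replica catches up only after the whole import
    return wiki.article_count - before
```

With the stale replica, `import_page(ToyWiki(), "Example", 100, use_master=False)` returns 100; checking against the master instead returns 1. Importing into a page that already exists returns 0 in both cases, which is consistent with the observation that additional revisions for an existing page do not trigger the bug.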

If my theory is right, a workaround for this issue, until it can be resolved, would be to create the page you want to import (with bogus content) before importing. Then, once the import is done, you can delete the page and selectively restore all revisions except the bogus revision.

Change 148309 had a related patch set uploaded by TTO:
Use master DB to check for page existence during import

https://gerrit.wikimedia.org/r/148309

Change 148309 merged by jenkins-bot:
Use master DB to check for page existence during import

https://gerrit.wikimedia.org/r/148309

Tentatively marking fixed. If this issue still shows up on Wikimedia wikis after 7 August, we can reopen this and investigate further.

This, that and the other, is your patch already running? If so, we can test it by importing a page. If the article counter increases by 1 instead of by the total number of revisions, the test has succeeded and the bug is definitely fixed.

(In reply to Andyrom75 from comment #24)

> This, that and the other, your patch is already running?

Not yet; you'll have to wait until at least 4 August to be able to try it out on Wikivoyage editions.

This, that and the other, I've reopened the bug because a few pages were imported on zh:voy on Aug 17th and the page count increased drastically, by almost 200 units, while in NS:0 I can see only 935 pages.

Could you take a look at it? Thanks

Just tested this on MediaWiki.org, and it worked correctly (page count increased by 1, number of edits increased by the right amount).

Apparently still happening; see bug 57788 comment 3. No idea what could be causing it. Perhaps we need to add some logging to production MediaWiki...

Is this something you can take care of, or do you need support from someone else?

I don't have time at the moment to look into this. It seems to me that the erroneous increasing of statistics upon import is happening less often than it was before, but it obviously still needs to be fixed.

TTO, do you have some of your precious spare time to take charge of this bug again too?

Once it is solved, we could run the statistics script again to reset the count (hopefully for the last time).

Thanks

I can confirm that this bug is still occurring. I imported all history of [[Ratnapura Portuguese fort]] (4 countable revisions) to testwiki, and content page count increased by 4.

On a local test installation, import of the same page increased the content page count by 1. So it is not occurring on a simple wiki, but it is occurring on a complex setup (WMF cluster).

This makes the issue very difficult for me to debug.

"Very difficult" doesn't mean "impossible", so I keep on relying on your troubleshooting skills :-P

PS
I don't understand what you mean by "simple wiki vs. WMF cluster". I suppose that the other bug also affected all the wikis.

Change 173779 had a related patch set uploaded by TTO:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173779

Ignore that, the bot wasn't meant to pick that up

Change 173783 had a related patch set uploaded by Ori.livneh:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173783

Change 173779 merged by jenkins-bot:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173779

The problem seems to be around here [1]: the database is being queried for page links, to determine whether the page is countable before each revision is imported. However, this invariably returns false when called at [2]. Obviously querying the slave is pretty pointless in this context, but querying the master isn't much better, because (IIRC) page link updates are done via the job queue.

I've been thinking about this for some hours now, and I think I have an acceptable way of fixing this. Patch is on its way.

I would dearly love to get rid of that horrible "stateless" WikiRevision class and replace it with something that is context-aware...

[1] http://git.wikimedia.org/blob/mediawiki%2Fcore.git/713ee118efd2d99b9700124b605bfc1ca50939bb/includes%2Fpage%2FWikiPage.php#L867
[2] http://git.wikimedia.org/blob/mediawiki%2Fcore.git/713ee118efd2d99b9700124b605bfc1ca50939bb/includes%2FImport.php#L1478
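To make the failure mode described in this comment concrete, here is a minimal Python sketch (not MediaWiki code; `ToyPage`, `is_countable_from_db`, `import_revisions` and the `"[[" in text` link test are all hypothetical simplifications). It assumes only what the comment states: the "was this page countable?" check reads the page-links data before each imported revision, and that data is not updated until deferred jobs run, so the check keeps returning false.

```python
class ToyPage:
    """Hypothetical page whose link data is only updated by deferred jobs."""

    def __init__(self):
        self.link_table_has_links = False  # stands in for the stored page links
        self.pending_link_updates = 0      # stands in for queued link-update jobs

    def is_countable_from_db(self):
        # Countability as judged from stored link data only.
        return self.link_table_has_links

    def run_job_queue(self):
        # Deferred link updates are applied only when the queue is processed.
        if self.pending_link_updates:
            self.link_table_has_links = True
            self.pending_link_updates = 0


def import_revisions(page, revision_texts):
    """Return the change applied to the article counter during the import."""
    delta = 0
    for text in revision_texts:
        was_countable = page.is_countable_from_db()  # stays False during the import
        now_countable = "[[" in text                 # toy test for "has a wiki link"
        delta += int(now_countable) - int(was_countable)
        page.pending_link_updates += 1               # link update is queued, not applied
    return delta
```

In this model, `import_revisions(ToyPage(), ["See [[Foo]]"] * 4)` returns 4, matching the report of four countable revisions raising the content page count by 4. Once `run_job_queue()` has been called, a further import of the same kind returns 0, because the stored link data finally reflects the page.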

I haven't reviewed the scripts but your idea seems to be reasonable.

Change 174386 had a related patch set uploaded by TTO:
Cache countable statistics to prevent multiple counting on import

https://gerrit.wikimedia.org/r/174386
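The idea named in the patch title can be sketched as follows. This is an illustration only, written in Python under stated assumptions, and not a description of what change 174386 actually does: the function `import_with_cache`, the `db_countable` mapping and the `"[[" in text` link test are hypothetical. The sketch remembers each page's countable state in memory for the duration of the import, so later revisions are compared against the state left by the previous revision rather than against stale stored data.

```python
def import_with_cache(revisions_by_title, db_countable):
    """Return the article-count change for an import, using an in-memory cache.

    revisions_by_title: dict mapping page title -> list of revision texts (oldest first)
    db_countable: dict mapping page title -> stored (possibly stale) countable flag
    """
    countable_cache = {}
    delta = 0
    for title, texts in revisions_by_title.items():
        for text in texts:
            if title not in countable_cache:
                # Consult the stored state only once per page.
                countable_cache[title] = db_countable.get(title, False)
            was_countable = countable_cache[title]
            now_countable = "[[" in text  # toy test for "has a wiki link"
            delta += int(now_countable) - int(was_countable)
            countable_cache[title] = now_countable  # remember for the next revision
    return delta
```

Under these assumptions, importing 42 countable revisions of a new page changes the counter by 1 rather than once per revision, and importing into a page that is already countable changes it by 0.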

When will this patch be testable?

Once someone reviews it, which is, as you have found out by now, a very slow process.

Ok. BTW I've taken the chance to start getting familiar with this new bug tracking tool :-)

Hi!

After importing 6 articles on eswikivoyage, our counter went from 1804 to 4081 instantly.

Imported pages contained a total of 2296 revisions and the article count increased by 2277.

Is there any way to reset the counter to the real number?

Log: https://es.wikivoyage.org/w/index.php?title=Especial%3ARegistro&type=import&user=Alan

Thanks.

The page count can easily be reset, but in any case TTO's patch must be reviewed and deployed ASAP to (potentially) prevent this from happening again.

Can someone support it?

I'm working on a new patch, as the old one was inadequate. Hopefully it will be "third time lucky"!

I can see why no-one looked at this for two years. The way MediaWiki has been written makes bugs like this very difficult to fix. Certain parts of the MediaWiki code try to do too many things at once. I'm trying my best, though.

Thanks TTO. Your help on this task is really essential! We'll wait patiently for your new patch.

Change 174386 had a related patch set uploaded (by TTO):
Cache countable statistics to prevent multiple counting on import

https://gerrit.wikimedia.org/r/174386

Patch-For-Review

Change 173783 abandoned by TTO:
Debugging statements to try to diagnose bug 40009

https://gerrit.wikimedia.org/r/173783

Change 174386 merged by Legoktm:
Cache countable statistics to prevent multiple counting on import

https://gerrit.wikimedia.org/r/174386

TTO claimed this task.

Let's cross our fingers and hope that this is really fixed this time! I think it might well be.

TTO, is it already testable, or do we have to wait a certain number of days?

If you are talking about Wikivoyage, you will need to wait until 24 February. For Wikipedias, you will have to wait until 25 February. The deployments occur late in the day in each case (typically around 20:00 UTC).

Are there plans to rebuild statistics on the affected wikis? I'm asking because dewiki (probably because of this bug, as we import a lot) has a number of articles that is about 25000 too high (http://quarry.wmflabs.org/query/1222), but OTOH users of dewiki probably like that wrong number, as we just overtook nlwiki again (https://meta.wikimedia.org/wiki/List_of_Wikipedias).

Schnark, bug T68867 requests that updateArticleCount.php be run periodically for the Wikivoyage projects (although a weird date has been chosen for a monthly script, considering that February has 29 days only once every 4 years :-D).
I don't know whether there's something similar for the other sister projects (Wikipedia included). If not, there should be.

PS
TTO thanks for your answer.

TTO, today a user imported two pages on it:voy, but unfortunately it seems that the bug has "changed its behaviour" :-)

Now, instead of adding N to the article count, it doesn't add anything. Furthermore, nothing is tracked in RecentChanges, and nothing is tracked in the user's import history either.

The pages are the following:

  1. https://it.wikivoyage.org/wiki/Albaredo
  2. https://it.wikivoyage.org/wiki/Lissa

Could you check it?

PS We did the test today because the script that resets/recounts the stats should run tomorrow.

It's like playing a game of whack-a-mole :(

Whenever I try to import something on the WMF cluster, I get a "503 Service Unavailable" error. The page revisions are imported correctly but the RC event/log entry/"X revisions imported" revision are not created. I think that is a different bug. I don't have time to file it right now.

"whack-a-mole" ROTFL :-DDDDDDDDD

Thanks for opening a dedicated bug-ticket.

PS I can confirm that this user also told me about that server error.