Page view anomaly starting September 27-28
Closed, Resolved, Public

Description

A number of people have noted a problem with the page view stats here, starting September 27 or so:
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm

Several people at WMF are investigating. This is our public tracking area for this.


Version: unspecified
Severity: enhancement
URL: http://lists.wikimedia.org/pipermail/foundation-l/2010-October/061631.html

Details

Reference
bz25564

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:21 PM
bzimport set Reference to bz25564.

This issue is currently being tracked here:
http://rt.wikimedia.org/Ticket/Display.html?id=328

(though there's no particular reason why this is confidential, so we may just move here.)

Here are some comments from our previous discussion:

Nimish: "We've changed the way banners are loaded from the centralnotice interface several times this month, and while we were testing some of these different methods we'd run blank banners on enwiki. This would mean that there were up to 3 extra resource requests per pageview during tests and during the actual fundraisers. Other than that, none of the logging infrastructure has changed."

He later added: "Hm... so the requests looked like PHP requests: bannerImpression.php?return=js&[...tracking parameters here...]

Would the script recognize these? I'm not sure of anything else we've done that would skew things otherwise."

erikzachte wrote:

As posted on foundation-l (I'll update here from now on)

The following two charts show daily page views as counted by our squid log
post-processing server.
Same data, with different time scales: one for 13 months, one for 34 months.

http://tinyurl.com/2w9pcub
http://tinyurl.com/3ytachh

As you can see, there's been a sudden, dramatic *reported* increase in page
views on 9/27-9/28.

Mind you we have no idea yet whether these are actual extra messages
received or some internal artifact.

Incidentally the earlier server congestion on locke from mid November to mid
July also shows clearly.

It may be useful to generate this chart at regular intervals.

As Domas mentioned on IRC, the cause is glaringly obvious: CentralNotice uses a special page to load JS from: http://en.wikipedia.org/wiki/Special:BannerController?283-4 . This page is therefore getting lots and lots of hits on every wiki: see http://stats.grok.se/en/201010/Special%3ABannerController and http://stats.grok.se/en/201009/Special%3ABannerController .

Assigning to Kaldari, who has agreed to fix this if someone will definitively confirm that changing this URL:
http://en.wikipedia.org/wiki/Special:BannerController

...to this:
http://en.wikipedia.org/w/index.php?title=Special:BannerController

...will fix the problem.
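
For illustration, the proposed URL change can be sketched as a small rewrite. This is only a sketch under assumptions, not the CentralNotice code: the function name `to_index_php_url` is hypothetical, and it assumes the short form always uses the `/wiki/` prefix and the long form always uses `/w/index.php?title=`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_index_php_url(url: str) -> str:
    """Rewrite /wiki/<Title> into /w/index.php?title=<Title>.

    Hypothetical helper: assumes the "/wiki/" short-URL prefix and the
    "/w/index.php" script path, as in the two URLs quoted above.
    """
    parts = urlsplit(url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        return url  # not the short form; leave it unchanged

    title = parts.path[len(prefix):]
    query = "title=" + title
    if parts.query:
        # Keep any existing query string (e.g. a cache-busting suffix).
        query += "&" + parts.query
    return urlunsplit(
        (parts.scheme, parts.netloc, "/w/index.php", query, parts.fragment)
    )


print(to_index_php_url("http://en.wikipedia.org/wiki/Special:BannerController"))
```

URLs that are not in the short form are returned untouched, so applying the rewrite twice is harmless.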

Update: Roan confirmed that changing the call should fix the problem. The issue is that /a/webstats/bin/filter needs to be able to recognize a page as something *not* to log. The current (undocumented?) convention for doing that is to use w/index.php?title=Special:Foo instead of wiki/Special:Foo . The source code for /a/webstats/bin/filter is (I believe):
http://svn.wikimedia.org/viewvc/mediawiki/trunk/webstatscollector/filter.c?view=markup

...which, based on my skimming, seems right (parse_url returns false if there's no "/wiki" part).
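
A rough model of that convention, for anyone following along. This is not the logic of filter.c, which does more than this; it only mirrors the one rule described above (a request is counted only if its path has the "/wiki/" part). The function name `counts_as_page_view` is hypothetical.

```python
from urllib.parse import urlsplit


def counts_as_page_view(url: str) -> bool:
    """Return True if the URL would be counted under the assumed rule.

    Assumption: only paths starting with "/wiki/" are counted, so
    /w/index.php?title=... requests are skipped.
    """
    return urlsplit(url).path.startswith("/wiki/")


for url in (
    "http://en.wikipedia.org/wiki/Special:BannerController",
    "http://en.wikipedia.org/w/index.php?title=Special:BannerController",
):
    print(url, counts_as_page_view(url))
```

Under this rule the first URL is counted as a page view and the second is not, which is why switching the banner loader to the index.php form should take it out of the stats.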

So, the ball is currently in Ryan K's court to fix this. Since this is a fix that affects quite a few production scripts, it could be a bit before we get it rolled out, but we've not locked down on a plan yet.

I have a fix for this checked in now (r75181 and r75222). I'll try to get it tested and pushed out on Monday if possible. This should fix the stats issue on all wikis except for meta. I'm going to do the fix for meta separately since it is less critical and more difficult to implement.

To explain, the stats on meta are currently extra inflated while a centralNotice campaign is actually running. To fix it, however, I'll need Tomasz or someone else to change a config setting at the same time that we deploy a code fix. If the first fix goes smoothly, I'll try to fix meta immediately afterward.

Just pending code review and then we can get this out.

Still working on an issue with Roan. We can probably get this pushed out after I hear back from him.

The code is ready, but we missed our window for deploying it. We'll have to wait until after the All Staff Meeting :(

Tentatively scheduled for 11/1 afternoon PDT.

ngautam wrote:

Checked: the new URLs as they appear on test.wikipedia are filtered correctly by the filter on locke.

I've checked in the 2nd part of the fix, for fixing the stats on meta. I have to coordinate with Tomasz for deployment since it requires changing a centralNotice config var first on the live servers.

[mass-moving wikistats reports from Wikimedia→Statistics to Analytics→Wikistats to have stats issues under one Bugzilla product (see bug 42088) - sorry for the bugspam!]