
www.f / foundationwiki (wikimediafoundation.org) pageviews low, underreporting?
Closed, Resolved, Public

Description

Pageviews for www.f (foundationwiki / [[wmf:]]) are far too low to be true.
I don't know if there's a way to test it, but here's what I found (copied from the thread linked in the URL field):

  1. Main page visits are four orders of magnitude lower than on Meta: http://stats.grok.se/meta.m/top

  2. To check, I set Firefox to reload a page (IIRC wikimediafoundation.org/wiki/Canwebelieveinstats) every 2 s for several hours, and I didn't find it in the raw data.

  3. A loop of curl downloads of an existing page didn't get a single "view" reported in the raw data for the page I tried.

Version: unspecified
Severity: normal
URL: http://lists.wikimedia.org/pipermail/analytics/2013-May/000651.html

Details

Reference
bz49266

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 22 2014, 1:56 AM
bzimport set Reference to bz49266.
bzimport added a subscriber: Unknown Object (MLST).

As Tilman mentioned on the mailing list, there are even a couple of bugs that have increased the actual number of visits to the wiki: bug 52206 (though mostly to URLs with parameters, which are not included in the stats) and bug 56006.

Upon inspection of the raw files, it turns out the actual views now appear to be recorded as "m.f" instead: http://stats.grok.se/m.f/top
What's www.f, then? Are visits to Apache redirects recorded separately? :/

It is important for Communications that we get this resolved, is there anything I can do?

Tbayer set Security to None.
Tbayer added a subscriber: DarTar.

Some more evidence that something is wrong here:

The rise in the bottom diagram here, which back in 2013 I suspected to be a temporary artefact (in the email that Nemo referred to above), turned out to be permanent: since July 2014 it has shown more than 2.5 million pageviews per month for foundationwiki.

On the other hand, I looked up the pageviews under the new definition (Cube v0.4 on Pentaho), which are implausibly low:

YearMonth    wikimediafoundation
2013-05-01    4000.0
2013-06-01    4000.0
2013-07-01    3000.0
2013-08-01    5000.0
2013-09-01    7000.0
2013-10-01    9000.0
2013-11-01    5000.0
2013-12-01    8000.0
2014-01-01    7000.0
2014-02-01   15000.0
2014-03-01    3000.0
2014-04-01    2000.0
2014-05-01    7000.0
2014-06-01    8000.0
2014-07-01    1000.0
2014-08-01    5000.0
2014-09-01    3000.0
2014-10-01    3000.0
2014-11-01    3000.0
2014-12-01    4000.0

(I understand that these are 1:1000 sampled, meaning that, e.g., in December 2014 only four pageviews on wikimediafoundation.org caused an increase in the counter.)

This would mean that the entire wiki only gets about 10-20 views per day on average, which is almost impossible considering another data point: the number of clickthroughs (referrers) from the WMF wiki to blog.wikimedia.org, which we measure with two independent mechanisms, Automattic's built-in stats and our own EventLogging setup. Both show a three-digit number on basically any given day.[1]
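The 1:1000 sampling arithmetic behind the Pentaho figures above can be made explicit. The following is a minimal sketch. It assumes, as stated above, that each cube value is the count of sampled requests multiplied by 1000. The function names are hypothetical and are not part of any Pentaho API.

```python
# Assumed 1:1000 sampling of the request logs behind the Pentaho cube,
# as described above; each sampled request adds 1000 to the monthly value.
SAMPLING_FACTOR = 1000

def sampled_hits(cube_value: float) -> int:
    """Number of sampled requests behind a scaled cube value."""
    return round(cube_value / SAMPLING_FACTOR)

def scaled_estimate(hits: int) -> int:
    """Monthly pageview estimate produced by a given number of sampled hits."""
    return hits * SAMPLING_FACTOR

dec_2014 = 4000.0  # December 2014 value from the table above
hits = sampled_hits(dec_2014)
print(hits)  # 4
# With so few sampled hits, one extra or missing request shifts the
# monthly estimate by a full SAMPLING_FACTOR views.
print(scaled_estimate(hits + 1) - scaled_estimate(hits))  # 1000
```

With only single-digit sampled hits per month, these figures carry very little information, whatever the cause of the undercount turns out to be.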

To summarize, we have discussed here five indicators for how much traffic wikimediafoundation.org gets:

  1. old total pageviews via stats.wikimedia.org
  2. new total pageviews via Pentaho
  3. per-page views via stats.grok.se (as inspected in case of the outdated http://stats.grok.se/www.f/top )
  4. referrers to the blog via Automattic
  5. referrers to the blog via EventLogging

1., 4. and 5. seem reasonably plausible, and not immediately inconsistent with each other.
But 2. and 3. seem way too low.

(added Dario, with whom I had a very brief chat about this last week)


[1] Concrete figures for one recent day from each source (the two use different time zones):
EventLogging:
163 from m.wikimediafoundation.org
112 from wikimediafoundation.org
Automattic:
311 from *.wikimediafoundation.org
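The EventLogging figures for the desktop and mobile domains can be summed and compared with Automattic's wildcard total to check that the two referrer sources roughly agree. This sketch uses only the numbers quoted in this footnote. Because the two sources use different time zones, the day boundaries do not line up, so only the order of magnitude is meaningful.

```python
# Clickthroughs to blog.wikimedia.org for one day, as quoted above
eventlogging = {
    "m.wikimediafoundation.org": 163,
    "wikimediafoundation.org": 112,
}
automattic_total = 311  # *.wikimediafoundation.org

el_total = sum(eventlogging.values())
ratio = el_total / automattic_total
print(el_total)         # 275
print(round(ratio, 2))  # 0.88
```

Either way, a few hundred clickthroughs per day is hard to reconcile with a wiki that supposedly receives only tens of views per day.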

Update: We now have the projectview_hourly table on Hive (wmf.projectview_hourly), which contains data from April 2015 onwards. I did some quick queries:

year/month       human views (no spiders)   all views
2015-04          8040552                    8765013
2015-05          8732141                    9824841
2015-06          6001048                    6754261
2015-07          4393674                    4901670
2015-08 (part)   2828571                    4219620

(via

hive (default)> SELECT year, month, SUM(view_count) AS views FROM wmf.projectview_hourly WHERE year=2015 AND project = 'wikimediafoundation' AND agent_type = 'user' GROUP BY year, month ORDER BY month LIMIT 100;
hive (default)> SELECT year, month, SUM(view_count) AS views FROM wmf.projectview_hourly WHERE year=2015 AND project = 'wikimediafoundation' GROUP BY year, month ORDER BY month LIMIT 100;

)

So at this point it's clear that those earlier numbers were indeed way too low.
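The two queries also show roughly how much of the traffic falls outside agent_type = 'user'. This sketch computes the human share per month from the table above. August 2015 is a partial month, so only its ratio is comparable with the other months, not its total.

```python
# Monthly totals from wmf.projectview_hourly, copied from the table above:
# month -> (views with agent_type = 'user', all views)
MONTHLY = {
    "2015-04": (8040552, 8765013),
    "2015-05": (8732141, 9824841),
    "2015-06": (6001048, 6754261),
    "2015-07": (4393674, 4901670),
    "2015-08": (2828571, 4219620),  # partial month
}

def human_share(monthly):
    """Fraction of all views attributed to human users, per month."""
    return {month: human / total for month, (human, total) in monthly.items()}

shares = human_share(MONTHLY)
for month, share in shares.items():
    print(f"{month}: {share:.1%} human")
```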

And the ten most popular pages (by human pageviews) on wikimediafoundation.org in July 2015:

page_title         human views (no spiders)
Home               817247
Terms_of_Use       741109
Privacy_policy     675962
Special:Search     98740
Thank_You/ja       97315
-                  86445
Ways_to_Give/ja    66303
Terms_of_Use/ar    66058
Terms_of_Use/es    63488
Terms_of_Use/fr    48416

(BTW, it is unclear which page the "-" database entries refer to; this might be a bug.)

Data via

hive (default)> SELECT page_title, SUM(view_count) AS views FROM wmf.pageview_hourly WHERE year=2015 AND month=7 AND project = 'wikimediafoundation' AND agent_type = 'user' GROUP BY page_title ORDER BY views DESC LIMIT 10;
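This Python sketch shows the aggregation the query performs: sum view_count per page_title, then take the largest totals. It is meant for readers without Hive access. The rows are a hypothetical stand-in for wmf.pageview_hourly, not real data. In Hive, SORT BY only guarantees ordering within each reducer, whereas ORDER BY sorts globally, so ORDER BY is the safe choice for a top-ten query. The sketch likewise ranks globally.

```python
from collections import Counter

# Hypothetical hourly rows: (page_title, view_count); not real data
rows = [
    ("Home", 120), ("Terms_of_Use", 90), ("Home", 80),
    ("Privacy_policy", 70), ("-", 15), ("Terms_of_Use", 30),
]

def top_pages(rows, n=10):
    """GROUP BY page_title with SUM(view_count), then a global top n."""
    totals = Counter()
    for title, views in rows:
        totals[title] += views
    # most_common sorts all totals in descending order before truncating
    return totals.most_common(n)

print(top_pages(rows, n=3))
# [('Home', 200), ('Terms_of_Use', 120), ('Privacy_policy', 70)]
```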

Well, do we know if https://dumps.wikimedia.org/other/pagecounts-all-sites/ is working better? The stats.grok.se data source is deprecated...

Figures are the same, only some 20 pageviews/day for the main page: https://tools.wmflabs.org/pageviews/?project=wikimediafoundation.org&platform=all-access&agent=user&range=latest-90&pages=Main_Page

IMHO it's ok to keep this open, as the issue probably lies "upstream".

Nemo_bis: I doubt there is an issue. Visiting http://www.wikimediafoundation.org redirects to "Home" rather than to "Main_Page", so the ~20 views/day for Main_Page are expected, and the pageviews for Home look correct.
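The two titles can be compared directly by requesting both from the Wikimedia Pageviews REST API, which serves the same data the pageviews tool above displays. This sketch only builds the per-article request URLs. Three things are assumptions: the project identifier "wikimediafoundation.org" (taken from the tool link above), whether data for this wiki is still served, and the example date range.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews request URL (dates as YYYYMMDD)."""
    # Titles must be URL-encoded, e.g. "/" in translated subpages like Thank_You/ja
    title = quote(article, safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"

for page in ("Main_Page", "Home"):
    print(per_article_url("wikimediafoundation.org", page, "20150701", "20150731"))
```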

I think this ticket can be closed.

Good point, thanks for correcting me.