
Different squid log based traffic reports have different %mobile
Closed, Declined, Public

Description

from Asana https://app.asana.com/0/1891117540465/1908946323422

example of mismatching %mobile (or inverse non-mobile)

Browsers non-mobile: 83.4%
http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportClients.htm

Breakdown per OS version, non mobile: 80.1%
http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportOperatingSystems.htm

All pageviews %mobile 16.19%
http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportCountryData.htm

BTW
Note that we have two concepts for mobile, and both occur in these reports:

  • views from mobile device
  • views to mobile site

See also: http://infodisiac.com/blog/2011/05/wikipedia-mobile-traffic-2/

If any report is unclear about which concept 'mobile' refers to, please let us know. We need to be very clear about this or it will confuse the reader.


Version: unspecified
Severity: normal

Details

Reference
bz46273

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:26 AM
bzimport set Reference to bz46273.
bzimport added a subscriber: Unknown Object (MLST).

From Stefan:

Ok, so the totals don't match.

You wrote 3 numbers in this bug report: *83.4*, *80.1* and *16.19* (which should be 100 - *83.4*?)

You also wrote there are 2 different concepts for mobile.

Do we stick to just one of these 2 concepts, or do we allow both and mention which one we used in the footer of the report?

Which one should we do?

Please add very specific acceptance criteria to this Bug report.

Assume that invariants are needed to test this.

Any arithmetic relation between the totals involved is welcome, so that we can create a test. How should these totals relate to each other?

Thanks, Stefan

Stefan, the two concepts of mobile are totally different, and that makes it confusing. But there is no competition between these two concepts; one is not better than the other. As I said above (and explained further in my blog post, linked above), sometimes when people talk about 'mobile traffic' they mean 'traffic from mobile devices', and sometimes they mean 'traffic to our mobile site'.

Both concepts are valid. Some reports deal with the first concept, others with the second, and some with both, without being explicit. We need to change that. It is important that we always make explicit what we mean. 'Mobile' is just too vague a term, so I prefer the more explicit 'from mobile devices' and 'to mobile site'.

It might seem that these concepts are closely related. But in practice, part of the traffic to our mobile site comes from desktops (that share might be small, but it might grow as more people discover that the mobile site requires less bandwidth). And the other way around: part of the traffic from mobile devices goes to the main site. That is even a considerable share of the traffic from mobile devices (among other reasons, traffic from tablets is not redirected to the mobile site).
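To make the distinction concrete, here is a minimal sketch that treats the two concepts as independent flags on each page view and cross-tabulates them. The `PageView` type, its field names, and the sample rows are made up for illustration; they are not taken from the squid log format or from the existing report scripts.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PageView(NamedTuple):
    # Hypothetical flags, not fields of the real squid logs.
    from_mobile_device: bool  # client was classified as a mobile device
    to_mobile_site: bool      # request was served by the mobile site


def cross_tabulate(views: Iterable[PageView]) -> Counter:
    """Count page views for each (from_mobile_device, to_mobile_site) pair."""
    return Counter((v.from_mobile_device, v.to_mobile_site) for v in views)


def share_from_mobile_devices(table: Counter) -> float:
    """Fraction of all page views that come from mobile devices."""
    total = sum(table.values())
    if total == 0:
        return 0.0
    return sum(n for (device, _), n in table.items() if device) / total


def share_to_mobile_site(table: Counter) -> float:
    """Fraction of all page views that go to the mobile site."""
    total = sum(table.values())
    if total == 0:
        return 0.0
    return sum(n for (_, site), n in table.items() if site) / total


# Made-up sample rows, not real traffic.
sample = [
    PageView(True, True),    # phone on the mobile site
    PageView(True, False),   # tablet on the main site
    PageView(True, False),   # tablet on the main site
    PageView(False, True),   # desktop on the mobile site
    PageView(False, False),  # desktop on the main site
    PageView(False, False),  # desktop on the main site
]
table = cross_tabulate(sample)
```

With this sample the two shares differ (half of the views are from mobile devices, a third go to the mobile site), which is why a report has to say which of the two it is showing.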

For the current bug I suggest we first focus on the breakdown of traffic by client (either client browser or client OS). Those obviously should have the same totals.

I suggest you first fix totals for http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportOperatingSystems.htm and http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportCountryData.htm

We can use that as a base reference to improve the 'Page view breakdown per Country' report. At the very least we need to make the column titles more explicit there. (People don't read the footer; we need to be very explicit in the introduction and also make the headers themselves very explicit.)

Then later: the report http://stats.wikimedia.org/archive/squid_reports/2012-08/SquidReportCountryData.htm is confusing (also for me, as I haven't worked on it). It has two sections: at the center, 10 columns for 'All page views'; at the right, 9 columns for '[page views] to mobile site'. But it also talks about 'mobile browsers' and 'Total mobile' in both sections. Clearly very confusing.

One acceptance criterion would be: for all three reports, the total page (= HTML) views from mobile clients and the total page views from non-mobile clients should match.

Another acceptance criterion would be: for any row or column that refers to mobile, it should be very clear whether this refers to 'from mobile devices' or 'to mobile site'. Introductory text and/or headers should make this very obvious.
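The first criterion could be turned into an invariant test along these lines. This is only a sketch: the percentages are the ones quoted in the description for 2012-08, the function name and the 0.1 percentage point tolerance are assumptions, and it presumes all three figures measure 'from mobile devices', which is exactly what is unclear for the country report.

```python
# Mobile share per report, in percent, derived from the figures quoted
# in the task description (2012-08 reports).
REPORTED_MOBILE_SHARE = {
    "SquidReportClients": 100.0 - 83.4,           # browsers, non-mobile: 83.4%
    "SquidReportOperatingSystems": 100.0 - 80.1,  # OS versions, non-mobile: 80.1%
    "SquidReportCountryData": 16.19,              # all page views, %mobile
}


def mismatches(shares, tolerance=0.1):
    """Return (report_a, report_b, difference) for every pair of reports
    whose mobile shares differ by more than `tolerance` percentage points.

    The tolerance is an assumed value to absorb rounding in the reports.
    """
    names = sorted(shares)
    return [
        (a, b, round(abs(shares[a] - shares[b]), 2))
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(shares[a] - shares[b]) > tolerance
    ]


for report_a, report_b, diff in mismatches(REPORTED_MOBILE_SHARE):
    print(f"{report_a} vs {report_b}: differs by {diff} percentage points")
```

An empty result from `mismatches` would mean the reports agree within the tolerance; with the 2012-08 figures every pair is flagged.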

Thanks, Erik

Maintenance on squid log based reports is on hold, pending replacement or restructuring in a Hadoop context.