One of the longstanding issues with Webstatscollector is that it
counts redirects at the HTTP level.
So, for example, requesting any of the following:
- a page with a lower case first letter [1],
- a page from the desktop site on a mobile device [2],
- a www.wikipedia.org/wiki/ path (first part is www, not a language) [3],
- Special:MyLanguage / Special:Random / Special:RandomRootPage / Special:RandomInCategory, or
- namespace aliases, special page aliases, and canonical special page names/namespace names
causes two requests to the caches, and webstatscollector counts both,
although only a single page is shown to the user.
As a result, the reported numbers are too high.
Since we're about to deploy a new webstatscollector anyway, and this
double counting should not be too hard to fix, let's fix it too.
(Note that redirects above the HTTP level, that is, redirects at the wiki
level, are not affected. For example,
http://en.wikipedia.org/wiki/Michael_J_Fox
(no dot after the J) is, was, and will be one request, although it shows
the content of
http://en.wikipedia.org/wiki/Michael_J._Fox
(dot after the J).)
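
As a rough illustration of the idea (not webstatscollector's actual code or
input format), the double counting goes away if responses with an HTTP
redirect status are skipped when counting. The Request record, the
is_http_redirect and count_pageviews helpers, and the choice of status codes
are assumptions made for this sketch; the sample entries mirror the URLs and
status codes from the wget runs in [1], [2] and [3].

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """One cache log entry (hypothetical, simplified format)."""
    url: str
    status: int


# Assumption: these are the status codes treated as HTTP-level redirects.
REDIRECT_STATUSES = frozenset({301, 302, 303, 307, 308})


def is_http_redirect(status: int) -> bool:
    """Return True if the response only points the client at another URL."""
    return status in REDIRECT_STATUSES


def count_pageviews(requests: Iterable[Request]) -> Counter:
    """Count requests per URL, skipping HTTP-level redirect responses."""
    counts = Counter()
    for request in requests:
        if is_http_redirect(request.status):
            # The client follows up with a second request for the
            # redirect target; only that one should be counted.
            continue
        counts[request.url] += 1
    return counts


# Request/response pairs as seen in the wget runs [1], [2] and [3].
sample = [
    Request("http://en.wikipedia.org/wiki/main_page", 301),
    Request("http://en.wikipedia.org/wiki/Main_page", 200),
    Request("http://en.wikipedia.org/wiki/Main_Page", 302),
    Request("http://en.m.wikipedia.org/wiki/Main_Page", 200),
    Request("http://www.wikipedia.org/wiki/Main_Page", 301),
    Request("http://en.wikipedia.org/wiki/Main_Page", 200),
]

pageviews = count_pageviews(sample)
print(len(sample), "requests,", sum(pageviews.values()), "pageviews")
```

With the six sample entries, three requests are counted instead of six,
matching the three pages actually shown to the user.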
[1]
christian@spencer jobs: 0 time: 13:13:36 // exit code: 0
cwd: ~
wget -O /dev/null 'http://en.wikipedia.org/wiki/main_page'
--2014-10-08 13:13:39-- http://en.wikipedia.org/wiki/main_page
Resolving en.wikipedia.org... 91.198.174.192
Connecting to en.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://en.wikipedia.org/wiki/Main_page [following]
--2014-10-08 13:13:39-- http://en.wikipedia.org/wiki/Main_page
Reusing existing connection to en.wikipedia.org:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'
[ <=> ] 67,779 --.-K/s in 0.1s
2014-10-08 13:13:39 (472 KB/s) - `/dev/null' saved [67779]
[2]
christian@spencer jobs: 0 time: 13:13:39 // exit code: 0
cwd: ~
wget -O /dev/null --user-agent 'iPhone' 'http://en.wikipedia.org/wiki/Main_Page'
--2014-10-08 13:13:44-- http://en.wikipedia.org/wiki/Main_Page
Resolving en.wikipedia.org... 91.198.174.192
Connecting to en.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://en.m.wikipedia.org/wiki/Main_Page [following]
--2014-10-08 13:13:44-- http://en.m.wikipedia.org/wiki/Main_Page
Resolving en.m.wikipedia.org... 91.198.174.204
Connecting to en.m.wikipedia.org|91.198.174.204|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'
[ <=> ] 22,002 --.-K/s in 0.05s
2014-10-08 13:13:44 (416 KB/s) - `/dev/null' saved [22002]
[3]
christian@spencer jobs: 0 time: 13:13:44 // exit code: 0
cwd: ~
wget -O /dev/null 'http://www.wikipedia.org/wiki/Main_Page'
--2014-10-08 13:13:49-- http://www.wikipedia.org/wiki/Main_Page
Resolving www.wikipedia.org... 91.198.174.192
Connecting to www.wikipedia.org|91.198.174.192|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://en.wikipedia.org/wiki/Main_Page [following]
--2014-10-08 13:13:49-- http://en.wikipedia.org/wiki/Main_Page
Resolving en.wikipedia.org... 91.198.174.192
Reusing existing connection to www.wikipedia.org:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `/dev/null'
[ <=> ] 67,565 --.-K/s in 0.1s
2014-10-08 13:13:49 (471 KB/s) - `/dev/null' saved [67565]
Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=72102