Author: rybec
Description:
requests logged on 2012-06-09 for hour 19:00
Instead of standard URL percent-encoding, pages are sometimes requested through URLs that use JavaScript-style escapes. The difference is that "\x", rather than the "%" symbol, marks the start of each escape sequence. The MediaWiki software does not decode these requests. For example, a request for
https://en.wikipedia.org/w/index.php?title=Robinson_Can%C3%B3
is correctly decoded ("%C3%B3" becomes an accented "ó"), whereas a request for
https://en.wikipedia.org/w/index.php?title=Robinson_Can\xC3\xB3
is left undecoded, and we're told the page doesn't exist.
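Both spellings describe the same two UTF-8 bytes, 0xC3 0xB3, which together encode "ó"; only the escape marker differs. The snippet below is a minimal illustration, assuming Python's urllib.parse.unquote as a stand-in for PHP's rawurldecode() (both decode "%XX" sequences and leave "+" alone). It is not MediaWiki code.

```python
from urllib.parse import unquote

# Title part of the two example URLs from this report.
percent_title = "Robinson_Can%C3%B3"
js_title = r"Robinson_Can\xC3\xB3"  # raw string: keeps the literal backslashes

# unquote() stands in for rawurldecode(): it only recognizes "%XX" escapes.
print(unquote(percent_title))  # Robinson_Canó
print(unquote(js_title))       # Robinson_Can\xC3\xB3 (unchanged, so no such page)

# The hex pairs in both spellings are the UTF-8 encoding of "ó".
print(bytes([0xC3, 0xB3]).decode("utf-8"))  # ó
```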
As I noted at https://en.wikipedia.org/wiki/Wikipedia:Redirects_for_discussion/Log/2013_December_9#.5Cx22Weird_Al.5Cx22_Yankovic the amount of this traffic reaching the WMF projects has increased tremendously, from about one request per hour in September 2011 to millions of requests per day in November 2013.
Perhaps it would be desirable to transform "\x" to "%" before passing URLs to rawurldecode(), so that these requests reach the intended pages.
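A minimal sketch of the suggested transformation, again assuming Python's urllib.parse.unquote as a stand-in for rawurldecode(). The helper names js_escapes_to_percent and decode_title are hypothetical, and restricting the rewrite to "\x" followed by exactly two hex digits is an assumption of this sketch, not something the report specifies.

```python
import re
from urllib.parse import unquote

# Matches a JavaScript-style "\xHH" escape: a backslash, "x", then exactly
# two hex digits. A bare "\x" without two hex digits after it is left alone.
_JS_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")

def js_escapes_to_percent(s: str) -> str:
    """Hypothetical helper: rewrite every "\\xHH" sequence as "%HH"."""
    return _JS_HEX_ESCAPE.sub(r"%\1", s)

def decode_title(raw: str) -> str:
    """Hypothetical helper: normalize "\\xHH" escapes, then percent-decode."""
    return unquote(js_escapes_to_percent(raw))

print(decode_title(r"Robinson_Can\xC3\xB3"))         # Robinson_Canó
print(decode_title(r"\x22Weird_Al\x22_Yankovic"))    # "Weird_Al"_Yankovic
print(decode_title("Robinson_Can%C3%B3"))            # Robinson_Canó
```

One trade-off of this approach: a title that legitimately contains a backslash followed by "x" and two hex digits would also be rewritten, so any real fix would need to weigh that case.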
Version: unspecified
Severity: normal
Attached: