change of preg_match, preg_replace in checkTitleEncoding
Problem: some links in the Russian-language interface are very long. For example, a category page link of the form
http://ru.wikisource.org/w/index.php?title=Category:CatName&from=PageName
is much longer in practice, because the Cyrillic category and page names are percent-encoded.
The "from" parameter is often truncated in the middle of a multibyte character.
The getGPCVal function in WebRequest.php uses checkTitleEncoding.
The checkTitleEncoding function in Language.php uses
preg_match( '/^([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|' .
'[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})+$/', $s );
to check whether the string is UTF-8.
But the remnant of an incorrectly truncated multibyte UTF-8 character at the end of the string does not match this regexp.
So checkTitleEncoding wrongly treats the truncated UTF-8 string as being in fallback8bitEncoding and converts it.
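
The failure described above can be reproduced in isolation. The snippet below is only a sketch: looksLikeUtf8 is a hypothetical wrapper around the quoted regexp, not a MediaWiki function, and the byte-wise substr call merely simulates the truncation of the "from" value.

```php
<?php
// Hypothetical helper (not part of MediaWiki): wraps the regexp quoted above.
function looksLikeUtf8( $s ) {
    return preg_match( '/^([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|' .
        '[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})+$/', $s ) === 1;
}

// Three Cyrillic letters written as explicit UTF-8 bytes (2 bytes each).
$full = "\xD0\x9F\xD0\xBE\xD1\x8D";
// Byte-wise cut after 5 bytes: two whole letters plus a lone lead byte 0xD1.
$cut = substr( $full, 0, 5 );

var_dump( looksLikeUtf8( $full ) ); // bool(true)
var_dump( looksLikeUtf8( $cut ) );  // bool(false)
```

The second result is what sends the truncated value down the fallback8bitEncoding path.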
As a result, the "next 200 pages" link on the following category page of Russian Wikisource works incorrectly:
http://ru.wikisource.org/wiki/Категория:Поэзия_Максимилиана_Александровича_Волошина
Some articles in the category are visible neither on the first nor on the second category page.
I suggest changing the regular expression to allow for possible fragments of UTF-8 character codes at the end of the string.
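
One possible shape for such a relaxed check is sketched below. It is an illustration only, not the attached patch: looksLikeUtf8AllowTruncated is a hypothetical name, and the extra group simply accepts one incomplete multibyte sequence (a lead byte with too few continuation bytes) at the very end of the string.

```php
<?php
// Hypothetical helper (not part of MediaWiki): same character classes as the
// existing check, plus an optional incomplete sequence at the end of the string.
function looksLikeUtf8AllowTruncated( $s ) {
    return preg_match(
        // Zero or more complete 1- to 4-byte sequences ...
        '/^([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|' .
        '[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})*' .
        // ... optionally followed by one truncated sequence.
        '([\xc0-\xf7]|[\xe0-\xf7][\x80-\xbf]|[\xf0-\xf7][\x80-\xbf]{2})?$/',
        $s ) === 1;
}

$truncated = "\xD0\x9F\xD0\xBE\xD1"; // two Cyrillic letters plus a lone lead byte
$cp1251    = "\xCF\xEE\xFD";         // the bytes of a Windows-1251 string, for contrast

var_dump( looksLikeUtf8AllowTruncated( $truncated ) ); // bool(true)
var_dump( looksLikeUtf8AllowTruncated( $cp1251 ) );    // bool(false)
```

Unlike the original pattern, this sketch also accepts the empty string, and a caller would probably still need to drop the trailing fragment before using the value.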
Version: 1.12.x
Severity: minor
Attached: