
Abnormal URL or Bad Web-Crawler (Spider) overloads CPU
Closed, Invalid (Public)

Description

Author: playercd8

Description:
I am using MediaWiki 1.12.0 (language set to zh-tw).
Sometimes abnormal URLs, or a bad web crawler (spider), cause high CPU load.
I noticed this over the most recent month.

Ex. (at 2009/3/1)
Top Process %CPU 7.3 httpd [player.idv.tw] [/prog/index.php?diff1687&oldid1668&titleJoomla_1.0.12%2B]
Top Process %CPU 6.6 httpd [player.idv.tw] [/prog/index.php?days1&from20080924164000&hideliu1&title]
Top Process %CPU 6.5 httpd [player.idv.tw] [/prog/index.php?actionhistory&titleGridView%E5%9C%A8%E8%B]

Are these URLs incorrect?

I tried to fix it by editing index.php:


# Query string fields
$action = $wgRequest->getVal( 'action', 'view' );
$title = $wgRequest->getVal( 'title' );

# Fix bug? ("%E9%A6%96%E9%A0%81" = "首頁", the name of the main page in zh-tw)
# Redirect requests with a missing or empty title to the main page.
if ( is_null( $title ) || $title === '' ) {
	header( "Location: index.php/%E9%A6%96%E9%A0%81" );
	exit;
}

Maybe you have a better way to fix this?

If you can read Chinese, you can also see my report:

http://zh.wikipedia.org/w/index.php?title=Special%3A%E6%90%9C%E7%B4%A2&search=%E5%90%84%E7%89%88%E6%9C%AC%E5%8F%AF%E8%83%BD%E9%9A%B1%E5%90%ABCPU%E8%B3%87%E6%BA%90%E8%80%97%E7%9B%A1%E7%9A%84%E6%BC%8F%E6%B4%9E&ns4=1&fulltext=%E6%90%9C%E5%B0%8B


Version: 1.12.x
Severity: enhancement

Details

Reference
bz17779

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:33 PM
bzimport set Reference to bz17779.
bzimport added a subscriber: Unknown Object (MLST).

playercd8 wrote:

# Query string fields
$action = $wgRequest->getVal( 'action', 'view' );
$title = $wgRequest->getVal( 'title' );

# Fix bug? Ver2
# Also redirect when the query string starts with "action=".
if ( is_null( $title ) || $title === ''
	|| substr( $_SERVER['QUERY_STRING'], 0, 7 ) === 'action=' ) {
	header( "Location: index.php/%E9%A6%96%E9%A0%81" );
	exit;
}

playercd8 wrote:

# Query string fields
$action = $wgRequest->getVal( 'action', 'view' );
$title = $wgRequest->getVal( 'title' );

# Fix bug? Ver3
# Never redirect when an "rs" parameter is present; the "action=" check
# is skipped for action=ajax.
if ( !isset( $_GET['rs'] )
	&& ( is_null( $title ) || $title === ''
		|| ( substr( $_SERVER['QUERY_STRING'], 0, 7 ) === 'action='
			&& $_GET['action'] != 'ajax' ) ) ) {
	header( "Location: index.php/%E9%A6%96%E9%A0%81" );
	exit;
}

mike.lifeguard+bugs wrote:

Is this even a bug in MediaWiki?

Robots can be blocked using robots.txt. Misbehaving robots that don't respect robots.txt can be blocked in other ways. Neither is a MediaWiki issue; marking INVALID.
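
As a rough sketch of the robots.txt approach: assuming the wiki is served under /prog/ as in the process listing in the description, and that article pages are reachable as /prog/index.php/Title (the form used in the reporter's redirect), a file like the following asks well-behaved crawlers to skip the dynamic diff, history, and recent-changes URLs while leaving article pages crawlable. The paths are assumptions and would need adjusting to the actual install.

```text
# robots.txt at the site root (paths are assumptions based on the URLs in this report)
User-agent: *
# Prefix match: covers every /prog/index.php?... query-string URL
Disallow: /prog/index.php?
```

Crawlers that ignore robots.txt are not affected by this file and would have to be blocked some other way, for example at the web server.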