
MediaWiki needs better throttling for expensive queries
Closed, Resolved · Public

Description

Currently, I believe the API has little throttling built in (if any). This can cause problems if people or organizations are hitting the API with expensive queries at a fast rate.

Bug filed in response to ExternalStorage issues from &rvprop=content|timestamp being batch queried from the API.

Some sensible throttling options should be implemented for expensive queries, with configuration variables to override the defaults for very robust or very weak servers.


Version: unspecified
Severity: minor

Details

Reference
bz18489

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:31 PM
bzimport set Reference to bz18489.
bzimport added a subscriber: Unknown Object (MLST).

I don't believe this is a general API issue, but rather a MediaWiki or Wikimedia issue. Batch-querying rvprop=content is one way to cause an ES hiccup, but there are others, like hitting Special:Export at the same rate; the latter would actually be much more effective at killing ES servers, because it'll happily export an unlimited number of revisions of a single page (and an unlimited number of pages too, IIRC, so you could potentially ask Special:Export to try to export the entire wiki at once), whereas the API is limited to 50 content fetches per request (500 for bots/sysops).

Bryan.TongMinh wrote:

INVALID, not an API bug.

Re-opening with an expanded scope. Changed summary from "API needs better throttling for expensive queries" to "MediaWiki needs better throttling for expensive queries." This is definitely an issue that needs to be addressed at some point.

Bringing in performance-savvy contributors to weigh in on this old request.

Marking as Lowest, since nobody seems to be working or planning to work on this currently.

MediaWiki has various rate limits (so-called "ping" limits). These are applied, for example, to write actions (like edits), but also to some other expensive actions, such as generating a thumbnail on a cache miss.

If there are specific endpoints that are known to be expensive and that we want to limit for load reasons (the latter isn't applicable to all slow things), then this mechanism could be used there as needed.
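
For reference, the ping limiter is configured through the $wgRateLimits setting in LocalSettings.php. The sketch below shows its general shape; the action key and the numbers are illustrative assumptions, not recommended values.

```php
<?php
// Minimal sketch of ping-limit configuration in LocalSettings.php.
// Each action key maps user groups to [ maximum actions, window in seconds ].
// The 'edit' figures below are illustrative, not recommended values.
$wgRateLimits = [
	'edit' => [
		'ip'   => [ 8, 60 ],  // anonymous editors: 8 edits per minute per IP
		'user' => [ 90, 60 ], // logged-in users: 90 edits per minute
	],
];
```

Code paths that want to enforce such a limit check it against the user's ping limiter (historically User::pingLimiter( 'edit' )), which returns true once the limit is exceeded.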

For the concern of general load and concurrency (not individually slow actions), there is also room for higher-level throttling at the traffic layer. At WMF we do this in one or more of the Nginx/ATS/Varnish layers (I forget which one).

This is not just an optimisation for WMF. The general concern of a single user occupying many workers (even with relatively simple requests like page views) is not something that can be solved well at the application layer: by the time you've dedicated an Apache worker and started up a PHP process (even a well-optimised one like MW), and then let it do rate limiting by tracking a counter e.g. in Memcached, you've essentially already incurred the cost you wanted to prevent.
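
To make that concrete, here is a rough sketch of such an application-level counter (plain PHP with the Memcached extension; the key scheme and threshold are made up). Everything in it runs only after an Apache worker and a PHP process have already been committed to the request, which is exactly the cost described above.

```php
<?php
// Naive per-IP counter in Memcached. By the time this executes, an Apache
// worker and a PHP interpreter are already tied up serving the request,
// so most of the cost we wanted to avoid has already been paid.
$memc = new Memcached();
$memc->addServer( '127.0.0.1', 11211 );

$key = 'ratelimit:' . $_SERVER['REMOTE_ADDR']; // hypothetical key scheme
$memc->add( $key, 0, 60 );                     // open a 60-second window if absent
$count = $memc->increment( $key );

if ( $count !== false && $count > 30 ) {       // 30 requests/minute: arbitrary
	http_response_code( 429 );
	exit( 'Too many requests' );
}
```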

I'm closing this, as the load concern is already solved outside MediaWiki for WMF, and for third parties it should be solved there as well, e.g. with a basic IP-based throttle in Nginx or similar. For the expensive backend cost (past Apache, or for long-running web requests), we have the existing ping limiting and PoolCounter systems, which seem to work well. If there are specific incidents that call for such a measure to be added somewhere, a specific task should be filed instead.
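
For third-party installs, the concurrency side of this is what PoolCounter covers. Below is a minimal sketch of a LocalSettings.php configuration, assuming the PoolCounter extension and its poolcounterd daemon are installed; the pool name is the standard ArticleView pool, but the numbers and the client settings are illustrative and should be checked against the extension's documentation.

```php
<?php
// Sketch of a PoolCounter configuration in LocalSettings.php, assuming the
// PoolCounter extension is installed and poolcounterd is running locally.
// 'ArticleView' limits how many workers may re-parse the same page at once.
$wgPoolCounterConf = [
	'ArticleView' => [
		'class'    => 'PoolCounter_Client', // class provided by the extension
		'timeout'  => 15, // seconds a request waits for a work slot
		'workers'  => 2,  // concurrent workers allowed per key
		'maxqueue' => 50, // requests queued beyond this are rejected outright
	],
];
$wgPoolCountClientConf = [
	'servers' => [ '127.0.0.1' ], // assumed daemon address
	'timeout' => 0.5,
];
```

The IP-based throttle mentioned above (e.g. Nginx's limit_req module) would sit in front of all of this, so that excess requests are rejected before they ever reach Apache.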